
Databricks (SQL Warehouse)

Datafly Signal writes events into a Databricks Delta Lake table using the SQL Statement Execution API, executed against a SQL warehouse. Pick this mode when you already have a running SQL warehouse and want batch-friendly inserts into Unity Catalog.

For lower-latency real-time ingestion, see Databricks (Zerobus Ingest), which bypasses SQL warehouses and writes directly via the Zerobus streaming API.
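To make the SQL warehouse mode more concrete, here is a minimal sketch of a request body for the Databricks SQL Statement Execution API (`POST /api/2.0/sql/statements`) that inserts one event row using named parameters. The function name `build_insert_request`, the `p_` parameter prefix, and the 30-second wait timeout are illustrative assumptions. Signal's actual request shape and batching are not documented here, so treat this as an approximation rather than its implementation.

```python
import json

# Column name -> Databricks SQL type, mirroring the table defined in this guide.
COLUMN_TYPES = {
    "event_id": "STRING",
    "type": "STRING",
    "event": "STRING",
    "anonymous_id": "STRING",
    "user_id": "STRING",
    "timestamp": "TIMESTAMP",
    "received_at": "TIMESTAMP",
    "sent_at": "TIMESTAMP",
    "context": "STRING",
    "properties": "STRING",
    "traits": "STRING",
    "source_id": "STRING",
    "integration_id": "STRING",
}


def build_insert_request(warehouse_id, catalog, schema, table, row):
    """Build a JSON body for POST /api/2.0/sql/statements (hypothetical helper)."""
    columns = list(COLUMN_TYPES)
    # Named parameter markers (:p_<column>) keep values out of the SQL text.
    markers = ", ".join(f":p_{name}" for name in columns)
    statement = (
        f"INSERT INTO {catalog}.{schema}.{table} "
        f"({', '.join(columns)}) VALUES ({markers})"
    )
    parameters = []
    for name, sql_type in COLUMN_TYPES.items():
        param = {"name": f"p_{name}", "type": sql_type}
        value = row.get(name)
        if value is not None:
            # Values are sent as strings; a parameter without a value is NULL.
            param["value"] = str(value)
        parameters.append(param)
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "parameters": parameters,
        "wait_timeout": "30s",  # assumed value for this sketch
    }


if __name__ == "__main__":
    body = build_insert_request(
        "abc123def456", "datafly", "events", "raw",
        {"event_id": "e1", "type": "track", "timestamp": "2024-01-01T12:00:00Z"},
    )
    print(json.dumps(body, indent=2))
```

The body would be sent to the workspace URL with an `Authorization: Bearer <token>` header. Columns missing from `row` are written as NULL.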

Prerequisites

Before configuring this integration, you need a Databricks workspace with Unity Catalog, a SQL warehouse, a target table, and a personal access token (or service principal token).

Create a Databricks Workspace

  1. Sign up at databricks.com or provision a workspace through your cloud provider (AWS, Azure, or GCP).
  2. Complete the workspace setup wizard.
  3. Note the Workspace URL (e.g. https://abc-12345.cloud.databricks.com).

Create a SQL Warehouse

  1. In the Databricks workspace, go to SQL Warehouses in the left sidebar.
  2. Click Create SQL warehouse.
  3. Enter a name (e.g. datafly-warehouse).
  4. Choose the Size (2X-Small is sufficient for low to moderate event volume).
  5. Set Auto stop to manage cost.
  6. Click Create.
  7. Note the Warehouse ID from the warehouse details page. It appears in the page URL and as the last segment of Connection details > HTTP path.
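If you copy the HTTP path from Connection details, the warehouse ID can be pulled out of it programmatically. The sketch below assumes the path has the usual `/sql/1.0/warehouses/<id>` shape. The helper name `warehouse_id_from_http_path` is hypothetical.

```python
def warehouse_id_from_http_path(http_path: str) -> str:
    """Return the last segment of a SQL warehouse HTTP path."""
    segments = [part for part in http_path.strip().split("/") if part]
    # Expect ".../warehouses/<id>"; fail loudly on anything else.
    if len(segments) < 2 or segments[-2] != "warehouses":
        raise ValueError(f"not a SQL warehouse HTTP path: {http_path!r}")
    return segments[-1]


if __name__ == "__main__":
    print(warehouse_id_from_http_path("/sql/1.0/warehouses/abc123def456"))
```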

Create a Catalog, Schema, and Table

In the SQL editor or a notebook, run:

CREATE CATALOG IF NOT EXISTS datafly;
CREATE SCHEMA IF NOT EXISTS datafly.events;
 
CREATE TABLE IF NOT EXISTS datafly.events.raw (
  event_id STRING NOT NULL,
  type STRING,
  event STRING,
  anonymous_id STRING,
  user_id STRING,
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  sent_at TIMESTAMP,
  context STRING,
  properties STRING,
  traits STRING,
  source_id STRING,
  integration_id STRING
);

Databricks stores data in Delta Lake format by default, providing ACID transactions, time travel, and schema evolution.

Generate a Personal Access Token

  1. Click your profile icon (top right) > Settings.
  2. Go to Developer > Access tokens.
  3. Click Generate new token.
  4. Enter a Comment (e.g. Datafly Signal) and a Lifetime.
  5. Click Generate.
  6. Copy the token immediately — it is only shown once.

For production deployments, prefer a service principal token over a personal access token. Through Unity Catalog, grant the service principal USE CATALOG on the catalog, USE SCHEMA on the schema, and MODIFY on the target table.
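The three grants can be generated as SQL statements and then run in the SQL editor. This is a sketch only: the helper `grant_statements` is hypothetical, and the principal shown is a placeholder application ID, not a real one.

```python
def grant_statements(principal, catalog, schema, table):
    """Return Unity Catalog GRANT statements for a write-only integration."""
    quoted = f"`{principal}`"  # principals are quoted with backticks
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted};",
        f"GRANT MODIFY ON TABLE {catalog}.{schema}.{table} TO {quoted};",
    ]


if __name__ == "__main__":
    # Placeholder service principal application ID (assumption for illustration).
    principal = "00000000-0000-0000-0000-000000000000"
    for statement in grant_statements(principal, "datafly", "events", "raw"):
        print(statement)
```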

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| workspace_url | string | Yes | Workspace URL (e.g. https://abc-12345.cloud.databricks.com). |
| access_token | secret | Yes | Personal access token or service principal token. |
| warehouse_id | string | Yes | The SQL warehouse ID to execute statements against. |
| catalog | string | Yes | The Unity Catalog name. |
| schema | string | Yes | The schema within the catalog. |
| table | string | Yes | The target table name. |
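Before installing the integration, it can help to sanity-check the configuration values locally. The sketch below checks that every required field is present and non-empty. The `https://` check is an assumption based on the example URL, not a documented Signal rule, and `validate_config` is a hypothetical helper.

```python
REQUIRED_FIELDS = (
    "workspace_url",
    "access_token",
    "warehouse_id",
    "catalog",
    "schema",
    "table",
)


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not str(config.get(field) or "").strip():
            errors.append(f"missing required field: {field}")
    url = str(config.get("workspace_url") or "").strip()
    # Assumption: workspace URLs are always served over HTTPS.
    if url and not url.startswith("https://"):
        errors.append("workspace_url should start with https://")
    return errors


if __name__ == "__main__":
    print(validate_config({"workspace_url": "http://example", "catalog": "datafly"}))
```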

Signal Setup

Quick Setup

  1. Navigate to Integrations in the sidebar.
  2. Open the Integration Library tab.
  3. Find Databricks (SQL Warehouse) or filter by Warehouse.
  4. Click Install, select a variant if available, and fill in the required fields.
  5. Click Install Integration to create the integration with a ready-to-use default blueprint.

API Setup

curl -X POST http://localhost:8084/v1/admin/integration-catalog/databricks_sql/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Databricks (SQL Warehouse)",
    "variant": "default",
    "config": {
      "workspace_url": "https://abc-12345.cloud.databricks.com",
      "access_token": "dapi1234567890abcdef",
      "warehouse_id": "abc123def456",
      "catalog": "datafly",
      "schema": "events",
      "table": "raw"
    },
    "delivery_mode": "server_side"
  }'

Schema

Signal writes the standard event envelope. The default table definition uses:

| Column | Databricks type | Notes |
| --- | --- | --- |
| event_id | STRING NOT NULL | Unique per event. |
| type | STRING | track, page, identify, etc. |
| event | STRING | Event name (snake_case). |
| anonymous_id | STRING | First-party visitor identifier. |
| user_id | STRING | Logged-in user identifier (nullable). |
| timestamp | TIMESTAMP | Client event time. |
| received_at | TIMESTAMP | Time Signal received the event. |
| sent_at | TIMESTAMP | Time the row was delivered. |
| context | STRING | JSON document: page, device, consent metadata. |
| properties | STRING | JSON document: custom event properties. |
| traits | STRING | JSON document: user traits. |
| source_id | STRING | Pipeline source identifier. |
| integration_id | STRING | Signal integration identifier. |

Query JSON columns with : notation or parse_json() (e.g. properties:total::DOUBLE). For native JSON typing, declare columns as VARIANT (Databricks Runtime 15.3+).
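The `:` notation works because the JSON columns hold serialized JSON text. As an illustration, the sketch below turns an event with nested objects into a row whose `context`, `properties`, and `traits` values are JSON strings. The compact, key-sorted serialization and the helper name `to_row` are assumptions. The exact format Signal writes is not specified here.

```python
import json

JSON_COLUMNS = ("context", "properties", "traits")


def to_row(event):
    """Copy an event and serialize its nested objects into JSON strings."""
    row = dict(event)
    for column in JSON_COLUMNS:
        if row.get(column) is not None:
            row[column] = json.dumps(
                row[column], separators=(",", ":"), sort_keys=True
            )
    return row


if __name__ == "__main__":
    event = {
        "event_id": "e1",
        "type": "track",
        "event": "order_completed",
        "properties": {"total": 49.5, "currency": "USD"},
    }
    print(to_row(event)["properties"])
```

With a row like this, `properties:total::DOUBLE` should read the `total` field from the stored JSON text and cast it to a double.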

Databricks is a first-party destination under your control. The default blueprint forwards all events. Apply consent filtering in pipeline transforms, or use downstream views that gate on context.consent if your governance requires it.

Testing

  1. Enable the integration in Signal and trigger a test event on your website.
  2. In the Databricks SQL editor, query the target table:
SELECT * FROM datafly.events.raw ORDER BY timestamp DESC LIMIT 10;
  3. Verify that event rows are appearing with correct data.
  4. In Signal, check the Live Events view to confirm delivery status shows as successful.
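If you run the verification query through the Statement Execution API rather than the SQL editor, the response needs a little unpacking. The sketch below assumes the default inline JSON result layout (`status.state`, `manifest.schema.columns`, `result.data_array`). The helper name `rows_from_statement_response` and the sample response are illustrative, not captured output.

```python
def rows_from_statement_response(response):
    """Turn a statement response into a list of {column: value} dicts."""
    status = response.get("status", {})
    state = status.get("state")
    if state != "SUCCEEDED":
        message = status.get("error", {}).get("message", "no error message")
        raise RuntimeError(f"statement state {state}: {message}")
    columns = [col["name"] for col in response["manifest"]["schema"]["columns"]]
    data = response.get("result", {}).get("data_array") or []
    return [dict(zip(columns, values)) for values in data]


if __name__ == "__main__":
    # Hand-written sample for illustration only.
    sample = {
        "status": {"state": "SUCCEEDED"},
        "manifest": {"schema": {"columns": [{"name": "event_id"}, {"name": "type"}]}},
        "result": {"data_array": [["e1", "track"]]},
    }
    print(rows_from_statement_response(sample))
```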

Troubleshooting

| Problem | Solution |
| --- | --- |
| Events not appearing in the table | Verify the workspace URL, catalog, schema, table, and warehouse ID. |
| Unauthorized (401) | The access token is invalid or expired. Generate a new token. |
| Forbidden (403) | The user lacks MODIFY on the table. Grant Unity Catalog permissions: USE CATALOG, USE SCHEMA, MODIFY. |
| Warehouse not running | The SQL warehouse auto-stopped. It will auto-start on the next request, so the first event may have higher latency. |
| TABLE_OR_VIEW_NOT_FOUND | Verify the fully-qualified path catalog.schema.table. |
| Connection timeout | Ensure Signal can reach the workspace URL on port 443. Check network policies and IP allowlists. |
| Token expired | Rotate the personal access token or refresh the service principal token. |
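When a warehouse has auto-stopped, a statement can stay in a non-terminal state while the warehouse starts. A client that calls the API directly can poll with backoff until the statement finishes. The sketch below takes a caller-supplied `fetch_state` function (hypothetical; it would wrap a status lookup for the statement). The attempt count and delays are arbitrary assumptions, and this is not a description of Signal's own retry behaviour.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def wait_for_statement(fetch_state, max_attempts=10, initial_delay=1.0,
                       sleep=time.sleep):
    """Poll fetch_state() with exponential backoff until a terminal state."""
    delay = initial_delay
    for _ in range(max_attempts):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(delay)
        delay = min(delay * 2, 30.0)  # cap the wait between polls
    raise TimeoutError(f"statement not finished after {max_attempts} polls")


if __name__ == "__main__":
    # Simulated states standing in for real API responses.
    states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
    print(wait_for_statement(lambda: next(states), sleep=lambda seconds: None))
```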

See the Databricks SQL Statement Execution API docs for the endpoint reference and rate-limit guidance.
