Databricks (SQL Warehouse)
Datafly Signal writes events into a Databricks Delta Lake table using the SQL Statement Execution API, executed against a SQL warehouse. Pick this mode when you already have a running SQL warehouse and want batch-friendly inserts into Unity Catalog.
For lower-latency real-time ingestion, see Databricks (Zerobus Ingest), which bypasses SQL warehouses and writes directly via the Zerobus streaming API.
Prerequisites
Before configuring this integration, you need a Databricks workspace with Unity Catalog, a SQL warehouse, a target table, and a personal access token (or service principal token).
Create a Databricks Workspace
- Sign up at databricks.com or provision a workspace through your cloud provider (AWS, Azure, or GCP).
- Complete the workspace setup wizard.
- Note the Workspace URL (e.g. https://abc-12345.cloud.databricks.com).
Create a SQL Warehouse
- In the Databricks workspace, go to SQL Warehouses in the left sidebar.
- Click Create SQL warehouse.
- Enter a name (e.g. datafly-warehouse).
- Choose the Size (2X-Small is sufficient for low to moderate event volume).
- Set Auto stop to manage cost.
- Click Create.
- Note the Warehouse ID from the warehouse details page. It appears in the page URL and as the last segment of the HTTP path under Connection details.
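If you copy the HTTP path instead of the bare ID, the warehouse ID is its final segment. The bash sketch below assumes a path of the form /sql/1.0/warehouses/<id>; the function name warehouse_id_from_http_path is hypothetical and not part of Databricks or Signal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the SQL warehouse ID from the HTTP path shown
# under Connection details, assuming the form /sql/1.0/warehouses/<id>.
warehouse_id_from_http_path() {
  local path="${1%/}"          # drop a single trailing slash, if present
  printf '%s\n' "${path##*/}"  # keep everything after the last slash
}
```

For example, warehouse_id_from_http_path /sql/1.0/warehouses/abc123def456 prints abc123def456.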
Create a Catalog, Schema, and Table
In the SQL editor or a notebook, run:
```sql
CREATE CATALOG IF NOT EXISTS datafly;
CREATE SCHEMA IF NOT EXISTS datafly.events;
CREATE TABLE IF NOT EXISTS datafly.events.raw (
  event_id STRING NOT NULL,
  type STRING,
  event STRING,
  anonymous_id STRING,
  user_id STRING,
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  sent_at TIMESTAMP,
  context STRING,
  properties STRING,
  traits STRING,
  source_id STRING,
  integration_id STRING
);
```
Databricks stores data in Delta Lake format by default, providing ACID transactions, time travel, and schema evolution.
Generate a Personal Access Token
- Click your profile icon (top right) > Settings.
- Go to Developer > Access tokens.
- Click Generate new token.
- Enter a Comment (e.g. Datafly Signal) and a Lifetime.
- Click Generate.
- Copy the token immediately — it is only shown once.
For production deployments, prefer a service principal token over a personal access token. Through Unity Catalog, grant the service principal USE CATALOG on the catalog, USE SCHEMA on the schema, and MODIFY on the target table. It also needs Can use permission on the SQL warehouse.
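The Unity Catalog grants for a specific principal can be generated with a small bash helper and pasted into the SQL editor. This is a sketch: print_grants is a hypothetical name, and it assumes the service principal is identified by its application ID, quoted in backticks as Unity Catalog GRANT statements expect for principal names. Run the output as a user who owns or administers the catalog, schema, and table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Unity Catalog grants a principal needs in
# order to insert into CATALOG.SCHEMA.TABLE. Backticks are escaped so the
# heredoc emits them literally instead of running command substitution.
print_grants() {
  local principal="$1" catalog="$2" schema="$3" table="$4"
  cat <<EOF
GRANT USE CATALOG ON CATALOG ${catalog} TO \`${principal}\`;
GRANT USE SCHEMA ON SCHEMA ${catalog}.${schema} TO \`${principal}\`;
GRANT MODIFY ON TABLE ${catalog}.${schema}.${table} TO \`${principal}\`;
EOF
}
```

For the table created above, print_grants <application-id> datafly events raw prints three GRANT statements ready to run.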
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| workspace_url | string | Yes | Workspace URL (e.g. https://abc-12345.cloud.databricks.com). |
| access_token | secret | Yes | Personal access token or service principal token. |
| warehouse_id | string | Yes | The SQL warehouse ID to execute statements against. |
| catalog | string | Yes | The Unity Catalog name. |
| schema | string | Yes | The schema within the catalog. |
| table | string | Yes | The target table name. |
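To check these values before installing, you can submit a trivial statement to the warehouse yourself through the Statement Execution API (POST /api/2.0/sql/statements). This bash sketch is not necessarily how Signal issues its own inserts. The function names and the WORKSPACE_URL, DATABRICKS_TOKEN, WAREHOUSE_ID, CATALOG, SCHEMA, and TABLE variables are hypothetical. Because the JSON is built with printf, it assumes the statement contains no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the configuration fields above.
# Assumes WORKSPACE_URL, DATABRICKS_TOKEN, WAREHOUSE_ID, CATALOG, SCHEMA and
# TABLE are set; these variable names are illustrative only.

# Build the request body for POST /api/2.0/sql/statements.
# No JSON escaping is done, so the statement must not contain " or \.
statement_payload() {
  local warehouse_id="$1" statement="$2"
  printf '{"warehouse_id":"%s","statement":"%s","wait_timeout":"30s"}' \
    "$warehouse_id" "$statement"
}

# Count rows in CATALOG.SCHEMA.TABLE. A successful result shows that the
# URL, token, warehouse ID, and fully qualified table path line up.
preflight() {
  local table="${CATALOG}.${SCHEMA}.${TABLE}"
  curl -sS -X POST "${WORKSPACE_URL%/}/api/2.0/sql/statements" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(statement_payload "$WAREHOUSE_ID" "SELECT count(*) FROM ${table}")"
}
```

After setting the variables, run preflight. A response whose status.state is SUCCEEDED confirms the configuration. PENDING or RUNNING means the statement did not finish within wait_timeout, often because a stopped warehouse is still starting.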
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Databricks (SQL Warehouse) or filter by Warehouse.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with a ready-to-use default blueprint.
API Setup
```bash
curl -X POST http://localhost:8084/v1/admin/integration-catalog/databricks_sql/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Databricks (SQL Warehouse)",
    "variant": "default",
    "config": {
      "workspace_url": "https://abc-12345.cloud.databricks.com",
      "access_token": "dapi1234567890abcdef",
      "warehouse_id": "abc123def456",
      "catalog": "datafly",
      "schema": "events",
      "table": "raw"
    },
    "delivery_mode": "server_side"
  }'
```
Schema
Signal writes the standard event envelope. The default table definition uses:
| Column | Databricks type | Notes |
|---|---|---|
| event_id | STRING NOT NULL | Unique per event. |
| type | STRING | track, page, identify, etc. |
| event | STRING | Event name (snake_case). |
| anonymous_id | STRING | First-party visitor identifier. |
| user_id | STRING | Logged-in user identifier (nullable). |
| timestamp | TIMESTAMP | Client event time. |
| received_at | TIMESTAMP | Time Signal received the event. |
| sent_at | TIMESTAMP | Time the row was delivered. |
| context | STRING | JSON document: page, device, consent metadata. |
| properties | STRING | JSON document: custom event properties. |
| traits | STRING | JSON document: user traits. |
| source_id | STRING | Pipeline source identifier. |
| integration_id | STRING | Signal integration identifier. |
Query the JSON string columns with the : path operator (e.g. properties:total::DOUBLE), or convert them to VARIANT with parse_json(). For native semi-structured typing, declare these columns as VARIANT instead (Databricks Runtime 15.3+).
Consent
Databricks is a first-party destination under your control. The default blueprint forwards all events. Apply consent filtering in pipeline transforms, or use downstream views that gate on context.consent if your governance requires it.
Testing
- Enable the integration in Signal and trigger a test event on your website.
- In the Databricks SQL editor, query the target table:
```sql
SELECT * FROM datafly.events.raw ORDER BY timestamp DESC LIMIT 10;
```
- Verify that event rows are appearing with correct data.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
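You can run a similar check from a terminal with the Statement Execution API. This bash sketch counts rows whose received_at falls within the last few minutes. The function names and the WORKSPACE_URL, DATABRICKS_TOKEN, and WAREHOUSE_ID variables are hypothetical. The hand-built JSON assumes the table name contains no quotes or backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: SQL that counts rows Signal delivered recently.
# The second argument is the look-back window in minutes (default 10).
recent_rows_sql() {
  local table="$1" minutes="${2:-10}"
  printf 'SELECT count(*) AS recent FROM %s WHERE received_at > current_timestamp() - INTERVAL %d MINUTES' \
    "$table" "$minutes"
}

# Submit the count to the warehouse. Assumes WORKSPACE_URL, DATABRICKS_TOKEN
# and WAREHOUSE_ID are set; the names are illustrative only.
count_recent_rows() {
  local sql
  sql="$(recent_rows_sql "$1" "${2:-10}")"
  curl -sS -X POST "${WORKSPACE_URL%/}/api/2.0/sql/statements" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"warehouse_id\":\"${WAREHOUSE_ID}\",\"statement\":\"${sql}\",\"wait_timeout\":\"30s\"}"
}
```

For example, count_recent_rows datafly.events.raw 10. In a successful response the count should appear under result.data_array.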
Troubleshooting
| Problem | Solution |
|---|---|
| Events not appearing in the table | Verify the workspace URL, catalog, schema, table, and warehouse ID. |
| Unauthorized (401) | The access token is invalid or expired. Generate a new token. |
| Forbidden (403) | The user lacks MODIFY on the table. Grant Unity Catalog permissions: USE CATALOG, USE SCHEMA, MODIFY. |
| Warehouse not running | The SQL warehouse auto-stopped. It will auto-start on the next request — the first event may have higher latency. |
| TABLE_OR_VIEW_NOT_FOUND | Verify the fully qualified path catalog.schema.table. |
| Connection timeout | Ensure Signal can reach the workspace URL on port 443. Check network policies and IP allowlists. |
| Token expired | Rotate the personal access token or refresh the service principal token. |
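Several of these rows can be told apart by the HTTP status of a simple authenticated request. The bash sketch below calls the SQL Warehouses API (GET /api/2.0/sql/warehouses/{id}) and maps the status to a hint. The function and environment variable names are hypothetical, and the hints mirror the table above rather than any official error mapping. A 403 from this endpoint most likely points to warehouse permissions rather than table grants.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: map an HTTP status to a troubleshooting hint.
# curl reports 000 when no HTTP response arrived (DNS failure, timeout,
# blocked port).
explain_status() {
  case "$1" in
    2??) echo "ok: workspace reachable, token accepted, warehouse found" ;;
    401) echo "unauthorized: token invalid or expired; generate or rotate it" ;;
    403) echo "forbidden: principal lacks access to this warehouse" ;;
    404) echo "not found: check the workspace URL and warehouse ID" ;;
    000) echo "no response: check network policies, IP allowlists, and port 443" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Fetch the warehouse and explain the status code. Assumes WORKSPACE_URL,
# DATABRICKS_TOKEN and WAREHOUSE_ID are set (illustrative names).
check_warehouse() {
  local code
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "${WORKSPACE_URL%/}/api/2.0/sql/warehouses/${WAREHOUSE_ID}")"
  explain_status "$code"
}
```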
See the Databricks SQL Statement Execution API documentation for the endpoint reference and rate-limit guidance.