Databricks Lakehouse
Datafly Signal delivers events to Databricks for unified analytics, data engineering, and machine learning on a lakehouse platform with Delta Lake storage.
This integration is currently in beta. Configuration and behaviour may change.
Prerequisites
Before configuring Databricks in Signal, you need a Databricks workspace with Unity Catalog enabled, a target catalog, schema, and table, a SQL warehouse, and a personal access token.
Create a Databricks Workspace
- Sign up at databricks.com or provision a workspace through your cloud provider (AWS, Azure, or GCP).
- Complete the workspace setup wizard.
- Note the Workspace URL (e.g. https://abc-12345.cloud.databricks.com).
Create a SQL Warehouse
- In the Databricks workspace, go to SQL Warehouses in the left sidebar.
- Click Create SQL warehouse.
- Enter a name (e.g. datafly-warehouse).
- Choose the Size (2X-Small is sufficient for low to moderate event volume).
- Set Auto stop to save costs when idle.
- Click Create.
- Note the Warehouse ID from the warehouse details page (under Connection details > HTTP path, it is the last segment).
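The last step above can be sketched in code. This is a minimal helper (the function name is illustrative, not part of Signal or Databricks) that takes the HTTP path shown under Connection details and returns its final segment, the warehouse ID:

```python
# Extract the warehouse ID from a SQL warehouse's Connection details.
# The HTTP path looks like /sql/1.0/warehouses/<warehouse_id>;
# the warehouse ID is the last path segment.
def warehouse_id_from_http_path(http_path: str) -> str:
    return http_path.rstrip("/").rsplit("/", 1)[-1]

print(warehouse_id_from_http_path("/sql/1.0/warehouses/abc123def456"))
# abc123def456
```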
Create a Catalog, Schema, and Table
- In the SQL editor or a notebook, run:
```sql
-- Create catalog (if not using an existing one)
CREATE CATALOG IF NOT EXISTS datafly;

-- Create schema
CREATE SCHEMA IF NOT EXISTS datafly.events;

-- Create table
CREATE TABLE IF NOT EXISTS datafly.events.raw (
  event_id STRING NOT NULL,
  type STRING,
  event STRING,
  anonymous_id STRING,
  user_id STRING,
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  sent_at TIMESTAMP,
  context STRING,
  properties STRING,
  traits STRING,
  source_id STRING,
  integration_id STRING
);
```

Databricks stores data in Delta Lake format by default, which provides ACID transactions, time travel, and schema evolution.
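To see how an incoming event lines up with these columns, here is a hedged sketch of the mapping. The event shape and the `event_to_row` helper are assumptions for illustration (not Signal's actual internals); nested objects are serialized to JSON strings to match the STRING columns:

```python
import json

# Sketch: map a Signal-style event (assumed shape) onto the
# datafly.events.raw columns defined above. context, properties,
# and traits are JSON-encoded because the columns are STRING.
def event_to_row(event: dict) -> dict:
    return {
        "event_id": event["event_id"],
        "type": event.get("type"),
        "event": event.get("event"),
        "anonymous_id": event.get("anonymous_id"),
        "user_id": event.get("user_id"),
        "timestamp": event.get("timestamp"),
        "received_at": event.get("received_at"),
        "sent_at": event.get("sent_at"),
        "context": json.dumps(event.get("context", {})),
        "properties": json.dumps(event.get("properties", {})),
        "traits": json.dumps(event.get("traits", {})),
        "source_id": event.get("source_id"),
        "integration_id": event.get("integration_id"),
    }

row = event_to_row({
    "event_id": "evt_1",
    "type": "track",
    "event": "Signed Up",
    "properties": {"plan": "pro"},
})
```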
Generate a Personal Access Token
- In the Databricks workspace, click your profile icon (top right) > Settings.
- Go to Developer > Access tokens.
- Click Generate new token.
- Enter a Comment (e.g. Datafly Signal) and set a Lifetime (or leave blank for no expiry).
- Click Generate.
- Copy the token immediately — it is only shown once.
Store the access token securely. For production, consider using a Databricks service principal with OAuth instead of a personal access token.
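For the service-principal alternative, Databricks supports an OAuth client-credentials flow against the workspace's `/oidc/v1/token` endpoint. The sketch below only builds the request (it does not send it); the client ID and secret are placeholders, and the helper name is illustrative:

```python
from base64 import b64encode

# Sketch: build a Databricks OAuth M2M (service principal) token
# request as an alternative to a personal access token. The request
# uses the client-credentials grant with HTTP Basic authentication.
def build_token_request(workspace_url: str, client_id: str, client_secret: str) -> dict:
    creds = b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {
        "url": f"{workspace_url}/oidc/v1/token",
        "headers": {"Authorization": f"Basic {creds}"},
        "data": {"grant_type": "client_credentials", "scope": "all-apis"},
    }

req = build_token_request(
    "https://abc-12345.cloud.databricks.com", "my-client-id", "my-client-secret"
)
```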
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| workspace_url | string | Yes | The Databricks workspace URL (e.g. https://abc-12345.cloud.databricks.com). |
| access_token | secret | Yes | Personal access token or service principal token for authentication. |
| catalog | string | Yes | The Unity Catalog name. |
| schema | string | Yes | The schema name within the catalog. |
| table | string | Yes | The target table name to insert rows into. |
| warehouse_id | string | Yes | The SQL warehouse ID to execute queries against. |
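Since every field in the table above is required, it can help to check a config before installing. This is a minimal sketch (the `missing_fields` helper is illustrative, not part of Signal); the field names come from the table above:

```python
# Required fields for the Databricks integration, per the
# configuration table above.
REQUIRED = ("workspace_url", "access_token", "catalog",
            "schema", "table", "warehouse_id")

def missing_fields(config: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED if not config.get(f)]

config = {
    "workspace_url": "https://abc-12345.cloud.databricks.com",
    "access_token": "dapi-example-token",
    "catalog": "datafly",
    "schema": "events",
    "table": "raw",
}
print(missing_fields(config))  # ['warehouse_id']
```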
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Databricks or filter by Cloud Storage.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with a ready-to-use default blueprint.
API Setup
```shell
curl -X POST http://localhost:8084/v1/admin/integration-catalog/databricks/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Databricks Lakehouse",
    "variant": "default",
    "config": {
      "workspace_url": "https://abc-12345.cloud.databricks.com",
      "access_token": "dapi1234567890abcdef",
      "catalog": "datafly",
      "schema": "events",
      "table": "raw",
      "warehouse_id": "abc123def456"
    },
    "delivery_mode": "server_side"
  }'
```

Testing
- Enable the integration in Signal and trigger a test event on your website.
- In the Databricks SQL editor, query the target table:
```sql
SELECT * FROM datafly.events.raw ORDER BY timestamp DESC LIMIT 10;
```
- Verify that event rows are appearing with correct data.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
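The verification query can also be run programmatically via the Databricks SQL Statement Execution API (`POST /api/2.0/sql/statements`). The sketch below only builds the request payload rather than sending it; the helper name and credential values are placeholders:

```python
# Sketch: build a request for the Databricks SQL Statement Execution
# API to run the verification query against the configured warehouse.
def build_statement_request(workspace_url: str, access_token: str,
                            warehouse_id: str) -> dict:
    return {
        "url": f"{workspace_url}/api/2.0/sql/statements",
        "headers": {"Authorization": f"Bearer {access_token}"},
        "json": {
            "warehouse_id": warehouse_id,
            "statement": "SELECT * FROM datafly.events.raw "
                         "ORDER BY timestamp DESC LIMIT 10",
            "wait_timeout": "30s",
        },
    }

req = build_statement_request(
    "https://abc-12345.cloud.databricks.com", "dapi1234567890abcdef", "abc123def456"
)
```

Note that an auto-stopped warehouse will start on the first such request, so the initial call may take noticeably longer.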
Troubleshooting
| Problem | Solution |
|---|---|
| Events not appearing in the table | Verify the workspace URL, catalog, schema, table, and warehouse ID are correct. |
| Unauthorized (401) | The access token is invalid or expired. Generate a new token. |
| Forbidden (403) | The token user lacks INSERT permission on the table. Grant MODIFY on the table via Unity Catalog permissions. |
| Warehouse not running | The SQL warehouse may have auto-stopped. It will auto-start on the next request, but the first event may have higher latency. |
| TABLE_OR_VIEW_NOT_FOUND | The catalog, schema, or table does not exist. Verify the full table path: catalog.schema.table. |
| Connection timeout | Ensure Signal can reach the Databricks workspace URL. Check network policies and firewall rules. |
| Token expired | Generate a new personal access token or refresh the service principal OAuth token. |
Visit Databricks documentation for full SQL reference, Unity Catalog management, and Delta Lake best practices.