Databricks (Zerobus Ingest)
Datafly Signal streams events directly into a Databricks Delta Lake table via the Zerobus Ingest API — a serverless, real-time ingestion path that bypasses SQL warehouses. Pick this mode for the lowest-latency delivery and to keep SQL warehouses free for analytics workloads.
Zerobus is currently in public preview on Databricks. Availability and API behaviour may change. For the stable SQL-based path, see Databricks (SQL Warehouse).
Prerequisites
Before configuring this integration you need:
- A Databricks workspace with Unity Catalog
- Zerobus Ingest enabled on the workspace (request via your Databricks account team if not enabled)
- A target Delta Lake table
- A service principal with OAuth client credentials and MODIFY permission on the table
Enable Zerobus on the Workspace
Zerobus is a serverless ingestion service. If you don’t see it in your workspace, contact your Databricks account team to enable the public preview.
Create the Target Table
In the SQL editor or a notebook, create a Delta Lake table with columns that match the fields produced by the default blueprint:
CREATE CATALOG IF NOT EXISTS datafly;
CREATE SCHEMA IF NOT EXISTS datafly.events;
CREATE TABLE IF NOT EXISTS datafly.events.stream (
  event_name STRING,
  event_type STRING,
  message_id STRING,
  user_id STRING,
  anonymous_id STRING,
  event_timestamp TIMESTAMP,
  ip_address STRING,
  user_agent STRING,
  locale STRING,
  timezone STRING,
  page_url STRING,
  page_referrer STRING,
  source_id STRING
)
USING DELTA;
You can add additional columns to match any custom properties produced by your blueprint (e.g. page_title, search_query).
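Extra columns like these could be added with an ALTER TABLE ... ADD COLUMNS statement. The Python sketch below only builds that statement as a string and does not run it. The helper name add_columns_sql is hypothetical, and the STRING types chosen for page_title and search_query are assumptions to check against your blueprint.

```python
import re

# Unquoted identifiers only: letters, digits and underscores.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def add_columns_sql(table: str, columns: dict[str, str]) -> str:
    """Build an ALTER TABLE ... ADD COLUMNS statement (hypothetical helper)."""
    for part in table.split("."):
        if not _IDENT.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    for name in columns:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected column name: {name!r}")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols});"


# Column types here are assumptions, not taken from the blueprint.
statement = add_columns_sql(
    "datafly.events.stream",
    {"page_title": "STRING", "search_query": "STRING"},
)
print(statement)
```

Paste the printed statement into the SQL editor, or adapt it by hand. The identifier check is deliberately strict so that a typo in a column name fails early.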
Create a Service Principal and OAuth Credentials
- In the Databricks Account Console, go to User management > Service principals.
- Click Add service principal and give it a name (e.g. datafly-signal-zerobus).
- Add the service principal to your workspace.
- In the workspace, generate OAuth client credentials for the service principal:
- Settings > Identity and access > Service principals > select the SP > OAuth secrets > Generate secret.
- Copy the Client ID and Client secret — the secret is shown once.
Grant Permissions on the Table
In the SQL editor, grant the service principal access:
GRANT USE CATALOG ON CATALOG datafly TO `<service_principal_application_id>`;
GRANT USE SCHEMA ON SCHEMA datafly.events TO `<service_principal_application_id>`;
GRANT SELECT, MODIFY ON TABLE datafly.events.stream TO `<service_principal_application_id>`;
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| workspace_url | string | Yes | Workspace URL (e.g. https://abc-12345.cloud.databricks.com). |
| client_id | string | Yes | OAuth client ID for the service principal. |
| client_secret | secret | Yes | OAuth client secret for the service principal. |
| catalog | string | Yes | The Unity Catalog name. |
| schema | string | Yes | The schema within the catalog. |
| table | string | Yes | The Delta table receiving events. |
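As a rough illustration of how these six fields fit together, the Python sketch below models them as a dataclass and derives the fully-qualified table name. ZerobusConfig and its methods are hypothetical names, not part of Signal, and the https-only check is an assumption based on the example URL.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ZerobusConfig:
    """Hypothetical container mirroring the configuration fields above."""

    workspace_url: str
    client_id: str
    client_secret: str
    catalog: str
    schema: str
    table: str

    def validate(self) -> None:
        # Every field is marked as required, so reject empty values.
        for name, value in vars(self).items():
            if not value:
                raise ValueError(f"missing required field: {name}")
        # Assumption: the workspace URL is always an https:// URL.
        parsed = urlparse(self.workspace_url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("workspace_url must be an https:// URL")

    @property
    def full_table_name(self) -> str:
        # catalog.schema.table, the path checked when ingest returns 404.
        return f"{self.catalog}.{self.schema}.{self.table}"


config = ZerobusConfig(
    workspace_url="https://abc-12345.cloud.databricks.com",
    client_id="00000000-0000-0000-0000-000000000000",
    client_secret="YOUR_CLIENT_SECRET",
    catalog="datafly",
    schema="events",
    table="stream",
)
config.validate()
print(config.full_table_name)
```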
Authentication uses OAuth 2.0 client credentials against {{workspace_url}}/oidc/v1/token with scope all-apis. Events are POSTed in batches to /api/2.0/streaming-ingestion/ingest.
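To make the token exchange concrete, the Python sketch below builds the request to /oidc/v1/token without sending it. It assumes the standard OAuth 2.0 client-credentials form, with the client ID and secret passed via HTTP Basic auth. build_token_request is a hypothetical helper name. The ingest payload format is not described here, so the sketch stops at the token request.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) the client-credentials token request."""
    url = workspace_url.rstrip("/") + "/oidc/v1/token"
    # Form-encoded body with the grant type and the all-apis scope.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # Assumption: credentials are sent as HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_token_request(
    "https://abc-12345.cloud.databricks.com/", "my-client-id", "my-secret"
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) should return a JSON document. In a standard OAuth 2.0 response the bearer token is in the access_token field.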
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Databricks (Zerobus Ingest) or filter by Warehouse.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with the default blueprint.
API Setup
curl -X POST http://localhost:8084/v1/admin/integration-catalog/databricks_zerobus/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Databricks (Zerobus Ingest)",
    "variant": "default",
    "config": {
      "workspace_url": "https://abc-12345.cloud.databricks.com",
      "client_id": "00000000-0000-0000-0000-000000000000",
      "client_secret": "YOUR_CLIENT_SECRET",
      "catalog": "datafly",
      "schema": "events",
      "table": "stream"
    },
    "delivery_mode": "server_side"
  }'
Schema
The default blueprint maps Signal’s canonical event envelope to a column-per-field shape suited to Delta Lake. Every event becomes one row with the columns below:
| Column | Source | Notes |
|---|---|---|
| event_name | event | Event name (snake_case). |
| event_type | type | track, page, identify, etc. |
| message_id | message_id | Unique per event. |
| user_id | user_id | Logged-in user identifier (nullable). |
| anonymous_id | anonymous_id | First-party visitor identifier. |
| event_timestamp | timestamp | Client event time. |
| ip_address | context.ip | Client IP. |
| user_agent | context.user_agent | Client user agent. |
| locale | context.locale | Browser locale. |
| timezone | context.timezone | Browser timezone. |
| page_url | context.page.url | Page URL. |
| page_referrer | context.page.referrer | Referrer URL. |
| source_id | source_id | Pipeline source identifier. |
Individual event types add their own columns (e.g. page adds page_title; Products Searched adds search_query). You can extend or override mappings in your pipeline blueprint.
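The column mapping above can be read as a set of dotted paths into the event envelope. The Python sketch below flattens one event into a row that way. It is an illustration of the mapping only, not Signal's blueprint engine; lookup and to_row are hypothetical helper names, and the sample event values are made up.

```python
from typing import Any

# Column name -> dotted source path, copied from the mapping table.
COLUMN_SOURCES = {
    "event_name": "event",
    "event_type": "type",
    "message_id": "message_id",
    "user_id": "user_id",
    "anonymous_id": "anonymous_id",
    "event_timestamp": "timestamp",
    "ip_address": "context.ip",
    "user_agent": "context.user_agent",
    "locale": "context.locale",
    "timezone": "context.timezone",
    "page_url": "context.page.url",
    "page_referrer": "context.page.referrer",
    "source_id": "source_id",
}


def lookup(event: dict[str, Any], path: str) -> Any:
    """Walk a dotted path; return None when any segment is missing."""
    node: Any = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_row(event: dict[str, Any]) -> dict[str, Any]:
    """Flatten one event envelope into a column-per-field row."""
    return {column: lookup(event, path) for column, path in COLUMN_SOURCES.items()}


# Made-up sample event; user_id and the referrer are intentionally absent.
sample_event = {
    "event": "page_viewed",
    "type": "page",
    "message_id": "m-1",
    "anonymous_id": "a-1",
    "timestamp": "2024-01-01T00:00:00Z",
    "context": {"ip": "203.0.113.7", "page": {"url": "https://example.com/"}},
    "source_id": "src-1",
}
row = to_row(sample_event)
print(row["page_url"], row["user_id"])
```

Missing fields become None, which matches the nullable columns in the target table.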
Consent
Databricks is a first-party destination under your control. The default blueprint forwards all events. Apply consent filtering in pipeline transforms, or via downstream views over the context columns if your governance requires it.
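A consent filter in a pipeline transform might look roughly like the Python sketch below. Signal's transform API is not described here, so this is a generic illustration: the consent rule is passed in as a predicate, and the context.consent.analytics field used in the example is hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator

Event = dict[str, Any]


def filter_by_consent(
    events: Iterable[Event], has_consent: Callable[[Event], bool]
) -> Iterator[Event]:
    """Yield only the events for which the consent predicate returns True."""
    for event in events:
        if has_consent(event):
            yield event


def analytics_consent(event: Event) -> bool:
    # HYPOTHETICAL field: context.consent.analytics is an assumption.
    context = event.get("context") or {}
    consent = context.get("consent") or {}
    return consent.get("analytics") is True


events = [
    {"message_id": "m1", "context": {"consent": {"analytics": True}}},
    {"message_id": "m2", "context": {"consent": {"analytics": False}}},
    {"message_id": "m3", "context": {}},
]
kept = [e["message_id"] for e in filter_by_consent(events, analytics_consent)]
print(kept)
```

Keeping the rule in a separate predicate means the filter itself stays the same when your governance requirements change.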
Testing
- Enable the integration in Signal and trigger a test event on your website.
- In the Databricks SQL editor, query the target table:
SELECT * FROM datafly.events.stream
ORDER BY event_timestamp DESC
LIMIT 10;
- Verify that rows are appearing. Zerobus typically commits within a few seconds.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
Troubleshooting
| Problem | Solution |
|---|---|
| 401 Unauthorized from /oidc/v1/token | Client ID or client secret incorrect, or the OAuth secret has been rotated. Generate a new secret. |
| 403 Forbidden on ingest | The service principal lacks MODIFY on the table. Grant USE CATALOG, USE SCHEMA, and MODIFY. |
| 404 Not Found on ingest | Verify the fully-qualified table path catalog.schema.table. Check that Zerobus is enabled on the workspace. |
| Events not appearing | Confirm the column names in the table match the blueprint targets exactly. Add missing columns or update the blueprint mappings. |
| 400 schema-mismatch errors | The blueprint is emitting a column that doesn’t exist on the table (or has a different type). Add the column or remove the mapping. |
| Connection timeout | Ensure Signal can reach the workspace URL on port 443. Check network policies and IP allowlists. |
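For the column-related rows above, a quick way to spot a mismatch is to compare the blueprint's target columns with the table's columns. The Python sketch below does an exact, case-sensitive comparison of names only and does not check types. diff_columns is a hypothetical helper, and both column lists are assumed to be gathered by hand (for instance from the blueprint and from a DESCRIBE TABLE result).

```python
from typing import Iterable


def diff_columns(
    blueprint_columns: Iterable[str], table_columns: Iterable[str]
) -> dict[str, list[str]]:
    """Compare column names exactly (case-sensitive) and report differences."""
    blueprint = set(blueprint_columns)
    table = set(table_columns)
    return {
        # Emitted by the blueprint but absent from the table.
        "missing_on_table": sorted(blueprint - table),
        # Present on the table but not targeted by the blueprint.
        "unmapped_in_blueprint": sorted(table - blueprint),
    }


# Example inputs are made up; Event_Type differs from event_type by case.
report = diff_columns(
    ["event_name", "page_title", "Event_Type"],
    ["event_name", "event_type"],
)
print(report)
```

Anything listed under missing_on_table is a candidate for adding to the table or removing from the blueprint mappings.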
See the Databricks Zerobus documentation for the latest API reference and preview status.