Databricks Lakehouse

Datafly Signal delivers events to Databricks for unified analytics, data engineering, and machine learning on a lakehouse platform with Delta Lake storage.

⚠️ This integration is currently in beta. Configuration and behaviour may change.

Prerequisites

Before configuring Databricks in Signal, you need a Databricks workspace, a Unity Catalog catalog with a schema and target table, a SQL warehouse, and a personal access token.

Create a Databricks Workspace

  1. Sign up at databricks.com or provision a workspace through your cloud provider (AWS, Azure, or GCP).
  2. Complete the workspace setup wizard.
  3. Note the Workspace URL (e.g. https://abc-12345.cloud.databricks.com).

Create a SQL Warehouse

  1. In the Databricks workspace, go to SQL Warehouses in the left sidebar.
  2. Click Create SQL warehouse.
  3. Enter a name (e.g. datafly-warehouse).
  4. Choose the Size (2X-Small is sufficient for low to moderate event volume).
  5. Set Auto stop to save costs when idle.
  6. Click Create.
  7. Note the Warehouse ID from the warehouse details page: it is the last segment of the HTTP path under Connection details.
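If you prefer the API, the SQL Warehouses REST endpoint lists every warehouse along with its ID. A minimal sketch, assuming the example workspace URL above and a token exported as DATABRICKS_TOKEN:

# Each entry in the response includes an "id" field (the Warehouse ID).
curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  https://abc-12345.cloud.databricks.com/api/2.0/sql/warehouses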

Create a Catalog, Schema, and Table

  1. In the SQL editor or a notebook, run:
-- Create catalog (if not using an existing one)
CREATE CATALOG IF NOT EXISTS datafly;
 
-- Create schema
CREATE SCHEMA IF NOT EXISTS datafly.events;
 
-- Create table
CREATE TABLE IF NOT EXISTS datafly.events.raw (
  event_id STRING NOT NULL,
  type STRING,
  event STRING,
  anonymous_id STRING,
  user_id STRING,
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  sent_at TIMESTAMP,
  context STRING,
  properties STRING,
  traits STRING,
  source_id STRING,
  integration_id STRING
);

Databricks stores data in Delta Lake format by default, which provides ACID transactions, time travel, and schema evolution.
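If Signal will authenticate as a different user or service principal than the one that created these objects, grant that principal access as well. A minimal sketch, assuming a hypothetical principal named datafly-signal:

-- Let the principal resolve the catalog and schema
GRANT USE CATALOG ON CATALOG datafly TO `datafly-signal`;
GRANT USE SCHEMA ON SCHEMA datafly.events TO `datafly-signal`;

-- Let it read and insert rows in the target table
GRANT SELECT, MODIFY ON TABLE datafly.events.raw TO `datafly-signal`;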

Generate a Personal Access Token

  1. In the Databricks workspace, click your profile icon (top right) > Settings.
  2. Go to Developer > Access tokens.
  3. Click Generate new token.
  4. Enter a Comment (e.g. Datafly Signal) and set a Lifetime (or leave blank for no expiry).
  5. Click Generate.
  6. Copy the token immediately — it is only shown once.
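You can sanity-check the new token before handing it to Signal by calling any authenticated endpoint, for example the SCIM Me endpoint, which returns the identity behind the token. A sketch, assuming the example workspace URL and token used elsewhere on this page:

# A 200 response containing your user record means the token is valid.
curl -s -H "Authorization: Bearer dapi1234567890abcdef" \
  https://abc-12345.cloud.databricks.com/api/2.0/preview/scim/v2/Me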
⚠️ Store the access token securely. For production, consider using a Databricks service principal with OAuth instead of a personal access token.

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| workspace_url | string | Yes | The Databricks workspace URL (e.g. https://abc-12345.cloud.databricks.com). |
| access_token | secret | Yes | Personal access token or service principal token for authentication. |
| catalog | string | Yes | The Unity Catalog name. |
| schema | string | Yes | The schema name within the catalog. |
| table | string | Yes | The target table name to insert rows into. |
| warehouse_id | string | Yes | The SQL warehouse ID to execute queries against. |

Signal Setup

Quick Setup

  1. Navigate to Integrations in the sidebar.
  2. Open the Integration Library tab.
  3. Find Databricks or filter by Cloud Storage.
  4. Click Install, select a variant if available, and fill in the required fields.
  5. Click Install Integration to create the integration with a ready-to-use default blueprint.

API Setup

curl -X POST http://localhost:8084/v1/admin/integration-catalog/databricks/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Databricks Lakehouse",
    "variant": "default",
    "config": {
      "workspace_url": "https://abc-12345.cloud.databricks.com",
      "access_token": "dapi1234567890abcdef",
      "catalog": "datafly",
      "schema": "events",
      "table": "raw",
      "warehouse_id": "abc123def456"
    },
    "delivery_mode": "server_side"
  }'
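Independently of Signal, you can verify that the same credentials can write to the table by submitting an INSERT through the Databricks SQL Statement Execution API. A hedged sketch reusing the example values from the config above; the row values are made up:

curl -X POST https://abc-12345.cloud.databricks.com/api/2.0/sql/statements \
  -H "Authorization: Bearer dapi1234567890abcdef" \
  -H "Content-Type: application/json" \
  -d '{
    "warehouse_id": "abc123def456",
    "statement": "INSERT INTO datafly.events.raw (event_id, type, event, timestamp) VALUES (:id, :type, :event, current_timestamp())",
    "parameters": [
      {"name": "id", "value": "manual-test-001"},
      {"name": "type", "value": "track"},
      {"name": "event", "value": "Manual Test"}
    ]
  }'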

Testing

  1. Enable the integration in Signal and trigger a test event on your website.
  2. In the Databricks SQL editor, query the target table:
SELECT * FROM datafly.events.raw ORDER BY timestamp DESC LIMIT 10;
  3. Verify that event rows are appearing with correct data.
  4. In Signal, check the Live Events view to confirm the delivery status shows as successful.
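Beyond spot-checking individual rows, a quick aggregate over the last hour (a sketch against the table defined earlier) makes gaps easier to spot:

SELECT event, COUNT(*) AS events
FROM datafly.events.raw
WHERE received_at >= current_timestamp() - INTERVAL 1 HOUR
GROUP BY event
ORDER BY events DESC;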

Troubleshooting

| Problem | Solution |
| --- | --- |
| Events not appearing in the table | Verify the workspace URL, catalog, schema, table, and warehouse ID are correct. |
| Unauthorized (401) | The access token is invalid or expired. Generate a new token. |
| Forbidden (403) | The token user lacks INSERT permission on the table. Grant MODIFY on the table via Unity Catalog permissions. |
| Warehouse not running | The SQL warehouse may have auto-stopped. It will auto-start on the next request, but the first event may have higher latency. |
| TABLE_OR_VIEW_NOT_FOUND | The catalog, schema, or table does not exist. Verify the full table path: catalog.schema.table. |
| Connection timeout | Ensure Signal can reach the Databricks workspace URL. Check network policies and firewall rules. |
| Token expired | Generate a new personal access token or refresh the service principal OAuth token. |
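If you suspect the warehouse itself, its state can be checked directly via the SQL Warehouses API. A sketch, assuming the example warehouse ID from the config above:

# "state" should be RUNNING, or STARTING while it warms up after auto-stop.
curl -s -H "Authorization: Bearer dapi1234567890abcdef" \
  https://abc-12345.cloud.databricks.com/api/2.0/sql/warehouses/abc123def456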

Visit the Databricks documentation for the full SQL reference, Unity Catalog management, and Delta Lake best practices.