Azure Data Lake Storage Gen2

Datafly Signal writes batches of first-party events to ADLS Gen2 (a Blob Storage account with hierarchical namespace enabled), which suits Spark, Databricks, and Synapse workloads.

Prerequisites

Before configuring Azure Data Lake Storage Gen2 in Signal, you need an Azure account with a Storage Account that has hierarchical namespace enabled, and a container for storing events.

Create an Azure Account

If you don’t already have one, sign up at azure.microsoft.com.

Create a Storage Account with Hierarchical Namespace

  1. In the Azure portal, search for Storage accounts and click Create.
  2. Select your Subscription and Resource group.
  3. Enter a Storage account name (e.g. dataflydatalake). Names must be globally unique, 3-24 characters, lowercase letters and numbers only.
  4. Select a Region close to your Signal deployment.
  5. Choose Performance: Standard (recommended).
  6. On the Advanced tab, under Data Lake Storage Gen2, enable Hierarchical namespace. This is required for Data Lake Gen2 functionality.
  7. Click Review + Create > Create.
⚠️ Enable hierarchical namespace when you create the Storage Account. If you have an existing account without it, either run Azure's upgrade to Data Lake Storage Gen2 capabilities (a one-way change that cannot be reversed) or create a new account with the setting enabled.
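The naming rule in step 3 can be checked locally before you open the portal. The following is a minimal sketch; the function name is hypothetical, and it only covers the length and character rules stated above. Global uniqueness can only be confirmed by Azure itself.

```python
import re

# Rule from the setup steps: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the documented length and character rules.

    This does not check global uniqueness, which requires a call to Azure.
    """
    return _ACCOUNT_NAME.fullmatch(name) is not None
```

For example, `is_valid_storage_account_name("dataflydatalake")` passes, while a name containing a hyphen or an uppercase letter does not.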

Create a Container

  1. Open your Storage Account in the Azure portal.
  2. In the left sidebar, click Containers (under Data storage).
  3. Click + Container.
  4. Enter a Name (e.g. datafly-events).
  5. Set Public access level to Private.
  6. Click Create.

Get the Access Key

  1. In your Storage Account, go to Security + networking > Access keys.
  2. Click Show next to Key1.
  3. Copy the Key value (not the connection string).
⚠️ Store the access key securely. For production environments, consider using Azure RBAC with a service principal or managed identity instead of account keys.

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| account_name | string | Yes | The Azure Storage account name. Also accepts storage_account. |
| container | string | Yes | The Data Lake container (filesystem) name. Also accepts container_name. |
| account_key | secret | One of account_key or sas_token | Storage account access key. Also accepts connection_string. |
| sas_token | secret | One of account_key or sas_token | SAS token granting Write + Create on the container. Preferred over the account key for production. |
| prefix | string | No | Optional path prefix for stored objects (e.g. events/). Include a trailing slash. |
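The constraints in this table can be checked on the client side before a config is submitted. The sketch below is illustrative only: `normalize_config` and `sas_permissions` are hypothetical helpers, not part of Signal, and Signal's own validation may differ. The SAS check relies on the standard Azure SAS `sp` query parameter, where `c` stands for Create and `w` for Write.

```python
from urllib.parse import parse_qs

# Alias names as listed in the configuration table.
ALIASES = {
    "storage_account": "account_name",
    "container_name": "container",
    "connection_string": "account_key",
}


def normalize_config(raw: dict) -> dict:
    """Map alias keys to canonical names and check the documented constraints."""
    cfg = {ALIASES.get(key, key): value for key, value in raw.items()}
    errors = []
    for field in ("account_name", "container"):
        if not cfg.get(field):
            errors.append(f"{field} is required")
    if not (cfg.get("account_key") or cfg.get("sas_token")):
        errors.append("provide account_key or sas_token")
    prefix = cfg.get("prefix")
    if prefix and not prefix.endswith("/"):
        errors.append("prefix should end with a trailing slash")
    if errors:
        raise ValueError("; ".join(errors))
    return cfg


def sas_permissions(sas_token: str) -> set:
    """Return the permission letters from the SAS token's `sp` parameter."""
    params = parse_qs(sas_token.lstrip("?"))
    return set(params.get("sp", [""])[0])
```

A token intended for this integration would be expected to satisfy `{"c", "w"} <= sas_permissions(token)`.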

Signal Setup

Quick Setup

  1. Navigate to Integrations in the sidebar.
  2. Open the Integration Library tab.
  3. Find Azure Data Lake Storage Gen2 or filter by Cloud Storage.
  4. Click Install, select a variant if available, and fill in the required fields.
  5. Click Install Integration to create the integration with a ready-to-use default blueprint.

API Setup

curl -X POST http://localhost:8084/v1/admin/integration-catalog/azure_data_lake/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Azure Data Lake Storage Gen2",
    "variant": "default",
    "config": {
      "account_name": "dataflydatalake",
      "container": "datafly-events",
      "prefix": "events/",
      "account_key": "YOUR_STORAGE_ACCOUNT_KEY"
    },
    "delivery_mode": "server_side"
  }'
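The same install call can be made from Python using only the standard library. This is a sketch that mirrors the curl request above; the helper names are hypothetical, and the response format is not documented here, so `send` returns only the HTTP status code.

```python
import json
import urllib.request


def build_install_request(base_url: str, token: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "name": "Azure Data Lake Storage Gen2",
        "variant": "default",
        "config": config,
        "delivery_mode": "server_side",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/admin/integration-catalog/azure_data_lake/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Separating request construction from sending makes it easy to inspect the payload first. In practice the account key would be read from a secret store or an environment variable rather than written into source code.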

Schema

Signal writes batched events as newline-delimited JSON (NDJSON) files. Each line is one event using the canonical envelope (event_id, type, event, anonymous_id, user_id, timestamp, received_at, sent_at, context, properties, traits, source_id, integration_id).
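To make the NDJSON format concrete, the sketch below parses a block of NDJSON text and reports which envelope keys an event does not carry. The helper names are hypothetical, and the source does not say which envelope fields are always present, so the check reports absences rather than treating them as errors.

```python
import json

# Envelope keys as listed in the schema description.
ENVELOPE_FIELDS = (
    "event_id", "type", "event", "anonymous_id", "user_id", "timestamp",
    "received_at", "sent_at", "context", "properties", "traits",
    "source_id", "integration_id",
)


def parse_ndjson(text: str) -> list:
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def absent_fields(event: dict) -> list:
    """Return the envelope keys that are not present in `event`."""
    return [field for field in ENVELOPE_FIELDS if field not in event]
```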

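Files are written under <prefix>YYYY/MM/DD/HH/<batch-uuid>.json.gz, so each file is gzip-compressed NDJSON. The year/month/day/hour directory layout lets Spark or Synapse jobs select a time range by path. The directories are plain values (for example 2024/03/05/07), not Hive-style key=value pairs, so automatic partition discovery does not apply.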
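The path pattern can be reproduced in a few lines, which is useful when a downstream job needs to compute which directory to read for a given hour. This is a sketch under assumptions: the function names are hypothetical, and the source does not state which timestamp or time zone Signal uses for the directory, so UTC is assumed here.

```python
from datetime import datetime, timezone


def hour_directory(prefix: str, ts: datetime) -> str:
    """Return the <prefix>YYYY/MM/DD/HH/ directory for `ts` (assumed UTC)."""
    return f"{prefix}{ts.astimezone(timezone.utc):%Y/%m/%d/%H}/"


def batch_path(prefix: str, ts: datetime, batch_id: str) -> str:
    """Return the full object path for one batch file."""
    return f"{hour_directory(prefix, ts)}{batch_id}.json.gz"
```

For instance, a batch written at 07:00 UTC on 5 March 2024 with the prefix `events/` would land under `events/2024/03/05/07/`.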

ADLS Gen2 is a first-party destination in your own Azure subscription. The default blueprint forwards all events. Apply consent-aware filtering or partitioning via pipeline transforms on context.consent if needed.
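Consent filtering inside Signal is configured through pipeline transforms, which are not shown here. As a rough illustration of the same idea applied downstream to exported events, the sketch below filters on context.consent. The shape of that field is an assumption: it is treated as a mapping from a category name to a boolean, and the category name `analytics` in the usage note is hypothetical.

```python
def has_consent(event: dict, category: str) -> bool:
    """Return True only if context.consent[category] is explicitly True.

    Assumes context.consent is a dict of category -> bool; the real shape
    of the field is not specified in this document.
    """
    context = event.get("context") or {}
    consent = context.get("consent")
    if not isinstance(consent, dict):
        return False
    return consent.get(category) is True


def filter_by_consent(events: list, category: str) -> list:
    """Keep only the events that carry explicit consent for `category`."""
    return [event for event in events if has_consent(event, category)]
```

Calling `filter_by_consent(events, "analytics")` drops events with missing or false consent, which is the conservative default.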

Testing

  1. Enable the integration in Signal and trigger a test event on your website.
  2. In the Azure portal, open your Storage Account > Containers > your container.
  3. Navigate through the path prefix directory structure to find event files.
  4. Download a file and inspect the event data.
  5. In Signal, check the Live Events view to confirm delivery status shows as successful.
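Step 4 can be scripted once a file has been downloaded. The following sketch opens a local .json.gz batch file and counts events by their type field; the function name is hypothetical, and it assumes the file is gzip-compressed NDJSON as described in the Schema section.

```python
import gzip
import json
from collections import Counter


def summarize_batch(path: str) -> Counter:
    """Count events per `type` in a gzip-compressed NDJSON batch file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                event = json.loads(line)
                counts[event.get("type", "unknown")] += 1
    return counts
```

An empty result for a file that exists suggests the batch was flushed without events, which is worth comparing against the Live Events view.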

Troubleshooting

| Problem | Solution |
| --- | --- |
| Events not appearing in the container | Verify that the account name, container name, and access key are correct. |
| AuthenticationFailed | The access key is invalid or has been regenerated. Copy the current key from Security + networking > Access keys. |
| ContainerNotFound | The container does not exist. Verify the container name in the Azure portal. |
| AccountRequiresHttps | Ensure the connection uses HTTPS. Check the storage account settings. |
| Files appearing but empty | Check the batch settings. Events are buffered before flushing to files. |
| Hierarchical namespace errors | Verify that hierarchical namespace is enabled on the Storage Account. If it is not, run Azure's one-way upgrade on the account or create a new account with it enabled. |
| Network access denied | Check the Storage Account firewall rules under Networking > Firewalls and virtual networks. Add Signal's IP addresses if the firewall is enabled. |

See the Azure Data Lake Storage Gen2 documentation for the full API reference, access control, and performance tuning guides.
