Azure Data Lake Storage Gen2
Datafly Signal writes batches of first-party events to ADLS Gen2 (a Blob Storage account with hierarchical namespace enabled), which makes the data readily available to Spark, Databricks, and Synapse workloads.
Prerequisites
Before configuring Azure Data Lake Storage Gen2 in Signal, you need an Azure account with a Storage Account that has hierarchical namespace enabled, and a container for storing events.
Create an Azure Account
If you don’t already have one, sign up at azure.microsoft.com.
Create a Storage Account with Hierarchical Namespace
- In the Azure portal, search for Storage accounts and click Create.
- Select your Subscription and Resource group.
- Enter a Storage account name (e.g. dataflydatalake). Names must be globally unique, 3-24 characters, lowercase letters and numbers only.
- Select a Region close to your Signal deployment.
- Choose Performance: Standard (recommended).
- On the Advanced tab, under Data Lake Storage Gen2, enable Hierarchical namespace. This is required for Data Lake Gen2 functionality.
- Click Review + Create > Create.
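Enable hierarchical namespace when you create the Storage Account. An existing account without it can only gain it through Azure's one-way upgrade to Data Lake Storage Gen2 capabilities, which cannot be reversed and is not supported for every account configuration. If the upgrade is not an option, create a new account.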
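The naming rule from the steps above (3-24 characters, lowercase letters and numbers only) can be checked locally before you open the portal. This is a minimal sketch; the helper name `is_valid_account_name` is illustrative, and the check cannot tell you whether a name is globally unique, since only Azure can confirm that.

```python
import re

# Rule from the steps above: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if the name satisfies the format rule (not global uniqueness)."""
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None


print(is_valid_account_name("dataflydatalake"))  # True
print(is_valid_account_name("Datafly-Lake"))     # False: uppercase and hyphen
```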
Create a Container
- Open your Storage Account in the Azure portal.
- In the left sidebar, click Containers (under Data storage).
- Click + Container.
- Enter a Name (e.g. datafly-events).
- Set Public access level to Private.
- Click Create.
Get the Access Key
- In your Storage Account, go to Security + networking > Access keys.
- Click Show next to Key1.
- Copy the Key value (not the connection string).
Store the access key securely. For production environments, consider using Azure RBAC with a service principal or managed identity instead of account keys.
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| account_name | string | Yes | The Azure Storage account name. Also accepts storage_account. |
| container | string | Yes | The Data Lake container (filesystem) name. Also accepts container_name. |
| account_key | secret | One of these | Storage account access key. Also accepts connection_string. |
| sas_token | secret | One of these | Optional SAS token granting Write + Create on the container; preferred over the account key for production. |
| prefix | string | No | Optional path prefix for stored objects (e.g. events/). Include a trailing slash. |
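The table's rules can be expressed as a small pre-flight check that runs before a config is submitted. This is a sketch, not Signal's own validation: the function name `check_config` is hypothetical, and it assumes "One of these" means at least one credential must be present and that connection_string is an alternate field name for account_key.

```python
def check_config(config: dict) -> list:
    """Return a list of problems found in a destination config (empty if none)."""
    problems = []
    # Required fields, including the alternate names listed in the table.
    if not (config.get("account_name") or config.get("storage_account")):
        problems.append("missing account_name")
    if not (config.get("container") or config.get("container_name")):
        problems.append("missing container")
    # Assumption: at least one credential field must be set.
    credentials = ("account_key", "connection_string", "sas_token")
    if not any(config.get(key) for key in credentials):
        problems.append("missing credential: account_key or sas_token")
    # The optional prefix should include a trailing slash.
    prefix = config.get("prefix", "")
    if prefix and not prefix.endswith("/"):
        problems.append("prefix should end with '/'")
    return problems


print(check_config({
    "account_name": "dataflydatalake",
    "container": "datafly-events",
    "prefix": "events",  # trailing slash missing on purpose
}))
```

An empty list means the config passes these local checks. It does not confirm that the credentials are accepted by Azure.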
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Azure Data Lake Storage Gen2 or filter by Cloud Storage.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with a ready-to-use default blueprint.
API Setup
curl -X POST http://localhost:8084/v1/admin/integration-catalog/azure_data_lake/install \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Azure Data Lake Storage Gen2",
"variant": "default",
"config": {
"account_name": "dataflydatalake",
"container": "datafly-events",
"prefix": "events/",
"account_key": "YOUR_STORAGE_ACCOUNT_KEY"
},
"delivery_mode": "server_side"
}'
Schema
Signal writes batched events as newline-delimited JSON (NDJSON) files. Each line is one event using the canonical envelope (event_id, type, event, anonymous_id, user_id, timestamp, received_at, sent_at, context, properties, traits, source_id, integration_id).
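To illustrate the format, here is a minimal sketch that decodes one batch file into a list of events. It assumes the file is gzip-compressed NDJSON, as the .json.gz extension in the file naming suggests. The sample events are built in memory with made-up values, and `read_events` is an illustrative helper, not part of Signal.

```python
import gzip
import io
import json


def read_events(raw: bytes) -> list:
    """Decode a gzip-compressed NDJSON payload into a list of event dicts."""
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as handle:
        # One JSON object per non-empty line.
        return [json.loads(line) for line in handle if line.strip()]


# Build a tiny two-event sample in memory (illustrative values only).
sample = [
    {"event_id": "e1", "type": "track", "event": "Signup", "properties": {}},
    {"event_id": "e2", "type": "page", "event": None, "properties": {}},
]
payload = gzip.compress("\n".join(json.dumps(e) for e in sample).encode("utf-8"))

events = read_events(payload)
print([e["event_id"] for e in events])
```

For a downloaded file, pass its bytes to the same function, for example `read_events(open(path, "rb").read())`.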
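Files are written under <prefix>YYYY/MM/DD/HH/<batch-uuid>.json.gz. This date- and hour-partitioned directory layout serves the same purpose as Hive-style partitioning (though without key=value directory names) and is easy to mount in Spark/Synapse.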
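The path pattern can be reproduced in a few lines, which is handy when computing which directory to read for a given hour. This is a sketch under assumptions the text does not confirm: that the date components are zero-padded and derived from a UTC timestamp. The helper name `batch_path` and the UUID shown are illustrative.

```python
from datetime import datetime, timezone


def batch_path(prefix: str, when: datetime, batch_id: str) -> str:
    """Build <prefix>YYYY/MM/DD/HH/<batch-uuid>.json.gz for a timestamp."""
    when = when.astimezone(timezone.utc)  # assumption: partitions use UTC
    return f"{prefix}{when:%Y/%m/%d/%H}/{batch_id}.json.gz"


example = batch_path(
    "events/",
    datetime(2024, 3, 7, 9, 30, tzinfo=timezone.utc),
    "123e4567-e89b-12d3-a456-426614174000",
)
print(example)
# events/2024/03/07/09/123e4567-e89b-12d3-a456-426614174000.json.gz
```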
Consent
ADLS Gen2 is a first-party destination in your own Azure subscription. The default blueprint forwards all events. Apply consent-aware filtering or partitioning via pipeline transforms on context.consent if needed.
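The filtering logic can be sketched in Python to show the idea. This is not Signal's transform syntax. The shape of context.consent is an assumption here (a mapping from category name to a boolean), and the "analytics" category and helper names are hypothetical.

```python
def has_consent(event: dict, category: str) -> bool:
    """Return True if context.consent[category] is truthy (assumed shape)."""
    consent = (event.get("context") or {}).get("consent")
    if not isinstance(consent, dict):
        return False  # missing or unexpected shape is treated as no consent
    return bool(consent.get(category))


def filter_by_consent(events: list, category: str) -> list:
    """Keep only events whose consent record grants the given category."""
    return [event for event in events if has_consent(event, category)]


events = [
    {"event_id": "e1", "context": {"consent": {"analytics": True}}},
    {"event_id": "e2", "context": {"consent": {"analytics": False}}},
    {"event_id": "e3", "context": {}},
]
kept = filter_by_consent(events, "analytics")
print([e["event_id"] for e in kept])
```

Treating a missing consent record as "no consent" is the conservative choice. Invert that default only if your consent policy calls for it.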
Testing
- Enable the integration in Signal and trigger a test event on your website.
- In the Azure portal, open your Storage Account > Containers > your container.
- Navigate through the path prefix directory structure to find event files.
- Download a file and inspect the event data.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
Troubleshooting
| Problem | Solution |
|---|---|
| Events not appearing in the container | Verify the account name, container name, and access key are correct. |
| AuthenticationFailed | The access key is invalid or has been regenerated. Get the current key from Access Keys. |
| ContainerNotFound | The container does not exist. Verify the container name in the Azure portal. |
| AccountRequiresHttps | Ensure the connection is using HTTPS. Check the storage account settings. |
| Files appearing but empty | Check the batch settings. Events are buffered before flushing to files. |
| Hierarchical namespace errors | Verify that hierarchical namespace is enabled on the Storage Account. If it is not, create a new account with it enabled, or use Azure's one-way upgrade for existing accounts where it is supported. |
| Network access denied | Check the Storage Account firewall rules under Networking > Firewalls and virtual networks. If the firewall is enabled, add Signal’s IP addresses to the allowed list. |
See the Azure Data Lake Storage Gen2 documentation for the full API reference, access control, and performance tuning guides.