Azure Data Lake Storage Gen2

Datafly Signal delivers events to Azure Data Lake Storage Gen2 for scalable, cost-effective data lake storage with hierarchical namespace support for analytics frameworks like Spark, Databricks, and Synapse.

Prerequisites

Before configuring Azure Data Lake Storage Gen2 in Signal, you need an Azure account with a Storage Account that has hierarchical namespace enabled, and a container for storing events.

Create an Azure Account

If you don’t already have one, sign up at azure.microsoft.com.

Create a Storage Account with Hierarchical Namespace

  1. In the Azure portal, search for Storage accounts and click Create.
  2. Select your Subscription and Resource group.
  3. Enter a Storage account name (e.g. dataflydatalake). Names must be globally unique, 3-24 characters, lowercase letters and numbers only.
  4. Select a Region close to your Signal deployment.
  5. Choose Performance: Standard (recommended).
  6. On the Advanced tab, under Data Lake Storage Gen2, enable Hierarchical namespace. This is required for Data Lake Gen2 functionality.
  7. Click Review + Create > Create.
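⚠️ Enable hierarchical namespace when you create the Storage Account. Once enabled, it cannot be disabled. If you have an existing account without it, either upgrade it using the Data Lake Gen2 upgrade option in the Azure portal (a one-way change, subject to Azure's compatibility checks) or create a new account.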
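
If you prefer to script the steps above, the following is a minimal sketch using the Azure CLI. The function names are hypothetical, and Standard_LRS redundancy is an assumption (the steps above only specify Standard performance). The name check mirrors the rules from step 3.

```shell
# Returns 0 when the name meets the rules from step 3:
# 3-24 characters, lowercase letters and digits only.
is_valid_storage_account_name() {
  # Length check.
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ] || return 1
  # Reject anything outside a-z and 0-9 (spelled out to avoid locale surprises).
  case "$1" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  return 0
}

# Hypothetical helper: creates the account with hierarchical namespace enabled.
# Arguments: <account-name> <resource-group> <region>
create_datalake_account() {
  is_valid_storage_account_name "$1" || {
    echo "invalid storage account name: $1" >&2
    return 1
  }
  # Standard_LRS is an assumed redundancy choice; adjust to your needs.
  az storage account create \
    --name "$1" \
    --resource-group "$2" \
    --location "$3" \
    --sku Standard_LRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
}
```

For example, `create_datalake_account dataflydatalake my-resource-group westeurope` would create the account used in the rest of this guide, where the resource group and region are placeholders.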

Create a Container

  1. Open your Storage Account in the Azure portal.
  2. In the left sidebar, click Containers (under Data storage).
  3. Click + Container.
  4. Enter a Name (e.g. datafly-events).
  5. Set Public access level to Private.
  6. Click Create.

Get the Access Key

  1. In your Storage Account, go to Security + networking > Access keys.
  2. Click Show next to Key1.
  3. Copy the Key value (not the connection string).
⚠️ Store the access key securely. For production environments, consider using Azure RBAC with a service principal or managed identity instead of account keys.
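
The same key can be read with the Azure CLI, which avoids copying it through the portal. This is a sketch only: the function name and the resource group value are hypothetical.

```shell
# Hypothetical helper: prints the first access key (key1) for a Storage Account.
# Arguments: <account-name> <resource-group>
get_storage_access_key() {
  az storage account keys list \
    --account-name "$1" \
    --resource-group "$2" \
    --query "[0].value" \
    --output tsv
}

# Usage: keep the key in a variable instead of pasting it into files.
# AZURE_ACCESS_KEY=$(get_storage_access_key dataflydatalake my-resource-group)
```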

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| account_name | string | Yes | The Azure Storage account name. |
| container | string | Yes | The Data Lake container (filesystem) name. |
| path_prefix | string | No | Optional path prefix for stored objects. Defaults to events/. Include a trailing slash. |
| access_key | secret | Yes | The Azure Storage account access key. |
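
Because path_prefix expects a trailing slash, it can help to normalize the value before submitting it. The helper below is a hypothetical sketch; whether Signal normalizes the value itself is not stated here, so treat this as a client-side safeguard.

```shell
# Hypothetical helper: applies the documented default (events/) when the
# value is empty and appends a trailing slash when one is missing.
normalize_path_prefix() {
  prefix=${1:-events/}
  case "$prefix" in
    */) ;;                      # already ends with a slash
    *) prefix="$prefix/" ;;     # add the missing slash
  esac
  printf '%s\n' "$prefix"
}

# Usage:
# normalize_path_prefix "raw/signal"   # prints raw/signal/
# normalize_path_prefix ""             # prints events/
```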

Signal Setup

Quick Setup

  1. Navigate to Integrations in the sidebar.
  2. Open the Integration Library tab.
  3. Find Azure Data Lake Storage Gen2 or filter by Cloud Storage.
  4. Click Install, select a variant if available, and fill in the required fields.
  5. Click Install Integration to create the integration with a ready-to-use default blueprint.

API Setup

curl -X POST http://localhost:8084/v1/admin/integration-catalog/azure_data_lake/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Azure Data Lake Storage Gen2",
    "variant": "default",
    "config": {
      "account_name": "dataflydatalake",
      "container": "datafly-events",
      "path_prefix": "events/",
      "access_key": "YOUR_STORAGE_ACCOUNT_KEY"
    },
    "delivery_mode": "server_side"
  }'
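
To keep the token and access key out of your shell history, the same request can be built from environment variables. This is a sketch under assumptions: the variable names (SIGNAL_API_URL, SIGNAL_TOKEN, AZURE_ACCOUNT_NAME, AZURE_CONTAINER, AZURE_PATH_PREFIX, AZURE_ACCESS_KEY) and function names are hypothetical, and the payload builder does not escape quotes or backslashes, so it assumes the values contain neither.

```shell
# Builds the install payload shown above.
# Arguments: <account-name> <container> <path-prefix> <access-key>
# Note: values are inserted as-is; they must not contain " or \ characters.
build_install_payload() {
  printf '{"name":"Azure Data Lake Storage Gen2","variant":"default","config":{"account_name":"%s","container":"%s","path_prefix":"%s","access_key":"%s"},"delivery_mode":"server_side"}\n' \
    "$1" "$2" "$3" "$4"
}

# Sends the payload to the install endpoint, reading it from stdin.
install_azure_data_lake() {
  build_install_payload \
    "$AZURE_ACCOUNT_NAME" "$AZURE_CONTAINER" \
    "${AZURE_PATH_PREFIX:-events/}" "$AZURE_ACCESS_KEY" |
    curl -sS -X POST \
      "${SIGNAL_API_URL:-http://localhost:8084}/v1/admin/integration-catalog/azure_data_lake/install" \
      -H "Authorization: Bearer $SIGNAL_TOKEN" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

Export the variables in your session or load them from a secrets manager, then run `install_azure_data_lake`.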

Testing

  1. Enable the integration in Signal and trigger a test event on your website.
  2. In the Azure portal, open your Storage Account > Containers > your container.
  3. Navigate through the path prefix directory structure to find event files.
  4. Download a file and inspect the event data.
  5. In Signal, check the Live Events view to confirm delivery status shows as successful.
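
Steps 2 and 3 can also be checked from the command line. The sketch below assumes the Azure CLI is installed and that the account key is available in the AZURE_STORAGE_KEY environment variable, which the CLI reads for storage commands. The function name is hypothetical.

```shell
# Hypothetical helper: lists paths under the path prefix in a container.
# Arguments: <account-name> <container> <path-prefix>
# The account key is taken from AZURE_STORAGE_KEY by the Azure CLI.
list_event_files() {
  az storage fs file list \
    --account-name "$1" \
    --file-system "$2" \
    --path "${3%/}" \
    --query "[].name" \
    --output tsv
}

# Usage:
# list_event_files dataflydatalake datafly-events events/
```

An empty result shortly after a test event does not necessarily indicate a failure, since events are buffered before being flushed to files.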

Troubleshooting

| Problem | Solution |
| --- | --- |
| Events not appearing in the container | Verify the account name, container name, and access key are correct. |
| AuthenticationFailed | The access key is invalid or has been regenerated. Copy the current key from Security + networking > Access keys. |
| ContainerNotFound | The container does not exist. Verify the container name in the Azure portal. |
| AccountRequiresHttps | The Storage Account has Secure transfer required enabled and the request was sent over HTTP. Ensure the connection uses HTTPS. |
| Files appearing but empty | Check the batch settings. Events are buffered before flushing to files. |
| Hierarchical namespace errors | Verify that hierarchical namespace is enabled on the Storage Account. If it is not, upgrade the account using the Data Lake Gen2 upgrade option or create a new account with it enabled. |
| Network access denied | Check the Storage Account firewall rules under Networking > Firewalls and virtual networks. Add Signal’s IP addresses if the firewall is enabled. |

See the Azure Data Lake Storage Gen2 documentation for the full API reference, access control, and performance tuning guides.