Amazon S3
Datafly Signal writes batches of first-party events as objects to an S3 bucket, ready to query with Athena, Glue, Spark, Presto, or other downstream lakehouse tooling.
Prerequisites
Before configuring Amazon S3 in Signal, you need an AWS account with an S3 bucket and an IAM user with write permissions.
Create an AWS Account
If you don’t already have one, sign up at aws.amazon.com.
Create an S3 Bucket
- Open the S3 console.
- Click Create bucket.
- Enter a Bucket name (e.g. datafly-events-production). Bucket names must be globally unique.
- Select the AWS Region closest to your Signal deployment.
- Leave Block Public Access settings enabled (recommended).
- Optionally enable Bucket Versioning for data protection.
- Click Create bucket.
Choose a region close to your Signal infrastructure to minimise latency and data transfer costs.
Create an IAM User for Signal
- Open the IAM console.
- Go to Users > Create user.
- Enter a username (e.g. datafly-signal-s3).
- Attach a custom policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::datafly-events-production/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::datafly-events-production"
    }
  ]
}
- Replace the bucket name with your value.
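If you manage several buckets, generating the policy document can be less error-prone than editing the bucket name by hand in two places. The following is a minimal sketch using only the Python standard library; the function name `build_signal_s3_policy` is hypothetical and not part of Signal or AWS tooling.

```python
import json


def build_signal_s3_policy(bucket_name: str) -> dict:
    """Return the IAM policy shown above, scoped to `bucket_name`."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permission: the resource is every key in the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level permission: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(build_signal_s3_policy("datafly-events-production"), indent=2))
```

Note that the object-level statement targets the `/*` resource while the bucket-level statement targets the bare bucket ARN; mixing the two up is a common cause of AccessDenied errors.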
Generate Access Keys
- On the IAM user detail page, go to Security credentials.
- Under Access keys, click Create access key.
- Select Application running outside AWS.
- Copy the Access Key ID and Secret Access Key immediately — the secret is only shown once.
Store these credentials securely. If you lose the secret access key, you must create a new key pair.
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | The S3 bucket name. Also accepts bucket_name. |
| region | select | Yes | The AWS region of the bucket (e.g. us-east-1). |
| access_key_id | secret | Yes | AWS access key ID with s3:PutObject on the bucket. |
| secret_access_key | secret | Yes | Secret access key matching the access key ID. |
| session_token | secret | No | Optional STS session token for temporary credentials. |
| prefix | string | No | Optional prefix (folder path) prepended to all object keys. Include a trailing slash (e.g. events/). |
| storage_class | select | No | Optional S3 storage class (e.g. STANDARD, STANDARD_IA, INTELLIGENT_TIERING). Defaults to STANDARD. |
| server_side_encryption | select | No | Optional SSE algorithm: AES256 or aws:kms. |
| kms_key_id | string | No | KMS key ARN. Required when server_side_encryption is aws:kms. |
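The rules in the table above can be checked before you submit a configuration. This is an illustrative sketch only: `validate_s3_config` is a hypothetical helper, not a Signal API, and it checks just the constraints stated in the table (required fields, the bucket_name alias, the trailing slash on prefix, and the kms_key_id dependency).

```python
REQUIRED_FIELDS = ("bucket", "region", "access_key_id", "secret_access_key")


def validate_s3_config(config: dict) -> list[str]:
    """Return a list of problems found in an S3 config (empty if none)."""
    problems = []
    cfg = dict(config)  # work on a copy so the caller's dict is untouched

    # The table notes that bucket_name is accepted in place of bucket.
    if "bucket" not in cfg and "bucket_name" in cfg:
        cfg["bucket"] = cfg["bucket_name"]

    for field in REQUIRED_FIELDS:
        if not cfg.get(field):
            problems.append(f"missing required field: {field}")

    prefix = cfg.get("prefix")
    if prefix and not prefix.endswith("/"):
        problems.append("prefix should end with a trailing slash, e.g. events/")

    sse = cfg.get("server_side_encryption")
    if sse not in (None, "AES256", "aws:kms"):
        problems.append("server_side_encryption must be AES256 or aws:kms")
    if sse == "aws:kms" and not cfg.get("kms_key_id"):
        problems.append("kms_key_id is required when server_side_encryption is aws:kms")

    return problems
```

An empty list means the config satisfies the documented constraints; it does not prove that the credentials or bucket are valid in AWS.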
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Amazon S3 or filter by Cloud Storage.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with a ready-to-use default blueprint.
API Setup
curl -X POST http://localhost:8084/v1/admin/integration-catalog/amazon_s3/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Amazon S3",
    "variant": "default",
    "config": {
      "bucket": "datafly-events-production",
      "region": "us-east-1",
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "prefix": "events/",
      "storage_class": "STANDARD",
      "server_side_encryption": "AES256"
    },
    "delivery_mode": "server_side"
  }'
Schema
Signal writes batched events as gzip-compressed, newline-delimited JSON (NDJSON) objects. Each line is one event using the canonical envelope (wrapped here for readability):
{"event_id":"...", "type":"track", "event":"order_completed",
"anonymous_id":"...", "user_id":"...",
"timestamp":"2026-05-12T10:00:00Z", "received_at":"...", "sent_at":"...",
"context":{...}, "properties":{...}, "traits":{...},
"source_id":"...", "integration_id":"..."}
Object keys follow the pattern <prefix>YYYY/MM/DD/HH/<batch-uuid>.json.gz, suitable for Athena partition projection on date components.
Consent
S3 is a first-party destination in your own AWS account. The default blueprint forwards all events. If you need consent-aware partitioning, branch on context.consent in your pipeline transforms before delivery.
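To illustrate what branching on context.consent might look like, here is a rough sketch in Python. It does not use Signal's transform syntax, and it assumes that context.consent is a mapping of category name to boolean; the actual shape of that field, the `analytics` category name, and the function `partition_by_consent` are all assumptions made for the example.

```python
from collections import defaultdict


def partition_by_consent(events: list[dict], category: str = "analytics") -> dict:
    """Group events into 'granted' and 'denied' lists for one consent category."""
    groups = defaultdict(list)
    for event in events:
        # Assumption: context.consent maps a category name to True/False.
        consent = event.get("context", {}).get("consent", {})
        # A missing or false value is treated as denied.
        key = "granted" if consent.get(category) else "denied"
        groups[key].append(event)
    return dict(groups)
```

Treating missing consent as denied is the conservative choice in this sketch; adjust it to match your own consent policy.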
Testing
- Enable the integration in Signal and trigger a test event on your website.
- Open the S3 console and navigate to your bucket.
- Browse to the prefix path and verify that event files are appearing.
- Download a file and inspect the contents to confirm the event data is correct.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
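The browse-and-inspect steps above can also be scripted. The sketch below uses only the Python standard library and assumes you have already downloaded an object's bytes. The helper names are hypothetical, the choice of UTC for the date components in object keys is an assumption, and so is the set of fields treated as always present.

```python
import gzip
import json
from datetime import datetime, timezone

# Assumption: these envelope fields are present on every event.
EXPECTED_FIELDS = {"event_id", "type", "timestamp"}


def hour_folder(prefix: str, when: datetime) -> str:
    """Return the <prefix>YYYY/MM/DD/HH/ folder to browse for a given time."""
    # Assumption: date components in object keys are expressed in UTC.
    return f"{prefix}{when.astimezone(timezone.utc):%Y/%m/%d/%H}/"


def parse_batch(raw: bytes) -> list[dict]:
    """Decode a gzip-compressed NDJSON batch into a list of events."""
    text = gzip.decompress(raw).decode("utf-8")
    # One JSON document per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def missing_fields(event: dict) -> set[str]:
    """Return the expected envelope fields that are absent from an event."""
    return EXPECTED_FIELDS - set(event)
```

For a file saved locally (the filename here is illustrative), `parse_batch(open("batch.json.gz", "rb").read())` returns the events, and `missing_fields` can then be applied to each one.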
Troubleshooting
| Problem | Solution |
|---|---|
| Events not appearing in S3 | Verify the bucket name, region, and prefix are correct. Check that the bucket exists. |
| AccessDenied | The IAM user lacks s3:PutObject permission on the bucket. Update the IAM policy. |
| NoSuchBucket | The bucket does not exist. Verify the bucket name (names must be lowercase and are globally unique). |
| InvalidAccessKeyId | The access key ID is incorrect or the IAM user has been deleted. Verify credentials. |
| SignatureDoesNotMatch | The secret access key is incorrect. Regenerate the access key pair. |
| Files appearing but empty | Check the batch settings. Events are buffered before flushing to files. |
| Wrong region error | S3 bucket region must match the region config field. Check the bucket’s actual region in the S3 console. |
See the Amazon S3 documentation for the full API reference, lifecycle policies, and cost optimisation guides.