AWS Deployment

This guide walks through deploying Datafly Signal on Amazon Web Services using EKS, MSK, ElastiCache, and RDS.

Prerequisites

Install the following tools on your workstation:

Tool      Version   Purpose
AWS CLI   v2+       AWS account access
eksctl    v0.170+   EKS cluster creation
kubectl   v1.28+    Kubernetes management
Helm      v3.14+    Chart installation

Ensure your AWS CLI is configured with credentials that have permissions to create EKS clusters, MSK clusters, ElastiCache, RDS instances, and IAM roles.
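Before starting, it can help to confirm that the tools are on your PATH. The sketch below is one way to do that. The `missing_tools` helper is a hypothetical name introduced here, and it checks only that each tool is present, not that it meets the minimum version listed above.

```shell
# Print the name of every tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: prints nothing when all four tools are installed.
# missing_tools aws eksctl kubectl helm
```

To confirm which account and identity the configured credentials resolve to, `aws sts get-caller-identity` prints the account ID and ARN in use.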

Architecture

                    Internet
                       │
                ┌──────▼──────┐
                │  AWS ALB    │
                │  (Ingress)  │
                └──────┬──────┘
                       │
              ┌────────▼────────┐
              │   Amazon EKS    │
              │  ┌────────────┐ │
              │  │ Datafly    │ │
              │  │ Services   │ │
              │  └──────┬─────┘ │
              └─────────┼───────┘
           ┌────────────┼────────────┐
           │            │            │
    ┌──────▼──────┐ ┌───▼───┐ ┌─────▼─────┐
    │ Amazon MSK  │ │  RDS  │ │ElastiCache│
    │ (Kafka)     │ │(PgSQL)│ │  (Redis)  │
    └─────────────┘ └───────┘ └───────────┘

Step 1: Create the EKS Cluster

Choose a node instance type based on your sizing tier:

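Tier     Instance Type   Node Count   vCPU/Node   Memory/Node
Small    t3.xlarge       3            4           16 GB
Medium   m5.xlarge       3            4           16 GB
Large    m5.2xlarge      5            8           32 GB
XL       m5.4xlarge      8            16          64 GB

The example below creates a Medium-tier cluster. Substitute the instance type and node count for your tier:

eksctl create cluster \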
  --name datafly-cluster \
  --region eu-west-1 \
  --version 1.30 \
  --nodegroup-name datafly-nodes \
  --node-type m5.xlarge \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 6 \
  --managed

Verify the cluster is ready:

kubectl get nodes
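
The node group in the example requests three nodes, so three should report `Ready`. The sketch below counts them from the `kubectl get nodes` output. `count_ready_nodes` is a hypothetical helper name, and it assumes the default column layout in which STATUS is the second column.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example:
# kubectl get nodes --no-headers | count_ready_nodes
```

Alternatively, `kubectl wait --for=condition=Ready nodes --all --timeout=300s` blocks until every node is ready or the timeout expires.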

Install the AWS Load Balancer Controller

# Install the AWS Load Balancer Controller
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=datafly-cluster \
  --set serviceAccount.create=true \
  --set serviceAccount.name=aws-load-balancer-controller

The AWS Load Balancer Controller needs an IAM policy attached to its service account before it can create load balancers. Follow the AWS documentation for the full IAM setup.
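
The IAM setup generally follows the pattern below: associate an OIDC provider with the cluster, create the controller's IAM policy, and bind that policy to the service account through IAM Roles for Service Accounts. This is a sketch, not the full procedure. The policy name `AWSLoadBalancerControllerIAMPolicy`, the policy document URL, and the helper names `alb_policy_arn` and `setup_alb_iam` are assumptions; check them against the AWS documentation for the controller version you install.

```shell
# Assumed policy name; use whatever name you create the policy under.
ALB_POLICY_NAME="AWSLoadBalancerControllerIAMPolicy"

# Build the policy ARN for a given 12-digit AWS account ID.
alb_policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$ALB_POLICY_NAME"
}

# Run the IAM setup for the cluster created in Step 1.
# Usage: setup_alb_iam <account-id>
setup_alb_iam() {
  account_id="$1"
  # IRSA needs an OIDC provider associated with the cluster.
  eksctl utils associate-iam-oidc-provider \
    --cluster datafly-cluster --region eu-west-1 --approve || return 1
  # Assumed location of the policy document in the controller's repository.
  curl -fsSLo iam_policy.json \
    https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json || return 1
  aws iam create-policy \
    --policy-name "$ALB_POLICY_NAME" \
    --policy-document file://iam_policy.json || return 1
  # Bind the policy to the service account the Helm chart uses.
  eksctl create iamserviceaccount \
    --cluster datafly-cluster --region eu-west-1 \
    --namespace kube-system \
    --name aws-load-balancer-controller \
    --attach-policy-arn "$(alb_policy_arn "$account_id")" \
    --override-existing-serviceaccounts \
    --approve
}
```

If the service account is created this way before the chart is installed, setting `serviceAccount.create=false` in the Helm command keeps the chart from trying to create it a second time.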

Step 2: Provision Managed Services

Amazon MSK (Kafka)

Create an MSK cluster in the same VPC as your EKS cluster:

aws kafka create-cluster \
  --cluster-name datafly-kafka \
  --kafka-version 3.6.0 \
  --number-of-broker-nodes 2 \
  --broker-node-group-info '{"InstanceType":"kafka.m5.large","ClientSubnets":["subnet-xxx","subnet-yyy"],"SecurityGroups":["sg-kafka"]}' \
  --encryption-info '{"EncryptionInTransit":{"ClientBroker":"TLS"}}'

Record the bootstrap brokers endpoint:

aws kafka get-bootstrap-brokers --cluster-arn arn:aws:kafka:...
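
Because the cluster enforces TLS between clients and brokers, the value to record is the TLS bootstrap string, whose brokers listen on port 9094. The sketch below extracts that string and splits it into one broker per line. `tls_bootstrap_brokers` and `split_brokers` are hypothetical helper names, and the cluster ARN is passed in as an argument rather than filled in here.

```shell
# Print the TLS bootstrap broker string for an MSK cluster ARN.
tls_bootstrap_brokers() {
  aws kafka get-bootstrap-brokers \
    --cluster-arn "$1" \
    --query BootstrapBrokerStringTls \
    --output text
}

# Split a comma-separated broker string into one host:port per line.
split_brokers() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Example:
# split_brokers "$(tls_bootstrap_brokers "$CLUSTER_ARN")"
```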

Amazon ElastiCache (Redis)

aws elasticache create-replication-group \
  --replication-group-id datafly-redis \
  --replication-group-description "Datafly Signal Redis" \
  --engine redis \
  --engine-version 7.1 \
  --cache-node-type cache.t3.medium \
  --num-cache-clusters 1 \
  --security-group-ids sg-redis \
  --cache-subnet-group-name datafly-subnet-group \
  --transit-encryption-enabled \
  --auth-token "your-redis-auth-token"

Amazon RDS (PostgreSQL)

aws rds create-db-instance \
  --db-instance-identifier datafly-postgres \
  --db-instance-class db.t3.medium \
  --engine postgres \
  --engine-version 16.4 \
  --master-username datafly \
  --master-user-password "your-db-password" \
  --allocated-storage 20 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-postgres \
  --db-subnet-group-name datafly-subnet-group \
  --storage-encrypted \
  --backup-retention-period 7 \
  --db-name datafly

⚠️ Ensure the MSK, ElastiCache, and RDS security groups allow inbound traffic from the EKS node security group. All services must be in the same VPC (or peered VPCs).
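
One way to open those paths is an ingress rule per service that names the EKS node security group as the source. The sketch below reuses the placeholder group names from the commands above and the ports used in this guide's connection strings (9094 for Kafka over TLS, 6379 for Redis, 5432 for PostgreSQL). The node security group ID is an argument you supply, and `ingress_rules` and `allow_from_nodes` are hypothetical helper names.

```shell
# One "security-group port" pair per line for each managed service.
ingress_rules() {
  printf '%s %s\n' sg-kafka 9094 sg-redis 6379 sg-postgres 5432
}

# Allow TCP traffic from the EKS node security group to each service.
# Usage: allow_from_nodes <eks-node-security-group-id>
allow_from_nodes() {
  node_sg="$1"
  ingress_rules | while read -r sg port; do
    # stdin is redirected so the command cannot consume the rule list.
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg" --protocol tcp --port "$port" \
      --source-group "$node_sg" </dev/null
  done
}
```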

Step 3: Configure Secrets

Option A: AWS Secrets Manager with External Secrets

Store connection strings and application secrets in AWS Secrets Manager:

aws secretsmanager create-secret \
  --name datafly/prod/database-url \
  --secret-string "postgresql://datafly:password@datafly-postgres.xxx.rds.amazonaws.com:5432/datafly?sslmode=require"
 
aws secretsmanager create-secret \
  --name datafly/prod/jwt-secret \
  --secret-string "$(openssl rand -hex 32)"
 
aws secretsmanager create-secret \
  --name datafly/prod/encryption-key \
  --secret-string "$(openssl rand -hex 32)"
 
aws secretsmanager create-secret \
  --name datafly/prod/hmac-secret \
  --secret-string "$(openssl rand -hex 32)"
 
aws secretsmanager create-secret \
  --name datafly/prod/licence-key \
  --secret-string "lic_your_licence_key"
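
To confirm that all five secrets were created, each name can be looked up with `describe-secret`, which returns metadata without printing the secret value. A minimal sketch follows; `missing_secrets` is a hypothetical helper name, and it treats any lookup failure (including a permissions error) as a missing secret.

```shell
# Print each expected secret name that Secrets Manager cannot find or read.
missing_secrets() {
  for name in database-url jwt-secret encryption-key hmac-secret licence-key; do
    aws secretsmanager describe-secret \
      --secret-id "datafly/prod/$name" >/dev/null 2>&1 \
      || printf 'datafly/prod/%s\n' "$name"
  done
}

# Example: prints nothing when every secret exists.
# missing_secrets
```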

Install the External Secrets Operator:

helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets --create-namespace

Then create a ClusterSecretStore for AWS:

apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
            namespace: external-secrets
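
After the manifest is applied, the store reports a `Ready` condition once the operator can authenticate to AWS. The sketch below reads that condition. It assumes the manifest was saved as `cluster-secret-store.yaml` (a hypothetical file name) and that the `external-secrets-sa` service account already exists and is linked to an IAM role permitted to read the secrets; the commands shown in this guide do not appear to create either. `secret_store_ready` is a hypothetical helper name.

```shell
# Print the status of the store's Ready condition ("True" when usable).
secret_store_ready() {
  kubectl get clustersecretstore aws-secrets-manager \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Example:
# kubectl apply -f cluster-secret-store.yaml
# secret_store_ready
```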

Option B: Kubernetes Secrets (Simpler)

kubectl create namespace datafly
 
kubectl create secret generic datafly-secrets \
  --namespace datafly \
  --from-literal=DATABASE_URL="postgresql://datafly:password@datafly-postgres.xxx.rds.amazonaws.com:5432/datafly?sslmode=require" \
  --from-literal=REDIS_URL="rediss://:auth-token@datafly-redis.xxx.cache.amazonaws.com:6379" \
  --from-literal=KAFKA_BROKERS="b-1.datafly-kafka.xxx.kafka.eu-west-1.amazonaws.com:9094" \
  --from-literal=JWT_SECRET="$(openssl rand -hex 32)" \
  --from-literal=ENCRYPTION_KEY="$(openssl rand -hex 32)" \
  --from-literal=HMAC_SECRET="$(openssl rand -hex 32)" \
  --from-literal=DATAFLY_LICENCE_KEY="lic_your_licence_key"
 
kubectl create configmap datafly-config \
  --namespace datafly \
  --from-literal=ENVIRONMENT="prod" \
  --from-literal=LOG_LEVEL="info"
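
Two details in these values follow from choices made in Step 2: the Redis URL uses the `rediss://` scheme because transit encryption is enabled, and the Kafka broker uses port 9094 because client-broker traffic is TLS. The sketch below checks both before the secret is created; `check_endpoints` is a hypothetical helper name.

```shell
# Return 0 when the Redis URL and Kafka brokers match the TLS settings
# used in Step 2; print a message and return 1 otherwise.
check_endpoints() {
  redis_url="$1"
  kafka_brokers="$2"
  case "$redis_url" in
    rediss://*) ;;
    *) echo "REDIS_URL should start with rediss:// (TLS)"; return 1 ;;
  esac
  case "$kafka_brokers" in
    *:9094*) ;;
    *) echo "KAFKA_BROKERS should use the TLS port 9094"; return 1 ;;
  esac
}

# Example:
# check_endpoints "$REDIS_URL" "$KAFKA_BROKERS" && echo "endpoints look consistent"
```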

Step 4: Configure DNS and TLS

Route 53 DNS

Create DNS records pointing to the ALB. The ALB is provisioned when the chart is installed, so complete this after Step 5:

# After Helm install, get the ALB address:
kubectl get ingress -n datafly
 
# Create Route 53 records (A record alias to ALB):
# data.yourdomain.com  → ALB
# app.yourdomain.com   → ALB
# api.yourdomain.com   → ALB
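
The alias records can also be created from the command line. The sketch below builds the change batch for one record. `alias_change_batch` is a hypothetical helper, and the hosted zone IDs and ALB DNS name are values you look up yourself. An alias needs the ALB's own canonical hosted zone ID, which is different from the ID of your domain's hosted zone.

```shell
# Print a Route 53 change batch that upserts an A-record alias.
# Usage: alias_change_batch <record-name> <alb-dns-name> <alb-hosted-zone-id>
alias_change_batch() {
  cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$1",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "$3",
        "DNSName": "$2",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
EOF
}

# Example (ZONE_ID is your domain's hosted zone):
# aws route53 change-resource-record-sets \
#   --hosted-zone-id "$ZONE_ID" \
#   --change-batch "$(alias_change_batch data.yourdomain.com "$ALB_DNS" "$ALB_ZONE_ID")"
```

The ALB's canonical hosted zone ID is returned by `aws elbv2 describe-load-balancers` in the `CanonicalHostedZoneId` field.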

TLS with cert-manager

Install cert-manager:

helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true

Step 5: Install Datafly Signal

Use the AWS example values file as a starting point:

# Download the example values file
curl -O https://raw.githubusercontent.com/datafly/signal/main/deployments/helm/datafly/values-aws.yaml
 
# Edit values-aws.yaml to set your domain, secrets, and sizing tier
# Then install:
helm install datafly oci://ghcr.io/datafly/charts/datafly \
  --namespace datafly --create-namespace \
  --values values-aws.yaml \
  --set licenceKey=lic_your_licence_key

Key values to customise in values-aws.yaml:

ingress:
  hosts:
    - host: data.yourdomain.com    # Your data collection subdomain
      paths:
        - path: /v1
          pathType: Prefix
          service: ingestion-gateway
          port: 8080
        - path: /d.js
          pathType: Exact
          service: ingestion-gateway
          port: 8080
    - host: app.yourdomain.com     # Management UI
      paths:
        - path: /
          pathType: Prefix
          service: management-ui
          port: 3000
    - host: api.yourdomain.com     # Management API
      paths:
        - path: /
          pathType: Prefix
          service: management-api
          port: 8083
  tls:
    - secretName: datafly-tls
      hosts:
        - data.yourdomain.com
        - app.yourdomain.com
        - api.yourdomain.com
 
externalSecrets:
  enabled: true
  provider: aws
  secretStore: aws-secrets-manager
  keys:
    databaseUrl: datafly/prod/database-url
    jwtSecret: datafly/prod/jwt-secret
    encryptionKey: datafly/prod/encryption-key
    hmacSecret: datafly/prod/hmac-secret
    licenceKey: datafly/prod/licence-key

Step 6: Verify the Deployment

Check pod status

kubectl get pods -n datafly

All pods should show Running with 1/1 ready.
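
With many pods, scanning the READY column by eye is error-prone. The sketch below lists only the pods that are not fully ready, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS). `unready_pods` is a hypothetical helper name. Pods from finished jobs, which show a status such as Completed, will also be listed.

```shell
# Print pods whose READY count is short of the total or whose STATUS
# is not Running. Reads `kubectl get pods --no-headers` output on stdin.
unready_pods() {
  awk '{
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Example: prints nothing when every pod is ready.
# kubectl get pods -n datafly --no-headers | unready_pods
```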

Check the ingress

kubectl get ingress -n datafly

The ADDRESS column should show the ALB DNS name.

Test event ingestion

curl -X POST https://data.yourdomain.com/v1/t \
  -H "Content-Type: application/json" \
  -d '{"type":"track","event":"Test Event","properties":{"source":"deployment-test"}}'
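
To turn this request into a pass/fail check, curl can print only the HTTP status code. The sketch assumes that any 2xx status means the event was accepted; the exact status the ingestion gateway returns is not stated in this guide. `is_success` is a hypothetical helper name.

```shell
# Return 0 for a 2xx HTTP status code, 1 otherwise.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# status="$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#   https://data.yourdomain.com/v1/t \
#   -H 'Content-Type: application/json' \
#   -d '{"type":"track","event":"Test Event","properties":{"source":"deployment-test"}}')"
# is_success "$status" && echo "ingestion OK" || echo "ingestion failed: $status"
```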

Access the Management UI

Open https://app.yourdomain.com in your browser. Log in with the default admin credentials from the seed data, or create a new admin user via the Management API.

Check logs

kubectl logs -n datafly -l app.kubernetes.io/name=ingestion-gateway --tail=50
kubectl logs -n datafly -l app.kubernetes.io/name=event-processor --tail=50

Cost Estimate (Small Tier)

Service                           Instance    Monthly Cost
EKS Control Plane                 -           ~$73
EC2 Nodes (3x m5.xlarge)          On-Demand   ~$420
MSK (2 brokers, kafka.m5.large)   -           ~$260
ElastiCache (cache.t3.medium)     -           ~$50
RDS (db.t3.medium, 20 GB)         -           ~$55
ALB                               -           ~$25
Total                             -           ~$883/mo

These are approximate on-demand costs for eu-west-1. Reserved instances and savings plans can reduce costs by 30-60%. Use the Sizing Calculator for detailed estimates by tier.
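
The total and the discount range can be checked with shell arithmetic. The sketch applies the 30-60% range to the whole bill, which is a simplification: reserved pricing does not apply to every line item (the EKS control plane fee, for example, is a flat charge), so treat the result as an optimistic range. `discounted` is a hypothetical helper name.

```shell
# Monthly line items from the cost table, in USD.
total=$((73 + 420 + 260 + 50 + 55 + 25))

# Cost after a percentage discount, using integer arithmetic (rounds down).
discounted() {
  echo $(( $1 * (100 - $2) / 100 ))
}

echo "On-demand total: \$$total"
echo "With 30% savings: \$$(discounted "$total" 30)"
echo "With 60% savings: \$$(discounted "$total" 60)"
```

Under that simplification, the $883 on-demand total falls to roughly $353 to $618 per month.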

Next Steps