# Customer-Hosted Deployment
Datafly Signal supports deployment into a customer’s own cloud account (VPC). This model is designed for organisations with strict compliance requirements where event data must never leave their own infrastructure.
## Overview
In a customer-hosted deployment, the entire Datafly Signal stack runs within the customer’s cloud account:
```
Customer's VPC
├── Kubernetes Cluster
│   ├── Ingestion Gateway
│   ├── Event Processor
│   ├── Delivery Workers
│   ├── Identity Hub
│   ├── Management API
│   └── Management UI
├── Managed Kafka (MSK / Confluent)
├── Managed Redis (ElastiCache / Memorystore)
└── Managed PostgreSQL (RDS / Cloud SQL)
```

## Benefits
- Data sovereignty — event data never leaves the customer’s VPC. All processing happens within their cloud account.
- Compliance — meets requirements for GDPR, HIPAA, SOC 2, and industry-specific regulations that mandate data residency.
- Network control — the customer controls all firewall rules, VPC peering, and network policies.
- Audit — the customer has full visibility into infrastructure logs, network traffic, and resource usage.
## Requirements

### Kubernetes Cluster
| Requirement | Minimum | Recommended |
|---|---|---|
| Kubernetes version | 1.28+ | 1.30+ |
| Worker nodes | 3 | 5+ |
| Node size | 4 vCPU / 8 GB | 8 vCPU / 16 GB |
| Ingress controller | Any (nginx, ALB, Traefik) | nginx-ingress or AWS ALB |
### Managed Services
| Service | Requirement |
|---|---|
| Apache Kafka | AWS MSK, Confluent Cloud, or self-managed Kafka 3.6+ |
| Redis | AWS ElastiCache, GCP Memorystore, or self-managed Redis 7+ |
| PostgreSQL | AWS RDS, GCP Cloud SQL, or self-managed PostgreSQL 16+ |
All managed services must be accessible from the Kubernetes cluster within the same VPC or via VPC peering.
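As an illustration, the recorded connection endpoints are typically wired into the Helm values file along these lines. The key names and hostnames below are hypothetical, not the chart's actual schema; consult the chart's `values.yaml` for the real structure:

```yaml
# Hypothetical values-customer.yaml fragment -- key names and hostnames
# are illustrative only; check the chart's values.yaml for the real schema.
externalServices:
  kafka:
    brokers: "b-1.msk.example.internal:9092,b-2.msk.example.internal:9092"
  redis:
    host: "datafly-cache.abc123.use1.cache.amazonaws.com"
    port: 6379
  postgresql:
    host: "datafly-db.abc123.us-east-1.rds.amazonaws.com"
    port: 5432
    database: "datafly"
```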
### TLS Certificates
A valid TLS certificate is required for the customer’s data collection subdomain (e.g., data.customer.com). This can be provisioned via:
- AWS Certificate Manager (ACM)
- Let’s Encrypt (cert-manager in Kubernetes)
- Customer’s existing certificate authority
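If the Let's Encrypt route is chosen, cert-manager issues certificates via a ClusterIssuer. A minimal sketch using the HTTP-01 challenge is shown below; the email address and ingress class are placeholders to replace with your own values:

```yaml
# Sketch of a cert-manager ClusterIssuer for Let's Encrypt (HTTP-01).
# The email and ingress class below are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@customer.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```

Certificates for the `data.`, `app.`, and `api.` subdomains can then be requested by annotating the corresponding Ingress resources with this issuer.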
## Network Configuration

### DNS
The customer creates DNS A records (or CNAMEs) pointing the data collection, management UI, and API subdomains to the Kubernetes ingress controller’s external IP or load balancer:
```
data.customer.com → A → <ingress-controller-external-ip>
app.customer.com  → A → <ingress-controller-external-ip>
api.customer.com  → A → <ingress-controller-external-ip>
```

### Firewall Rules
| Direction | Source | Destination | Port | Purpose |
|---|---|---|---|---|
| Inbound | Internet | Ingress Controller | 443 | Browser event collection |
| Inbound | Admin IPs | Ingress Controller | 443 | Management UI and API access |
| Outbound | Delivery Workers | Vendor APIs | 443 | Server-to-server delivery |
| Internal | Kubernetes pods | Kafka | 9092 | Event streaming |
| Internal | Kubernetes pods | Redis | 6379 | Caching |
| Internal | Kubernetes pods | PostgreSQL | 5432 | Configuration storage |
Delivery Workers must have outbound HTTPS access to vendor API endpoints (e.g., www.google-analytics.com, graph.facebook.com). Ensure your firewall or NAT gateway allows outbound traffic on port 443.
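In clusters that enforce NetworkPolicies, the required egress for Delivery Workers can be expressed as a policy along these lines. The pod label `app: delivery-worker` is an assumption for illustration; match it to the labels the Helm chart actually applies:

```yaml
# Sketch: allow Delivery Worker pods outbound HTTPS (vendor APIs) plus DNS.
# The podSelector label is an assumption -- check the chart's pod labels.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: delivery-worker-egress
  namespace: datafly
spec:
  podSelector:
    matchLabels:
      app: delivery-worker
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```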
## Installation
1. Provision infrastructure
Create the managed Kafka, Redis, and PostgreSQL instances within the customer’s VPC. Record the connection endpoints.
2. Install the Helm chart
```bash
helm install datafly ./deployments/helm/datafly \
  --namespace datafly \
  --create-namespace \
  --values values-customer.yaml
```

3. Run database migrations
```bash
kubectl run migrations --rm -it \
  --namespace datafly \
  --image=ghcr.io/datafly/migrations:v1.2.0 \
  --env="DATABASE_URL=postgresql://..." \
  -- migrate-up
```

4. Configure DNS
Point the customer’s subdomain to the ingress controller’s external IP.
5. Verify
```bash
# Check all pods are running
kubectl get pods -n datafly

# Test the ingestion endpoint
curl -X POST https://data.customer.com/v1/t \
  -H "Authorization: Bearer dk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"type":"track","event":"Test Event","properties":{}}'
```

## Updates
Updates are delivered as new Helm chart versions. The customer (or Datafly operations team with access) runs a Helm upgrade:
```bash
helm upgrade datafly ./deployments/helm/datafly \
  --namespace datafly \
  --values values-customer.yaml
```

Helm chart upgrades follow semantic versioning. Minor and patch versions are backward-compatible. Major versions may require migration steps documented in the release notes.
### Update Process
- Review the release notes for the new version.
- Run database migrations if required (`make migrate-up` or the migration job).
- Run `helm upgrade` with the new chart version.
- Verify all pods are healthy: `kubectl get pods -n datafly`.
- Run a test event to confirm end-to-end delivery.
## Monitoring
The customer is responsible for monitoring their own infrastructure. Datafly Signal services expose:
| Endpoint | Port | Description |
|---|---|---|
| `/health` | Service port | Health check (returns 200 OK when healthy) |
| `/metrics` | Service port | Prometheus-compatible metrics endpoint |
Recommended monitoring stack:
- Prometheus for metrics collection
- Grafana for dashboards
- AlertManager for alerting on service health, Kafka consumer lag, and delivery failures
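If Prometheus is not already discovering pods by annotation, a minimal scrape job for the `/metrics` endpoints could look like the following. This is a sketch: the job name is arbitrary, and it assumes Prometheus runs in-cluster with permission to list pods in the `datafly` namespace:

```yaml
# Sketch of a Prometheus scrape job for Datafly pods in the datafly
# namespace. Assumes in-cluster Prometheus with pod-list RBAC.
scrape_configs:
  - job_name: datafly
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - datafly
    metrics_path: /metrics
```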
Datafly provides sample Grafana dashboards as part of the Helm chart. Enable them with `monitoring.grafana.dashboards.enabled: true` in the values file.
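In the values file, that flag expands to:

```yaml
# values-customer.yaml -- enable the bundled Grafana dashboards.
monitoring:
  grafana:
    dashboards:
      enabled: true
```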