Coolify (Production)

Deploy Datafly Signal to a VPS using Coolify as the deployment platform. This guide splits the platform into individually deployable resources for zero-downtime updates.

Architecture

The deployment uses 7 individual Coolify resources:

| Resource | Type | Source | Purpose |
| --- | --- | --- | --- |
| Infrastructure | Docker Compose Empty | Paste YAML | PostgreSQL, Redis, Kafka, Zookeeper |
| migrate | Dockerfile (GitHub) | database/Dockerfile | Database migrations + admin seed |
| ingestion-gateway | Dockerfile (GitHub) | ingestion-gateway/Dockerfile | Receives events from Datafly.js |
| event-processor | Dockerfile (GitHub) | event-processor/Dockerfile | Processes events, routes to delivery topics |
| delivery-workers | Dockerfile (GitHub) | delivery-workers/Dockerfile | Delivers events to vendor APIs |
| management-api | Dockerfile (GitHub) | management-api/Dockerfile | Admin API for the management UI |
| management-ui | Dockerfile (GitHub) | management-ui/Dockerfile | Next.js dashboard |

This means you can redeploy a single service (e.g. management-api) without touching infrastructure or other services.

All resources must have “Connect to Predefined Network” enabled in Coolify so containers can communicate by hostname.

⚠️

Infrastructure hostnames are prefixed with dfsignal- (e.g. dfsignal-redis, dfsignal-kafka) to avoid conflicts with Coolify’s internal services (coolify-redis, coolify-db). All application env vars must use these prefixed hostnames.

Prerequisites

| Requirement | Details |
| --- | --- |
| VPS | 8+ vCPU, 16+ GB RAM (IONOS VPS XXL recommended) |
| OS | Ubuntu 24.04 |
| Coolify | v4.x installed |
| GitHub | Repository connected to Coolify via GitHub App |
| DNS | A records pointing to VPS IP for each domain |

Step 1: Infrastructure (Docker Compose Empty)

The infrastructure stack uses only pre-built images — no Git source needed.

Create a new resource in Coolify

  • Click + New Resource in your project/environment
  • Type: Docker Compose Empty

Paste the compose file

Paste the following into the Docker Compose editor:

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    hostname: dfsignal-zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      KAFKA_OPTS: "-Dzookeeper.4lw.commands.whitelist=ruok,srvr,stat"
    volumes:
      - zookeeper-data:/var/lib/zookeeper/data
      - zookeeper-logs:/var/lib/zookeeper/log
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
 
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    hostname: dfsignal-kafka
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: dfsignal-zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://dfsignal-kafka:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_NUM_PARTITIONS: 3
      KAFKA_DEFAULT_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_HOURS: 168
      KAFKA_LOG_RETENTION_BYTES: 1073741824
    volumes:
      - kafka-data:/var/lib/kafka/data
    healthcheck:
      test: ["CMD-SHELL", "kafka-topics --bootstrap-server localhost:29092 --list"]
      interval: 15s
      timeout: 10s
      retries: 10
      start_period: 30s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 2G
 
  redis:
    image: redis:7-alpine
    hostname: dfsignal-redis
    command: >
      redis-server
      --appendonly yes
      --maxmemory 256mb
      --maxmemory-policy allkeys-lru
      --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 5s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 512M
 
  postgresql:
    image: postgres:16-alpine
    hostname: dfsignal-postgresql
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ${DB_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 2G
 
volumes:
  postgres-data:
  redis-data:
  kafka-data:
  zookeeper-data:
  zookeeper-logs:

Enable networking

In the resource settings, enable “Connect to Predefined Network”.

Set environment variables

DB_NAME=datafly
DB_USER=datafly
DB_PASSWORD=<strong-password>
REDIS_PASSWORD=<strong-password>
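One way to fill the `<strong-password>` placeholders is with random hex values, which have the side benefit of being URL-safe (no characters that need percent-encoding in the DATABASE_URL used later). A sketch:

```shell
# Generate random passwords for the infrastructure stack.
# `openssl rand -hex 24` yields a 48-character hex string.
DB_PASSWORD=$(openssl rand -hex 24)
REDIS_PASSWORD=$(openssl rand -hex 24)

printf 'DB_PASSWORD=%s\n' "$DB_PASSWORD"
printf 'REDIS_PASSWORD=%s\n' "$REDIS_PASSWORD"
```

Paste the printed values into the Coolify environment variable editor.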

Deploy

Click Deploy. Wait for PostgreSQL, Redis, and Kafka to report healthy (Zookeeper has no health check and will simply show as running).

Step 2: Database Migrations

The migrate resource runs schema migrations and seeds the admin user. Deploy this after infrastructure is healthy, and redeploy whenever migration files change.

Create a new resource in Coolify

  • Type: Dockerfile (select your GitHub source and repository)
  • Base Directory: /Application/database
  • Dockerfile Location: Dockerfile

Enable networking

Enable “Connect to Predefined Network” so it can reach PostgreSQL.

Set environment variables

DATABASE_URL=postgres://datafly:<db-password>@dfsignal-postgresql:5432/datafly?sslmode=disable
ADMIN_EMAIL=admin@customer.com
ADMIN_PASSWORD=<strong-password>
ADMIN_ORG_NAME=Customer Name

Replace <db-password> with the actual DB_PASSWORD from Step 1.
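One way to assemble the connection string before pasting it into Coolify (the password below is a placeholder; hex passwords avoid any need to percent-encode URL-reserved characters like `@`, `:`, or `/`):

```shell
# Build the migrate resource's DATABASE_URL from its parts.
DB_PASSWORD='change-me'   # placeholder; use the DB_PASSWORD from Step 1
DATABASE_URL="postgres://datafly:${DB_PASSWORD}@dfsignal-postgresql:5432/datafly?sslmode=disable"

echo "$DATABASE_URL"
```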

The migrate container runs once and exits. Coolify will show it as “Exited” — this is expected. Stop the resource after it completes to prevent repeated restarts. It does not need a domain or health check.

Deploy

Click Deploy. Check the logs to confirm migrations ran successfully and the admin user was seeded.

Verify

In the Coolify terminal (select the dfsignal-postgresql container from the infrastructure resource):

psql -U datafly -d datafly -c "SELECT email, role FROM users;"

You should see the admin user with role org_admin.

Step 3: Application Services

Create each application service as an individual Dockerfile resource in Coolify, all pointing to the same GitHub repository.

Coolify settings per service

| Service | Base Directory | Dockerfile Location | Port | Domain |
| --- | --- | --- | --- | --- |
| ingestion-gateway | /Application | ingestion-gateway/Dockerfile | 8080 | collect.customer.com |
| event-processor | /Application | event-processor/Dockerfile | 8080 | (none — internal only) |
| delivery-workers | /Application | delivery-workers/Dockerfile | 8080 | (none — internal only) |
| management-api | /Application | management-api/Dockerfile | 8080 | api.customer.com |
| management-ui | /Application/management-ui | Dockerfile | 3000 | app.customer.com |

⚠️

Go services (ingestion-gateway, event-processor, delivery-workers, management-api) need Base Directory set to /Application because their Dockerfiles copy the shared/ module from the parent directory. The management-ui is self-contained and uses /Application/management-ui.

For each service:

  1. Create a new Coolify resource with type Dockerfile
  2. Select your GitHub source and repository
  3. Set Base Directory and Dockerfile Location per the table above
  4. Enable “Connect to Predefined Network”
  5. Set the port number
  6. Assign a domain (for public-facing services)
  7. Add environment variables (see below)
  8. Set health check path to /healthz (Go services) or / (management-ui)

Environment Variables

Shared across all Go services

DATAFLY_ENV=production
LOG_LEVEL=info
PORT=8080
KAFKA_BROKERS=dfsignal-kafka:29092
REDIS_ADDR=dfsignal-redis:6379
REDIS_PASSWORD=<same-as-infrastructure>
DATABASE_URL=postgres://datafly:<db-password>@dfsignal-postgresql:5432/datafly?sslmode=disable
HMAC_SECRET=<64-char-hex>
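As a sanity check before deploying, a small shell loop can report any of these variables that are unset or empty. `check_env` is an illustrative helper, not part of the project:

```shell
# Report required variables that are unset or empty.
check_env() {
  missing=""
  for name in DATAFLY_ENV LOG_LEVEL PORT KAFKA_BROKERS REDIS_ADDR REDIS_PASSWORD DATABASE_URL HMAC_SECRET; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all required variables set"
}

check_env || echo "set the variables above before deploying"
```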

Additional per service

ingestion-gateway — no additional vars needed.

event-processor:

ENCRYPTION_KEY=<64-char-hex>

delivery-workers:

ENCRYPTION_KEY=<64-char-hex>
VENDOR_TYPE=webhook
WEBHOOK_URL=http://localhost:9999/noop

The WEBHOOK_URL default is a placeholder. Delivery workers idle on the Kafka topic until pipelines with integrations are configured. When adding a real webhook integration, set the actual URL here.

management-api:

ENCRYPTION_KEY=<64-char-hex>
JWT_SECRET=<64-char-hex>
SERVICE_URL_INGESTION_GATEWAY=https://collect.customer.com

management-ui:

NEXT_PUBLIC_API_URL=https://api.customer.com/v1
⚠️

NEXT_PUBLIC_API_URL is baked into the UI at build time. If you change the API domain, you must redeploy the management-ui for the change to take effect. Do not set NODE_ENV=production as a build-time variable — the Dockerfile handles this in the runtime stage. Setting it at build time causes npm install to skip devDependencies (including TypeScript), breaking the build.

Generating Secrets

Generate secure random hex strings for JWT_SECRET, HMAC_SECRET, and ENCRYPTION_KEY:

openssl rand -hex 32

Each secret should be a unique 64-character hex string.
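To produce all three as ready-to-paste lines in one go (each independently random):

```shell
# Emit one KEY=<64-char-hex> line per secret.
for key in JWT_SECRET HMAC_SECRET ENCRYPTION_KEY; do
  printf '%s=%s\n' "$key" "$(openssl rand -hex 32)"
done
```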

Deploy Order

For a fresh deployment:

  1. Deploy Infrastructure — wait for PostgreSQL and Kafka to be healthy
  2. Deploy migrate — wait for it to complete (check logs), then stop the resource
  3. Deploy management-api and ingestion-gateway
  4. Deploy management-ui
  5. Deploy event-processor and delivery-workers

For updates, deploy only the resource that changed. For schema changes, redeploy migrate first, then the affected services.
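When driving deploys from a script, the "wait for healthy" steps can be automated by polling each service's health endpoint. `wait_healthy` below is an illustrative helper (the commented domains are the examples used throughout this guide):

```shell
# Poll a URL until it answers with a 2xx, up to `attempts` tries, 5s apart.
wait_healthy() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy: $url"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -lt "$attempts" ]; then sleep 5; fi
  done
  echo "gave up on: $url"
  return 1
}

# After deploying each public service, e.g.:
# wait_healthy https://collect.customer.com/healthz
# wait_healthy https://api.customer.com/v1/health
```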

DNS Configuration

Point these A records to your VPS IP:

| Domain | Service |
| --- | --- |
| app.customer.com | management-ui |
| api.customer.com | management-api |
| collect.customer.com | ingestion-gateway |

⚠️

If using Cloudflare, set DNS records to DNS only (grey cloud, not proxied) so that Coolify/Traefik can issue SSL certificates via Let’s Encrypt.
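To confirm the records resolve to the VPS before deploying, a small helper around `dig` (from the `dnsutils`/`bind-utils` package) can compare each A record to the expected IP. `check_a_record`, the example IP, and the domains are illustrative:

```shell
# Compare a domain's A record against the expected VPS IP.
check_a_record() {
  domain=$1
  expected=$2
  resolved=$(dig +short "$domain" A | head -n 1)
  if [ "$resolved" = "$expected" ]; then
    echo "$domain OK"
  else
    echo "$domain resolves to '$resolved', expected $expected"
  fi
}

# Example:
# for d in app api collect; do check_a_record "$d.customer.com" 203.0.113.10; done
```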

Verifying the Deployment

Check all services are running

In the Coolify terminal or via SSH:

docker ps --format "table {{.Names}}\t{{.Status}}" | grep -v coolify

All application containers should show Up with (healthy).

Test the ingestion gateway

curl -s https://collect.customer.com/healthz

Should return {"status":"ok"}.

Test the management API

curl -s https://api.customer.com/v1/health

Log in to the management UI

Navigate to https://app.customer.com and log in with the admin email and password from the migrate resource’s environment variables.

Troubleshooting

Service can’t reach Kafka/Redis/PostgreSQL

Ensure “Connect to Predefined Network” is enabled on both the infrastructure compose and the individual service resource. All containers must be on the same Coolify network.

Verify hostname resolution from inside a service container:

# From the Coolify terminal, select the service container
nslookup dfsignal-kafka
nslookup dfsignal-redis
nslookup dfsignal-postgresql
⚠️

Do not use generic hostnames like redis, kafka, or postgresql — these can resolve to Coolify’s own internal services (e.g. coolify-redis). Always use the dfsignal- prefixed hostnames.

Management UI shows API errors

Check that NEXT_PUBLIC_API_URL was set correctly at build time. This is baked into the JavaScript bundle — changing the env var alone won’t work; you must redeploy the management-ui.

Management UI build fails (Cannot find module 'typescript')

Ensure NODE_ENV=production is not set as a build-time variable. Coolify’s “Inject Build Args” feature passes env vars at build time, which causes npm install to skip devDependencies. The Dockerfile sets NODE_ENV=production only in the runtime stage.

Delivery workers crash looping

Check the logs for missing environment variables. Each vendor type requires its own config (e.g. WEBHOOK_URL for webhook, GA4_MEASUREMENT_ID for GA4). If no integrations are configured yet, set VENDOR_TYPE=webhook with the default placeholder URL.

Migration stuck in dirty state

If the migrate container fails with a dirty migration, connect to PostgreSQL and check the state:

SELECT * FROM schema_migrations;

The migrate container includes automatic dirty state recovery — it forces the dirty version clean and retries. If this still fails, manually fix the state:

UPDATE schema_migrations SET dirty = false;

Then redeploy the migrate resource.

Migrate container keeps restarting

This is expected. The migrate container runs once and exits. Coolify’s default restart policy keeps restarting it. After confirming migrations ran successfully in the logs, click Stop on the migrate resource in Coolify.