Ingestion

The Ingestion Gateway is the entry point for all event data flowing into Datafly Signal. It is a Go service (port 8080) that receives events from multiple sources, validates authentication, enriches the payload with identity data, and publishes each event to the Kafka raw-events topic for downstream processing.

How It Works

Sources (browser, server, pixel, webhook)
  → Ingestion Gateway (authentication, identity, enrichment)
    → Kafka raw-events topic
      → Event Processor

Every event — regardless of how it arrives — is normalised into a canonical event envelope before being published. This means the processing layer and delivery workers never need to care about how the event was collected.
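To make the idea of a canonical envelope concrete, here is a minimal Go sketch. The `Envelope` type, its field names, and the `normalise` helper are all hypothetical; the actual Datafly Signal schema is not described on this page and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Envelope is a hypothetical canonical event shape. The field names are
// illustrative only and are not taken from the Datafly Signal schema.
type Envelope struct {
	Source       string         `json:"source"` // "browser", "server", "pixel" or "webhook"
	Type         string         `json:"type"`
	AnonymousID  string         `json:"anonymous_id"`
	ReceivedAtMs int64          `json:"received_at_ms"`
	Properties   map[string]any `json:"properties"`
}

// normalise wraps a source-specific payload in the common envelope so that
// downstream consumers see the same shape whatever the collection method.
func normalise(source, eventType, anonymousID string, props map[string]any, receivedAtMs int64) Envelope {
	if props == nil {
		props = map[string]any{} // always emit an object, never null
	}
	return Envelope{
		Source:       source,
		Type:         eventType,
		AnonymousID:  anonymousID,
		ReceivedAtMs: receivedAtMs,
		Properties:   props,
	}
}

func main() {
	now := time.Now().UnixMilli()
	events := []Envelope{
		normalise("pixel", "email_open", "anon-123", nil, now),
		normalise("server", "order_completed", "anon-123", map[string]any{"total": 42.5}, now),
	}
	for _, e := range events {
		b, err := json.Marshal(e)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

Both events serialise to the same set of top-level fields, which is the property the processing layer relies on.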

Input Methods

Datafly Signal supports five methods of ingesting events:

| Method | Endpoint | Authentication | Use Case |
|---|---|---|---|
| Browser (Datafly.js) | POST /v1/t, /v1/p, /v1/i, /v1/g | Pipeline key | Standard website/app tracking |
| Server-Side | POST /v1/events | HMAC-SHA256 | Backend event submission |
| Batch | POST /v1/batch | HMAC or pipeline key | Bulk imports, historical backfills |
| Tracking Pixel | GET /v1/pixel/{type} | Pipeline key (query param) | Email opens, no-JS environments |
| Webhook | POST /v1/webhook | Per-source signature | Third-party service events |
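For the HMAC-SHA256 methods, the general technique looks like the Go sketch below. It assumes the signature is a hex-encoded HMAC over the raw request body; the actual signing scheme (what is signed, the encoding, and which header carries the signature) is not specified on this page, so treat `sign` and `verify` as hypothetical helpers.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
// Assumption: the gateway signs the raw body; the real scheme may differ.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time, which avoids
// leaking how many leading bytes of a forged signature were correct.
func verify(secret, body []byte, signatureHex string) bool {
	got, err := hex.DecodeString(signatureHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-shared-secret") // placeholder, not a real credential
	body := []byte(`{"type":"order_completed","total":42.5}`)

	sig := sign(secret, body)
	fmt.Println("signature:", sig)
	fmt.Println("valid:", verify(secret, body, sig))
	fmt.Println("tampered:", verify(secret, []byte(`{"total":0}`), sig))
}
```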

Key Capabilities

Pipeline Key Validation

Every request is authenticated via a pipeline key (dk_...). The gateway validates pipeline keys against PostgreSQL, with a Redis cache layer to avoid per-request database lookups. Invalid or revoked keys are rejected with a 401 Unauthorized response.
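The cache-in-front-of-database pattern described above can be sketched in Go as follows. This is an illustration rather than the gateway's code: the `KeyStore` interface stands in for PostgreSQL, an in-process map with a TTL stands in for Redis, and the key `dk_example` is a made-up placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// KeyStore is the source of truth for pipeline keys. In the gateway this is
// PostgreSQL; here it is an interface so the sketch runs without a database.
type KeyStore interface {
	IsActive(key string) (bool, error)
}

// mapStore is an in-memory KeyStore that counts how often it is consulted.
type mapStore struct {
	active  map[string]bool
	lookups int
}

func (s *mapStore) IsActive(key string) (bool, error) {
	s.lookups++
	return s.active[key], nil
}

type cacheEntry struct {
	active  bool
	expires time.Time
}

// Validator caches store answers for a fixed TTL. The map stands in for Redis.
// Negative answers are cached too, so a revocation can take up to one TTL to
// become visible; that trade-off is an assumption of this sketch.
type Validator struct {
	store KeyStore
	ttl   time.Duration
	mu    sync.Mutex
	cache map[string]cacheEntry
}

func NewValidator(store KeyStore, ttlSeconds int) *Validator {
	return &Validator{
		store: store,
		ttl:   time.Duration(ttlSeconds) * time.Second,
		cache: make(map[string]cacheEntry),
	}
}

// Validate returns the HTTP status the request would be answered with.
func (v *Validator) Validate(key string) int {
	v.mu.Lock()
	defer v.mu.Unlock()

	entry, ok := v.cache[key]
	if !ok || time.Now().After(entry.expires) {
		active, err := v.store.IsActive(key)
		if err != nil {
			return http.StatusInternalServerError
		}
		entry = cacheEntry{active: active, expires: time.Now().Add(v.ttl)}
		v.cache[key] = entry
	}
	if !entry.active {
		return http.StatusUnauthorized
	}
	return http.StatusOK
}

func main() {
	store := &mapStore{active: map[string]bool{"dk_example": true}}
	v := NewValidator(store, 60)
	fmt.Println(v.Validate("dk_example"), v.Validate("dk_example"), v.Validate("dk_unknown"))
	fmt.Println("store lookups:", store.lookups)
}
```

The second call for the same key is answered from the cache, so the store is consulted once per distinct key within the TTL window.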

Anonymous Identity

On browser requests, the gateway sets a first-party _dfid cookie:

  • httpOnly — not accessible to client-side JavaScript
  • Secure — only sent over HTTPS
  • SameSite=Lax — prevents CSRF while allowing top-level navigations
  • 2-year TTL — persistent anonymous identity across sessions

This cookie serves as the anonymous identifier for the visitor, providing consistent identity without relying on third-party cookies.
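The cookie attributes listed above map directly onto Go's `net/http` cookie type. The sketch below is an assumption-laden illustration: the helper names `dfidCookie` and `ensureDFID`, the 128-bit hex ID format, and the `Path` value are hypothetical, and only the attribute values come from the list above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const twoYearsSeconds = 2 * 365 * 24 * 60 * 60

// dfidCookie builds the _dfid cookie with the attributes described above.
func dfidCookie(id string) *http.Cookie {
	return &http.Cookie{
		Name:     "_dfid",
		Value:    id,
		Path:     "/", // assumption: the cookie applies site-wide
		MaxAge:   twoYearsSeconds,
		HttpOnly: true,
		Secure:   true,
		SameSite: http.SameSiteLaxMode,
	}
}

// newAnonymousID returns 128 random bits as hex. The ID format is an
// assumption; the page does not say how _dfid values are generated.
func newAnonymousID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// ensureDFID reuses the visitor's existing ID or mints a new one, then
// re-sets the cookie so the TTL is refreshed on every request.
func ensureDFID(w http.ResponseWriter, r *http.Request) (string, error) {
	id := ""
	if c, err := r.Cookie("_dfid"); err == nil && c.Value != "" {
		id = c.Value
	} else {
		fresh, genErr := newAnonymousID()
		if genErr != nil {
			return "", genErr
		}
		id = fresh
	}
	http.SetCookie(w, dfidCookie(id))
	return id, nil
}

func main() {
	// First visit: the request carries no cookie, so a new ID is minted.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/t", nil)
	id, err := ensureDFID(rec, req)
	if err != nil {
		panic(err)
	}
	fmt.Println("id:", id)
	fmt.Println("Set-Cookie:", rec.Header().Get("Set-Cookie"))
}
```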

Vendor ID Generation

The gateway self-generates vendor-specific identifiers and sets them as first-party cookies on the customer’s subdomain:

| Vendor | Cookie | Format |
|---|---|---|
| Google Analytics 4 | _ga | GA1.1.{random}.{timestamp} |
| Meta / Facebook | _fbp | fb.1.{timestamp}.{random} |
| TikTok | _ttp | UUID v4 |

Because these are first-party cookies on the customer’s own domain, they persist through ITP restrictions and ad blocker rules.
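The three formats in the table can be produced with a few lines of Go. This sketch takes the random component and timestamp as arguments so the output is reproducible. The timestamp units (seconds for `_ga`, milliseconds for `_fbp`) and the 32-bit width of the random component are assumptions, since the table only gives the overall layout.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// gaCookie formats a _ga value as GA1.1.{random}.{timestamp}.
// Assumption: the timestamp is in Unix seconds.
func gaCookie(random uint32, unixSeconds int64) string {
	return fmt.Sprintf("GA1.1.%d.%d", random, unixSeconds)
}

// fbpCookie formats an _fbp value as fb.1.{timestamp}.{random}.
// Assumption: the timestamp is in Unix milliseconds.
func fbpCookie(unixMillis int64, random uint32) string {
	return fmt.Sprintf("fb.1.%d.%d", unixMillis, random)
}

// uuidV4 turns 16 random bytes into a version 4 UUID string by setting the
// version nibble to 4 and the variant bits to 10, as RFC 4122 requires.
func uuidV4(b [16]byte) string {
	b[6] = (b[6] & 0x0f) | 0x40
	b[8] = (b[8] & 0x3f) | 0x80
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// randomUint32 reads four bytes from the operating system's CSPRNG.
func randomUint32() (uint32, error) {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(b[:]), nil
}

func main() {
	now := time.Now()
	r1, err := randomUint32()
	if err != nil {
		panic(err)
	}
	r2, err := randomUint32()
	if err != nil {
		panic(err)
	}
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		panic(err)
	}
	fmt.Println("_ga  =", gaCookie(r1, now.Unix()))
	fmt.Println("_fbp =", fbpCookie(now.UnixMilli(), r2))
	fmt.Println("_ttp =", uuidV4(u))
}
```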

Click ID Capture

The gateway automatically captures advertising click IDs from URL query parameters and includes them in the event payload:

| Parameter | Vendor |
|---|---|
| gclid | Google Ads |
| fbclid | Meta / Facebook |
| ttclid | TikTok |
| li_fat_id | LinkedIn |
| ScCid | Snapchat |
| epik | Pinterest |
| tduid | The Trade Desk |
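Extracting these parameters from a page URL is straightforward with Go's `net/url` package. The sketch below is illustrative: `captureClickIDs` is a hypothetical helper, and it assumes parameter names are matched exactly as written in the table (so `ScCid` is case-sensitive) and that only the first value of a repeated parameter is kept.

```go
package main

import (
	"fmt"
	"net/url"
)

// clickIDParams maps each query parameter from the table to its vendor.
var clickIDParams = map[string]string{
	"gclid":     "Google Ads",
	"fbclid":    "Meta / Facebook",
	"ttclid":    "TikTok",
	"li_fat_id": "LinkedIn",
	"ScCid":     "Snapchat",
	"epik":      "Pinterest",
	"tduid":     "The Trade Desk",
}

// captureClickIDs returns the click IDs present in rawURL, keyed by
// parameter name. Unrelated parameters such as utm_source are ignored.
func captureClickIDs(rawURL string) (map[string]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	query := u.Query()
	found := make(map[string]string)
	for param := range clickIDParams {
		if v := query.Get(param); v != "" {
			found[param] = v
		}
	}
	return found, nil
}

func main() {
	ids, err := captureClickIDs("https://shop.example.com/landing?gclid=abc123&utm_source=news&ScCid=s1")
	if err != nil {
		panic(err)
	}
	for param, value := range ids {
		fmt.Printf("%s (%s) = %s\n", param, clickIDParams[param], value)
	}
}
```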

CORS Handling

Browser-based endpoints respond to OPTIONS preflight requests and include the appropriate Access-Control-Allow-* headers. The allowed origins are derived from the source configuration for each pipeline key.
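A preflight-aware middleware along these lines might look like the following Go sketch. It is a simplification under stated assumptions: the allowed origins are passed in as a fixed set rather than looked up per pipeline key, and the specific methods, headers, and the credentials flag are guesses at what "appropriate" means for cookie-bearing browser requests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withCORS answers OPTIONS preflights and adds Access-Control-Allow-* headers
// for origins in the allowed set. Other origins get no CORS headers, so the
// browser blocks the cross-origin response.
func withCORS(allowed map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		h := w.Header()
		h.Add("Vary", "Origin") // the response differs per origin, so caches must key on it
		if origin != "" && allowed[origin] {
			h.Set("Access-Control-Allow-Origin", origin)
			// Assumption: credentials are allowed so the _dfid cookie can be sent.
			h.Set("Access-Control-Allow-Credentials", "true")
			h.Set("Access-Control-Allow-Methods", "POST, OPTIONS")
			h.Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// preflight sends an OPTIONS request with the given Origin through handler
// and reports the status code and the echoed Access-Control-Allow-Origin.
func preflight(handler http.Handler, origin string) (int, string) {
	req := httptest.NewRequest(http.MethodOptions, "/v1/t", nil)
	req.Header.Set("Origin", origin)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	allowed := map[string]bool{"https://shop.example.com": true}
	collect := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	})
	h := withCORS(allowed, collect)
	for _, origin := range []string{"https://shop.example.com", "https://other.example.net"} {
		status, allow := preflight(h, origin)
		fmt.Printf("%s -> %d allow-origin=%q\n", origin, status, allow)
	}
}
```

Echoing the specific origin rather than `*` matters here because browsers reject a wildcard `Access-Control-Allow-Origin` on credentialed requests.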

Sections

  • HTTP Endpoints — Browser-facing event collection endpoints used by Datafly.js
  • Server-Side Events — Server-to-server event submission with HMAC authentication
  • Batch API — Submit multiple events in a single request
  • Tracking Pixel — 1x1 transparent GIF for email and no-JS tracking
  • Webhooks — Accept events from third-party services