Vendor Identity Sync

The Vendor Identity Sync Engine is the orchestration layer that coordinates all identity resolution methods (self-generation, click ID capture, gateway identity sync, and server-proxied enrichment) into a unified identity store. It determines which vendor IDs to collect for each source, manages storage and TTLs in Redis, and ensures the correct IDs are attached to events during processing.

In a traditional client-side tag management setup, each vendor’s JavaScript tag creates its own cookies. A server-side tag manager could theoretically just read those cookies from inbound requests and forward the values to vendor APIs.

With Signal, this approach fails because vendor tags never fire. The Google Analytics script never loads, so the _ga cookie is never created. The Meta pixel never fires, so _fbp never exists. There are no vendor cookies to passively read.

Signal must actively create or collect these identities itself. Depending on the vendor, it does so through self-generation (mint an ID Signal controls, in the vendor’s format), click ID capture (read attribution parameters from URLs), gateway identity sync (redirect the browser to the vendor’s public sync endpoint and capture the returned ID), or server-proxied enrichment (authenticated API call from the gateway). See the method comparison below.

Four Sync Methods

The sync engine uses four methods to establish vendor identity, applied in combination:

| Method | When It Runs | What It Produces | Vendor JavaScript Required |
| --- | --- | --- | --- |
| Self-Generation | On first request (per vendor) | Vendor-format IDs Signal mints server-side (`ga_client_id`, `fbp`, `ttp`) | No |
| Click ID Capture | On every request with URL parameters | Attribution IDs captured from the URL (`gclid`, `fbclid`, `epik`, `msclkid`, etc.) | No |
| Gateway Identity Sync | When no vendor cookie yet exists and consent is granted | Vendor-assigned first-party cookie values harvested via a single browser redirect (`pin_unauth`, `microsoft_muid`, `sc_cookie1`, etc.) | No |
| Server-Proxied Enrichment | On first request or on schedule | Identity tokens from authenticated vendor APIs (RampID envelope, UID2 token, Acxiom Person ID, TDID via API) | No |

Each method is independent. A visitor can have self-generated IDs, captured click IDs, synced vendor cookies, and enrichment results all stored simultaneously. The sync engine does not require all four to be present — it uses whatever IDs are available for each vendor.
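
As a rough illustration of self-generation, the sketch below mints IDs in the shapes shown elsewhere on this page: `<random>.<unix seconds>` for `ga_client_id` and `fb.1.<unix ms>.<random>` for `fbp`. The function names and the random ranges are assumptions made for illustration and are not taken from Signal’s implementation.

```python
import secrets
import time


def mint_ga_client_id(now=None):
    """Return '<random int>.<unix seconds>', the shape of a GA4 client ID."""
    now = time.time() if now is None else now
    random_part = secrets.randbelow(2**31 - 1) + 1  # assumed range: 1..2^31-1
    return f"{random_part}.{int(now)}"


def mint_fbp(now=None):
    """Return 'fb.1.<unix ms>.<random int>', the shape of Meta's _fbp value."""
    now = time.time() if now is None else now
    random_part = secrets.randbelow(9 * 10**9) + 10**9  # assumed: always 10 digits
    return f"fb.1.{int(now * 1000)}.{random_part}"
```

Passing `now` explicitly keeps the functions deterministic in their timestamp part, which is convenient when testing.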

Which method is correct for a given vendor depends on what the vendor’s own pixel would do:

Vendors whose pixel sets a cookie Signal can format consistently (Meta _fbp, TikTok _ttp, GA4 _ga) → Self-Generation
Vendors whose primary match signal is an attribution parameter carried in the URL (Google Ads click IDs, most performance channels) → Click ID Capture
Vendors whose pixel sets a cookie on the vendor’s domain via a match-pixel redirect (Pinterest, Microsoft, Snapchat, Reddit, LinkedIn, X, Criteo, Trade Desk, etc.) → Gateway Identity Sync
Vendors whose identity requires an authenticated API call with credentials and hashed PII (LiveRamp ATS, UID2, Acxiom RealID) → Server-Proxied Enrichment
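
Click ID Capture, the second method in the list above, amounts to scanning the request URL for known attribution parameters. A minimal sketch, assuming a fixed parameter list and a hypothetical helper name `capture_click_ids`:

```python
from urllib.parse import parse_qs, urlsplit

# Click ID parameters named on this page. Assumption: a real deployment
# would derive this set from the integrations enabled on the source.
CLICK_ID_PARAMS = ("gclid", "fbclid", "ttclid", "epik", "msclkid")


def capture_click_ids(url):
    """Return {param: value} for each known click ID present in the URL."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    captured = {}
    for param in CLICK_ID_PARAMS:
        values = query.get(param)
        if values:
            captured[param] = values[0]  # assumption: first occurrence wins
    return captured
```

For a landing URL such as `https://example.com/landing?gclid=CjwKCAjw&fbclid=IwAR3x`, the function returns both click IDs and ignores every other query parameter.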

Per-Source Vendor Module Selection

Not every source needs every vendor’s identity. The sync engine only generates, captures, and syncs IDs for vendors that have an active integration configured on the source:

Source: "Main Website" (src_abc123)
  Integrations: GA4, Meta CAPI, TikTok Events API, Pinterest Ads, Microsoft Ads

  → Self-generation:       ga_client_id, fbp, ttp
  → Click ID capture:      gclid, fbclid, ttclid, epik, msclkid
  → Gateway identity sync: pin_unauth, microsoft_muid
  → No Snapchat, LinkedIn, or Reddit IDs are collected

This means the Ingestion Gateway only sets cookies that are actually needed, minimising cookie overhead and avoiding unnecessary Redis storage.
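
The selection logic can be pictured as a lookup from enabled integrations to the IDs each one needs. The registry below is a hypothetical reconstruction from the "Main Website" example. In particular, attributing `gclid` to the GA4 integration is an assumption, and `VENDOR_MODULES` and `ids_to_collect` are illustrative names.

```python
# Hypothetical registry: which IDs each integration needs, grouped by method.
VENDOR_MODULES = {
    "GA4": {"self_generation": ["ga_client_id"], "click": ["gclid"]},
    "Meta CAPI": {"self_generation": ["fbp"], "click": ["fbclid"]},
    "TikTok Events API": {"self_generation": ["ttp"], "click": ["ttclid"]},
    "Pinterest Ads": {"click": ["epik"], "sync": ["pin_unauth"]},
    "Microsoft Ads": {"click": ["msclkid"], "sync": ["microsoft_muid"]},
    "Snapchat": {"sync": ["sc_cookie1"]},
}


def ids_to_collect(integrations):
    """Return the IDs to generate, capture, and sync for one source."""
    plan = {"self_generation": [], "click": [], "sync": []}
    for name in integrations:
        for method, ids in VENDOR_MODULES.get(name, {}).items():
            for vendor_id in ids:
                if vendor_id not in plan[method]:  # avoid duplicates
                    plan[method].append(vendor_id)
    return plan
```

Called with the five integrations from the example, the plan contains no Snapchat IDs because Snapchat is not enabled on the source.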

When a new integration is added to a source, the sync engine begins generating the relevant vendor ID on each visitor’s next request. Existing visitors therefore receive the new vendor ID on their next page load, with no need to wait for a new session.

Removing an integration does not immediately delete stored vendor IDs from Redis. They expire naturally based on their configured TTL. This allows re-enabling an integration without losing identity continuity.

Vendor ID Storage

All vendor IDs are stored in a Redis hash keyed by the visitor’s canonical identity (so synced IDs survive cross-device when a user_id ties two canonicals together):

Key:    cvid:{org_id}:{canonical_id}
Type:   Hash

Fields (value is a JSON record with value + source + collected_at + expires_at):
  ga_client_id    → { "value": "1234567890.1708876543",
                      "source": "generated", ... }
  fbp             → { "value": "fb.1.1708876543000.9876543210",
                      "source": "generated", ... }
  fbc             → { "value": "fb.1.1708876543000.IwAR3x...",
                      "source": "derived", ... }
  ttp             → { "value": "a1b2c3d4e5f6g7h8i9j0k1l2m3n",
                      "source": "generated", ... }
  gclid           → { "value": "CjwKCAjw...",
                      "source": "click", ... }
  fbclid          → { "value": "IwAR3x...", "source": "click", ... }
  pin_unauth      → { "value": "f47ac10b-58cc-4372-a567-0d02b2c3d479",
                      "source": "sync", ... }
  microsoft_muid  → { "value": "0123456789ABCDEF...", "source": "sync", ... }
  sc_cookie1      → { "value": "b82d5c91-3f7a-4e28-9c8f-1a2b3c4d5e6f",
                      "source": "sync", ... }
  ramp_id         → { "value": "XY1000bGluZWRpbjANCm...",
                      "source": "api", ... }
  uid2_token      → { "value": "AgAAAAN...", "source": "api", ... }

The source field identifies how the ID was collected: generated (self-generation), derived (computed from another ID, e.g. fbc from fbclid), click (captured from the URL), sync (gateway identity sync), or api (server-proxied enrichment). This metadata is preserved through to delivery so the Event Processor can apply source-specific rules (e.g. only use a click ID within its attribution window).
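
Two of these ideas are easy to sketch: deriving `fbc` from `fbclid` in the `fb.1.<unix ms>.<fbclid>` shape shown in the hash above, and a source-specific rule that ignores click IDs outside their window. The 90-day window, the use of Unix seconds for `collected_at`, and the function names are assumptions for illustration.

```python
import time

CLICK_WINDOW_SECONDS = 90 * 24 * 60 * 60  # assumed 90-day attribution window


def derive_fbc(fbclid, collected_at):
    """Build an fbc value: 'fb.1.<unix ms of capture>.<fbclid>'."""
    return f"fb.1.{int(collected_at * 1000)}.{fbclid}"


def is_usable(record, now=None):
    """Source-specific rule: click-sourced IDs are only valid inside the window."""
    now = time.time() if now is None else now
    if record["source"] == "click":
        return now - record["collected_at"] <= CLICK_WINDOW_SECONDS
    return True  # other sources are not restricted in this sketch
```

Here `derive_fbc("IwAR3x", 1708876543)` yields `fb.1.1708876543000.IwAR3x`, matching the `fbc` shape in the storage example.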

Each field in the hash represents a single vendor identity value. This structure allows the Event Processor to retrieve all IDs for a visitor in a single HGETALL call, or fetch specific fields with HMGET when preparing events for individual vendors.
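
The following sketch mimics that access pattern with a plain dictionary standing in for Redis, so it runs without a server. The per-integration field lists and the helper names are assumptions. With a real Redis client the lookup would map to HMGET (or HGETALL when every field is needed).

```python
import json

# Assumption: which hash fields each integration needs at delivery time.
FIELDS_BY_INTEGRATION = {
    "GA4": ["ga_client_id", "gclid"],
    "Meta CAPI": ["fbp", "fbc"],
}


def hash_key(org_id, canonical_id):
    """Key format from the storage layout: cvid:{org_id}:{canonical_id}."""
    return f"cvid:{org_id}:{canonical_id}"


def encode_record(value, source, collected_at, expires_at):
    """Serialise one hash field as the JSON record described above."""
    return json.dumps({"value": value, "source": source,
                       "collected_at": collected_at, "expires_at": expires_at})


def fields_for(store, key, integration):
    """HMGET-style lookup: decode only the fields one integration needs."""
    stored = store.get(key, {})
    wanted = FIELDS_BY_INTEGRATION.get(integration, [])
    return {f: json.loads(stored[f]) for f in wanted if f in stored}
```

Fields that are absent from the hash are simply omitted from the result, which mirrors the engine using whatever IDs are available for each vendor.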

TTL Management

Different vendor IDs have different expected lifetimes. The sync engine manages TTLs at two levels:

Cookie TTL

Self-generated IDs (ga_client_id, fbp, ttp) are set as first-party cookies on the customer’s domain so Datafly.js can read them back on subsequent visits. Their Max-Age matches the vendor’s expected cookie lifetime:

| Vendor ID | Cookie TTL |
| --- | --- |
| `ga_client_id` (GA4 client ID) | 2 years |
| `fbp` (Meta browser ID) | 90 days |
| `fbc` (Meta click ID) | 90 days |
| `ttp` (TikTok) | 13 months |

Click IDs captured from URL parameters are also persisted as first-party cookies with a 90-day default lifetime, so they survive page navigation within the attribution window.
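
A minimal sketch of how these lifetimes could translate into `Set-Cookie` headers, using Python’s standard `http.cookies` module. Using the field name as the cookie name, approximating a month as 30 days, and the `Path=/` attribute are all assumptions for illustration.

```python
from http.cookies import SimpleCookie

DAY = 24 * 60 * 60
COOKIE_MAX_AGE = {
    "ga_client_id": 2 * 365 * DAY,  # 2 years
    "fbp": 90 * DAY,
    "fbc": 90 * DAY,
    "ttp": 13 * 30 * DAY,           # 13 months, approximated as 30-day months
}
CLICK_ID_MAX_AGE = 90 * DAY         # default lifetime for captured click IDs


def set_cookie_header(name, value):
    """Return the Set-Cookie header value for one vendor ID."""
    cookie = SimpleCookie()
    cookie[name] = value
    # IDs without a table entry fall back to the click ID default.
    cookie[name]["max-age"] = COOKIE_MAX_AGE.get(name, CLICK_ID_MAX_AGE)
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()
```

For example, `set_cookie_header("gclid", "CjwKCAjw")` produces a header carrying `Max-Age=7776000`, which is 90 days in seconds.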

Gateway-synced IDs (pin_unauth, microsoft_muid, sc_cookie1, etc.) are not stored as first-party cookies on the customer’s domain; they live only in Redis under the canonical identity. The vendor’s own cookie is set on the vendor’s own domain by the redirect chain. Signal reads that cookie’s value via the 302 callback but does not mirror it client-side. See Gateway Identity Sync for details.

Redis TTL

Vendor ID fields in Redis use per-field expiry via a companion key pattern:

identity:{anonymous_id}           → Hash with all IDs (TTL = max of any field)
identity:{anonymous_id}:ttl:fbc   → Expiry marker for _fbc (90 days)
identity:{anonymous_id}:ttl:gclid → Expiry marker for gclid (90 days)

A background process periodically cleans up expired fields from the hash. The hash itself persists as long as any field is still valid.
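
The cleanup pass can be sketched as follows, with a set of surviving marker keys standing in for Redis. This assumes that every hash field has a companion marker and that an expired marker simply disappears. The function names are hypothetical.

```python
def marker_key(anonymous_id, field):
    """Companion key that carries the expiry for one hash field."""
    return f"identity:{anonymous_id}:ttl:{field}"


def hash_ttl(field_ttls):
    """The hash outlives every field: its TTL is the max of the field TTLs."""
    return max(field_ttls.values())


def clean_expired_fields(hash_fields, live_markers, anonymous_id):
    """Drop hash fields whose companion marker key no longer exists.

    `live_markers` is the set of marker keys still present. An absent
    marker is treated as expired, so its field is removed from the hash.
    """
    return {
        field: value
        for field, value in hash_fields.items()
        if marker_key(anonymous_id, field) in live_markers
    }
```

In this model a visitor whose `gclid` marker has expired keeps the rest of the hash intact, and only the `gclid` field is removed.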

⚠️ Click IDs have short TTLs (typically 90 days) because they represent a single ad interaction. Do not extend click ID TTLs beyond the vendor’s attribution window; doing so may cause the vendor to reject the stale click ID.

How IDs Flow Through the System

The full lifecycle of a vendor ID, from creation to delivery:

1. COLLECTION (Ingestion Gateway)
   Browser request → Gateway checks for existing cookies
   → Missing vendor IDs are generated (self-generation)
   → URL parameters are scanned for click IDs
   → Self-generated and click IDs are set as cookies (Set-Cookie header)
   → All IDs are stored in Redis hash

2. STORAGE (Redis)
   identity:{anonymous_id} hash holds all vendor IDs
   → Each field has an independent TTL
   → Updated on every request (TTLs refreshed for active visitors)

3. ENRICHMENT (Event Processor)
   Event arrives from Kafka (raw-events topic)
   → Processor reads anonymous_id from the event
   → Looks up identity:{anonymous_id} in Redis
   → For each target integration, attaches the relevant vendor IDs
   → Publishes enriched event to delivery topic

4. DELIVERY (Delivery Workers)
   Delivery Worker consumes from delivery-{integration_id}
   → Formats the vendor ID into the API payload
   → Sends server-to-server to vendor API
   → Vendor matches the ID to its own identity graph

Example: GA4 Delivery

{
  "client_id": "1234567890.1708876543",
  "events": [{
    "name": "purchase",
    "params": {
      "transaction_id": "T-12345",
      "value": 79.99,
      "currency": "USD"
    }
  }]
}

The client_id value 1234567890.1708876543 was self-generated by Signal, set as the _ga cookie on the browser, stored in Redis, retrieved during processing, and included in the Measurement Protocol payload. GA4 accepts it as a valid client ID and can stitch this server-sent event with any client-side GA4 data (if the standard GA4 tag also runs), or use it as a standalone identity for pure server-side implementations.

Sync Timing

Identity sync operations happen at different points depending on the method:

| Operation | When | Blocking |
| --- | --- | --- |
| Self-generate vendor IDs | Ingestion Gateway request handling | Yes: IDs are set in the response |
| Capture click IDs from URL | Ingestion Gateway request handling | Yes: included in event payload |
| Gateway identity sync | Browser follows redirect after `/v1/batch` response carries `pending_syncs` | No: captured asynchronously; available for subsequent events |
| Server-proxied enrichment | Asynchronous after first event | No: result stored in Redis for next event |
| Attach IDs to events | Event Processor pipeline | Yes: IDs are attached before delivery |

Self-generation and click ID capture are synchronous operations that happen during request handling. The generated IDs are immediately available in the event payload and in Redis. Gateway identity sync and server-proxied enrichment are asynchronous — the browser redirect (or API call) happens after the initial event is processed, and the result is available for subsequent events.

For server-proxied enrichment, the dual-storage pattern (Redis + browser IndexedDB) means the ID is available server-side immediately after the enrichment call completes, even before the visitor’s next page load. See Server-Proxied Enrichment for details.
