Core Concepts

Before you start configuring the platform, it helps to understand the key building blocks and how they fit together.

Pipelines

A pipeline is the central concept in Datafly Signal. It represents one data collection endpoint — typically one website or app — and controls what happens to the events collected from it.

Each pipeline has:

  • A pipeline key (dk_...) — a unique identifier used by the JS collector to authenticate events
  • One or more integrations — vendor destinations where events will be delivered
  • Parameters — shared configuration values (like a GA4 Measurement ID) that are injected into integration configs at processing time

Think of a pipeline as the answer to: “For this website, which vendors should receive events, and how should those events be transformed?”
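As a rough illustration of how these three parts relate, the sketch below models a pipeline as a plain object and shows shared parameters being injected into integration configs. The object shape, the field names, the `{{name}}` placeholder syntax, and the `resolveConfig` helper are all assumptions made for this example; they are not Signal's actual schema or API.

```javascript
// Hypothetical pipeline shape; field names are illustrative only.
const pipeline = {
  key: "dk_...", // pipeline key, left elided as in the text above
  parameters: { measurement_id: "G-XXXXXXX" }, // placeholder value
  integrations: [
    { vendor: "ga4", config: { measurement_id: "{{measurement_id}}" } },
    { vendor: "bigquery", config: { dataset: "events" } },
  ],
};

// Replace {{name}} placeholders in string config values with pipeline parameters.
// Unknown placeholders and non-string values are left unchanged.
function resolveConfig(config, parameters) {
  const resolved = {};
  for (const [field, value] of Object.entries(config)) {
    resolved[field] =
      typeof value === "string"
        ? value.replace(/\{\{(\w+)\}\}/g, (match, name) =>
            Object.prototype.hasOwnProperty.call(parameters, name)
              ? String(parameters[name])
              : match
          )
        : value;
  }
  return resolved;
}

// Resolve every integration's config against the shared parameters.
const resolvedConfigs = pipeline.integrations.map((integration) =>
  resolveConfig(integration.config, pipeline.parameters)
);
```

Under these assumptions, changing `measurement_id` once in `parameters` updates every integration config that references it.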

When to Create Multiple Pipelines

In most cases, you’ll create one pipeline per website or app. You might create separate pipelines when:

  • You have distinct websites with different vendor requirements
  • You want completely separate configuration for staging vs production
  • You manage multiple brands with different data destinations

Integrations

An integration connects your pipeline to a specific vendor destination. When you add an integration, you’re telling Signal: “Send events from this pipeline to this vendor’s API.”

Each integration includes:

  • Vendor type — which vendor API to deliver to (GA4, Meta CAPI, TikTok, BigQuery, etc.)
  • Credentials — API keys, access tokens, or measurement IDs required by the vendor
  • Field mappings — how to transform your canonical events into the vendor’s expected format

Datafly Signal ships with 150+ pre-built integrations in the Integration Library, covering advertising platforms, analytics tools, CDPs, data warehouses, and more.

Templates and Revisions

Integrations use a template and revision model:

  • A template is the integration itself (e.g. “Our GA4 Integration”)
  • A revision is a versioned snapshot of its configuration
  • You can create new revisions to update mappings without affecting live traffic until you’re ready to publish

This gives you a safe way to iterate on configuration changes.
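The idea can be sketched in a few lines: drafts accumulate as numbered snapshots, and live traffic keeps reading the published one until you switch. The functions below (`createTemplate`, `addRevision`, `publish`, `liveConfig`) are hypothetical names for this illustration, not part of the Signal API.

```javascript
// Hypothetical model: a template holds revisions; only the published one is live.
function createTemplate(name) {
  return { name, revisions: [], publishedVersion: null };
}

// Store a shallow-frozen copy so a saved revision acts as a snapshot.
function addRevision(template, config) {
  const version = template.revisions.length + 1;
  template.revisions.push(
    Object.freeze({ version, config: Object.freeze({ ...config }) })
  );
  return version;
}

// Point live traffic at an existing revision.
function publish(template, version) {
  if (!template.revisions.some((r) => r.version === version)) {
    throw new Error(`Unknown revision ${version}`);
  }
  template.publishedVersion = version;
}

// Return the config that live traffic would use, or null if nothing is published.
function liveConfig(template) {
  const live = template.revisions.find(
    (r) => r.version === template.publishedVersion
  );
  return live ? live.config : null;
}

const template = createTemplate("Our GA4 Integration");
publish(template, addRevision(template, { currency: "USD" }));
const draftVersion = addRevision(template, { currency: "EUR" }); // draft, not yet live
```

In this sketch `liveConfig(template)` keeps returning the first revision until `publish(template, draftVersion)` is called.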

Blueprints

A blueprint is a pre-built configuration for a specific vendor and industry vertical. Instead of mapping every field from scratch, you can select a blueprint when installing an integration to get a working configuration immediately.

For example, the GA4 Retail Blueprint pre-maps common e-commerce events (purchase, add_to_cart, view_item, etc.) to GA4’s Measurement Protocol format, including all the standard parameters GA4 expects.

Available blueprint verticals include Retail, Travel, and Media. Blueprints are fully editable after install — they’re a starting point, not a constraint.

The V2 Schema-Mapping Builder

When you configure an integration (either from a blueprint or from scratch), you use the V2 schema-mapping builder in the Management UI. This is where you define:

  • Parameters — Connection credentials and shared values (e.g. measurement_id, api_secret)
  • Global mappings — Fields that apply to every event sent to this vendor (e.g. client_id, user_id)
  • Event mappings — Per-event-type field mappings (e.g. map purchase events to GA4’s purchase event with transaction_id, value, currency, items[])
  • Defaults — Fallback values applied when source data is missing

Each field mapping specifies a source (where the value comes from in the canonical event) and a mode (direct mapping, static value, expression, or computed).
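To make the source/mode idea concrete, here is a minimal sketch of applying a list of field mappings to one canonical event. The mapping object shape (`target`, `source`, `mode`, `value`, `expression`), the dotted-path source syntax, and the use of a JavaScript function for the expression mode are assumptions for illustration; the builder's real storage format and expression language may differ, and the computed mode is omitted.

```javascript
// Read a dotted path such as "properties.order_id" from a nested object.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Build a vendor payload from an event, a list of mappings, and fallback defaults.
function applyMappings(event, mappings, defaults = {}) {
  const out = {};
  for (const m of mappings) {
    let value;
    if (m.mode === "direct") value = getPath(event, m.source);
    else if (m.mode === "static") value = m.value;
    else if (m.mode === "expression") value = m.expression(event);
    else throw new Error(`Unsupported mode: ${m.mode}`);
    // Fall back to the default for this target when source data is missing.
    if (value === undefined || value === null) value = defaults[m.target];
    if (value !== undefined) out[m.target] = value;
  }
  return out;
}

// Hypothetical canonical event and mappings for a purchase.
const purchaseEvent = { properties: { order_id: "T-1", price: 25, quantity: 2 } };
const purchaseMappings = [
  { target: "transaction_id", mode: "direct", source: "properties.order_id" },
  { target: "currency", mode: "direct", source: "properties.currency" },
  { target: "event_name", mode: "static", value: "purchase" },
  {
    target: "value",
    mode: "expression",
    expression: (e) => e.properties.price * e.properties.quantity,
  },
];
const payload = applyMappings(purchaseEvent, purchaseMappings, { currency: "USD" });
```

Here `currency` is absent from the event, so the payload picks it up from the defaults, mirroring the Defaults entry in the list above.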

Event Processing

Every event passes through a consistent set of governance and transformation steps before delivery: consent enforcement, bot filtering, PII governance, identity stitching, per-vendor transformation, and deduplication. The exact list of controls and how each is configured are covered in Processing.
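Conceptually, this behaves like a chain of stages in which any stage can drop the event. The sketch below shows that pattern with toy versions of three of the steps (consent enforcement, bot filtering, deduplication); the rules, the event fields (`id`, `consent`, `userAgent`), and the `runStages` helper are invented for illustration and do not describe how Signal implements them.

```javascript
// Run an event through stages in order; a stage returns null to drop the event.
function runStages(event, stages) {
  let current = event;
  for (const stage of stages) {
    current = stage(current);
    if (current === null) return null;
  }
  return current;
}

const seenIds = new Set();
const stages = [
  // Consent enforcement (toy rule): drop events without analytics consent.
  (e) => (e.consent && e.consent.analytics ? e : null),
  // Bot filtering (toy rule): drop user agents that contain "bot".
  (e) => (/bot/i.test(e.userAgent || "") ? null : e),
  // Deduplication: drop an event id that has already been processed.
  (e) => {
    if (seenIds.has(e.id)) return null;
    seenIds.add(e.id);
    return e;
  },
];

const sampleEvent = {
  id: "evt-1",
  consent: { analytics: true },
  userAgent: "Mozilla/5.0",
};
const firstPass = runStages(sampleEvent, stages); // delivered
const secondPass = runStages(sampleEvent, stages); // dropped as a duplicate
```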

The Datafly.js Collector

Datafly.js is a lightweight, modular JavaScript SDK (around 5.2 KB gzipped) that you add to your website. It replaces all vendor-specific tags and sends events to your Datafly Signal endpoint.

It provides four core methods:

Method      Purpose                           Example
page()      Track a page view                 Called on every page load
track()     Track a custom event              datafly.track('Purchase', { value: 99.99 })
identify()  Set the user’s identity           datafly.identify('user-123', { email: '...' })
group()     Associate the user with a group   datafly.group('company-456', { name: '...' })

The collector also automatically captures:

  • Ad click IDs from URL parameters (gclid, fbclid, ttclid, etc.)
  • Consent state from your consent management platform
  • Page context (URL, title, referrer)
  • An anonymous identifier used internally for identity stitching
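For a sense of what click-ID capture involves, the following sketch reads the three parameters named above from a page URL using the standard `URL` API. It is not the collector's source code: the `extractClickIds` function and the `CLICK_ID_PARAMS` list are assumptions for this example, and the real collector recognises more parameters than the three listed here.

```javascript
// Only the three IDs named in the text; the real list is longer ("etc.").
const CLICK_ID_PARAMS = ["gclid", "fbclid", "ttclid"];

// Return an object containing whichever known click IDs appear in the URL.
function extractClickIds(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  const found = {};
  for (const name of CLICK_ID_PARAMS) {
    const value = params.get(name);
    if (value) found[name] = value; // skip absent or empty parameters
  }
  return found;
}

const clickIds = extractClickIds(
  "https://example.com/landing?gclid=abc123&utm_source=newsletter"
);
```

In a browser, the same function could be called with `window.location.href`.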

Team & Roles

Datafly Signal uses role-based access control to manage who can do what:

Role                    What they can do
Org Admin               Full access: manage team, sources, pipelines, integrations, settings
Source Admin            Manage sources, integrations, pipelines, and brands
Source Editor           Create and edit sources, integrations, and transformations
Source Viewer           Read-only access to all resources and the real-time debugger
Data Governance Admin   Manage data layer, transformations, and consent settings

Team management is available under Settings > RBAC in the Management UI.

Next Steps

Now that you understand the building blocks, head to Your First Pipeline to set everything up.