Core Concepts
Before you start configuring the platform, it helps to understand the key building blocks and how they fit together.
Pipelines
A pipeline is the central concept in Datafly Signal. It represents one data collection endpoint — typically one website or app — and controls what happens to the events collected from it.
Each pipeline has:
- A pipeline key (`dk_...`) — a unique identifier used by the JS collector to authenticate events
- One or more integrations — vendor destinations where events will be delivered
- Parameters — shared configuration values (like a GA4 Measurement ID) that are injected into integration configs at processing time
Think of a pipeline as the answer to: “For this website, which vendors should receive events, and how should those events be transformed?”
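As a rough illustration of how these pieces relate, the sketch below models a pipeline as a plain object and shows one way shared parameters could be injected into an integration config. The field names, the `{{name}}` placeholder syntax, the `dk_example` key, and the `injectParameters` helper are all assumptions made for illustration; they are not Signal's actual schema or API.

```javascript
// Hypothetical shape of a pipeline; every field name here is illustrative only.
const pipeline = {
  key: "dk_example", // placeholder, not a real pipeline key
  parameters: { ga4_measurement_id: "G-XXXXXXX" },
  integrations: [
    { vendor: "ga4", config: { measurement_id: "{{ga4_measurement_id}}" } },
  ],
};

// Replace {{name}} placeholders in string config values with pipeline parameters.
// Unknown placeholders are left untouched so a missing parameter is easy to spot.
function injectParameters(config, parameters) {
  const resolved = {};
  for (const [field, value] of Object.entries(config)) {
    resolved[field] =
      typeof value === "string"
        ? value.replace(/\{\{(\w+)\}\}/g, (match, name) =>
            Object.prototype.hasOwnProperty.call(parameters, name)
              ? String(parameters[name])
              : match
          )
        : value;
  }
  return resolved;
}

const resolvedConfig = injectParameters(
  pipeline.integrations[0].config,
  pipeline.parameters
);
console.log(resolvedConfig.measurement_id); // G-XXXXXXX
```

Keeping shared values in one place means a change to the Measurement ID would only need to be made once per pipeline, which is the motivation the parameters concept suggests.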
When to Create Multiple Pipelines
In most cases, you’ll create one pipeline per website or app. You might create separate pipelines when:
- You have distinct websites with different vendor requirements
- You want completely separate configuration for staging vs production
- You manage multiple brands with different data destinations
Integrations
An integration connects your pipeline to a specific vendor destination. When you add an integration, you’re telling Signal: “Send events from this pipeline to this vendor’s API.”
Each integration includes:
- Vendor type — which vendor API to deliver to (GA4, Meta CAPI, TikTok, BigQuery, etc.)
- Credentials — API keys, access tokens, or measurement IDs required by the vendor
- Field mappings — how to transform your canonical events into the vendor’s expected format
Datafly Signal ships with 150+ pre-built integrations in the Integration Library, covering advertising platforms, analytics tools, CDPs, data warehouses, and more.
Templates and Revisions
Integrations use a template and revision model:
- A template is the integration itself (e.g. “Our GA4 Integration”)
- A revision is a versioned snapshot of its configuration
- You can create new revisions to update mappings without affecting live traffic until you’re ready to publish
This gives you a safe way to iterate on configuration changes.
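The template and revision relationship can be pictured with a small model like the one below. It is only a sketch: the object shape and the `createTemplate`, `addRevision`, `publish`, and `liveConfig` helpers are hypothetical names, not part of the Management UI or any Signal API.

```javascript
// Hypothetical model: a template holds numbered revisions and records which one is published.
function createTemplate(name, initialConfig) {
  return {
    name,
    revisions: [{ number: 1, config: initialConfig }],
    publishedRevision: 1,
  };
}

// Add a new revision as a draft; the published revision is left unchanged.
function addRevision(template, config) {
  const number = template.revisions.length + 1;
  return { ...template, revisions: [...template.revisions, { number, config }] };
}

// Publishing switches live traffic to the chosen revision.
function publish(template, number) {
  if (!template.revisions.some((r) => r.number === number)) {
    throw new Error(`Unknown revision ${number}`);
  }
  return { ...template, publishedRevision: number };
}

// The configuration that live traffic would use.
function liveConfig(template) {
  return template.revisions.find((r) => r.number === template.publishedRevision).config;
}

const original = createTemplate("Our GA4 Integration", { currencyField: "currency" });
const withDraft = addRevision(original, { currencyField: "order.currency" });
console.log(liveConfig(withDraft).currencyField); // currency

const published = publish(withDraft, 2);
console.log(liveConfig(published).currencyField); // order.currency
```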
Blueprints
A blueprint is a pre-built configuration for a specific vendor and industry vertical. Instead of mapping every field from scratch, you can select a blueprint when installing an integration to get a working configuration immediately.
For example, the GA4 Retail Blueprint pre-maps common e-commerce events (purchase, add_to_cart, view_item, etc.) to GA4’s Measurement Protocol format, including all the standard parameters GA4 expects.
Available blueprint verticals include Retail, Travel, and Media. Blueprints are fully editable after install — they’re a starting point, not a constraint.
The V2 Schema-Mapping Builder
When you configure an integration (either from a blueprint or from scratch), you use the V2 schema-mapping builder in the Management UI. This is where you define:
- Parameters — Connection credentials and shared values (e.g. `measurement_id`, `api_secret`)
- Global mappings — Fields that apply to every event sent to this vendor (e.g. `client_id`, `user_id`)
- Event mappings — Per-event-type field mappings (e.g. map `purchase` events to GA4’s `purchase` event with `transaction_id`, `value`, `currency`, `items[]`)
- Defaults — Fallback values applied when source data is missing
Each field mapping specifies a source (where the value comes from in the canonical event) and a mode (direct mapping, static value, expression, or computed).
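To make the source-plus-mode idea concrete, here is a minimal sketch of how a list of field mappings and defaults might be applied to a canonical event. It covers only the direct and static modes, because the semantics of expression and computed mappings are not described on this page. The mapping object shape (`target`, `mode`, `source`, `value`), the dotted-path source syntax, and the sample event are assumptions, not the builder's real format.

```javascript
// Read a dotted path such as "properties.order_id" from a nested object.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Apply field mappings to a canonical event. Only two modes are sketched:
//   direct: copy the value found at `source`
//   static: use the literal `value`
// Defaults then fill in any target field that is still undefined.
function applyMappings(event, mappings, defaults = {}) {
  const output = {};
  for (const { target, mode, source, value } of mappings) {
    if (mode === "direct") output[target] = getPath(event, source);
    else if (mode === "static") output[target] = value;
    else throw new Error(`Mode not covered by this sketch: ${mode}`);
  }
  for (const [target, fallback] of Object.entries(defaults)) {
    if (output[target] === undefined) output[target] = fallback;
  }
  return output;
}

// Hypothetical canonical purchase event and mapping definition.
const canonicalEvent = {
  event: "purchase",
  properties: { order_id: "T-1001", total: 99.99 },
};
const purchaseMappings = [
  { target: "transaction_id", mode: "direct", source: "properties.order_id" },
  { target: "value", mode: "direct", source: "properties.total" },
  { target: "currency", mode: "direct", source: "properties.currency" },
  { target: "source_system", mode: "static", value: "web" },
];

// The event has no currency, so the default is applied.
const mapped = applyMappings(canonicalEvent, purchaseMappings, { currency: "USD" });
console.log(mapped);
```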
Event Processing
Every event passes through a consistent set of governance and transformation steps before delivery: consent enforcement, bot filtering, PII governance, identity stitching, per-vendor transformation, and deduplication. The exact list of controls and how each is configured is covered in Processing.
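One way to picture a fixed sequence of steps is as an ordered list of stage functions, where any stage may drop the event. The sketch below shows that general pattern with two simplified stand-in stages. The stage logic, the `consent` and `id` field names, and the ordering are assumptions for illustration only and do not describe how Signal implements these controls.

```javascript
// Run an event through ordered stages. A stage returns the (possibly modified)
// event, or null to drop it, in which case later stages are skipped.
function runStages(event, stages) {
  let current = event;
  for (const stage of stages) {
    current = stage(current);
    if (current === null) return null;
  }
  return current;
}

// Illustrative stage: drop events whose consent flag is not "granted".
const enforceConsent = (event) => (event.consent === "granted" ? event : null);

// Illustrative stage factory: drop events whose id has already been seen.
function makeDeduplicator() {
  const seen = new Set();
  return (event) => {
    if (seen.has(event.id)) return null;
    seen.add(event.id);
    return event;
  };
}

const stages = [enforceConsent, makeDeduplicator()];
const first = runStages({ id: "e1", consent: "granted" }, stages);
const duplicate = runStages({ id: "e1", consent: "granted" }, stages);
const noConsent = runStages({ id: "e2", consent: "denied" }, stages);

console.log(first !== null, duplicate, noConsent); // true null null
```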
The Datafly.js Collector
Datafly.js is a lightweight, modular JavaScript SDK (around 5.2 KB gzipped) that you add to your website. It replaces all vendor-specific tags and sends events to your Datafly Signal endpoint.
It provides four core methods:
| Method | Purpose | Example |
|---|---|---|
| `page()` | Track a page view | Called on every page load |
| `track()` | Track a custom event | `datafly.track('Purchase', { value: 99.99 })` |
| `identify()` | Set the user’s identity | `datafly.identify('user-123', { email: '...' })` |
| `group()` | Associate the user with a group | `datafly.group('company-456', { name: '...' })` |
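The four methods might be used together as in the sketch below. So that the snippet runs outside a browser, `datafly` is replaced with a stand-in object that records calls instead of sending them. The argument names (`userId`, `traits`, and so on) and the sample email and company name are assumptions, and on a real page the object would come from the installed collector.

```javascript
// Stand-in for the real collector so the example runs anywhere.
// It records calls instead of sending them to a Signal endpoint.
const calls = [];
const datafly = {
  page: () => calls.push({ method: "page" }),
  track: (event, properties) => calls.push({ method: "track", event, properties }),
  identify: (userId, traits) => calls.push({ method: "identify", userId, traits }),
  group: (groupId, traits) => calls.push({ method: "group", groupId, traits }),
};

// A typical sequence: page view, then identity, then a business event.
datafly.page();
datafly.identify("user-123", { email: "jane@example.com" }); // made-up email
datafly.group("company-456", { name: "Example Ltd" });       // made-up company
datafly.track("Purchase", { value: 99.99 });

console.log(calls.map((c) => c.method).join(", ")); // page, identify, group, track
```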
The collector also automatically captures:
- Ad click IDs from URL parameters (`gclid`, `fbclid`, `ttclid`, etc.)
- Consent state from your consent management platform
- Page context (URL, title, referrer)
- An anonymous identifier used internally for identity stitching
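Capturing ad click IDs from the URL can be approximated with the standard `URL` API, as in this minimal sketch. The `extractClickIds` helper is a hypothetical name, the parameter list contains only the three names mentioned above, and the collector's real logic may differ.

```javascript
// Parameter names taken from the list above; the collector's real list may be longer.
const CLICK_ID_PARAMS = ["gclid", "fbclid", "ttclid"];

// Return the click IDs present in a page URL as a plain object.
function extractClickIds(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  const found = {};
  for (const name of CLICK_ID_PARAMS) {
    const value = params.get(name);
    if (value) found[name] = value; // skip absent or empty parameters
  }
  return found;
}

const clickIds = extractClickIds(
  "https://shop.example.com/landing?gclid=abc123&utm_source=google"
);
console.log(clickIds); // { gclid: 'abc123' }
```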
Team & Roles
Datafly Signal uses role-based access control to manage who can do what:
| Role | What they can do |
|---|---|
| Org Admin | Full access — manage team, sources, pipelines, integrations, settings |
| Source Admin | Manage sources, integrations, pipelines, and brands |
| Source Editor | Create and edit sources, integrations, and transformations |
| Source Viewer | Read-only access to all resources and the real-time debugger |
| Data Governance Admin | Manage data layer, transformations, and consent settings |
Team management is available under Settings > RBAC in the Management UI.
Next Steps
Now that you understand the building blocks, head to Your First Pipeline to set everything up.