Device Recognition

Device recognition is a server-side mechanism for re-identifying returning visitors when traditional identifiers (cookies, localStorage) have been cleared or are unavailable. It uses a combination of HTTP request signals and optional browser attributes to generate a probabilistic device hash — without relying on browser fingerprinting libraries or third-party scripts.

Device recognition is not browser fingerprinting. In its default configuration it does not use canvas rendering, font enumeration, or other intrusive techniques. It operates on signals that the browser already sends with every HTTP request, plus a small number of attributes collected by Datafly.js when enabled. The WebGL and canvas signals are available only as explicit opt-ins and are disabled by default.

The Problem

The _dfid cookie provides persistent first-party identity, but it can be lost:

  • The visitor clears their browser cookies or browsing data
  • The visitor uses private/incognito browsing
  • Safari ITP clears storage in edge cases
  • The visitor switches to a different browser profile

When the cookie is gone, the Ingestion Gateway generates a new _dfid, and all previously collected vendor IDs, click IDs, and identity associations are orphaned in Redis under the old anonymous ID. The visitor appears as a brand new user.

Device recognition provides a probabilistic recovery path: if the incoming request’s device signals match a previously seen device, Signal can reconnect the visitor to their existing identity record.

How It Works

Signal Collection

When a request arrives at the Ingestion Gateway, the following signals are collected:

Server-side signals (always available):
  → IP address
  → User-Agent header
  → Accept-Language header

Client-side signals (sent by Datafly.js when enabled):
  → Screen resolution (e.g. "1920x1080")
  → Timezone (e.g. "Europe/London")
  → Locale (e.g. "en-GB")

Optional signals (opt-in only):
  → WebGL renderer string
  → Canvas hash

Hash Generation

The collected signals are concatenated and hashed using SHA-256 to produce the device hash:

Input:   IP + User-Agent + Accept-Language + screen + timezone + locale
Hash:    SHA-256(concatenated input)
Result:  "a3f8c2e1b9d7..." (64-character hex string)
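
As a minimal sketch of this step, the Python below joins the six signals and hashes them with SHA-256. The function name, the field order, the `|` delimiter, and the treatment of missing signals are all assumptions made for illustration; the description above states only that the signals are concatenated, not how the gateway orders, separates, or normalises them.

```python
import hashlib

# Hypothetical field order; the gateway's real ordering is not documented here.
SIGNAL_ORDER = ("ip", "user_agent", "accept_language", "screen", "timezone", "locale")


def device_hash(signals: dict) -> str:
    """Return a 64-character hex SHA-256 digest of the concatenated signals."""
    # Assumption: a missing signal contributes an empty string.
    parts = [str(signals.get(name, "")) for name in SIGNAL_ORDER]
    # Assumption: a delimiter is used so that "ab" + "c" and "a" + "bc" differ.
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


example = {
    "ip": "203.0.113.42",
    "user_agent": "Mozilla/5.0 ...",
    "accept_language": "en-GB,en;q=0.9",
    "screen": "1920x1080",
    "timezone": "Europe/London",
    "locale": "en-GB",
}
print(device_hash(example))
```

Because every signal feeds one digest, a change to any single input (for example a new User-Agent after a browser update) yields a completely different hash.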

The hash is stored in Redis as a key whose value is the visitor’s current _dfid (the anonymous ID), with an expiry equal to the configured TTL:

SET device:{hash} {anonymous_id} EX {ttl_seconds}

Identity Recovery

On subsequent requests where no _dfid cookie is present, the gateway computes the device hash from the incoming signals and checks Redis:

1. New request arrives -- no _dfid cookie
2. Compute device hash from request signals
3. GET device:{hash} from Redis
4. If found → existing anonymous_id recovered
5. Set _dfid cookie to the recovered anonymous_id
6. All previous identity associations are restored
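
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the gateway's implementation: `DeviceStore` is a hypothetical in-memory stand-in for the Redis `SET ... EX` and `GET` calls, `resolve_anonymous_id` is an invented name, and the confidence check described later is left out for brevity.

```python
import time
import uuid
from typing import Optional, Tuple


class DeviceStore:
    """In-memory stand-in for Redis SET ... EX / GET (illustration only)."""

    def __init__(self) -> None:
        self._data = {}  # key -> (anonymous_id, expiry timestamp)

    def set(self, device_hash: str, anonymous_id: str, ttl_seconds: int,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._data["device:" + device_hash] = (anonymous_id, now + ttl_seconds)

    def get(self, device_hash: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._data.get("device:" + device_hash)
        if entry is None or entry[1] <= now:
            return None  # never stored, or expired
        return entry[0]


def resolve_anonymous_id(cookie_dfid: Optional[str], device_hash: str,
                         store: DeviceStore, ttl_seconds: int) -> Tuple[str, bool]:
    """Return (anonymous_id, recovered) for one incoming request."""
    if cookie_dfid:
        # Cookie present: record the hash against the current _dfid.
        store.set(device_hash, cookie_dfid, ttl_seconds)
        return cookie_dfid, False
    recovered = store.get(device_hash)
    if recovered is not None:
        # Steps 4-5: reuse the existing ID; the caller re-sets the cookie.
        return recovered, True
    # No match: fall back to a brand new anonymous ID.
    new_id = str(uuid.uuid4())
    store.set(device_hash, new_id, ttl_seconds)
    return new_id, False
```

A caller would set the `_dfid` cookie to the returned ID in every case; the boolean only records whether the ID came from a device match.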

Confidence Scoring

Not all signal combinations are equally reliable. Device recognition assigns a confidence score based on which signals matched and how unique the combination is:

| Confidence | Score  | Criteria |
|------------|--------|----------|
| High       | 90%+   | All core signals + at least 2 optional signals match; unique IP (not shared network) |
| Medium     | 70–89% | All core signals match; IP may be shared (e.g. corporate network) |
| Low        | 50–69% | Core signals match but IP is a known VPN/proxy or shared NAT |
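
To make the bands concrete, here is a small sketch that maps a numeric score to a level name. It assumes scores are expressed as fractions between 0.0 and 1.0 (so 90% is 0.90), and that a score under 0.50 has no level; neither the function name nor the below-0.50 behaviour comes from the product itself.

```python
from typing import Optional


def confidence_level(score: float) -> Optional[str]:
    """Map a 0.0-1.0 confidence score to the level names in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.90:
        return "high"
    if score >= 0.70:
        return "medium"
    if score >= 0.50:
        return "low"
    # Assumption: scores under the lowest documented band carry no level.
    return None
```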

The confidence score is returned in the event payload so that downstream consumers can decide how to treat the match:

{
  "anonymous_id": "f47ac10b-58cc-4372-a567-0d02b2c3d479",
  "device_recognition": {
    "matched": true,
    "confidence": 0.92,
    "level": "high",
    "signals_matched": ["ip", "user_agent", "accept_language", "screen", "timezone", "locale"]
  }
}

⚠️ Device recognition is probabilistic, not deterministic. A high-confidence match is strong evidence that the visitor is the same person, but it is not a guarantee. Configure your confidence threshold based on your use case: paywall enforcement may require high confidence, while analytics enrichment can tolerate medium.
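
One way a downstream consumer might apply this guidance is sketched below. The `MIN_CONFIDENCE` table and the `accept_match` function are hypothetical; the numeric minimums simply reuse the lower bounds of the high and medium bands and should be tuned for your own use case.

```python
# Hypothetical per-use-case minimums, following the guidance above.
MIN_CONFIDENCE = {"paywall": 0.90, "analytics": 0.70}


def accept_match(payload: dict, use_case: str) -> bool:
    """Return True if the event's device match is strong enough for use_case."""
    recognition = payload.get("device_recognition") or {}
    if not recognition.get("matched", False):
        return False
    score = float(recognition.get("confidence", 0.0))
    return score >= MIN_CONFIDENCE[use_case]


event = {
    "anonymous_id": "f47ac10b-58cc-4372-a567-0d02b2c3d479",
    "device_recognition": {"matched": True, "confidence": 0.92, "level": "high"},
}
print(accept_match(event, "paywall"), accept_match(event, "analytics"))
```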

Configuration

Device recognition is configured per organisation in the Management UI or via the Management API:

{
  "device_recognition": {
    "enabled": true,
    "ttl_days": 30,
    "confidence_threshold": 0.7,
    "optional_signals": {
      "webgl": false,
      "canvas": false
    }
  }
}

| Field                   | Type    | Default | Description |
|-------------------------|---------|---------|-------------|
| enabled                 | boolean | false   | Enable device recognition |
| ttl_days                | number  | 30      | How long device hashes are stored in Redis (7 to 90 days) |
| confidence_threshold    | number  | 0.7     | Minimum confidence score to accept a match (0.5 to 1.0) |
| optional_signals.webgl  | boolean | false   | Include WebGL renderer string in the hash |
| optional_signals.canvas | boolean | false   | Include canvas hash in the hash |

Optional signals (WebGL, canvas) increase accuracy but require Datafly.js to perform a small amount of additional work on page load. They are disabled by default and should only be enabled when higher accuracy is needed and the privacy implications have been reviewed.
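
If you generate or check this configuration in your own tooling, a validation step along the following lines may help. It is a sketch only: the class and function names are invented, and it enforces just the defaults and ranges documented above, not whatever additional checks the Management API performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceRecognitionConfig:
    enabled: bool = False
    ttl_days: int = 30
    confidence_threshold: float = 0.7
    webgl: bool = False
    canvas: bool = False

    def __post_init__(self) -> None:
        # Ranges taken from the field descriptions above.
        if not 7 <= self.ttl_days <= 90:
            raise ValueError("ttl_days must be between 7 and 90")
        if not 0.5 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.5 and 1.0")

    @property
    def ttl_seconds(self) -> int:
        """TTL in seconds, the unit used by the Redis EX argument."""
        return self.ttl_days * 24 * 60 * 60


def parse_config(doc: dict) -> DeviceRecognitionConfig:
    """Build a validated config from the JSON document shape shown above."""
    section = doc.get("device_recognition", {})
    optional = section.get("optional_signals", {})
    return DeviceRecognitionConfig(
        enabled=section.get("enabled", False),
        ttl_days=section.get("ttl_days", 30),
        confidence_threshold=section.get("confidence_threshold", 0.7),
        webgl=optional.get("webgl", False),
        canvas=optional.get("canvas", False),
    )
```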

Publisher Paywall Integration

For publishers using device recognition to enforce metered paywalls (detecting returning visitors who have cleared cookies to reset their article count), Signal exposes a dedicated API endpoint:

POST /v1/fingerprint/check
Content-Type: application/json
Authorization: Bearer {api_key}

{
  "signals": {
    "ip": "203.0.113.42",
    "user_agent": "Mozilla/5.0 ...",
    "accept_language": "en-GB,en;q=0.9",
    "screen": "1920x1080",
    "timezone": "Europe/London",
    "locale": "en-GB"
  }
}

Response:

{
  "matched": true,
  "anonymous_id": "f47ac10b-58cc-4372-a567-0d02b2c3d479",
  "confidence": 0.92,
  "level": "high",
  "first_seen": "2026-02-01T10:30:00Z",
  "visit_count": 47
}

This allows the publisher’s paywall logic to query Signal directly and determine whether the visitor has been seen before, regardless of cookie state.
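
A server-side caller might wrap the endpoint roughly as follows. This is a sketch using only the Python standard library; `BASE_URL` is a placeholder for your own Signal deployment, and the helper names and the 0.9 default threshold are assumptions rather than part of the API.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the host of your own Signal deployment.
BASE_URL = "https://signal.example.com"


def build_check_request(api_key: str, signals: dict) -> urllib.request.Request:
    """Build the POST /v1/fingerprint/check request without sending it."""
    body = json.dumps({"signals": signals}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/fingerprint/check",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def is_returning_visitor(response: dict, min_confidence: float = 0.9) -> bool:
    """Decide whether the paywall should treat the visitor as already seen."""
    if not response.get("matched", False):
        return False
    return float(response.get("confidence", 0.0)) >= min_confidence


# Sending the request (not executed here):
#     req = build_check_request(api_key, signals)
#     with urllib.request.urlopen(req, timeout=5) as resp:
#         result = json.load(resp)
#     metered = is_returning_visitor(result)
```

Keeping request construction separate from the decision makes the threshold easy to test without network access.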

Privacy Considerations

Device recognition is designed with privacy in mind:

  • Opt-in only: disabled by default; must be explicitly enabled per organisation
  • Hashed signals: raw signals are never stored; only the SHA-256 hash is persisted in Redis
  • Configurable TTL: device hashes expire after the configured TTL (default 30 days, max 90 days)
  • No third-party sharing: device hashes are used only within the customer’s own Signal deployment
  • GDPR considerations: device hashes may constitute personal data under GDPR. Customers should include device recognition in their privacy policy and cookie consent mechanism. The hash can be deleted via the Management API when handling a data subject request, such as an erasure request

⚠️ Consult your data protection officer or legal team before enabling device recognition. While the signals used are less intrusive than traditional browser fingerprinting, the resulting hash may be considered personal data under GDPR, ePrivacy, or equivalent regulations in your jurisdiction.

Limitations

Device recognition has inherent limitations that affect its accuracy:

| Limitation                | Impact |
|---------------------------|--------|
| VPN/proxy users           | IP address changes, reducing the hash to non-IP signals only. Confidence drops to low. |
| Shared networks           | Multiple users behind the same NAT (corporate offices, university campuses) may produce identical hashes. Confidence scoring accounts for this. |
| Browser updates           | User-Agent string changes on browser updates, invalidating the hash. The device will appear as a new visitor until the hash is re-established. |
| Anti-fingerprinting       | Browsers like Brave and Firefox with strict privacy settings may randomise or omit signals, reducing match accuracy. |
| Mobile carriers           | Mobile networks frequently rotate IP addresses, making IP an unreliable signal for mobile visitors. |
| Screen resolution changes | External monitor connections or display scaling changes alter the screen resolution signal. |

Despite these limitations, device recognition provides meaningful identity recovery for the majority of desktop browser sessions where cookies have been cleared — the most common identity loss scenario.