Google Cloud Spanner

Datafly Signal writes first-party events into a Spanner table — globally distributed, strongly consistent, horizontally scalable relational storage.

Prerequisites

Before configuring Google Cloud Spanner in Signal, you need a GCP project with a Spanner instance, a database with a target table, and a service account.

Create a GCP Account and Project

  1. Sign up at cloud.google.com if you don’t already have an account.
  2. Create a new project or select an existing one in the GCP Console.
  3. Note the Project ID.

Enable the Cloud Spanner API

  1. Go to APIs & Services > Library.
  2. Search for Cloud Spanner API.
  3. Click Enable.

Create a Spanner Instance

  1. Go to the Spanner console.
  2. Click Create instance.
  3. Enter an Instance name (e.g. datafly-events) and Instance ID.
  4. Choose a Configuration:
    • Regional — data in a single region (lower latency, lower cost).
    • Multi-region — data replicated across regions (higher availability).
  5. Set the Compute capacity (processing units or nodes). 1 node = 1000 processing units.
  6. Click Create.

For development and testing, you can use the free trial instance (1 node, limited to specific regions). For production, size the instance based on your expected write throughput.

Create a Database

  1. In the Spanner console, click on your instance.
  2. Click Create database.
  3. Enter a Database name (e.g. events_db).
  4. Click Create.

Create a Table

In the Spanner console, open the database and run the following DDL:

CREATE TABLE Events (
  event_id STRING(64) NOT NULL,
  type STRING(20),
  event STRING(256),
  anonymous_id STRING(64),
  user_id STRING(256),
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  context JSON,
  properties JSON,
  traits JSON,
  source_id STRING(64),
  integration_id STRING(64),
) PRIMARY KEY (event_id);

Spanner uses the primary key to distribute data across splits. Using event_id (a random UUID) as the primary key spreads writes evenly across those splits. Avoid monotonically increasing keys such as timestamps as primary keys, because they concentrate new writes on a single split and cause hotspots.
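
To illustrate the difference between random and sequential keys, here is a minimal Python sketch. It assumes event_id is a random (version 4) UUID; how Signal itself generates the ID is not shown in this document and may differ. The hash_prefixed_key helper and its 4-character prefix are hypothetical, shown only for tables that must keep a sequential key.

```python
import hashlib
import uuid


def new_event_id() -> str:
    """Return a random (version 4) UUID string; 36 characters, fits STRING(64)."""
    return str(uuid.uuid4())


def hash_prefixed_key(sequential_key: str, prefix_len: int = 4) -> str:
    """Prefix a sequential key with a short hash of itself (hypothetical helper).

    Adjacent inputs such as "1001" and "1002" get unrelated prefixes,
    so they no longer sort next to each other.
    """
    digest = hashlib.sha256(sequential_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{sequential_key}"


if __name__ == "__main__":
    print(new_event_id())
    print(hash_prefixed_key("1001"))
    print(hash_prefixed_key("1002"))
```

The hash prefix is deterministic, so a reader that knows the original key can recompute the full primary key for point lookups.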

Create a Service Account

  1. Go to IAM & Admin > Service Accounts > Create Service Account.
  2. Enter a name (e.g. datafly-signal-spanner).
  3. Grant the Cloud Spanner Database User role (roles/spanner.databaseUser).
  4. Click Done.

Generate a Service Account Key

  1. Click on the service account.
  2. Go to Keys > Add Key > Create new key > JSON.
  3. The key file will download. Store it securely.

Warning: Store the JSON key file securely. Do not commit it to version control.

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| project_id | string | Yes | The Google Cloud project ID that contains the Spanner instance. |
| instance_id | string | Yes | The Spanner instance ID. |
| database_id | string | Yes | The Spanner database ID. |
| table | string | Yes | The target table name. Also accepts table_name. |
| service_account_json | secret | Yes | The full JSON key file content for a service account with roles/spanner.databaseUser. |

You can alternatively supply a single fully-qualified database field in the form projects/<pid>/instances/<iid>/databases/<did> instead of the three split values.
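
As a rough illustration of how the fully-qualified form relates to the three split values, the following Python sketch builds and parses the path. The function names are hypothetical and not part of Signal; the validation is deliberately loose and only checks the overall shape.

```python
import re

# Matches projects/<pid>/instances/<iid>/databases/<did>
_DB_PATH = re.compile(
    r"^projects/(?P<project_id>[^/]+)"
    r"/instances/(?P<instance_id>[^/]+)"
    r"/databases/(?P<database_id>[^/]+)$"
)


def build_database_path(project_id: str, instance_id: str, database_id: str) -> str:
    """Join the three split values into the fully-qualified form."""
    return f"projects/{project_id}/instances/{instance_id}/databases/{database_id}"


def parse_database_path(path: str) -> dict:
    """Split a fully-qualified path back into its three IDs."""
    match = _DB_PATH.match(path)
    if match is None:
        raise ValueError(f"not a fully-qualified database path: {path!r}")
    return match.groupdict()


if __name__ == "__main__":
    path = build_database_path("datafly-analytics", "datafly-events", "events_db")
    print(path)
    print(parse_database_path(path))
```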

Signal Setup

Quick Setup

  1. Navigate to Integrations in the sidebar.
  2. Open the Integration Library tab.
  3. Find Google Cloud Spanner or filter by Database.
  4. Click Install, select a variant if available, and fill in the required fields.
  5. Click Install Integration to create the integration with a ready-to-use default blueprint.

API Setup

curl -X POST http://localhost:8084/v1/admin/integration-catalog/google_spanner/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Google Cloud Spanner",
    "variant": "default",
    "config": {
      "project_id": "datafly-analytics",
      "instance_id": "datafly-events",
      "database_id": "events_db",
      "table": "Events",
      "service_account_json": "{\"type\": \"service_account\", ...}"
    },
    "delivery_mode": "server_side"
  }'
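
The service_account_json value is a JSON document embedded as a string, which is easy to get wrong when escaping by hand. The Python sketch below builds the same request as the curl command and lets json.dumps handle the escaping. It reuses the example values from the command above; build_install_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
INSTALL_URL = "http://localhost:8084/v1/admin/integration-catalog/google_spanner/install"


def build_install_request(token: str, key_json_text: str) -> urllib.request.Request:
    """Build the install request; key_json_text is the raw key file content."""
    json.loads(key_json_text)  # fail early if the key file is not valid JSON
    body = {
        "name": "Google Cloud Spanner",
        "variant": "default",
        "config": {
            "project_id": "datafly-analytics",
            "instance_id": "datafly-events",
            "database_id": "events_db",
            "table": "Events",
            # Passed as a string; json.dumps escapes the inner quotes.
            "service_account_json": key_json_text,
        },
        "delivery_mode": "server_side",
    }
    return urllib.request.Request(
        INSTALL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_install_request("YOUR_TOKEN", '{"type": "service_account"}')
    print(request.data.decode("utf-8"))
```

To send the request, pass the returned object to urllib.request.urlopen. Reading the key file from disk at call time avoids pasting the secret into source code.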

Schema

Signal writes the standard event envelope. The recommended table definition:

| Column | Spanner type | Notes |
| --- | --- | --- |
| event_id | STRING(64) NOT NULL | Primary key. UUID gives even split distribution. |
| type | STRING(20) | Event type. |
| event | STRING(256) | Event name. |
| anonymous_id | STRING(64) | First-party visitor identifier. |
| user_id | STRING(256) | Logged-in user identifier (nullable). |
| timestamp | TIMESTAMP | Client event time. |
| received_at | TIMESTAMP | Time Signal received the event. |
| context | JSON | Page, device, user agent, consent metadata. |
| properties | JSON | Custom event properties. |
| traits | JSON | User traits. |
| source_id | STRING(64) | Pipeline source identifier. |
| integration_id | STRING(64) | Signal integration identifier. |

Query JSON columns with JSON_VALUE() and JSON_QUERY().
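
A sketch of what such a query might look like, wrapped in Python so it can be run with the google-cloud-spanner client. The JSON paths '$.page.url' and '$.page' are assumptions about the shape of the context column and may not match your events; fetch_recent is a hypothetical helper that expects a database object from that client.

```python
# JSON_VALUE returns a scalar as STRING; JSON_QUERY returns a JSON sub-document.
# The '$.page.url' and '$.page' paths are illustrative assumptions.
RECENT_PAGES_SQL = """
SELECT
  event_id,
  event,
  JSON_VALUE(context, '$.page.url') AS page_url,
  JSON_QUERY(context, '$.page') AS page
FROM Events
ORDER BY received_at DESC
LIMIT 10
"""


def fetch_recent(database):
    """Run the query in a read-only snapshot and return the rows as a list."""
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql(RECENT_PAGES_SQL))
```

With the client library installed, the database object can be obtained with spanner.Client(project="datafly-analytics").instance("datafly-events").database("events_db"), using the example IDs from the configuration above.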

Spanner is a first-party destination in your own GCP project. The default blueprint forwards all events. Apply consent filtering via pipeline transforms or downstream views on context if your governance requires it.

Testing

  1. Enable the integration in Signal and trigger a test event on your website.
  2. Open the Spanner console and navigate to your database.
  3. Go to Query and run:
SELECT * FROM Events ORDER BY timestamp DESC LIMIT 10;
  4. Verify that event rows are appearing with correct data.
  5. In Signal, check the Live Events view to confirm delivery status shows as successful.

Troubleshooting

| Problem | Solution |
| --- | --- |
| Events not appearing in the table | Verify the project ID, instance ID, database ID, and table name are correct. |
| Permission denied (403) | The service account lacks the Cloud Spanner Database User role. Add it in IAM & Admin > IAM. |
| NOT_FOUND: Database not found | The database does not exist. Verify the database ID in the Spanner console. |
| NOT_FOUND: Table not found | The table does not exist in the database. Verify the table name (case-sensitive in Spanner). |
| Invalid service account JSON | Ensure you pasted the complete JSON key file content. |
| RESOURCE_EXHAUSTED | The instance is at capacity. Increase the number of processing units or nodes. |
| Write hotspots | Avoid sequential primary keys. Use UUIDs or add a hash prefix to distribute writes evenly across splits. |
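
For the invalid service account JSON case, a quick local check can catch a truncated or partial paste before you save the configuration. The Python sketch below is a basic sanity check under the assumption that the key is a standard GCP service account key file; check_service_account_json is a hypothetical helper, and Signal's own validation may apply different rules.

```python
import json

# Fields present in a standard GCP service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level value must be a JSON object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems


if __name__ == "__main__":
    # A truncated paste fails to parse at all.
    print(check_service_account_json('{"type": "service_account"'))
```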

Visit Google Cloud Spanner documentation for full SQL reference, schema design best practices, and performance tuning guides.