Google BigQuery

Datafly Signal delivers events to Google BigQuery for large-scale analytics, reporting, and machine learning workloads using the Streaming Insert API.

Prerequisites

Before configuring Google BigQuery in Signal, you need a GCP project with BigQuery enabled, a dataset and table, and a service account with the BigQuery Data Editor role.

Create a GCP Account and Project

  1. Sign up at cloud.google.com if you don’t already have an account.
  2. Go to the GCP Console and click Select a project > New Project.
  3. Enter a Project name (e.g. datafly-analytics).
  4. Click Create.
  5. Note the Project ID (shown below the project name).

Enable the BigQuery API

  1. In the GCP Console, go to APIs & Services > Library.
  2. Search for BigQuery API.
  3. Click Enable (it may already be enabled by default).

Create a Dataset

  1. Go to the BigQuery console.
  2. In the Explorer panel, click the three dots next to your project and select Create dataset.
  3. Enter a Dataset ID (e.g. datafly_events).
  4. Select a Data location (e.g. US, EU). This cannot be changed after creation.
  5. Click Create dataset.

Create a Table

  1. In the BigQuery console, click the three dots next to your dataset and select Create table.
  2. Choose Empty table as the source.
  3. Enter a Table name (e.g. events).
  4. Define the schema manually or run the following SQL:
CREATE TABLE `your-project.datafly_events.events` (
  event_id STRING NOT NULL,
  type STRING,
  event STRING,
  anonymous_id STRING,
  user_id STRING,
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  sent_at TIMESTAMP,
  context JSON,
  properties JSON,
  traits JSON,
  source_id STRING,
  integration_id STRING
)
PARTITION BY DATE(timestamp)
CLUSTER BY type, event;

Partitioning by timestamp and clustering by type and event significantly improves query performance and reduces costs for large event volumes.
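To see what daily partitioning means in practice: the partition a row lands in is simply the UTC calendar date of its timestamp column. A small stdlib-only sketch (the helper function is illustrative, not part of Signal):

```python
from datetime import datetime, timezone

def partition_for(event_timestamp: str) -> str:
    """Return the daily partition (YYYYMMDD) that DATE(timestamp)
    partitioning assigns to a row, given an ISO-8601 timestamp."""
    # datetime.fromisoformat does not accept a trailing "Z" before 3.11.
    ts = datetime.fromisoformat(event_timestamp.replace("Z", "+00:00"))
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")

# Events on the same UTC day share a partition, so a query filtered on
# DATE(timestamp) scans only that day's data instead of the whole table.
print(partition_for("2024-05-01T23:59:00Z"))       # 20240501
print(partition_for("2024-05-01T08:00:00+02:00"))  # 20240501 (06:00 UTC)
```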

Create a Service Account

  1. Go to IAM & Admin > Service Accounts.
  2. Click Create Service Account.
  3. Enter a Name (e.g. datafly-signal-bigquery).
  4. Click Create and Continue.
  5. Grant the BigQuery Data Editor role (roles/bigquery.dataEditor).
  6. Click Done.

Generate a Service Account Key

  1. Click on the service account you just created.
  2. Go to the Keys tab.
  3. Click Add Key > Create new key.
  4. Select JSON format.
  5. Click Create. The key file will download automatically.
⚠️ Store the JSON key file securely. It provides write access to your BigQuery dataset, so do not commit it to version control. The key file content is what you will paste into the Signal configuration.
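Before pasting the key into Signal, it can help to sanity-check that the file is complete, valid JSON. A minimal stdlib sketch (the required field names are standard for GCP service-account key files; the helper itself is hypothetical):

```python
import json

# Fields every GCP service-account key file contains.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_key(key_json: str) -> list[str]:
    """Return a list of problems with a service-account key; empty means OK."""
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    problems = []
    if key.get("type") != "service_account":
        problems.append("'type' is not 'service_account'")
    problems += [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - key.keys())]
    return problems

# A truncated key fails the check:
print(validate_key('{"type": "service_account", "project_id": "datafly-analytics"}'))
# → ['missing field: client_email', 'missing field: private_key']
```

A non-empty result usually means the file was only partially copied, which is also the most common cause of the "Invalid JSON key" error in Troubleshooting below.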

Configuration

| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | Yes | The Google Cloud project ID that contains the target BigQuery dataset. |
| dataset_id | string | Yes | The BigQuery dataset ID where the table resides. |
| table_id | string | Yes | The BigQuery table ID to insert rows into. The table must already exist with a compatible schema. |
| service_account_json | secret | Yes | The full JSON key file content for the GCP service account with BigQuery Data Editor permissions. |
| location | select | Yes | The geographic location of the BigQuery dataset (e.g. US, EU, us-east1). |

Signal Setup

Quick Setup

  1. Navigate to Integrations in the sidebar.
  2. Open the Integration Library tab.
  3. Find Google BigQuery or filter by Cloud Storage.
  4. Click Install, select a variant if available, and fill in the required fields.
  5. Click Install Integration to create the integration with a ready-to-use default blueprint.

API Setup

curl -X POST http://localhost:8084/v1/admin/integration-catalog/google_bigquery/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Google BigQuery",
    "variant": "default",
    "config": {
      "project_id": "datafly-analytics",
      "dataset_id": "datafly_events",
      "table_id": "events",
      "service_account_json": "{\"type\": \"service_account\", \"project_id\": \"datafly-analytics\", ...}",
      "location": "US"
    },
    "delivery_mode": "server_side"
  }'
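The awkward part of the request above is service_account_json: a JSON document embedded as an escaped string inside another JSON document. Rather than escaping quotes by hand, you can build the body programmatically. A sketch using only the stdlib (endpoint and field names are taken from the curl example; the key file path is hypothetical):

```python
import json

def build_install_body(key_json: str) -> str:
    """Build the install request body, embedding the raw service-account
    key file content as a properly escaped JSON string."""
    body = {
        "name": "Google BigQuery",
        "variant": "default",
        "config": {
            "project_id": "datafly-analytics",
            "dataset_id": "datafly_events",
            "table_id": "events",
            # json.dumps performs the quote-escaping that the curl
            # example writes out by hand.
            "service_account_json": key_json,
            "location": "US",
        },
        "delivery_mode": "server_side",
    }
    return json.dumps(body)

# key_json would normally be read from the downloaded key file, e.g.:
# key_json = open("datafly-signal-bigquery.json").read()
```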

Testing

  1. Enable the integration in Signal and trigger a test event on your website.
  2. Open the BigQuery console and navigate to your dataset and table.
  3. Click Preview to see recently inserted rows, or run a query:
SELECT * FROM `your-project.datafly_events.events`
ORDER BY timestamp DESC
LIMIT 10;
  4. Verify that event rows are appearing with correct data.
  5. In Signal, check the Live Events view to confirm delivery status shows as successful.
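To debug schema-mismatch errors before rows reach BigQuery, you can check a candidate row against the column list of the table created above. A stdlib sketch (the helper and sample values are illustrative, not part of Signal; unknown fields are what trigger BigQuery's "no such field" streaming error):

```python
import json
from datetime import datetime, timezone

# Column names from the CREATE TABLE statement earlier in this guide.
SCHEMA_COLUMNS = {
    "event_id", "type", "event", "anonymous_id", "user_id",
    "timestamp", "received_at", "sent_at", "context",
    "properties", "traits", "source_id", "integration_id",
}

def check_row(row: dict) -> list[str]:
    """Flag fields absent from the table schema and a missing event_id."""
    problems = [f"unknown field: {k}" for k in sorted(row.keys() - SCHEMA_COLUMNS)]
    if not row.get("event_id"):
        problems.append("event_id is required (NOT NULL)")
    return problems

# A row shaped like the table schema (values are illustrative):
row = {
    "event_id": "evt_123",
    "type": "track",
    "event": "Page Viewed",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "properties": json.dumps({"path": "/pricing"}),
}
print(check_row(row))  # → []
```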

Troubleshooting

| Problem | Solution |
|---|---|
| Events not appearing in BigQuery | Verify the project ID, dataset ID, table ID, and location are correct. |
| Permission denied (403) | The service account lacks the BigQuery Data Editor role. Go to IAM & Admin > IAM and add the role. |
| Not found: Table | The table does not exist. Verify the dataset and table IDs and ensure the table has been created. |
| Invalid JSON key | The service account JSON is malformed. Ensure you pasted the complete JSON key file content. |
| Schema mismatch | The table schema must match the event fields. Check column names and types with SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'events';. |
| Location mismatch | The location config must match the dataset's actual location. Check the dataset details in the BigQuery console. |
| Streaming buffer delay | Streamed rows are typically queryable within seconds, but while data sits in the streaming buffer it may take a few minutes to show up in the Preview tab. |
| Quota exceeded | BigQuery enforces streaming insert quotas. Check your quotas and limits and request an increase if needed. |

Visit Google BigQuery documentation for full API reference, SQL syntax, and cost optimisation guides.