Google BigQuery
Datafly Signal delivers events to Google BigQuery for large-scale analytics, reporting, and machine learning workloads using the Streaming Insert API.
Prerequisites
Before configuring Google BigQuery in Signal, you need a GCP project with BigQuery enabled, a dataset and table, and a service account with the BigQuery Data Editor role.
Create a GCP Account and Project
- Sign up at cloud.google.com if you don’t already have an account.
- Go to the GCP Console and click Select a project > New Project.
- Enter a Project name (e.g. datafly-analytics).
- Click Create.
- Note the Project ID (shown below the project name).
Enable the BigQuery API
- In the GCP Console, go to APIs & Services > Library.
- Search for BigQuery API.
- Click Enable (it may already be enabled by default).
Create a Dataset
- Go to the BigQuery console.
- In the Explorer panel, click the three dots next to your project and select Create dataset.
- Enter a Dataset ID (e.g. datafly_events).
- Select a Data location (e.g. US, EU). This cannot be changed after creation.
- Click Create dataset.
Create a Table
- In the BigQuery console, click the three dots next to your dataset and select Create table.
- Choose Empty table as the source.
- Enter a Table name (e.g. events).
- Define the schema manually or run the following SQL:
```sql
CREATE TABLE `your-project.datafly_events.events` (
  event_id STRING NOT NULL,
  type STRING,
  event STRING,
  anonymous_id STRING,
  user_id STRING,
  timestamp TIMESTAMP,
  received_at TIMESTAMP,
  sent_at TIMESTAMP,
  context JSON,
  properties JSON,
  traits JSON,
  source_id STRING,
  integration_id STRING
)
PARTITION BY DATE(timestamp)
CLUSTER BY type, event;
```

Partitioning by timestamp and clustering by type and event significantly improves query performance and reduces costs for large event volumes.
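As a quick local sanity check, the sketch below builds a sample row in the shape this table expects and rejects rows with unknown or missing required columns. The column names come from the table definition above; the sample values and the `validate_row` helper are illustrative, not part of Signal:

```python
import json
from datetime import datetime, timezone

# Columns from the CREATE TABLE statement above; only event_id is NOT NULL.
REQUIRED = {"event_id"}
COLUMNS = {
    "event_id", "type", "event", "anonymous_id", "user_id",
    "timestamp", "received_at", "sent_at",
    "context", "properties", "traits",
    "source_id", "integration_id",
}

def validate_row(row: dict) -> dict:
    """Reject unknown columns and missing NOT NULL fields before inserting."""
    unknown = set(row) - COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    missing = REQUIRED - {k for k, v in row.items() if v is not None}
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return row

# A made-up example event in the shape the table expects.
row = validate_row({
    "event_id": "9f1c2d34-0000-0000-0000-000000000000",
    "type": "track",
    "event": "Order Completed",
    "user_id": "user-42",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "properties": json.dumps({"total": 19.99, "currency": "USD"}),
})
print(json.dumps(row))
```

Catching schema drift on your side before inserting avoids opaque per-row insert errors from BigQuery later.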
Create a Service Account
- Go to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Enter a Name (e.g. datafly-signal-bigquery).
- Click Create and Continue.
- Grant the BigQuery Data Editor role (roles/bigquery.dataEditor).
- Click Done.
Generate a Service Account Key
- Click on the service account you just created.
- Go to the Keys tab.
- Click Add Key > Create new key.
- Select JSON format.
- Click Create. The key file will download automatically.
Store the JSON key file securely. It provides write access to your BigQuery dataset. Do not commit it to version control. The key content is what you will paste into the Signal configuration.
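Before pasting the key into Signal, you can verify locally that it is complete and well-formed JSON. The field names checked below (`type`, `project_id`, `private_key`, `client_email`) are standard in GCP service account key files; the sample key here is a placeholder, not a real credential:

```python
import json

# Fields present in every GCP service-account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")

def check_key(raw: str) -> str:
    """Return the service account email if the key looks complete."""
    key = json.loads(raw)  # raises a ValueError subclass if malformed
    missing = [f for f in REQUIRED_FIELDS if not key.get(f)]
    if missing:
        raise ValueError(f"key is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]

# Placeholder key content -- never hard-code or commit a real key like this.
sample = json.dumps({
    "type": "service_account",
    "project_id": "datafly-analytics",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...placeholder...",
    "client_email": "datafly-signal-bigquery@datafly-analytics.iam.gserviceaccount.com",
})
print(check_key(sample))
# -> datafly-signal-bigquery@datafly-analytics.iam.gserviceaccount.com
```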
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | Yes | The Google Cloud project ID that contains the target BigQuery dataset. |
| dataset_id | string | Yes | The BigQuery dataset ID where the table resides. |
| table_id | string | Yes | The BigQuery table ID to insert rows into. The table must already exist with a compatible schema. |
| service_account_json | secret | Yes | The full JSON key file content for the GCP service account with BigQuery Data Editor permissions. |
| location | select | Yes | The geographic location of the BigQuery dataset (e.g. US, EU, us-east1). |
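Assembled, a complete configuration looks like the sketch below, using the example values from this guide. The `validate_config` helper is illustrative, not part of Signal:

```python
# Example configuration using the values from this guide.
config = {
    "project_id": "datafly-analytics",
    "dataset_id": "datafly_events",
    "table_id": "events",
    "service_account_json": "<paste full JSON key file content here>",
    "location": "US",
}

def validate_config(cfg: dict) -> None:
    """All five fields are required; fail fast if any are absent or blank."""
    required = ("project_id", "dataset_id", "table_id",
                "service_account_json", "location")
    missing = [f for f in required if not cfg.get(f)]
    if missing:
        raise ValueError(f"missing required config fields: {missing}")

validate_config(config)
```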
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Google BigQuery or filter by Cloud Storage.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with a ready-to-use default blueprint.
API Setup
```shell
curl -X POST http://localhost:8084/v1/admin/integration-catalog/google_bigquery/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Google BigQuery",
    "variant": "default",
    "config": {
      "project_id": "datafly-analytics",
      "dataset_id": "datafly_events",
      "table_id": "events",
      "service_account_json": "{\"type\": \"service_account\", \"project_id\": \"datafly-analytics\", ...}",
      "location": "US"
    },
    "delivery_mode": "server_side"
  }'
```

Testing
- Enable the integration in Signal and trigger a test event on your website.
- Open the BigQuery console and navigate to your dataset and table.
- Click Preview to see recently inserted rows, or run a query:
```sql
SELECT * FROM `your-project.datafly_events.events`
ORDER BY timestamp DESC
LIMIT 10;
```

- Verify that event rows are appearing with correct data.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
Troubleshooting
| Problem | Solution |
|---|---|
| Events not appearing in BigQuery | Verify the project ID, dataset ID, table ID, and location are correct. |
| Permission denied (403) | The service account lacks the BigQuery Data Editor role. Go to IAM & Admin > IAM and add the role. |
| Not found: Table | The table does not exist. Verify the dataset and table IDs. Ensure the table has been created. |
| Invalid JSON key | The service account JSON is malformed. Ensure you pasted the complete JSON key file content. |
| Schema mismatch | The table schema must match the event fields. Check column names and types with SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'events';. |
| Location mismatch | The location config must match the dataset’s actual location. Check the dataset details in BigQuery console. |
| Streaming buffer delay | Streaming inserts land in a streaming buffer first. Buffered data is typically queryable within seconds, but it may take a few minutes to show up in the Preview tab and is excluded from table copy and export jobs until the buffer is flushed. |
| Quota exceeded | BigQuery has streaming insert quotas. Check quotas and limits and request an increase if needed. |
Visit the Google BigQuery documentation for the full API reference, SQL syntax, and cost optimisation guides.