Google Cloud Storage
Datafly Signal writes batches of first-party events as objects to a GCS bucket, ready to query with BigQuery external tables, Dataproc, or any analytics framework that reads from GCS.
This integration is currently in beta. Configuration and behaviour may change.
Prerequisites
Before configuring Google Cloud Storage in Signal, you need a GCP project with a GCS bucket and a service account with the Storage Object Creator role.
Create a GCP Account and Project
- Sign up at cloud.google.com if you don’t already have an account.
- Create a new project or select an existing one in the GCP Console.
- Note the Project ID.
Enable the Cloud Storage API
- Go to APIs & Services > Library.
- Search for Cloud Storage JSON API.
- Click Enable (it may already be enabled by default).
Create a GCS Bucket
- Go to the Cloud Storage console.
- Click Create bucket.
- Enter a Bucket name (e.g. datafly-events-production). Names must be globally unique.
- Choose a Location type:
  - Region: lowest latency, single region (recommended for Signal).
  - Multi-region: higher availability, replicated across regions.
- Choose a Storage class: Standard (recommended for frequently accessed data), Nearline, Coldline, or Archive.
- Leave Public access prevention enforced (recommended).
- Click Create.
Choose a region close to your Signal infrastructure to minimise latency and egress costs. Standard storage class is recommended for event data that will be queried regularly.
Create a Service Account
- Go to IAM & Admin > Service Accounts > Create Service Account.
- Enter a name (e.g. datafly-signal-gcs).
- Grant the Storage Object Creator role (roles/storage.objectCreator).
- Click Done.
Generate a Service Account Key
- Click on the service account.
- Go to Keys > Add Key > Create new key > JSON.
- The key file downloads automatically.
Store the JSON key file securely. Do not commit it to version control. The entire JSON content is what you will paste into the Signal configuration.
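Because the whole key file is pasted into the configuration, a quick local check before pasting can catch a truncated or wrong file. The sketch below is a minimal example, not part of Signal: the function name `service_account_key_problems` is hypothetical, and the field list covers only a few fields that a standard Google service account key file contains.

```python
import json

# A subset of the fields found in a standard service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def service_account_key_problems(raw: str) -> list[str]:
    """Return a list of problems found in key file content; empty means it looks complete."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["key file must contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    # User credentials and other key types will not work as a service account key.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


# A deliberately truncated placeholder, not a real key.
truncated = '{"type": "service_account", "project_id": "datafly-analytics"}'
print(service_account_key_problems(truncated))
```

To check a real file, read its content with `open(path).read()` and pass the string to the function. The check runs locally and does not contact Google Cloud.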
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | The GCS bucket name. Also accepts bucket_name. |
| project_id | string | Yes | The Google Cloud project ID that owns the bucket. |
| service_account_json | secret | Yes | The full JSON key file content for a service account with roles/storage.objectCreator. |
| prefix | string | No | Optional prefix (folder path) prepended to all object names (e.g. events/). Include a trailing slash. |
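The fields in the table can be assembled and sanity-checked on the client side before they are submitted. The following is a minimal sketch under stated assumptions: `normalise_config` is a hypothetical helper, not a Signal API, and whether Signal itself adds a missing trailing slash to the prefix is not documented here, so the helper adds one defensively.

```python
import json

REQUIRED_FIELDS = ("bucket", "project_id", "service_account_json")


def normalise_config(config: dict) -> dict:
    """Return a copy of config with the bucket_name alias resolved and the prefix slash-terminated."""
    out = dict(config)
    # The table notes that bucket_name is accepted in place of bucket.
    if "bucket" not in out and "bucket_name" in out:
        out["bucket"] = out.pop("bucket_name")
    missing = [name for name in REQUIRED_FIELDS if not out.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    prefix = out.get("prefix", "")
    if prefix and not prefix.endswith("/"):
        out["prefix"] = prefix + "/"
    return out


# Truncated placeholder key for illustration only; a real key has more fields.
key = {"type": "service_account", "project_id": "datafly-analytics"}
config = normalise_config({
    "bucket_name": "datafly-events-production",
    "project_id": "datafly-analytics",
    # The key is passed as a JSON string, not as a nested object.
    "service_account_json": json.dumps(key),
    "prefix": "events",
})
print(json.dumps(config, indent=2))
```

Serialising the key with `json.dumps` before placing it in the config takes care of escaping the inner quotes, which is easy to get wrong by hand.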
Signal Setup
Quick Setup
- Navigate to Integrations in the sidebar.
- Open the Integration Library tab.
- Find Google Cloud Storage or filter by Cloud Storage.
- Click Install, select a variant if available, and fill in the required fields.
- Click Install Integration to create the integration with a ready-to-use default blueprint.
API Setup
curl -X POST http://localhost:8084/v1/admin/integration-catalog/google_cloud_storage/install \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Google Cloud Storage",
    "variant": "default",
    "config": {
      "bucket": "datafly-events-production",
      "project_id": "datafly-analytics",
      "service_account_json": "{\"type\": \"service_account\", ...}",
      "prefix": "events/"
    },
    "delivery_mode": "server_side"
  }'
Schema
Signal writes batched events as newline-delimited JSON (NDJSON) objects. Each line is one event using the canonical envelope (event_id, type, event, anonymous_id, user_id, timestamp, received_at, sent_at, context, properties, traits, source_id, integration_id).
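As an illustration of the NDJSON format, the sketch below parses a small batch and reports which envelope fields each event lacks. The sample events and their values are made up for the example, and the helper names are hypothetical. Whether every envelope field is present on every event is not stated here, so the check reports missing fields rather than treating them as errors.

```python
import json

# Envelope fields as listed in the schema description.
ENVELOPE_FIELDS = (
    "event_id", "type", "event", "anonymous_id", "user_id", "timestamp",
    "received_at", "sent_at", "context", "properties", "traits",
    "source_id", "integration_id",
)


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON: one event object per non-empty line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def missing_envelope_fields(event: dict) -> list[str]:
    """List the envelope fields that are absent from an event."""
    return [name for name in ENVELOPE_FIELDS if name not in event]


# Made-up sample data, heavily abbreviated.
sample = (
    '{"event_id": "e1", "type": "track", "event": "Order Completed"}\n'
    '{"event_id": "e2", "type": "page"}\n'
)
events = parse_ndjson(sample)
for event in events:
    print(event["event_id"], missing_envelope_fields(event))
```

Splitting on `"\n"` rather than using `str.splitlines()` avoids breaking a line on other Unicode line separators that may appear inside string values.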
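Object keys follow <prefix>YYYY/MM/DD/HH/<batch-uuid>.json.gz, a date- and hour-based layout that lets BigQuery external tables and other readers narrow a scan to a time range by path prefix. The path segments are plain values rather than Hive-style key=value pairs, so tools that expect Hive partitioning will not detect partition columns automatically.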
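The key layout can be sketched as a pair of small helpers, which is handy when computing which prefix to list for a given hour. This is an assumption-laden sketch: the function names are hypothetical, and the use of UTC for the date and hour segments is an assumption, since the time zone and the timestamp that determines the path are not specified here.

```python
import uuid
from datetime import datetime, timezone


def hour_prefix(prefix: str, when: datetime) -> str:
    """Key prefix shared by all batches for the given hour (UTC assumed)."""
    when = when.astimezone(timezone.utc)
    return f"{prefix}{when:%Y/%m/%d/%H}/"


def object_key(prefix: str, when: datetime, batch_id: str) -> str:
    """Build a key of the form <prefix>YYYY/MM/DD/HH/<batch-uuid>.json.gz."""
    return f"{hour_prefix(prefix, when)}{batch_id}.json.gz"


# Pass timezone-aware datetimes; naive ones would be interpreted as local time.
when = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
print(hour_prefix("events/", when))
print(object_key("events/", when, str(uuid.uuid4())))
```

The value returned by `hour_prefix` can be used as the prefix filter when listing objects for a single hour.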
Consent
GCS is a first-party destination in your own GCP project. The default blueprint forwards all events. Apply consent-aware filtering or partitioning via pipeline transforms on context.consent if needed.
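The logic of a consent-aware filter can be illustrated in a few lines. This sketch is written in Python for illustration and is not Signal's pipeline transform syntax. It assumes that context.consent is a mapping from category name to a boolean, which is not confirmed here, and the category name `analytics` is hypothetical.

```python
def has_consent(event: dict, category: str) -> bool:
    """True only when context.consent explicitly grants the category."""
    context = event.get("context")
    if not isinstance(context, dict):
        return False
    consent = context.get("consent")
    if not isinstance(consent, dict):
        return False
    # Missing or non-boolean values are treated as no consent.
    return consent.get(category) is True


def filter_by_consent(events: list[dict], category: str) -> list[dict]:
    """Keep only the events that carry explicit consent for the category."""
    return [event for event in events if has_consent(event, category)]


# Made-up events for illustration.
batch = [
    {"event_id": "a", "context": {"consent": {"analytics": True}}},
    {"event_id": "b", "context": {"consent": {"analytics": False}}},
    {"event_id": "c", "context": {}},
]
print([event["event_id"] for event in filter_by_consent(batch, "analytics")])
```

The sketch treats a missing consent record as a refusal. Whether that default is appropriate depends on your consent policy.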
Testing
- Enable the integration in Signal and trigger a test event on your website.
- Open the Cloud Storage console and navigate to your bucket.
- Browse to the prefix path and verify that event files are appearing.
- Click on a file to download and inspect the event data.
- In Signal, check the Live Events view to confirm delivery status shows as successful.
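For the inspection step, a short script can summarise a downloaded batch file instead of reading it by eye. This is a minimal sketch with a hypothetical function name, `summarise_batch`. It checks for the gzip magic bytes because, depending on how the object was stored and downloaded, the file may arrive still compressed or already decompressed.

```python
import gzip
import json
from collections import Counter


def summarise_batch(path: str) -> Counter:
    """Count events by their type field in a downloaded batch file."""
    with open(path, "rb") as f:
        data = f.read()
    # Gzip streams start with the bytes 1f 8b; plain NDJSON does not.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    counts = Counter()
    for line in data.decode("utf-8").split("\n"):
        if line.strip():
            counts[json.loads(line).get("type", "<missing>")] += 1
    return counts
```

Calling `summarise_batch` with the path to a downloaded file returns a `Counter` keyed by event type, so an unexpectedly empty or malformed file shows up immediately.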
Troubleshooting
| Problem | Solution |
|---|---|
| Events not appearing in the bucket | Verify the bucket name, project ID, and prefix are correct. |
| Permission denied (403) | The service account lacks the Storage Object Creator role on the bucket. Add it in IAM & Admin > IAM. |
| Bucket not found (404) | The bucket does not exist. Verify the bucket name (globally unique, case-sensitive). |
| Invalid service account JSON | Ensure you pasted the complete JSON key file content, including all fields. |
| Files appearing but empty | Check the batch settings. Events are buffered before flushing to files. |
| Bucket location mismatch | The bucket location does not affect connectivity but may impact latency. Choose a location close to your Signal deployment. |
| Uniform bucket-level access errors | If the bucket uses uniform bucket-level access, ensure the service account has the role at the bucket level via IAM, not legacy ACLs. |
Visit the Google Cloud Storage documentation for the full API reference, lifecycle management, and access control guides.