
Scaling for High Volume

Signal scales via configuration — the same code and Docker images handle 1,000 events/sec and 2,000,000+ events/sec. The difference is environment variable settings and infrastructure sizing.

Throughput Tiers

| Tier | Partitions | Concurrency | Delivery Mode | Max Events/sec |
|------|------------|-------------|---------------|----------------|
| Default | 3 | 3 | Single topic, 10 workers | 100,000–150,000 |
| Medium | 12 | 12 | Single topic, 50 workers | 500,000 |
| Large | 48 | 48 | Per-vendor topics, 100 workers each | 2,000,000+ |
| Maximum | 96 | 96 | Per-vendor topics, 200 workers each | 5,000,000+ |

The default configuration handles most customers. Only move to a larger tier when you’re consistently above 50,000 events/sec.

Scaling Environment Variables

These variables control throughput. All are optional — defaults are safe for small-to-medium deployments.

Kafka & Processing

| Variable | Default | Description |
|----------|---------|-------------|
| KAFKA_NUM_PARTITIONS | 3 | Number of partitions per Kafka topic. Set in the infrastructure compose file. More partitions = more concurrent consumers. |
| CONSUMER_CONCURRENCY | 3 | Number of concurrent message handlers in the event processor. Should match the partition count. |
| DELIVERY_CONCURRENCY | 10 | Number of concurrent delivery goroutines in the delivery workers. |
| REDIS_POOL_SIZE | 100 | Redis connection pool size. Increase for high-concurrency deployments. |
| DB_MAX_CONNS | 50 | PostgreSQL max connection pool size. |
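
As a sketch, a Medium-tier deployment (12 partitions, 50 delivery workers) might set these in its compose file. The service names and the pool-size values here are illustrative, not prescribed:

```yaml
services:
  kafka:
    environment:
      KAFKA_NUM_PARTITIONS: "12"
  event-processor:
    environment:
      CONSUMER_CONCURRENCY: "12"   # match the partition count
      REDIS_POOL_SIZE: "200"
      DB_MAX_CONNS: "100"
  delivery-worker:
    environment:
      DELIVERY_CONCURRENCY: "50"
```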

Per-Vendor Delivery Topics

| Variable | Default | Description |
|----------|---------|-------------|
| DELIVERY_TOPIC_MODE | single | Set to per_vendor to publish delivery events to separate topics per vendor type (e.g. delivery-ga4, delivery-meta_capi). |
| VENDOR_TYPE | (empty) | When DELIVERY_TOPIC_MODE=per_vendor, set this on each delivery worker instance to consume from a specific vendor topic (e.g. ga4, meta_capi, tiktok). |

When DELIVERY_TOPIC_MODE=per_vendor, deploy one delivery worker instance per vendor type. Each instance sets VENDOR_TYPE to its vendor. This allows independent scaling — a slow vendor (rate-limited API) doesn’t block other vendors.
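
A per-vendor deployment could look like the following compose sketch. The image name and service names are illustrative assumptions; only the two environment variables come from the table above:

```yaml
services:
  delivery-ga4:
    image: signal/delivery-worker   # illustrative image name
    environment:
      DELIVERY_TOPIC_MODE: per_vendor
      VENDOR_TYPE: ga4              # consumes delivery-ga4 only
  delivery-meta:
    image: signal/delivery-worker
    environment:
      DELIVERY_TOPIC_MODE: per_vendor
      VENDOR_TYPE: meta_capi        # consumes delivery-meta_capi only
```

Each service can then be given its own replica count, so a rate-limited vendor backs up only its own topic.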

Bot Pre-Filter

| Variable | Default | Description |
|----------|---------|-------------|
| BOT_PREFILTER | false | Set to true to run bot detection in a separate service before the main event processor. |

When enabled, deploy the botfilter binary (from event-processor/cmd/botfilter/) as a separate service. It reads from raw-events, drops bots, and publishes clean events to filtered-events. The main event processor then consumes from filtered-events instead of raw-events.

This is beneficial when bot traffic exceeds 30% of total volume — the event processor then handles only real events, reducing its CPU load and downstream Kafka throughput.
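
A compose sketch of that topology might look like this. The build path is taken from the text above; which services read BOT_PREFILTER and the exact wiring are assumptions:

```yaml
services:
  botfilter:
    build: ./event-processor/cmd/botfilter
    # Reads raw-events, drops bots, publishes clean events to filtered-events.
  event-processor:
    environment:
      BOT_PREFILTER: "true"   # assumed to switch its input topic to filtered-events
```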

Adaptive Delivery

The delivery workers automatically manage vendor rate limits. No configuration needed.

| Vendor | Rate Limit | Batch Size |
|--------|------------|------------|
| GA4 Measurement Protocol | 1,000/sec | 25 events/request |
| Meta Conversions API | 2,000/sec | 1,000 events/request |
| TikTok Events API | 500/sec | 100 events/request |
| Amplitude | 1,000/sec | 2,000 events/request |
| Google BigQuery | 50,000/sec | 10,000 rows/request |
| Webhooks | Unlimited | 1 event/request |

Under normal traffic, events are delivered immediately (real-time). When the send rate approaches 80% of a vendor’s limit, events are automatically buffered and sent at the vendor’s maximum safe rate. This prevents 429 (Too Many Requests) errors and wasted retries.
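
The buffering behaviour can be sketched as a token bucket with a soft threshold at 80% of the vendor limit. This is a minimal illustration of the idea, not Signal's actual implementation; the type and function names are invented:

```go
package main

import (
	"fmt"
	"time"
)

// vendorLimiter sketches adaptive pacing: a token bucket refilled at the
// vendor's rate limit, with a 20% reserve so bursts never reach the hard
// ceiling. Below the reserve, callers buffer instead of sending.
type vendorLimiter struct {
	limitPerSec float64   // vendor rate limit (tokens/sec)
	tokens      float64   // current tokens
	last        time.Time // last refill time
}

func newVendorLimiter(limit float64) *vendorLimiter {
	return &vendorLimiter{limitPerSec: limit, tokens: limit, last: time.Now()}
}

// allow refills the bucket and reports whether an event may be sent
// immediately; false means the caller should buffer the event and drain
// it later at the vendor's safe rate.
func (v *vendorLimiter) allow() bool {
	now := time.Now()
	v.tokens += now.Sub(v.last).Seconds() * v.limitPerSec
	if v.tokens > v.limitPerSec {
		v.tokens = v.limitPerSec
	}
	v.last = now
	// Keep a 20% reserve: once tokens fall below it, switch to buffering.
	if v.tokens < 0.2*v.limitPerSec {
		return false
	}
	v.tokens--
	return true
}

func main() {
	lim := newVendorLimiter(1000) // e.g. GA4: 1,000/sec
	sent, buffered := 0, 0
	for i := 0; i < 2000; i++ { // a sudden burst of 2,000 events
		if lim.allow() {
			sent++
		} else {
			buffered++
		}
	}
	fmt.Println("sent immediately:", sent, "buffered:", buffered)
}
```

In the burst above, roughly the first 80% of the bucket is sent immediately and the remainder is buffered, which is exactly the cutover the text describes.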

GKE Sizing Guide

Recommended infrastructure sizing by customer volume:

| Size | Events/month | Kafka | PostgreSQL | Redis | Processors | Est. GCP Cost |
|------|--------------|-------|------------|-------|------------|---------------|
| Small | Up to 1M | Single broker, 1 partition | Cloud SQL Basic (1 vCPU) | Memorystore 1GB | 1 replica | ~$150/mo |
| Medium | 1–10M | Single broker, 3 partitions | Cloud SQL Standard (2 vCPU) | Memorystore 2GB | 2 replicas | ~$400/mo |
| Large | 10–100M | 3-broker cluster, 12 partitions | Cloud SQL HA (4 vCPU) | Memorystore 4GB | 6 replicas | ~$1,200/mo |
| Enterprise | 100M+ | 5-broker cluster, 48 partitions | Cloud SQL HA (8 vCPU) | Redis Cluster 8GB | 24 replicas | ~$3,500/mo |

Cost Optimisation

Spot Instances

Event processor, delivery workers, and ingestion gateway are stateless — they restart cleanly and are ideal for GKE spot (preemptible) nodes. This saves 60–80% on compute costs.

Only Kafka, PostgreSQL, and Redis need stable on-demand instances.
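
On GKE, pinning a stateless Deployment to spot nodes takes a node selector plus a toleration for the spot taint. A minimal pod-spec fragment:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      tolerations:
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule
```

Apply this only to the event processor, delivery workers, and ingestion gateway; leave the stateful services on on-demand node pools.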

Autoscaling

Use Kubernetes HPA (Horizontal Pod Autoscaler) or KEDA for Kafka lag-based scaling:

| Service | Min Replicas | Max Replicas | Scale Metric |
|---------|--------------|--------------|--------------|
| Ingestion Gateway | 2 | 20 | CPU > 60% |
| Event Processor | 1 | 48 | Kafka consumer lag > 1000 |
| Delivery Workers | 1 | 48 | Kafka consumer lag > 1000 |
| Management API | 1 | 3 | CPU > 70% |

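For lag-based scaling, a KEDA ScaledObject for the event processor might look like this (the consumer group name and broker address are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: event-processor
spec:
  scaleTargetRef:
    name: event-processor       # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 48
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: event-processor   # assumed group name
        topic: raw-events
        lagThreshold: "1000"
```

KEDA scales the Deployment up whenever lag per partition exceeds the threshold, and back down to the minimum when the backlog clears.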
Kafka Tiered Storage

For high-volume deployments, move Kafka data older than 24 hours to cloud object storage (GCS/S3):

  • Hot tier (0–24h): SSD — fast reads for real-time consumers
  • Cold tier (1–7d): Object storage — cheap storage for replay/recovery

Confluent for Kubernetes supports this natively. For open-source Kafka 3.6+, use remote storage plugins.
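
With open-source Kafka 3.6+ (KIP-405 tiered storage), the split above maps to broker and topic configuration roughly as follows; a remote storage plugin for GCS/S3 must also be configured, which is omitted here:

```properties
# Broker: enable the tiered storage subsystem.
remote.log.storage.system.enable=true

# Per topic: keep 24h locally on SSD, 7 days total (remainder in object storage).
# Applied via kafka-configs --alter --topic <topic> --add-config ...
remote.storage.enable=true
local.retention.ms=86400000
retention.ms=604800000
```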

Monitoring

Key metrics to watch:

| Metric | Warning Threshold | Action |
|--------|-------------------|--------|
| Kafka consumer lag (event-processor) | > 5,000 messages | Scale up event processor replicas or increase concurrency |
| Kafka consumer lag (delivery-workers) | > 10,000 messages | Scale up delivery workers or check vendor API health |
| Redis memory usage | > 80% of max | Increase Redis instance size or check for TTL issues |
| Event processor CPU | > 70% sustained | Scale up replicas |
| Delivery worker 429 responses | > 1% of requests | Check vendor rate limits, enable adaptive buffering |
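
The lag threshold can be wired into a Prometheus alert. This sketch assumes the kafka-exporter metric kafka_consumergroup_lag and an event-processor consumer group name; adjust both to your setup:

```yaml
groups:
  - name: signal-scaling
    rules:
      - alert: EventProcessorLagHigh
        expr: sum(kafka_consumergroup_lag{consumergroup="event-processor"}) > 5000
        for: 5m
        annotations:
          summary: "Scale up event processor replicas or increase CONSUMER_CONCURRENCY"
```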

Signal’s startup retry logic handles infrastructure restarts gracefully — services wait up to 20 seconds for Redis and PostgreSQL to become available before giving up. No manual intervention needed for rolling restarts.