# Upgrades
This guide covers the standard upgrade process for Datafly Signal, including pre-upgrade checks, the upgrade procedure, rollback steps, and handling major version upgrades.
## Version Policy
Datafly Signal follows Semantic Versioning:
| Version Change | Meaning | Migration Required |
|---|---|---|
| Patch (1.2.3) | Bug fixes only | No |
| Minor (1.3.0) | New features, backward-compatible | Possible (database migrations) |
| Major (2.0.0) | Breaking changes | Yes (see release notes) |
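To judge what a target version implies, compare it with the running version component by component. A minimal bash sketch (a hypothetical helper, not part of the Datafly tooling) classifying the change per the table above:

```shell
# Classify a version change as major, minor, or patch (SemVer).
# Hypothetical helper -- not shipped with Datafly Signal.
semver_change() {
  local from_major from_minor from_patch to_major to_minor to_patch
  IFS=. read -r from_major from_minor from_patch <<< "$1"
  IFS=. read -r to_major to_minor to_patch <<< "$2"
  if   [ "$from_major" != "$to_major" ]; then echo "major"
  elif [ "$from_minor" != "$to_minor" ]; then echo "minor"
  else echo "patch"
  fi
}

semver_change 1.2.3 1.3.0   # prints "minor": expect possible database migrations
```

A `major` result means reading the migration guide and following the Major Version Upgrades steps below.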
## Pre-Upgrade Checklist
Before upgrading, verify:
- Read the release notes for the target version
- Check current health — all pods should be `Running` and `Ready`
- Check Kafka consumer lag — ensure all consumers are caught up
- Back up the database (see Backup & DR)
- Test in staging first if upgrading a major version
```shell
# Verify current status
kubectl get pods -n datafly
kubectl get hpa -n datafly

# Check Kafka consumer lag (if using Prometheus)
# Or check via Kafka UI / CLI
```

## Standard Upgrade
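For the consumer-lag check, one option is to sum the `LAG` column of `kafka-consumer-groups.sh --describe` output. The awk sketch below assumes the standard Kafka CLI table layout (lag in the sixth column), which can differ across Kafka versions:

```shell
# Sum the LAG column (6th field) of `kafka-consumer-groups.sh --describe`
# output; assumes the standard header plus one row per partition.
total_lag() {
  awk 'NR > 1 && $6 ~ /^[0-9]+$/ { sum += $6 } END { print sum + 0 }'
}

# Example with captured output (in practice, pipe the real command in):
total_lag <<'EOF'
GROUP           TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
event-processor events  0          1500            1500            0    c1           /10.0.0.1  cli-1
event-processor events  1          1490            1495            5    c2           /10.0.0.2  cli-2
EOF
# prints 5
```

A total of 0 means all consumers are caught up and it is safe to proceed.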
### Step 1: Update the Helm chart
```shell
# Check available versions
helm search repo datafly --versions

# Or if using OCI:
helm show chart oci://ghcr.io/datafly/charts/datafly --version 1.3.0
```

### Step 2: Review changes
Compare the default values between your current version and the target version to identify any new configuration options:
```shell
helm show values oci://ghcr.io/datafly/charts/datafly --version 1.3.0 > values-new.yaml
diff values-current.yaml values-new.yaml
```

### Step 3: Run the upgrade
```shell
helm upgrade datafly oci://ghcr.io/datafly/charts/datafly \
  --namespace datafly \
  --values values-production.yaml \
  --version 1.3.0
```

If `migrations.enabled: true` in your values (recommended), database migrations run automatically as an init container on the Management API pod before the service starts. No manual migration step is required.
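As a reference, the corresponding values fragment might look like the following; `migrations.enabled` is the key named above, while the surrounding layout is an assumption to verify against your chart's `values.yaml`:

```yaml
migrations:
  enabled: true   # run schema migrations as an init container before the Management API starts
```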
### Step 4: Monitor the rollout
```shell
# Watch pods restart with the new version
kubectl rollout status deployment/ingestion-gateway -n datafly
kubectl rollout status deployment/event-processor -n datafly
kubectl rollout status deployment/management-api -n datafly

# Verify all pods are running
kubectl get pods -n datafly

# Check for errors in logs
kubectl logs -n datafly -l app.kubernetes.io/part-of=datafly --tail=50 --since=5m
```

### Step 5: Verify
```shell
# Test event ingestion
curl -X POST https://data.yourdomain.com/v1/t \
  -H "Content-Type: application/json" \
  -d '{"type":"track","event":"Upgrade Test","properties":{}}'

# Verify Management UI is accessible
curl -s -o /dev/null -w "%{http_code}" https://app.yourdomain.com
```

## Rollback
If the upgrade causes issues, roll back to the previous version:
```shell
# Check Helm release history
helm history datafly -n datafly

# Roll back to the previous revision
helm rollback datafly -n datafly

# Or roll back to a specific revision
helm rollback datafly 3 -n datafly
```

`helm rollback` reverts the Kubernetes resources but does not revert database migrations. If the new version included database schema changes, see "Reverting Migrations" below.
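When scripting a rollback, the previous revision number can be recovered from the `helm history` table. This sketch assumes the default table output, where replaced releases show status `superseded`:

```shell
# Print the most recent "superseded" revision number from
# `helm history` table output (the revision number is the first field).
prev_revision() {
  awk '$0 ~ /superseded/ { rev = $1 } END { print rev + 0 }'
}

# Example with captured output
# (in practice: helm history datafly -n datafly | prev_revision)
prev_revision <<'EOF'
REVISION  UPDATED                   STATUS      CHART          DESCRIPTION
3         Mon Jan  6 10:00:00 2025  superseded  datafly-1.2.0  Upgrade complete
4         Tue Feb  4 10:00:00 2025  deployed    datafly-1.3.0  Upgrade complete
EOF
# prints 3
```

The result can then feed `helm rollback datafly <revision> -n datafly`.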
### Reverting Migrations
If you need to revert database migrations after a rollback:

1. Check which migrations were applied:

   ```shell
   kubectl exec -n datafly deploy/management-api -- \
     /app/migrate version
   ```

2. Revert migrations one at a time:

   ```shell
   kubectl run migrate-down --rm -it \
     --namespace datafly \
     --image=ghcr.io/datafly/management-api:v1.2.0 \
     --env="DATABASE_URL=postgresql://..." \
     -- /app/migrate down 1
   ```

3. Verify the migration state matches the rolled-back version.
## Zero-Downtime Upgrades
Datafly Signal supports zero-downtime upgrades when:
- PodDisruptionBudgets are enabled (`pdb.enabled: true`) — ensures at least one pod per service remains available during the rollout
- Rolling update strategy is used (Helm chart default) — Kubernetes replaces pods one at a time
- Readiness probes pass before traffic is routed to new pods
The default Helm chart configuration supports zero-downtime upgrades for all services with replicas >= 2.
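Expressed as chart values, those conditions might look like this; `pdb.enabled` appears above, but `replicaCount` is an assumed key name, so check your chart's `values.yaml`:

```yaml
pdb:
  enabled: true   # PodDisruptionBudget keeps at least one pod per service available
replicaCount: 2   # zero-downtime requires at least 2 replicas per service
```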
## Major Version Upgrades
Major versions (e.g., 1.x → 2.x) may include breaking changes. Follow these additional steps:
- Read the migration guide included in the release notes
- Back up everything — database, Kafka topic data, Redis state
- Test in a staging environment first
- Schedule a maintenance window (major upgrades may require brief downtime)
- Follow the version-specific migration steps in the release notes
## Upgrade Troubleshooting
### Pods stuck in CrashLoopBackOff after upgrade
This usually indicates a configuration incompatibility or failed migration:
```shell
# Check pod logs
kubectl logs -n datafly <pod-name> --previous

# If migration failed, check the init container logs
kubectl logs -n datafly <management-api-pod> -c migrate
```

If the migration init container is failing, see Troubleshooting — Migration Issues.
### New pods not starting — ImagePullBackOff
The new image version may not be accessible:
```shell
kubectl describe pod <pod-name> -n datafly | grep -A5 "Events"
```

Verify that:

- The image tag exists in the container registry
- `imagePullSecrets` are configured if using a private registry
- Network policies allow egress to the container registry
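For the private-registry case, the pull secret reference in values might look like this sketch, where the secret name is a placeholder and the exact key path depends on your chart:

```yaml
imagePullSecrets:
  - name: ghcr-pull-secret   # placeholder; the secret must exist in the datafly namespace
```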
### HPA not scaling after upgrade
If autoscaling stops working after an upgrade, check that the metrics server is running and the HPA targets are valid:
```shell
kubectl get hpa -n datafly
kubectl describe hpa ingestion-gateway -n datafly
```

## Next Steps
- Set up automated backups with Backup & DR
- Monitor upgrade health with Observability
- See Troubleshooting for resolving common issues