# Upgrades
This guide covers the standard upgrade process for Datafly Signal, including pre-upgrade checks, the upgrade procedure, rollback steps, and handling major version upgrades.
## Version Policy
Datafly Signal follows Semantic Versioning:
| Version Change | Meaning | Migration Required |
|---|---|---|
| Patch (1.2.3) | Bug fixes only | No |
| Minor (1.3.0) | New features, backward-compatible | Possible (database migrations) |
| Major (2.0.0) | Breaking changes | Yes (see release notes) |
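To judge what a target version implies, compare it with the running version component by component. A minimal bash sketch (a hypothetical helper, not part of the Datafly tooling) classifying the change per the table above:

```shell
# Classify a version change as major, minor, or patch (SemVer).
# Hypothetical helper -- not shipped with Datafly Signal.
semver_change() {
  local from_major from_minor from_patch to_major to_minor to_patch
  IFS=. read -r from_major from_minor from_patch <<< "$1"
  IFS=. read -r to_major to_minor to_patch <<< "$2"
  if   [ "$from_major" != "$to_major" ]; then echo "major"
  elif [ "$from_minor" != "$to_minor" ]; then echo "minor"
  else echo "patch"
  fi
}

semver_change 1.2.3 1.3.0   # prints "minor": expect possible database migrations
```

A `major` result means reading the migration guide and following the Major Version Upgrades steps below.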
## Pre-Upgrade Checklist
Before upgrading, verify:
- Read the release notes for the target version
- Check current health — all pods should be `Running` and `Ready`
- Check Kafka consumer lag — ensure all consumers are caught up
- Back up the database (see Backup & DR)
- Test in staging first if upgrading a major version
```shell
# Verify current status
kubectl get pods -n datafly
kubectl get hpa -n datafly

# Check Kafka consumer lag (if using Prometheus)
# Or check via Kafka UI / CLI
```

## Standard Upgrade
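For the consumer-lag check, one option is to sum the `LAG` column of `kafka-consumer-groups.sh --describe` output. The awk sketch below assumes the standard Kafka CLI table layout (lag in the sixth column), which can differ across Kafka versions:

```shell
# Sum the LAG column (6th field) of `kafka-consumer-groups.sh --describe`
# output; assumes the standard header plus one row per partition.
total_lag() {
  awk 'NR > 1 && $6 ~ /^[0-9]+$/ { sum += $6 } END { print sum + 0 }'
}

# Example with captured output (in practice, pipe the real command in):
total_lag <<'EOF'
GROUP           TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
event-processor events  0          1500            1500            0    c1           /10.0.0.1  cli-1
event-processor events  1          1490            1495            5    c2           /10.0.0.2  cli-2
EOF
# prints 5
```

A total of 0 means all consumers are caught up and it is safe to proceed.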
### Step 1: Update the Helm chart
```shell
# Check available versions
helm search repo datafly --versions

# Or if using OCI:
helm show chart oci://ghcr.io/datafly/charts/datafly --version 1.3.0
```

### Step 2: Review changes
Compare the default values between your current version and the target version to identify any new configuration options:
```shell
helm show values oci://ghcr.io/datafly/charts/datafly --version 1.3.0 > values-new.yaml
diff values-current.yaml values-new.yaml
```

### Step 3: Run the upgrade
```shell
helm upgrade datafly oci://ghcr.io/datafly/charts/datafly \
  --namespace datafly \
  --values values-production.yaml \
  --version 1.3.0
```

If `migrations.enabled: true` in your values (recommended), database migrations run automatically as an init container on the Management API pod before the service starts. No manual migration step is required.
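As a reference, the corresponding values fragment might look like the following; `migrations.enabled` is the key named above, while the surrounding layout is an assumption to verify against your chart's `values.yaml`:

```yaml
migrations:
  enabled: true   # run schema migrations as an init container before the Management API starts
```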
### Step 4: Monitor the rollout
```shell
# Watch pods restart with the new version
kubectl rollout status deployment/ingestion-gateway -n datafly
kubectl rollout status deployment/event-processor -n datafly
kubectl rollout status deployment/management-api -n datafly

# Verify all pods are running
kubectl get pods -n datafly

# Check for errors in logs
kubectl logs -n datafly -l app.kubernetes.io/part-of=datafly --tail=50 --since=5m
```

### Step 5: Verify
```shell
# Test event ingestion
curl -X POST https://data.yourdomain.com/v1/t \
  -H "Content-Type: application/json" \
  -d '{"type":"track","event":"Upgrade Test","properties":{}}'

# Verify Management UI is accessible
curl -s -o /dev/null -w "%{http_code}" https://app.yourdomain.com
```

## Rollback
If the upgrade causes issues, roll back to the previous version:
```shell
# Check Helm release history
helm history datafly -n datafly

# Roll back to the previous revision
helm rollback datafly -n datafly

# Or roll back to a specific revision
helm rollback datafly 3 -n datafly
```

`helm rollback` reverts the Kubernetes resources but does not revert database migrations. If the new version included database schema changes, see "Reverting Migrations" below.
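When scripting a rollback, the previous revision number can be recovered from the `helm history` table. This sketch assumes the default table output, where replaced releases show status `superseded`:

```shell
# Print the most recent "superseded" revision number from
# `helm history` table output (the revision number is the first field).
prev_revision() {
  awk '$0 ~ /superseded/ { rev = $1 } END { print rev + 0 }'
}

# Example with captured output
# (in practice: helm history datafly -n datafly | prev_revision)
prev_revision <<'EOF'
REVISION  UPDATED                   STATUS      CHART          DESCRIPTION
3         Mon Jan  6 10:00:00 2025  superseded  datafly-1.2.0  Upgrade complete
4         Tue Feb  4 10:00:00 2025  deployed    datafly-1.3.0  Upgrade complete
EOF
# prints 3
```

The result can then feed `helm rollback datafly <revision> -n datafly`.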
### Reverting Migrations
If you need to revert database migrations after a rollback:

1. Check which migrations were applied:

   ```shell
   kubectl exec -n datafly deploy/management-api -- \
     /app/migrate version
   ```

2. Revert migrations one at a time:

   ```shell
   kubectl run migrate-down --rm -it \
     --namespace datafly \
     --image=ghcr.io/datafly/management-api:v1.2.0 \
     --env="DATABASE_URL=postgresql://..." \
     -- /app/migrate down 1
   ```

3. Verify the migration state matches the rolled-back version.
## Zero-Downtime Upgrades
Datafly Signal supports zero-downtime upgrades when:
- PodDisruptionBudgets are enabled (`pdb.enabled: true`) — ensures at least one pod per service remains available during the rollout
- Rolling update strategy is used (Helm chart default) — Kubernetes replaces pods one at a time
- Readiness probes pass before traffic is routed to new pods
The default Helm chart configuration supports zero-downtime upgrades for all services with replicas >= 2.
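Expressed as chart values, those conditions might look like this; `pdb.enabled` appears above, but `replicaCount` is an assumed key name, so check your chart's `values.yaml`:

```yaml
pdb:
  enabled: true   # PodDisruptionBudget keeps at least one pod per service available
replicaCount: 2   # zero-downtime requires at least 2 replicas per service
```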
## Major Version Upgrades
Major versions (e.g., 1.x → 2.x) may include breaking changes. Follow these additional steps:
- Read the migration guide included in the release notes
- Back up everything — database, Kafka topic data, Redis state
- Test in a staging environment first
- Schedule a maintenance window (major upgrades may require brief downtime)
- Follow the version-specific migration steps in the release notes
## Upgrade Troubleshooting
### Pods stuck in CrashLoopBackOff after upgrade
This usually indicates a configuration incompatibility or failed migration:
```shell
# Check pod logs
kubectl logs -n datafly <pod-name> --previous

# If migration failed, check the init container logs
kubectl logs -n datafly <management-api-pod> -c migrate
```

If the migration init container is failing, see Troubleshooting — Migration Issues.
### New pods not starting — ImagePullBackOff
The new image version may not be accessible:
```shell
kubectl describe pod <pod-name> -n datafly | grep -A5 "Events"
```

Verify that:

- The image tag exists in the container registry
- `imagePullSecrets` are configured if using a private registry
- Network policies allow egress to the container registry
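For the private-registry case, the pull secret reference in values might look like this sketch, where the secret name is a placeholder and the exact key path depends on your chart:

```yaml
imagePullSecrets:
  - name: ghcr-pull-secret   # placeholder; the secret must exist in the datafly namespace
```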
### HPA not scaling after upgrade
If autoscaling stops working after an upgrade, check that the metrics server is running and the HPA targets are valid:
```shell
kubectl get hpa -n datafly
kubectl describe hpa ingestion-gateway -n datafly
```

## Next Steps
- Set up automated backups with Backup & DR
- Monitor upgrade health with Observability
- See Troubleshooting for resolving common issues