In modern web development, “deploy strategy” means more than just pushing the new version to the servers. Blue/green, canary, and rolling are three popular approaches, each with a different risk, cost, and complexity trade-off.
Here are the production deploy scenarios I’ve collected over 19 years.
Rolling deploy
The most common and simplest approach. Upgrade the instances in your server cluster to the new version one at a time.
Process:
- Server 1: old version to new version (30 seconds offline)
- Server 2: old to new (30 seconds offline)
- Server 3, 4, 5… same
The load balancer health check ensures only healthy servers serve traffic.
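To make that loop concrete, here's a minimal Python sketch of a rolling deploy with a health gate. Everything in it is hypothetical: the host names, the deploy_to() helper, and the /healthz path are made up, and in real setups the orchestrator (Kubernetes, ECS) runs this loop for you.

```python
import time
import urllib.request

SERVERS = ["app-1.internal", "app-2.internal", "app-3.internal"]  # hypothetical hosts
HEALTH_PATH = "/healthz"    # assumed health-check endpoint, same one the LB polls
NEW_VERSION = "v2.4.0"

def deploy_to(host: str, version: str) -> None:
    """Placeholder for the real upgrade step (SSH, agent call, image pull...)."""
    print(f"deploying {version} to {host}")
    time.sleep(1)  # stand-in for the ~30 second upgrade window

def is_healthy(host: str) -> bool:
    """Hit the same health endpoint the load balancer uses."""
    try:
        with urllib.request.urlopen(f"http://{host}{HEALTH_PATH}", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def rolling_deploy() -> None:
    for host in SERVERS:
        deploy_to(host, NEW_VERSION)
        # Wait until the upgraded server passes its health check before moving
        # on; abort early instead of walking a broken version across the cluster.
        deadline = time.time() + 120
        while not is_healthy(host):
            if time.time() > deadline:
                raise RuntimeError(f"{host} never became healthy, stopping rollout")
            time.sleep(5)
        print(f"{host} healthy on {NEW_VERSION}")

if __name__ == "__main__":
    rolling_deploy()
```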
Pros:
- Simple setup
- No extra infrastructure
- Low memory overhead
- Most Kubernetes deployments default to this
Cons:
- Rollback is gradual (every server has to roll back)
- Brief version mismatch (some users on v1, some on v2)
- Database schema migration is tricky
- Partial failure mid-deploy is messy
When to use it:
- Small to medium team
- Standard web app
- No breaking changes
- Cost-conscious
Blue/green deploy
Two identical production environments. Blue is current, Green is new. The traffic switch is atomic.
Process:
- Blue environment is live (current version)
- Deploy new version to Green (no traffic yet)
- Test Green (smoke tests, performance checks)
- Load balancer switches traffic to Green
- Blue sits idle, ready for rollback
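A minimal sketch of the cutover, assuming a reverse proxy that picks its upstream pool from a small state file. The pool URLs, the file path, and the /healthz endpoint are all made up; with a cloud load balancer this would be an API call instead of a file write.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical setup: the reverse proxy reads this file to decide which
# upstream pool ("blue" or "green") receives production traffic.
ACTIVE_POOL_FILE = Path("active_pool.json")

POOLS = {
    "blue":  "http://blue.internal:8080",
    "green": "http://green.internal:8080",
}

def smoke_test(base_url: str) -> bool:
    """Minimal smoke test: the new environment must answer its health check."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def switch_to(pool: str) -> None:
    """The 'atomic' part: one small write flips all traffic at once."""
    ACTIVE_POOL_FILE.write_text(json.dumps({"active": pool}))
    print(f"traffic now routed to {pool}")

def blue_green_cutover() -> None:
    if not smoke_test(POOLS["green"]):
        raise RuntimeError("green failed smoke tests, staying on blue")
    switch_to("green")   # blue stays warm, ready for rollback
    # Rollback, if needed, is just: switch_to("blue")

if __name__ == "__main__":
    blue_green_cutover()
```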
Pros:
- Zero-downtime deploy
- Instant rollback (switch traffic back)
- Pre-production testing on an actual environment
- Database migrations are easier
- No version mismatch
Cons:
- Double the infrastructure cost
- Stateful app migration is complex
- Session handling needs care
- Extra operational complexity
When to use it:
- High-traffic production
- Zero-downtime requirement
- Critical systems (payment, healthcare)
- Rollback speed is critical
Canary deploy
Progressive rollout. Deploy to a small user group first, expand if it goes well.
Process:
- Deploy v2 to one server (5 to 10% of the cluster)
- Monitor for 15 to 30 minutes (error rate, latency, business metrics)
- If OK, expand to 25%, monitor
- If OK, 50%
- If OK, 100%
If an issue appears: immediate rollback (only 5% of traffic was affected).
Advanced version: route by user ID hash. The same user always hits the same version.
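A minimal sketch of that kind of routing (the function name and bucket labels are made up): hash the user ID, map it to a stable 0-99 bucket, and compare against the current canary percentage.

```python
import hashlib

def canary_bucket(user_id: str, canary_percent: int) -> str:
    """Deterministically map a user to 'canary' or 'stable'.

    The same user_id always lands in the same bucket, so a user never
    bounces between v1 and v2 mid-session.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100          # stable value in 0..99
    return "canary" if bucket < canary_percent else "stable"

# Rolling the canary forward is just raising the percentage:
for pct in (5, 25, 50, 100):
    routed = sum(canary_bucket(f"user-{i}", pct) == "canary" for i in range(10_000))
    print(f"{pct}% target -> {routed / 100:.1f}% of users actually on canary")
```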
Pros:
- Lowest-risk deployment
- Real user testing on real data
- Gradual confidence building
- Easy rollback
Cons:
- Complex traffic splitting
- Feature-flag-like complexity
- Database schema changes require care (backward compat across versions)
- Longer deploy cycle (hours vs minutes)
When to use it:
- High-stakes changes (core algorithms, payment logic)
- Large user base
- Breaking change expected but not certain
- A/B testing alongside
Comparison matrix
| Aspect | Rolling | Blue/Green | Canary |
|--------|---------|------------|--------|
| Infrastructure cost | Low | High (2x) | Medium |
| Rollback speed | Slow | Instant | Medium |
| Risk level | Medium | Low | Lowest |
| Complexity | Low | Medium | High |
| Zero downtime | No (brief) | Yes | Yes |
| Database migration | Hard | Medium | Hardest |
| User-level consistency | No | Yes | Yes (with routing) |
Database schema migration
The sneakiest part of any deploy strategy: should code and schema change at the same time?
Rolling deploy: v1 and v2 coexist for a few minutes, so the schema has to be compatible with both. That’s exactly what the expand-contract pattern is for.
Phases:
1. Expand: add the new column (nullable, or with a default value)
2. Deploy new code: writes and reads the new column
3. Backfill: populate the new column for old data
4. Contract: drop the old column (next deploy)
Spread across 3 to 4 deploy cycles. Requires patience.
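A toy sketch of the four phases using an in-memory SQLite table. The users table and the full_name-to-first/last split are invented, and a real project would run these steps through its migration tool over several deploy cycles.

```python
import sqlite3

# Toy schema: we want to replace users.full_name with first/last columns
# without breaking the old code that still reads full_name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")

# Phase 1 - Expand: add the new columns as nullable; old code keeps working.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")

# Phase 2 happens in application code: the new version writes (and reads)
# both the old and the new columns.

# Phase 3 - Backfill: populate the new columns for existing rows.
rows = conn.execute("SELECT id, full_name FROM users").fetchall()
for row_id, full_name in rows:
    first, _, last = full_name.partition(" ")
    conn.execute(
        "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
        (first, last, row_id),
    )

# Phase 4 - Contract (a later deploy, once nothing reads full_name anymore).
# Note: DROP COLUMN needs SQLite >= 3.35.
conn.execute("ALTER TABLE users DROP COLUMN full_name")
```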
Blue/green: Green could have its own database, but replication gets hairy. Usually both environments share one DB with an expand-contract schema.
Canary: v1 and v2 run in parallel even longer. Schema compatibility matters the most here.
Tool support
Kubernetes: rolling by default. Canary via Flagger, Argo Rollouts. Blue/green via Spinnaker.
AWS ECS: rolling by default. Blue/green via CodeDeploy.
Vercel, Netlify: atomic deploy (instant switch). Basic blue/green.
Heroku: rolling by default. Canary via add-ons.
Custom: roll your own routing with Nginx config changes, HAProxy, Traefik.
Real project scenarios
Scenario 1: SaaS e-commerce, 100K monthly users
- Kubernetes cluster, 10 pods
- Regular database migrations
- Critical checkout flow
Decision: rolling by default, canary for checkout changes. Blue/green infrastructure cost isn’t worth it.
Scenario 2: mobile app backend, 1M+ users
- High traffic, low-latency requirement
- Payment processing at the core
Decision: blue/green. Zero downtime is mandatory. Instant rollback matters for payment issues.
Scenario 3: analytics platform, internal users
- 500 internal users
- Complex data pipelines
- Latency is tolerable
Decision: rolling. Canary or blue/green would be overkill. Internal user feedback is fast.
Scenario 4: fintech, critical ML model update
- Trading algorithms
- Millions ride on each decision
Decision: aggressive canary. 1% rollout, 2 weeks of monitoring, then expand. Risk minimization is paramount.
Monitoring during deploy
What to watch during a deploy:
1. Error rate. Is it climbing from baseline? 0.5% tolerance is typical.
2. Latency p50/p95/p99. Any regression?
3. Business metrics. Checkout conversion, signups, and so on. Any anomaly after deploy?
4. Infrastructure health. CPU, memory, disk I/O. Hardware stress?
5. Synthetic checks. Smoke tests every minute.
A failed-deploy state is an automatic rollback trigger; the thresholds are defined in the CI/CD pipeline.
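A sketch of what such a guardrail can look like in the pipeline. The metric names, baselines, and tolerances are illustrative, not prescriptive.

```python
from dataclasses import dataclass

@dataclass
class DeployMetrics:
    error_rate: float          # fraction of failed requests, e.g. 0.007 = 0.7%
    p99_latency_ms: float
    checkout_conversion: float

# Hypothetical baseline and tolerances, in the spirit of the list above.
BASELINE = DeployMetrics(error_rate=0.002, p99_latency_ms=450, checkout_conversion=0.031)
MAX_ERROR_RATE_DELTA = 0.005     # 0.5 percentage points over baseline
MAX_LATENCY_REGRESSION = 1.20    # p99 may grow at most 20%
MAX_CONVERSION_DROP = 0.90       # conversion may fall at most 10%

def should_rollback(current: DeployMetrics) -> bool:
    """Return True when any guardrail is breached; the pipeline would then
    run the rollback step automatically."""
    if current.error_rate > BASELINE.error_rate + MAX_ERROR_RATE_DELTA:
        return True
    if current.p99_latency_ms > BASELINE.p99_latency_ms * MAX_LATENCY_REGRESSION:
        return True
    if current.checkout_conversion < BASELINE.checkout_conversion * MAX_CONVERSION_DROP:
        return True
    return False

print(should_rollback(DeployMetrics(error_rate=0.009, p99_latency_ms=480,
                                    checkout_conversion=0.030)))
# -> True: error rate is 0.7 points above baseline, past the 0.5 tolerance
```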
Common failure modes
Rolling deploy failure: 50% of servers are on the new version, and the new version is buggy. Rollback has to walk every upgraded server back to the old version, so there is real downtime risk.
Fix: extensive staging tests before the deploy, and a rolling deploy that fails fast: smoke-test the first upgraded pod before touching the rest.
Blue/green switchover disaster: the Green environment passed tests, but once it takes production traffic, unknown behavior emerges (real data, real load). Rollback is instant, but some users are affected.
Fix: route pre-production traffic to Green (mirror real requests, compare responses).
Canary silent failure: the 5% group is hitting a bug but metrics don’t reflect it. The rollout slowly widens to 100%.
Fix: comprehensive metric coverage. A user feedback channel. Error tracking on every path.
Rollback strategy
An important part of deploy strategy: rollback.
Rolling: gradual rollback to the previous version. Every server downgrades.
Blue/green: load balancer switch. Instant.
Canary: disable the routing rule. The new version stops receiving traffic.
Target: rollback under 5 minutes. During a production issue, every minute is lost revenue.
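For a Kubernetes rolling deployment, that target is easy to measure. A minimal sketch, assuming kubectl access; the deployment name is made up.

```python
import subprocess
import time

DEPLOYMENT = "deployment/web"   # hypothetical deployment name
ROLLBACK_BUDGET_S = 300         # the "under 5 minutes" target

def rollback() -> float:
    """Revert to the previous ReplicaSet and wait until the rollout settles."""
    start = time.monotonic()
    subprocess.run(["kubectl", "rollout", "undo", DEPLOYMENT], check=True)
    subprocess.run(["kubectl", "rollout", "status", DEPLOYMENT,
                    "--timeout", f"{ROLLBACK_BUDGET_S}s"], check=True)
    return time.monotonic() - start

if __name__ == "__main__":
    elapsed = rollback()
    print(f"rollback completed in {elapsed:.0f}s "
          f"({'within' if elapsed <= ROLLBACK_BUDGET_S else 'OVER'} budget)")
```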
Testing deploys
How do you test deploy automation?
1. A staging environment that mirrors production. Test deploys there.
2. Chaos engineering. Random pod kills, network blips during deploy. Test recovery.
3. Game days. Quarterly. Deploy plus failure scenario simulation. Test team reaction.
4. Disaster recovery drills. “Production deploy failed, do the rollback drill”. Timed.
Without this discipline, a real failed deploy means panic mode.
Wrap-up
Deploy strategy is a function of risk tolerance, infrastructure budget, and operational capability. Rolling is enough for most projects. Blue/green for high-stakes systems. Canary for critical changes.
Database schema migration makes the expand-contract pattern essential. Rollback strategy is the most critical piece of preparation.
Monitoring during deploy is as important as monitoring after. Early detection plus instant rollback saves production.