In microservice architectures, the moment someone says “distributed transaction”, the saga pattern enters the chat. Orchestration, choreography, compensating actions, state machines. It is complex, and it carries a real operational load.
On plenty of projects that complexity isn’t required. A simple compensating-action approach is enough. Here are lighter alternatives.
Why not two-phase commit?
In a monolith you use database transactions. Commit or rollback. Easy. In microservices, three services have three databases. Two-phase commit (2PC) looks like the answer on paper, but in practice:
- Every service has to support 2PC (most don’t)
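- If the coordinator goes down after participants have prepared, they stay blocked, holding their locks, until it recovers
- Performance is poor (two round trips per transaction, with locks held across both)
- It’s failure-prone in cloud-native environments, where instances restart and networks partition routinely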
That’s why 2PC is almost never used in modern microservices. You need an alternative.
The saga pattern
With a saga you run each step sequentially. If a step fails, you run the compensating action for every prior step to undo them.
Order flow example:
1. Create order (Order service)
2. Authorize payment (Payment service)
3. Reserve inventory (Inventory service)
4. Schedule shipping (Shipping service)
If step 3 fails, you void the payment and then cancel the order, undoing the completed steps in reverse. Those are the compensating actions.
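The order flow above can be sketched as a tiny in-process runner. This is a minimal illustration, not a production saga engine: the `Step` type, `run_saga`, and the compensation names (`cancel_order`, `void_payment`, and so on) are hypothetical, and the real service calls are replaced with stubs. It also assumes compensations themselves never fail, which a real system cannot assume.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            log.append(f"{step.name} failed: {exc}")
            # Undo only what actually completed, newest first.
            for done in reversed(completed):
                done.compensate()
            return False
        log.append(f"{step.name} ok")
        completed.append(step)
    return True


log: List[str] = []


def reserve_inventory() -> None:
    # Stub that simulates step 3 failing.
    raise RuntimeError("out of stock")


steps = [
    Step("create_order", lambda: None, lambda: log.append("cancel_order")),
    Step("authorize_payment", lambda: None, lambda: log.append("void_payment")),
    Step("reserve_inventory", reserve_inventory, lambda: log.append("release_inventory")),
    Step("schedule_shipping", lambda: None, lambda: log.append("cancel_shipping")),
]
succeeded = run_saga(steps, log)
```

After the run, `log` records the two successful steps, the failure, and then `void_payment` followed by `cancel_order`. Shipping is never scheduled, and the failed step itself is not compensated.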
Sagas come in two flavours:
Orchestration: a central orchestrator runs each step and holds the state. Clean, but a single point of failure.
Choreography: there is no central coordinator. Each service subscribes to the events it cares about and reacts on its own. Decoupled, but the logical flow is hard to follow.
Both carry real implementation complexity.
A simpler alternative: compensating actions + eventual consistency
Most business scenarios don’t need the full saga dance. The pattern I use:
1. Local transaction first. Whatever your service can do inside its own transaction, do it: write the business data and record the event you want to publish.
2. Event publishing via the outbox pattern. In the same transaction, write both the business data and the event to the database. A separate process then moves the event to the queue (CDC or polling).
3. Other services listen for the event and do their work. If one fails, it publishes a compensating event.
4. Monitoring and alerting. You automatically detect states that fell out of sync and alert a human.
That approach gets you 80% of what a saga gives you, at maybe 30% of the implementation complexity.
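Steps 1 and 2 can be sketched with SQLite standing in for the service database. This is a sketch under assumptions: the `orders` and `outbox` tables, the `order.created` event type, and the `publish` callback are all hypothetical, and the relay here polls rather than using CDC.

```python
import json
import sqlite3
import uuid
from typing import Callable, List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id TEXT PRIMARY KEY, type TEXT NOT NULL, "
        "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )


def create_order(conn: sqlite3.Connection, order_id: str) -> None:
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "created")
        )
        conn.execute(
            "INSERT INTO outbox (id, type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "order.created", json.dumps({"order_id": order_id})),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, str, dict], None]) -> int:
    """Polling relay: publish unpublished events, then mark them as published."""
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_id, event_type, json.loads(payload))
        # A crash between publish and this update re-sends the event later,
        # so delivery is at-least-once and consumers must be idempotent.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
init_db(conn)
create_order(conn, "order-1")

sent: List[Tuple[str, dict]] = []
first_batch = relay_once(conn, lambda eid, etype, body: sent.append((etype, body)))
second_batch = relay_once(conn, lambda eid, etype, body: sent.append((etype, body)))
```

The first relay pass publishes one event and the second finds nothing left to send. The key property is that the order row and its event commit or roll back together.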
A real example
In Parademi the subscription upgrade flow goes like this:
- User taps upgrade. The Payment service charges the new subscription. On success it publishes subscription.upgraded.
- User service listens for the event and updates the user’s plan.
- Notification service listens and emails the user.
- Analytics service listens and records the upgrade metric.
If payment succeeds, the three consumers (User, Notification, Analytics) run in parallel and independently, each with its own retry logic. One failing doesn’t affect the others.
If payment fails, no event is published and nothing else changes. The user sees an error and retries.
Strictly speaking, that isn’t a saga. No compensating actions are needed because each step is idempotent and independent. Complexity stays minimal.
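The independence of the consumers can be illustrated with an in-process simulation. It is only a sketch: in a real deployment each service would have its own queue subscription, and the `deliver` function, the handler names, and the retry limit of three are assumptions made for the example.

```python
from typing import Callable, Dict

Event = Dict[str, str]
Handler = Callable[[Event], None]


def deliver(event: Event, handlers: Dict[str, Handler], max_attempts: int = 3) -> Dict[str, str]:
    """Give every consumer its own retry loop; one failure never blocks another."""
    results: Dict[str, str] = {}
    for name, handler in handlers.items():
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                results[name] = f"ok after {attempt} attempt(s)"
                break
            except Exception:
                if attempt == max_attempts:
                    # Retries exhausted: park the event for a human to inspect.
                    results[name] = "dead-lettered"
    return results


plans: Dict[str, str] = {}
email_calls = {"count": 0}


def update_plan(event: Event) -> None:
    # Naturally idempotent: setting the plan twice gives the same result.
    plans[event["user_id"]] = event["plan"]


def send_email(event: Event) -> None:
    # Simulates a transient failure on the first attempt only.
    email_calls["count"] += 1
    if email_calls["count"] < 2:
        raise ConnectionError("mail server unavailable")


def record_metric(event: Event) -> None:
    # Simulates a consumer that is down for the whole retry window.
    raise RuntimeError("analytics down")


results = deliver(
    {"type": "subscription.upgraded", "user_id": "u1", "plan": "pro"},
    {"user": update_plan, "notification": send_email, "analytics": record_metric},
)
```

Here the plan update succeeds at once, the email succeeds on its second attempt, and the analytics event ends up dead-lettered without affecting the other two.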
Idempotency is non-negotiable
The foundation of this approach: every consumer has to be idempotent. Receiving the same event twice has to produce the same outcome. How to get there:
Deduplicate by event ID. Store processed event IDs on the consumer side. Skip duplicates.
Natural idempotency. “Set the user’s plan to pro” is the same whether you run it once or twice. “Add 1000 credits to the user” is not. Design for natural idempotency where you can.
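Outbox pattern. Write the event inside the business transaction so the state and the event can never diverge. The relay may deliver an event more than once, which is exactly why consumers have to deduplicate.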
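Deduplication by event ID can be sketched for the credits case, which is not naturally idempotent. The `processed_events` and `credits` tables and the `handle_credit_granted` function are hypothetical names, with SQLite standing in for the consumer's database. The point is that the dedupe record and the balance change share one transaction.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE credits (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )


def handle_credit_granted(
    conn: sqlite3.Connection, event_id: str, user_id: str, amount: int
) -> bool:
    """Apply the event once; return False if this event ID was already processed."""
    try:
        with conn:  # one transaction: dedupe record and balance change together
            # The primary key rejects a second insert of the same event ID.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "INSERT OR IGNORE INTO credits (user_id, balance) VALUES (?, 0)",
                (user_id,),
            )
            conn.execute(
                "UPDATE credits SET balance = balance + ? WHERE user_id = ?",
                (amount, user_id),
            )
    except sqlite3.IntegrityError:
        # Duplicate delivery: the transaction rolled back, nothing changed.
        return False
    return True


conn = sqlite3.connect(":memory:")
init_db(conn)
first = handle_credit_granted(conn, "evt-1", "u1", 1000)
second = handle_credit_granted(conn, "evt-1", "u1", 1000)  # redelivery
balance = conn.execute(
    "SELECT balance FROM credits WHERE user_id = ?", ("u1",)
).fetchone()[0]
```

The redelivered event is skipped, so the balance stays at 1000 rather than 2000.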
When compensating actions are worth the cost
There are cases where compensating actions really are needed:
1. Irreversible side effects. Once a physical shipment leaves the warehouse you can’t “cancel” it. Compensating action: send a return label, refund.
2. External service calls. You called a third-party API and they did something. Compensating: call them again to undo it (if they expose that).
3. Multi-step business processes. A multi-step booking flow, hotel plus flight plus car hire. If one fails, compensate the others.
In those three, a saga pattern earns its keep. Elsewhere, a simple event-driven approach is enough.
Monitoring is critical
Without compensating actions, visibility is the most important thing. How do you detect “event was published but the consumer didn’t process it”?
- Publish and consume metrics. Over any reasonable window, each consumer’s consume rate should match the publish rate. Sustained divergence means lag or failure.
- Dead-letter queue monitoring. Failed events land in the DLQ. If the DLQ isn’t empty, investigate.
- Consistency reports. Periodically compare critical state across services. On a mismatch, alert.
- User-facing error rates. Track complaints like “I upgraded but my plan didn’t change”.
Without that monitoring, eventual consistency eventually becomes eventual inconsistency.
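A consistency report can be as small as a periodic comparison of two snapshots. In this sketch the snapshots are plain dictionaries and `find_mismatches` is a hypothetical helper. A real job would read from each service’s database or API, and would likely skip records that changed very recently so that normal propagation lag does not raise alerts.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, str, Optional[str]]  # (user_id, paid_plan, actual_plan)


def find_mismatches(
    payment_plans: Dict[str, str], user_plans: Dict[str, str]
) -> List[Mismatch]:
    """Return users whose plan in the User service differs from what they paid for."""
    mismatches: List[Mismatch] = []
    for user_id, paid_plan in sorted(payment_plans.items()):
        actual_plan = user_plans.get(user_id)  # None if the user record is missing
        if actual_plan != paid_plan:
            mismatches.append((user_id, paid_plan, actual_plan))
    return mismatches


# Hypothetical snapshots taken from the two services.
payment_snapshot = {"u1": "pro", "u2": "pro", "u3": "basic"}
user_snapshot = {"u1": "pro", "u2": "basic"}

report = find_mismatches(payment_snapshot, user_snapshot)
```

Here the report flags `u2` (paid for pro, still on basic) and `u3` (paid, but no user record). Each row is something to alert on.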
Takeaway
Saga is powerful but overkill for most scenarios. Event-driven compensating actions give you a simpler approach that covers 80% of cases. Idempotent consumers, the outbox pattern, and good monitoring are the three legs.
If you truly have a complex distributed workflow, move up to saga. For simple asynchronous flows, stay away from the unnecessary complexity.