Event-driven design is one of those patterns that looks obviously correct on a whiteboard and turns subtle the moment it ships. We use it constantly and we also pull it out of systems that adopted it for the wrong reasons. The trick is knowing which of your problems are actually event-shaped.
Where it genuinely pays off
Events shine when one thing happening should trigger several independent reactions that you do not want the originator to know about. An order is placed, and billing, inventory, fulfilment, and analytics all need to react. With EventBridge or an SNS-to-SQS fan-out, the order service emits one event and stays ignorant of who listens. You add a fifth consumer later without touching the producer at all.
- Fan-out: one fact, many independent reactions that evolve at different speeds.
- Buffering spikes: a queue in front of a slow consumer turns a flood into a steady drip.
- Audit and replay: an event log is a record of what happened, not just current state.
- Team boundaries: producers and consumers ship on their own schedules, coordinating only on the event contract.
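To make the fan-out point concrete, here is a minimal in-memory sketch. The `EventBus` class, the `order.placed` event name, and the consumer names are all hypothetical stand-ins for whatever your real topic or bus provides; none of this is an AWS API. The thing to notice is that `place_order` never changes when a consumer is added.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a fan-out topic (hypothetical, not an AWS client)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver one event to every subscriber and return how many were notified."""
        handlers = list(self._subscribers[event_type])
        for handler in handlers:
            handler(dict(payload))  # each consumer gets its own copy of the payload
        return len(handlers)


def place_order(bus: EventBus, order_id: str) -> int:
    # The producer emits one fact and knows nothing about who reacts to it.
    return bus.publish("order.placed", {"order_id": order_id})


reactions: list[str] = []


def make_consumer(name: str) -> Handler:
    # Each consumer just records that it reacted; real ones would do real work.
    def handle(event: dict) -> None:
        reactions.append(f"{name}:{event['order_id']}")
    return handle


bus = EventBus()
for consumer_name in ("billing", "inventory", "fulfilment", "analytics"):
    bus.subscribe("order.placed", make_consumer(consumer_name))

notified_before = place_order(bus, "o-1")

# Adding a fifth consumer touches only the subscription list, not place_order.
bus.subscribe("order.placed", make_consumer("fraud"))
notified_after = place_order(bus, "o-2")
```

A real bus delivers asynchronously and can redeliver, which this synchronous loop deliberately ignores. It only illustrates the direction of the dependency.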
Where it bites
The pain starts when a single business operation is actually a chain of events across services, and someone needs to answer a plain question: did this order go through? Now the answer lives in five logs, correlated by an ID you hopefully remembered to propagate. Eventual consistency, which read so cleanly in the design doc, becomes a support ticket about a customer who sees their payment but not their confirmation.
Events are great at telling you that something happened and terrible at telling you whether the whole thing finished.
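A rough sketch of what answering that question looks like when the only evidence is per-service logs. The log layout, the event names, and the `REQUIRED` set are assumptions made up for illustration; real logs live in separate systems and rarely agree this neatly on field names.

```python
# Hypothetical per-service logs, each entry tagged with a propagated correlation ID.
service_logs = {
    "orders": [
        {"correlation_id": "c-1", "event": "order.placed"},
        {"correlation_id": "c-2", "event": "order.placed"},
    ],
    "billing": [
        {"correlation_id": "c-1", "event": "payment.captured"},
        {"correlation_id": "c-2", "event": "payment.captured"},
    ],
    "fulfilment": [
        {"correlation_id": "c-1", "event": "order.confirmed"},
    ],
}

# Assumption: an order "went through" once all of these events have been seen.
REQUIRED = {"order.placed", "payment.captured", "order.confirmed"}


def order_status(logs: dict, correlation_id: str) -> dict:
    """Collect every event seen for one correlation ID and report what is missing."""
    seen = {
        entry["event"]
        for entries in logs.values()
        for entry in entries
        if entry["correlation_id"] == correlation_id
    }
    missing = REQUIRED - seen
    return {"complete": not missing, "missing": sorted(missing)}


status_c1 = order_status(service_logs, "c-1")
status_c2 = order_status(service_logs, "c-2")  # paid, but never confirmed
```

Even this toy version cannot tell "not confirmed yet" from "will never be confirmed", which is exactly the gap events leave. Without the correlation ID on every entry, the query is not possible at all.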
Then there is the reliability tax. Every consumer needs to be idempotent because at-least-once delivery means duplicates will arrive. Ordering is not guaranteed unless you pay for it, usually in throughput or in the extra work of partitioning by key. You need dead-letter queues, and you need someone to actually watch them, because a poison message that silently retries forever is a failure mode you only discover when the bill or the backlog explodes.
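Here is a minimal sketch of those two obligations together: deduplicating on a message ID and parking a poison message after a bounded number of attempts. The `Consumer` class, the `MAX_ATTEMPTS` budget, and the message shape are assumptions for illustration. A production consumer would keep the processed IDs in a durable store and would normally let the queue service enforce the retry limit.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before a message is dead-lettered


class Consumer:
    """Toy consumer: skips duplicates and dead-letters messages that keep failing."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_ids = set()  # in production this must be a durable store
        self.dead_letters = []

    def drain(self, messages):
        queue = deque((message, 1) for message in messages)
        while queue:
            message, attempt = queue.popleft()
            if message["id"] in self.processed_ids:
                continue  # duplicate delivery: already handled, do nothing
            try:
                self.handler(message)
            except Exception as error:
                if attempt >= MAX_ATTEMPTS:
                    # Stop retrying and park the message where someone can see it.
                    self.dead_letters.append({"message": message, "error": str(error)})
                else:
                    queue.append((message, attempt + 1))
                continue
            self.processed_ids.add(message["id"])


charged = []


def charge(message):
    # Hypothetical handler: rejects an obviously bad message every time.
    if message["amount"] < 0:
        raise ValueError("negative amount")
    charged.append(message["id"])


consumer = Consumer(charge)
consumer.drain([
    {"id": "m-1", "amount": 10},
    {"id": "m-1", "amount": 10},  # duplicate delivery of the same message
    {"id": "m-2", "amount": -5},  # poison message: fails on every attempt
    {"id": "m-3", "amount": 7},
])
```

Note that recording the ID after the handler runs is not atomic with the side effect. If the process dies between the two, the message is handled twice, so the handler itself should ideally be safe to repeat.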
How we decide
We reach for events when the coupling we are removing is real and painful, and we keep a plain synchronous call when the flow is genuinely one step that must succeed or fail together. For multi-step business processes that need a definitive outcome, we use an explicit orchestrator like Step Functions rather than an implicit choreography of events nobody can trace.
- Choreography (events) for loose fan-out where consumers are truly independent.
- Orchestration (Step Functions) when one workflow must complete and you need to see its state.
- Make every consumer idempotent on day one, not after the first duplicate incident.
- Propagate a correlation ID through every event so tracing is possible at all.
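To show what "you need to see its state" buys you, here is a small orchestrator sketch in plain Python. It is not the Step Functions API. `WorkflowRun`, `run_workflow`, and the three step functions are hypothetical names, and the sketch leaves out retries and compensation entirely. The point is only that a single run object holds the outcome.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Step = tuple[str, Callable[[dict], None]]


@dataclass
class WorkflowRun:
    """One place that records how far a workflow got and how it ended."""
    correlation_id: str
    status: str = "RUNNING"
    completed_steps: list[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def run_workflow(steps: list[Step], order: dict, correlation_id: str) -> WorkflowRun:
    run = WorkflowRun(correlation_id=correlation_id)
    # Every step receives the correlation ID, so downstream logs can be joined.
    context = {**order, "correlation_id": correlation_id}
    for name, action in steps:
        try:
            action(context)
        except Exception:
            run.status = "FAILED"
            run.failed_step = name
            return run
        run.completed_steps.append(name)
    run.status = "SUCCEEDED"
    return run


def reserve_stock(ctx: dict) -> None:
    ctx["reserved"] = True


def charge_card(ctx: dict) -> None:
    # Hypothetical rule so the example has a failing path.
    if ctx["total"] > 100:
        raise RuntimeError("card declined")


def send_confirmation(ctx: dict) -> None:
    ctx["confirmed"] = True


STEPS: list[Step] = [
    ("reserve_stock", reserve_stock),
    ("charge_card", charge_card),
    ("send_confirmation", send_confirmation),
]

ok_run = run_workflow(STEPS, {"order_id": "o-1", "total": 40}, "c-1")
failed_run = run_workflow(STEPS, {"order_id": "o-2", "total": 400}, "c-2")
```

With this shape, "did this order go through" is a lookup of one record rather than a join across five logs. A real orchestrator would also need to undo `reserve_stock` when `charge_card` fails.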
Event-driven architecture is a sharp tool. Used on the right seams it removes coupling you will be glad to lose. Sprayed across a flow that was always one transaction, it just hides the logic and hands you a debugging problem you will pay for later.