Every SaaS eventually needs “async jobs” somewhere. Sending email, image processing, third-party API calls, scheduled tasks. Running them inside a synchronous request makes the user wait and sometimes causes timeouts.
The answer: a message queue. Async workers pull jobs off the queue and process them in the background. But which queue?
There are three popular options: Redis, RabbitMQ, and AWS SQS. Here are my notes from using each in production.
Redis queue (Redis List or Stream)
You can use Redis as a queue. Simple with the LIST data type, more sophisticated with Streams.
Usage:
# Producer
LPUSH email_queue '{"user_id": 123, "template": "welcome"}'
# Consumer
BRPOP email_queue 0 # blocks until a job arrives
Strengths:
- Speed: microsecond latency. In-memory.
- Simple: LIST is enough for most cases.
- Existing infrastructure: you probably already run Redis for cache, so no new service.
- Persistence possible: AOF or RDB lets you make queues durable.
Weaknesses:
- No native retry logic. BRPOP removes the job from the queue before processing, so a worker crash mid-job loses it. You write the retry logic yourself.
- Dead-letter queue is manual. Moving failed messages to a separate queue is on you.
- No consumer groups or per-message acknowledgment with LIST. For those you need Redis Streams (XADD/XREADGROUP), which is more complex.
- No message priority. LIST is FIFO by default, priorities need custom logic.
- Persistence risk: on a Redis restart, anything not yet in the AOF is gone.
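None of this exists natively, so the worker carries it. Below is a minimal sketch of the retry-plus-dead-letter logic you end up writing yourself; plain Python lists stand in for Redis LISTs (with redis-py you would pop with `brpop` and requeue with `lpush` — the names here are illustrative, not a library API).

```python
# Hand-rolled retry + dead-letter logic a Redis LIST queue forces on you.
# Plain lists stand in for Redis LISTs in this sketch.
import json

MAX_RETRIES = 3

def handle(queue, dead_letter, process):
    """Pop one job, run it, requeue on failure, dead-letter after MAX_RETRIES."""
    raw = queue.pop(0)  # stands in for BRPOP
    job = json.loads(raw)
    try:
        process(job)
    except Exception:
        job["retries"] = job.get("retries", 0) + 1
        if job["retries"] >= MAX_RETRIES:
            dead_letter.append(json.dumps(job))  # manual dead-letter queue
        else:
            queue.append(json.dumps(job))        # requeue for another attempt
```

Note that even this sketch has the crash window described above: a worker that dies between the pop and the requeue still loses the job. Closing that gap needs BRPOPLPUSH/LMOVE into a per-worker processing list, which is exactly the complexity a broker gives you for free.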
RabbitMQ
A dedicated message broker. AMQP protocol. 15+ years of maturity.
Usage:
# Producer
channel.basic_publish(exchange='', routing_key='email_queue', body=message)
# Consumer
channel.basic_consume(queue='email_queue', on_message_callback=callback, auto_ack=False)
channel.basic_ack(delivery_tag) # manual ack, inside the callback
Strengths:
- Rich feature set: priority queues, delayed messages, dead-letter exchange, fanout/topic/direct exchange types.
- Acknowledgment mechanism: even if a worker crashes, the job isn’t lost, it gets redelivered.
- Clustering support: multi-node setup, high availability.
- Flexible routing: topic-based routing, fanout (broadcast).
- Good monitoring: built-in web UI. Queue depth, message rate, consumer status visible at a glance.
Weaknesses:
- Operational complexity: setup and maintenance are serious. Clustering is tricky, and network partitions have nasty edge cases.
- Memory hungry: big queues use a lot of RAM.
- Network overhead: AMQP is verbose, not as fast as Redis.
- Team expertise: tuning RabbitMQ needs someone who’s done it before.
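The ack mechanism is worth seeing in one place. Here is a sketch of the consumer callback shape pika expects: ack on success, nack-and-requeue on failure, so a crashed or failing worker never silently drops a job. `make_callback` and the handler are my names, not a pika API.

```python
# Sketch of a RabbitMQ consumer callback with manual acks (pika-style).
# The broker redelivers anything delivered but never acked, which is the
# crash-safety property described above.
import json

def make_callback(process):
    """Wrap a job handler in ack-on-success / nack-and-requeue-on-failure."""
    def callback(ch, method, properties, body):
        try:
            process(json.loads(body))
            ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception:
            # Requeue for redelivery; a dead-letter exchange catches
            # messages that keep failing.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    return callback
```

You would pass the result to `channel.basic_consume(queue='email_queue', on_message_callback=make_callback(handler))` with `auto_ack=False`, as in the usage snippet above.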
AWS SQS
AWS’s managed queue service. Zero ops, pay-per-use.
Usage:
# Producer
sqs.send_message(QueueUrl=url, MessageBody=json.dumps(data))
# Consumer
response = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=10)
for msg in response['Messages']:
    process(msg)
    sqs.delete_message(QueueUrl=url, ReceiptHandle=msg['ReceiptHandle'])
Strengths:
- Fully managed: zero ops. AWS deals with it.
- Effectively unlimited scale: SQS has no practical limit (rate limits are very high).
- Integrated with the AWS ecosystem: Lambda, SNS, Step Functions.
- Two types: Standard (at-least-once delivery) and FIFO (exactly-once processing, strict ordering).
- Dead-letter queue native. Configure it once, failed messages move automatically.
- Cheap: first 1 million requests a month are free. Normal usage is $1 to $10 a month.
Weaknesses:
- Higher latency: network round trip to AWS. Usually 20 to 50ms.
- Vendor lock-in: tied to AWS. Migrating to another cloud is hard.
- Limited throughput per FIFO queue: 300 msg/sec without batching (3,000 with batching). Standard has no such cap.
- No priority: no native priority on FIFO. You need multiple queues plus priority routing.
- Pull-based: SQS doesn't push to your workers (Lambda event source mappings poll on your behalf). By default you have to run a polling loop.
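That polling loop is boilerplate you write once. A sketch of one receive/process/delete cycle, assuming a boto3-style client; `poll_once` is my name, and the delete-after-success ordering is what makes the visibility timeout do the retrying for you.

```python
# One cycle of the SQS polling loop: receive, process, delete.
def poll_once(sqs, queue_url, process):
    """Handle up to one batch of messages; returns how many were processed."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: fewer empty responses, lower cost
    )
    handled = 0
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        # Delete only after successful processing. If the worker crashes
        # first, the message reappears when its visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        handled += 1
    return handled
```

In production you wrap this in a `while True:` loop (an empty response just means the long poll timed out).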
My decision matrix
Pick Redis:
- Queues are simple (email, notification send)
- Low latency is critical (real-time jobs)
- Low volume (under 1000 msg/day)
- Team has Redis expertise
- You already run Redis (for cache)
Pick RabbitMQ:
- You need the rich queue features (priority, delayed messages, exchange routing)
- On-premise or self-hosted, cloud-agnostic
- Advanced routing (fanout, topic-based)
- Team has RabbitMQ experience
- High throughput plus persistence is critical
Pick AWS SQS:
- You’re already on AWS
- You don’t want ops overhead
- Pay-per-use billing fits
- Vendor lock-in is acceptable (or planned)
- You might need effectively unlimited scale
Picks from actual projects
My calls on real projects:
Project 1 (small SaaS, 10K users):
- Async email, push notifications
- Redis already in place
- Low volume
- Pick: Redis LIST
Project 2 (large SaaS, 500K users):
- Multiple worker types (email, image processing, analytics)
- Priority queue required
- High availability critical
- Pick: RabbitMQ
Project 3 (AWS-hosted, serverless):
- Driving Lambda functions
- Unpredictable traffic
- No ops team
- Pick: AWS SQS
Each one was the right call for its context.
Common patterns
Whichever queue you pick, some patterns are universal:
1. Idempotent job handlers. The same message processed twice has to produce the same result. Critical for retries and duplicate delivery.
2. Exponential backoff retry. A failing job retries after 1s, 2s, 4s, and so on. Stops you from hammering the infrastructure.
3. Dead-letter queue. After N retries, failed jobs move to a separate queue. Manual review from there.
4. Message TTL. Old messages should expire. A message still unprocessed after 7 days is almost certainly irrelevant now.
5. Monitoring and alerting. Queue depth, consumer lag, error rate. I don’t ship a queue to production without these metrics.
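Patterns 1 and 2 are small enough to show in a few lines. A sketch, with illustrative names: an idempotency guard keyed on a message id, and the exponential backoff schedule. In production the seen-ids set would live in Redis or a database, not process memory.

```python
# Pattern 1: idempotent handling via a processed-ids set (illustrative;
# use Redis or a DB table in production so it survives restarts).
_seen_ids = set()

def handle_once(msg_id, payload, process):
    """A duplicate delivery of the same message id becomes a no-op."""
    if msg_id in _seen_ids:
        return False
    process(payload)
    _seen_ids.add(msg_id)  # record only after success, so failures retry
    return True

# Pattern 2: exponential backoff schedule (1s, 2s, 4s, ...).
def backoff_delays(base=1.0, retries=5):
    return [base * (2 ** i) for i in range(retries)]
```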
Throughput vs latency
Which one should you optimise for?
Low latency (real-time): Redis. Microseconds. Things like email confirmation.
High throughput: RabbitMQ or SQS. 10K+ messages per second.
Balanced: SQS. Good enough latency and scale for most cases.
If you need milliseconds, Redis. If seconds are tolerable, SQS. If you need fancy routing, RabbitMQ.
Cost comparison
A 5M messages/month scenario:
- Redis: self-hosted. AWS ElastiCache t3.medium roughly $40/month. But if you already use it for cache, it’s free at the margin.
- RabbitMQ: AWS MQ roughly $100/month. Self-hosted on EC2 is a bit cheaper but has ops overhead.
- SQS: roughly $2/month (first 1M free, then $0.40/1M).
SQS is the cheapest volume-based option. At small scale it’s practically free.
Migrating between them
You start a project on one queue. As it grows you sometimes have to switch.
Good news: with an abstraction layer, migration is easy. Decouple your job publisher/consumer interface from the backend:
class JobQueue:
    def publish(self, job_type, payload): ...
    def consume(self, job_type, handler): ...
Redis, RabbitMQ, or SQS underneath, the worker code doesn't know which.
My pattern: start with the interface plus a Redis implementation. Add a RabbitMQ or SQS implementation when I need it. Switch by config.
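To make the pattern concrete, here is a sketch of the interface with an in-memory backend. The in-memory version doubles as a test double; a `RedisQueue` or `SQSQueue` would implement the same two methods against their clients, and workers stay untouched when you swap by config. All names here are illustrative.

```python
# One interface, swappable backends: the abstraction that makes queue
# migration a config change instead of a rewrite.
from collections import defaultdict, deque

class JobQueue:
    def publish(self, job_type, payload):
        raise NotImplementedError
    def consume(self, job_type, handler):
        raise NotImplementedError

class InMemoryQueue(JobQueue):
    """Backend for tests and local dev; Redis/RabbitMQ/SQS backends
    implement the same two methods."""
    def __init__(self):
        self._queues = defaultdict(deque)
    def publish(self, job_type, payload):
        self._queues[job_type].append(payload)
    def consume(self, job_type, handler):
        # Drain everything currently queued for this job type, in order.
        q = self._queues[job_type]
        while q:
            handler(q.popleft())
```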
Takeaway
Choosing a message queue is a balance of scaling, operational complexity, and feature set. Redis is simple and fast, RabbitMQ is feature-rich, SQS is managed and AWS-native.
Small projects are fine on Redis. As volume grows, switching to managed (SQS) or feature-rich (RabbitMQ) makes sense. Make the call based on concrete needs today, not on “how big I think this will get”. Make migration easier with an abstraction layer.