Multiple backend services, clients hitting a single endpoint. Classic microservice problem. An API gateway abstracts that communication layer: authentication, rate limiting, routing, logging.
Three popular approaches: Kong (open source + enterprise), Traefik (modern reverse proxy), and a custom gateway you write yourself. This post walks through when each one is the right call.
What an API gateway actually does
Before a request reaches an internal service, the gateway handles:
- Authentication: JWT validation, API key checks
- Authorization: does this user have access to this endpoint?
- Rate limiting: is the user over their budget?
- Routing: which internal service gets this request?
- Transformation: request/response shape changes
- Logging: access logs, audit trail
- Metrics: latency, error rate, throughput
- Circuit breaking: when a downstream service fails, protect the caller
Instead of every internal service re-implementing these eight responsibilities, you centralize them at the gateway.
Kong
Mature (10+ years), built on Nginx/OpenResty, with a Lua plugin ecosystem.
Upsides:
- Rich plugin ecosystem (100+ plugins). Auth, rate limiting, transforms, logging all off the shelf.
- Battle-tested. Many large companies run it.
- Kong OSS is free; Kong Enterprise (paid) adds a management GUI and commercial support.
- Multiple auth mechanisms (JWT, OAuth, API key, LDAP).
- Declarative config in YAML.
- Admin API, plus the community-built Konga GUI.
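For a feel of the declarative format, a minimal `kong.yml` might look like the following sketch. The service name, upstream URL, and plugin settings are invented for illustration:

```yaml
_format_version: "3.0"

services:
  - name: product-service
    url: http://product-service:8080
    routes:
      - name: product-route
        paths:
          - /products
    plugins:
      - name: key-auth
      - name: rate-limiting
        config:
          minute: 100
          policy: local
```

Kong can load this in DB-less mode (`KONG_DATABASE=off`), which sidesteps the Cassandra/PostgreSQL cluster for simpler setups.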
Downsides:
- Setup is complex. A production cluster (Kong + Cassandra/PostgreSQL) is ops-heavy.
- Resource hungry. Nginx + Lua VM on every request.
- Enterprise features are paywalled.
- Writing Lua plugins has a steep learning curve.
When Kong wins: enterprise environments, a mature microservice ecosystem, plugin-rich needs.
Traefik
Modern reverse proxy and API gateway written in Go. Cloud-native from day one.
Upsides:
- Auto-discovery (Docker, Kubernetes, Consul). When a service registers, the gateway configures itself.
- Let’s Encrypt integration built in. SSL certificate management is automated.
- Dynamic config (K8s CRDs, Consul KV, Docker labels).
- Lighter resource footprint than Kong.
- Excellent K8s integration (Traefik is the default ingress controller in many distributions).
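To show how little wiring auto-discovery needs, here is a hedged docker-compose fragment. The hostnames, middleware name, and resolver name (`letsencrypt`, assumed to be defined in Traefik's static config) are illustrative:

```yaml
services:
  product-service:
    image: my-org/product-service:latest   # illustrative image name
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.products.rule=Host(`api.example.com`) && PathPrefix(`/products`)"
      - "traefik.http.routers.products.tls.certresolver=letsencrypt"
      - "traefik.http.middlewares.products-rl.ratelimit.average=100"
      - "traefik.http.routers.products.middlewares=products-rl"
      - "traefik.http.services.products.loadbalancer.server.port=8080"
```

Traefik watches the Docker socket and derives the route, the TLS certificate, and the rate limit from these labels alone; no gateway restart needed.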
Downsides:
- Plugin ecosystem is smaller than Kong’s. Custom middleware options are limited.
- Fewer built-in auth mechanisms (basic auth, digest auth, and forward auth out of the box; JWT and OAuth need plugins or the enterprise edition).
- Traefik Enterprise keeps some advanced features behind a paywall.
When Traefik wins: Kubernetes-based infrastructure, modern cloud-native setups, simpler routing needs.
Custom gateway (you write it yourself)
You write the gateway logic in Node.js, Go, or Python.
Upsides:
- Full control. Your own business logic.
- No operational dependency on a third-party gateway.
- Specific optimizations (custom caching, transforms, auth flows).
- No licensing costs.
- Easy integration with existing internal tooling.
Downsides:
- Development time. 2 to 3 months to be production ready.
- Maintenance burden. Edge cases, security updates.
- Feature gap. Rate limiting, circuit breaker, JWT validation: you write all of it.
- Battle-testing takes time.
- You need real team expertise.
When custom wins: very specific business logic, a small-scale project, and a team that has the expertise and wants no third-party dependencies.
Pragmatic comparison
Walking through real scenarios:
Scenario 1: 3 microservices, 5 developers
Simple routing, JWT auth, and rate limiting. Kong or Traefik is overkill here. A custom Node.js gateway is a one-to-two-week job.
Scenario 2: 20+ microservices on Kubernetes
Traefik is strong here. It’s already the ingress controller. Service discovery is automatic.
Scenario 3: enterprise, multi-tenant, complex auth
Kong. The plugin ecosystem is complete: OAuth, LDAP, per-tenant rate limiting, audit logging. Enterprise dashboard included.
Scenario 4: high throughput (10K+ RPS)
Traefik or a custom Go gateway. Kong can feel heavy. Profile for your specific case.
Feature-by-feature comparison
| Feature | Kong | Traefik | Custom |
|---------|------|---------|--------|
| JWT validation | plugin | ForwardAuth or enterprise | roll your own |
| Rate limiting | plugin | middleware | Redis-based, by hand |
| Circuit breaker | plugin | middleware | by hand |
| Service discovery | manual config | auto (K8s/Docker) | by hand |
| Automatic SSL certs | plugin | built-in (Let’s Encrypt) | Nginx reverse proxy |
| GUI dashboard | Kong Manager (paid) | Traefik Dashboard | build it yourself |
| Plugin ecosystem | 100+ | 30+ | 0 |
| Cloud-native | good | exceptional | your design |
Auth implementation
Whichever gateway I use, the auth layering looks the same:
Gateway level: JWT signature validation plus basic user ID extraction. Invalid JWT returns 401 and stops here.
Service level: authorization (can this user access this resource?). Business logic.
This split gives the gateway the mandatory security check and leaves the business rule check to the service. Complexity is divided correctly.
Rate limiting across services
Is the user’s rate limit per service, or total?
Total across all services: held at the gateway. A limit of “100 requests per minute” applies across every service.
Per service: the gateway keeps a separate counter per service. “1000/min to the product API, 10/min to payment.”
To apply both, use a gateway-level global limit with per-service overrides.
A Redis-based distributed counter works for either approach.
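The two policies compose cleanly. Below is a sketch of the counting logic, using the same fixed-window INCR-and-EXPIRE pattern you would run against Redis, kept in-process here so it stays self-contained. The limits and service names are made up:

```python
import time

GLOBAL_LIMIT = 100                                # requests/min across all services
PER_SERVICE = {"product": 1000, "payment": 10}    # per-service overrides

class WindowCounter:
    def __init__(self, now=time.time):
        self.now = now                            # injectable clock for testing
        self.counts: dict[tuple[str, str, int], int] = {}

    def hit(self, user: str, service: str) -> bool:
        """Count one request; allowed only if it fits both the global and service limit."""
        window = int(self.now() // 60)            # current one-minute bucket
        allowed = True
        for scope, limit in (("*", GLOBAL_LIMIT),
                             (service, PER_SERVICE.get(service, GLOBAL_LIMIT))):
            key = (user, scope, window)           # Redis equivalent: INCR key; EXPIRE key 60
            self.counts[key] = self.counts.get(key, 0) + 1
            if self.counts[key] > limit:
                allowed = False
        return allowed
```

Swapping the dict for Redis gives you the distributed version: the key shape (user, scope, window) and the increment-then-compare logic stay the same.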
Circuit breaker strategy
When a downstream service is down, the gateway’s behavior options:
1. Fast fail: return 503 immediately. Instant feedback for the user, and the client retries once the service recovers.
2. Retry with backoff: retry for 1 to 2 seconds, then fail. Good for transient failures.
3. Cache last response: service is down, cached response is available. Return stale data.
4. Fallback service: route to a secondary service.
Declare which one to use per endpoint: a plugin in Kong, middleware in Traefik, middleware in your custom gateway.
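Options 1 and 2 sit on top of the same underlying state machine: closed, open, half-open. A minimal sketch, with illustrative thresholds:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0, now=time.monotonic):
        self.max_failures = max_failures      # consecutive failures before opening
        self.reset_after = reset_after        # seconds before a half-open probe
        self.now = now                        # injectable clock for testing
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        """Should the gateway even try the downstream call? False -> fast 503."""
        if self.opened_at is None:
            return True                       # closed: normal operation
        if self.now() - self.opened_at >= self.reset_after:
            return True                       # half-open: let one probe through
        return False                          # open: fail fast

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None             # probe succeeded: close again
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.now()   # trip the breaker
```

The gateway calls `allow()` before each downstream request and `record()` after; strategies 3 and 4 differ only in what it returns while the breaker is open.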
Observability matters
The gateway sees every request. Observability here is pure gold:
Access log: every request logged. User, endpoint, latency, status code.
Metrics: request rate, error rate, p50/p95/p99 latency. Prometheus-compatible.
Distributed tracing: propagate the request ID to downstream services. A single user request traces across every service.
Error tracking: log 5xx responses in detail, including downstream service timeouts.
Kong + Datadog, Traefik + Prometheus, custom gateway + OpenTelemetry. The stack changes, the concept does not.
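The tracing piece starts with one small habit: propagate a request ID. A sketch using the common `X-Request-ID` convention (a convention, not something any of the three gateways mandates):

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"

def with_request_id(incoming_headers: dict) -> dict:
    """Headers to forward downstream, with a request ID guaranteed to be present.

    Reuse the caller's ID if it sent one (so traces span the whole chain);
    otherwise mint a fresh one at the gateway.
    """
    headers = dict(incoming_headers)          # don't mutate the caller's copy
    if not headers.get(REQUEST_ID_HEADER):
        headers[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return headers
```

Every downstream service logs that ID with each line, and a single user request becomes one grep away, whatever tracing backend you later bolt on.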
Migrating between gateways
Moving from one gateway to another is serious work:
- Audit current config. Document every plugin and route in Kong.
- Replicate on the target gateway. Port to Traefik. Fix config differences.
- Run in parallel. Both running, traffic split. Compare behavior.
- Gradual traffic shift. 5%, 10%, 25%, 50%, 100%. Slowly.
- Decommission the old one. When migration is complete, turn it off.
A 2 to 4 month project. Plan for it.
Start small, evolve
What should a new project do?
- Phase 1: direct service access. Each service has a public endpoint. Enough for very early stage.
- Phase 2: basic reverse proxy (Nginx, or default Traefik). Single entry point.
- Phase 3: API gateway (Traefik + middleware). Auth, rate limiting, logging.
- Phase 4: full-featured gateway (Kong, custom). Plugin ecosystem, advanced features.
Each phase evolves over 6 to 12 months. Skipping straight to phase 4 on day one is premature optimization.
Bottom line
The gateway choice is driven by team size, infrastructure (do you have K8s?), and complexity requirements.
- Small to medium scale: Traefik or a simple custom gateway. Quick setup, replace only when it breaks.
- Enterprise / complex: Kong. The plugin ecosystem earns its keep.
- K8s-native: Traefik. Service discovery is automatic, SSL is automated.
- Custom business logic: roll your own. But the development investment is real.
Whatever you pick, set up observability (logs, metrics, tracing) from day one. If the gateway isn’t observable, the whole system is opaque.