API gateways in production: Kong vs Traefik vs rolling your own

You have multiple backend services, and clients should hit a single endpoint. Classic microservice problem. An API gateway abstracts that communication layer and centralizes authentication, rate limiting, routing, and logging.

Three popular approaches: Kong (open source + enterprise), Traefik (modern reverse proxy), and a custom gateway you write yourself. This post walks through when each one is the right call.

What an API gateway actually does

Before a request reaches an internal service, the gateway handles:

Authentication: JWT validation, API key checks
Authorization: does this user have access to this endpoint?
Rate limiting: is the user over their budget?
Routing: which internal service gets this request?
Transformation: request/response shape changes
Logging: access logs, audit trail
Metrics: latency, error rate, throughput
Circuit breaking: when a downstream service fails, protect the caller

Instead of every internal service re-implementing these eight responsibilities, you centralize them at the gateway.
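To make the routing responsibility concrete, here is a minimal sketch of a longest-prefix route table. The path prefixes and upstream URLs are hypothetical, and real gateways usually also match on host, method, and headers.

```python
from typing import Dict, Optional

# Hypothetical route table: path prefix -> internal upstream base URL.
ROUTES: Dict[str, str] = {
    "/products": "http://product-svc:8080",
    "/payments": "http://payment-svc:8080",
    "/payments/refunds": "http://refund-svc:8080",
}


def resolve_upstream(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching path prefix, or None."""
    best = None
    for prefix in routes:
        # Match whole path segments only, so "/products" does not
        # accidentally capture "/products-old".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

A request for /payments/refunds/42 resolves to the refund service rather than the payment service, because the more specific prefix wins. A path with no match returns None, which the gateway would turn into a 404.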

Kong

Mature (10+ years), built on Nginx, with a Lua plugin ecosystem.

Upsides:

Rich plugin ecosystem (100+ plugins). Auth, rate limiting, transforms, logging all off the shelf.
Battle-tested. Many large companies run it.
Kong OSS is free; Kong Enterprise (paid) adds a dashboard, GUI, and support.
Multiple auth mechanisms (JWT, OAuth, API key, LDAP).
Declarative config in YAML.
Admin API, plus GUIs such as Konga (a third-party community project).

Downsides:

Setup is complex. A production cluster (Kong plus a PostgreSQL datastore; older releases also supported Cassandra) is ops-heavy, unless you run DB-less from the declarative config.
Resource hungry. Nginx + Lua VM on every request.
Enterprise features are paywalled.
Writing Lua plugins has a steep learning curve.

When Kong wins: enterprise environments, a mature microservice ecosystem, plugin-rich needs.

Traefik

Modern reverse proxy and API gateway written in Go. Cloud-native from day one.

Upsides:

Auto-discovery (Docker, Kubernetes, Consul). When a service registers, the gateway configures itself.
Let’s Encrypt integration built in. SSL certificate management is automated.
Dynamic config (K8s CRDs, Consul KV, Docker labels).
Lighter resource footprint than Kong.
Excellent K8s integration (Traefik is the default ingress controller in some distributions, K3s for example).

Downsides:

Plugin ecosystem is smaller than Kong’s. Custom middleware options are limited.
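Fewer built-in auth mechanisms (basic auth, digest auth, and forward auth to an external service; JWT validation and OAuth generally need a plugin or an auth service behind forward auth).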
Traefik Enterprise gates some features behind a paywall.

When Traefik wins: Kubernetes-based infrastructure, modern cloud-native setups, simpler routing needs.
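The auto-discovery idea, reduced to its core, looks roughly like the sketch below. This is not Traefik's implementation or API. The class name, the register/deregister events, and the addresses are all hypothetical, standing in for whatever the provider (Docker, Kubernetes, Consul) reports.

```python
from typing import Dict, List


class DiscoveryRouter:
    """Keeps a path-prefix -> backend list in sync with register/deregister events."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[str]] = {}
        self._next: Dict[str, int] = {}

    def register(self, prefix: str, address: str) -> None:
        # Called when the provider reports a new service instance.
        backends = self.routes.setdefault(prefix, [])
        if address not in backends:
            backends.append(address)

    def deregister(self, prefix: str, address: str) -> None:
        # Called when an instance disappears; drop the route once it is empty.
        backends = self.routes.get(prefix, [])
        if address in backends:
            backends.remove(address)
        if not backends:
            self.routes.pop(prefix, None)
            self._next.pop(prefix, None)

    def pick(self, prefix: str) -> str:
        """Round-robin over the backends currently registered for a prefix."""
        backends = self.routes[prefix]  # KeyError if nothing is registered
        i = self._next.get(prefix, 0) % len(backends)
        self._next[prefix] = i + 1
        return backends[i]
```

Nobody edits a route file here: the table changes only because instances come and go.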

Custom gateway (you write it yourself)

You write the gateway logic in Node.js, Go, or Python.

Upsides:

Full control. Your own business logic.
No operational dependency on a third-party gateway.
Specific optimizations (custom caching, transforms, auth flows).
No licensing costs.
Easy integration with existing internal tooling.

Downsides:

Development time. 2 to 3 months to be production ready.
Maintenance burden. Edge cases, security updates.
Feature gap. Rate limiting, circuit breaker, JWT validation: you write all of it.
Battle-testing takes time.
You need real team expertise.

When custom wins: very specific business logic, a small-scale project, a team that has the expertise and wants no third-party dependencies.

Pragmatic comparison

Broken down by real-world scenario:

Scenario 1: 3 microservices, 5 developers

Simple routing plus JWT auth plus rate limiting. Kong or Traefik is overkill. With a scope that narrow, a custom Node.js gateway is a one to two week job, well short of the 2 to 3 months a general-purpose one takes.

Scenario 2: 20+ microservices on Kubernetes

Traefik is strong here. It’s already the ingress controller. Service discovery is automatic.

Scenario 3: enterprise, multi-tenant, complex auth

Kong. The plugin ecosystem covers OAuth, LDAP, per-tenant rate limiting, and audit logging, and Kong Enterprise (paid) adds the dashboard.

Scenario 4: high throughput (10K+ RPS)

Traefik or a custom Go gateway. Kong can feel heavy. Profile for your specific case.

Feature-by-feature comparison

Auth implementation

Whichever gateway I use, the auth layering looks the same:

Gateway level: JWT signature validation plus basic user ID extraction. Invalid JWT returns 401 and stops here.

Service level: authorization (can this user access this resource?). Business logic.

This split gives the gateway the mandatory security check and leaves the business rule check to the service. Complexity is divided correctly.
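Here is a minimal sketch of the gateway-level half, using only the standard library and assuming HS256-signed tokens with a shared secret. The function names are hypothetical, and requiring an `exp` claim is an assumption of this sketch. In production you would more likely rely on a maintained JWT library or the gateway's own plugin.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Tuple


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build an HS256 token. Included only so the sketch can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def gateway_auth(token: str, secret: bytes,
                 now: Optional[float] = None) -> Tuple[int, Optional[str]]:
    """Return (200, user_id) for a valid token, otherwise (401, None)."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return 401, None
    # Pin the algorithm: never let the token choose how it is verified.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return 401, None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return 401, None
    # Claims are trusted only after the signature check has passed.
    if not isinstance(claims, dict):
        return 401, None
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        return 401, None
    user_id = claims.get("sub")
    if not isinstance(user_id, str):
        return 401, None
    return 200, user_id
```

The gateway would forward the extracted user ID to the service (for example in a header), and the service makes the authorization decision from there.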

Rate limiting across services

Is the user’s rate limit per service, or total?

Total across all services: held at the gateway. A limit of “100 requests per minute” applies across every service.

Per service: the gateway keeps a separate counter per service. “1000/min to the product API, 10/min to payment.”

To apply both, use a gateway-level global limit with per-service overrides.

A Redis-based distributed counter works for either approach.
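The sketch below shows one way to read "global limit with per-service overrides": every request counts against the user's global budget, and services with an override get an additional, tighter cap. It uses a fixed-window counter, with an in-memory dict standing in for Redis. The class name and the limits are hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Optional, Tuple


class FixedWindowLimiter:
    """Fixed-window counters: one global per user, plus optional per-service caps."""

    def __init__(self, global_limit: int,
                 service_limits: Optional[Dict[str, int]] = None,
                 window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.global_limit = global_limit
        self.service_limits = service_limits or {}
        self.window_seconds = window_seconds
        self.clock = clock
        # In-memory stand-in for a shared store such as Redis.
        # Old windows are never evicted here; a key TTL would handle that.
        self.counters: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def allow(self, user_id: str, service: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        global_key = (user_id, "*", window)
        service_key = (user_id, service, window)
        service_limit = self.service_limits.get(service)

        # Check both limits before incrementing, so a rejected
        # request does not consume any budget.
        if self.counters[global_key] >= self.global_limit:
            return False
        if service_limit is not None and self.counters[service_key] >= service_limit:
            return False

        self.counters[global_key] += 1
        if service_limit is not None:
            self.counters[service_key] += 1
        return True
```

With `FixedWindowLimiter(100, {"payment": 10})`, a user gets 100 requests per minute overall and at most 10 of them to payment. In a multi-instance gateway the check-then-increment would need to be atomic in the shared store, which this single-process sketch does not address.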

Circuit breaker strategy

When a downstream service is down, the gateway has several options:

1. Fast fail: return 503 immediately. Instant feedback for the user. Retry once the service recovers.

2. Retry with backoff: retry for 1 to 2 seconds, then fail. Good for transient failures.

3. Cache last response: if the service is down and a cached response is available, return the stale data.

4. Fallback service: route to a secondary service.

Declare which one to use per endpoint. In Kong that is a plugin; in Traefik or a custom gateway it is middleware.
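As a rough sketch of how a custom gateway might wire this up, the code below combines a consecutive-failure circuit breaker with a per-endpoint policy that picks between fast fail and cached last response. The names, thresholds, and policy strings are hypothetical, and retry with backoff and fallback routing are left out to keep it short.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CircuitBreaker:
    """Opens after max_failures consecutive errors; lets trial calls through after reset_after seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has passed, allow a trial request (half-open).
        return self.clock() - self.opened_at < self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def handle(endpoint: str, call_upstream: Callable[[], str], breaker: CircuitBreaker,
           cache: Dict[str, str], policy: str) -> Tuple[int, str]:
    """Return (status, body) for one request, applying the endpoint's policy."""
    if not breaker.is_open():
        try:
            body = call_upstream()
            breaker.record(ok=True)
            cache[endpoint] = body  # remember the last good response
            return 200, body
        except Exception:
            breaker.record(ok=False)
    # Upstream failed or the breaker is open: apply the declared strategy.
    if policy == "stale_cache" and endpoint in cache:
        return 200, cache[endpoint]
    return 503, "service unavailable"
```

While the breaker is open the upstream is not called at all, which is what protects a struggling service from more traffic. A payment endpoint would typically declare `fast_fail`, since serving a stale payment response is rarely acceptable.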

Observability matters

The gateway sees every request. Observability here is pure gold:

Access log: every request logged. User, endpoint, latency, status code.

Metrics: request rate, error rate, p50/p95/p99 latency. Prometheus-compatible.

Distributed tracing: generate a request ID at the gateway and propagate it to downstream services, so a single user request can be traced across every service it touches.

Error tracking: log 5xx responses in detail, including downstream service timeouts.

Kong + Datadog, Traefik + Prometheus, custom gateway + OpenTelemetry. The stack changes, the concept does not.
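Here is a stack-agnostic sketch of three of those pieces: a propagated request ID, a structured access log line, and latency percentiles. The `X-Request-ID` header is a common convention rather than a standard, and the field names are hypothetical. Real metrics pipelines usually derive percentiles from histogram buckets rather than raw samples.

```python
import json
import math
import uuid
from typing import Dict, List


def ensure_request_id(headers: Dict[str, str]) -> Dict[str, str]:
    """Copy headers, adding X-Request-ID if the client did not send one.

    Header lookup is case-sensitive here for brevity; HTTP header names are not.
    """
    out = dict(headers)
    out.setdefault("X-Request-ID", uuid.uuid4().hex)
    return out


def access_log_line(headers: Dict[str, str], user_id: str, endpoint: str,
                    status: int, latency_ms: float) -> str:
    """One structured access-log record per request, as a JSON line."""
    return json.dumps({
        "request_id": headers.get("X-Request-ID"),
        "user": user_id,
        "endpoint": endpoint,
        "status": status,
        "latency_ms": latency_ms,
    })


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of the given latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

The gateway would call `ensure_request_id` on the way in, forward the resulting headers downstream, and write the log line on the way out, so the same ID appears in every service's logs.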

Migrating between gateways

Moving from one gateway to another is serious work. Taking a Kong to Traefik move as the example:

Audit current config. Document every plugin and route in Kong.
Replicate on the target gateway. Port to Traefik. Fix config differences.
Run in parallel. Both running, traffic split. Compare behavior.
Gradual traffic shift. 5%, 10%, 25%, 50%, 100%. Slowly.
Decommission the old one. When migration is complete, turn it off.

A 2 to 4 month project. Plan for it.
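The gradual traffic shift is easier to reason about when each client lands on the same side consistently. One possible approach, sketched below, hashes a stable key into a 0 to 99 bucket and compares it with the rollout percentage. The function names and the choice of key (user ID, API key, client IP) are assumptions.

```python
import hashlib


def bucket_for(key: str) -> int:
    """Map a stable key (user or client ID) to a bucket in [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_gateway(key: str, new_gateway_percent: int) -> str:
    """Send the key to the new gateway if its bucket falls under the rollout percentage."""
    return "new" if bucket_for(key) < new_gateway_percent else "old"
```

Because the bucket depends only on the key, a client moved to the new gateway at 5% stays there at 10%, 25%, and beyond, which keeps behavior comparisons between the two gateways clean. Rolling back is just lowering the percentage.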

Start small, evolve

What should a new project do?

Phase 1: direct service access. Each service has a public endpoint. Enough for a very early stage.
Phase 2: basic reverse proxy (Nginx, or default Traefik). Single entry point.
Phase 3: API gateway (Traefik + middleware). Auth, rate limiting, logging.
Phase 4: full-featured gateway (Kong, custom). Plugin ecosystem, advanced features.

Expect each phase to last 6 to 12 months before the next one is justified. Skipping straight to phase 4 on day one is premature optimization.

Bottom line

The gateway choice is driven by team size, infrastructure (do you have K8s?), and complexity requirements.

Small to medium scale: Traefik or a simple custom gateway. Quick setup, replace only when it breaks.
Enterprise / complex: Kong. The plugin ecosystem earns its keep.
K8s-native: Traefik. Service discovery is automatic, SSL is automated.
Custom business logic: roll your own. But the development investment is real.

Whatever you pick, set up observability (logs, metrics, tracing) from day one. If the gateway isn’t observable, the whole system is opaque.