Feature flags (feature toggles) are one of the foundational tools of modern software development. They decouple deploy from feature release. Your code is live in production, but hidden from users. Flip the switch, and the feature is live.
A/B tests, canary deploys, gradual rollouts, kill switches: all of it is possible with feature flags. In this post I’ll walk through how to build your own flag system in four stages.
Stage 1: simple config flags
Starting point: plain boolean flags in a config file.
# config/features.yml
features:
  new_checkout: false
  dark_mode: true
  experimental_search: false

Usage in code:
if config.features.new_checkout:
    render_new_checkout()
else:
    render_old_checkout()
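For completeness, a sketch of how that config object might be populated at startup, assuming PyYAML and a small attribute wrapper (both the path and the wrapper are illustrative):

import yaml
from types import SimpleNamespace

def load_features(path: str = "config/features.yml") -> SimpleNamespace:
    # Read the YAML once at startup; changing a flag means editing the file and redeploying
    with open(path) as f:
        data = yaml.safe_load(f)
    return SimpleNamespace(features=SimpleNamespace(**data["features"]))

config = load_features()
config.features.new_checkout  # False until the next deploy flips it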
Pros:
– Super simple setup
– No external dependency
– Version controlled (config lives in Git)
Cons:
– Changing a flag requires a deploy
– No user-level targeting
– A/B testing impossible
– No real-time toggle
Fine for a small project. Once you grow, it’s not enough.
Stage 2: database-backed flags
Store flags in the database. Toggle them from an admin panel.
Schema:
CREATE TABLE feature_flags (
    key VARCHAR(100) PRIMARY KEY,
    enabled BOOLEAN DEFAULT FALSE,
    description TEXT,
    updated_at TIMESTAMP
);

Code:
def is_enabled(flag_key: str) -> bool:
    flag = db.query("SELECT enabled FROM feature_flags WHERE key = ?", flag_key)
    return bool(flag and flag.enabled)  # bool() so a missing flag reads as False, not None

Admin panel: simple UI, one toggle per flag.
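A sketch of what the toggle endpoint behind that panel could look like, assuming Flask, the db helper from the snippet above extended to run updates, and a hypothetical audit_log helper:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/admin/flags/<flag_key>", methods=["POST"])
def toggle_flag(flag_key):
    enabled = bool(request.json.get("enabled"))
    # Assumes the db helper also runs parameterized updates
    db.query(
        "UPDATE feature_flags SET enabled = ?, updated_at = NOW() WHERE key = ?",
        enabled, flag_key,
    )
    # Hypothetical audit helper: who toggled what, when
    audit_log.record(user=request.headers.get("X-Admin-User"), flag=flag_key, enabled=enabled)
    return jsonify({"key": flag_key, "enabled": enabled})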
Pros:
– Real-time toggle (no deploy required)
– Admin panel control
– Audit log (who toggled what, when)
Cons:
– Each flag check is a database call (performance hit)
– Still no user-level targeting
– Still no A/B testing
Fix: add a caching layer. Cache flag state in Redis with a 30 second TTL.
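A rough sketch of that caching layer, assuming the redis-py client and the db helper from the snippet above (key prefix and TTL are illustrative):

import json
import redis

cache = redis.Redis()  # assumes a Redis instance reachable on localhost
CACHE_TTL = 30  # seconds, as suggested above

def is_enabled(flag_key: str) -> bool:
    cached = cache.get(f"flag:{flag_key}")
    if cached is not None:
        return json.loads(cached)
    flag = db.query("SELECT enabled FROM feature_flags WHERE key = ?", flag_key)
    enabled = bool(flag and flag.enabled)
    cache.set(f"flag:{flag_key}", json.dumps(enabled), ex=CACHE_TTL)
    return enabled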
Stage 3: user-targeted flags
Show a feature to some users, hide it from others. Beta testers, internal team, premium users.
Schema:
CREATE TABLE feature_flags (
    key VARCHAR(100) PRIMARY KEY,
    enabled BOOLEAN,
    rollout_percentage INT, -- 0-100
    user_groups JSON, -- ["beta", "internal"]
    user_whitelist JSON -- specific user_ids
);

Code:
import hashlib

def is_enabled(flag_key: str, user_id: str, user_groups: list) -> bool:
    flag = get_flag(flag_key)
    if not flag.enabled:
        return False
    # Whitelist check
    if user_id in flag.user_whitelist:
        return True
    # Group check
    if any(g in flag.user_groups for g in user_groups):
        return True
    # Rollout percentage (deterministic hash, so a user always gets the same answer)
    user_hash = int(hashlib.md5((flag_key + user_id).encode()).hexdigest(), 16)
    if (user_hash % 100) < flag.rollout_percentage:
        return True
    return False

Gradual rollout:
– Start: 1% rollout
– If no issues: 5%, 10%, 25%, 50%, 100%
– Issue detected: rollback instantly
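Operationally, each step is just an update to the flag row; a sketch, assuming the db helper also accepts parameterized updates:

ROLLOUT_STEPS = [1, 5, 10, 25, 50, 100]  # the schedule from above

def ramp_rollout(flag_key: str, percentage: int) -> None:
    # Bump the percentage; cached checks pick it up once the TTL expires
    db.query("UPDATE feature_flags SET rollout_percentage = ? WHERE key = ?", percentage, flag_key)

def kill_switch(flag_key: str) -> None:
    # Issue detected: flip enabled off, which short-circuits every check, whitelist included
    db.query("UPDATE feature_flags SET enabled = FALSE WHERE key = ?", flag_key)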
Pros:
– User targeting
– Gradual rollout
– Kill switch
– Foundations for A/B testing
Cons:
– Higher complexity
– A/B testing requires analytics integration
Stage 4: full A/B testing
A/B testing means feature flags plus analytics plus statistical significance.
Extended schema:
CREATE TABLE experiments (
    key VARCHAR(100) PRIMARY KEY,
    variants JSON, -- [{"name": "control", "weight": 50}, {"name": "new_ui", "weight": 50}]
    active BOOLEAN,
    started_at TIMESTAMP
);

CREATE TABLE experiment_exposures (
    experiment_key VARCHAR(100),
    user_id VARCHAR(100),
    variant VARCHAR(50),
    exposed_at TIMESTAMP,
    INDEX (experiment_key, user_id)
);

Code:
import hashlib

def get_variant(experiment_key: str, user_id: str) -> str:
    experiment = get_experiment(experiment_key)
    # Existing assignment? A user must stay in the same variant across requests.
    existing = get_exposure(experiment_key, user_id)
    if existing:
        return existing.variant
    # New assignment (hash-based); variant weights are assumed to sum to 100
    user_hash = int(hashlib.md5((experiment_key + user_id).encode()).hexdigest(), 16) % 100
    cumulative = 0
    for variant in experiment.variants:
        cumulative += variant.weight
        if user_hash < cumulative:
            record_exposure(experiment_key, user_id, variant.name)
            return variant.name

Analytics integration:
# Track experiment metric
analytics.track("purchase", {
    "user_id": user_id,
    "amount": 100,
    "experiment_new_checkout": get_variant("new_checkout", user_id)
})

Later, analyze:
– Variant A conversion rate: 5.2%
– Variant B conversion rate: 6.8%
– Statistical significance: p < 0.01
– Variant B wins, roll out fully
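How that p-value might be computed, as a sketch: a two-proportion z-test with only the standard library (the conversion counts below are made up to match the rates above):

from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                  # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))    # standard error of the difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative numbers, not real data: 5.2% vs 6.8% conversion on 10,000 users each
p_value = two_proportion_z_test(520, 10_000, 680, 10_000)
print(p_value)  # well below 0.01, so variant B's lift is statistically significant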
Third-party solutions
If you don’t want to build your own:
LaunchDarkly: enterprise-grade feature flag management. Expensive.
Unleash: open source, self-hosted.
Split.io: feature flags plus A/B testing. Mid-market.
Flagsmith: open source with managed options.
Firebase Remote Config: Google’s free option.
Trade-offs to weigh: vendor dependency, pricing, and how much you can customize.
Small project: Firebase Remote Config is enough. Mid-size: LaunchDarkly. Enterprise: self-hosted Unleash.
Flag hygiene
Feature flags have a lifecycle. Ignore it, and you accumulate flag debt.
Types:
– Release flags: for feature deployment. Short-lived (weeks).
– Experiment flags: for A/B testing. Medium-lived (months).
– Ops flags: kill switches, circuit breakers. Long-lived.
– Permission flags: premium features. Permanent.
Cleanup discipline:
– Release flag successful: merge the code, remove the flag within 2 weeks
– Experiment concluded: make the winning variant the default, remove the flag
– Ops flags: review quarterly
Flag debt is what you get when flags added six months ago are still sitting in the code. Every flag is cognitive overhead.
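One way to keep that discipline honest, as a sketch: a periodic report of flags whose row has not been touched in a while, using the updated_at column from the Stage 2 schema (the 90-day window and Postgres-style interval syntax are assumptions):

# Hypothetical cleanup report: flags whose row has not changed in 90 days
# (updated_at only tracks toggles, so treat this as a prompt to investigate, not proof of debt)
def stale_flags():
    return db.query(
        "SELECT key, description, updated_at FROM feature_flags "
        "WHERE updated_at < NOW() - INTERVAL '90 days'"  # Postgres-style interval
    )

for flag in stale_flags():
    print(f"Stale flag: {flag.key}, last touched {flag.updated_at}: {flag.description}")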
Performance considerations
Flag checks happen on every request. Minimize overhead:
– In-memory cache. Keep flag state in app memory, refresh periodically (sketched below).
– Batch fetching. Fetch all flags at user login, use them for the session.
– Edge-deployed flags. Flag evaluation at the CDN level.
Cache TTL is a balance: long TTL means stale flags, short TTL means DB pressure.
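A minimal sketch of the in-memory approach, assuming a daemon thread refreshing every 30 seconds (swap in whatever scheduler your app already runs):

import threading
import time

_flags = {}  # in-process cache: flag key -> enabled

def _refresh_flags(interval: int = 30) -> None:
    # One query reloads every flag, instead of one query per check
    while True:
        rows = db.query("SELECT key, enabled FROM feature_flags")
        _flags.update({row.key: row.enabled for row in rows})
        time.sleep(interval)

threading.Thread(target=_refresh_flags, daemon=True).start()

def is_enabled(flag_key: str) -> bool:
    return _flags.get(flag_key, False)  # unknown or not-yet-loaded flags read as off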
Testing with flags
Testing flag-gated code:
import pytest
from unittest import mock

@pytest.fixture
def enable_flag():
    # Force every flag check to return True for the duration of the test
    with mock.patch('features.is_enabled', return_value=True):
        yield

def test_new_checkout_flow(enable_flag):
    # Test with flag ON
    ...

For every major flag, test both paths: "flag on" and "flag off" scenarios.
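One way to cover both paths without duplicating the test body, as a sketch (render_checkout_page is a hypothetical entry point that branches on the flag):

@pytest.mark.parametrize("flag_on", [True, False])
def test_checkout_flow(flag_on):
    with mock.patch('features.is_enabled', return_value=flag_on):
        page = render_checkout_page()  # hypothetical entry point
        if flag_on:
            assert "new-checkout" in page  # illustrative assertions
        else:
            assert "old-checkout" in page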
Wrap-up
Feature flags reduce deploy risk. Gradual rollout, kill switch, A/B testing: they enable all of it.
Four-stage adoption: config, DB, user-targeted, full A/B. Each stage builds on the last. Start where your project’s size warrants.
Don’t forget flag hygiene. When flags aren’t removed, code complexity climbs. Cleanup discipline is mandatory.