An audit log answers the questions of who, when, and what changed. It’s mandatory in finance, healthcare, security, and any domain that has compliance obligations. Even where it isn’t mandatory, it’s what answers “why did this record change in production?”
I’ve designed audit log architectures on two projects: one for fintech compliance, one for tracking admin actions in a SaaS. The core architecture is the same; the details differ. Practical notes below.
What an audit log is, and isn’t
Application logs and audit logs are not the same thing:
Application log: debug, info, warn, error. For the developer. Usually mutable (rotated, archived). Structured, but the priority is throughput, not completeness.
Audit log: the record of business events. “User X cancelled order Y at 14:32.” Immutable. Legal-grade.
Using your application log as an audit log is wrong: rotation drops data, the retention period doesn’t match, and you can’t query it reliably.
Core pattern: append-only
An audit log must be append-only: inserts only, no updates or deletes. The moment a record can be changed, trust is gone.
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
actor_id UUID NOT NULL,
actor_type VARCHAR(50) NOT NULL, -- 'user', 'system', 'api_key'
action VARCHAR(100) NOT NULL,
resource_type VARCHAR(50),
resource_id VARCHAR(100),
before_state JSONB,
after_state JSONB,
metadata JSONB,
ip_address INET,
user_agent TEXT
);
-- No UPDATE, no DELETE permission
REVOKE UPDATE, DELETE ON audit_log FROM application_user;

The DB user has INSERT permission only, no UPDATE or DELETE. That alone prevents application bugs from corrupting the log.
Tamper detection
A DB admin can still delete rows, and a backup restore can rewind the log. For full assurance, use a hash chain:
import hashlib

def compute_hash(entry, previous_hash):
    # Chain each record to its predecessor; previous_hash is the prior record's hash
    data = f"{entry.timestamp}|{entry.actor_id}|{entry.action}|{entry.resource_id}|{previous_hash}"
    return hashlib.sha256(data.encode()).hexdigest()

Each new record contains the hash of the previous one. If the chain breaks (a record was deleted or changed), the hashes stop matching.
Blockchain-like, but in a DB table. A periodic tamper-check job validates the chain.
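That tamper-check job can be a straight replay of the chain. A minimal sketch, assuming records are read back as dicts with a stored chain_hash column and the same fields compute_hash uses (the column name and genesis value are illustrative):

```python
import hashlib

def verify_chain(entries, genesis_hash="0" * 64):
    """Replay the hash chain; return the index of the first broken link, or None."""
    previous_hash = genesis_hash
    for i, entry in enumerate(entries):
        data = f"{entry['timestamp']}|{entry['actor_id']}|{entry['action']}|{entry['resource_id']}|{previous_hash}"
        expected = hashlib.sha256(data.encode()).hexdigest()
        if entry["chain_hash"] != expected:
            return i  # a record at this position was changed or one before it was removed
        previous_hash = expected
    return None
```

Run it over each partition on a schedule; a non-None result is an alert, not a log line.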
For the most sensitive use cases, also push the hash to an external system (AWS QLDB, WORM storage, an immutable external log service).
What to log
Everyone’s list is a bit different. Minimum set:
- Authentication events (login, logout, 2FA, password change)
- Authorization changes (role grant/revoke, permission change)
- Resource access (sensitive data read)
- Resource modification (CRUD on key entities)
- Configuration changes (feature flag toggle, API key generation)
- Admin actions (impersonation, bulk operations)
- Export and download actions (GDPR data access)
What not to log:
- Plaintext passwords
- PII beyond the minimum (full credit card numbers, full SSNs)
- Session tokens
- Sensitive response bodies
Tokenise PII: store user_id instead of user_email. If the audit log is compromised, no PII leaks.
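The tokenisation step might look like this sketch, run before the event is persisted (scrub_event and the field names are illustrative, not a fixed schema):

```python
# Fields that must never reach the audit log as-is (illustrative list)
SENSITIVE_FIELDS = {"user_email", "password", "session_token", "card_number"}

def scrub_event(event: dict, id_lookup) -> dict:
    """Replace direct PII with reference IDs; drop the rest of the sensitive fields."""
    scrubbed = {}
    for key, value in event.items():
        if key == "user_email":
            # Tokenise: keep a stable reference, drop the address itself
            scrubbed["user_id"] = id_lookup(value)
        elif key in SENSITIVE_FIELDS:
            continue  # never persist these at all
        else:
            scrubbed[key] = value
    return scrubbed
```

id_lookup here stands in for whatever resolves an email to an internal user ID.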
Write path: async but safe
Writing the audit log synchronously inflates request latency. Writing it async risks loss.
Hybrid:
- Push to an in-memory queue (sync, fast)
- A background worker batch-inserts to the DB (async)
- Queue overflow: block the write or fail open
import queue

audit_queue = queue.Queue(maxsize=10_000)

def log_event(event):
    try:
        audit_queue.put_nowait(event)
    except queue.Full:
        # Critical path: if the audit log is essential, fall back to sync
        audit_db.insert(event)
        alert.send("Audit queue overflow")

On critical systems, block the action if the audit write fails. Banking can’t accept “transfer completed but the audit didn’t write”.
Search performance
Audit logs grow. Hitting 100M+ rows in a year is easy. Search gets slow.
Index strategy:
CREATE INDEX idx_audit_actor ON audit_log(actor_id, timestamp DESC);
CREATE INDEX idx_audit_resource ON audit_log(resource_type, resource_id, timestamp DESC);
CREATE INDEX idx_audit_action_time ON audit_log(action, timestamp DESC);
CREATE INDEX idx_audit_time_brin ON audit_log USING BRIN(timestamp);

BRIN indexes are ideal for large time-ordered tables: around 10x smaller than a B-tree and fast for sequential range scans.
JSONB fields with GIN:
CREATE INDEX idx_audit_metadata ON audit_log USING GIN(metadata);

Query: “every action user X did in the last 30 days”:
SELECT * FROM audit_log
WHERE actor_id = 'xxx'
AND timestamp > NOW() - INTERVAL '30 days'
ORDER BY timestamp DESC;

With the right indexes, that’s milliseconds.
Partitioning: 10M+ rows
Once the audit log is over 10M rows, partition. PostgreSQL native partitioning:
CREATE TABLE audit_log (
...
) PARTITION BY RANGE (timestamp);
CREATE TABLE audit_log_2026_01 PARTITION OF audit_log
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE audit_log_2026_02 PARTITION OF audit_log
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

One partition per month. Old partitions migrate to cold storage; recent ones keep writes fast.
The pg_partman extension automates partition management.
Retention and archive
How long do you keep it? Regulation dependent:
- GDPR: personal data, minimise
- HIPAA: 6 years for healthcare records
- PCI-DSS: 1 year (payment-related)
- SOX: 7 years (financial-related)
Hot storage (active in DB): last 90 days
Warm storage (S3 Glacier or cold DB): two years
Cold storage (tape, archive): 7 to 10 years
Archiving pipeline:
- Monthly partitions live in hot storage
- After 90 days, export to warm (Parquet files in S3)
- After two years, move to Glacier
- After 10 years, delete (regulation-dependent)
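The 90-day export step reduces to serialising a partition and shipping it to object storage. A sketch using gzip-compressed JSON Lines as a stand-in for Parquet (upload is a placeholder for e.g. an S3 put):

```python
import gzip
import io
import json

def export_partition(rows, upload):
    """Serialise one monthly partition as gzip-compressed JSON Lines for warm storage."""
    buf = io.BytesIO()
    with gzip.open(buf, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, default=str) + "\n")
    upload(buf.getvalue())  # in the real pipeline: s3.put_object(...) or a Parquet writer
```

Only after the export is verified readable does the hot partition get detached and dropped.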
Admin UI: an unused audit log is a waste
An audit log nobody reads is waste. You need an admin UI:
- Timeline view: history for a user or resource
- Filters: action type, actor, date range
- CSV export
- Diff view: before vs after state side by side
- Alerting: suspicious patterns (multiple failed logins, bulk sensitive data export)
Without a built-in log viewer, the system isn’t really audited. Admins won’t grep through 500K rows.
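The diff view is the cheapest of these to build: it’s a field-level comparison of the before/after JSONB. A sketch:

```python
def diff_states(before: dict, after: dict) -> dict:
    """Field-level diff of before/after state for the admin UI's diff view."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = {"before": old, "after": new}
    return changes
```

Rendering only the changed fields keeps the view readable even for wide records.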
Forensic patterns
Incident investigation:
“Account compromised”: login history, IP changes, unusual actions for the user.
“Data leak”: who exported, who did bulk reads in the last 90 days.
“Settings changed maliciously”: configuration change history, who did it.
“Shadow IT”: writes from accounts outside the system accounts.
Have query templates ready. The first 10 minutes of a crisis shouldn’t be “where do I look?”
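Those playbooks can be kept as parameterised SQL templates next to the code, so each one is a copy-paste away (a sketch against the audit_log schema above; the action names and psycopg-style paramstyle are illustrative):

```python
# Forensic query templates, keyed by incident type (action naming is an assumption)
FORENSIC_QUERIES = {
    "login_history": (
        "SELECT timestamp, action, ip_address, user_agent FROM audit_log "
        "WHERE actor_id = %(actor_id)s AND action LIKE 'auth.%%' "
        "ORDER BY timestamp DESC"
    ),
    "bulk_exports": (
        "SELECT actor_id, COUNT(*) AS exports FROM audit_log "
        "WHERE action = 'data.export' "
        "AND timestamp > NOW() - INTERVAL '90 days' "
        "GROUP BY actor_id ORDER BY exports DESC"
    ),
}
```

A runbook entry then only needs the incident type and the parameters to plug in.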
A note on immutability
Audit logs are immutable, but users have the right to erasure (GDPR). How do you reconcile the two?
Answer: don’t store PII directly in the audit log. Store reference IDs. When the user account is deleted, the reference still works but the PII connections go away.
before_state: {"user_id": "12345", "action": "role_changed"}
-- user_id 12345 was later anonymised to 'deleted-user-12345'

The audit trail is intact, the PII is gone.
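Erasure then touches the user directory, never the log itself; resolving an actor reference goes through a lookup that tolerates deleted accounts (a sketch with hypothetical names):

```python
def resolve_actor(actor_id: str, user_directory: dict) -> str:
    """Resolve an audit-log actor reference; erased users show as tombstones."""
    return user_directory.get(actor_id, f"deleted-user-{actor_id}")
```

The log row never changes; only what the reference resolves to does.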
Cost optimisation
Audit logs take up serious space. Cost optimisation:
- Use compressed text instead of JSONB if you never search inside it
- Don’t deduplicate entries in the log itself (that would break the audit trail); dedupe retried requests with identical request IDs at the application layer instead
- Regular partitioning and archiving
- Cold storage is cheap (S3 Glacier)
On one fintech the audit log hot storage was 50 GB, annual archive 800 GB, total cost about $40/month. Regulation made it mandatory, so the cost is fine.
Closing thought
The audit log is the feature that saves the day. Until the first incident, you wonder why you bothered. After the first incident, you’re glad you did.
Minimum viable audit log:
- Append-only table
- Hash chain tamper detection
- Actor, resource, timestamp, and before/after JSONB
- Admin viewer UI
- Retention policy
Shipping a complex system to production without these five is a risk. Even when compliance doesn’t demand it, you’ll value it during an incident.