
Multi-tenant data export and erasure: shipping GDPR without leaking tenants

Users have the right to export and delete their data. Here is how to design that in a multi-tenant system without leaking one tenant's records into another.

GDPR gives users two rights that bite hard: the right of access (give me a copy of my data) and the right to erasure (delete my data). In a multi-tenant SaaS, implementing these is messier than the regulation summary makes it look.

I’ve built export and erasure in two B2B SaaS projects. Here are the traps, the patterns, and the compliance notes I wish I’d had on day one.

The regulatory context

GDPR (EU): Article 15 (right of access), Article 17 (right to erasure), Article 20 (right to data portability). Article 12 gives you one month to respond, extendable by two further months for complex requests.

Other jurisdictions: similar rights with their own retention windows. Some categories (tax records, audit logs) must be retained for years even after erasure.

Customers must be able to export and delete their data. Without these features, the legal exposure is real.

Why multi-tenant makes it harder

Single-tenant is easy: “give me everything for user_id X”. Join, export, done.

Multi-tenant gets complicated:

  • A user can belong to several tenants (personal plus company workspace)
  • Which tenant owns which piece of that user’s data?
  • Tenant admins can request an export for their tenant (all user data inside it)
  • If a tenant admin deletes the tenant, what about users who never consented?
  • Shared resources (user A’s comment on user B’s document)

These relationships make it non-trivial to define the scope of any given export or erasure.

Designing the export

First decision: who is requesting, and at what scope?

User-initiated: “download my data”. Scope is data belonging to that user.

Tenant admin-initiated: “download the entire tenant”. Scope is everything the tenant owns.

System backup: internal, for migrations, never user-facing.

Each path has its own policy.
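As a minimal sketch of how those three request paths can be kept apart, here is a hypothetical scope enum with a per-scope authorization check (the role names and function are my own illustration, not a fixed API):

```python
from enum import Enum

class ExportScope(Enum):
    USER = "user"      # user-initiated: only this user's data
    TENANT = "tenant"  # admin-initiated: everything the tenant owns
    SYSTEM = "system"  # internal backup: never user-facing

def authorize_export(requester_role: str, scope: ExportScope) -> bool:
    """Each scope has its own authorization policy."""
    if scope == ExportScope.USER:
        return True                       # any authenticated user, own data only
    if scope == ExportScope.TENANT:
        return requester_role == "admin"  # tenant admins only
    return requester_role == "system"     # internal jobs only
```

Keeping the scope explicit at the entry point means the worker never has to guess what "everything" means.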

Scope definition

What exactly is “user data”? You have to decide per entity:

User entity:
- profile (name, email, avatar): direct user data, EXPORT
- comments: written by the user, EXPORT
- documents authored: created by the user, EXPORT
- documents collaborated on: comments on another user's doc, EXPORT (their comments only)
- logs (login, action history): user activity, EXPORT
- system metadata (created_at, roles): EXPORT
- payment info: LIMITED (last 4 of CC OK, full PAN no)
- sessions: not useful to export, SKIP

Write this decision table for every entity. It’s the balance between compliance coverage and usefulness of the export.
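The decision table above translates naturally into a small policy map that the export worker can consult. A sketch, with hypothetical entity names mirroring the list:

```python
# Per-entity export policy: EXPORT / LIMITED / SKIP
EXPORT_POLICY = {
    "profile": "EXPORT",
    "comments": "EXPORT",
    "documents_authored": "EXPORT",
    "documents_collaborated": "EXPORT",  # own comments only
    "activity_logs": "EXPORT",
    "system_metadata": "EXPORT",
    "payment_info": "LIMITED",           # last 4 digits only, never full PAN
    "sessions": "SKIP",
}

def exportable_entities():
    """Everything the export job should gather for a user."""
    return [name for name, policy in EXPORT_POLICY.items() if policy != "SKIP"]
```

Centralizing the policy in one table also gives new features an obvious place to register their data.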

Format: JSON plus file attachments

There is no standard format. GDPR asks for a "structured, commonly used and machine-readable" format, so most companies ship JSON with attached files:

export.zip
├── profile.json
├── comments.json
├── documents/
│   ├── doc-1.json
│   ├── doc-1-attachment-1.pdf
│   └── ...
├── activity-log.csv
└── README.txt (how to read this archive)

JSON as the primary format, CSV as an optional spreadsheet-friendly companion, binary files as attachments, and a README that explains the layout.

Export pipeline

  1. User submits a request (UI or API)
  2. Request lands in the queue
  3. A worker queries all of the user’s data
  4. JSON plus attachments get packaged into a ZIP
  5. Upload to a private S3 bucket with a short-lived signed URL
  6. Email the download link to the user
  7. Link expires in 7 days (security)
  8. Log the access when the user downloads

@queue.task
def export_user_data(user_id, request_id):
    # 1. Collect everything in scope for this user
    data = gather_user_data(user_id)
    # 2. Package JSON plus attachments into a ZIP
    zip_path = create_export_zip(data)
    # 3. Upload to a private bucket; the signed URL expires in 7 days
    s3_url = upload_to_private_bucket(zip_path, expires=7 * 24 * 60 * 60)
    # 4. Email the short-lived download link
    send_email(user_id, 'export_ready', {'url': s3_url})
    mark_request_completed(request_id)

Big exports (GB-scale) can take hours. Add progress tracking in the UI so users don’t think it’s stuck.
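One way to get that progress signal: gather entities one at a time and report a percentage after each. A sketch with a hypothetical callback (`report_progress` and the entity registry are assumptions, not the pipeline's real API):

```python
def gather_user_data_with_progress(user_id, entities, report_progress):
    """Collect each entity type in turn and report progress so the UI
    can show a percentage instead of an opaque spinner.

    entities: {entity_name: fetch_function}
    """
    data = {}
    for i, (name, fetch) in enumerate(entities.items(), start=1):
        data[name] = fetch(user_id)
        report_progress(int(i / len(entities) * 100))
    return data
```

Per-entity granularity is coarse but cheap; for GB-scale document trees you would report inside the fetch as well.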

The shared-data problem

User A created a document in a workspace, user B commented on it. What should A’s export contain?

  • A’s document: included
  • B’s comment on A’s doc: included, but tagged “by user B” in metadata
  • B’s personal info: NOT included (reference by user ID only)

B’s identity stays hidden. If the export contains a reference ID, B can do their own lookup via their own export.
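A serializer sketch for that rule: include the comment body, but reduce another user's identity to an opaque reference (the field names here are illustrative):

```python
def serialize_comment_for_export(comment, export_owner_id):
    """Include others' comments on the owner's docs, but strip their PII:
    keep only an opaque author reference, never a name or email."""
    record = {
        "id": comment["id"],
        "body": comment["body"],
        "created_at": comment["created_at"],
    }
    if comment["author_id"] == export_owner_id:
        record["author"] = "self"
    else:
        # Reference by ID only; the other user can resolve it
        # through their own export.
        record["author_ref"] = comment["author_id"]
    return record
```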

Erasure: the harder half

Delete is harder than export. When user A asks to be erased:

  • A’s profile: DELETE
  • A’s private documents: DELETE
  • A’s comments on shared docs: ANONYMIZE (leave them as “deleted user” so the doc stays coherent)
  • A’s edits on another user’s doc: historical record (audit log), ANONYMIZE
  • A’s payment transactions: retain (tax and anti-fraud: usually 7 years)
  • A’s support tickets: ANONYMIZE (keep the transcript, strip the PII)
  • A’s aggregate analytics: aggregates stay, individual tracks go

Hard-delete-everywhere conflicts with compliance in most jurisdictions. Anonymization is the standard pattern.

Anonymization pattern

def anonymize_user(user_id):
    user = User.get(user_id)
    user.email = f"deleted-{user_id}@anonymous.local"
    user.name = "Deleted User"
    user.avatar_url = None
    user.phone = None
    user.status = "deleted"
    user.deleted_at = now()
    user.save()
    
    # Related data
    Comment.where(user_id=user_id).update(display_name="Deleted User")
    Document.where(author_id=user_id).update(author_display="Deleted User")
    
    # Full delete for some
    Session.where(user_id=user_id).delete()
    NotificationPreference.where(user_id=user_id).delete()
    
    # Audit
    AuditLog.insert({
        'action': 'user_erased',
        'user_id': user_id,
        'reason': 'user_request',
        'timestamp': now()
    })

This pattern preserves referential integrity (the doc’s author is not null) while stripping the PII.

Retention policy: what you keep

GDPR does not force you to delete everything. Under “legitimate interest” you can keep:

  • Payment records (anti-fraud, tax)
  • Audit trail (compliance)
  • Aggregated analytics (de-personalized)

Here’s the response pattern I use for an erasure request:

Your data has been deleted. The following records are retained as required by law:
- Payment records (7 years, tax)
- Audit logs (10 years, regulation)
- Anonymized usage statistics

None of these can identify you.

Transparency plus compliance.
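To keep that notice and the actual retention behavior from drifting apart, the message can be generated from the same rules table the erasure job uses. A sketch (the rule entries are examples, not legal advice):

```python
# (record category, retention period, legal basis) - illustrative values
RETENTION_RULES = [
    ("Payment records", "7 years", "tax and anti-fraud law"),
    ("Audit logs", "10 years", "regulatory compliance"),
    ("Anonymized usage statistics", "indefinite", "no longer personal data"),
]

def erasure_confirmation():
    """Build the user-facing notice directly from the retention table."""
    lines = ["Your data has been deleted. "
             "The following records are retained as required by law:"]
    for record, period, reason in RETENTION_RULES:
        lines.append(f"- {record} ({period}, {reason})")
    lines.append("None of these records can identify you.")
    return "\n".join(lines)
```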

Deleting a tenant

A tenant admin says “delete this tenant”. But there are other users inside it.

Options:
1. On tenant deletion, also erase every user’s data (auto-erasure)
2. Tenant deletion is delayed 30 days, users get notified, anyone can export before the cutoff
3. Tenant soft delete, users can migrate out

I prefer option 2. Users get a warning, then decide for themselves.
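Option 2 can be sketched as a two-phase delete: mark the tenant, notify every member with the cutoff date, and let a scheduled job finish the deletion after the grace period (the tenant dict shape and `notify` callback are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)

def schedule_tenant_deletion(tenant, notify, now=None):
    """Mark the tenant for deletion and warn every member,
    giving them until the cutoff to export their data."""
    now = now or datetime.now(timezone.utc)
    tenant["status"] = "pending_deletion"
    tenant["delete_after"] = now + GRACE_PERIOD
    for user in tenant["members"]:
        notify(user, "tenant_deletion_scheduled",
               {"cutoff": tenant["delete_after"].isoformat()})
    return tenant

def is_deletable(tenant, now=None):
    """True once the grace period has elapsed; the cleanup job checks this."""
    now = now or datetime.now(timezone.utc)
    return (tenant.get("status") == "pending_deletion"
            and now >= tenant["delete_after"])
```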

Data portability: is the format actually portable?

GDPR Article 20 covers "data portability": data should be in a format that can move to another service.

In reality there is no standard format. Slack, Google Docs, and Notion all export in different shapes. Portability is a theoretical claim.

Common format options:
  • JSON: generic structured data
  • CSV: spreadsheets
  • ICS: calendars
  • PDF: documents
  • EML: email

Export in your own format with a README that explains it. Any other service will need manual mapping anyway.

Admin UI: managing requests

You need an admin panel to track data requests:

  • List: pending, in progress, completed, failed
  • Details: requester, requested_at, scope, status
  • Actions: approve (if manual review is needed), rerun (failed exports), mark complete
  • Audit: who did what, when

Compliance officers get a separate view: every request in the last 30 days, and whether the SLA (30-day response) is being met.

SLA tracking

GDPR allows one month; most teams track it as a 30-day window and must respond inside it:

  • T+0: request received
  • T+1: acknowledgment email (“we received your request”)
  • T+1..T+25: processing
  • T+25: if not done, escalate (internal alert)
  • T+30: hard deadline

Alert rules trigger on day 25. That keeps the compliance risk under control.
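The timeline above boils down to a status check the admin dashboard can run per request. A sketch, with thresholds matching the schedule (the function and its states are my own naming):

```python
from datetime import datetime, timedelta, timezone

ESCALATION_DAY = timedelta(days=25)
HARD_DEADLINE = timedelta(days=30)

def sla_status(received_at, completed_at=None, now=None):
    """Classify a data request against the 30-day window;
    the internal alert fires when status becomes 'escalate'."""
    now = now or datetime.now(timezone.utc)
    if completed_at is not None:
        if completed_at - received_at <= HARD_DEADLINE:
            return "completed_on_time"
        return "completed_late"
    age = now - received_at
    if age >= HARD_DEADLINE:
        return "breached"
    if age >= ESCALATION_DAY:
        return "escalate"
    return "in_window"
```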

Testing: dummy user scenarios

Erasure is hard to test. My approach:

  1. Create a fake user in staging with realistic data
  2. Run export
  3. Verify export contents (all expected data there)
  4. Run erasure
  5. Check whether anything still references the user
  6. Query: SELECT * FROM any_table WHERE references_user_id = X should return empty or anonymized rows

This suite runs on every feature deploy. The regression you want to catch: a new feature adds user data but forgets to wire into the erasure path.
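Step 5 of that checklist can be automated as a residual-PII scan: after erasure, walk every table that references the user and flag rows still carrying the original values. A sketch over plain dicts standing in for query results (the real version would iterate actual tables):

```python
def find_residual_pii(tables, pii_values):
    """Scan erased-user rows for leftover PII.

    tables: {table_name: [row_dict, ...]} - rows that still reference
            the erased user's ID
    pii_values: the user's original name, email, phone, etc.
    Returns (table, field) pairs where PII survived the erasure run.
    """
    leaks = []
    for table, rows in tables.items():
        for row in rows:
            for field, value in row.items():
                if value in pii_values:
                    leaks.append((table, field))
    return leaks
```

An empty result means the anonymization pass covered every table; any hit points straight at the entity a new feature forgot to wire in.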

Final take

Export and erasure are regulatory requirements, but they’re also a trust signal. A good implementation does six things:

  1. Define scope entity by entity
  2. Export as JSON plus attachments in a ZIP
  3. Treat erasure as anonymization, not hard delete
  4. Be transparent about retention policy
  5. Track SLA in an admin UI
  6. Make testing continuous

These six hit compliance, earn user trust, and minimize legal risk. Teams that cut corners here tend to discover the cost during an audit.
