The CI pipeline on the team I joined took 20 minutes, and it ran twice per pull request (once on open, once again before merge). Developers would drift off to Slack with an “I’m waiting on the build”, and a big chunk of the day went idle. In two weeks we brought it down to 4 minutes without rewriting the pipeline, just by optimising the existing steps. Here’s the walk-through.
Starting point
- Lint: 2 minutes
- Unit test: 5 minutes
- Integration test: 7 minutes
- Build: 4 minutes
- Deploy (staging): 2 minutes
- Total: around 20 minutes
It was GitLab CI on a Kubernetes runner with caching only half configured, and every stage spun up a fresh container.
Step 1: Docker layer caching (win: 4 minutes)
The build stage ran npm install plus composer install on every run; that alone was 3 minutes. The Dockerfile was layered in the wrong order: source code was copied in before the lock files, so any tiny code change busted the dependency cache.
I reordered the Dockerfile:
COPY package.json package-lock.json composer.json composer.lock ./
RUN npm ci && composer install --no-dev
COPY . .
RUN npm run build

As long as the lock files don’t change, the dependency install comes straight from cache. We also wired the GitLab registry in as the Docker cache backend. Build time dropped from 4 minutes to 30 seconds.
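Wiring the registry in as the cache backend looked roughly like this. A sketch, assuming BuildKit via docker buildx on the runner; the buildcache tag is our own convention, not anything GitLab mandates:

build:
  stage: build
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
    - docker buildx create --use   # registry cache export needs the docker-container driver
    - docker buildx build
      --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:buildcache
      --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:buildcache,mode=max
      --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      --push .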
Step 2: Parallelise the tests (win: 7 minutes)
Unit tests (5 min) and integration tests (7 min) were running back to back. We split them into parallel jobs, then sharded the integration tests by topic into three parts and ran those in parallel too. The pipeline went from a straight line to a fan-out.
Unit stayed at 5 minutes; integration dropped to 3 (the longest shard). Wall-clock for the test stage went from 12 minutes down to 5.
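In GitLab CI the shape comes out roughly like this. We split by topic, but the built-in parallel keyword shown here gets the same fan-out; run-integration.sh is a hypothetical wrapper that maps a shard index to a subset of the suite:

unit-tests:
  stage: test
  script:
    - npm test

integration-tests:
  stage: test
  parallel: 3   # GitLab injects CI_NODE_INDEX (1-based) and CI_NODE_TOTAL
  script:
    - ./run-integration.sh --shard "$CI_NODE_INDEX" --of "$CI_NODE_TOTAL"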
Step 3: Incremental lint (win: 1.5 minutes)
Lint ran across the whole project every time: 2 minutes of ESLint, PHPCS, and Prettier. Meanwhile the average PR touched 8 files. We pulled the changed files with git diff --name-only origin/main...HEAD and linted only those: 30 seconds.
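In job form it’s only a couple of shell lines. A sketch showing just the ESLint half (PHPCS and Prettier get the same file list), assuming a Node image with the repo’s tooling installed:

lint:
  stage: lint
  script:
    - git fetch origin main   # the default shallow clone may not have main locally
    - CHANGED=$(git diff --name-only origin/main...HEAD -- '*.js' '*.ts')
    - if [ -n "$CHANGED" ]; then echo "$CHANGED" | xargs npx eslint; fi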
Safety net: pre-commit hooks were already linting before commit, so CI was really just a check. Full-project lint moved to a nightly job.
Step 4: Database seed cache (win: 2 minutes)
Integration tests started by running migrations plus seeders, 2 minutes, and the result was identical on every run. We snapshotted the seeded MySQL database, stored the snapshot as a Docker image, and had jobs restore from that image instead: ready in 10 seconds.
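One way to bake the snapshot (a sketch, assuming the official mysql image; seeded-dump.sql stands in for a dump taken after migrations and seeders have run):

FROM mysql:8.0
# the official entrypoint imports any .sql file in this directory on first start
COPY seeded-dump.sql /docker-entrypoint-initdb.d/

Integration jobs then pull it in as a service; the tag here is a placeholder, and it doubles as the schema version:

integration-tests:
  services:
    - name: $CI_REGISTRY_IMAGE/mysql-seeded:schema-v1
      alias: mysql
  variables:
    MYSQL_ALLOW_EMPTY_PASSWORD: "1"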
Step 5: Beefier runner (win: 1 minute)
The runner was 2 vCPU / 4 GB, and both build and test were CPU-bound. We moved to 4 vCPU / 8 GB for a marginal cost increase, and the build and test steps sped up by about 30%.
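Once the new runner was registered, steering the heavy jobs onto it was one tag per job; large-runner is a placeholder for whatever tag you register it under:

build:
  tags:
    - large-runner   # same tag on the test jobs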
Step 6: Artifact caching (win: 30 seconds)
Node modules and composer packages were being installed from scratch in every job. We used GitLab’s cache to persist node_modules and vendor, keyed off the lock-file hashes, so jobs restore from cache and get straight to work.
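The cache block, roughly; GitLab hashes the listed lock files into the key, so the cache rolls over exactly when dependencies change:

cache:
  key:
    files:
      - package-lock.json
      - composer.lock
  paths:
    - node_modules/
    - vendor/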
Step 7: Post-deploy check in parallel (win: 1 minute)
A smoke test ran after the deploy, and the two were strictly sequential; the smoke test alone took 1 minute. We changed the deploy to bring up a canary first and pointed the health checks at that, so the smoke test ran in parallel with the rest of the rollout. By the time the full deploy finished, the test had already passed.
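In .gitlab-ci.yml the reshuffle is just a needs edge between jobs. A sketch; deploy.sh, smoke.sh, and the canary URL are hypothetical stand-ins for our scripts:

deploy-canary:
  stage: deploy
  script:
    - ./deploy.sh --canary

deploy-rest:
  stage: deploy
  needs: [deploy-canary]
  script:
    - ./deploy.sh --rollout

smoke-test:
  stage: deploy
  needs: [deploy-canary]   # starts alongside the rollout, not after it
  script:
    - ./smoke.sh https://canary.staging.example.com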
Final state
- Lint: 30 seconds
- Unit test: 2 minutes
- Integration test: 3 minutes (in parallel)
- Build: 30 seconds
- Deploy + smoke: 2 minutes (in parallel)
- Total critical path: around 4 minutes
Pipeline minutes dropped, but the bigger win was developer flow. “Waiting on a build” time went down by about 70%.
Traps to watch for
- When you parallelise tests, you have to hunt down shared state. Tests that wrote to the same database in parallel turned flaky. Namespace everything: give every worker its own schema (see the sketch after this list).
- Choose cache keys carefully. With the wrong key the cache never invalidates, and you waste hours chasing a corrupt cache.
- Incremental lint has to stay in sync with the main branch. Set the diff base correctly.
- Upgrading the runner costs money; measure the return.
- Version the snapshot images and regenerate when the schema changes.
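For the shared-state bullet, one cheap way to do the namespacing is deriving the schema name from GitLab’s shard index. A sketch, assuming the shards point at one shared MySQL, a mysql client in the job image, and hypothetical run-integration.sh flags:

integration-tests:
  parallel: 3
  script:
    - export DB_NAME="app_test_${CI_NODE_INDEX}"   # one schema per shard
    - mysql -h mysql -u root -e "CREATE DATABASE IF NOT EXISTS ${DB_NAME}"
    - ./run-integration.sh --shard "$CI_NODE_INDEX" --database "$DB_NAME"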
Pipeline speed isn’t just CI spend; it’s developer morale. Every minute you cut feeds back into “let me finish this thing” motivation.