Story points are not hours: the estimation mistake most teams make

When story points were introduced in agile, they meant “size”. The size of a story. Not time. Many teams have since twisted them into “1 point equals 1 hour”, “3 points equals half a day”, “8 points equals a week”.

That mental model is wrong and harmful. Drawing on 10 years of watching agile teams, here’s the approach to estimation that actually works.

What a story point originally meant

The idea grew out of Extreme Programming teams, and Mike Cohn popularized it in “Agile Estimating and Planning” (2005). Conceptually, a point covers:

Size: how big the work is
Complexity: how complex it is
Uncertainty: the unknowns

A story point is the combined estimate of those three. Not time.

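Fibonacci scale: 1, 2, 3, 5, 8, 13, 21. The scale reflects uncertainty: small values sit close together (1 vs 2) because small work can be sized precisely, while large values sit far apart (13 vs 21) because big estimates are only approximate.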
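
To see how the widening gaps encode uncertainty, here is a small sketch that lists each step of the scale. It uses only the card values above; `steps` is a hypothetical helper name. The absolute gap grows from 1 to 8, while after the first couple of cards each card stays roughly 1.6 times the one before it.

```python
# The Fibonacci card scale from the text.
SCALE = [1, 2, 3, 5, 8, 13, 21]

def steps(scale):
    """For each adjacent pair of cards, return (low, high, absolute gap, ratio)."""
    return [(a, b, b - a, b / a) for a, b in zip(scale, scale[1:])]

for low, high, gap, ratio in steps(SCALE):
    # Gaps widen (1, 1, 2, 3, 5, 8) while the ratio hovers around 1.6.
    print(f"{low:>2} -> {high:>2}: gap {gap}, x{ratio:.2f}")
```

A fixed relative step means a 13 and a 21 are “one card apart” in the same sense as a 5 and an 8, which is why debating 14 vs 15 is pointless.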

Why not time?

Time is hard to estimate:
– Developer experience varies
– You don’t know the blockers up front
– Context switching costs
– Unexpected complexity

Story points measure relative size. Judging “this task is twice as big as that one” is intuitively easier than predicting “this will take three hours”.

Team velocity (points completed per sprint) then translates points into time on its own: “We do 30 points in two weeks, so we can plan 30 points for next sprint.” That is enough for planning.
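
The translation from points to calendar time is just division. A minimal sketch, assuming a steady velocity of 30 points per two-week sprint as in the text; the 200-point backlog and the `sprints_needed` helper are hypothetical.

```python
import math

def sprints_needed(backlog_points: int, velocity: float) -> int:
    """Sprints needed to burn down a backlog at a steady velocity, rounded up."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)

# Hypothetical 200-point backlog, 30 points per two-week sprint.
sprints = sprints_needed(200, 30)
print(sprints, "sprints, about", sprints * 2, "weeks")  # 7 sprints, about 14 weeks
```

Nobody had to estimate a single hour: the time figure falls out of the team’s observed pace.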

Common antipatterns

Antipattern 1: “1 point equals 4 hours”

Under management pressure, developers start converting points into hours, and points lose their meaning.

Antipattern 2: Per-story hour estimation

In planning, someone says “this one is three hours, this one is eight”. That invites micromanagement, because every story now has an hour budget to be checked against.

Antipattern 3: Cross-team point comparison

Team A does 50 points a sprint and team B does 30, so “A is faster”. Wrong: points are relative, and every team has its own scale.
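
A minimal sketch of why the comparison fails, using hypothetical data: both teams finished the same five stories, but team B habitually sizes everything one card lower than team A. The velocities differ while the delivered work is identical.

```python
def velocity(points: list[int]) -> int:
    """Velocity: story points completed in one sprint."""
    return sum(points)

# Hypothetical: the same five finished stories, sized on two team scales.
team_a = [2, 3, 5, 8, 13]
team_b = [1, 2, 3, 5, 8]   # one card lower for identical work

print(velocity(team_a), velocity(team_b))  # 31 19
print(len(team_a), len(team_b))            # 5 5: same stories delivered
```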

Antipattern 4: Estimating to impress

A developer lowballs the estimate (“I’m fast”), then the work doesn’t fit in the sprint. Trust erodes.

Antipattern 5: No estimation at all

“Agile means no estimation.” Extreme stance. You need some estimation to plan. The point is not to obsess over it.

Planning Poker

A team estimation session. Each developer votes privately on points for a story, then everyone reveals at once.

Process:

The Product Owner presents the story
Team asks clarifying questions
Everyone picks a Fibonacci card (their estimate)
Simultaneous reveal
Whoever voted lowest and highest explains their reasoning (“why 2? why 13?”)
After discussion, re-vote
Once the votes converge, accept the estimate

This process:
– Captures the team’s collective intelligence
– Surfaces hidden assumptions
– Balances junior and senior perspectives
– Builds consensus instead of leaving the estimate to one person
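
One round of the reveal-and-discuss loop can be sketched as a function. This is a sketch under an assumed convergence rule that is not part of planning poker itself: votes count as converged when they span at most one adjacent card, and the team then takes the higher card. The `reveal` helper and the voter names are hypothetical.

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]

def reveal(votes: dict[str, int]):
    """Simultaneous reveal: return the agreed estimate, or who should explain."""
    if any(card not in SCALE for card in votes.values()):
        raise ValueError("every vote must be a card on the scale")
    low_voter = min(votes, key=votes.get)
    high_voter = max(votes, key=votes.get)
    spread = SCALE.index(votes[high_voter]) - SCALE.index(votes[low_voter])
    # Assumed rule: within one adjacent card counts as converged; take the higher.
    if spread <= 1:
        return ("accept", votes[high_voter])
    # Otherwise the lowest and highest voters explain, then everyone re-votes.
    return ("discuss", low_voter, high_voter)

print(reveal({"Ana": 2, "Ben": 5, "Cho": 13, "Dev": 5}))  # ('discuss', 'Ana', 'Cho')
print(reveal({"Ana": 5, "Ben": 5, "Cho": 8, "Dev": 5}))   # ('accept', 8)
```

Teams pick their own tie-breaking rule; taking the higher card simply errs on the side of uncertainty.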

Calibrating against reference stories

Because story points are relative, you calibrate them against “reference stories”: already-sized stories the whole team remembers.

Example:
– “Add a logout button” = 2 points (simple, understood)
– “Build user profile page” = 5 points (medium)
– “Migrate authentication to OAuth 2.0” = 13 points (complex, uncertain)

New stories get compared to the reference stories. “Slightly more complex than the profile page, so 8 points.”
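
Comparing against a reference amounts to moving a number of cards up or down from it. A minimal sketch using the reference stories above; expressing “slightly more complex” as “one card up” is an assumption, and `size_relative_to` is a hypothetical helper.

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]

# Reference stories from the text, with their agreed sizes.
REFERENCES = {
    "Add a logout button": 2,
    "Build user profile page": 5,
    "Migrate authentication to OAuth 2.0": 13,
}

def size_relative_to(reference: str, steps: int) -> int:
    """Size a new story by moving `steps` cards away from a reference story."""
    index = SCALE.index(REFERENCES[reference]) + steps
    if not 0 <= index < len(SCALE):
        raise ValueError("off the scale: pick another reference or split the story")
    return SCALE[index]

# "Slightly more complex than the profile page" -> one card up.
print(size_relative_to("Build user profile page", +1))  # 8
```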

In a new team, the first few sprints establish the reference stories.

Velocity tracking

Team velocity is the number of story points completed per sprint.

Example history:
– Sprint 1: 28 points
– Sprint 2: 32 points
– Sprint 3: 30 points
– Sprint 4: 35 points
– Sprint 5: 29 points

Average: 154 / 5 = 30.8, so around 31. Plan the next sprint for 30.

Note: velocity fluctuates. Holidays, illness, and dependency blockers all move it. A running average over the last 3 to 4 sprints is more reliable than any single sprint.
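
The arithmetic above, as a short sketch over the five example sprints. `running_average` is a hypothetical helper; the window size is the 3 to 4 sprints suggested in the text.

```python
# Velocities from the example above (sprints 1 to 5).
velocities = [28, 32, 30, 35, 29]

def running_average(history: list[int], window: int = 3) -> float:
    """Average of the last `window` sprints; smooths out one-off dips and spikes."""
    recent = history[-window:]
    return sum(recent) / len(recent)

print(sum(velocities) / len(velocities))         # 30.8
print(round(running_average(velocities, 3), 1))  # 31.3
print(running_average(velocities, 4))            # 31.5
```

All three figures land near 31, which supports planning around 30.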

Estimation in new teams

In a new team, the first few months of estimation are painful:

No reference stories
Team productivity is unknown
Complexity estimates are miscalibrated

Approach:

Throw rough, best-guess points at the first sprint. Finishing everything is not required.
At the end, retrospective: “How hard was this story really? Was the point value right?”
After 3 to 4 sprints, reference stories are in place.
Velocity stabilises.
Six months in, estimation is disciplined.

You can’t rush this. Team dynamics take time.
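
The retro question “was the point value right?” becomes easier to answer when each story records its original estimate next to the team’s hindsight size. A minimal sketch with hypothetical stories; flagging anything more than one card off is an assumed tolerance, not a standard rule.

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]

def miscalibrated(stories: dict[str, tuple[int, int]], tolerance: int = 1) -> list[str]:
    """Stories whose hindsight size is more than `tolerance` cards from the estimate."""
    flagged = []
    for name, (estimated, hindsight) in stories.items():
        distance = abs(SCALE.index(hindsight) - SCALE.index(estimated))
        if distance > tolerance:
            flagged.append(name)
    return flagged

# Hypothetical first-sprint retro: (estimated points, "how hard was it really?")
sprint_1 = {
    "Login form": (3, 5),
    "CSV export": (2, 8),
    "Password reset": (5, 5),
}
print(miscalibrated(sprint_1))  # ['CSV export']
```

The flagged stories are the ones worth discussing: they show where the team’s sense of size is still off.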

T-shirt sizing as an alternative

Instead of story points, some teams use T-shirt sizing (XS, S, M, L, XL):

XS: trivial
S: small, well understood
M: medium, some complexity
L: large, significant work
XL: very large, should be broken down

Upside: no numerical false precision. No “is it a 5 or an 8?” debates.

Downside: velocity tracking is harder. You have to assign a numerical equivalent to each size (S, M, and so on) yourself.

Some teams prefer this. Senior developers tend to be more tolerant of the estimation process when it’s this loose.
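
Assigning those numerical equivalents is a lookup table. A minimal sketch; the values in `TSHIRT_POINTS` are assumptions each team would choose for itself, and XL is deliberately left unmapped because an XL story should be broken down before it is planned.

```python
# Assumed numeric equivalents; every team picks its own.
TSHIRT_POINTS = {"XS": 1, "S": 2, "M": 3, "L": 5}

def tshirt_velocity(completed: list[str]) -> int:
    """Convert finished T-shirt-sized stories into a number you can track per sprint."""
    if "XL" in completed:
        raise ValueError("XL stories should be broken down before they are planned")
    return sum(TSHIRT_POINTS[size] for size in completed)

print(tshirt_velocity(["S", "M", "M", "XS", "L", "S"]))  # 16
```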

The NoEstimates movement

Some teams reject estimation altogether. Instead they track throughput: simply the count of completed stories per week or sprint.

Reasoning:
– Estimation is time-consuming
– Often inaccurate
– Creates false commitment

Prerequisites:
– Stories are roughly the same size (breakdown discipline)
– Stable team
– Flexible delivery expectations

NoEstimates is extreme. Most organizations can’t accept it (budget, roadmap, stakeholder expectations). A hybrid is possible: estimate major epics, and just track throughput day to day.
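
Throughput forecasting counts stories instead of sizing them. A minimal sketch with hypothetical weekly counts; it only holds up under the prerequisite above that stories are broken down to roughly the same size.

```python
import math

# Hypothetical: stories finished in each of the last six weeks.
weekly_throughput = [4, 6, 5, 5, 3, 7]

def weeks_to_finish(remaining_stories: int, history: list[int]) -> int:
    """Forecast by counting stories, not sizing them; assumes similar-sized stories."""
    per_week = sum(history) / len(history)
    return math.ceil(remaining_stories / per_week)

print(weeks_to_finish(40, weekly_throughput))  # 8
```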

Estimation vs commitment

Critical distinction:

Estimation: “I think this is around 5 points.”

Commitment: “I’ll deliver 30 points this sprint.”

An estimate has an uncertainty range (“5 points, could be 3 to 8”). A commitment is definite.

Management tends to mix them up. They treat estimates as commitments. Developers respond with defensive estimation (adding buffer). Trust erodes.

Better framing: “The team forecasts 30 points this sprint. Actual could land between 25 and 35. Risk factors: X, Y.”
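
A forecast with a range can come straight from recent velocity. A minimal sketch, assuming the range is simply the lowest and highest of the recent sprints (one simple choice among many); it reuses the example history from the velocity section, and `forecast` is a hypothetical helper.

```python
def forecast(recent: list[int]) -> tuple[float, int, int]:
    """Return (average, lowest, highest) velocity over the given recent sprints."""
    return sum(recent) / len(recent), min(recent), max(recent)

avg, low, high = forecast([28, 32, 30, 35, 29])
print(f"The team forecasts ~{avg:.0f} points; actual likely between {low} and {high}.")
# The team forecasts ~31 points; actual likely between 28 and 35.
```

The output reads as a forecast, not a promise, which is exactly the framing that keeps estimates from turning into commitments.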

Tools that help

Jira: velocity report, sprint planning, story points built in
Linear: modern alternative, estimation is simpler
Shortcut (Clubhouse): estimate-focused project management
Asana: basic estimation, good for smaller teams

Tools are opinionated. Pick the one that fits your team’s workflow.

Estimation hygiene

Do:
– Use story points for size, not time
– Calibrate with reference stories
– Estimate by team consensus (planning poker)
– Reflect on estimate vs actual in retros
– Watch velocity trends, investigate outliers

Don’t:
– Try to convert points into hours
– Compare velocity across teams
– Present estimates as rigid commitments
– Pressure a junior into a lower estimate
– Run a “the estimate was wrong” blame game after a sprint
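
“Investigate outliers” can start with a simple check on velocity history. A minimal sketch; comparing against the median (so one bad sprint doesn’t drag the baseline) and the 20% threshold are both assumptions, and the history with a dip in sprint 4 is hypothetical.

```python
from statistics import median

def outlier_sprints(history: list[int], threshold: float = 0.2) -> list[int]:
    """1-based sprint numbers whose velocity is more than `threshold` away from the median."""
    typical = median(history)
    return [n for n, v in enumerate(history, start=1) if abs(v - typical) / typical > threshold]

# Hypothetical: sprint 4 lost half the team to illness.
print(outlier_sprints([28, 32, 30, 16, 29]))  # [4]
```

A flagged sprint is a prompt for a question in the retro (“what happened in sprint 4?”), not a verdict on anyone.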

Takeaway

Story points are size, not time. Get that mental model right and estimation becomes disciplined, team dynamics get healthier, and planning becomes realistic.

Planning poker, reference stories, and velocity tracking are the core tools. T-shirt sizing and NoEstimates are alternatives for different contexts.

Estimation is never perfect. “Good enough” is enough. Over-engineering the estimation process is itself waste.