Token Maxxing Is Making Engineering Teams Slower


PRs analyzed: 1M+ (across 2,444 engineering organizations)

Reactive work, median org: 44% (bugs + maintenance, nearly half of all capacity)

Reverts vs PR growth: 3.7× vs 2.6× (failures compounding faster than output)

$0.18 reaches users

For every dollar a team spends on AI coding tools, just 18 cents becomes shipped product. The other 82 cents is consumed by the maintenance cycle those same tools accelerate. That is not because engineers are slow; it is because there is no closed loop between production reality and the code being written.

Reactive engineering $0.44 · Code rework $0.27 · Review friction $0.11 · Shipped product $0.18

Proportional allocation of AI engineering spend, platform average. Only the shipped-product band ($0.18) reaches users.

Reactive · P75 / P90: $0.62 / $0.76

Shipped · P75 / P90: $0.10 / $0.06
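The four bands are shares of a single dollar, so they should sum to $1.00 and scale linearly with spend. A minimal Python sketch of that proportional split; the function name `split_spend` and the $50,000 budget are hypothetical, and the shares are the platform averages, not any one team's numbers.

```python
# Platform-average split of one dollar of AI coding spend (values in dollars).
ALLOCATION = {
    "reactive_engineering": 0.44,
    "code_rework": 0.27,
    "review_friction": 0.11,
    "shipped_product": 0.18,
}


def split_spend(total_spend: float, allocation: dict[str, float]) -> dict[str, float]:
    """Scale a per-dollar allocation to an arbitrary spend amount (hypothetical helper)."""
    share_sum = sum(allocation.values())
    # The bands should cover the whole dollar; tolerate float rounding only.
    if abs(share_sum - 1.0) > 1e-9:
        raise ValueError(f"allocation sums to {share_sum}, expected 1.0")
    return {band: round(total_spend * share, 2) for band, share in allocation.items()}


if __name__ == "__main__":
    # Hypothetical example: a team spending $50,000 on AI coding tools.
    for band, dollars in split_spend(50_000, ALLOCATION).items():
        print(f"{band:22s} ${dollars:>10,.2f}")
```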

Nearly half of all engineering output is reactive.

At the median organization, 44% of engineering output is reactive: fixing existing code or keeping systems running. The distribution has a long tail: at the 90th percentile, more than three-quarters of all engineering effort produces no net-new product.

Share of engineering output classified as reactive (bugs + maintenance), by organization percentile. The shaded band is the interquartile range. More AI spend accelerates volume on both sides, features and maintenance alike, so the tail only thickens.
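The percentile view can be reproduced from per-organization counts. The sketch below assumes reactive share is measured as (bug PRs + maintenance PRs) / all PRs; the report does not say whether it weights by PR count, lines, or effort, and the helper names are hypothetical.

```python
import statistics


def reactive_share(bug_prs: int, maintenance_prs: int, total_prs: int) -> float:
    """Fraction of one org's PRs classified as reactive (bugs + maintenance)."""
    if total_prs <= 0:
        raise ValueError("total_prs must be positive")
    return (bug_prs + maintenance_prs) / total_prs


def percentile_summary(shares: list[float]) -> dict[str, float]:
    """Median, interquartile range, and P90 of per-org reactive shares."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%; index k holds the (k+1)*5th percentile.
    cuts = statistics.quantiles(shares, n=20, method="inclusive")
    return {
        "p25": cuts[4],
        "median": statistics.median(shares),
        "p75": cuts[14],
        "p90": cuts[17],
    }
```

The `p25` to `p75` span corresponds to the shaded interquartile band; `p90` is where the long tail shows up.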

1 in 4 lines written each week is thrown away before the week closes. This is not planned refactoring but code that did not survive the sprint it was written in. The AI generates from local context, never from production reality: which patterns failed, which edge cases were already tried and reverted.

Weekly code churn by percentile

Median: 25%
P75: 38%
P90: 55%
Industry benchmark: 27%

Lines written and discarded within the same week. The median sits just below the Pluralsight/GitPrime industry benchmark (25% vs 27%); the P90 runs at roughly twice the benchmark.
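A sketch of the weekly churn metric as the caption defines it: lines written and discarded within the same week. The `LineRecord` schema is hypothetical; real tooling would derive line lifetimes from git history, and the report does not describe its exact method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineRecord:
    """Lifecycle of one authored line (hypothetical schema)."""
    added_week: int               # week number in which the line was written
    deleted_week: Optional[int]   # week it was removed, or None if it survives


def weekly_churn(records: list[LineRecord], week: int) -> float:
    """Share of lines written in `week` that were also deleted in `week`."""
    written = [r for r in records if r.added_week == week]
    if not written:
        return 0.0
    discarded = sum(1 for r in written if r.deleted_week == week)
    return discarded / len(written)
```

Lines deleted in a later week do not count toward the week they were written, which is what separates this same-week churn from ordinary refactoring.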

Reverts are outpacing output.

Over twelve weeks, weekly PR volume grew 2.6×, but reverted PRs grew 3.7×. The failure rate is climbing faster than the work itself. Each revert spawns a bug-fix PR, which feeds the reactive total; if the gap persists, the 44% becomes 50%, then 56%.

Chart: weekly merged PRs and reverted PRs, each indexed (merged: 100 = 2,525/wk; reverted: 100 = 10/wk)

Feb 16 to May 4, 2026, platform-wide. No pre-AI baseline exists, so the argument rests on the rate difference: if quality scaled with velocity, revert growth would track PR growth. Instead it ran roughly 40% faster (3.7 ÷ 2.6 ≈ 1.42).
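The rate-difference argument reduces to one ratio: the revert rate at the end of the window divided by the revert rate at the start. A sketch, assuming "grew 2.6×" and "grew 3.7×" compare the last week's index with the first, and applying them to the chart's index bases; the weekly series itself is not published.

```python
# Index bases from the chart: 100 = 2,525 merged PRs/week, 100 = 10 reverted PRs/week.
BASE_MERGED_PER_WEEK = 2_525
BASE_REVERTED_PER_WEEK = 10
PR_GROWTH = 2.6       # merged PRs, last week vs first week (assumed reading)
REVERT_GROWTH = 3.7   # reverted PRs, last week vs first week (assumed reading)


def revert_rate(reverted: float, merged: float) -> float:
    """Reverted PRs as a fraction of merged PRs in the same week."""
    return reverted / merged


start_rate = revert_rate(BASE_REVERTED_PER_WEEK, BASE_MERGED_PER_WEEK)
end_rate = revert_rate(BASE_REVERTED_PER_WEEK * REVERT_GROWTH,
                       BASE_MERGED_PER_WEEK * PR_GROWTH)

# If quality scaled with velocity, this ratio would be 1.0.
relative_rise = end_rate / start_rate  # algebraically equal to REVERT_GROWTH / PR_GROWTH

print(f"start revert rate: {start_rate:.3%}")
print(f"end revert rate:   {end_rate:.3%}")
print(f"reverts outpaced PR volume by {relative_rise - 1:.0%}")
```

`relative_rise` comes out at about 1.42 because the base volumes cancel, which is why the comparison holds even without a pre-AI baseline.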

The compounding loop · larger each turn

More AI spend → more volume → review can't scale → ships unreviewed → reactive work grows ↻ back to the start

Seven findings from billions of AI tokens.

Across 1M+ pull requests · 2,444 organizations · May 2026

44% · Median org reactive-work share: bugs + maintenance consuming nearly half of all engineering capacity

25% · Weekly code churn: one in four lines written is discarded before the sprint closes

21.6% · Comments addressed: 4 in 5 review comments are never acted on, across 225,000+ records

48.5% · PRs rubber-stamped in under 60 minutes: 10,588 with zero reviewer comments

3.7× · Reverted PRs grew 3.7× while PR volume grew 2.6×: failures compounding faster than output

11.6d · Average bug lifetime in production before being caught and reverted

11,033 · High-risk flagged PRs approved and merged anyway: 52.7% of all automated risk flags were ignored
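The 11.6-day figure is a mean time-in-production. A sketch, assuming each bug's lifetime runs from the merge of the PR that introduced it to the revert that removed it (the report does not name its exact start and end events); `mean_lifetime_days` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def mean_lifetime_days(pairs: list[tuple[datetime, datetime]]) -> float:
    """Average days between a bug-introducing merge and its revert."""
    if not pairs:
        raise ValueError("need at least one (merged_at, reverted_at) pair")
    # Sum durations as timedeltas, then convert to fractional days.
    total = sum((reverted - merged for merged, reverted in pairs), timedelta())
    return total / timedelta(days=1) / len(pairs)
```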

Half approved in under an hour. Four in five comments never acted on.

Review hasn't scaled with AI output. 48.5% of PRs are approved in under 60 minutes, faster than any meaningful review. And bug & error comments, the highest-value category, are addressed just 26% of the time.

Avg comments / PR: 20.8 · Bot-generated: 80% · Comments addressed: 21.6%

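Comment types: share of 225k comments, and how often each is acted on

Bug / error: 32.0% share · 26% acted on
Other: 20.2% share · 18% acted on
Security: 10.9% share · 21% acted on
Testing: 8.5% share · 21% acted on
Performance: 8.3% share · 21% acted on
Code suggestion: 7.3% share · 16% acted on
Style / nit: 5.5% share · 20% acted on
Documentation: 4.3% share · 21% acted on
Refactor / design: 3.0% share · 21% acted on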
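As a consistency check, weighting each comment type's acted-on rate by its share of all comments should reproduce the 21.6% headline. The figures below are copied from the comment-type breakdown; the helper name is hypothetical.

```python
# (share of all comments, fraction acted on) per comment type.
COMMENT_TYPES = {
    "Bug / error": (0.320, 0.26),
    "Other": (0.202, 0.18),
    "Security": (0.109, 0.21),
    "Testing": (0.085, 0.21),
    "Performance": (0.083, 0.21),
    "Code suggestion": (0.073, 0.16),
    "Style / nit": (0.055, 0.20),
    "Documentation": (0.043, 0.21),
    "Refactor / design": (0.030, 0.21),
}


def overall_addressed_rate(types: dict[str, tuple[float, float]]) -> float:
    """Share-weighted average of per-type acted-on rates."""
    total_share = sum(share for share, _ in types.values())
    weighted = sum(share * acted for share, acted in types.values())
    # Normalise in case the rounded shares do not sum to exactly 1.
    return weighted / total_share
```

The weighted rate lands at about 21.57%, which rounds to the reported 21.6%.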

Review turnaround time · 1M+ PRs

Under 1 hr: 48.5%
1-4 hours: 15.4%
4-24 hours: 17.2%
Over 24 hrs: 18.9%
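A sketch of how review turnaround might be bucketed into the four bands. It assumes turnaround is measured in hours from PR open to approval, and that exactly 1, 4 and 24 hours fall into the 1-4, 4-24 and 4-24 bands respectively; the report specifies neither the start event nor the boundary handling.

```python
from collections import Counter

BUCKETS = ("Under 1 hr", "1-4 hours", "4-24 hours", "Over 24 hrs")


def bucket(hours: float) -> str:
    """Map a turnaround in hours to a band (boundary handling is an assumption)."""
    if hours < 1:
        return BUCKETS[0]
    if hours < 4:
        return BUCKETS[1]
    if hours <= 24:
        return BUCKETS[2]
    return BUCKETS[3]


def turnaround_distribution(hours_list: list[float]) -> dict[str, float]:
    """Fraction of PRs in each turnaround band."""
    if not hours_list:
        raise ValueError("no turnaround values")
    counts = Counter(bucket(h) for h in hours_list)
    return {b: counts[b] / len(hours_list) for b in BUCKETS}
```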

759 of every 1,000 issues are critical or high.

Across organizations with production error tracking connected, three in four issues are serious enough to cause direct user impact. Critical issues fire 3.3 times on average before anyone catches them: failures that already reached users, repeatedly, before being logged.

Critical (service-breaking, data loss, security): 132 per 1,000 · fires 3.3×

High (significant functional failure): 627 per 1,000 · fires 1.3×

Medium (non-blocking, limited blast radius): 189 per 1,000 · fires 0.6×

Low (cosmetic, no direct user harm): fires 0.2×

Per 1,000 production issues · 1,141 issues · 1,543 PR match events. “Fires” = average times an issue recurs before it is caught.
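The per-1,000 counts and the "fires" multipliers combine by simple multiplication. A sketch using only the three counts the chart shows (the low-severity count is not given, so it is left out); the helper names are hypothetical.

```python
# band: (issues per 1,000, average times an issue recurs before it is caught)
# The low-severity count is not given in the source, so it is omitted.
SEVERITY = {
    "critical": (132, 3.3),
    "high": (627, 1.3),
    "medium": (189, 0.6),
}


def share_of(bands: list[str]) -> float:
    """Fraction of all production issues falling in the given severity bands."""
    return sum(SEVERITY[b][0] for b in bands) / 1000


def recurrences_per_thousand(bands: list[str]) -> float:
    """Total recurrences, per 1,000 issues, before the bands' issues are caught."""
    return sum(count * fires for count, fires in (SEVERITY[b] for b in bands))
```

`share_of(["critical", "high"])` gives the 759-in-1,000 headline; the recurrence total shows how the 3.3× multiplier on critical issues compounds user impact.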

Top recurring error classes: flagged in review, what happened next (fixed vs. merged anyway)

missing null check · unvalidated input to external API · unvalidated external input · unhandled timeout · timeout with no retry/breaker · unhandled activity failure · unhandled async operation · validation after mutation

1,543 match events · 13 organizations. Merged anyway = flagged by Entelligence, then approved and merged without being fixed.
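The class labels are terse. As a hypothetical illustration of two of them, not taken from the report's data, here is "validation after mutation" next to a version that checks its inputs, including a null check, before changing any state. The `Account` type and both functions are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Account:
    balance: int
    history: list[int] = field(default_factory=list)


def withdraw_buggy(account: Account, amount: int) -> None:
    # Validation after mutation: state changes before the input is checked,
    # so a rejected request still leaves a half-applied change behind.
    account.balance -= amount
    account.history.append(-amount)
    if amount <= 0 or account.balance < 0:
        raise ValueError("invalid withdrawal")


def withdraw_fixed(account: Optional[Account], amount: int) -> None:
    # Null check first, then validate everything before touching state.
    if account is None:
        raise ValueError("no account")
    if amount <= 0 or amount > account.balance:
        raise ValueError("invalid withdrawal")
    account.balance -= amount
    account.history.append(-amount)
```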

Close the loop. Every fix compounds.

Today's tools see half the picture: reviewers see PRs but never production; SRE agents see outages but never the PR that caused them. Entelligence sees both, building a living memory of your org's failures so every review gets smarter than the last.

Review Agent: reviews every PR against real incident history, citing the past failure your diff resembles

Instrument Agent: adds the right logs, traces and alerts at merge time, so a break explains itself

RCA Agent: root-causes production failures and drafts the fix PR

Monitor Agent: watches logs after deploy and confirms the fix held
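The report does not describe how the Review Agent decides which past failure a diff resembles. Purely as a hypothetical sketch of the idea, the snippet below scores a diff against stored incident summaries with token-set Jaccard similarity, a deliberately simple stand-in; it is not Entelligence's method, and every name in it is invented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Incident:
    incident_id: str
    summary: str  # e.g. the root-cause note attached to a past failure


def tokens(text: str) -> set[str]:
    """Lower-cased identifier-like tokens; a crude stand-in for real code features."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_incident(diff: str, history: list[Incident],
                          threshold: float = 0.2) -> Optional[tuple[Incident, float]]:
    """Return the best-matching past incident and its score, or None below threshold."""
    diff_tokens = tokens(diff)
    best: Optional[tuple[Incident, float]] = None
    for incident in history:
        score = jaccard(diff_tokens, tokens(incident.summary))
        if score >= threshold and (best is None or score > best[1]):
            best = (incident, score)
    return best
```

A review comment could then cite `best[0].incident_id` alongside the diff; the threshold value is arbitrary and would need tuning on real data.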

The closed loop · knowledge compounds, not debt

Reactive work: 44% reactive → measurable reduction

Review quality: 21.6% addressed → production-grounded

Bug lifetime: 11.6-day avg → caught at PR merge