Entelligence
Platform Intelligence · May 2026
Entelligence Research · May 2026

Token Maxxing Is Making Engineering Teams Slower

An analysis of 1M+ pull requests across 2,444 engineering organizations. More AI spend. More code volume. More production failures. We measured where AI engineering effort actually goes, how code review has responded to 2.6× volume growth, and why the reactive work treadmill keeps accelerating. The findings: $0.82 of every AI dollar is consumed before a single feature reaches users.

PRs analyzed: 1M+ (2,444 engineering organizations)
Platform avg reactive work: 44% (bugs + maintenance, median)
Revert rate vs PR growth: 3.7× vs 2.6× (failures growing faster than output)
01 / The Dollar Breakdown
$0.18 Shipped. $0.82 Consumed Before It Gets There.
Proportional allocation of AI engineering spend across work type categories · platform average
Reactive Engineering — $0.44
Code Rework — $0.27
Review Friction — $0.11
Shipped Product — $0.18
Bugs: Patches and fixes to existing code. No net-new product value delivered.
KTLO (Keep the Lights On): Infrastructure, dependency upgrades, config maintenance.
Features: New product capabilities shipped to users.
Innovation: Architectural changes, R&D, significant new patterns.

At the current trajectory, an engineering team spending $100,000/year on AI coding tools generates roughly $18,000 of shipped product value. The remaining $82,000 is consumed by the maintenance cycle those same tools are helping to accelerate. This is not because engineers are inefficient or the AI tools are bad. It is because there is no closed loop between production reality and the code being written.
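The $100,000 example follows directly from the per-dollar shares. A minimal sketch of that arithmetic, assuming the four platform-average shares apply uniformly to a team's annual spend (the `allocate` helper is illustrative, not part of any Entelligence tooling):

```python
# Platform-average per-dollar shares from the breakdown above.
ALLOCATION = {
    "Reactive Engineering": 0.44,
    "Code Rework": 0.27,
    "Review Friction": 0.11,
    "Shipped Product": 0.18,
}


def allocate(annual_spend: float) -> dict[str, float]:
    """Split an annual AI tooling budget by the per-dollar shares (hypothetical helper)."""
    return {
        category: round(annual_spend * share, 2)
        for category, share in ALLOCATION.items()
    }


breakdown = allocate(100_000)
# Everything that is not shipped product counts as consumed.
consumed = sum(v for k, v in breakdown.items() if k != "Shipped Product")

for category, dollars in breakdown.items():
    print(f"{category:<22} ${dollars:>10,.0f}")
print(f"{'Consumed before ship':<22} ${consumed:>10,.0f}")
```

The same function can be pointed at any budget figure; the shares themselves are the report's platform averages and will differ per organization.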

Per-dollar allocation · median and percentile breakdown
| Category | Platform avg | P75 | P90 | What it measures |
| --- | --- | --- | --- | --- |
| Reactive Engineering | $0.44 | $0.62 | $0.76 | Bug fixes + maintenance PRs |
| Code Rework | $0.27 | $0.38 | $0.55 | Code written and discarded within the week |
| Review Friction | $0.11 | | | Overhead from review that doesn't catch anything |
| Shipped Product | $0.18 | $0.10 | $0.06 | Net-new value that reaches users |
02 / The Reactive Tax
44% at Median. 76% at the 90th Percentile.
Share of engineering output classified as reactive work (bugs + maintenance) by organization percentile

Nearly half of all engineering output on the platform is classified as reactive: fixing existing code or keeping existing systems running. At the median organization, 44% of engineering output is reactive. At the 75th percentile it is 62%. At the 90th percentile, more than three-quarters of all engineering effort goes toward work that produces no net-new product. These organizations burn three-quarters of their capacity on bugs and maintenance, leaving less than a quarter for everything else, features included. More AI spend accelerates the volume on both sides, not just the features.
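As a rough illustration of how a reactive share and its distribution across organizations could be computed, here is a minimal sketch. The label strings, the sample organizations, and the `reactive_share` helper are all hypothetical; the report does not describe its actual classification pipeline.

```python
from statistics import median

# Hypothetical work-type labels mirroring the four categories in the report.
REACTIVE_LABELS = {"bug", "ktlo"}


def reactive_share(pr_labels: list[str]) -> float:
    """Fraction of an organization's PRs labeled as bugs or maintenance."""
    if not pr_labels:
        return 0.0
    reactive = sum(1 for label in pr_labels if label in REACTIVE_LABELS)
    return reactive / len(pr_labels)


# Made-up sample data: one list of PR labels per organization.
orgs = {
    "org-a": ["bug", "feature", "ktlo", "feature"],
    "org-b": ["feature", "feature", "innovation", "bug"],
    "org-c": ["bug", "bug", "ktlo", "feature"],
}

shares = sorted(reactive_share(labels) for labels in orgs.values())
print("per-org reactive shares:", shares)
print("median reactive share:", median(shares))
```

With real data, the P75 and P90 figures would come from the same sorted list of per-organization shares, for example via `statistics.quantiles`.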

Median reactive: 44% (bugs + maintenance share · P50)
75th percentile: 62% (top quartile of reactive orgs)
90th percentile: 76% (highest-reactive organizations)
Reactive work — organization distribution
P10: 20% · P25: 32% · Median: 44% · P75: 62% · P90: 76%
Proactive vs reactive split · platform avg
Reactive (bugs + maintenance): 43.9%
Proactive (features + innovation): 24.7%
Unclassified: 31.4%

At the 90th percentile, organizations put 76% of engineering output into reactive work, about 4.2× the 18% platform-average share that ships as product. These organizations are not outliers; they represent the ceiling of what happens when AI volume grows without a quality feedback loop.

03 / Code Rework
1 in 4 Lines Written Each Week Is Thrown Away.
Weekly code churn — lines written and discarded within the same week, by percentile

At the median, 25% of code written in any given week is overwritten or deleted before that week closes. This is not planned refactoring or technical debt cleanup — it is code that did not survive the sprint it was written in. For teams heavily using AI coding assistants, this reflects a structural gap: the AI generates code from local context (the file, the prompt, the immediate task) but not from production reality — which patterns have failed, which edge cases have already been tried and reverted, what the actual requirement turned out to be. At the 90th percentile, more than half of all code written each week is discarded.
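One way to make the churn metric concrete is sketched below. It assumes churn means "lines written in a calendar week that were also removed in that same Monday-to-Sunday week"; the report does not spell out its exact windowing, and `LineRecord` and `weekly_churn` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class LineRecord:
    written_on: date
    removed_on: Optional[date] = None  # None means the line still exists


def week_start(d: date) -> date:
    """Monday of the calendar week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_churn(lines: list[LineRecord], week: date) -> float:
    """Share of lines written in `week` that were removed within that same week."""
    written = [l for l in lines if week_start(l.written_on) == week]
    if not written:
        return 0.0
    discarded = [
        l for l in written
        if l.removed_on is not None and week_start(l.removed_on) == week
    ]
    return len(discarded) / len(written)


# Made-up sample: four lines written in the week of Monday, March 2, 2026.
sample = [
    LineRecord(date(2026, 3, 2), removed_on=date(2026, 3, 5)),   # discarded same week
    LineRecord(date(2026, 3, 3), removed_on=date(2026, 3, 10)),  # removed the next week
    LineRecord(date(2026, 3, 4)),
    LineRecord(date(2026, 3, 6)),
]
print(weekly_churn(sample, date(2026, 3, 2)))
```

A rolling seven-day window would be an equally defensible definition and would report somewhat higher churn for lines written late in the week.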

Median weekly churn: 25% (lines discarded within the week)
75th percentile: 38% (top-quartile churn orgs)
90th percentile: 55% (majority of code discarded)
Industry benchmark: 27% (Pluralsight / GitPrime avg)
Weekly code churn — organization distribution
P25: 15% · Median: 25% · P75: 38% · P90: 55%

Industry benchmark: 27% (Pluralsight/GitPrime). The platform median (25%) is roughly in line with it; P90 (55%) is about 2× the benchmark.

04 / Output vs Failure Rate
PR Volume Grew 2.6×. Reverted PRs Grew 3.7×.
12 weeks of merged PRs vs reverted PRs · Feb 16 – May 4, 2026 · platform-wide

Between February 16 and May 4, weekly PR volume on the platform grew from 2,525 to 6,654 — a 2.6× increase. Over the same period, reverted pull requests grew from 10 to a peak of 37 per week — a 3.7× increase. The failure rate is growing faster than output. Each revert triggers a bug-fix PR. Each bug-fix PR adds to the reactive work total. The 44% becomes 50%, then 56%. This is the compounding structure of the token maxxing trap.

Methodology note: No pre-AI baseline exists for these organizations; they were already using AI tools when they connected. This is not a before/after comparison. The structural argument rests on the rate difference: if AI tools delivered quality proportional to velocity, revert growth would track PR growth. Instead, the revert multiple (3.7×) came in roughly 40% above the PR volume multiple (2.6×).
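The growth multiples and the rate difference can be reproduced from the four weekly counts quoted in this section. A short sketch, with one caveat: following the report's own framing, it compares peak weekly reverts against final-week PR volume, and those may not be the same week.

```python
# Weekly counts quoted in this section.
pr_start, pr_end = 2_525, 6_654
revert_start, revert_peak = 10, 37

pr_growth = pr_end / pr_start               # PR volume multiple
revert_growth = revert_peak / revert_start  # revert multiple
excess = revert_growth / pr_growth - 1      # how far reverts outpaced volume

# Reverts per 1,000 merged PRs at the start, and at peak reverts vs final volume.
rate_start = revert_start / pr_start * 1_000
rate_end = revert_peak / pr_end * 1_000

print(f"PR volume growth:   {pr_growth:.1f}x")
print(f"Revert growth:      {revert_growth:.1f}x")
print(f"Reverts outpaced volume by {excess:.0%}")
print(f"Reverts per 1,000 PRs: {rate_start:.2f} -> {rate_end:.2f}")
```

If quality had held constant, the reverts-per-1,000 figure would stay flat as volume grew; here it rises.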
PR volume growth: 2.6× (2,525 → 6,654 per week)
Revert growth: 3.7× (10 → 37 reverted PRs/week)
Avg merge → revert: 11.6 days (P50: 5.2d · P75: 14d · P90: 25d)
Emergency reverts (<4 hrs): 298 (out of 589 total all-time reverts)
[Chart: merged PRs (indexed · 100 = 2,525/week) vs reverted PRs (indexed · 100 = 10/week), Feb 16 to May 4, indexed to week 1 = 100. Annotations: 2.6×, Peak 3.7×, 3.2×.]
The compounding loop
AI spend (tools + tokens)
→ Volume growth (2.6× in 12 weeks)
→ Review doesn't scale (48.5% rubber-stamped)
→ Code ships unreviewed (80% comments unacted)
→ Reactive work grows (44% → 50% → 56%)
→ More AI spend (to compensate)
PR volume grew 2.6×; reverted PRs grew 3.7×. If AI tools improved quality in proportion to velocity, reverts would grow no faster than volume. They grew faster. The loop closes back to the start: each new AI spend cycle compounds the reactive debt.
Research Summary
7 Findings from Billions of AI Tokens
Across 1M+ pull requests · 2,444 engineering organizations · May 2026
01 · 44% · Median org reactive work share: bugs + maintenance consuming nearly half of all engineering capacity
02 · 25% · Weekly code churn: one in four lines written is discarded before the sprint closes
03 · 21.6% · Comments addressed: 4 in 5 review comments are never acted on, across 225,000+ comment records
04 · 48.5% · PRs rubber-stamped in under 60 minutes, 10,588 of them with zero reviewer comments
05 · 3.7× · Reverted PRs grew 3.7× while PR volume grew 2.6×: failures compounding faster than output
06 · 11.6d · Average bug lifetime in production before being caught and reverted
07 · 11,033 · High-risk flagged PRs approved and merged anyway: 52.7% of all automated risk flags were ignored
05 / Code Review
Half Approved in Under an Hour. 4 in 5 Comments Never Acted On.
Review comment patterns and turnaround time · 1M+ pull requests

Code review has not scaled with AI output volume. 48.5% of all PRs are approved in under 60 minutes — faster than any meaningful review could take place. Across the PRs reviewed by the Entelligence platform, comment-level data surfaces the structural breakdown: 80% of review comments are bot-generated, and only 21.6% of all comments are ever acted on. Bug and error comments — the highest-value category at 32% of all comments — are addressed at only 26%. The constraint is not effort or tooling. It is that review happens without production context.

Comment volume & address rate · by source
| Metric | Value | Note |
| --- | --- | --- |
| Avg comments per PR | 20.8 | total |
| Bot comments | 16.7 | 80.2% of total |
| Human comments | 4.1 | 19.8% of total |
| Comments addressed | 21.6% | platform avg |
| Bot comments addressed | 23.3% | |
| Human comments addressed | 15.0% | |
| Avg addressed rate · per reviewer | 16% | 781 reviewers · range 0–100% |
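The platform address rate is effectively a volume-weighted blend of the bot and human rates. A quick consistency check on the published figures, assuming the per-PR comment averages are a fair proxy for total comment volume:

```python
# Per-PR comment averages and address rates from the table above.
bot_per_pr, human_per_pr = 16.7, 4.1
bot_addressed, human_addressed = 0.233, 0.150

total_per_pr = bot_per_pr + human_per_pr
bot_share = bot_per_pr / total_per_pr

# Volume-weighted blend of the two address rates.
blended = (bot_per_pr * bot_addressed + human_per_pr * human_addressed) / total_per_pr

# Comments per PR that are never acted on, at the blended rate.
unaddressed_per_pr = total_per_pr * (1 - blended)

print(f"Bot share of comments: {bot_share:.1%}")
print(f"Blended address rate:  {blended:.1%}")
print(f"Unaddressed per PR:    {unaddressed_per_pr:.1f}")
```

The blend lands within a tenth of a point of the reported 21.6%, which is what rounded inputs would predict. Because bots write four in five comments, the platform rate sits much closer to the bot rate than to the human one.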
Comment types · share and address rate · 225k comments
| Type | Share | Addr. |
| --- | --- | --- |
| Bug / error | 32.0% | 26.3% |
| Other | 20.2% | 18.0% |
| Security | 10.9% | 20.9% |
| Testing | 8.5% | 20.9% |
| Performance | 8.3% | 21.1% |
| Code suggestion | 7.3% | 16.1% |
| Style / nit | 5.5% | 19.6% |
| Documentation | 4.3% | 21.2% |
| Refactor / design | 3.0% | 20.5% |
Review turnaround time distribution · 1M+ PRs
Under 1 hr: 48.5%
1–4 hours: 15.4%
4–24 hours: 17.2%
Over 24 hrs: 18.9%
06 / Production Error Patterns
759 of Every 1,000 Issues Are Critical or High Severity.
Production error tracking across connected organizations · 1,141 issues · 1,543 PR match events

Across organizations with production error tracking connected, the issue severity distribution is not what most engineering leaders expect. 132 of every 1,000 tracked issues are Critical — service-breaking, data-corrupting, or security-exposing failures. A further 627 per 1,000 are High severity. Together, 3 in 4 production issues are serious enough to cause direct user impact. Critical issues fire 3.3 times on average before anyone catches them. These are not edge cases surfaced by careful monitoring — they are failures that have already reached users multiple times before being identified and logged.
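The headline figure is a simple sum over the per-1,000 severity counts. A small sketch of that arithmetic, using only the numbers quoted in this section:

```python
# Issues per 1,000 tracked production issues, by severity.
per_1000 = {"Critical": 132, "High": 627, "Medium": 189, "Low": 51}

total = sum(per_1000.values())  # the four counts sum to 999 after rounding
serious = per_1000["Critical"] + per_1000["High"]
serious_share = serious / 1_000

print(f"Critical + High per 1,000: {serious}")
print(f"Share of all issues:       {serious_share:.1%}")
```

The four categories sum to 999 rather than 1,000, which is consistent with each count being rounded independently.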

Severity distribution · per 1,000 production issues
Critical: 132 · High: 627 · Medium: 189 · Low: 51
What each severity level means · fires per issue · ratio to low
| Level | What it means | Per 1,000 issues | Avg fires | Ratio to low |
| --- | --- | --- | --- | --- |
| CRITICAL | Service-breaking failures, data corruption, security exposures; direct, immediate user impact | 132 | 3.3× | 2.6× |
| HIGH | Significant functional failures, performance degradation, or data inconsistency under normal operation | 627 | 1.3× | 6.5× |
| MEDIUM | Non-blocking bugs, degraded experiences, or edge-case failures with limited blast radius | 189 | 0.6× | 3.7× |
| LOW | Minor issues, cosmetic bugs, or non-impacting edge cases with no direct user harm | 51 | 0.2× | |
Top recurring error classes · flagged in PR review · what happened next
Legend: Fixed · Merged anyway
| Error class | Flagged | Merged |
| --- | --- | --- |
| missing null check | 85 | 73 |
| unvalidated input to external api | 54 | 41 |
| unvalidated external input | 51 | 46 |
| unhandled timeout | 42 | 39 |
| timeout no retry breaker | 35 | 28 |
| unhandled activity failure | 30 | 20 |
| unhandled async operation | 26 | 13 |
| validation after mutation | 26 | 12 |

1,543 match events · 13 organizations with production error tracking connected. “Merged anyway” = flagged by Entelligence, approved and merged without being fixed.

07 / The Closed-Loop Alternative
Every Fix Should Make the Next One Less Likely.
The token maxxing problem is a context problem, not a volume problem

Engineering teams burn 44% of AI spend on reactive work: bug fixes and maintenance. Code reviewers don't learn from production. SRE agents remediate but can't prevent. Neither delivers what engineering leaders actually need: reliability that compounds. Entelligence closes the full loop with a production intelligence world model, unifying code, incidents, observability, and customer signal into one living context graph so every fix compounds. The $0.44 shrinks. The $0.18 grows.

Reactive work: 44% reactive → Measurable reduction
Review quality: 21.6% addressed → Production-grounded
Bug lifetime: 11.6-day avg → Caught at PR merge