Operator notes · The Subsidy Era · 2026

Same plan.
Same user.
100,000× the bill.

Welcome to AI subscriptions in 2026 — and the unwinding of the prices that made them feel unlimited.
A reading carousel · 14 slides · sources at end
@kamilkrauspe
Operator notes · The Subsidy Era · 2026

The agent ran for
38 minutes.
52.5 million tokens.

One documented Cursor session. The flat-plan economics that hid bills like this one are coming apart across the industry.
Cover variant B · single-fact hook
@kamilkrauspe
Operator notes · The Subsidy Era · 2026

Your $200 AI plan
is smaller than what
a heavy day costs
the vendor.

That gap was the business model. Five major vendors are now unwinding it inside a few weeks.
Cover variant C · structural hook
@kamilkrauspe
Reading the draft

Three carousel pools.
One selection deck.

Every body slide carries a tag — V3 (dense), ALT (sparse, single-fact), ALT-B (middle density). We will keep ~10, drop the rest.

14 candidate beats · ~30 slides total · pick at compression
@kamilkrauspe
V3 · beat 1
The April 2026 cluster · Dense timeline

The signals stacked up
in a few weeks.

Six pricing or capacity moves from five major vendors. Late March → late April 2026.

March
Google
Antigravity restructure
Quota → AI Credits. AI Pro weekly tokens fall ~97%.
April 2
OpenAI
Codex per-token shift
Codex bills paid plans by token, not by message.
April 9
Amazon
Jassy shareholder letter
“Capacity constraints that yield unserved demand.”
April 21
Anthropic
Claude Code paywall test
Briefly Max-only. Reversed within a day.
April 23
Anthropic
Statement to Fortune
“Compute is a constraint across the entire industry.”
April 27
GitHub
Copilot → AI Credits
All plans move to usage-based billing on June 1.
Five vendors. One pattern.
Calling it coincidence takes effort.
Sources: Anthropic blog & status; Fortune; GitHub blog; OpenAI help center; Google blog & developer forum; Amazon shareholder letter.
@kamilkrauspe
ALT-B · beat 1
The April 2026 cluster · Mid-density

The signals stacked up
in a few weeks.

Six pricing or capacity moves from five major vendors, late March through late April 2026.

March
Google
Antigravity restructure
AI Pro weekly tokens fall ~97%.
April 2
OpenAI
Codex per-token shift
Paid plans bill per token, not per message.
April 9
Amazon
Jassy shareholder letter
“Capacity constraints that yield unserved demand.”
April 21
Anthropic
Claude Code paywall test
Briefly Max-only. Reversed within a day.
April 23
Anthropic
Statement to Fortune
“Compute is a constraint across the entire industry.”
April 27
GitHub
Copilot → AI Credits
All plans usage-based on June 1.
Sources: Anthropic, GitHub, OpenAI, Google, Amazon — primary statements & filings, late March – April 2026.
@kamilkrauspe
ALT · CAND-04
The April 2026 cluster · Lean version

The past few weeks.

Six pricing or capacity moves. Five vendors.

March
Google restructures Antigravity to AI Credits.
April 2
OpenAI moves Codex billing from messages to tokens.
April 9
Amazon shareholder letter: "capacity constraints that yield unserved demand."
April 21
Anthropic tests Claude Code as Max-only. Reverses within a day.
April 23
Anthropic to Fortune: "compute is a constraint across the entire industry."
April 27
GitHub: all Copilot plans move to AI Credits on June 1.

Five vendors moving in the same direction inside a single month.

Sources: Vendor primaries · April 2026.
@kamilkrauspe
V3 · beat 2
Variance per session

A flat plan works when the heaviest user can't bend it far.

Per-session token consumption across six common workloads — same nominal user, 2026 Q1 evidence.

Inline autocomplete · one keystroke
~100 tokens · <$0.01
Single chat prompt · no project context
~1K · ~$0.01
Targeted file edit · short multi-step
~10K · ~$0.10
Interactive coding · 30–90 min session
100K–1M · $1–$10
Long autonomous run · 1–4 hours
~10M · ~$100
Multi-agent / background · production fleets
~100M · ~$1,000
Axis: 10² → 10⁹, log scale · six orders of magnitude

Reference points: a "hi" round-trip to Claude Code = ~31K tokens. The 38-minute Cursor case = 52.5M tokens.
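Each dollar figure above is the same mechanical conversion. A sketch, assuming the slide's Sonnet 4.6 PAYG rates and an illustrative 80/20 input/output split (the split is an assumption, not vendor telemetry, and it moves the blended rate by a few ×):

```python
# Sketch only: token count -> indicative PAYG dollars.
# Rates: the slide's Sonnet 4.6 PAYG ($3 input / $15 output per M).
# The 80/20 input/output split is an assumption for illustration.
def session_cost(total_tokens, input_share=0.80,
                 in_per_m=3.00, out_per_m=15.00):
    millions = total_tokens / 1_000_000
    blended = input_share * in_per_m + (1 - input_share) * out_per_m
    return millions * blended

print(f"${session_cost(52_500_000):,.2f}")  # the 38-minute Cursor case → $283.50
```

Under a more output-heavy split the same session lands several hundred dollars higher, which is why the ladder is stated in orders of magnitude, not cents.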

Sources: Per-session telemetry from Anthropic, OpenAI, Cursor, Replit, GitHub. Dollar figures use Sonnet 4.6 PAYG ($3 input / $15 output per million).
@kamilkrauspe
ALT · CAND-01
Variance · one screen

One developer, one flat-rate plan,
four very different sessions.

Tokens consumed in a single working session, with the equivalent PAYG cost.

1K
a chat prompt
~$0.01
200K
a focused edit
~$2
50M
a long autonomous run
~$500
100M+
background sub-agents
$1,000+
Sources: Per-session telemetry synthesis, Q1 2026. Dollar figures use Sonnet 4.6 PAYG ($3 input / $15 output per million).
@kamilkrauspe
V3 · beat 2b
What each tier looks like

What does each tier actually look like?

Real workloads. 2026 evidence.

Tier
2026 anchor
Tier 1 · 10²–10³ · inline / chat
Copilot doesn’t bill inline completions — “small enough to absorb.”
Tier 2 · 10⁴ · targeted edit
Anthropic Opus worked example: 50K input + 15K output, one-hour task.
Tier 3 · 10⁵–10⁶ · interactive
Vantage 50-turn session: ~1M input + 40K output. Input dominates 25:1.
Tier 4 · 10⁷ · long autonomous
Cursor case: 52.5M tokens in 38 minutes. Loop replayed 120K window.
Tier 5 · 10⁸+ · multi-agent
GitHub gh-aw repo: 237.8M tokens in a single day.

The variance lives in the work, not the price tag.

Sources: Anthropic, GitHub Copilot, Vantage, Cursor forum, gh-aw repo telemetry. $ at Sonnet 4.6 PAYG: $3/M input, $15/M output.
@kamilkrauspe
ALT · CAND-02
Type "hi" into Claude Code. Hit enter.
31,000
tokens
Spent before the agent does anything — roughly $0.10 of context loaded just to greet you.
Source: Anthropic Claude Code GitHub issue thread · 2026. $ at Sonnet 4.6 PAYG: $3/M input, $15/M output.
@kamilkrauspe
ALT · CAND-15
In a single day. In a single GitHub repository.
(gh-aw — public-facing internal Copilot tracker)
237,800,000
tokens · 378 runs · February 11, 2026
A single repo billed roughly $700–$3,500 of inference in a day.
Source: GitHub gh-aw repo telemetry · per-session survey synthesis. $ at Sonnet 4.6 PAYG: $3/M input, $15/M output.
@kamilkrauspe
V3 + ALT-B · beat 3
Capacity rationing · The silent half

The visible half is pricing.
The silent half is rationing.

Four vendors quietly route paid users to lighter models when frontier capacity runs short.

Vendor
In their own words
OpenAI
“Will not appear as a selectable model in the model picker.”
Cursor
“A frontier model that has capacity at that time.”
GitHub
Old fallback “moved exhausted users onto lower-cost models.”
Google
Users at limit “continue the same chat with the lighter Fast model.”
Pricing pressure on one side. Capacity rationing on the other.
They are reinforcing each other.
Sources: OpenAI release notes; Cursor docs; GitHub forum; Google Gemini Help Center.
@kamilkrauspe
ALT · CAND-05
"
Will not appear as a selectable model in the model picker.
— OpenAI release notes,
on the GPT-5.4 mini fallback for paid users.
The capacity rationing happens whether or not you can see it.
@kamilkrauspe
ALT · CAND-13
"
Compute is a constraint
across the entire industry.
— Anthropic, statement to Fortune.
April 23, 2026.
@kamilkrauspe
ALT · CAND-03
Rate-limit errors recorded across LLM API calls in a single month.
8,400,000
Datadog State of AI Engineering 2026.
The dominant production failure mode for agents is no longer cost — it is capacity.
Source: Datadog State of AI Engineering · 2026.
@kamilkrauspe
V3 + ALT-B · beat 4
Three sources of the gap

Vendors absorbed the variance.
The money came from somewhere.

Three sources have visibly carried the gap. None is a perpetual mechanism.

1
Investor capital
$852B
OpenAI post-money, March 2026
  • OpenAI raised $122B in March.
  • Anthropic Series G at $380B.
  • HSBC: ~$207B more needed by 2030.
2
Hyperscaler compute
$16.8B
Amazon Q1 pre-tax gain on Anthropic
  • Multi-GW TPU and Trainium reservations.
  • Delivery contingent on revenue success.
3
Cross-subsidy
<5%
Anthropic-disclosed share driving heavy-user limits
  • GitHub April 27: flat-rate “no longer sustainable.”
  • OpenAI segmented Plus from Pro a year before.

None of the three is perpetual. The transition is to mechanisms the customer can see on the invoice.

Sources: Vendor primaries; SEC filings; Reuters; HSBC via Reuters Breakingviews; Anthropic blog (July 2025).
@kamilkrauspe
V3 + ALT-B · beat 5
Where the cost lives

Where the cost of an
inference-shaped workload lives.

Three places. Two with engineering vocabularies. One still emerging.

1
Model
Cheaper inference per unit.
Owned by labs and serving providers.
MoE, latent attention, KV compression, quantization.
2
Harness
Less or better inference per task.
Owned by users and tool builders.
Context loading, tool design, caching discipline.
3
Human
What gets sent through any of it.
Owned by users (still emerging).
Model choice, escalation, when to delegate vs. supervise.

Two have engineering vocabularies. The third shows up as variance inside the others.

@kamilkrauspe
ALT · CAND-14
The seam

Three places
the cost lives.

1
MODEL
labs and serving providers
2
HARNESS
users and tool builders
3
HUMAN
still emerging

Two have engineering vocabularies. One does not.

@kamilkrauspe
V3 + ALT-B · beat 6
Mechanics of $/M tokens

A $/M-tokens number is mechanical.
Rent ÷ throughput.

$/M tokens = GPU rent per hour ÷ (tokens/sec × 3,600) × 1,000,000

The 2026 architectural fronts mostly raise the denominator.

Mixture-of-Experts
DeepSeek V3.2 carries 671B total, activates 37B per token. Fewer FLOPs → more tokens/sec.
KV / latent attention
Compress the cache. ~10× less HBM at long context. More concurrent requests fit per GPU.
Quantization (NVFP4)
Lower-precision weights and KV. Less memory and bandwidth per token.

User-facing lever: prompt caching. Anthropic reads at 10% of input price; one varying byte at the prefix kills the chain.
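The caching lever is arithmetic too. A sketch of the effective input price at a given hit rate, assuming cache reads bill at 10% of the input price (the slide's figure) and misses pay the full rate; the cache-write premium some vendors charge is omitted for simplicity:

```python
# Sketch, not any vendor's billing code. Cache reads at 10% of the
# input price (the slide's figure); misses pay the full input rate.
def effective_input_price(base_per_m, hit_rate):
    return base_per_m * (hit_rate * 0.10 + (1 - hit_rate) * 1.00)

p85 = effective_input_price(3.00, 0.85)   # ≈ $0.71 / M input
p70 = effective_input_price(3.00, 0.70)   # ≈ $1.11 / M input
print(round(p85, 3), round(p70, 3))
```

Because hits require an exact prefix match, one varying byte early in the prompt drops the hit rate for everything after it, which is why the chain-kill matters.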

Sources: Hugging Face (DeepSeek, GPT-OSS); NVIDIA NVFP4; Anthropic pricing.
@kamilkrauspe
ALT · CAND-06
A formula

Where a price per million tokens
actually comes from.

rent ÷ throughput

Anything that moves throughput moves the price. The 2026 architectural fronts are mostly versions of this.

@kamilkrauspe
ALT · CAND-08
A model and its weights

DeepSeek V3.2 carries 671B parameters,
and activates a fraction of them per token.

total parameters
671B
most of the model is dormant on most tokens
active per token
37B
what each forward pass actually computes

Mixture-of-Experts. One of several reasons "$/M tokens" can fall.

Source: Hugging Face · DeepSeek V3.2 model card.
@kamilkrauspe
V3 · beat 7
Where a $/M number comes from

Where a $/M-tokens number
actually comes from.

A single open-weights model on a single GPU, walked through five lines of arithmetic.

Step 1
GPT-OSS-120B at MXFP4 on 1× H100 80GB.
Step 2
Hardware rent: ~$3 / hour on commodity rental.
Step 3
Saturated throughput: ~5,000 output tokens/sec.
Step 4
5,000 × 3,600 = 18M tokens/hour.
Step 5
$3 ÷ 18 = ~$0.17 / M output tokens.
Illustrative, not normative

Moves with prompt mix, cache hit rate, prefill-vs-decode, on-demand vs reserved capacity, and runtime engine. Shows the physics. Not a market-pricing claim.
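The five steps reduce to one line of arithmetic. A sketch using the slide's illustrative figures ($3/hr rent, 5,000 tok/s saturated throughput; both move with config):

```python
# Sketch of the slide's arithmetic. $3/hr and 5,000 tok/s are the
# slide's illustrative figures, not a market-pricing claim.
def dollars_per_m_tokens(rent_per_hour, tokens_per_second):
    tokens_per_hour = tokens_per_second * 3_600       # Step 4
    return rent_per_hour / (tokens_per_hour / 1e6)    # Step 5

print(round(dollars_per_m_tokens(3.00, 5_000), 3))   # → 0.167
```

Halve the throughput, or double the rent, and the price per million doubles; that is the whole mechanism.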

@kamilkrauspe
ALT-B · beat 7
A worked example · GPT-OSS-120B

Where a $/M-tokens number
actually comes from.

A worked example on one open-weights model and one commodity GPU.

Step 1
1× H100 80GB on a typical GPU rental market.
Step 2
Hardware rent: ~$3 / hour on commodity rental.
Step 3
Saturated throughput at MXFP4: ~5,000 tokens/second.
Step 4
Tokens/hour = 5,000 × 3,600 = 18,000,000.
Step 5
Cost / M output tokens = $3 ÷ 18 = ~$0.17 / M.

Illustrative on one model and one config. Cache, prompt mix, runtime engine, and procurement model all move the answer. Not a market-price comparison.

Source: OpenAI GPT-OSS-120B model card; commodity GPU rental rates (2026 Q1).
@kamilkrauspe
ALT · CAND-07
One model · one calculation

GPT-OSS-120B at MXFP4
on 1× H100.

Rent
1× H100 80GB rental: ~$3 / hour
Speed
Saturated throughput: ~5,000 tokens/second
Per hr
5,000 × 3,600 = 18,000,000 tokens / hour
Per M
$3 ÷ 18 = ~$0.17 / M tokens

Move any term. The price moves.

@kamilkrauspe
V3 + ALT-B · beat 8
Where the multiplier lives

Model-side gains lower the unit cost.
Harness-side gains compound.

Agentic loops are recursive. So is the bill.

turn 1 → turn 50 · per-turn deltas compound across 50 turns
Pattern
Evidence
Just-in-time context
Cursor MCP dynamic discovery: 46.9% reduction in agent tokens.
Scoped sub-agents
~7× more tokens per run — earned on parallelizable work.
Cache-breakpoint discipline
Falling from 85% to 70% hit-rate roughly doubles the bill.
Compaction before exhaustion
Anthropic April postmortem: botched compaction → quality regression.
A 10% per-turn improvement, applied 50 times, is not a 10% improvement.
Harness gains compound. So do harness regressions.
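Why per-turn deltas compound: in a loop that resends its history, every token appended at turn i is re-read on every later turn. A sketch with hypothetical numbers (2K system prompt, 4K appended per turn; not vendor telemetry):

```python
# Sketch, hypothetical numbers: an agentic loop that resends its full
# history each turn. Tokens appended early are re-read on every later
# turn, so a per-turn trim is amplified across the whole run.
def total_input_tokens(turns, added_per_turn, system=2_000):
    history, total = system, 0
    for _ in range(turns):
        total += history            # full context resent this turn
        history += added_per_turn   # this turn's output joins history
    return total

base = total_input_tokens(50, 4_000)   # 5,000,000 input tokens
trim = total_input_tokens(50, 3_600)   # 10% fewer tokens appended per turn
print(base - trim)                     # → 490000
```

Trimming 400 tokens per turn removes only 20,000 tokens of appended history, but 490,000 tokens of billed input: each trimmed token would have been re-read ~24 more times.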
Sources: Cursor engineering blog; Anthropic April 2026 postmortem; Anthropic pricing docs.
@kamilkrauspe
ALT · CAND-09
Total agent token consumption, before and after switching MCP tools
from a static load to dynamic context discovery.
46.9%
reduction
That is not a model improvement. It is a harness improvement.
Source: Cursor engineering blog · "Dynamic context discovery."
@kamilkrauspe
V3 · beat 9
Two teams · one tool · 10× variance

One AI tool, one $5K seat budget,
a 10× gap in cost per merged PR.

The visible artifact of the harness/user seam.

Team A
$5,000 / month
$35
cost per merged PR
Team B
$5,000 / month
$120
cost per merged PR

The variance lives in three user habits: model selection, reasoning concision, and escalation thresholds.

The user layer has no dashboard.
It shows up as variance inside the harness layer's numbers.
Source: Vantage agentic-coding cost analysis · April 15, 2026.
@kamilkrauspe
ALT-B + ALT · CAND-10
Two teams · two bills

Identical setup,
a $35 vs $120 outcome per merged PR.

Team A
$35
per merged PR
Team B
$120
per merged PR

The variance is not in the price. It is in how people use the same tool.

Source: Vantage · April 15, 2026.
@kamilkrauspe
V3 · beat 10
A note on falling prices

Won't falling prices
make this moot?

They have fallen. That has not reduced total spend.

Per-token price (Epoch AI · fixed-quality): ~1,000× lower than baseline
Enterprise GenAI total spend (CloudZero): $11.5B (2024) → $37B (2025) · +220%

Per-token prices fell ~1,000×. Total enterprise GenAI spend rose 220%. Adoption ate the savings.

Planning around 2026-Q1 inference prices is
the same mistake teams made with cloud spend in 2017.
Sources: Epoch AI inference-pricing dataset; CloudZero State of AI Costs; CloudZero Cloud Unit Economics 2026.
@kamilkrauspe
ALT · CAND-11
A line going down · a line going up

Won't falling prices
make this moot?

Per-token price: ~1,000× lower
Enterprise GenAI total spend: $11.5B → $37B

They have. It hasn't reduced total spend.

Sources: Epoch AI · CloudZero.
@kamilkrauspe
V3 · beat 11
Labor-substitution · regional table

The substitution math depends entirely
on which cell you're in.

AI subscription, AI PAYG (avg vs. heavy), and engineer cost — by region. The numbers diverge sharply.

Average dev = ~2M tokens/active day. Heavy power-user = 50–100M, midpointed at 75M (Redelinghuys forensic dataset, Anthropic enterprise capacity guidance).

Region (senior eng, fully loaded)
$/month
$200 plan
Avg dev (2M)
Heavy · Sonnet
Heavy · Opus
United States — major
$20,000
1.0%
0.3%
10.5%
17.6%
Western Europe (UK, DE)
$11,000
1.8%
0.5%
19.2%
31.9%
Slovakia / CEE
$7,000
2.9%
0.8%
30.1%
50.2%

The question is not "does AI replace engineers."

It is: at what subsidy level, in what region, for which seniority tier, and for what kind of work?

AI is absorbing pieces of the role,
while the human stays accountable for the whole.
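Every cell above comes from the same shape of calculation. A sketch with the stated assumptions (75M tokens/day, 85% cache hit, 22 working days, Sonnet 4.6 PAYG) plus an assumed 95/5 input/output split; the split is an assumption, so this lands near, not exactly on, the table's cells:

```python
# Sketch of the table's calculation shape. The 95/5 input/output
# split is an assumption and will not exactly reproduce the cells.
def monthly_payg(tokens_per_day_m, cache_hit=0.85, input_share=0.95,
                 days=22, in_per_m=3.00, out_per_m=15.00):
    # cached input reads bill at 10% of the input price
    eff_in = in_per_m * (cache_hit * 0.10 + (1 - cache_hit))
    daily = tokens_per_day_m * (input_share * eff_in +
                                (1 - input_share) * out_per_m)
    return daily * days

heavy = monthly_payg(75)                              # heavy user, Sonnet
print(round(heavy), round(100 * heavy / 20_000, 1))   # vs US $20K/mo salary
```

This lands within a point or two of the table's US Sonnet cell; the residual is the split assumption, and swapping in Opus rates or a different region's salary moves only the denominators.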
Sources: Levels.fyi; ERI SalaryExpert; Anthropic + OpenAI pricing; Sonar State of Code 2026; Faros AI; Anthropic Economic Index.
@kamilkrauspe
ALT-B · beat 11
Heavy power-user · 75M tokens/day

Heavy power-user. Sonnet 4.6 and Opus 4.7
PAYG at 75M tokens/day.

The substitution math depends on which cell you're in.

Average dev ~2M/day. Heavy 50–100M/day midpointed at 75M. 85% cache hit, 22 working days.

Region
$/month
$200 plan
Avg dev
Heavy · Sonnet
Heavy · Opus
US — major
$20,000
1.0%
0.3%
10.5%
17.6%
W. Europe (UK, DE)
$11,000
1.8%
0.5%
19.2%
31.9%
Slovakia / CEE
$7,000
2.9%
0.8%
30.1%
50.2%

The average developer is fully self-funded by the $200 plan.
The heavy user is heavily subsidized — at Opus rates, by 12–23×.

Sources: Levels.fyi; ERI SalaryExpert; Anthropic + OpenAI pricing; Redelinghuys forensic dataset.
@kamilkrauspe
ALT · CAND-12
The Slovakia row

Heavy power-user.
Sonnet 4.6 PAYG at 75M tokens/day.

Region (senior eng, fully loaded)
Heavy · Sonnet
Heavy · Opus
United States — major ($20K/mo)
10.5%
17.6%
Western Europe ($11K/mo)
19.2%
31.9%
Slovakia / CEE ($7K/mo)
30.1%
50.2%

The substitution math depends entirely on which cell you're in.

Sources: Levels.fyi · Anthropic + OpenAI pricing · 2026 Q1.
@kamilkrauspe
V3 only · beat 12
The lesson

The lesson is not that AI got more expensive.

It is that anyone whose plan depended on subsidized prices was always exposed.

1
The flat plan is not the floor.
The $200/month subscription is smaller than the per-user PAYG cost it covers in any heavy-user scenario. That gap is the subsidy.
2
Anchor on something more durable than the meter.
Cost-per-token is volatile. Outcome telemetry — merged PRs, resolved tickets, defects per AI commit — survives a repricing event.
3
Assume rebound, not deflation.
Lower per-token prices have historically been absorbed by larger, more autonomous workloads. Plan for total spend flat or rising.
This will not be the only repricing event.
The lesson is to stop building plans that depend on a specific subsidy.
@kamilkrauspe
CLOSE · 1
"
The plan never priced the work.
The work is starting to price itself.
@kamilkrauspe
CLOSE · 2
"
The subsidies were never the plan.
They were the runway.
The runway is shorter now.
@kamilkrauspe
CLOSE · 3
"
What changed is not the cost.
What changed is who can see it.
@kamilkrauspe
CLOSE · 4
"
Cheaper inference will not save the budget.
It will just buy more autonomy.
@kamilkrauspe