Model choice, escalation, when to delegate vs. supervise.
Two have engineering vocabularies. The third shows up as variance inside the others.
@kamilkrauspe
ALT · CAND-14
The seam
Three places the cost lives.
1 · MODEL · labs and serving providers
2 · HARNESS · users and tool builders
3 · HUMAN · still emerging
Two have engineering vocabularies. One does not.
V3 + ALT-B · beat 6
Mechanics of $/M tokens
A $/M-tokens number is mechanical. Rent ÷ throughput.
$/M tokens = GPU rent per hour ÷ (tokens/sec × 3,600) × 1,000,000
The 2026 architectural fronts mostly raise the denominator.
Mixture-of-Experts
DeepSeek V3.2 carries 671B total, activates 37B per token. Fewer FLOPs → more tokens/sec.
KV / latent attention
Compress the cache. ~10× less HBM at long context. More concurrent requests fit per GPU.
Quantization (NVFP4)
Lower-precision weights and KV. Less memory and bandwidth per token.
User-facing lever: prompt caching. Anthropic reads at 10% of input price; one varying byte at the prefix kills the chain.
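The caching lever reduces to one line of arithmetic. A sketch, assuming cached reads bill at 10% of list input price (as stated above) and ignoring cache-write surcharges and output tokens:

```python
def effective_input_multiplier(hit_rate: float, read_discount: float = 0.10) -> float:
    """Blended multiplier on list input price: hits pay the discount, misses pay full."""
    return hit_rate * read_discount + (1.0 - hit_rate)

# A broken prefix (one varying byte) drops hit_rate toward 0 and the blend toward 1.0x.
for hit_rate in (0.85, 0.70, 0.00):
    print(f"hit rate {hit_rate:.0%} -> {effective_input_multiplier(hit_rate):.3f}x list input price")
```

At 85% the blend is 0.235×; at 70% it is 0.37×: the full-price share doubles while the blended input bill rises about 1.6×.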
Sources: Hugging Face (DeepSeek, GPT-OSS); NVIDIA NVFP4; Anthropic pricing.
ALT · CAND-06
A formula
Where a price per million tokens actually comes from.
rent ÷ throughput
Anything that moves throughput moves the price. The 2026 architectural fronts are mostly versions of this.
ALT · CAND-08
A model and its weights
DeepSeek V3.2 carries 671B parameters and activates a fraction of them per token.
total parameters · 671B · most of the model is dormant on most tokens
active per token · 37B · what each forward pass actually computes
Mixture-of-Experts. One of several reasons "$/M tokens" can fall.
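The sparsity ratio behind that claim, computed from the model-card figures cited below:

```python
total_params = 671e9    # DeepSeek V3.2, total parameters
active_params = 37e9    # activated per token (Mixture-of-Experts routing)

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights compute on any given token")  # ~5.5%
```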
Source: Hugging Face · DeepSeek V3.2 model card.
V3 · beat 7
Where a $/M number comes from
Where a $/M-tokens number actually comes from.
A single open-weights model on a single GPU, walked through five lines of arithmetic.
Step 1 · GPT-OSS-120B at MXFP4 on 1× H100 80GB.
Step 2 · Hardware rent: ~$3 / hour on commodity rental.
Step 3 · Saturated throughput: ~5,000 output tokens/sec.
Step 4 · 5,000 × 3,600 = 18M tokens/hour.
Step 5 · $3 ÷ 18 = ~$0.17 / M output tokens.
Illustrative, not normative
Moves with prompt mix, cache hit rate, prefill-vs-decode, on-demand vs reserved capacity, and runtime engine. Shows the physics. Not a market-pricing claim.
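The five steps as one calculation; the rent and throughput figures are the slide's illustrative assumptions, not market quotes:

```python
rent_per_hour = 3.00       # assumed: ~$3/hr for 1x H100 80GB, commodity rental
tokens_per_sec = 5_000     # assumed: saturated output throughput at MXFP4

tokens_per_hour = tokens_per_sec * 3_600             # 18,000,000
cost_per_m_tokens = rent_per_hour / (tokens_per_hour / 1_000_000)
print(f"~${cost_per_m_tokens:.2f} / M output tokens")  # ~$0.17
```

Every term is linear: halve the rent or double the throughput and the quote halves.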
ALT-B · beat 7
A worked example · GPT-OSS-120B
Where a $/M-tokens number actually comes from.
A worked example on one open-weights model and one commodity GPU.
Step 1 · 1× H100 80GB on a typical GPU rental market.
Step 2 · Hardware rent: ~$3 / hour on commodity rental.
Step 3 · Saturated throughput at MXFP4: ~5,000 tokens/second.
Step 4 · Tokens/hour = 5,000 × 3,600 = 18,000,000.
Step 5 · Cost / M output tokens = $3 ÷ 18 = ~$0.17 / M.
Illustrative on one model and one config. Cache, prompt mix, runtime engine, and procurement model all move the answer. Not a market-price comparison.
Source: OpenAI GPT-OSS-120B model card; commodity GPU rental rates (2026 Q1).
ALT · CAND-07
One model · one calculation
GPT-OSS-120B at MXFP4 on 1× H100.
Rent · 1× H100 80GB rental: ~$3 / hour
Speed · Saturated throughput: ~5,000 tokens/second
Per hr · 5,000 × 3,600 = 18,000,000 tokens / hour
Per M · $3 ÷ 18 = ~$0.17 / M tokens
Move any term. The price moves.
V3 + ALT-B · beat 8
Where the multiplier lives
Model-side gains lower the unit cost. Harness-side gains compound.
Agentic loops are recursive. So is the bill.
[diagram: turn 1 → turn 50 · 50× turns · per-turn deltas compound]
Pattern | Evidence
Just-in-time context | Cursor MCP dynamic discovery: 46.9% reduction in agent tokens.
Scoped sub-agents | ~7× more tokens per run, earned on parallelizable work.
Cache-breakpoint discipline | Falling from 85% to 70% hit rate doubles the full-price token share; at 10%-of-list read pricing, the input bill rises ~1.6×.
Compaction before exhaustion | Anthropic April postmortem: botched compaction → quality regression.
A 10% per-turn improvement, applied 50 times, is not a 10% improvement.
Harness gains compound. So do harness regressions.
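A toy model of the compounding claim. The assumption (mine, not the source's) is that a per-turn delta multiplies through the loop, e.g. a leaner context carried forward shrinks every later turn's input too:

```python
def compounded_ratio(per_turn_factor: float, turns: int) -> float:
    # Cost relative to baseline after `turns` multiplicative per-turn deltas.
    return per_turn_factor ** turns

print(f"10% better per turn, 50 turns: {compounded_ratio(0.90, 50):.2%} of baseline cost")
print(f"10% worse per turn, 50 turns:  {compounded_ratio(1.10, 50):.0f}x baseline cost")
```

The asymmetry is the point: the same exponent that makes gains compound makes regressions compound.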
Sources: Cursor engineering blog; Anthropic April 2026 postmortem; Anthropic pricing docs.
ALT · CAND-09
Total agent token consumption, before and after switching MCP tools from a static load to dynamic context discovery.
46.9% reduction
That is not a model improvement. It is a harness improvement.
Source: Cursor engineering blog · "Dynamic context discovery."
V3 · beat 9
Two teams · one tool · 3.4× variance
One AI tool, one $5K seat budget, a 3.4× gap in cost per merged PR.
The visible artifact of the harness/user seam.
Team A · $5,000 / month · $35 / merged PR
Team B · $5,000 / month · $120 / merged PR
The variance lives in three user habits: model selection, reasoning concision, and escalation thresholds.
The user layer has no dashboard.
It shows up as variance inside the harness layer's numbers.
Source: Vantage agentic-coding cost analysis · April 15, 2026.
ALT-B + ALT · CAND-10
Two teams · two bills
Identical setup, a $35 vs $120 outcome per merged PR.
Team A · $35 / merged PR
Team B · $120 / merged PR
The variance is not in the price. It is in how people use the same tool.
Source: Vantage · April 15, 2026.
V3 · beat 10
A note on falling prices
Won't falling prices make this moot?
They have fallen. That has not reduced total spend.
[chart: per-token price (Epoch AI · fixed-quality) vs. enterprise GenAI total spend (CloudZero)]
Per-token prices fell ~1,000×. Total enterprise GenAI spend rose 220%. Adoption ate the savings.
Planning around 2026-Q1 inference prices is
the same mistake teams made with cloud spend in 2017.
Sources: Epoch AI inference-pricing dataset; CloudZero State of AI Costs; CloudZero Cloud Unit Economics 2026.
ALT · CAND-11
A line going down · a line going up
Won't falling prices make this moot?
[chart: per-token price vs. enterprise GenAI total spend]
They have. It hasn't reduced total spend.
Sources: Epoch AI · CloudZero.
V3 · beat 11
Labor-substitution · regional table
The substitution math depends entirely on which cell you're in.
AI subscription, AI PAYG (avg vs. heavy), and engineer cost — by region. The numbers diverge sharply.
Average dev = ~2M tokens/active day. Heavy power-user = 50–100M, midpointed at 75M (Redelinghuys forensic dataset, Anthropic enterprise capacity guidance).
Region (senior eng FL) | $/month | $200 plan | Avg dev (2M) | Heavy · Sonnet | Heavy · Opus
United States — major | $20,000 | 1.0% | 0.3% | 10.5% | 17.6%
Western Europe (UK, DE) | $11,000 | 1.8% | 0.5% | 19.2% | 31.9%
Slovakia / CEE | $7,000 | 2.9% | 0.8% | 30.1% | 50.2%
The question is not "does AI replace engineers."
It is: at what subsidy level, in what region, for which seniority tier, and for what kind of work?
AI is absorbing pieces of the role,
while the human stays accountable for the whole.
Sources: Levels.fyi; ERI SalaryExpert; Anthropic + OpenAI pricing; Sonar State of Code 2026; Faros AI; Anthropic Economic Index.
ALT-B · beat 11
Heavy power-user · 75M tokens/day
Heavy power-user. Sonnet 4.6 and Opus 4.7 PAYG at 75M tokens/day.
The substitution math depends on which cell you're in.
Average dev ~2M/day. Heavy 50–100M/day midpointed at 75M. 85% cache hit, 22 working days.
Region | $/month | $200 plan | Avg dev | Heavy · Sonnet | Heavy · Opus
US — major | $20,000 | 1.0% | 0.3% | 10.5% | 17.6%
W. Europe (UK, DE) | $11,000 | 1.8% | 0.5% | 19.2% | 31.9%
Slovakia / CEE | $7,000 | 2.9% | 0.8% | 30.1% | 50.2%
The average developer is fully self-funded by the $200 plan. The heavy user is heavily subsidized — at Opus rates, by 12–23×.
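The table's cells reduce to one formula. Hedge: the blended $/M rates below (~$1.28/M Sonnet, ~$2.13/M Opus, 85% cache hit baked in) are back-solved from the table itself, not published list prices:

```python
def pct_of_salary(tokens_per_day_m: float, blended_rate: float,
                  monthly_salary: float, working_days: int = 22) -> float:
    """Monthly PAYG spend as a percentage of a senior engineer's monthly cost."""
    monthly_cost = tokens_per_day_m * working_days * blended_rate
    return 100.0 * monthly_cost / monthly_salary

# Heavy power-user at the 75M tokens/day midpoint; rates are assumptions.
for region, salary in [("US (major)", 20_000), ("W. Europe", 11_000), ("Slovakia / CEE", 7_000)]:
    sonnet = pct_of_salary(75, 1.277, salary)
    opus = pct_of_salary(75, 2.13, salary)
    print(f"{region}: Sonnet {sonnet:.1f}% · Opus {opus:.1f}%")
```

Only the denominator changes across rows, which is why the cheapest-labor region shows the highest substitution ratio.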
Sources: Levels.fyi; ERI SalaryExpert; Anthropic + OpenAI pricing; Redelinghuys forensic dataset.
ALT · CAND-12
The Slovakia row
Heavy power-user. Sonnet 4.6 PAYG at 75M tokens/day.
Region (senior eng FL) | Heavy · Sonnet | Heavy · Opus
United States — major ($20K/mo) | 10.5% | 17.6%
Western Europe ($11K/mo) | 19.2% | 31.9%
Slovakia / CEE ($7K/mo) | 30.1% | 50.2%
The substitution math depends entirely on which cell you're in.