Same plan.
Same user.
100,000× the bill.

The Subsidy Era — AI subscriptions, coming due.

@kamilkrauspe
A heavy day can break the economics
of a $200 plan.
One heavy agentic-coding day burns more inference
than the monthly plan covers.
The rest is paid by someone — until it gets metered.

Four weeks. Five moves.
One direction.

March · Google: Antigravity restructured to AI Credits; AI Pro effective token usage cut sharply.
April 2 · OpenAI: Codex billing moved from credits-per-message to credits-per-token.
April 21 · Anthropic: Claude Code briefly restricted to Max plans. Reversed within a day.
April 23 · Anthropic postmortem: “compute is a constraint across the entire industry.”
April 27 · GitHub: all Copilot plans move to AI Credits. Input, output, cached tokens all metered.

Agentic work doesn’t have
normal SaaS variance.

Inline autocomplete (one keystroke): < 1K tokens · < $0.01
Single chat prompt (no project context): < 10K · < $0.05
Targeted file edit (short multi-step): < 100K · < $0.30
Interactive coding (30–90 min session): < 10M · < $10
Long autonomous run (1–4 hours): < 100M · < $80
Multi-agent / background (production fleets): up to 1B · $500+
~100,000× spread
52.5M tokens · 38 minutes
Cursor agent replaying a 120K context window in a tool loop.
Sonnet 4.6 list rates. x-axis logarithmic. Estimates.

A real heavy user
costs more to serve
than the plan covers.

What you pay
$200
flat subscription, per month
What heavy use actually costs
$1,000–$4,400
inference at list rates, per month
The plan covers about 5–20% of the inference.
Someone is paying the rest.
Heavy power-user at 100M tokens/active day. Cache hit ~90%. Sonnet 4.6 to Opus 4.7. Estimates.
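A back-of-envelope sketch of where that range comes from. All rates and the input/output mix are illustrative assumptions (Sonnet-class pricing assumed at $3 / $0.30 / $15 per million input / cache-read / output tokens), not quoted prices:

```python
# Back-of-envelope: heavy user at 100M tokens per active day, ~90% cache hit.
# RATE values are illustrative assumptions (USD per million tokens), not quotes.
RATE = {"input": 3.00, "cache_read": 0.30, "output": 15.00}

def monthly_cost(tokens_per_day=100e6, active_days=20,
                 input_share=0.95, cache_hit=0.90, rate=RATE):
    inp = tokens_per_day * input_share          # tokens read into context
    out = tokens_per_day * (1 - input_share)    # tokens the model generates
    daily = (inp * cache_hit * rate["cache_read"]
             + inp * (1 - cache_hit) * rate["input"]
             + out * rate["output"]) / 1e6
    return daily * active_days

print(round(monthly_cost()))  # → 2583, inside the $1,000–$4,400 band
```

Varying the cache hit rate, the output share, and the model tier is what spreads the single estimate into a range.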

It’s not only about the money.

Heavy use overshoots the plan in dollars.
It also overshoots it in capacity.

Tokens per request, heavy users (P90), year over year.
Pricing changes aren’t only cost recovery.
They’re also rationing of a scarce resource.
Datadog State of AI Engineering, 2026.

The cost moves at three layers.

1
Inference
The unit cost of generating tokens.
2
Harness
The loop around the model.
3
Use
The choices people make in the loop.
The first two improve over time. The third needs discipline.
Layer 1 — Inference

Throughput is the denominator.

GPU rent ÷ tokens per hour = $ per million tokens
Example: $3/hour ÷ 18M tokens/hour ≈ $0.17/MT.
Self-hosted GPT-OSS-120B on H100 80GB. Single-config example.
More throughput
vLLM · SGLang · TensorRT-LLM
Less active compute
MoE · quantization · distillation
Less memory pressure
KV compression · MLA · paged attention
Push throughput up, unit cost falls.
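The formula above, as code. Numbers are the slide’s single-config example:

```python
# Unit cost of inference: GPU rent divided by throughput.
def dollars_per_million_tokens(gpu_rent_per_hour: float, tokens_per_hour: float) -> float:
    return gpu_rent_per_hour / (tokens_per_hour / 1e6)

# The slide's example: $3/hour for the GPU, 18M tokens/hour sustained.
print(round(dollars_per_million_tokens(3.0, 18e6), 2))  # → 0.17
```

Every item on the list above attacks one of the two terms: serving stacks raise tokens per hour; MoE, quantization, and KV tricks let the same GPU hour push more tokens through.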
Layer 2 — Harness

Every turn carries the previous turn.

Claude Code, Cursor, Codex, Copilot: every coding agent is a loop. A 50-turn session is 50 calls, each carrying everything before it.

Turn 1 — system prompt + tools + user ask. ~30K tokens.
Turn 2 — Turn 1 + grep results. ~32K.
Turn 3 — Turn 2 + file contents. ~35K.
…by Turn 50, running context ~150K. Cumulative across turns: several million.
[Chart: cached vs. uncached tokens across turns 1–50 · ≈3.5M tokens cumulative · ~90% cached]
Caching makes replayed tokens cheaper. It does not make them disappear.
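The turn-by-turn growth can be sketched directly. The 30K and 150K endpoints are the slide’s; the geometric growth curve between them is an assumption:

```python
# Each call replays the full running context, so session cost is the SUM of
# per-turn contexts, not the final context size.
def cumulative_tokens(turns=50, start=30_000, end=150_000):
    total = 0.0
    for t in range(turns):
        # geometric interpolation between the first and last turn's context
        context = start * (end / start) ** (t / (turns - 1))
        total += context
    return total

print(f"{cumulative_tokens() / 1e6:.1f}M tokens replayed")  # → 3.7M
```

Roughly 25× the final 150K context, which is why per-session cost surprises people who only look at the context window.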
Layer 2 — Harness

How harnesses get cheaper.

Wins on this layer compound.

Reduce the loop
Programmatic tool calling
JIT context discovery
Scoped subagents
Preserve the cache
Stable prefixes
TTL-aware sessions
Dynamic content outside cache
Shape the work
Diff/edit formats
Model routing
Workflow-shaped tools
Anthropic case: 150K → 2K. Cursor: −46.9% on MCP runs. Aider: whole-file edits cost ~3× diff output.
This is context engineering in practice.
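The cache items are worth making concrete. A hedged sketch of the input-side cost of a ~3.5M-token session: the 10× cache-read discount is an assumption modeled on typical prompt-caching price lists, and cache-write surcharges are ignored for simplicity:

```python
# Input-side cost of a session: tokens that hit the cache bill at a discount,
# tokens that miss bill at the full input rate.
INPUT_RATE, CACHE_READ_RATE = 3.00, 0.30  # assumed USD per million tokens

def session_input_cost(total_tokens: float, cache_hit: float) -> float:
    cached = total_tokens * cache_hit
    missed = total_tokens - cached
    return (cached * CACHE_READ_RATE + missed * INPUT_RATE) / 1e6

print(round(session_input_cost(3.5e6, 0.90), 2))  # warm cache: about $2
print(round(session_input_cost(3.5e6, 0.00), 2))  # cold cache: $10.50
```

Roughly a 5× gap on the input side alone. In prefix-based caching schemes, a change early in the prompt invalidates everything after it, which is why stable prefixes and TTL-aware sessions top the list.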
Layer 3 — Use

Same spend.
Different productivity.

Team A
Spend: $5,000/month
$35
cost per merged PR
Team B
Spend: $5,000/month
$120
cost per merged PR
The variance is not only in price. It’s in which model people reach for, when they escalate, and when they stop.
Specify · Control · Measure
This is the layer
that does not get cheaper on its own.
Vantage agentic-coding cost analysis, April 2026. ~10× cost-per-developer variance within a single team is typical.

Layer 1 sets the rate card. Layer 2 sets the loop. Layer 3 sets the denominator.

What was free was never free.