What is the cheapest LLM API in 2026?

On raw per-token cost, DeepSeek V3.2 at $0.28/M input and $0.42/M output is the cheapest mainstream LLM API. Gemini 3.1 Flash-Lite ($0.25 input / $1.50 output) is cheaper on input and the only sub-$1/M model with a 1M context window and 363 tok/s speed. Pure cost: DeepSeek V3.2. Cost + context + speed together: Flash-Lite.

What is the best LLM subscription for developers in 2026?

Claude Max 20x at $200/mo for heavy daily Claude Code use. Claude Max 5x at $100/mo if you can't justify the $200 tier. ChatGPT Plus at $20/mo if you want one all-purpose sub with separate Codex and chat pools. Google AI Plus at $7.99/mo if you need flagship Gemini 3.1 Pro access on a budget.

Is Gemini 3.1 Flash-Lite better than Gemini 2.5 Flash?

For speed, yes — 363 tok/s vs 249 tok/s is 45% faster, and TTFT is 2.5x faster. For cost, 17% cheaper input and 40% cheaper output. For quality, benchmarks show a significant jump (GPQA Diamond 86.9% vs 72.8%). For new projects, Flash-Lite is the better default. For stable production pipelines on 2.5 Flash, wait for GA — Flash-Lite is still in preview.

How much does Kimi K2.5 cost in 2026?

Kimi K2.5 API pricing is $0.60/M input and $2.50/M output — 4-17x cheaper than GPT-5.4 and 5-6x cheaper than Claude Sonnet 4.6. Subscription comes in two tiers: $39/mo for light use, $99/mo for serious agentic coding with bigger quotas and an automated OpenClaw setup. The $39 tier hits limits fast; go to $99 if you're committing to a Kimi-driven workflow.

Cheapest LLM 2026 by Tier: API and Subscription Reference

This page is a lookup, not a read. Ctrl-F the tier you care about, hit the verdict at the end of the section, leave. I run Claude Max for daily coding, I've paid for everything else on this list at some point in the last year, and I track LLM pricing data on this site — so the numbers match what the meter actually says.

Two grids. One subscription, one per-token API. Everything else is justification.

The 30-second answer: master pricing table#

Band	Plan / Model	Price	What it's for	Verdict
Sub $200/mo	Claude Max 20x	$200/mo	Coding + Claude Code, heavy use	Subscription winner — top tier
Sub $200/mo	ChatGPT Pro 20x	$200/mo	Planning + GPT-5.4 Pro reasoning	Runner-up
Sub $100/mo	Claude Max 5x	$100/mo	Coding + Claude Code, mid-volume	Subscription winner — mid tier
Sub $100/mo	ChatGPT Pro 5x	$100/mo	Planning + Codex + GPT-5.4 Pro	New contender
Sub $100/mo	Synthetic.new	~$80/mo	OpenCode agents on OSS models	OSS-agent alternative
Sub $100/mo	Kimi K2.5 Pro	$39 / $99/mo	Agentic coding + auto OpenClaw setup	$99 tier for serious use
Sub $20/mo	ChatGPT Plus	$20/mo	Daily driver, separate chat + Codex pools	Subscription winner — $20 tier
Sub $20/mo	Gemini Pro + Antigravity	$20/mo	Image/video + GCP credit bundle	Best bundle value
Sub $20/mo	Claude Pro	$20/mo	Long-form writing, light Sonnet use	Avoid for dev work
Sub under-$20	Google AI Plus	$7.99/mo	Flagship Gemini 3.1 Pro + Deep Research + Veo	Subscription winner — under $20
Sub under-$20	ChatGPT Go	$8/mo	GPT-5.3 Instant + Custom GPTs	OpenAI alt
Sub under-$20	NanoGPT	~$8/mo	OpenCode experimentation	Tinkerer's wildcard
API sub-$1/M	DeepSeek V3.2	$0.28 in / $0.42 out	Cheapest raw per-token cost	API winner — sub-$1 on cost
API sub-$1/M	Gemini 3.1 Flash-Lite	$0.25 in / $1.50 out	1M context + 363 tok/s + cheap	API winner — sub-$1 on capability
API $1-3/M	GPT-5 mini	$0.25 in / $2.00 out	Cheap input, 400K context	$1-3 contender
API $1-3/M	Claude Haiku 4.5	$1.00 in / $5.00 out	Anthropic ecosystem, faster Sonnet alt	$1-3 quality pick
API $1-3/M	Kimi K2.5	$0.60 in / $2.50 out	Best price/quality for agentic coding	API winner — $1-3
API $3+/M	Claude Sonnet 4.6	$3 in / $15 out	Production-grade coding default	$3+ default
API $3+/M	Claude Opus 4.6 / 4.7	$5 in / $25 out	Frontier coding, agentic ceiling	API winner — $3+ on quality
API $3+/M	Gemini 3.1 Pro	$2.50 in / $12 out (tiered)	1M context, multimodal	Cheaper-than-Claude flagship

Subscription winners by tier: $200 → Claude Max 20x · $100 → Claude Max 5x · $20 → ChatGPT Plus · under $20 → Google AI Plus. API winners by band: sub-$1 cost → DeepSeek V3.2 · sub-$1 capability → Gemini 3.1 Flash-Lite · $1-3 → Kimi K2.5 · $3+ quality → Claude Opus 4.7.

That's the whole post. Everything below is the why and the use-this-when for each band.

How the price-band cutoffs work in 2026#

Two grids exist because two purchase patterns do.

Subscriptions charge a flat monthly fee against a quota of messages, agent turns, or model time. The decision: does my usage justify the seat price?

API per-token pricing charges for exactly the input and output you generate. The decision: how much will this run cost at scale?

Crossover line: if you're a working developer running Claude Code or Codex for hours a day, the API equivalent of your usage costs 5-10x more than a Max subscription. If you're running a production pipeline at scale, the API is cheaper than a seat because you control the prompt budget. The two pricing models aren't competitors — they're complementary, and most serious users run both.

The bands themselves are how price points cluster in mid-2026:

Subscription: $200 (frontier power-user), $100 (serious dev), $20 (daily driver), under $20 (budget chat).
API: sub-$1/M output (volume), $1-3/M (mid), $3+/M (premium quality).

Below those is the free tier (Gemini CLI's free quota, ChatGPT's free GPT-5.3, self-hosted OSS). Above the top band, vendors stop publishing list prices and you're in custom enterprise contracts.

Subscription band — $200/mo winner: Claude Max 20x#

What you get: 20x the Opus 4.6 usage of Claude Pro, unmetered Claude Sonnet 4.6, Claude Code access with generous agent limits. Enough headroom to run Claude Code as a daily driver without hitting walls mid-session.

Opus 4.6 is, as of mid-2026, the strongest all-around coding and agentic-work model I've tested — best Aider-leaderboard model in daily use, best Claude Code backend, best long-context reasoner for multi-turn work. The Max 20x ceiling is high enough that entire monorepos fit inside the 1M-context window without forced compaction.

The "is $200 worth it" math: arguments online routinely benchmark Max against the raw API at a 1,500-prompts-per-month baseline (~50/day). That isn't power-user volume. A real Claude Code day on Opus 4.6 ($5/M input, $25/M output — see Anthropic's pricing page) burns $100-500 of API tokens normally and can cross $1,000 on a big-refactor day. At that volume, Max 20x is roughly a 90% discount versus equivalent API spend. It pays for itself in two productive days.

The runner-up at $200 is ChatGPT Pro 20x — same price, 20x GPT-5.4 Pro usage, maximum Codex tasks, unlimited GPT-5.3. Better for pure reasoning, weaker for daily coding. The pattern that works: plan with ChatGPT Pro, implement with Claude Max. For getting more out of Claude Code itself, my Claude Code setup guide covers CLAUDE.md, MCP servers, and skills end-to-end.

Use this when: you code for a living, you run Claude Code daily, and your usage would burn more than $200 in API tokens per month.

Subscription band — $100/mo winner: Claude Max 5x#

What you get: 5x the usage of Claude Pro, full Sonnet 4.6 access, Opus 4.6 at a reduced quota, Claude Code with meaningful agent limits.

The default day-to-day for most working engineers who can't or won't justify Max 20x — frontier-quality coding without the $200 bill. You hit the Opus quota wall sooner; if your work is Opus-heavy, Max 20x earns the upgrade inside two weeks.

One caveat: Claude Code's default reasoning effort dropped to "medium" for Max subscribers in early 2026 and a lot of Max 5x users noticed an output quality drop without knowing why. Fix in my Claude Code effort-level guide.

The $100 bracket has three real alternatives:

ChatGPT Pro 5x at $100/mo — OpenAI's new 5x tier mirroring Claude Max (April 2026). GPT-5.4 Pro, expanded Codex tasks, unlimited GPT-5.3. Same trade-off as the $200 bracket: Claude Code + Opus is the stronger coding stack; GPT-5.4 Pro + Codex is the stronger planning stack. For Codex setup specifically, see my OpenAI Codex setup guide.
Synthetic.new at ~$80/mo — access to a rotating set of open-source frontier models (GLM, Kimi, Qwen, Llama variants) with throughput that runs noticeably faster than Chutes or OpenCode Zen in practice. Open-source models are not Opus 4.6, but for grunt work (boilerplate, refactors, ticket implementation), you're paying 20% of Claude Max 5x price for 70% of the useful output.
Kimi K2.5 Pro at $39 or $99/mo — Moonshot's dev-facing tiers. The $39 entry tier hits limits fast on real agentic coding; the $99 tier earns its slot with bigger quotas and an automated OpenClaw setup that gets you running in minutes instead of wiring up an agent harness by hand. Strong for repetitive, constrained, easy-to-validate work; for ambiguous high-risk stuff, still pay for Claude.

For pairing Synthetic or Kimi with a remote agent environment, my OpenClaw GCP setup guide walks through getting Ubuntu desktop running on GCP.

Use this when: you want frontier-tier coding access without the $200 bill, and you're OK occasionally hitting the Opus wall in a long session.

Subscription band — $20/mo winner: ChatGPT Plus#

What you get: GPT-5.4 access with generous limits, DALL·E image generation, Advanced Voice, Custom GPTs, file uploads, code interpreter, web browsing, and — most importantly — separate usage pools for chat and Codex.

The separate-pool thing is what makes ChatGPT Plus the honest $20 pick for developers. Chat and Codex quotas don't cannibalize each other — spend the morning in a planning conversation and still have room for a full Codex session in the afternoon. Claude Pro can't match that at the same price. GPT-5.4 is a capable all-rounder: not the best at any single thing, but consistently good enough across chat, coding, spreadsheets, and image work to handle most of the day-to-day.

The alternative at $20: Gemini Pro + the Antigravity bundle. Not really a chatbot subscription — an AI developer bundle with a chatbot on the side. For $20: Gemini 3.1 Pro access, the Antigravity IDE subscription (Google's Cursor competitor), $5 of GCP credit per month (auto-applied), 100 Nano Banana generations per day, Veo access via Flow, and NotebookLM with higher quotas. Cheapest way in if you're building a GCP side project or want image generation as a daily tool. Gemini isn't a great coding model compared to Opus or GPT-5.4 — treat this as a second sub.

Trap warning: Claude Pro at $20 is not the right pick for developers. Three or four agentic Claude Code turns, hit the Opus usage cap after 30-40 minutes, and you're locked out of Claude entirely for the next four and a half hours. No Sonnet fallback, no web UI, no Claude Code. Anthropic positions Pro as a light-use plan (verifiable at claude.com/pricing). For developer work that leans on Opus, you need Max 5x or Max 20x.

Use this when: you want one subscription that covers chat, coding, image, and the rest of the daily workflow without specialized power-user limits.

Subscription band — under $20/mo winner: Google AI Plus#

What you get for $7.99/month: Gemini 3.1 Pro (the actual flagship), Deep Research, Nano Banana Pro image generation, Veo 3.1 Fast video via Flow and Whisk, 200 monthly AI credits, NotebookLM, Gemini in Gmail/Docs/Drive, 200 GB storage, no ads. Available in 160+ countries since January 2026.

Flagship-model access is the big deal. Most cheap plans put you on a nano model; AI Plus gives you the same Gemini 3.1 Pro that powers the $19.99 tier with lower daily limits. The only sub under $20 where you're paying for an actual flagship.

Two alternatives in this band:

ChatGPT Go at $8/mo — OpenAI's cheap tier, worldwide since January 2026. GPT-5.3 Instant (not the flagship), Custom GPTs, code interpreter, image generation, roughly 10× the message limits of the free tier. No Deep Research, no video, ad-supported. Pick Go if you specifically want Custom GPTs or you already live in the ChatGPT ecosystem.
NanoGPT at ~$8/mo — the wildcard. Token-generous subscription that works with OpenCode and OpenClaw, exposing open-source and some closed-source models through a single API. Reliability is the gotcha — rate limits hit unexpectedly, models go down. Fine for tinkering, not for production agents. Useful as a second sub for experimentation.

At this tier you're not buying a dev subscription — none of these include Codex, Claude Code, or a real agent harness. But for general chat, research, writing, and occasional image/video work, the picks are genuinely good for the price.

Use this when: you need flagship-model chat access on a tight budget and you don't need Claude Code or Codex.

Pricing tiers change fast in 2026

I track LLM pricing changes and tier restructures as they ship so you don't have to re-research every month.

API band — sub-$1/M: DeepSeek V3.2 vs Gemini 3.1 Flash-Lite#

Two winners in this band, on different axes.

DeepSeek V3.2 at $0.28/M input, $0.42/M output — the cheapest mainstream API on raw per-token cost. 64K context, no batch API. For pure cost optimization on workloads that fit inside 64K, still the cheapest viable option in 2026.

Gemini 3.1 Flash-Lite at $0.25/M input, $1.50/M output — cheaper than DeepSeek on input, more expensive on output, but you get 1M context, 363 tok/s output speed, and a real batch API at the same price tier. The only model here that hits cost + speed + context simultaneously.

The full sub-$1 picture, prices verified May 2026 against each provider's official pricing page:

Model	Context	Input/1M	Output/1M	Cached/1M	Speed	Notes
DeepSeek V3.2	64K	$0.28	$0.42	$0.028	Medium	Cheapest output; small context; no batch
Gemini 3.1 Flash-Lite	1M	$0.25	$1.50	$0.025	363 tok/s	1M context + speed; batch API
GPT-5 mini	400K	$0.25	$2.00	$0.025	Medium	Bigger context than DeepSeek, pricier output
Gemini 2.5 Flash	1M	$0.30	$2.50	$0.03	249 tok/s	Deprecated — migrate to Flash-Lite

Going from $2.50 → $1.50 per million output tokens (vs Gemini 2.5 Flash) saves $1/M. On a 100M-output-token workload that's $100. Real money but not why you switch — you switch for the speed (see below).

Still on Gemini 2.0 Flash or 2.0 Flash-Lite? Those are deprecated. Flash-Lite is your migration path. Model ID: gemini-3.1-flash-lite-preview — still preview, so production pipelines that can't tolerate model changes should wait for GA.

Claude Haiku 4.5 ($1.00/$5.00) sits just above this band — 4x more expensive on input, 3.3x more on output. Hard premium to justify against Flash-Lite unless you specifically need Haiku's ceiling or are already in the Anthropic ecosystem.

Use this when: running high-volume classification, translation, moderation, or chat at scale — either pure cost (DeepSeek V3.2) or cost + context + speed (Flash-Lite).

API band — $1-3/M: Kimi K2.5, Claude Haiku 4.5, GPT-5 mini#

"Cheap enough to scale, good enough to trust" lives here.

Kimi K2.5 ($0.60 in / $2.50 out) is the price/quality winner. Undercuts GPT-5.4 by 4-17x and Claude Sonnet 4.6 by 5-6x on equivalent token counts. For agentic coding workflows calling the model thousands of times an hour through OpenCode or OpenClaw, savings compound fast. Strong for repetitive, constrained, easy-to-validate work; weak on ambiguous architecture decisions and risk-heavy patches.

Claude Haiku 4.5 ($1.00 / $5.00) is the Anthropic-ecosystem pick. Faster and cheaper than Sonnet with the same instruction-following discipline. The natural step down if you're already wired into Anthropic's API.

GPT-5 mini ($0.25 / $2.00) straddles this band and the one below. Cheapest input in the $1-3 tier, 400K context, "medium" speed profile noticeably slower than Flash-Lite. The "OpenAI ecosystem on a budget" pick.

Model	Input/1M	Output/1M	Context	Best for
Kimi K2.5	$0.60	$2.50	256K	Agentic coding at OpenCode/OpenClaw scale
Claude Haiku 4.5	$1.00	$5.00	200K	Anthropic ecosystem, Sonnet-lite tasks
GPT-5 mini	$0.25	$2.00	400K	OpenAI ecosystem on a budget

Use this when: the workload is too quality-sensitive for Flash-Lite or DeepSeek but doesn't need Sonnet/Opus-tier reasoning.

API band — $3+/M: Claude Sonnet 4.6, Opus 4.6/4.7, Gemini 3.1 Pro#

The premium tier. You're paying for frontier coding, multimodal flagship reasoning, or both.

Claude Sonnet 4.6 ($3 / $15) — the production-grade default for serious coding. Strong on Aider-leaderboard tasks, strong inside Claude Code, the Anthropic agent ecosystem ties together cleanly. The workhorse if you can't justify Opus on every call but want Anthropic's instruction-hierarchy discipline.

Claude Opus 4.6 / 4.7 ($5 / $25) — the frontier ceiling. Opus 4.7 ships measurable lifts over 4.6 on SWE-Pro and the broader benchmark sweep (Anthropic announcement), plus 4 breaking API changes worth reading before you migrate. The xhigh effort level unlocks harder reasoning at extra compute. Pay for this when the task warrants — multi-file refactors, agentic workflows that survive 90+ minutes of context, ambiguous debugging where wrong costs you the day.

Gemini 3.1 Pro ($2.50 / $12, tiered above 200K) — the multimodal flagship and the cheapest model in the $3+ tier. Above 200K context, pricing jumps to $4/$18 (verifiable on Google's pricing page). 1M context, strong reasoning, weak as a pure agentic-coding driver vs Claude. Best for tasks Claude can't do (vision, video, deep PDF) or when you need 1M context cheap.

Model	Input/1M	Output/1M	Context	Best for
Claude Sonnet 4.6	$3	$15	200K / 1M	Production coding default
Claude Opus 4.6 / 4.7	$5	$25	200K / 1M	Frontier agentic + reasoning ceiling
Gemini 3.1 Pro	$2.50 (≤200K) / $4 (>200K)	$12 / $18	1M	Multimodal + cheap long-context

Use this when: the task is in production, the model gets called on hard problems, and the cost of getting it wrong exceeds the cost of paying for the premium tier.

Flash-Lite review folded in: 363 tok/s and when speed beats price#

Flash-Lite gets its own section because the speed delta over Gemini 2.5 Flash is unusually large and the use cases where it matters are distinct from "pick the cheapest model."

The numbers: 363 tokens/second output vs 249 tok/s for Gemini 2.5 Flash — a 45% throughput jump. Time-to-first-token is 2.5x faster. Benchmarks that justify taking it seriously: GPQA Diamond 86.9%, LiveCodeBench 72.0%, Arena Elo 1432 (cross-reference Arena.ai and livecodebench.github.io for current standings). 86.9% on GPQA Diamond from a budget model puts Flash-Lite at ~92% of Gemini 3.1 Pro's reasoning (94.3%) at ~10% of the price.

Against the rest of the budget tier on a single reasoning benchmark:

Benchmark	Flash-Lite	Gemini 2.5 Flash	GPT-5 mini	Claude Haiku 4.5
GPQA Diamond	86.9%	72.8%	68.4%	69.2%
MMMU-Pro	76.8%	67.1%	63.8%	64.5%

Where 45% more speed actually matters:

Real-time chat with streaming. Users notice the gap between 249 and 363 tok/s. Snappier responses, first token shows up faster.
Voice AI and telecom. Call-center AI and voice agents live or die by the gap between a customer finishing their sentence and the AI responding. 2.5x faster TTFT can mean the difference between a natural conversation and a robotic pause that makes people hang up.
High-volume classification and routing. Throughput is money in content moderation, intent classification, and ticket routing. 45% more tokens per second is 45% more requests on the same infrastructure.

Where speed doesn't matter: batch processing, async pipelines, anything already waiting on external APIs. If you're using the Batch API (50% discount), you've opted out of caring about latency — go cheaper.

What Flash-Lite is not for: complex multi-step reasoning (use Gemini 3.1 Pro or Claude Sonnet 4.6), long-form content where the quality ceiling matters more than speed, agentic workflows with heavy tool use, tasks where accuracy is non-negotiable.

Use this when: latency is your bottleneck, not cost — or when you need 1M context cheap and can tolerate a preview-tier model.

Routing patterns: Flash-Lite for volume, Pro for hard tasks#

The canonical 2026 multi-model stack isn't "pick one model and stick with it." It's a router that sends each request to the cheapest model that can handle it.

The simplest pattern that actually works:

Default → Flash-Lite (or DeepSeek V3.2 if context fits in 64K). Classification, extraction, summarization, simple generation, routine code review — anything well-bounded with a small answer space.
Escalate to mid-tier (Kimi K2.5, GPT-5 mini, Haiku 4.5) on confidence drop. If the default model returns "I'm not sure" responses, tool-use fails, or output validation rejects, kick it up.
Escalate to premium (Sonnet 4.6, Opus 4.6/4.7, Gemini 3.1 Pro) on critical-path tasks. Production code, agentic multi-step workflows, anything where being wrong is expensive.

For agentic workflows specifically: route the planning step to a high-quality model (Opus 4.7 or GPT-5.4 Pro), execute sub-tasks on a cheap fast model (Flash-Lite or Kimi K2.5), reserve premium for judgment-call steps. The volume-tier savings subsidize the premium spend.

The reason this works in 2026: the gap between cheap and premium on simple tasks has collapsed. GPQA Diamond at 86.9% on Flash-Lite means a $0.25-input model gets PhD-level science reasoning right ~87% of the time. The premium tier only earns its premium on the hard 13%. A router that knows which 13% is the entire optimization.

If you're not sure where the cutoffs live for your workload, my framework for identifying the best model for your work walks through A/B testing model selection without trusting benchmarks. Don't pick router thresholds from a leaderboard — measure them on your tasks.

How to connect: OpenRouter, Continue, Aider#

Most of the API picks above route through a single aggregator. OpenRouter (or Together AI for OSS models) is the lowest-friction approach: one API key, instant model switching, no juggling accounts. Grab a key from OpenRouter.ai, then:

API Base URL: https://openrouter.ai/api/v1
Model names: deepseek/deepseek-v3.2, google/gemini-3.1-flash-lite-preview, moonshotai/kimi-k2.5, anthropic/claude-sonnet-4.6, etc.

VS Code with Continue — install the extension, open the config with Ctrl+Shift+P → "Continue: Open Config File":

models:
  - name: Flash-Lite (OpenRouter)
    provider: openrouter
    model: google/gemini-3.1-flash-lite-preview
    apiKey: <YOUR_OPENROUTER_API_KEY>
    apiBase: https://openrouter.ai/api/v1
    defaultCompletionOptions:
      contextLength: 1000000
      maxTokens: 65536
    roles: [chat, edit, apply]

  - name: Kimi K2.5 (OpenRouter)
    provider: openrouter
    model: moonshotai/kimi-k2.5
    apiKey: <YOUR_OPENROUTER_API_KEY>
    apiBase: https://openrouter.ai/api/v1

Command line with Aider — ~/.aider.conf.yml:

model: openrouter/google/gemini-3.1-flash-lite-preview
api_base: https://openrouter.ai/api/v1
api_key: env:OPENROUTER_API_KEY

Then export OPENROUTER_API_KEY="YOUR_KEY" && aider /path/to/your/files.

Cost calculator concept — to sanity-check which model you actually want, compute monthly spend at real usage:

monthly_cost = (input_tokens × input_price_per_1M / 1_000_000)
             + (output_tokens × output_price_per_1M / 1_000_000)

An average active developer running ~5.2M input + ~2.6M output tokens/month sees roughly: DeepSeek V3.2 $2.55/mo, Flash-Lite $5.20/mo, Kimi K2.5 $9.62/mo, Sonnet 4.6 $54.60/mo, Opus 4.7 $91.00/mo. That's why "subscription vs API" isn't really a question for working devs on Opus-class models — API spend at real usage outpaces a $200 Max 20x sub fast. For Flash-Lite or Kimi, the API stays cheaper than any subscription unless you're doing 50M+ output tokens/month.

The 2025 leftover: DeepSeek R1 as historical context#

A note for anyone landing here from old "best budget coding LLM" results: DeepSeek R1 isn't the answer in 2026.

In mid-2025, R1 was the budget winner — 71.4% on the Aider leaderboard at roughly $8.55/month on an average developer workload, with $0.55/M input and $2.19/M output pricing. A $0.12 cost-per-Aider-point that nothing else in its price range could touch.

The 2026 reality: DeepSeek V3.2 ($0.28/$0.42) is the current DeepSeek pick. Flash-Lite ($0.25/$1.50) didn't exist when R1 launched. Kimi K2.5 ($0.60/$2.50) and the broader OSS agentic-coding category reset the price floor. R1's 128K context looks small next to Flash-Lite's 1M and Kimi's 256K. And the Aider leaderboard has been overtaken by more aggressive benchmarks (LiveCodeBench, SWE-Bench Pro, Terminal-Bench 2.0).

Existing R1 setup that works still works. Picking fresh in 2026, go to Flash-Lite or DeepSeek V3.2 first. R1 is archived here as historical context.

For prompting techniques that close the gap between budget and premium models (cheaper model, more your prompt matters), see my Chain-of-Thought vs zero-shot guide.

FAQ#

Frequently Asked Questions

No. Claude Pro gives you Opus 4.6 access, but the usage cap on the five-hour reset window burns out in 30-40 minutes of real agentic coding. After that you're locked out entirely — no Sonnet fallback, no Claude Code — until the window resets. For developer work that leans on Opus you need Max 5x or Max 20x. Pro is fine for chat and writing; it's a trap if you bought it expecting unlimited Opus.

Both, for most working developers. The crossover: if daily Claude Code or Codex usage would burn more than the sub tier in API tokens, pay for the sub (Max 20x is ~90% discount vs raw API on Opus power-user volume). For high-volume production on cheaper models like Flash-Lite or Kimi K2.5, the API stays cheaper because you control the prompt budget. Serious users run both — a Max sub for daily coding plus OpenRouter API access for production routing.

How to identify the best model for your work — the A/B testing framework for picking models without trusting leaderboards.
Claude Code setup: CLAUDE.md, MCPs, skills — getting more out of your Claude Max subscription.
OpenAI Codex setup: agents.md, MCPs, skills — getting more out of your ChatGPT Pro subscription.
Chain-of-Thought vs zero-shot prompting — how prompting strategy closes the budget vs premium model gap.
Provider pricing pages: Anthropic, OpenAI, Google, DeepSeek, Moonshot/Kimi, z.ai/GLM.

The Cheapest LLM Worth Using in 2026: A Tier-by-Tier Breakdown of API and Subscription Pricing

The 30-second answer: master pricing table#

How the price-band cutoffs work in 2026#

Subscription band — $200/mo winner: Claude Max 20x#

Subscription band — $100/mo winner: Claude Max 5x#

Subscription band — $20/mo winner: ChatGPT Plus#

Subscription band — under $20/mo winner: Google AI Plus#

Pricing tiers change fast in 2026

API band — sub-$1/M: DeepSeek V3.2 vs Gemini 3.1 Flash-Lite#

API band — $1-3/M: Kimi K2.5, Claude Haiku 4.5, GPT-5 mini#

API band — $3+/M: Claude Sonnet 4.6, Opus 4.6/4.7, Gemini 3.1 Pro#

Flash-Lite review folded in: 363 tok/s and when speed beats price#

Routing patterns: Flash-Lite for volume, Pro for hard tasks#

How to connect: OpenRouter, Continue, Aider#

The 2025 leftover: DeepSeek R1 as historical context#

FAQ#

Frequently Asked Questions

Enjoyed this post?

Related Posts

How to Identify the Best AI Model for Your Work (Beyond Benchmarks)

Claude Code Setup: CLAUDE.md, MCPs, Skills (Definitive Guide)

OpenAI Codex Setup: AGENTS.md, MCPs, Skills (Definitive Guide 2026)

The 30-second answer: master pricing table#

How the price-band cutoffs work in 2026#

Subscription band — $200/mo winner: Claude Max 20x#

Subscription band — $100/mo winner: Claude Max 5x#

Subscription band — $20/mo winner: ChatGPT Plus#

Subscription band — under $20/mo winner: Google AI Plus#

Pricing tiers change fast in 2026

API band — sub-$1/M: DeepSeek V3.2 vs Gemini 3.1 Flash-Lite#

API band — $1-3/M: Kimi K2.5, Claude Haiku 4.5, GPT-5 mini#

API band — $3+/M: Claude Sonnet 4.6, Opus 4.6/4.7, Gemini 3.1 Pro#

Flash-Lite review folded in: 363 tok/s and when speed beats price#

Routing patterns: Flash-Lite for volume, Pro for hard tasks#

How to connect: OpenRouter, Continue, Aider#

The 2025 leftover: DeepSeek R1 as historical context#

FAQ#

Frequently Asked Questions

Related reading#

Enjoyed this post?

Related Posts

How to Identify the Best AI Model for Your Work (Beyond Benchmarks)

Claude Code Setup: CLAUDE.md, MCPs, Skills (Definitive Guide)

OpenAI Codex Setup: AGENTS.md, MCPs, Skills (Definitive Guide 2026)