LLM Token-to-Cost Calculator: GPT-4o, Claude, Gemini Pricing (2026)

Token pricing is the closest thing the AI industry has to a unit-economics number, and yet most developers run their app for weeks before they look at it. Paste a prompt below and see what one call — or a million of them — would cost across 16 models.

LLM Token-to-Cost Calculator

Estimate API cost across 16 models. Paste a prompt or enter counts; cost updates live.


How AI token pricing actually works

Every major LLM provider charges per token: a fragment of a word that the model’s tokeniser produces when it reads or writes text. A rule of thumb that holds reasonably well across OpenAI, Anthropic, and Google tokenisers is about four characters per token, or roughly three-quarters of a word per token (about 1.3 tokens per word) for English prose. Code, JSON, and non-Latin scripts tokenise more densely (more tokens per character), so a calculator like this gives you a working estimate, not a final invoice.
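The heuristic fits in a few lines. This is a minimal Python sketch, not the calculator’s own script: the function names are hypothetical, and rounding up to a whole token is an assumption.

```python
import math

# Rules of thumb from the text; they only hold for English prose.
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75  # i.e. about 1.3 tokens per word


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens as character count / 4, rounded up (assumed rounding)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate tokens as whitespace-separated word count / 0.75, rounded up."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


sample = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens_from_chars(sample))  # 44 chars -> 11
print(estimate_tokens_from_words(sample))  # 9 words -> 12
```

The two estimates already disagree on a single sentence, a reminder that both are approximations; exact counts need the vendor’s own tokeniser.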

Almost all vendors price input tokens (what you send in: system prompt + user message + chat history + retrieved context) and output tokens (what the model writes back) at different rates, with output typically priced at 3–5× the input rate. For chat apps that carry long histories or retrieved context, input usually dominates the bill. For long-form generation, it flips.
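Per-call cost is each token count times its per-million rate. The sketch below assumes GPT-4o list prices of $2.50 input and $10.00 output per million tokens (the rates that reproduce the GPT-4o figures in the examples further down) and shows how the input share of the bill flips between a retrieval-heavy call and a generation-heavy one. The helper names are illustrative.

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  input_usd_per_m: float, output_usd_per_m: float) -> float:
    """USD cost of one call; prices are quoted per million tokens."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


def input_share(input_tokens: int, output_tokens: int,
                input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Fraction of a call's cost that comes from input tokens."""
    in_cost = input_tokens * input_usd_per_m
    out_cost = output_tokens * output_usd_per_m
    return in_cost / (in_cost + out_cost)


# Assumed GPT-4o list prices, USD per 1M tokens.
GPT4O_IN, GPT4O_OUT = 2.50, 10.00

rag = input_share(8_000, 250, GPT4O_IN, GPT4O_OUT)        # retrieval-heavy, ~0.89
codegen = input_share(1_500, 1_000, GPT4O_IN, GPT4O_OUT)  # generation-heavy, ~0.27
print(f"RAG input share: {rag:.0%}, code-gen input share: {codegen:.0%}")
```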

The 200× spread. The cheapest model in this calculator costs roughly 1/200th of the most expensive one for the same task. Switching from a flagship model to its smaller sibling for the 80% of calls that don’t need a flagship is the single biggest cost lever in production AI.

How to use the calculator

Paste a representative prompt into the text box, or switch to the “Enter counts” tab if you already know your token counts.
Set the expected output tokens — how long the model’s response usually is. A chat reply is ~200, a summary ~400, a long explanation ~1,000+.
Scale it up using the calls-per-period control. “100 calls per day” projects a monthly bill; “1 call” gives you a single API-call price.
Pick a quick preset if you want a starting point: chatbot reply, document summary, code generation, or a typical RAG question.
Sort by total cost to see the cheapest model first, then re-sort by vendor or input price to compare a specific family of models.
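Behind the scale control, a monthly projection is per-call cost times calls per month, and sorting by that total puts the cheapest model first. This sketch uses a hypothetical three-model price table (assumed list prices; verify them before use) and assumes a 30-day month, which may not match the calculator’s own convention.

```python
# Hypothetical price table: (input, output) in USD per 1M tokens. Assumed list prices.
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
}

# Assumption: 30-day month and 52/12 weeks per month.
CALLS_TO_MONTHLY = {"day": 30, "week": 52 / 12, "month": 1}


def cost_per_call(input_tokens, output_tokens, in_price, out_price):
    """USD per call; prices are per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def monthly_costs(input_tokens, output_tokens, calls, period):
    """Project a monthly bill per model; return (model, cost) pairs, cheapest first."""
    calls_per_month = calls * CALLS_TO_MONTHLY[period]
    rows = [
        (model, cost_per_call(input_tokens, output_tokens, p_in, p_out) * calls_per_month)
        for model, (p_in, p_out) in PRICES.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# Code-generation preset from the text (1,500 in, 1,000 out) at 100 calls per day.
rows = monthly_costs(1_500, 1_000, calls=100, period="day")
for model, cost in rows:
    print(f"{model:<16} ${cost:,.2f}/month")
```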

Realistic cost examples

Chatbot — short reply, light context

~600 input tokens (system prompt + last few turns) · 200 output tokens. Per-call cost on Claude Haiku 3.5 is about $0.0013; on GPT-4o it’s $0.0035; on Claude Opus 4 it’s $0.024. Sounds tiny — multiply by 100k calls and the gap becomes $130 vs $2,400 a month for the same workload.
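A quick check of that gap, assuming list prices of $0.80/$4.00 (Claude Haiku 3.5), $2.50/$10.00 (GPT-4o) and $15/$75 (Claude Opus 4) per million tokens, which reproduce the per-call figures above, and treating 100k calls as one month of traffic:

```python
# Assumed list prices (USD per 1M tokens) that reproduce the figures in the text.
PRICES = {
    "Claude Haiku 3.5": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4": (15.00, 75.00),
}

INPUT_TOKENS, OUTPUT_TOKENS = 600, 200  # chatbot example
CALLS_PER_MONTH = 100_000


def cost_per_call(input_tokens, output_tokens, in_price, out_price):
    """USD per call; prices are per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


per_call = {m: cost_per_call(INPUT_TOKENS, OUTPUT_TOKENS, *p) for m, p in PRICES.items()}
monthly = {m: c * CALLS_PER_MONTH for m, c in per_call.items()}

for model in PRICES:
    print(f"{model:<17} ${per_call[model]:.4f}/call  ${monthly[model]:,.0f}/month")
```

That works out to roughly $128, $350, and $2,400 a month respectively.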

Document summary — long input, short output

~4,000 input tokens (a long article) · 400 output tokens (a tight summary). This is where the input/output asymmetry bites: input dominates. Gemini 1.5 Flash runs about $0.0004/call here, GPT-4o about $0.014, Claude Opus 4 about $0.090. Models with cheap input win hard on summarisation.
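The arithmetic behind those figures, assuming list prices of $0.075/$0.30 per million tokens for Gemini 1.5 Flash and $15/$75 for Claude Opus 4 (the rates that reproduce the numbers above):

```latex
\begin{aligned}
C_{\text{Flash}}  &= \frac{4000 \times 0.075 + 400 \times 0.30}{10^6} = \frac{300 + 120}{10^6} = \$0.00042 \\
C_{\text{Opus 4}} &= \frac{4000 \times 15 + 400 \times 75}{10^6} = \frac{60\,000 + 30\,000}{10^6} = \$0.090
\end{aligned}
```

Input accounts for roughly 71% (Flash) and 67% (Opus 4) of each call, and Opus 4 costs about 214× as much as Flash for this task.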

Code generation — medium input, long output

~1,500 input tokens (instructions + file contents) · 1,000 output tokens (refactored file). Output dominates. Claude Sonnet 4 lands around $0.020/call; GPT-4o around $0.014; the smaller models trade some quality for an order-of-magnitude saving.
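Splitting the code-generation figures into input and output makes the asymmetry visible, assuming list prices of $3/$15 per million tokens for Claude Sonnet 4 and $2.50/$10 for GPT-4o:

```latex
\begin{aligned}
C_{\text{Sonnet 4}} &= \underbrace{\frac{1500 \times 3}{10^6}}_{\$0.0045} + \underbrace{\frac{1000 \times 15}{10^6}}_{\$0.015} = \$0.0195 \\
C_{\text{GPT-4o}}   &= \underbrace{\frac{1500 \times 2.50}{10^6}}_{\$0.00375} + \underbrace{\frac{1000 \times 10}{10^6}}_{\$0.010} = \$0.01375
\end{aligned}
```

Output is about 77% of the Sonnet 4 bill and 73% of the GPT-4o bill, so capping response length moves the total more than trimming the prompt does.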

RAG question — huge input, small output

~8,000 input tokens (retrieved chunks + question) · 250 output tokens (concise answer). The classic enterprise pattern. Input cost dominates; pick a model with cheap input, trim your retrieved context, or both. Gemini 1.5 Pro’s higher long-context pricing tier only applies to prompts above 128k tokens, so it comes into play once retrieval grows far beyond this example.
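Trimming retrieved context is often the cheapest fix. The sketch below splits the 8,000 input tokens into a hypothetical 500 tokens of instructions and question plus fifteen 500-token chunks, assumes GPT-4o list prices of $2.50/$10.00 per million tokens, and compares keeping all fifteen chunks with keeping the top six.

```python
def rag_cost(chunks_kept, chunk_tokens=500, overhead_tokens=500, output_tokens=250,
             in_price=2.50, out_price=10.00):
    """USD per RAG call; chunk and overhead sizes are hypothetical."""
    input_tokens = overhead_tokens + chunks_kept * chunk_tokens
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


full = rag_cost(15)    # 8,000 input tokens
trimmed = rag_cost(6)  # 3,500 input tokens
print(f"full ${full:.4f}  trimmed ${trimmed:.4f}  saving {1 - trimmed / full:.0%}")
```

Under these assumptions, dropping to six chunks halves the per-call cost; whether answer quality holds is worth checking in the same eval you use to pick a model.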

How to reduce your token bill

Use the cheapest model that meets the bar. Many tasks don’t need a flagship. Run an eval comparing Haiku 3.5 / GPT-4o mini / Gemini Flash against your current model on a hundred real examples — you’ll often find the cheap model is fine.
Trim the system prompt. A 2,000-token system prompt sent on every call costs about $5,000 per million calls on GPT-4o, and much of it may not be needed. Move static instructions into a fine-tune, or rewrite them as a tighter, shorter prompt.
Cap output length. Set max_tokens deliberately. Letting a verbose model ramble for 1,000 tokens when 200 would do is a 5× cost multiplier on the output side.
Use prompt caching. Anthropic, OpenAI, and Google all now offer cached-input pricing at 50–90% off for repeated prompts — huge for chat apps with long stable system prompts.
Batch where latency doesn’t matter. Most providers offer 50% off via a batch API for jobs that can wait minutes or hours for results (overnight enrichments, evals, classification).
Route smartly. Send easy queries to a cheap model and escalate only the hard ones to a flagship. Even a simple keyword-based router can cut blended cost 60–80%.
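Two of these levers are easy to put numbers on. The sketch below assumes GPT-4o at $2.50/$10.00 and GPT-4o mini at $0.15/$0.60 per million tokens, the chatbot workload from the examples above (600 input, 200 output tokens), and a hypothetical router that sends 80% of traffic to the cheap model. It ignores the router’s own cost and any quality loss.

```python
def cost_per_call(input_tokens, output_tokens, in_price, out_price):
    """USD per call; prices are per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def system_prompt_overhead(prompt_tokens, calls, in_price):
    """USD spent just re-sending a fixed system prompt on every call."""
    return prompt_tokens * calls * in_price / 1_000_000


def blended_cost(cheap_share, cheap_cost, flagship_cost):
    """Average per-call cost when cheap_share of calls go to the cheap model."""
    return cheap_share * cheap_cost + (1 - cheap_share) * flagship_cost


# 2,000-token system prompt, one million calls, GPT-4o input rate.
print(f"${system_prompt_overhead(2_000, 1_000_000, 2.50):,.0f}")  # $5,000

flagship = cost_per_call(600, 200, 2.50, 10.00)  # GPT-4o
cheap = cost_per_call(600, 200, 0.15, 0.60)      # GPT-4o mini
routed = blended_cost(0.80, cheap, flagship)     # hypothetical 80/20 split
print(f"saving vs all-flagship: {1 - routed / flagship:.0%}")  # 75%
```

A router that keeps fewer calls on the cheap model saves proportionally less.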

Frequently asked questions

How accurate is the token estimate?

The calculator uses ~4 characters per token, a typical average across modern English tokenisers (OpenAI’s cl100k_base and o200k_base, Claude, Gemini). For English prose it lands within roughly ±10% of the true token count. For code or non-Latin text it can be off by more; if you need exact numbers, run the vendor’s official tokeniser on your text and paste the count into the “Enter counts” tab.

Why are input and output priced differently?

Output tokens are generated one at a time, autoregressively: every new token needs its own sequential pass through the model, so the hardware spends far longer per output token. Input is processed in parallel during the initial “prefill” pass and is cheaper per token. Most vendors price output at 3–5× input.

Are the prices in this calculator current?

Prices are public API list prices as of May 2026, in USD per million tokens. Vendors adjust pricing several times a year, so verify against the vendor pricing page before committing to a model in production. The hardcoded prices live in the calculator’s script if you want to edit them.

What about prompt caching, batch discounts, and volume tiers?

The calculator shows standard pay-as-you-go list prices. If you use cached input (Anthropic prompt caching, OpenAI cached input, Google context caching), your effective cost can drop 50–90% on the cached portion of each prompt; the batch API typically takes 50% off the whole request. Large-volume committed-spend customers can also negotiate lower rates. Treat the calculator output as a ceiling.
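Caching and batching change the arithmetic in different places: a cache discount applies only to the cached tokens, while a batch discount applies to the whole request. The sketch below assumes Claude Sonnet 4 list prices of $3/$15 per million tokens, a 2,000-token cached system prompt with 400 fresh input tokens and 200 output tokens, cache reads billed at 10% of the input rate, and a 50% batch discount. It ignores any cache-write surcharge or storage fee a vendor may charge.

```python
def effective_cost(cached_tokens, fresh_tokens, output_tokens, in_price, out_price,
                   cache_read_multiplier=1.0, batch_discount=0.0):
    """USD per call. cache_read_multiplier scales the price of cached input
    tokens (1.0 = no caching); batch_discount is a fraction off the whole call."""
    input_cost = (cached_tokens * cache_read_multiplier + fresh_tokens) * in_price
    total = (input_cost + output_tokens * out_price) / 1_000_000
    return total * (1 - batch_discount)


list_price = effective_cost(2_000, 400, 200, 3.00, 15.00)
cached = effective_cost(2_000, 400, 200, 3.00, 15.00, cache_read_multiplier=0.10)
batched = effective_cost(2_000, 400, 200, 3.00, 15.00, batch_discount=0.50)
print(f"list ${list_price:.4f}  cached ${cached:.4f}  batched ${batched:.4f}")
```

Here caching cuts the per-call cost by a little over half even though the cached portion is 90% off, because output and fresh input are still billed in full; that is why the list-price figure works as a ceiling rather than an estimate of the discounted bill.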

Does this work for image, audio, or video tokens?

No — this calculator covers text tokens only. Multi-modal pricing has its own per-image or per-second rates that differ wildly between vendors. A separate calculator for multi-modal pricing is on the roadmap.

The bottom line

If you’re building anything serious on top of LLM APIs, the token-cost spread between models is the single biggest variable in your unit economics. Before you ship, run the cheapest plausible model through a real eval and compare it to the flagship you’re defaulting to. The savings compound from the day you switch — and the calculator above is meant to be the back-of-the-envelope you reach for every time someone in a meeting asks “what would that cost in production?”

Stacking AI subscriptions too?

If you also pay for ChatGPT Plus, Claude Pro, Cursor, or any other AI tool, the AI Subscription Cost Calculator tallies them all in one place.

