LLM Token Counter & API Cost Estimator

Estimate the token count of any text and compare the resulting API costs across 10 major language models at a glance. Useful for prompt engineering and budget planning.

What Is a Token in LLMs

Language models do not process text character by character — they process tokens, which are chunks of text corresponding to common words, word fragments or punctuation. The word 'running' might be one token. 'Unbelievable' might be split into 'un', 'believ', 'able' — three tokens. On average, one token corresponds to about 4 characters or 0.75 words in English. For code, JSON and structured text, token counts can be significantly higher per character.
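The rule of thumb above (about 4 characters or 0.75 words per token) can be turned into a quick estimator. The sketch below is illustrative only: the function name `estimate_tokens` and the choice to average the two heuristics are assumptions, not the method this tool uses, and the result is a rough figure for English prose rather than a real tokeniser count.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (hypothetical helper).

    Combines the two rules of thumb: ~4 characters per token and
    ~0.75 words per token. Averaging them is an assumption made here
    for illustration; it is not how any real tokeniser works.
    """
    if not text.strip():
        return 0
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    return max(1, round((by_chars + by_words) / 2))


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog."
    print(estimate_tokens(sample))
```

For code, JSON or non-English text, expect the real count to be higher than this estimate.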

Why Token Counting Matters

API pricing is per token, so every token in your prompt and in the response is billed. A context window, such as 128,000 tokens, is a hard limit, and filling it sets the maximum cost of a single request. Long prompts consume context, leaving less room for output. Knowing your prompt's token count helps you optimise costs, avoid hitting context limits, and design efficient prompts. This tool uses an approximation of cl100k_base, which is close to GPT-4 and Claude tokenisation but not identical to any specific model's tokeniser.
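The trade-off between prompt length and room for output is simple arithmetic. The following sketch assumes a 128,000-token window as the default; the function names `remaining_output_budget` and `fits_in_context` are hypothetical and exist only for this illustration.

```python
def remaining_output_budget(prompt_tokens: int, context_window: int = 128_000) -> int:
    """Tokens left for the model's response after the prompt is counted."""
    if prompt_tokens < 0 or context_window <= 0:
        raise ValueError("token counts must be non-negative and the window positive")
    return max(0, context_window - prompt_tokens)


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = 128_000) -> bool:
    """True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    print(remaining_output_budget(100_000))  # 28000
    print(fits_in_context(126_000, 4_000))   # False
```

Because the token count from an estimator is approximate, it is sensible to leave a safety margin rather than plan right up to the limit.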

Frequently Asked Questions

How accurate is the token estimate?
The estimate is approximate: accurate to within 10-20% for typical English text. Tokenisation varies by model: GPT-4 uses cl100k_base, Claude uses its own tokeniser, and Gemini uses another. For exact counts, use the model provider's official tokeniser (tiktoken for OpenAI) or the provider API's token-counting endpoint. This tool is best for budgeting and rough planning.

Why does code use more tokens than plain English?
Code contains more punctuation, indentation and unusual character combinations, and its identifiers are less likely to match a single vocabulary entry, so it splits into more tokens per character. A line of Python such as 'for item in items:' is only four words, yet the trailing colon is typically its own token, and brackets, operators and leading whitespace add further tokens in longer code. English prose, by contrast, averages roughly one token per 0.75 words.

What is a context window?
The context window is the maximum amount of text, measured in tokens, that a model can process at once, covering both your input and its output. GPT-4 Turbo has a 128K-token window, Claude 3 has 200K, and Gemini 1.5 Pro has 1M. Your prompt plus the model's response must fit within this limit. Longer context enables more complex tasks but costs more.

How can I reduce my token usage?
Be concise: remove filler phrases and unnecessary context. Prefer compact structure (short numbered lists, minimal JSON without redundant keys or whitespace), since structured text can use more tokens per character than prose. Summarise long documents rather than pasting them in full. Avoid repeating the same instructions in multiple ways. Remove examples once the model demonstrates understanding.

Why do some models cost more than others?
Larger models (more parameters) require more computation per token. GPT-4 costs more than GPT-3.5 because it is larger and more capable. Input tokens (your prompt) are cheaper than output tokens (the model's response) because generating each output token is computationally more intensive than reading input.
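Since input and output tokens are billed at different rates, a cost estimate needs both counts. Below is a minimal sketch under stated assumptions: the `Pricing` class, the `estimate_cost` function and the rates in `EXAMPLE_PRICING` are placeholders invented for illustration and are not the prices of any real model. Check the provider's current price list before budgeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (hypothetical container)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimated request cost in USD, billing input and output separately."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Placeholder rates for illustration only, NOT real prices for any model.
EXAMPLE_PRICING = Pricing(input_per_million=1.00, output_per_million=3.00)

if __name__ == "__main__":
    cost = estimate_cost(10_000, 1_000, EXAMPLE_PRICING)
    print(f"${cost:.4f}")  # prints $0.0130
```

With these placeholder rates, 1,000 output tokens cost as much as 3,000 input tokens, which is why capping response length is often an effective way to reduce spend.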