Choosing Models
AI tools give you access to a range of large language models (LLMs). Not every task needs the most powerful model. Picking the right model for the job reduces cost, latency and environmental impact.
Model capabilities improve on a cycle of weeks, not years. Specific model recommendations date quickly, so this guidance focuses on principles. Check your tool’s model selector regularly — providers frequently add new models and retire older ones.
Quick start: For general-purpose coding tasks, start with Claude Sonnet 4.5 or 4.6. It handles most day-to-day development well — generating functions, writing tests, reviewing code and refactoring — at a good balance of quality and cost.
Model tiers
Most tools group their models into broad capability tiers. The names differ between providers but the pattern is consistent.
| Tier | Good for | Examples at time of writing |
|---|---|---|
| Lightweight | Autocomplete, simple refactors, boilerplate, quick Q&A | Claude Haiku, GPT-4o mini, Gemini Flash |
| Mid-range | Generating functions and tests, code review, documentation, language conversion | Claude Sonnet, GPT-4o, Gemini Pro |
| Advanced | Multi-file architecture, requirements generation, complex debugging, security analysis, large refactors | Claude Opus, GPT Codex, Gemini Pro with thinking, o-series reasoning models |
Lightweight models are fast and cheap. Mid-range models suit most day-to-day coding — they produce good results when you provide clear context through rules, instructions and requirements. Advanced models consume significantly more tokens per request. Use them intentionally.
Match the model to the task
| Task | Tier | Why |
|---|---|---|
| Autocomplete while typing | Lightweight | Speed matters more than depth |
| Generate a function from a clear spec | Mid-range | Good enough with clear context |
| Write unit tests for existing code | Mid-range | Pattern-based, well scoped |
| Generate product requirements from an idea | Advanced | Needs deep reasoning and structure |
| Debug a cross-service integration failure | Advanced | Requires broad context analysis |
| Refactor an entire module to a new pattern | Advanced | Multi-file reasoning required |
| Scaffold a new route with validation | Mid-range | Repeatable, template-driven |
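The table above can be read as a simple routing rule: default to mid-range and escalate only for tasks that need it. The sketch below encodes that rule directly. The task names and the mid-range default are illustrative choices, not part of any tool's API.

```python
# Map task categories to model tiers, following the table above.
# Task names and the mid-range fallback are illustrative assumptions.
TASK_TIER = {
    "autocomplete": "lightweight",
    "generate_function": "mid-range",
    "write_unit_tests": "mid-range",
    "scaffold_route": "mid-range",
    "generate_requirements": "advanced",
    "debug_cross_service": "advanced",
    "refactor_module": "advanced",
}

def pick_tier(task: str) -> str:
    """Default to mid-range for unlisted tasks; escalate intentionally."""
    return TASK_TIER.get(task, "mid-range")
```

Starting unknown tasks at mid-range matches the guidance below: move up a tier only when the result is not good enough.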
Cost and token awareness
Every request consumes tokens. More capable models cost more per token, and reasoning models spend additional “thinking” tokens internally before producing an answer.
- Completions are cheap — they run constantly but use lightweight models
- Chat requests vary — a short question costs far less than a prompt with an entire requirements document attached
- Agentic workflows (multi-step tasks where the AI plans and executes) can consume large amounts of tokens across many calls
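To make the difference concrete, the sketch below compares the input cost of the same request across tiers. The per-million-token prices are made-up placeholders for illustration, not real provider pricing.

```python
# Rough input-cost comparison for one chat request across model tiers.
# Prices are illustrative placeholders, NOT real provider rates.
PRICE_PER_MILLION_INPUT_TOKENS = {
    "lightweight": 0.25,   # assumed: autocomplete-class model
    "mid-range": 3.00,     # assumed: day-to-day coding model
    "advanced": 15.00,     # assumed: reasoning/architecture model
}

def request_cost(tier: str, input_tokens: int) -> float:
    """Return the input cost in dollars for a single request."""
    return input_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS[tier]

# A short question (~200 tokens) vs. a prompt with an entire
# requirements document attached (~50,000 tokens).
for tier in PRICE_PER_MILLION_INPUT_TOKENS:
    print(f"{tier:11s}  short: ${request_cost(tier, 200):.5f}"
          f"  with doc: ${request_cost(tier, 50_000):.4f}")
```

Even with placeholder prices, the shape holds: attaching a large document to an advanced model costs orders of magnitude more than a short question to a lightweight one.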
To manage cost:
- Start with a mid-range model — move to advanced only when the result is not good enough
- Write clear, scoped prompts — a focused prompt gets a better answer in fewer tokens
- Use rules and instructions files for standing context rather than repeating conventions in every prompt
- Break large tasks into smaller steps
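A standing rules file is the cheapest of these levers. The snippet below is a hypothetical example; the exact filename and format depend on your tool (for instance, GitHub Copilot reads `.github/copilot-instructions.md`), and every convention listed is an assumed placeholder for your team's own.

```markdown
# Project conventions (standing context for AI tools)

- TypeScript strict mode; avoid `any`
- Tests live next to the source file they cover
- API handlers validate input before any business logic
- Prefer small, single-purpose functions over large modules
```

Because the tool includes this file automatically, you never pay tokens to restate these conventions in each prompt.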
Thinking and reasoning models
Some providers offer “thinking” or “reasoning” variants that spend extra tokens working through a problem step by step before responding. Use them for multi-step logic, design trade-offs, and complex requirements generation. Avoid them for simple completions or short factual questions where speed matters more than depth.
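As one concrete example, Anthropic's Messages API exposes extended thinking through a `thinking` parameter with an explicit token budget. The sketch below only builds the request payload; the model id and budget values are assumptions, so check your provider's current documentation before using them.

```python
# Build a Messages API request payload with extended thinking enabled.
# Model id and budget_tokens are illustrative; consult current provider docs.
def build_thinking_request(prompt: str, budget_tokens: int = 8_000) -> dict:
    return {
        "model": "claude-sonnet-4-5",        # assumed model id
        "max_tokens": budget_tokens + 2_000,  # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_thinking_request("Compare two schema designs for audit logging.")
```

A larger budget lets the model reason longer on design trade-offs, at the cost of more billed tokens; for simple completions, leave thinking off entirely.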
Models for coding (February 2026)
This snapshot will date. Use it as a starting point, not a permanent reference.
Strong code generation models exist across multiple providers. For complex coding tasks — architecture, multi-file refactors, generating requirements — Anthropic’s Claude Opus and Sonnet, OpenAI’s GPT Codex family, and Google’s Gemini Pro models all perform well. For day-to-day coding with clear context, mid-range models like Claude Sonnet and GPT-4o are cost-effective workhorses. For inline completions and autocomplete, lightweight models like Claude Haiku, GPT-4o mini and Gemini Flash are fast and cheap.
Most AI coding tools — including GitHub Copilot, Cursor, Windsurf and Claude Code — let you select which model to use. Check your tool’s documentation for how to switch.
Key points:
- No single provider consistently leads across all tasks — try different models for different scenarios
- Staying on the latest available version within your tool is usually the best default
- Free-tier and included models are often sufficient for completions and simple chat
- Advanced models are worth the extra cost for requirements generation, architecture and large refactors
- Share findings with your team when a model works well for a specific task