Strategic Model Selection in Cursor: Balancing Cost and Performance
Flipping back and forth between Cursor’s pricing page and a leaderboard is tedious, and using “the best” model for everything can lead to surprisingly high bills. A simpler approach: one stronger model for planning, one cheaper model for execution.
The two-model strategy
Planning (understanding the task, designing steps) benefits from strong reasoning—e.g. Claude Sonnet/Opus. You send context and get back a plan and a few key decisions; token volume is modest, so the extra cost is often worth it.
Execution (implementing the plan, writing code) can be done well by cheaper models like Gemini Flash or GPT-5 Mini when the plan is clear. This phase uses many more tokens, so keeping cost per token low matters.
- Use a premium model for planning: start the task, get a clear plan, maybe one or two critical edits.
- Switch to a cheaper model for execution: implement the rest, iterate, run tests.
You avoid both the “everything on the best model” bill and the “everything on the cheapest” quality hit.
Why it works
Planning is input-heavy (lots of context in, a compact plan out); execution generates most of the output (lots of code) and most of the tokens overall. Benchmarks like BigCodeBench and Arena Code suggest that mid-tier models come close to the top models on code tasks at a fraction of the cost. So use strong reasoning where it matters and lower cost where most tokens are spent.
Cost vs performance at a glance
The chart below plots cost (a weighted $/1M tokens figure: 70% input price, 30% output price) against benchmark performance. Data comes from Cursor’s pricing and public benchmarks and is refreshed daily.
[Chart: cost (weighted $/1M tokens) vs. benchmark performance]
How to read it: Lower left = cheaper/weaker, upper right = pricier/stronger. Pick a planning model from the upper-right (e.g. Claude 4.5 Sonnet, GPT-5.2) and an execution model from the lower half (e.g. Gemini 3 Flash, GPT-5 Mini).
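The weighted cost on the chart's cost axis can be reproduced with a few lines of Python. This is a minimal sketch of the 70/30 blend described above; the function name `weighted_cost` and the prices passed to it are placeholders chosen for illustration, not figures taken from Cursor's pricing.

```python
def weighted_cost(input_price: float, output_price: float,
                  input_weight: float = 0.7) -> float:
    """Blend per-1M-token prices into one figure (70% input, 30% output by default)."""
    if not 0.0 <= input_weight <= 1.0:
        raise ValueError("input_weight must be between 0 and 1")
    return input_weight * input_price + (1.0 - input_weight) * output_price


# Placeholder prices in $/1M tokens (assumptions, not quoted rates).
premium = weighted_cost(input_price=3.00, output_price=15.00)
budget = weighted_cost(input_price=0.50, output_price=3.00)

print(f"premium: ${premium:.2f} per 1M weighted tokens")
print(f"budget:  ${budget:.2f} per 1M weighted tokens")
```

A single blended number makes models comparable on one axis, but it assumes your own usage is close to a 70/30 input/output split; pass a different `input_weight` if it is not.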
Recommended pairings
- Planning: Claude 4.5 Sonnet/Opus or GPT-5.2 / GPT-5.2 Codex.
- Execution: Gemini 3 Flash, Gemini 2.5 Flash, or GPT-5 Mini.
Use a slightly stronger execution model for tricky files, or a cheaper planner for simple tasks.
Example workflow
- Start with the planning model. Describe the goal, attach files, ask for a step-by-step plan.
- Lock in the plan. Review it, ask at most one short follow-up, then switch models.
- Switch to the execution model. Refer to the plan and implement step by step; do most coding here.
- Use the planning model only when needed. For design decisions or subtle bugs, switch back briefly, then return to the cheaper model.
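The switching rule in the workflow above can be written down as a small decision function. This is only a sketch of the rule of thumb, not a Cursor API: in practice you switch models by hand in the model picker. The `Phase` names and `pick_model` helper are hypothetical, and the two model labels are just one of the pairings mentioned earlier.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    IMPLEMENT = "implement"
    DESIGN_DECISION = "design_decision"
    SUBTLE_BUG = "subtle_bug"


# One possible pairing; substitute whichever models you actually use.
PLANNING_MODEL = "Claude 4.5 Sonnet"
EXECUTION_MODEL = "Gemini 3 Flash"

# Phases where stronger reasoning is worth the higher price.
PREMIUM_PHASES = {Phase.PLAN, Phase.DESIGN_DECISION, Phase.SUBTLE_BUG}


def pick_model(phase: Phase) -> str:
    """Return the planning model for reasoning-heavy phases, else the cheaper one."""
    return PLANNING_MODEL if phase in PREMIUM_PHASES else EXECUTION_MODEL


# A typical session: plan once, implement, escalate briefly, then drop back.
session = [Phase.PLAN, Phase.IMPLEMENT, Phase.IMPLEMENT,
           Phase.SUBTLE_BUG, Phase.IMPLEMENT]
for step in session:
    print(f"{step.value:>16} -> {pick_model(step)}")
```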
Rough cost intuition
Suppose planning uses about 100k input and 50k output tokens, and execution uses about 200k input and 150k output. Running everything on a premium model (e.g. Claude 4.5 Sonnet) can cost several dollars per session; planning on premium and executing on Gemini 3 Flash can cost around a dollar or less. Moving the high-token phase to the cheaper model is where most of the savings come from.
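To run this estimate with your own numbers, the arithmetic can be sketched as follows. The token counts are the ones from the paragraph above; the per-token prices are placeholders picked to show the calculation, so the totals will differ from what current rates produce. `Price` and `phase_cost` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # $ per 1M input tokens
    output_per_m: float  # $ per 1M output tokens


def phase_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one phase given its token counts."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


# Placeholder prices (assumptions); replace with current rates.
PREMIUM = Price(input_per_m=3.00, output_per_m=15.00)
BUDGET = Price(input_per_m=0.50, output_per_m=3.00)

# (input tokens, output tokens) per phase, from the scenario above.
PLANNING = (100_000, 50_000)
EXECUTION = (200_000, 150_000)

all_premium = phase_cost(PREMIUM, *PLANNING) + phase_cost(PREMIUM, *EXECUTION)
split = phase_cost(PREMIUM, *PLANNING) + phase_cost(BUDGET, *EXECUTION)

print(f"all premium: ${all_premium:.2f}")
print(f"split:       ${split:.2f}")
print(f"saved:       {100 * (1 - split / all_premium):.0f}%")
```

Because the planning cost is identical in both scenarios, the entire difference comes from the execution phase, which is why the price of the execution model dominates the session total.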
Takeaways
- Use one stronger model for planning and one cheaper model for execution instead of one model for everything.
- The chart is updated daily—use it to check cost vs performance without tab-hopping.
- Pick planning from the upper-right of the chart, execution from the lower half, and switch as you move from planning to coding.