AI · Productivity · Cost-Optimization

Strategic Model Selection in Cursor: Balancing Cost and Performance

· Reading time: 3 min

Hopping back and forth between Cursor’s pricing page and a benchmark leaderboard is tedious, and defaulting to “the best” model for everything can lead to surprisingly high bills. A simpler approach: one stronger model for planning, one cheaper model for execution.

The two-model strategy

Planning (understanding the task, designing steps) benefits from strong reasoning—e.g. Claude Sonnet/Opus. You send context and get back a plan and a few key decisions; token volume is modest, so the extra cost is often worth it.

Execution (implementing the plan, writing code) can be done well by cheaper models like Gemini Flash or GPT-5 Mini when the plan is clear. This phase uses many more tokens, so keeping cost per token low matters.

  1. Use a premium model for planning: start the task, get a clear plan, maybe one or two critical edits.
  2. Switch to a cheaper model for execution: implement the rest, iterate, run tests.

You avoid both the “everything on the best model” bill and the “everything on the cheapest” quality hit.

Why it works

Planning is input-heavy (lots of context in, compact plan out); execution is output-heavy (lots of code generated). Benchmarks like BigCodeBench and Arena Code show that mid-tier models are close to the top on code tasks at a fraction of the cost. So: strong reasoning where it matters, lower cost where most tokens are spent.

Cost vs performance at a glance

The chart below plots cost (weighted $/1M tokens: 70% input, 30% output) against benchmark performance. The data comes from Cursor’s pricing and public benchmarks, and an automated workflow refreshes it daily.
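The weighted figure on the cost axis is easy to reproduce for any model. Here is a minimal sketch of the 70/30 blend; the prices passed in are placeholders, not actual Cursor rates:

```python
def weighted_cost(input_price: float, output_price: float) -> float:
    """Blend $/1M-token prices the way the chart does: 70% input, 30% output."""
    return 0.7 * input_price + 0.3 * output_price

# Placeholder prices in $/1M tokens -- substitute current rates from Cursor's pricing page.
print(weighted_cost(input_price=3.00, output_price=15.00))  # premium-tier example -> about 6.6
print(weighted_cost(input_price=0.10, output_price=0.40))   # budget-tier example  -> about 0.19
```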

[Interactive chart: weighted cost per 1M tokens vs. benchmark performance for models available in Cursor]

How to read it: Lower left = cheaper/weaker, upper right = pricier/stronger. Pick a planning model from the upper-right (e.g. Claude 4.5 Sonnet, GPT-5.2) and an execution model from the lower half (e.g. Gemini 3 Flash, GPT-5 Mini).

  • Planning: Claude 4.5 Sonnet/Opus or GPT-5.2 / GPT-5.2 Codex.
  • Execution: Gemini 3 Flash, Gemini 2.5 Flash, or GPT-5 Mini.

Use a slightly stronger execution model for tricky files, or a cheaper planner for simple tasks.

Example workflow

  1. Start with the planning model. Describe the goal, attach files, ask for a step-by-step plan.
  2. Lock in the plan. Review, maybe one short follow-up, then switch model.
  3. Switch to the execution model. Refer to the plan and implement step by step; do most coding here.
  4. Use the planning model only when needed. For design decisions or subtle bugs, switch back briefly, then return to the cheaper model.

Rough cost intuition

With roughly 100k input + 50k output tokens for planning and 200k input + 150k output tokens for execution, running everything on a premium model (e.g. Claude 4.5 Sonnet) can come to several dollars per session, while planning on the premium model and executing on Gemini 3 Flash lands around a dollar or less. Moving the high-token phase to a cheaper model cuts the cost substantially.
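As a sanity check on those numbers, here is a back-of-the-envelope calculation using the token counts above. The per-token prices are placeholders; swap in the current rates from the chart:

```python
# Placeholder $/1M-token prices -- check Cursor's pricing page for current rates.
PREMIUM = {"input": 3.00, "output": 15.00}  # e.g. a Claude Sonnet-class model
BUDGET = {"input": 0.10, "output": 0.40}    # e.g. a Gemini Flash-class model

def phase_cost(prices, input_tokens, output_tokens):
    """Dollar cost of one phase, given token counts and $/1M-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

planning = {"input_tokens": 100_000, "output_tokens": 50_000}    # context in, plan out
execution = {"input_tokens": 200_000, "output_tokens": 150_000}  # plan in, code out

all_premium = phase_cost(PREMIUM, **planning) + phase_cost(PREMIUM, **execution)
split = phase_cost(PREMIUM, **planning) + phase_cost(BUDGET, **execution)

print(f"Everything on the premium model: ${all_premium:.2f}")   # ~$3.90 with these placeholder prices
print(f"Premium planning, budget execution: ${split:.2f}")      # ~$1.17
```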

Takeaways

  • Use one stronger model for planning and one cheaper model for execution instead of one model for everything.
  • The chart is updated daily—use it to check cost vs performance without tab-hopping.
  • Pick planning from the upper-right of the chart, execution from the lower half, and switch as you move from planning to coding.
