AI · Productivity · Cost-Optimization

Strategic Model Selection in Cursor: Balancing Cost and Performance

· Reading time: 3 min

Hopping back and forth between Cursor’s pricing page and a benchmark leaderboard is tedious, and defaulting to “the best” model for everything can lead to surprisingly high bills. A simpler approach: one stronger model for planning, one cheaper model for execution.

The two-model strategy

Planning (understanding the task, designing steps) benefits from strong reasoning—e.g. Claude Sonnet/Opus. You send context and get back a plan and a few key decisions; token volume is modest, so the extra cost is often worth it.

Execution (implementing the plan, writing code) can be done well by cheaper models like Gemini Flash or GPT-5 Mini when the plan is clear. This phase uses many more tokens, so keeping cost per token low matters.

  1. Use a premium model for planning: start the task, get a clear plan, maybe one or two critical edits.
  2. Switch to a cheaper model for execution: implement the rest, iterate, run tests.

You avoid both the “everything on the best model” bill and the “everything on the cheapest” quality hit.

Why it works

Planning is input-heavy (lots of context in, compact plan out); execution is output-heavy (lots of code generated). Benchmarks like BigCodeBench and Arena Code show that mid-tier models are close to the top on code tasks at a fraction of the cost. So: strong reasoning where it matters, lower cost where most tokens are spent.

Cost vs performance at a glance

The chart below plots cost (weighted $/1M tokens: 70% input, 30% output) against benchmark performance. The data comes from Cursor’s pricing and public benchmarks, and an automated workflow refreshes it daily.
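The weighted figure on the cost axis is easy to reproduce for any model. Here is a minimal sketch of the 70/30 blend; the prices passed in are placeholders, not actual Cursor rates:

```python
def weighted_cost(input_price: float, output_price: float) -> float:
    """Blend $/1M-token prices the way the chart does: 70% input, 30% output."""
    return 0.7 * input_price + 0.3 * output_price

# Placeholder prices in $/1M tokens -- substitute current rates from Cursor's pricing page.
print(weighted_cost(input_price=3.00, output_price=15.00))  # premium-tier example -> about 6.6
print(weighted_cost(input_price=0.10, output_price=0.40))   # budget-tier example  -> about 0.19
```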

[Interactive chart: weighted cost per 1M tokens vs. benchmark performance for models available in Cursor]

How to read it: Lower left = cheaper/weaker, upper right = pricier/stronger. Pick a planning model from the upper-right (e.g. Claude 4.5 Sonnet, GPT-5.2) and an execution model from the lower half (e.g. Gemini 3 Flash, GPT-5 Mini).

  • Planning: Claude 4.5 Sonnet/Opus or GPT-5.2 / GPT-5.2 Codex.
  • Execution: Gemini 3 Flash, Gemini 2.5 Flash, or GPT-5 Mini.

Use a slightly stronger execution model for tricky files, or a cheaper planner for simple tasks.

Example workflow

  1. Start with the planning model. Describe the goal, attach files, ask for a step-by-step plan.
  2. Lock in the plan. Review, maybe one short follow-up, then switch model.
  3. Switch to the execution model. Refer to the plan and implement step by step; do most coding here.
  4. Use the planning model only when needed. For design decisions or subtle bugs, switch back briefly, then return to the cheaper model.

Rough cost intuition

With roughly 100k input + 50k output tokens for planning and 200k input + 150k output tokens for execution, running everything on a premium model (e.g. Claude 4.5 Sonnet) can come to several dollars per session, while planning on the premium model and executing on Gemini 3 Flash lands around a dollar or less. Moving the high-token phase to a cheaper model cuts the cost substantially.
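As a sanity check on those numbers, here is a back-of-the-envelope calculation using the token counts above. The per-token prices are placeholders; swap in the current rates from the chart:

```python
# Placeholder $/1M-token prices -- check Cursor's pricing page for current rates.
PREMIUM = {"input": 3.00, "output": 15.00}  # e.g. a Claude Sonnet-class model
BUDGET = {"input": 0.10, "output": 0.40}    # e.g. a Gemini Flash-class model

def phase_cost(prices, input_tokens, output_tokens):
    """Dollar cost of one phase, given token counts and $/1M-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

planning = {"input_tokens": 100_000, "output_tokens": 50_000}    # context in, plan out
execution = {"input_tokens": 200_000, "output_tokens": 150_000}  # plan in, code out

all_premium = phase_cost(PREMIUM, **planning) + phase_cost(PREMIUM, **execution)
split = phase_cost(PREMIUM, **planning) + phase_cost(BUDGET, **execution)

print(f"Everything on the premium model: ${all_premium:.2f}")   # ~$3.90 with these placeholder prices
print(f"Premium planning, budget execution: ${split:.2f}")      # ~$1.17
```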

Takeaways

  • Use one stronger model for planning and one cheaper model for execution instead of one model for everything.
  • The chart is updated daily—use it to check cost vs performance without tab-hopping.
  • Pick planning from the upper-right of the chart, execution from the lower half, and switch as you move from planning to coding.
