Claude Opus 4.6 vs. 4.7: The Upgrade That Isn’t Free

When Anthropic announced Claude Opus 4.7 on April 16, 2026, just 70 days after Opus 4.6 shipped, the headline seemed almost too good to be true.

The same $5/$25 per million token pricing. The same 1 million token context window. But better performance, sharper vision, and stronger coding abilities.

As it turns out, the fine print matters. Opus 4.7 is a genuinely better model for complex, agentic tasks. But Anthropic quietly introduced something developers are already calling “token inflation,” and it’s changing the cost calculus for anyone running these models in production.

Let’s break down what actually changed between Opus 4.6 and 4.7, and what it means for your budget and workflows.

The Quick Summary

Feature	Claude Opus 4.6	Claude Opus 4.7
Release Date	February 5, 2026	April 16, 2026
Sticker Price	$5 / $25 per MTok	$5 / $25 per MTok
Context Window	1M tokens	1M tokens
Max Output	128k tokens	128k tokens
Tokenizer	Previous version	Updated (1.0–1.35× as many tokens)
Vision Resolution	1,568px / 1.15MP	2,576px / 3.75MP
Effort Levels	low, medium, high, max	low, medium, high, xhigh, max
temperature/top_p/top_k	Supported	Removed (returns 400 error)
Thinking Mode	Enabled with budget tokens	Adaptive only, off by default

What Got Better: The Real Improvements

1. Software Engineering Gains

Opus 4.7 shows substantial improvements on real-world coding benchmarks, making it Anthropic’s most capable model for autonomous development work.

Benchmark	Opus 4.6	Opus 4.7	Improvement
SWE-bench Verified	80.8%	87.6%	+6.8 points
SWE-bench Pro	53.4%	64.3%	+10.9 points
CursorBench	58%	70%	+12 points
Rakuten-SWE-Bench	Baseline	3× as many tasks resolved	200% increase

The CursorBench jump from 58% to 70% is particularly meaningful because it measures a model’s ability to perform autonomous multi-file edits inside an IDE.

For teams building AI coding agents, this is the difference between a model that needs constant supervision and one that can actually ship work.

2. Vision: A Transformative Upgrade

Opus 4.7 can now accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times the resolution of Opus 4.6.

The real-world impact shows up in benchmark results. XBOW, which builds autonomous penetration testing tools, reported their visual acuity benchmark jumped from 54.5% on Opus 4.6 to 98.5% on Opus 4.7.

That’s not incremental. It’s the difference between a model that can’t reliably read dense UI screenshots and one that can.
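Higher resolution also means more image tokens, so it can be worth capping image dimensions before upload. Below is a minimal sketch of the dimension arithmetic only, assuming the 2,576-pixel long-edge limit quoted above. `fit_long_edge` is a hypothetical helper, not part of any SDK, and the actual resampling would be done by whichever image library you already use.

```python
def fit_long_edge(width: int, height: int, max_edge: int = 2576) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_edge.

    Images already within the limit are returned unchanged; the aspect ratio
    is preserved up to rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    # Round to whole pixels, never collapsing a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Passing a smaller `max_edge` (for example the 1,568-pixel figure listed for Opus 4.6) is one way to trade visual detail for a lower token count.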

3. Instruction Following: Literal vs. Loose

Where Opus 4.6 sometimes interpreted instructions loosely or skipped parts of complex requests, Opus 4.7 takes instructions literally and completely.

It also verifies its own outputs before reporting back, reducing those “I’ve implemented the change” replies that turn out to be wrong at review time.

Migration warning: If your prompts were tuned for Opus 4.6’s looser behavior, you’ll likely need to re-tune them. The model’s added precision means existing workflows may produce unexpected results until they are adjusted.

4. New `xhigh` Effort Level

Opus 4.7 introduces a new effort level called xhigh, positioned between high and max. Claude Code’s default effort was raised to xhigh for all plans on release day.

For most coding and agentic tasks, Anthropic recommends starting with high or xhigh. The max level has diminishing returns and can lead to overthinking.

The Token Inflation Problem

Here’s where things get complicated.

What Changed

Opus 4.7 uses an updated tokenizer that splits text differently than the one in Opus 4.6. The same input text now maps to 1.0 to 1.35× as many tokens, depending on content type. For dense code or system prompts, the multiplier can be even higher.

Simon Willison ran the Opus 4.7 system prompt through both tokenizers and found the 4.7 version used 7,335 tokens vs. 5,039 on 4.6, a 1.46× multiplier.

What This Means for Your Bill

The sticker price hasn’t changed: $5 per million input tokens, $25 per million output. But your effective cost per task can rise significantly:

Text-heavy prompts: Up to 35% more expensive
Dense code prompts: Closer to 35–46% more expensive
High-resolution images: Up to 3× more tokens (though you can downsample to control costs)

User-compiled data from the Tokenomics tool shows the average token increase across real-world prompts is around 38.6%.

Output Tokens: The Double Hit

Output tokens cost five times as much as input tokens ($25 vs. $5 per million). Opus 4.7 also “thinks more” before responding, especially at higher effort levels, so it generates more output tokens on top of the input token inflation.
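To see how the two effects combine, here is a rough sketch of the per-task arithmetic at the list price. The task size (10,000 input and 2,000 output tokens) and the 1.2× output growth are assumed figures chosen for illustration, not measurements; only the prices and the 1.35× input multiplier come from the numbers quoted above.

```python
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (both models)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (both models)


def task_cost(input_tokens: float, output_tokens: float) -> float:
    """USD cost of one request at the $5 / $25 per-MTok list price."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def cost_increase(input_tokens: float, output_tokens: float,
                  input_multiplier: float, output_multiplier: float) -> float:
    """Fractional per-task cost increase when token counts grow by the multipliers."""
    before = task_cost(input_tokens, output_tokens)
    after = task_cost(input_tokens * input_multiplier,
                      output_tokens * output_multiplier)
    return after / before - 1


# Hypothetical task measured on Opus 4.6: 10,000 input and 2,000 output tokens.
# 1.35x is the top of the quoted tokenizer range; 1.2x output growth is an
# assumption for illustration only.
baseline = task_cost(10_000, 2_000)
increase = cost_increase(10_000, 2_000, input_multiplier=1.35, output_multiplier=1.2)
print(f"baseline ${baseline:.4f}, increase {increase:.1%}")
```

In this example input and output each account for half of the baseline cost, so the overall rise lands midway between the two multipliers. Output-heavy tasks will be dominated by the output multiplier instead.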

Breaking Changes: What Stops Working

If you’re migrating from Opus 4.6 to 4.7, these changes will break existing code unless updated:

1. Extended Thinking Payloads

Opus 4.6 format:

python

thinking={"type": "enabled", "budget_tokens": 10000}

Opus 4.7 format:

python

thinking={"type": "adaptive", "effort": "high"}
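If many call sites build the old payload, a small translation helper can keep the change in one place. This is a sketch under assumptions: the payload shapes are the ones shown in this article and should be checked against the current API reference, `migrate_thinking` is a hypothetical name, and replacing a token budget with a caller-chosen effort level is an illustrative policy rather than an official mapping.

```python
ALLOWED_EFFORT = {"low", "medium", "high", "xhigh", "max"}


def migrate_thinking(old: dict, effort: str = "high") -> dict:
    """Translate a 4.6-style thinking payload into the 4.7-style shape shown above.

    budget_tokens has no direct equivalent in the new shape, so it is dropped
    and the caller picks an effort level instead.
    """
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    kind = old.get("type")
    if kind == "adaptive":
        return dict(old)  # already in the new shape; return a copy
    if kind != "enabled":
        raise ValueError(f"unexpected thinking type: {kind!r}")
    return {"type": "adaptive", "effort": effort}
```

For example, `migrate_thinking({"type": "enabled", "budget_tokens": 10000}, effort="xhigh")` yields an adaptive payload at the xhigh level.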

2. Sampling Parameters Removed

Setting temperature, top_p, or top_k to any non-default value now returns a 400 error. Remove these parameters entirely and use prompting to guide behavior instead.
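One way to apply this without hunting through every call site is to filter request arguments just before they are sent. The sketch below assumes your requests are assembled as plain dictionaries of keyword arguments; `strip_sampling_params` is a hypothetical helper, and it reports what it removed so you can log the affected callers.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def strip_sampling_params(request: dict) -> tuple[dict, list[str]]:
    """Return a copy of request without sampling parameters, plus the names removed.

    The input dictionary is left untouched.
    """
    removed = [name for name in SAMPLING_PARAMS if name in request]
    cleaned = {key: value for key, value in request.items()
               if key not in SAMPLING_PARAMS}
    return cleaned, removed
```

Logging the `removed` list during migration makes it easier to find the prompts that depended on a low temperature and may need firmer wording instead.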

3. Thinking Content Hidden by Default

Opus 4.7 still performs chain-of-thought reasoning, but the visible text is omitted unless you explicitly opt in:

python

thinking={"type": "adaptive", "effort": "high", "display": "summarized"}

Benchmark Comparison: Full Table

Benchmark	Opus 4.6	Opus 4.7	Notes
SWE-bench Verified	80.8%	87.6%	+6.8 points
SWE-bench Pro	53.4%	64.3%	+10.9 points
Terminal-Bench 2.0	65.4%	69.4%	+4 points
CursorBench	58%	70%	+12 points
MCP-Atlas (tool use)	75.8%	77.3%	+1.5 points
OSWorld-Verified	72.7%	78.0%	+5.3 points
Finance Agent v1.1	60.1%	64.4%	+4.3 points
GPQA Diamond	91.3%	94.2%	+2.9 points
CharXiv Reasoning (vision)	84.7%	91.0%	+6.3 points

Sources: Anthropic, Cursor, Rakuten, Harvey, Databricks

System Prompt Changes

Anthropic publishes their Claude.ai system prompts, and the diff between 4.6 and 4.7 reveals some interesting shifts:

Added:

Claude in Chrome (browsing agent), Claude in Excel, Claude in PowerPoint
Expanded child safety section with critical instruction tags
Tool search mechanism: models now call tool search before claiming they lack a capability
Guidance to be less verbose and more concise

Removed:

The explicit note that “Donald Trump is the current president” (the 4.7 model’s knowledge cut-off is January 2026, making this unnecessary)
Instructions to avoid saying “genuinely,” “honestly,” or “straightforward”
The section about avoiding emotes or asterisk actions

Migration Strategy: How to Move to 4.7

Step 1: Measure Your Actual Token Inflation

Don’t rely on the 1.0–1.35× range. Run a representative sample of your actual production prompts through both tokenizers to calculate your real multiplier.
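Once you have token counts for the same prompts under both models (for instance from a token-counting endpoint or from logged usage), the multiplier is simple to summarize. A minimal sketch, assuming the counts are already collected as `(old_count, new_count)` pairs; `inflation_report` is a hypothetical helper and the sample numbers in the usage note are invented.

```python
from statistics import median


def inflation_report(pairs: list[tuple[int, int]]) -> dict:
    """Summarize token inflation from (old_count, new_count) pairs, one per prompt."""
    if not pairs:
        raise ValueError("need at least one prompt")
    if any(old <= 0 for old, _ in pairs):
        raise ValueError("old token counts must be positive")
    ratios = [new / old for old, new in pairs]
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return {
        # Weighted by prompt size, so it tracks the change in billed tokens.
        "overall": total_new / total_old,
        # Per-prompt view, useful for spotting outliers such as dense code.
        "median": median(ratios),
        "min": min(ratios),
        "max": max(ratios),
    }
```

For the invented sample `[(1000, 1100), (2000, 2600), (500, 500)]`, the overall multiplier is 4200 / 3500 = 1.2, while individual prompts range from 1.0 to 1.3. The overall figure is the one to plug into budget forecasts.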

Step 2: Update API Calls

Remove temperature, top_p, and top_k parameters
Update thinking payloads to the new adaptive format
Explicitly enable thinking display if your product shows reasoning traces

Step 3: Re-tune Your Prompts

Opus 4.7 takes instructions literally. If your prompts relied on the model “filling in the blanks,” add more explicit guidance.

Step 4: Start with Staged Rollout

Swap a small percentage of coding traffic to claude-opus-4-7, re-run your eval suite, measure token deltas alongside quality metrics, then promote gradually.
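A percentage-based split is easiest to reason about when it is deterministic, so the same request or user always lands on the same model. The sketch below hashes an identifier into a bucket. `pick_model` is a hypothetical helper, and `claude-opus-4-6` is an assumed model ID for Opus 4.6 that you should confirm before use.

```python
import hashlib

OLD_MODEL = "claude-opus-4-6"  # assumed ID for Opus 4.6; confirm before use
NEW_MODEL = "claude-opus-4-7"


def pick_model(request_id: str, rollout_percent: float) -> str:
    """Route a stable rollout_percent of request IDs to the new model."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return NEW_MODEL if bucket < rollout_percent * 100 else OLD_MODEL
```

Because the bucket depends only on the ID, raising the percentage keeps everyone already on the new model there and only adds new traffic, which keeps before/after comparisons clean.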

Step 5: Consider Keeping Opus 4.6 as a Fallback

Given the breaking API changes, decouple your application logic from specific model versions so you can switch between 4.6 and 4.7 with a single parameter change.
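One possible shape for that decoupling is a small table of per-model profiles, so the version-specific details live in data rather than in application code. This is a sketch under assumptions: the thinking payloads are the ones shown in this article, `claude-opus-4-6` is an assumed model ID, and `ModelProfile`, `PROFILES`, and `build_request` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelProfile:
    """Everything that differs between model versions for this application."""
    model: str
    thinking: dict = field(default_factory=dict)


PROFILES = {
    # Model ID for 4.6 is an assumption; payload shapes follow this article.
    "opus-4.6": ModelProfile("claude-opus-4-6",
                             {"type": "enabled", "budget_tokens": 10000}),
    "opus-4.7": ModelProfile("claude-opus-4-7",
                             {"type": "adaptive", "effort": "high"}),
}


def build_request(profile_name: str, messages: list, max_tokens: int = 16000) -> dict:
    """Assemble request arguments for the chosen profile."""
    profile = PROFILES[profile_name]
    return {
        "model": profile.model,
        "max_tokens": max_tokens,
        "messages": messages,
        "thinking": dict(profile.thinking),  # copy so callers cannot mutate the profile
    }
```

Switching models is then a matter of changing the profile name, for example from an environment variable, and falling back to 4.6 does not require touching the call sites.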

Which Model Should You Use?

Use Opus 4.7 if:

You’re building autonomous coding agents (the SWE-bench gains are real).
You need high-resolution image understanding (UI screenshots, diagrams, dense dashboards).
Your prompts are already well-structured and you want more literal instruction following.
You can absorb a 20–40% effective cost increase for better quality.

Stick with Opus 4.6 if:

You have tight token budgets that can’t accommodate 30%+ inflation.
Your prompts rely on loose interpretation (you haven’t re-tuned for 4.7’s literalness).
You need temperature or other sampling parameters.
Your use case doesn’t need the vision or coding improvements (e.g., simple chat, basic document Q&A).

The Bottom Line

Opus 4.7 is a genuinely better model, especially for software engineering, vision tasks, and long-running agentic workflows. The 87.6% on SWE-bench Verified and the 3× vision resolution upgrade are meaningful, not marketing hype.

But “same price” is misleading. Between the tokenizer inflation (up to 35% more input tokens) and the model’s tendency to “think more” (more output tokens), your effective cost per task could rise 30–50% in practice.

Anthropic has delivered better capability at the same per-token price, but completing any given task now requires more tokens. Your decision hinges on whether the quality gains justify the higher per-task expense.

For teams building production coding agents, the answer is likely yes: the 12-point gain on CursorBench and the 3× production task resolution on Rakuten-SWE-Bench justify the cost. For simpler workloads or teams on tight budgets, Opus 4.6 remains a perfectly capable option.

Kevin James
