Claude Opus 4.7: What's New and How It Compares to Opus 4.6
4/17/2026

Anthropic just released Claude Opus 4.7 — their latest and most capable generally available model. If you've been using Opus 4.6 for coding, research, or building AI-powered products, here's everything that changed and what the new capabilities actually mean in practice.
The Key Specs at a Glance
| Spec | Opus 4.7 | Opus 4.6 |
|---|---|---|
| Pricing | $5 / M input, $25 / M output | $5 / M input, $25 / M output |
| Context window | 1M tokens (~555K words) | 200K tokens |
| Max output | 128K tokens | 64K tokens |
| Knowledge cutoff | January 2026 | August 2025 |
| Thinking mode | Adaptive Thinking | Extended Thinking |
| API model ID | claude-opus-4-7 | claude-opus-4-6-20260205 |
| Availability | API, Bedrock, Vertex AI, Foundry | API, Bedrock, Vertex AI |
Same price, bigger context, double the output length, and five months of fresher knowledge. On paper, it's a straightforward upgrade. Let's dig into what actually improved under the hood.
1. Agentic Coding: The Headline Improvement
This is where Opus 4.7 shines brightest. Anthropic describes it as "a notable improvement in advanced software engineering, with particular gains on the most difficult tasks."
What does that mean concretely? Three things:
Self-verification. Opus 4.7 doesn't just write code and hand it back — it devises ways to verify its own outputs before reporting completion. If you've ever had an AI agent say "done!" when the code doesn't actually compile, you know why this matters.
Long-running task consistency. The model handles complex, multi-step tasks "with rigor and consistency." Previous models tended to lose coherence on longer sessions. Opus 4.7 stays on track.
Strict instruction following. It pays "precise attention to instructions" — meaning fewer cases where the model ignores your constraints or goes off on tangents.
The Benchmark Numbers
The performance gains aren't marginal. Across real-world coding evaluations run by early-access partners, Opus 4.7 shows double-digit improvements and solves problems that were previously out of reach:
- CursorBench: 70% resolution (vs Opus 4.6 at 58%) — a 12-point jump. Cursor calls it "a meaningful jump in capabilities, particularly for its autonomy and more creative reasoning."
- Augment's 93-task coding benchmark: +13% resolution over Opus 4.6, including 4 tasks that neither Opus 4.6 nor Sonnet 4.6 could solve, combined with faster median latency and strict instruction following.
- Notion Agent: +14% over Opus 4.6 with fewer tokens and a third of the tool errors. "The first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold."
- Rakuten-SWE-Bench: 3x more production tasks resolved than Opus 4.6, with double-digit gains in Code Quality and Test Quality.
- Warp Terminal Bench: Passed tasks that prior Claude models had failed, including a tricky concurrency bug Opus 4.6 couldn't crack.
- CodeRabbit code review: Recall improved by over 10%, surfacing hard-to-detect bugs in complex PRs while precision remained stable. "A bit faster than GPT-5.4 xhigh on our harness."
- Genspark Super Agent: Highest quality-per-tool-call ratio measured, with the best loop resistance (a model that loops indefinitely on even 1 in 18 queries wastes compute and blocks users), the lowest variance, and the most graceful error recovery.
These aren't synthetic benchmarks — they're production workloads from companies shipping real products. The pattern is consistent: Opus 4.7 does more work, makes fewer mistakes, and recovers better when things go wrong.
2. Vision: Higher Resolution Image Understanding
Opus 4.7 has "substantially better vision" with higher resolution image support. This isn't just about seeing pictures more clearly — it opens up practical use cases:
- Solve Intelligence reports "major improvements in multimodal understanding, from reading chemical structures to interpreting complex technical diagrams." They're using it for life sciences patent workflows including drafting, prosecution, infringement detection, and invalidity charting.
- For developers building tools that process screenshots, diagrams, or UI mockups, the higher resolution means fewer misread labels, better layout understanding, and more accurate OCR-like capabilities.
3. Creative and Professional Output
Anthropic says Opus 4.7 is "more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs."
The most enthusiastic endorsement comes from a tester who called it "the best model in the world for building dashboards and data-rich interfaces. The design taste is genuinely surprising — it makes choices I'd actually ship. It's my default daily driver now."
If you use Claude for generating UI components, slide decks, or document layouts, this is a meaningful quality-of-life improvement.
4. Adaptive Thinking (Replaces Extended Thinking)
Opus 4.6 used Extended Thinking — a mode where the model explicitly shows its reasoning chain. Opus 4.7 switches to Adaptive Thinking, which adjusts reasoning depth based on task complexity automatically.
The practical difference: you don't need to manually toggle thinking modes. The model decides how much reasoning a task needs and allocates accordingly. Simple questions get fast answers; complex problems get deeper analysis.
Note: Sonnet 4.6 still supports Extended Thinking. If you specifically need visible reasoning chains, Sonnet remains the model to reach for.
5. Context Window: 5x Bigger, New Tokenizer
The jump from 200K to 1M tokens is massive on paper. That's roughly 555,000 words — enough to fit entire codebases, long document collections, or extended conversation histories.
However, there's an important detail: Opus 4.7 uses a new tokenizer, and the same text produces more tokens than it did with Opus 4.6's tokenizer. Anthropic notes the 1M window corresponds to approximately 555K words, compared to the typical ~750K words per million tokens with the old tokenizer. In practice, a prompt that cost you 1,000 tokens with Opus 4.6 now costs around 1,350 tokens with Opus 4.7. The per-token price hasn't changed, but your effective cost per conversation goes up by roughly 35%. Worth factoring into your budget if you're a heavy API user.
What this means in practice:
- Your prompts will consume more tokens than before
- The effective "text capacity" of the 1M window is roughly equivalent to ~740K tokens on the old tokenizer
- Still a significant upgrade from Opus 4.6's 200K, but worth being aware of for cost estimation
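To make the tokenizer change concrete, here's a minimal sketch that converts an old token count into a rough new one. It assumes the article's stated capacities (~750K words per million tokens on the old tokenizer, ~555K on the new one) apply uniformly to your text, which real tokenizers won't do exactly:

```python
# Sketch: estimating token-count inflation under the new tokenizer.
# The words-per-million-token figures are taken from this article's
# stated capacities and treated as a uniform ratio (an assumption).

OLD_WORDS_PER_MTOK = 750_000  # ~words held by 1M tokens, old tokenizer
NEW_WORDS_PER_MTOK = 555_000  # ~words held by 1M tokens, new tokenizer

def estimate_new_tokens(old_tokens: int) -> int:
    """Rough count of tokens the same text needs under the new
    tokenizer, given its count under the old one."""
    return round(old_tokens * OLD_WORDS_PER_MTOK / NEW_WORDS_PER_MTOK)

# A 1,000-token Opus 4.6 prompt lands around 1,350 tokens on Opus 4.7.
print(estimate_new_tokens(1_000))  # → 1351
```

For precise numbers, count tokens with each model's actual tokenizer (for example via the API's token-counting endpoint) rather than a word-ratio estimate.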
6. Max Output: Doubled to 128K
Opus 4.6 capped output at 64K tokens. Opus 4.7 doubles that to 128K. This matters for:
- Generating long documents or reports in a single pass
- Complex code generation that spans multiple files
- Detailed analysis tasks where the model previously had to truncate its response
For agentic workflows where the model needs to produce extensive diffs or multi-file changes, 128K output is a practical improvement.
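As a quick sanity check on what the doubled cap buys you, here's a small sketch estimating how many generate-and-continue rounds a long output would need under each cap (caps taken from this section; ceiling division gives the pass count):

```python
# Sketch: how many generation passes a long output needs under each
# model's max-output cap (figures from the article).

OPUS_46_MAX_OUTPUT = 64_000   # tokens
OPUS_47_MAX_OUTPUT = 128_000  # tokens

def passes_needed(output_tokens: int, cap: int) -> int:
    """Number of generate-and-continue rounds to emit output_tokens,
    ignoring any per-pass overlap or re-prompting overhead."""
    return -(-output_tokens // cap)  # ceiling division

# A ~100K-token report fits in one Opus 4.7 pass but needs two on 4.6.
print(passes_needed(100_000, OPUS_47_MAX_OUTPUT),
      passes_needed(100_000, OPUS_46_MAX_OUTPUT))  # → 1 2
```

Each extra pass means another round trip and another chance for the model to drift, which is why a single 128K pass is more than a convenience for multi-file diffs.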
7. Project Glasswing and Cyber Safeguards
Opus 4.7 is the first model released under Anthropic's Project Glasswing framework. Last week, Anthropic highlighted both the risks and benefits of AI models for cybersecurity, and committed to testing new safeguards on less capable models before broadly releasing their most powerful model, Claude Mythos Preview.
What this means for Opus 4.7:
- Reduced cyber capabilities: During training, Anthropic "experimented with efforts to differentially reduce" cybersecurity capabilities compared to Mythos Preview.
- Automatic safeguards: The model includes built-in detection that blocks requests indicating "prohibited or high-risk cybersecurity uses."
- Cyber Verification Program: Security professionals doing legitimate work (vulnerability research, pentesting, red-teaming) can apply for access through the Cyber Verification Program.
This is Anthropic's first real-world test of differential capability controls — intentionally making a model less capable in specific domains while improving it in others. What they learn from Opus 4.7's deployment will shape how (and when) they release Mythos-class models more broadly.
8. Availability and Integration
Opus 4.7 is available across all major platforms from day one:
- Claude API — direct access via claude-opus-4-7
- Amazon Bedrock — anthropic.claude-opus-4-7 (research preview)
- Google Cloud Vertex AI — claude-opus-4-7
- Microsoft Foundry — new platform addition
The addition of Microsoft Foundry is notable — it's the first time a Claude Opus model has been available on Microsoft's platform at launch.
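Since the model IDs differ slightly by platform, a small sketch can make the mapping concrete. The model IDs come from the availability list above; the payload shape (model / max_tokens / messages) is assumed from the Anthropic Messages API, and build_request is a hypothetical helper, not part of any SDK:

```python
# Sketch: assembling a Messages-style request payload per platform.
# Model IDs are from the availability list above; the payload shape is
# an assumption modeled on the Anthropic Messages API — check the
# current docs before relying on it.

def build_request(platform: str, prompt: str, max_tokens: int = 4096) -> dict:
    """Hypothetical helper: map a platform name to its Opus 4.7 model ID
    and wrap the prompt in a minimal request payload."""
    model_ids = {
        "api": "claude-opus-4-7",                # Claude API
        "bedrock": "anthropic.claude-opus-4-7",  # Amazon Bedrock (research preview)
        "vertex": "claude-opus-4-7",             # Google Cloud Vertex AI
    }
    return {
        "model": model_ids[platform],
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

print(build_request("bedrock", "Review this PR")["model"])
# → anthropic.claude-opus-4-7
```

On Bedrock and Vertex AI the surrounding request envelope differs (region, anthropic_version, and so on), so treat this as the shared core of the payload rather than a complete client.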
What the Early Testers Say
Beyond the benchmark numbers, the qualitative feedback from enterprise testers reveals consistent themes:
On reliability:
- Hex: "The strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for."
- Devin: "Takes long-horizon autonomy to a new level. It works coherently for hours, pushes through hard problems rather than giving up."
On efficiency:
- Replit: "An easy upgrade decision. Same quality at lower cost — more efficient and precise at tasks like analyzing logs and traces, finding bugs, and proposing fixes."
- Hex: "Low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." — meaning comparable output quality at a lower, cheaper effort setting.
On reasoning:
- Harvey (legal AI): "90.9% substantive accuracy on BigLaw Bench at high effort with better reasoning calibration. It correctly distinguishes assignment provisions from change-of-control provisions, a task that has historically challenged frontier models."
- Quantium: "The biggest gains showed up where they matter most: reasoning depth, structured problem-framing, and complex technical work."
On personality:
- Replit: "I love how it pushes back during technical discussions to help me make better decisions. It really feels like a better coworker."
- Anthropic's own description: The model brings "a more opinionated perspective, rather than simply agreeing with the user."
9. Who's Already Using It — And What They're Building
The early-access tester list reads like a who's-who of AI-powered developer tools. Here's a quick look at how different companies are putting Opus 4.7 to work:
Coding agents and IDEs: Cursor, Replit, Warp, and Devin are all integrating Opus 4.7 as their primary or top-tier model for autonomous coding tasks. Devin specifically highlights "long-horizon autonomy" — the model works coherently for hours on deep investigation work that wasn't reliably possible before.
Code review: CodeRabbit is lining up Opus 4.7 for their "heaviest review work at launch," citing 10%+ recall improvement on hard-to-detect bugs in complex pull requests.
Enterprise AI platforms: Hebbia saw double-digit jumps in tool call accuracy and planning for orchestrator agents handling retrieval, slide creation, and document generation. Genspark reports the highest quality-per-tool-call ratio they've measured across any model.
Legal and finance: Harvey reports 90.9% substantive accuracy on BigLaw Bench. Hex calls it "the strongest model Hex has evaluated" — it correctly reports missing data instead of hallucinating plausible fallbacks, and resists data traps that even Opus 4.6 fell for. A fintech tester describes it as catching "its own logical faults during the planning phase."
Life sciences: Solve Intelligence is using the improved vision capabilities for patent workflows — reading chemical structures, interpreting technical diagrams, and handling everything from drafting to infringement detection.
Data visualization: One tester called it "the best model in the world for building dashboards and data-rich interfaces," noting that "the design taste is genuinely surprising — it makes choices I'd actually ship."
The breadth of adoption is notable. This isn't just a coding model — it's being deployed across legal, finance, life sciences, and enterprise automation. The common thread: tasks that require sustained reasoning, precise tool use, and reliable output over long sessions.
Opus 4.7 vs Opus 4.6: Summary
| Capability | Opus 4.6 | Opus 4.7 | Change |
|---|---|---|---|
| Agentic coding | Strong | Significantly stronger | +12-14% on major benchmarks |
| Self-verification | Limited | Built-in | New capability |
| Vision | Standard | Higher resolution | Substantial improvement |
| Creative output | Good | "More tasteful" | Quality improvement |
| Context window | 200K | 1M | 5x larger |
| Max output | 64K | 128K | 2x larger |
| Thinking mode | Extended | Adaptive | Auto-adjusting depth |
| Knowledge cutoff | Aug 2025 | Jan 2026 | 5 months fresher |
| Tool error recovery | Stops on failure | Pushes through | Major reliability gain |
| Cyber safeguards | None | Project Glasswing | New safety framework |
| Pricing | $5/$25 per M tokens | $5/$25 per M tokens | Unchanged |
Bottom Line
Claude Opus 4.7 is a focused upgrade that doubles down on what Opus was already good at — complex, autonomous coding work — while adding meaningful improvements to vision, output length, and context capacity.
The biggest wins are in agentic reliability: self-verification, tool error recovery, and long-running task consistency. If you're building AI-powered development tools or using Claude Code for daily coding work, these improvements translate directly into fewer failed tasks and less babysitting.
The new tokenizer and Project Glasswing cyber safeguards are worth understanding, as they affect both cost calculations and the model's behavior on security-adjacent tasks.
For developers already on Opus 4.6, the upgrade path is simple — swap claude-opus-4-6 for claude-opus-4-7 in your API calls. Same price, more capability.
Links:
- Anthropic announcement: anthropic.com/research/claude-opus-4-7
- API docs: platform.claude.com/docs
- Project Glasswing: anthropic.com/glasswing
- Cyber Verification Program: claude.com/form/cyber-use-case