Claude Opus 4.7: What's New and How It Compares to Opus 4.6
4/17/2026

Anthropic just released Claude Opus 4.7 — their latest and most capable generally available model. If you've been using Opus 4.6 for coding, research, or building AI-powered products, here's everything that changed and what the new capabilities actually mean in practice.
The Key Specs at a Glance
| Spec | Opus 4.7 | Opus 4.6 |
|---|---|---|
| Pricing | $5 / M input, $25 / M output | $5 / M input, $25 / M output |
| Context window | 1M tokens (~555K words) | 200K tokens |
| Max output | 128K tokens | 64K tokens |
| Knowledge cutoff | January 2026 | August 2025 |
| Thinking mode | Adaptive Thinking | Extended Thinking |
| API model ID | claude-opus-4-7 | claude-opus-4-6-20260205 |
| Availability | API, Bedrock, Vertex AI, Foundry | API, Bedrock, Vertex AI |
Same price, bigger context, double the output length, and five months of fresher knowledge. On paper, it's a straightforward upgrade. Let's dig into what actually improved under the hood.
1. Agentic Coding: The Headline Improvement
This is where Opus 4.7 shines brightest. Anthropic describes it as "a notable improvement in advanced software engineering, with particular gains on the most difficult tasks."
What does that mean concretely? Three things:
Self-verification. Opus 4.7 doesn't just write code and hand it back — it devises ways to verify its own outputs before reporting completion. If you've ever had an AI agent say "done!" when the code doesn't actually compile, you know why this matters.
Long-running task consistency. The model handles complex, multi-step tasks "with rigor and consistency." Previous models tended to lose coherence on longer sessions. Opus 4.7 stays on track.
Strict instruction following. It pays "precise attention to instructions" — meaning fewer cases where the model ignores your constraints or goes off on tangents.
The Benchmark Numbers
The performance gains aren't marginal. Across real-world coding evaluations run by early-access partners, Opus 4.7 shows double-digit improvements and solves problems that were previously out of reach:
- CursorBench: 70% resolution (vs Opus 4.6 at 58%) — a 12-point jump. Cursor calls it "a meaningful jump in capabilities, particularly for its autonomy and more creative reasoning."
- Augment's 93-task coding benchmark: +13% resolution over Opus 4.6, including 4 tasks that neither Opus 4.6 nor Sonnet 4.6 could solve, combined with faster median latency and strict instruction following.
- Notion Agent: +14% over Opus 4.6 with fewer tokens and a third of the tool errors. "The first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold."
- Rakuten-SWE-Bench: 3x more production tasks resolved than Opus 4.6, with double-digit gains in Code Quality and Test Quality.
- Warp Terminal Bench: Passed tasks that prior Claude models had failed, including a tricky concurrency bug Opus 4.6 couldn't crack.
- CodeRabbit code review: Recall improved by over 10%, surfacing hard-to-detect bugs in complex PRs while precision remained stable. "A bit faster than GPT-5.4 xhigh on our harness."
- Genspark Super Agent: Highest quality-per-tool-call ratio measured, with the best loop resistance (a model that loops indefinitely on even 1 in 18 queries wastes compute and blocks users), the lowest variance, and the most graceful error recovery.
These aren't synthetic benchmarks — they're production workloads from companies shipping real products. The pattern is consistent: Opus 4.7 does more work, makes fewer mistakes, and recovers better when things go wrong.
2. Vision: Higher Resolution Image Understanding
Opus 4.7 has "substantially better vision" with higher resolution image support. This isn't just about seeing pictures more clearly — it opens up practical use cases:
- Solve Intelligence reports "major improvements in multimodal understanding, from reading chemical structures to interpreting complex technical diagrams." They're using it for life sciences patent workflows including drafting, prosecution, infringement detection, and invalidity charting.
- For developers building tools that process screenshots, diagrams, or UI mockups, the higher resolution means fewer misread labels, better layout understanding, and more accurate OCR-like capabilities.
3. Creative and Professional Output
Anthropic says Opus 4.7 is "more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs."
The most enthusiastic endorsement comes from a tester who called it "the best model in the world for building dashboards and data-rich interfaces. The design taste is genuinely surprising — it makes choices I'd actually ship. It's my default daily driver now."
If you use Claude for generating UI components, slide decks, or document layouts, this is a meaningful quality-of-life improvement.
4. Adaptive Thinking (Replaces Extended Thinking)
Opus 4.6 used Extended Thinking — a mode where the model explicitly shows its reasoning chain. Opus 4.7 switches to Adaptive Thinking, which adjusts reasoning depth based on task complexity automatically.
The practical difference: you don't need to manually toggle thinking modes. The model decides how much reasoning a task needs and allocates accordingly. Simple questions get fast answers; complex problems get deeper analysis.
Note: Sonnet 4.6 still supports Extended Thinking. If you specifically need visible reasoning chains, Sonnet remains the model to reach for.
5. Context Window: 5x Bigger, New Tokenizer
The jump from 200K to 1M tokens is massive on paper. That's roughly 555,000 words — enough to fit entire codebases, long document collections, or extended conversation histories.
However, there's an important detail: Opus 4.7 uses a new tokenizer, and the same text produces more tokens than it did with Opus 4.6's tokenizer. Anthropic notes the 1M window corresponds to approximately 555K words, compared to the typical ~750K words per million tokens with the old tokenizer. In practice, a prompt that cost you 1,000 tokens with Opus 4.6 now costs around 1,350 tokens with Opus 4.7. The per-token price hasn't changed, but your effective cost per conversation goes up by roughly 35%. Worth factoring into your budget if you're a heavy API user.
What this means in practice:
- Your prompts will consume more tokens than before
- The effective "text capacity" of the 1M window is roughly equivalent to ~740K tokens on the old tokenizer
- Still a significant upgrade from Opus 4.6's 200K, but worth being aware of for cost estimation
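To make the tokenizer change concrete, here's a minimal sketch that converts an old token count into a rough new one. It assumes the article's stated capacities (~750K words per million tokens on the old tokenizer, ~555K on the new one) apply uniformly to your text, which real tokenizers won't do exactly:

```python
# Sketch: estimating token-count inflation under the new tokenizer.
# The words-per-million-token figures are taken from this article's
# stated capacities and treated as a uniform ratio (an assumption).

OLD_WORDS_PER_MTOK = 750_000  # ~words held by 1M tokens, old tokenizer
NEW_WORDS_PER_MTOK = 555_000  # ~words held by 1M tokens, new tokenizer

def estimate_new_tokens(old_tokens: int) -> int:
    """Rough count of tokens the same text needs under the new
    tokenizer, given its count under the old one."""
    return round(old_tokens * OLD_WORDS_PER_MTOK / NEW_WORDS_PER_MTOK)

# A 1,000-token Opus 4.6 prompt lands around 1,350 tokens on Opus 4.7.
print(estimate_new_tokens(1_000))  # → 1351
```

For precise numbers, count tokens with each model's actual tokenizer (for example via the API's token-counting endpoint) rather than a word-ratio estimate.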
6. Max Output: Doubled to 128K
Opus 4.6 capped output at 64K tokens. Opus 4.7 doubles that to 128K. This matters for:
- Generating long documents or reports in a single pass
- Complex code generation that spans multiple files
- Detailed analysis tasks where the model previously had to truncate its response
For agentic workflows where the model needs to produce extensive diffs or multi-file changes, 128K output is a practical improvement.
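As a quick sanity check on what the doubled cap buys you, here's a small sketch estimating how many generate-and-continue rounds a long output would need under each cap (caps taken from this section; ceiling division gives the pass count):

```python
# Sketch: how many generation passes a long output needs under each
# model's max-output cap (figures from the article).

OPUS_46_MAX_OUTPUT = 64_000   # tokens
OPUS_47_MAX_OUTPUT = 128_000  # tokens

def passes_needed(output_tokens: int, cap: int) -> int:
    """Number of generate-and-continue rounds to emit output_tokens,
    ignoring any per-pass overlap or re-prompting overhead."""
    return -(-output_tokens // cap)  # ceiling division

# A ~100K-token report fits in one Opus 4.7 pass but needs two on 4.6.
print(passes_needed(100_000, OPUS_47_MAX_OUTPUT),
      passes_needed(100_000, OPUS_46_MAX_OUTPUT))  # → 1 2
```

Each extra pass means another round trip and another chance for the model to drift, which is why a single 128K pass is more than a convenience for multi-file diffs.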
7. Project Glasswing and Cyber Safeguards
Opus 4.7 is the first model released under Anthropic's Project Glasswing framework. Last week, Anthropic highlighted both the risks and benefits of AI models for cybersecurity, and committed to testing new safeguards on less capable models before broadly releasing their most powerful model, Claude Mythos Preview.
What this means for Opus 4.7:
- Reduced cyber capabilities: During training, Anthropic "experimented with efforts to differentially reduce" cybersecurity capabilities compared to Mythos Preview.
- Automatic safeguards: The model includes built-in detection that blocks requests indicating "prohibited or high-risk cybersecurity uses."
- Cyber Verification Program: Security professionals doing legitimate work (vulnerability research, pentesting, red-teaming) can apply for access through the Cyber Verification Program.
This is Anthropic's first real-world test of differential capability controls — intentionally making a model less capable in specific domains while improving it in others. What they learn from Opus 4.7's deployment will shape how (and when) they release Mythos-class models more broadly.
8. Availability and Integration
Opus 4.7 is available across all major platforms from day one:
- Claude API — direct access via claude-opus-4-7
- Amazon Bedrock — anthropic.claude-opus-4-7 (research preview)
- Google Cloud Vertex AI — claude-opus-4-7
- Microsoft Foundry — new platform addition
The addition of Microsoft Foundry is notable — it's the first time a Claude Opus model has been available on Microsoft's platform at launch.
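Since the model IDs differ slightly by platform, a small sketch can make the mapping concrete. The model IDs come from the availability list above; the payload shape (model / max_tokens / messages) is assumed from the Anthropic Messages API, and build_request is a hypothetical helper, not part of any SDK:

```python
# Sketch: assembling a Messages-style request payload per platform.
# Model IDs are from the availability list above; the payload shape is
# an assumption modeled on the Anthropic Messages API — check the
# current docs before relying on it.

def build_request(platform: str, prompt: str, max_tokens: int = 4096) -> dict:
    """Hypothetical helper: map a platform name to its Opus 4.7 model ID
    and wrap the prompt in a minimal request payload."""
    model_ids = {
        "api": "claude-opus-4-7",                # Claude API
        "bedrock": "anthropic.claude-opus-4-7",  # Amazon Bedrock (research preview)
        "vertex": "claude-opus-4-7",             # Google Cloud Vertex AI
    }
    return {
        "model": model_ids[platform],
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

print(build_request("bedrock", "Review this PR")["model"])
# → anthropic.claude-opus-4-7
```

On Bedrock and Vertex AI the surrounding request envelope differs (region, anthropic_version, and so on), so treat this as the shared core of the payload rather than a complete client.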
What the Early Testers Say
Beyond the benchmark numbers, the qualitative feedback from enterprise testers reveals consistent themes:
On reliability:
- Hex: "The strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for."
- Devin: "Takes long-horizon autonomy to a new level. It works coherently for hours, pushes through hard problems rather than giving up."
On efficiency:
- Replit: "An easy upgrade decision. Same quality at lower cost — more efficient and precise at tasks like analyzing logs and traces, finding bugs, and proposing fixes."
- Hex: "Low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." — meaning comparable output quality at a lower, cheaper effort setting.
On reasoning:
- Harvey (legal AI): "90.9% substantive accuracy on BigLaw Bench at high effort with better reasoning calibration. It correctly distinguishes assignment provisions from change-of-control provisions, a task that has historically challenged frontier models."
- Quantium: "The biggest gains showed up where they matter most: reasoning depth, structured problem-framing, and complex technical work."
On personality:
- Replit: "I love how it pushes back during technical discussions to help me make better decisions. It really feels like a better coworker."
- Anthropic's own description: The model brings "a more opinionated perspective, rather than simply agreeing with the user."
9. Who's Already Using It — And What They're Building
The early-access tester list reads like a who's-who of AI-powered developer tools. Here's a quick look at how different companies are putting Opus 4.7 to work:
Coding agents and IDEs: Cursor, Replit, Warp, and Devin are all integrating Opus 4.7 as their primary or top-tier model for autonomous coding tasks. Devin specifically highlights "long-horizon autonomy" — the model works coherently for hours on deep investigation work that wasn't reliably possible before.
Code review: CodeRabbit is lining up Opus 4.7 for their "heaviest review work at launch," citing 10%+ recall improvement on hard-to-detect bugs in complex pull requests.
Enterprise AI platforms: Hebbia saw double-digit jumps in tool call accuracy and planning for orchestrator agents handling retrieval, slide creation, and document generation. Genspark reports the highest quality-per-tool-call ratio they've measured across any model.
Legal and finance: Harvey reports 90.9% substantive accuracy on BigLaw Bench. Hex calls it "the strongest model Hex has evaluated" — it correctly reports missing data instead of hallucinating plausible fallbacks, and resists data traps that even Opus 4.6 fell for. A fintech tester describes it as catching "its own logical faults during the planning phase."
Life sciences: Solve Intelligence is using the improved vision capabilities for patent workflows — reading chemical structures, interpreting technical diagrams, and handling everything from drafting to infringement detection.
Data visualization: One tester called it "the best model in the world for building dashboards and data-rich interfaces," noting that "the design taste is genuinely surprising — it makes choices I'd actually ship."
The breadth of adoption is notable. This isn't just a coding model — it's being deployed across legal, finance, life sciences, and enterprise automation. The common thread: tasks that require sustained reasoning, precise tool use, and reliable output over long sessions.
Opus 4.7 vs Opus 4.6: Summary
| Capability | Opus 4.6 | Opus 4.7 | Change |
|---|---|---|---|
| Agentic coding | Strong | Significantly stronger | +12-14% on major benchmarks |
| Self-verification | Limited | Built-in | New capability |
| Vision | Standard | Higher resolution | Substantial improvement |
| Creative output | Good | "More tasteful" | Quality improvement |
| Context window | 200K | 1M | 5x larger |
| Max output | 64K | 128K | 2x larger |
| Thinking mode | Extended | Adaptive | Auto-adjusting depth |
| Knowledge cutoff | Aug 2025 | Jan 2026 | 5 months fresher |
| Tool error recovery | Stops on failure | Pushes through | Major reliability gain |
| Cyber safeguards | None | Project Glasswing | New safety framework |
| Pricing | $5/$25 per M tokens | $5/$25 per M tokens | Unchanged |
Bottom Line
Claude Opus 4.7 is a focused upgrade that doubles down on what Opus was already good at — complex, autonomous coding work — while adding meaningful improvements to vision, output length, and context capacity.
The biggest wins are in agentic reliability: self-verification, tool error recovery, and long-running task consistency. If you're building AI-powered development tools or using Claude Code for daily coding work, these improvements translate directly into fewer failed tasks and less babysitting.
The new tokenizer and Project Glasswing cyber safeguards are worth understanding, as they affect both cost calculations and the model's behavior on security-adjacent tasks.
For developers already on Opus 4.6, the upgrade path is simple — swap claude-opus-4-6 for claude-opus-4-7 in your API calls. Same price, more capability.
Links:
- Anthropic announcement: anthropic.com/research/claude-opus-4-7
- API docs: platform.claude.com/docs
- Project Glasswing: anthropic.com/glasswing
- Cyber Verification Program: claude.com/form/cyber-use-case