IT Brief New Zealand - Technology news for CIOs & IT decision-makers

Anthropic launches Claude Opus 4.5, boosting coding & security

Wed, 26th Nov 2025

Anthropic has released Claude Opus 4.5, its latest large language model, with claims of improved performance across a broad range of tasks including software engineering, reasoning, mathematics, and general computer use. The model is available through Anthropic's own applications, its API, and leading cloud platforms.

Software engineering

Claude Opus 4.5 has shown measurable gains on several software engineering benchmarks. It outperformed competing models on SWE-bench Verified, a standard test suite for real-world software engineering tasks. The company also reports that Opus 4.5 scored higher than any human candidate to date on an internal performance engineering exam. The test, which assesses technical ability and judgment under time constraints, is considered challenging even for experienced professionals.

The model also leads across seven out of eight programming languages on the SWE-bench Multilingual benchmark. This indicates consistency in code generation and problem-solving across different languages. In practical tests simulating real-world scenarios, Opus 4.5 demonstrated creative problem solving. For instance, when acting as an airline service agent, it found a new approach to modifying a non-changeable basic economy ticket by first upgrading the customer's cabin class before altering the flight itinerary, navigating policy constraints in a novel way.

General capabilities

Beyond coding, Claude Opus 4.5 has recorded improvements in areas such as vision, broader reasoning, and mathematical tasks. The model performed strongly across various benchmarks including Polyglot, BrowseComp-Plus, and Vending-Bench. Its ability to handle longer, more complex conversations in the Claude app has also been enhanced. Users can now conduct extended discussions without hitting conversation limits, due to automated summarisation of earlier context.
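The article does not describe how the summarisation works, so the following is only a minimal sketch of the general idea: once a conversation exceeds a limit, older messages are replaced by a single summary and the most recent ones are kept verbatim. The function names, the thresholds, and the placeholder summariser are all hypothetical and do not reflect Anthropic's implementation.

```python
from typing import Callable, List


def compact_history(
    messages: List[str],
    max_messages: int,
    keep_recent: int,
    summarise: Callable[[List[str]], str],
) -> List[str]:
    """Replace older messages with one summary once the history grows too long.

    All parameters are illustrative assumptions, not part of any real API.
    """
    if not 1 <= keep_recent < max_messages:
        raise ValueError("keep_recent must be at least 1 and below max_messages")
    if len(messages) <= max_messages:
        return list(messages)  # still within the limit, nothing to compact
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarise(older)] + recent


def naive_summary(older: List[str]) -> str:
    """Placeholder summariser; a real system would call a model here."""
    return f"[summary of {len(older)} earlier messages]"


history = [f"msg {i}" for i in range(6)]
compacted = compact_history(history, max_messages=4, keep_recent=2, summarise=naive_summary)
print(compacted)
```

In this sketch, the six-message history collapses to one summary entry followed by the two most recent messages, so the conversation can continue without growing indefinitely.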

New integration features have also been rolled out. Opus 4.5 powers upgrades to Claude Code, including a revised Plan Mode and new availability in the desktop applications, where multiple local and remote AI sessions can run in parallel. Enhanced support for Chrome and Excel is now available for users on Max, Team, and Enterprise plans.

Efficiency and control

The API for Claude Opus 4.5 introduces an "effort" parameter, allowing customers to trade off time, cost, and capability. At medium effort, Opus 4.5 matches the best SWE-bench Verified score of the earlier Sonnet 4.5 model while using 76% fewer output tokens. At its highest setting, it exceeds Sonnet 4.5's performance by 4.3 percentage points while still using 48% fewer output tokens.
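To put the reported percentages in concrete terms, the short calculation below applies them to a hypothetical baseline of 10,000 output tokens for Sonnet 4.5. The baseline figure and the function name are assumptions chosen for illustration; only the 76% and 48% reductions come from the figures reported above.

```python
def output_tokens_after_reduction(baseline_tokens: int, reduction_pct: int) -> int:
    """Return the output token count after a percentage reduction from a baseline."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return round(baseline_tokens * (100 - reduction_pct) / 100)


# Hypothetical Sonnet 4.5 output size for one task (not a reported figure).
BASELINE_TOKENS = 10_000

# Reductions reported for Opus 4.5 relative to Sonnet 4.5, by effort setting.
REPORTED_REDUCTIONS = {"medium": 76, "highest": 48}

for effort, pct in REPORTED_REDUCTIONS.items():
    tokens = output_tokens_after_reduction(BASELINE_TOKENS, pct)
    print(f"{effort}: {tokens} output tokens")
```

Under this assumed baseline, the medium setting would produce 2,400 output tokens and the highest setting 5,200.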

Advanced context management and memory have been included to improve agentic task handling. The model can now better manage teams of AI subagents, supporting construction of complex, multi-agent systems for research and operational use. Testing revealed a nearly 15 percentage point increase in deep research performance using a combination of these techniques.

Security and safety

Anthropic places emphasis on safety with Claude Opus 4.5, reporting lower scores in "concerning behaviour" across a set of misalignment evaluations. These cover issues such as resistance to prompt injection attacks, in which adversarial instructions aim to induce unintended outputs. According to internal tests, Opus 4.5 is less susceptible to such vulnerabilities than any previous model developed by Anthropic or by competitors.

These evaluations reflect ongoing efforts to prevent models from "gaming" rules or objectives in unanticipated ways. The company states that its procedures and model alignment continue to focus on robustness against misuse or adversarial manipulation.

Pricing and access

Claude Opus 4.5 is available at revised pricing of USD $5 per million input tokens and USD $25 per million output tokens, intended to expand accessibility for developers, enterprises, and individual users. Usage limits for Max and Team Premium users have also been increased, with caps set specifically for this latest version to support daily work needs.
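As a rough guide to what these rates mean for a single request, the sketch below estimates cost from token counts. It assumes the USD $5 figure applies to input tokens and the USD $25 figure to output tokens; the function name and the example token counts are hypothetical.

```python
# Per-million-token rates in USD, assuming $5 is the input rate and $25 the output rate.
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 40,000 output tokens.
cost = estimate_cost_usd(200_000, 40_000)
print(f"Estimated cost: USD ${cost:.2f}")
```

For this example workload, input and output each contribute USD $1.00, which shows how output tokens weigh five times more heavily than input tokens at these rates.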

"Our customers often use Claude for critical tasks. They want to be assured that, in the face of malicious attacks by hackers and cybercriminals, Claude has the training and the 'street smarts' to avoid trouble. With Opus 4.5, we've made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry," said Anthropic.