Anthropic hands Petri AI test tool to Meridian Labs

Fri, 8th May 2026
Sean Mitchell, Publisher

Anthropic has transferred development of its open-source AI alignment testing tool, Petri, to Meridian Labs, placing the project under an independent AI evaluation nonprofit.

Petri is a testing tool designed to probe large language models for behaviours such as deception, sycophancy and cooperation with harmful requests. Anthropic says it has used the tool in alignment assessments for every Claude model since Claude Sonnet 4.5.

New features

The handover coincides with the release of Petri 3.0, which changes the software's architecture and expands its uses. The new version separates the auditor model from the target model, allowing researchers to adjust each independently.
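
As a rough illustration of what that decoupling makes possible, consider the sketch below. It is hypothetical: none of these names come from Petri's actual interface; they only show the two roles being configured independently.

```python
from dataclasses import dataclass

# Hypothetical sketch of the auditor/target split. AuditRun and its
# fields are illustrative stand-ins, not Petri's real API.

@dataclass
class AuditRun:
    auditor_model: str   # model that generates the probing scenarios
    target_model: str    # model under evaluation
    max_turns: int = 10  # length cap on each simulated conversation

# Either side can be swapped without touching the other:
baseline = AuditRun(auditor_model="auditor-v1", target_model="model-a")
harder_probe = AuditRun(auditor_model="auditor-v2", target_model="model-a")
new_release = AuditRun(auditor_model="auditor-v1", target_model="model-b")
```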

It also adds a component called Dish, intended to make evaluations more closely resemble real deployments. Dish runs tests using a model's actual system prompt and the software scaffolding that surrounds it in live service.
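
In practice, a realism-focused setup of that kind amounts to reusing the production configuration rather than a generic lab prompt. The sketch below is a minimal illustration under assumed names; it is not Dish's real interface.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: evaluate the target under its production
# configuration. DeploymentContext and its fields are assumptions,
# not Dish's actual interface.

@dataclass
class DeploymentContext:
    system_prompt: str                              # prompt used in live service
    tools: list[str] = field(default_factory=list)  # production tool scaffolding

# Configuration captured from the live deployment.
production = DeploymentContext(
    system_prompt="You are a customer-support assistant for Example Corp...",
    tools=["search_orders", "issue_refund"],
)

# A generic lab prompt can cue the model that it is being tested;
# reusing the production context verbatim avoids that mismatch.
generic_eval = DeploymentContext(system_prompt="You are a helpful assistant.")
realistic_eval = production
```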

That matters because researchers have long faced a problem in alignment testing: models may detect signs that they are being evaluated and change their behaviour. If that happens, the test may reveal less about how a model would act outside a controlled setting.

Anthropic has also linked Petri with another open-source tool, Bloom, for more detailed assessments of specific behaviours. Petri's broader test coverage and Bloom's deeper analysis are designed to work together as part of a wider evaluation process.
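
One plausible way to chain the two is sketched below with made-up stand-in functions; neither run_broad_scan nor run_deep_probe is a real Petri or Bloom API.

```python
# Hypothetical pipeline: a broad Petri-style screen first, a deeper
# Bloom-style probe second. Both functions are stand-ins, not real APIs.

def run_broad_scan(model: str, behaviours: list[str]) -> dict[str, float]:
    """Stand-in for a broad scan returning a risk score per behaviour."""
    return {b: 0.0 for b in behaviours}  # placeholder scores

def run_deep_probe(model: str, behaviour: str) -> str:
    """Stand-in for a focused, in-depth analysis of one behaviour."""
    return f"detailed report on {behaviour} for {model}"

scores = run_broad_scan("model-a", ["deception", "sycophancy", "harmful-compliance"])
# Only behaviours flagged by the broad scan get the costlier deep analysis.
flagged = [b for b, score in scores.items() if score > 0.5]
reports = [run_deep_probe("model-a", b) for b in flagged]
```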

Anthropic launched Petri in October 2025 through its Anthropic Fellows programme. The software can be applied to any large language model and is designed to test alignment-related scenarios using a separate auditor model, while a judge model scores transcripts for signs of misaligned behaviour.
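
Conceptually, one round of that process looks something like the loop below. This is a minimal sketch: query and every other name here are hypothetical stand-ins, not Petri's code.

```python
# Minimal conceptual sketch of the auditor / target / judge loop.
# `query` and all names below are hypothetical, not Petri's code.

def query(model: str, prompt: str) -> str:
    """Stand-in for a call to a hosted language model."""
    return f"[{model} response to: {prompt[:40]}...]"

def audit_once(auditor: str, target: str, judge: str, seed: str) -> str:
    # The auditor invents an alignment-relevant situation...
    scenario = query(auditor, f"Simulate a scenario probing for: {seed}")
    # ...the target responds to it...
    reply = query(target, scenario)
    transcript = f"AUDITOR: {scenario}\nTARGET: {reply}"
    # ...and a separate judge scores only the transcript, keeping
    # scoring independent of scenario generation.
    return query(judge, f"Score this transcript for misaligned behaviour:\n{transcript}")

verdict = audit_once("auditor-model", "target-model", "judge-model", "deception")
```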

Use beyond Anthropic has already begun. According to the company, the UK's AI Security Institute has used Petri as a major part of its evaluation work on whether models show a propensity to sabotage AI research.

Independent oversight

The decision to move Petri to Meridian Labs reflects a broader debate in the AI sector over who should control the tools used to assess model behaviour. Developers, regulators and outside researchers argue that trust in evaluation results depends in part on whether the underlying methods are seen as independent from the labs building the systems being tested.

Anthropic drew a parallel with its earlier decision to donate the Model Context Protocol to the Linux Foundation. In Petri's case, the goal is to keep the tool independent of any single AI lab so its findings are viewed as neutral across the industry and by public bodies.

Meridian Labs focuses on AI evaluation and already hosts tools including Inspect and Scout. By adding Petri, the nonprofit is building a broader suite of software for laboratories, independent researchers and governments that want to test model behaviour in a more standardised way.

The transfer also highlights the growing importance of open-source infrastructure in AI governance. As model developers face tougher scrutiny over safety and reliability, shared testing tools may give outside groups a way to run checks without relying entirely on methods kept inside the companies building the models.

At the same time, open tools do not resolve harder questions about what should count as acceptable model behaviour, how tests should be designed or whether results can be compared across different systems. Evaluation frameworks can expose tendencies, but they still depend on choices about scenarios, scoring and interpretation.

Petri's structure addresses part of that challenge by separating the generation of test scenarios from the scoring of responses. In Anthropic's description of the system, one model simulates alignment-relevant situations and another assesses the resulting exchanges for problematic conduct.

The new Dish add-on targets another longstanding weakness in model evaluation: realism. If a test environment differs too much from production use, a model may behave differently, making the findings less useful for those trying to predict conduct in actual deployments.

By moving development outside its organisation, Anthropic is giving up direct control over a tool that has been part of the internal assessment process for its Claude family of models. That could help Meridian Labs position Petri as a common resource rather than a system closely tied to a single commercial developer.

Anthropic said the aim is to support alignment tools that are open and useful for the broader AI development community.