
Beyond ‘black box’ mode - Proving AI’s controls actually work

Wed, 30th Jul 2025

Why 'it looked right' isn't a defence and what smart risk teams are doing to build trustworthy AI outputs.

In the age of AI, the hardest thing to see may just be assumptions. A vendor demo looks great and the dashboard lights up green. AI maps obligations, links them to controls and even suggests risk ratings for each one. It all seems plausible - until it doesn't.

Remember that regulators don't reward you for being fast… but they do fine you for being wrong. And probabilistic guesses that 'sound about right' don't stack up when the questions get pointed and the stakes become real.

We've entered the era of machine-made confidence, in which models trained on vast swathes of regulatory text produce outputs that feel complete but can turn out to be laced with blind spots, omissions and faulty logic. This is not because the models are malicious, but because, by design, they're guessing - mathematically, creatively and, occasionally, catastrophically.

So the question compliance leaders must now ask is this: what do we have in place to prove the model's outputs are real, relevant and reliable - and how would we know when they're not?

The black box problem

Let's be clear - most modern AI platforms in compliance are built on large language models (LLMs) and other probabilistic engines.

This means they're great at pulling in patterns and surfacing likely matches, but 'likely' is doing a lot of work here. In recent months, Ashurst has seen these systems produce impressive-looking obligation registers, only to discover they have misinterpreted key clauses because of poor prompt framing, skipped exceptions and carve-outs buried deep in footnotes, failed to connect related obligations across sources, and confidently included entirely fictitious requirements. These are not trivial flaws - they're the AI equivalent of a junior analyst making decisions on instinct and Google hits and then assuring you it's all fine. Regulators won't accept that, and neither should you.

Vendors - including some big ones - love to claim their AI 'reads the laws' and 'outputs ready-made control maps', but under scrutiny those outputs often collapse into contradictions, because law isn't just about pattern recognition - it's context, application, exception and interpretation. No model, however advanced, can divine fit-for-purpose frontline controls from statutory text and org charts alone.

From black box to glass box - validation layers that work

So how do you get from plausible-sounding automation to defensible, auditable evidence? The answer lies in building deliberate, layered validation protocols that de-risk model outputs before they get anywhere near a boardroom or a regulator.

Here's what the sharpest teams are doing right now:

Red-teaming the model > Treat your AI like a new hire in a sensitive role: challenge it constantly. This means structured red-teaming exercises that deliberately probe the system with edge cases, contradictions, jurisdictional nuance and fast-moving guidance updates. The goal isn't just to see what it gets right but to uncover where it breaks… and everything breaks somewhere!

Synthetic test cases > Think of these as crash-test dummies for compliance: design controlled test data sets that represent real-world legal complexity, inject overlapping obligations, exceptions or borderline risk scenarios, and observe how the AI handles the ambiguity - or whether it even notices it (a sketch of what this can look like follows this list). If it can't explain its decisions, or flags every minor clause as high risk, what you have is a glorified word processor, not a useful compliance assistant.

Human spot-checks (not just rubber stamps) > AI should never have the last word. Senior legal and risk professionals need to manually validate a sample of outputs - not once, but on a rolling basis - and not just for accuracy, but for judgment. They should ask: 'Does the model's suggested control actually mitigate the identified obligation?' and 'Can frontline staff operationalise it?' If you can't explain how it works to your regulator, you probably shouldn't be using it.
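To make the testing idea concrete, here is a minimal sketch (in Python) of what a synthetic-test harness for an AI obligation mapper might look like. Everything in it is an assumption for illustration: map_obligations() stands in for whatever call your platform actually exposes, and the scenarios are toy examples rather than real legal text. The point is simply that each scenario records what a human reviewer expects and which sources genuinely exist, so missed obligations and fabricated citations surface automatically.

# A minimal sketch of a synthetic-test harness for an AI obligation mapper.
# All names are illustrative assumptions: map_obligations() stands in for
# whatever call your vendor platform actually exposes, and the scenarios are
# toy examples, not real legal text.
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    source_text: str                # the clause(s) fed to the model
    expected_obligations: set[str]  # what human reviewers agreed the text requires
    known_sources: set[str] = field(default_factory=set)  # sources that actually exist


def map_obligations(source_text: str) -> list[dict]:
    """Placeholder for the platform's obligation-mapping call; replace with the real API."""
    return []  # stubbed so the harness runs end to end


SCENARIOS = [
    Scenario(
        name="carve-out buried in a footnote",
        source_text=("Clause 12 applies to all deposit takers. "
                     "[Footnote 3: excludes entities below the prescribed threshold.]"),
        expected_obligations={"report-quarterly"},
        known_sources={"Clause 12", "Footnote 3"},
    ),
    Scenario(
        name="overlapping obligations across sources",
        source_text=("Act A requires notification within 10 days; "
                     "Regulation B requires notification within 5 days."),
        expected_obligations={"notify-within-5-days"},
        known_sources={"Act A", "Regulation B"},
    ),
]


def run_suite() -> None:
    for s in SCENARIOS:
        outputs = map_obligations(s.source_text)
        found = {o["obligation_id"] for o in outputs}
        cited = {src for o in outputs for src in o.get("sources", [])}
        missed = s.expected_obligations - found
        fabricated = cited - s.known_sources  # sources the model invented
        print(f"{s.name}: missed={missed or 'none'}, fabricated sources={fabricated or 'none'}")


if __name__ == "__main__":
    run_suite()

A suite like this only earns its keep if it is re-run whenever the model, the prompts or the underlying regulation changes - which is exactly the rolling cadence the spot-checks above describe.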

Exercises like these will expose the inherent limits of the AI's capability - and, in turn, the assumptions on which any business case for using it should rest.

The best compliance leaders don't blindly trust the black box - they open it, test it, question it and ensure there's always a human ready to take the wheel when judgment matters most. AI might get you an answer fast, but it's still your job to make sure it's the right one.

That starts with knowing where AI belongs and where it doesn't. In complex areas like obligation identification and legal drafting, the layers of interpretation, nuance and real-world application are simply too great for AI to lead. In these cases, it can support, but never replace, experienced legal and risk professionals. That said, in more structured domains - control design, control testing and incident monitoring - AI may well play a growing role, if surrounded by strong human oversight and a robust system of challenge, validation and escalation.

Reasonable steps ≠ blind faith

The regulators have made their expectations plain - it's not enough to comply; you must be able to prove you fully understood the risk and picked the right tools to manage it. In other words - you need to show your work. If your AI platform generates a control map, can you: explain how the model reached those matches, show what data sources it used and what it ignored, and document how outputs were reviewed, challenged and corrected?

This is where many firms falter - they implement powerful tools but treat them like an infallible oracle. When something goes wrong, they're left explaining why no one thought to check (and double check). Let's be blunt - if your AI-generated compliance logic can't stand up to a regulator's 'reasonable steps' question, then it's not a time-saver - it's just a liability.
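As a rough illustration of what 'showing your work' can look like, the sketch below captures, for each AI-suggested control mapping, the prompt, the sources used and ignored, the model's own rationale and the human review decision. The field names and workflow are assumptions for illustration, not any vendor's schema; the point is that the evidence exists before anyone asks for it.

# A minimal sketch of an evidence record for an AI-suggested control mapping.
# Field names and the review workflow are assumptions for illustration; adapt
# them to your platform and your own record-keeping obligations.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class MappingEvidence:
    obligation_id: str
    suggested_control: str
    model_version: str
    prompt: str                 # exactly what the model was asked
    sources_used: list[str]     # documents the model actually drew on
    sources_ignored: list[str]  # in-scope documents it did not consult
    model_rationale: str        # the model's explanation, captured verbatim
    reviewer: str
    review_decision: str        # "accepted", "corrected" or "rejected"
    review_notes: str
    reviewed_at: str


record = MappingEvidence(
    obligation_id="OBL-0042",
    suggested_control="Monthly reconciliation of client money accounts",
    model_version="vendor-model-2025-07",
    prompt="Map the obligations in the attached rules to existing controls.",
    sources_used=["Client Money Rules, Part 3"],
    sources_ignored=["Internal custody procedure v2"],
    model_rationale="Clause 3.4 requires periodic reconciliation of client accounts.",
    reviewer="j.smith",
    review_decision="corrected",
    review_notes="Frequency changed from quarterly to monthly per Clause 3.5.",
    reviewed_at=datetime.now(timezone.utc).isoformat(),
)

# Stored alongside the control map, each entry can later answer the
# "how did you reach this, and who checked it?" question.
print(json.dumps(asdict(record), indent=2))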

Compliance is still a human job (for now!)

We've said it before but it's worth repeating: AI is a powerful accelerant, not a like-for-like replacement. Used wisely, it can help you detect emerging risk patterns across complaints or incident logs, standardise documentation across jurisdictions and surface potential control gaps during regulatory change reviews… but it can't (yet) interpret intention, resolve legal ambiguity, or understand how humans actually behave under pressure. That's YOUR job.

AI can be a powerful part of your compliance toolkit but the real leadership comes from knowing when to use it - and when to put it back in the box.
