You Can't Audit a Guess
AI explainability for regulated teams: a model that can't explain why it produced an answer is a liability the moment a regulator asks you to justify the call.
In a regulated business, being right is not enough. You have to be able to show why you were right, and sometimes the ability to explain a decision matters more than the decision itself.
This is the part of accountability that software people often underrate, because in ordinary engineering the explanation is free. The system did what the code says; you can read the code, trace the path, reproduce the result. “Why did it do that” always has an answer, and the answer is sitting in version control. Determinism gave you explainability without anyone having to ask for it.
A model doesn’t give you that, and the gap matters most exactly where the stakes are highest. When you most need to reconstruct why a decision was made, because someone was harmed, because a regulator is asking, because an outcome is being challenged, a system that produces fluent answers it cannot account for is a problem precisely when you can least afford one.
Why this is easy to miss until it’s expensive
In most of building software, you never think about explainability as a separate property, because you can’t make a normal system without it. The traceability comes along for free. So a team can go years treating “we can explain any decision the system made” as a fact of nature rather than a feature they were getting for nothing.
AI quietly withdraws the free version. The same prompt can produce different outputs on different runs. The reason a particular answer came out is not a rule you can point to; it’s a statistical product of weights and context that doesn’t reduce to a line you can read. You can log what went in and what came out, and you should, but “here is the input and the output” is not the same as “here is why this output was correct,” and an audit usually wants the second one. The team doesn’t notice the loss while things go well, because explanation only gets demanded when something has gone wrong, and by then the system that can’t provide it is already in production carrying real decisions.
The principle: match the explainability to the stakes
Here is the lens. Before AI makes or shapes a decision, ask what you would need to be able to say about that decision later, to whom, and under what pressure. Then use the model only where it can meet that bar, or where you’ve built something around it that can.
Decisions live on a spectrum of how much account you have to give. At one end, nobody will ever ask why: an internal draft, a suggestion a human freely accepts or discards, a tool whose output leaves no consequential trace. There, the model’s inability to explain itself costs nothing, and you should use it without hesitation. At the other end sit decisions you may have to defend, formally, to someone with authority over you: why this person was assessed this way, why this claim was handled like that, why this risk was rated as it was. There, “the model produced it” is not an account anyone will accept, and you need either a different approach or a structure around the model that produces the explanation independently: a documented rule the human applied, a recorded rationale the model only assisted, a deterministic check that gives the real reason.
The mistake is to treat a consequential decision as though it sat at the end of the spectrum where nobody will ever ask why, because the model was confident and usually right. Confidence is not explanation. Usually-right is not an audit trail. The question is never how sure the model seems; it’s what you’ll be able to say when someone with authority asks you to justify the outcome.
What this looks like
Consider an insurer using a model to help assess claims, with the model producing a recommendation and a brief rationale that an assessor adopts. It works well, and the rationales read convincingly: they sound like reasons.
The fragility surfaces when a declined claim is challenged and the decision has to be justified to a regulator. The recorded rationale was generated by the model, not derived from a rule the insurer can point to and defend. Asked “why was this claim assessed this way, and would a similar claim be assessed the same,” the honest answer is that the model produced a plausible-sounding explanation that may or may not reflect any consistent, defensible basis, and might phrase the next similar claim differently. The output looked like an audit trail and wasn’t one. A convincing rationale that can’t be tied to a consistent, accountable basis is arguably worse than no rationale, because it gives the appearance of justification while lacking the substance an examination will demand.
The fix isn’t to abandon the model. It’s to make the defensible reasoning live somewhere accountable: a real decision rule, applied and recorded by a person, that the model may help draft but does not silently become. The model can assist the work. It cannot be the thing you point to when asked to justify it.
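One way to picture that separation is the sketch below. It is only an illustration under stated assumptions: the Claim fields, the rule IDs (R0 to R2) and the 30-day lodgement window are all invented here, not taken from any real insurer or regulation. The point it tries to show is that the outcome and the reason come from a fixed, documented rule applied by a named assessor, while the model's text is stored only as a draft that never determines the result.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical figure, invented purely for this illustration.
LODGEMENT_WINDOW_DAYS = 30


@dataclass(frozen=True)
class Claim:
    claim_id: str
    policy_active: bool
    days_since_incident: int


@dataclass(frozen=True)
class DecisionRecord:
    claim_id: str
    outcome: str   # "approve" or "decline"
    rule_id: str   # the documented rule that produced the outcome
    reason: str    # fixed wording tied to the rule, not generated text
    assessor: str  # the person accountable for applying the rule
    model_draft: Optional[str] = None  # assistive text only, never the basis


def apply_rules(claim: Claim) -> Tuple[str, str, str]:
    """Return (outcome, rule_id, reason) from deterministic checks."""
    if not claim.policy_active:
        return ("decline", "R1",
                "Policy was not active at the time of the incident.")
    if claim.days_since_incident > LODGEMENT_WINDOW_DAYS:
        return ("decline", "R2",
                f"Claim lodged more than {LODGEMENT_WINDOW_DAYS} days "
                "after the incident.")
    return ("approve", "R0", "No exclusion rule applied.")


def record_decision(claim: Claim, assessor: str,
                    model_draft: Optional[str] = None) -> DecisionRecord:
    """Build the audit record; the model draft cannot change the outcome."""
    outcome, rule_id, reason = apply_rules(claim)
    return DecisionRecord(claim.claim_id, outcome, rule_id, reason,
                          assessor, model_draft)


if __name__ == "__main__":
    record = record_decision(
        Claim("C-1", policy_active=True, days_since_incident=45),
        assessor="assessor-17",
        model_draft="Model-written summary the assessor may edit.",
    )
    print(record.outcome, record.rule_id, record.reason)
```

In a design like this, two similar claims get the same rule ID and the same reason however the model happens to phrase its draft, which is the consistency a "would a similar claim be assessed the same" question is probing.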
What it means in practice
Sort the decisions AI touches by how much account you’ll have to give for them, and let that sorting govern how you use it. Where no explanation is ever owed, use it freely. Where a real account may be demanded, ensure the explanation comes from something you can stand behind under scrutiny (a documented basis, a human’s recorded reasoning, a deterministic component) rather than from the model’s own fluent description of what it might have done.
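As a rough sketch, that sorting can be written down as a gate that runs before a model is wired into a decision. Everything named here is hypothetical: the two tiers, the source labels and the may_use_model function are assumptions made for illustration, and a real scheme would likely need more gradations than two.

```python
from enum import Enum
from typing import Optional


class Accountability(Enum):
    NO_ACCOUNT_OWED = "no_account_owed"                  # drafts, freely discarded suggestions
    ACCOUNT_MAY_BE_DEMANDED = "account_may_be_demanded"  # decisions you may have to defend


# Hypothetical labels for explanation sources you can stand behind.
ACCOUNTABLE_SOURCES = frozenset({
    "documented_basis",
    "human_recorded_reasoning",
    "deterministic_component",
})


def may_use_model(tier: Accountability,
                  explanation_source: Optional[str] = None) -> bool:
    """Allow model use freely when no account is owed; otherwise require
    that the explanation comes from an accountable source."""
    if tier is Accountability.NO_ACCOUNT_OWED:
        return True
    return explanation_source in ACCOUNTABLE_SOURCES


if __name__ == "__main__":
    print(may_use_model(Accountability.NO_ACCOUNT_OWED))
    print(may_use_model(Accountability.ACCOUNT_MAY_BE_DEMANDED, "model_rationale"))
    print(may_use_model(Accountability.ACCOUNT_MAY_BE_DEMANDED, "documented_basis"))
```

The design choice worth noticing is that the model's own rationale is deliberately absent from the accepted sources, so a consequential decision cannot pass the gate on the strength of generated text alone.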
In the Australian regulated context this maps onto duties that already exist: requirements to give reasons, to treat like cases alike, to justify decisions affecting people’s rights and entitlements. None of those duties was softened by the arrival of a tool that can’t meet them on its own. If anything, they raise the bar for using it well, because the easiest way to fail them is to let a confident, unexplainable answer stand in for a reason you can defend.
Next: stepping back to the question under all of this, whether speed, the thing AI most obviously delivers, was ever the constraint that was actually holding your serious work back.