The most useful way to think about a model on your team is as a junior colleague: fast, eager, widely read, and not yet trustworthy on anything that matters. With one difference that changes everything. This junior never learns.

A real junior is a temporary state. You invest in them. You review their work, explain what they missed, watch the same mistake not happen twice, and over a year or two they become someone whose work you can trust without reading every line. The review burden is real, but it’s an investment with a return, and the return is a colleague who eventually carries weight on their own.

The model gives you the eager, capable, unreliable junior permanently. Output quality may rise as models improve, but the model never internalises your corrections, your context, your hard-won knowledge of why this particular thing matters here. Every session starts fresh. The mentoring never compounds, because there is nobody on the other side of it accumulating judgement.

Why the output flatters you into trusting it

A junior’s work usually looks like junior work. There are tells: the awkward phrasing, the missing edge case, the structure that doesn’t quite sit right. Those tells are useful, because they tell you to look closely.

The model’s tells are mostly gone. It produces fluent, confident, well-structured output that looks like the work of someone senior, whether or not it is correct. The signal that used to say “check this carefully” has been removed, while the underlying reliability is still junior. That is a genuinely dangerous combination, because human review naturally calibrates to appearance: we look harder at work that looks shaky and skim work that looks polished. The model turns that instinct against you. Its most confident output deserves the most scrutiny, and gets the least.

This is why “it’s usually right” is a trap rather than a reassurance on an accountable team. A tool that is usually right and occasionally, invisibly, confidently wrong demands more review discipline than a tool that is unreliable in obvious ways, not less. The polish isn’t quality. It’s the absence of the warning signs you used to rely on.

The principle: price the review, not just the generation

Here is the lens. When you weigh what a model saves you, count the full cost of trusting its output, not just the speed of producing it. Generation is the cheap, visible part. Review is the expensive, easy-to-ignore part: real review, by someone competent enough to catch a confident error in a domain that matters. It doesn’t shrink just because generation got fast.

For low-stakes work, this maths is wonderful. If a wrong answer costs nothing and a human glance catches the bad ones, the model’s speed is close to free value, and you should use it liberally. For consequential work, the maths is very different. If catching a confident error requires an expert reading the output as carefully as if they’d produced it themselves, you haven’t removed the expensive labour; you’ve moved it from writing to reviewing, and depending on the task that may be a smaller saving than it looked, or none. Sometimes reviewing a plausible-but-wrong artefact to the required standard costs more than producing a correct one from scratch.

The point isn’t that the saving is fake. Often it’s real and large. The point is that you can only tell which case you’re in if you price the review honestly, and enthusiasm for the tool tends to price only the generation.
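One way to make that pricing concrete is a back-of-envelope calculation. The sketch below is illustrative only: the field names, the expected-error term, and every number are assumptions invented for the example, not measurements from any real team. It simply subtracts the full cost of trusting the output (generation, plus review, plus the expected cost of errors that survive review) from the cost of doing the work by hand.

```python
from dataclasses import dataclass


@dataclass
class TaskCosts:
    """Hypothetical per-task figures, all in one unit (say, minutes)."""
    manual_authoring: float  # an expert producing the work by hand
    generation: float        # prompting the model and collecting output
    review: float            # review to the standard the work requires
    error_rate: float        # chance a confident error survives review
    error_cost: float        # cost of one surviving error


def net_saving(c: TaskCosts) -> float:
    """Manual cost minus the full cost of trusting the model's output."""
    full_cost = c.generation + c.review + c.error_rate * c.error_cost
    return c.manual_authoring - full_cost


# Invented numbers: a low-stakes draft versus a consequential document.
low_stakes = TaskCosts(manual_authoring=30, generation=1, review=2,
                       error_rate=0.05, error_cost=0)
consequential = TaskCosts(manual_authoring=30, generation=1, review=28,
                          error_rate=0.02, error_cost=500)

print(net_saving(low_stakes))     # positive: the saving is real
print(net_saving(consequential))  # negative: review and risk outweigh it
```

With these made-up inputs the generation cost is identical in both cases. What flips the sign is the review cost and the price of a surviving error, which are exactly the terms that get left out when only the generation is priced.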

What this looks like

Consider an aged-care provider using a model to generate care-plan documentation from carers’ shift notes. The output is excellent on its face: thorough, well-organised, in the right format, better-written than much of what staff produce by hand. The temptation is to treat it as basically done, with a light skim before filing.

The light skim is where the risk concentrates. A care plan that reads beautifully and quietly misstates a medication interaction, or omits a risk that was present in the notes, is more dangerous than an obviously rough draft, precisely because its polish discourages the scrutiny that would catch the error. To review it to the standard the work actually requires, a qualified person has to read it against the source notes as carefully as if they were writing it, which is most of the labour the model appeared to save. If the provider priced only the generation, they will believe they have automated the documentation. What they have actually done is shift a clinical-quality task from authoring to reviewing, and if they cut the review to bank the saving, they’ve taken on risk they haven’t named.

What it means in practice

Treat the model as a permanent junior and design accordingly. Use it freely where being wrong is cheap and a glance catches the failures: internal drafts, first passes, exploration, the low-stakes volume work where speed genuinely dominates. Be deliberate where being wrong is expensive, and there, budget the expert review as a real and ongoing cost rather than a formality you can trim once you trust it. The thing you’d be trusting never earns the trust the way a real junior does. It doesn’t get better at your job. It gets faster at producing work you still have to check.

And protect the people doing the checking. Reviewing confident, plausible output for subtle error is harder and more draining than reviewing obviously rough work, and a team that’s had its review load quietly doubled while being told the AI is saving everyone time will feel the gap between the story they are told and the day they actually have, even before they can name it.

Next: the problem that remains even after the review is done well. You often can’t fully explain why the model produced what it did, and explanation is exactly what an audit demands.
