The Data You Can't Paste Into a Prompt
AI and data governance for Australian regulated teams: residency, data sovereignty, and the Privacy Act duties that decide where your data can go.
The most important question about an AI feature is often the most boring one: what happens to the data when it leaves your control.
It’s boring because it has nothing to do with how clever the model is, and everything to do with contracts, jurisdictions, and obligations that were written before any of this existed. It is also, on a serious team, frequently the question that decides whether the feature can exist at all, which means it belongs at the start of the conversation, not in a compliance review bolted on after the build.
A lot of AI adoption skips it, because enthusiasm runs ahead of governance and because the constraint is invisible right up until it isn’t. The data flows somewhere. The somewhere matters. And unlike most engineering constraints, this one doesn’t yield to a better design or a cleverer workaround. It simply holds.
Why the constraint feels negotiable and isn’t
In most of engineering, a hard constraint is an invitation. You’re told something can’t be done, and you find the path around it: a different architecture, a clever cache, a trade-off that buys back what you needed. That instinct is most of what makes a good engineer good.
Data obligations don’t respond to that instinct, and trying to engineer around them is how teams get into trouble. “The records can’t leave the country” is not a performance problem with a clever solution. “This information can only be used for the purpose it was collected for” is not a constraint you optimise away. These are not technical limits that yield to technical ingenuity. They are duties, and the only real options are to satisfy them or to not do the thing.
The danger is that, from inside an engineering mindset, they present exactly like solvable constraints. So a team treats “send the patient notes to the model” as a latency-and-cost question, finds a fast, cheap way to do it, and ships, having answered an engineering question competently while never noticing it was a governance question wearing engineering clothes.
The principle: trace the data before you trust the model
Here is the lens. Before you evaluate what a model can do with your data, establish what you’re permitted to do with that data: where it may go, who may process it, for what purpose, and under what duty. The model’s capability is irrelevant until that’s settled, because no amount of capability buys back a breach of obligation.
So trace the path concretely, for the specific data and the specific feature. What exactly is being sent? Where does it physically go, and through whose systems? Is it retained, logged, or used for training by the provider, and have you established that rather than assumed it? What category does the data fall under, whether personal, health, financial, or classified, and what duties attach to that category? Does the purpose you’re now using it for match the purpose it was collected for? These are answerable questions, and answering them is cheaper before the build than after it, when the architecture already assumes a data flow that turns out not to be permitted.
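One way to keep those questions from staying rhetorical is to write them down as a record that cannot be quietly left half-filled. The sketch below is only an illustration: the class name, field names, and the rule that every answer must be present are assumptions made for this example, not a legal test or an established framework.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataPathTrace:
    """Hypothetical record: one per feature and data set. None means 'not yet established'."""
    payload: Optional[str] = None               # what exactly is being sent
    processing_location: Optional[str] = None   # where it goes, and through whose systems
    provider_handling: Optional[str] = None     # retention, logging, training: established, not assumed
    data_category: Optional[str] = None         # personal, health, financial, classified
    purpose_matches_collection: Optional[bool] = None  # does the new use match the original purpose


def open_questions(trace: DataPathTrace) -> list[str]:
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(trace) if getattr(trace, f.name) is None]


def ready_to_evaluate_model(trace: DataPathTrace) -> bool:
    """Capability evaluation starts only when nothing is open and the purpose matches."""
    return not open_questions(trace) and trace.purpose_matches_collection is True


# Usage: a trace with only two of the five questions answered.
trace = DataPathTrace(payload="citizen enquiry text", data_category="personal")
print(open_questions(trace))
print(ready_to_evaluate_model(trace))
```

The point of the shape is that an unanswered question is visible as a gap rather than silently defaulting to "probably fine". The answers themselves still come from contracts and from whoever owns governance, not from the code.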
What this looks like
Consider a government services team that wants to use an LLM to help caseworkers draft responses to citizen enquiries. The capability fits well; the drafts would genuinely save time. The build is straightforward.
The data path is where the project actually lives. The enquiries contain personal information, some of it sensitive, collected by a public agency under specific authority and purpose. Sending it to a general-purpose model means sending citizen data to a third party, likely processed offshore, possibly retained, under terms the agency doesn’t control. That isn’t a deployment detail to confirm near launch. It is the first question, and depending on the answer it reshapes everything downstream. It might mandate a model hosted in-country under the agency’s control, or a contractual guarantee of no retention and no training, or a decision that this particular data can’t be used this way at all and the feature is scoped to something less sensitive. Each of those is a different project with a different cost, and you want to know which one you’re building before you build it.
This is sharper in Australia than the global discourse assumes. Data sovereignty isn’t a vague preference here; for many public-sector and regulated workloads it’s a requirement, and “hosted in a region” is not the same as “subject only to Australian jurisdiction”: a provider headquartered overseas can remain answerable to its home country’s legal process even when the data sits in an Australian data centre. The Privacy Act 1988, sector-specific rules for health and financial information, and government security classifications all attach duties to data that follow it wherever it goes, including into a prompt. The model being hosted overseas is a legal fact about your data flow before it’s a technical fact about your latency.
What it means in practice
Make the data path part of the initial scope of any AI feature, with the same seriousness as the capability. Bring whoever owns data governance into the room at the start, not as a gate at the end, because their answer changes what you should build, and finding out late is how a finished feature becomes an unshippable one. When you assess a provider, verify the data handling terms rather than assuming the reasonable thing: retention and training policies vary, and the defaults are not always what a regulated team needs.
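To make "verify rather than assume" concrete, here is a small sketch of a provider check in which an unverified term counts the same as a failed one. The term names in `REQUIRED_TERMS` are hypothetical placeholders; a real list would come from whoever owns data governance and from the provider's actual contract, not from this example.

```python
from typing import Optional

# Hypothetical requirement names, for illustration only.
REQUIRED_TERMS = ("no_retention", "no_training_on_inputs", "in_country_processing")


def unmet_terms(verified: dict[str, Optional[bool]]) -> list[str]:
    """Return every required term that is not confirmed True.

    A missing key or a None value means 'not verified yet' and is
    treated exactly like a term the provider fails.
    """
    return [term for term in REQUIRED_TERMS if verified.get(term) is not True]


# Usage: retention is confirmed in writing, training use is only assumed,
# and processing location has not been looked at.
print(unmet_terms({"no_retention": True, "no_training_on_inputs": None}))
```

The design choice is the three-state answer: confirmed, refused, or unknown. Collapsing unknown into "fine" is the assumption this check is meant to rule out.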
None of this is a reason to move slowly out of fear. It’s a reason to answer the boring question first, because on an accountable team the boring question is frequently the one that determines whether the exciting one ever gets a real answer.
Next: even when the data path is clean and the system is sound, the model itself behaves like a particular kind of colleague, a capable junior who never actually learns.