The pet boarding group has two sites, one in Brisbane and one on the Gold Coast. I wanted to know the blended occupancy for the previous month, a simple management question. I pulled the data into an AI tool, asked it for the average occupancy across both sites, and got back a confident answer of 55%. The actual number was 70%.

The AI hadn't hallucinated. It had done exactly what I asked: it averaged the two percentage values. But occupancy percentages can't be averaged that way. Brisbane had 800 room-nights of capacity and ran at 80% occupancy; Gold Coast had 200 room-nights and ran at 30%. The correct blended occupancy weights each site by its room-night capacity: (800 × 0.80 + 200 × 0.30) / 1,000 = 700 / 1,000 = 70%. The AI's answer, (80% + 30%) / 2 = 55%, treated the sites as equal-weight when they weren't. Getting to 70% requires the analyst to know that occupancy is a rate, not a quantity, and that rates can't be averaged without weights.
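
The difference between the two calculations can be sketched in a few lines of Python. This is a minimal illustration using the capacity and occupancy figures quoted above; the variable and function names are our own and are not part of any tool mentioned here.

```python
# Site figures as quoted in the anecdote: (capacity in room-nights, occupancy rate).
sites = {
    "Brisbane": (800, 0.80),
    "Gold Coast": (200, 0.30),
}

def unweighted_average(sites):
    """Average the percentages directly, treating every site as equal-sized."""
    rates = [rate for _, rate in sites.values()]
    return sum(rates) / len(rates)

def blended_occupancy(sites):
    """Total occupied room-nights divided by total available room-nights."""
    occupied = sum(capacity * rate for capacity, rate in sites.values())
    available = sum(capacity for capacity, _ in sites.values())
    return occupied / available

print(f"unweighted: {unweighted_average(sites):.1%}")
print(f"blended:    {blended_occupancy(sites):.1%}")
```

The second function is the one a management report needs: it never averages percentages, it only divides one total by another.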

The AI didn't know that. It couldn't know that. The data didn't tell it.

The problem isn’t AI. It’s that without an integrated, verified data layer, AI has nothing good to work with. The same infrastructure that catches variance errors before they land is the infrastructure that lets AI reason over clean facts.

How AI surfaces what was already broken

LLMs don't hallucinate randomly. They hallucinate from gaps in their context. When you ask an AI to analyse financial data and the underlying data is structurally inconsistent (wrong aggregation classes, sign convention errors, double-counted transactions, mislabelled categories), the AI confidently reports answers that look right but aren't.

For most SMEs this is a new problem because most SMEs didn't have an analyst querying their financial data weekly until AI made it cheap to do so. The data integrity issues were always there. They were just hidden because nobody was looking. Now everyone with a Claude or GPT subscription can ask their books questions in natural language. And everyone's books are now answering those questions, sometimes correctly, sometimes catastrophically wrong, with no surface indication of which.

The four failure modes we see repeatedly

Across the engagements we audit, four data integrity patterns produce the majority of AI-generated wrong answers:

1 · Aggregation class errors

This is the failure mode in the opening anecdote. Some financial values are flows (revenue, expenses, cash movements) — they can be summed across time and across business units. Some are stocks (cash balance, AR balance, inventory) — they can be summed across business units but not across time without explicit period semantics. Some are ratios (gross margin %, debtor days, current ratio) — they can't be summed at all, and they can only be averaged when weighted by the denominator. Some are rates (occupancy %, utilisation %, capacity %) — same problem as ratios. Some are distinct counts (unique customers, active subscriptions) — they can be summed across business units only if there's no overlap between the underlying sets.

When the data layer doesn't enforce these classes, AI tools cheerfully aggregate values that shouldn't be aggregated and the analyst doesn't notice because the AI presents its answer with the same confidence whether the underlying operation was valid or not.
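
Two of these classes are easy to get wrong in code because the invalid operation runs without complaint. The sketch below uses made-up figures and hypothetical customer ids to show a stock summed across time and a distinct count summed across sites; nothing here comes from a real client.

```python
# Illustrative figures only.
monthly_revenue = [120_000, 135_000, 128_000]   # flow: one value per month
month_end_cash = [40_000, 55_000, 47_000]       # stock: balance at each month end

quarter_revenue = sum(monthly_revenue)          # valid: flows add across time
quarter_end_cash = month_end_cash[-1]           # one valid reading of a stock: the closing value
nonsense_cash = sum(month_end_cash)             # runs fine, but the result means nothing

# Distinct counts: customers seen at each site, keyed by a hypothetical customer id.
brisbane_customers = {"C001", "C002", "C003", "C004"}
gold_coast_customers = {"C003", "C004", "C005"}

naive_total = len(brisbane_customers) + len(gold_coast_customers)  # counts C003 and C004 twice
true_total = len(brisbane_customers | gold_coast_customers)        # set union removes the overlap

print(quarter_revenue, quarter_end_cash, nonsense_cash)
print(naive_total, true_total)
```

Taking the closing balance is only one possible period rule for a stock (an average balance is another); the point is that the rule has to be chosen explicitly rather than defaulted to a sum.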

2 · Sign convention errors

Refunds entered as positive numbers rather than negative. Expense reductions entered as expenses rather than credits. Returns booked to the wrong side of the ledger. Cross-checked by a human accountant familiar with the business, these errors usually fail the smell test — "wait, that customer's revenue can't be $50k positive this month, they cancelled and got a refund." Cross-checked by an AI relying on the data being structurally valid, they propagate into confident wrong answers.

The compounding consequence: when the AI summarises monthly performance for the board, the sign errors silently push the numbers in the wrong direction without anything in the output flagging that the input was malformed.
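
A check for this kind of error can be small. The following is a minimal sketch, assuming each transaction carries a kind label and a signed amount, and assuming a convention where money in is positive and money out is negative. The `Txn` type, the kind names and the convention table are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    txn_id: str
    kind: str       # e.g. "sale", "refund", "expense", "expense_credit"
    amount: float   # signed, from the business's point of view

# Assumed convention for this sketch: money in is positive, money out is negative.
EXPECTED_SIGN = {"sale": 1, "refund": -1, "expense": -1, "expense_credit": 1}

def sign_violations(txns):
    """Return the transactions whose sign contradicts the convention for their kind."""
    bad = []
    for t in txns:
        expected = EXPECTED_SIGN.get(t.kind)
        if expected is None or t.amount == 0:
            continue  # unknown kinds and zero amounts are skipped in this sketch
        if (t.amount > 0) != (expected > 0):
            bad.append(t)
    return bad

txns = [
    Txn("T1", "sale", 1200.0),
    Txn("T2", "refund", 50_000.0),   # refund entered as a positive number
    Txn("T3", "expense", -300.0),
    Txn("T4", "refund", -80.0),
]
flagged = sign_violations(txns)
print([t.txn_id for t in flagged])
```

Run before any summary is generated, a check like this turns a silent input error into an explicit list for a person to review.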

3 · Double-counting across systems

Modern SME finance stacks have multiple sources: a vertical PMS (Pet Manager, eSkilled, ServiceM8), Xero for general ledger, Stripe for payment processing, sometimes a separate invoicing system. A single transaction often lives in two or three of these. Without explicit deduplication rules at the data layer, AI tools count the same dollar in multiple buckets — and the analyst doesn't notice because the totals look "about right."

In one engagement we audited, revenue had been over-reported by 14% for an entire financial year because Stripe payments and the corresponding Xero invoices were both being counted as separate revenue events. The bookkeeper hadn't noticed because each individual reconciliation looked clean. The AI hadn't noticed because nothing in the data told it to deduplicate.
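
The mechanics of that double count can be illustrated with a toy example. The records below are hypothetical and heavily simplified: real Stripe and Xero exports have different shapes, and `payment_ref` is an assumed shared key, not a field either system guarantees. The amounts are invented and are not the figures from the engagement.

```python
stripe_payments = [
    {"payment_ref": "P-1001", "amount": 250.0},
    {"payment_ref": "P-1002", "amount": 400.0},
]
xero_invoices = [
    {"invoice_no": "INV-1", "payment_ref": "P-1001", "amount": 250.0},
    {"invoice_no": "INV-2", "payment_ref": "P-1002", "amount": 400.0},
    {"invoice_no": "INV-3", "payment_ref": None, "amount": 150.0},  # paid outside Stripe
]

def naive_revenue(payments, invoices):
    """Counts every record in both systems as its own revenue event."""
    return sum(p["amount"] for p in payments) + sum(i["amount"] for i in invoices)

def deduplicated_revenue(payments, invoices):
    """Treat the invoice as the revenue event; add only payments no invoice refers to."""
    invoiced_refs = {i["payment_ref"] for i in invoices if i["payment_ref"]}
    orphan_payments = [p for p in payments if p["payment_ref"] not in invoiced_refs]
    return sum(i["amount"] for i in invoices) + sum(p["amount"] for p in orphan_payments)

print(naive_revenue(stripe_payments, xero_invoices))
print(deduplicated_revenue(stripe_payments, xero_invoices))
```

Which system is treated as the source of truth is a design decision for the accountant. The sketch picks the invoice, but the important part is that the rule is written down once at the data layer instead of being left to whoever, or whatever, runs the query.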

4 · Mislabelled categories

Cost-of-goods entered to operating expense. Capital expenditure entered to repairs and maintenance. Owner remuneration entered to consultant fees. The bookkeeping is "close enough" for monthly reporting — the totals on the P&L still match the bank statement — but it produces structurally wrong gross margin, wrong EBITDA, wrong capital-intensity ratios when an AI computes them naively.
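
A small worked example shows why the totals can match while the ratios do not. The figures are illustrative only, and the three-line P&L is a deliberate simplification.

```python
def margins(revenue, cogs, opex):
    """Return (gross margin, net margin) as fractions for a simple three-line P&L."""
    gross_profit = revenue - cogs
    net_profit = gross_profit - opex
    return gross_profit / revenue, net_profit / revenue

# Illustrative figures only.
revenue = 1_000_000
true_cogs, true_opex = 450_000, 400_000

# Same total spend, but 150k of cost of goods has been coded to operating expense.
miscoded = 150_000
booked_cogs, booked_opex = true_cogs - miscoded, true_opex + miscoded

true_gm, true_nm = margins(revenue, true_cogs, true_opex)
booked_gm, booked_nm = margins(revenue, booked_cogs, booked_opex)

print(f"gross margin: true {true_gm:.0%}, as booked {booked_gm:.0%}")
print(f"net margin:   true {true_nm:.0%}, as booked {booked_nm:.0%}")
```

Net margin is identical in both cases, which is why the bank reconciliation never flags the problem. Gross margin moves by fifteen points, and any tool computing it from the booked categories will report the wrong figure.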

This is the failure mode that bites hardest at deal time. A buyer's quality-of-earnings (QoE) team runs the numbers, gets a different EBITDA than the vendor's broker has been quoting, and the deal stalls. Or worse, the deal closes, the buyer discovers the discrepancy in the first quarter post-close, and litigation follows.

The data integrity problems were always there. AI just made them visible by exposing them to people who didn't know enough about the underlying structure to catch the inconsistencies.

The structural fix: verified aggregation at the data layer

The way to make AI tools useful on financial data is to encode the aggregation rules at the data layer, not the analysis layer. Specifically: every field should know what aggregation class it belongs to (flow, stock, ratio, rate, distinct) and the analysis layer should enforce that rule at compile time, before the AI ever sees the data.
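
One way to picture a field that "knows" its class is a small registry consulted before any query runs. The sketch below is an assumption-laden illustration, not a description of any particular product: the field names, the operation names and the table of permitted operations (including "last" as the period rule for stocks) are all hypothetical.

```python
from enum import Enum

class AggClass(Enum):
    FLOW = "flow"
    STOCK = "stock"
    RATIO = "ratio"
    RATE = "rate"
    DISTINCT = "distinct"

# Hypothetical field names; a real library would carry units and lineage as well.
FIELDS = {
    "revenue": AggClass.FLOW,
    "cash_balance": AggClass.STOCK,
    "gross_margin_pct": AggClass.RATIO,
    "occupancy_pct": AggClass.RATE,
    "unique_customers": AggClass.DISTINCT,
}

# Which (operation, axis) pairs each class permits in this sketch.
ALLOWED = {
    AggClass.FLOW: {("sum", "time"), ("sum", "unit")},
    AggClass.STOCK: {("sum", "unit"), ("last", "time")},
    AggClass.RATIO: {("weighted_mean", "time"), ("weighted_mean", "unit")},
    AggClass.RATE: {("weighted_mean", "time"), ("weighted_mean", "unit")},
    AggClass.DISTINCT: {("union_count", "time"), ("union_count", "unit")},
}

def check_aggregation(field, op, axis):
    """Raise before any query runs if the requested aggregation is invalid for the field."""
    agg_class = FIELDS[field]
    if (op, axis) not in ALLOWED[agg_class]:
        raise ValueError(f"{op} over {axis} is not valid for {field} ({agg_class.value})")

check_aggregation("revenue", "sum", "time")   # passes silently
```

A request such as `check_aggregation("occupancy_pct", "mean", "unit")` raises a `ValueError`, so the plain average never reaches the AI tool or the report.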

This is the principle behind verified computation. When occupancy is typed as a rate, the system refuses to sum it. When debtor days is typed as a ratio, the system refuses to average it without an explicit weighting. When revenue is typed as a flow, the system permits summing across time but enforces unit consistency. The AI tool never gets the chance to make the wrong call because the wrong call isn't expressible in the type system.
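
The idea of a wrong call being inexpressible can be approximated even in a dynamically typed language. The sketch below is a runtime analogue in Python, not a compile-time guarantee: a hypothetical `Rate` type refuses addition and can only be combined through a denominator-weighted `blend` function. Both names are our own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rate:
    """A rate such as occupancy, stored with the denominator it was measured over."""
    value: float        # e.g. 0.80 for 80%
    denominator: float  # e.g. available room-nights

    def __add__(self, other):
        raise TypeError("rates cannot be summed; use blend() with denominators")

    __radd__ = __add__  # also blocks sum([...]), which starts from 0 + first item

def blend(rates):
    """Denominator-weighted mean of several rates, returned as a new Rate."""
    total = sum(r.denominator for r in rates)
    numerator = sum(r.value * r.denominator for r in rates)
    return Rate(numerator / total, total)

brisbane = Rate(0.80, 800)
gold_coast = Rate(0.30, 200)
group = blend([brisbane, gold_coast])
print(f"{group.value:.1%} over {group.denominator:.0f} room-nights")
```

Writing `brisbane + gold_coast` or `sum([brisbane, gold_coast])` raises a `TypeError`. In a statically typed language the same design would reject the expression before the program runs, which is closer to the compile-time enforcement described above.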

This is also in the spirit of APES 110 and APES 320, two of the Accounting Professional & Ethical Standards that apply to Chartered Accountants in Australia: APES 110 is the Code of Ethics, and APES 320 covers quality management for firms providing non-assurance services. The working principle we take from them is that every output value should trace back to the rule and the source transactions that fed it. The auditor's question "where did this number come from?" should have a one-click answer, not a week-long investigation.
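
As a rough illustration of what that traceability might look like in data, the sketch below attaches the rule and the contributing transaction ids to a computed total. It is our own minimal example, not something the standards prescribe; the `Traced` type, the rule text and the invoice numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Traced:
    """A reported value plus the rule and source transactions that produced it."""
    value: float
    rule: str
    source_ids: tuple

def traced_sum(rule, txns):
    """Sum (txn_id, amount) pairs and keep the ids that fed the total."""
    return Traced(
        value=sum(amount for _, amount in txns),
        rule=rule,
        source_ids=tuple(txn_id for txn_id, _ in txns),
    )

march_revenue = traced_sum(
    "revenue = sum of invoice amounts dated in period",
    [("INV-101", 1200.0), ("INV-102", 800.0), ("INV-103", 450.0)],
)
print(march_revenue.value, march_revenue.rule, march_revenue.source_ids)
```

With the ids carried alongside the number, the answer to "where did this come from?" is a lookup, not a reconstruction.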

What this looks like in practice

Newport Pembury runs every client's data through our Field Library — a set of 150+ canonical financial field definitions, each with its aggregation class, unit semantics, and audit lineage baked in. The five aggregation classes (flow, stock, ratio, rate, distinct) are first-class citizens of the data layer, not afterthoughts of the analyst's mental model. Every report we produce is bound to those definitions at compile time. The model can't hallucinate a calculation that breaks them because the calculation literally won't compile.

The compounding consequence is the part that matters most over time. As we work with you, your data structure gets cleaner. Six months in, your financial reporting is structurally sound at every aggregation layer. Your board pack reconciles. Your AI tools work with you instead of against you. Your auditor's questions resolve in minutes rather than weeks.

This is what we mean by "AI handles the volume, a CA handles the decisions." The AI is genuinely useful — but only when the decisions about how data should aggregate are made by a human who understands the regulatory context, encoded into the data layer, and enforced before the AI ever runs a query. Without that structural discipline, AI on financial data is a faster way to be wrong with confidence.

What we do with the Field Library

Every Newport Pembury retainer client gets their financial data bound to our Field Library — 150+ canonical definitions with aggregation classes enforced at compile time. The result: AI tools that work correctly because the data layer won't let them work incorrectly. Available as part of any retainer engagement, anchored by a Financial Systems Review where we identify, improve, integrate, and where needed build the financial intelligence, SOPs, and ways of working your business runs on.
