In the last post we argued that pure-AI vendors will fail at family-office operations work, and pure-human shops will keep losing on cost. This post is about the actual combination that works, and what it looks like operationally.
We’ll be direct: “human-in-the-loop AI” is one of the most overused phrases in fintech. Most of the time it means “we have a chatbot and an offshore reviewer.” That is not what we’re talking about.
What we’re talking about is structurally different. It’s a model where domain experts and AI agents operate as a single team, where the AI does what AI is good at, the human does what only a human can do, and the workflow is engineered so neither one is doing the other’s job.
Why humans alone don’t scale
We covered this in the last post. Family-office back-office labor runs $200K–$500K a year per relationship. The complexity of HNW balance sheets is increasing faster than the analyst supply. McKinsey’s projected advisor shortfall of 90,000–110,000 by 2034 tells you the labor side is structurally constrained. You cannot fix this by hiring more people. The people don’t exist.
Why AI alone is too sloppy
This is the harder thing to say in 2026, because every keynote in wealth right now is some version of “AI is going to do everything.” So let us be specific.
Bucket one: extraction accuracy. State-of-the-art document extraction on financial documents is excellent. It is not 100%. It is high-90s on clean docs, lower on messy ones. In financial data, a 96% accuracy rate means 4 out of 100 numbers are wrong.
At one of the investment banks our team spent time at, the saying in training was: “In college a 93% is an A. Here, that’s unacceptable.” The other line thrown around went like this. Remember the time you got way too into a simple science fair project and ended up building an entire bridge out of popsicle sticks? It actually worked, it held something like 300 pounds, and when you presented it in class you got a 105% even though bonus points weren’t really an option. Then you realized the prompt was “not that deep,” as the kids would say, and you could have done 10% of the work and still gotten a 95%. Well, that 105% obsession is the expectation here every single time. Here, that’s just 100%. That’s just expected.
That standard is what a UHNW individual needs and what a family office demands.
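To make the arithmetic concrete, here is a minimal sketch in Python. It assumes every extracted field is wrong independently and with the same probability, which is a simplification (real errors tend to cluster on messy documents). The 50-field document size and the function names are illustrative assumptions, not figures or code from our own pipeline.

```python
def expected_errors(num_fields: int, accuracy: float) -> float:
    """Expected number of wrong fields, assuming independent errors."""
    return num_fields * (1.0 - accuracy)


def prob_all_correct(num_fields: int, accuracy: float) -> float:
    """Probability that every field in one document is right."""
    return accuracy ** num_fields


if __name__ == "__main__":
    # Per-field accuracy levels chosen for illustration only.
    for accuracy in (0.96, 0.99, 0.999):
        errors = expected_errors(100, accuracy)
        clean = prob_all_correct(50, accuracy)
        print(
            f"{accuracy:.1%} per field: {errors:.1f} wrong per 100 numbers, "
            f"{clean:.1%} chance a 50-field document is fully clean"
        )
```

Under those assumptions, 96% per-field accuracy leaves only about a 13% chance that a 50-field document comes through with no errors at all. That is why per-field accuracy that sounds excellent still is not good enough on its own.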
Bucket two: entity disambiguation. An AI can extract “Aspen Ranch LLC” from a K-1. It cannot, without context, reliably know that “Aspen Ranch LLC” is the same entity as “Aspen Ranch, L.L.C.,” which is owned 60% by the family’s 2014 grantor trust and 40% by the founder personally. That mapping is judgment. Humans hold the judgment. (AI folks will argue with us on this, and while we actually agree that there are ways this can be done, you get the point.)
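A rough sketch of where the line falls. Matching the two spellings is the easy, mechanical part. Knowing who owns the entity has to come from a registry that a person maintains. Everything named here (`normalize_entity_name`, `EntityRecord`, `REGISTRY`, the `ENT-001` identifier) is hypothetical and for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


def normalize_entity_name(raw: str) -> str:
    """Lowercase the name and strip punctuation so spelling variants match."""
    without_dots = raw.replace(".", "")            # "L.L.C." -> "LLC"
    no_punct = re.sub(r"[^\w\s]", " ", without_dots)
    return " ".join(no_punct.lower().split())      # collapse whitespace


@dataclass
class EntityRecord:
    entity_id: str
    ownership: Dict[str, float]  # owner -> fraction held


# Hypothetical registry, keyed by normalized name and maintained by a human.
REGISTRY: Dict[str, EntityRecord] = {
    "aspen ranch llc": EntityRecord(
        entity_id="ENT-001",
        ownership={"2014 grantor trust": 0.60, "founder (personal)": 0.40},
    ),
}


def resolve(raw_name: str) -> Optional[EntityRecord]:
    """Return the known entity, or None to signal 'route to a human'."""
    return REGISTRY.get(normalize_entity_name(raw_name))


if __name__ == "__main__":
    for name in ("Aspen Ranch LLC", "Aspen Ranch, L.L.C.", "Aspen Ridge LLC"):
        record = resolve(name)
        print(name, "->", record.entity_id if record else "needs human review")
```

The normalization handles the string problem. The ownership split and the decision that an unknown name is a new entity (or a typo of an old one) still sit with the person who maintains the registry.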
Bucket three: exception handling. Real client data is 80% routine, 20% exception. AI handles the 80% beautifully. The 20% is where you need a human who has seen this before: the GP that’s restating last year’s distributions, the K-1 that arrived in October with a different EIN than last year’s, the wire that looks like a capital call but is actually a fee.
Bucket four: defensibility. McKinsey’s 2025 research on wealth AI is clear: the value of human advisor judgment is increasing, not decreasing, as AI commoditizes the technical work. Their data shows nearly 80% of affluent households still prefer human relationships for financial decisions. Behind every AI output that touches a tax return or a fiduciary decision, there needs to be a human whose name is on it.
What the combination actually looks like
The operational model that produces clean, defensible, structured financial data at scale looks like this:
(1) AI agents do the volume work. Document intake, classification, extraction, normalization, first-pass reconciliation, routine flagging. The thousand-task work that used to occupy 70% of an analyst’s day.
(2) Humans — domain experts, not generalists — do the judgment work. Entity mapping. Exception resolution. Anything where the answer is “it depends on a fact pattern.” The hundred-task work that used to occupy 30% of an analyst’s day but got squeezed because the volume work ate the calendar.
(3) The handoff between the two is engineered, not improvised. Every AI action is logged. Every human review is logged. Every exception is captured so the AI gets better. The system has memory.
(4) The output is a structured database, not a PDF. The result of the combined work is not a report — it is a queryable, permission-able, render-anywhere data layer.
(5) You sign off. The AI did it. The human refined and checked it. Then you, whoever you are, need to sign off formally on that work: sign the tax return, represent that it’s right. But it’s a lot easier to trust and verify than to bang your head against a wall while holding K-1s.
This is structurally different from “we use AI to draft reports and a human reviews them.” It is closer to how a modern radiology practice works: AI reads every scan first, the radiologist focuses on the ambiguous cases, the throughput goes up, the error rate goes down, and the radiologist’s time gets reallocated to the work only they can do.
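The engineered handoff in the list above can be sketched in a few lines. This is a toy illustration, not our production system. It assumes the extraction step emits a confidence score per field, and the `0.98` threshold, the class names, and the item IDs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.98  # hypothetical cutoff; below this a human looks at it


@dataclass
class Extraction:
    item_id: str
    field_name: str
    value: str
    confidence: float


@dataclass
class LogEntry:
    item_id: str
    actor: str    # "ai" or a reviewer's name
    action: str   # extracted / auto_accepted / flagged / confirmed / corrected
    detail: str
    at: datetime


class Handoff:
    """Routes AI output to auto-accept or human review and logs every step."""

    def __init__(self, threshold=REVIEW_THRESHOLD):
        self.threshold = threshold
        self.log = []            # append-only list of LogEntry
        self.review_queue = []   # Extractions waiting for a human
        self.accepted = {}       # item_id -> final value

    def _record(self, item_id, actor, action, detail=""):
        now = datetime.now(timezone.utc)
        self.log.append(LogEntry(item_id, actor, action, detail, now))

    def submit(self, ex):
        """AI side: log the extraction, then accept it or flag it."""
        detail = f"{ex.field_name}={ex.value} conf={ex.confidence:.3f}"
        self._record(ex.item_id, "ai", "extracted", detail)
        if ex.confidence >= self.threshold:
            self.accepted[ex.item_id] = ex.value
            self._record(ex.item_id, "ai", "auto_accepted")
        else:
            self.review_queue.append(ex)
            self._record(ex.item_id, "ai", "flagged", "below threshold")

    def review(self, item_id, reviewer, final_value, note=""):
        """Human side: confirm or correct a flagged item, and log it."""
        ex = next(e for e in self.review_queue if e.item_id == item_id)
        self.review_queue.remove(ex)
        self.accepted[item_id] = final_value
        action = "confirmed" if final_value == ex.value else "corrected"
        self._record(item_id, reviewer, action, note)

    def corrections(self):
        """Logged human corrections, i.e. the raw material for improving the AI."""
        return [entry for entry in self.log if entry.action == "corrected"]
```

The point of the sketch is the shape: nothing reaches `accepted` without a log entry, and every human correction is kept so it can be fed back into the extraction step later. How that feedback is used (retraining, rules, prompt changes) is deliberately left out here.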
Why this is hard to build
If this were easy, every wealth platform would have already done it. There are three reasons it’s hard.
One: domain expertise is rare. You cannot build the AI well unless your team includes people who have actually closed a family-office quarter. CPAs, ex-fund administrators, ex-trust officers. That talent does not sit naturally inside software companies. Finance folks who are any good make a ton of money. Getting them into a consultative AI role at a startup? Good luck. They have to really believe that the things written here matter.
Two: the AI tooling has only recently become good enough. Modern LLMs paired with retrieval-augmented generation and structured extraction frameworks are about 18 months into being production-ready for financial documents. Earlier attempts failed. The thing that’s different now is real.
Three: the business model is unfamiliar. Software companies want gross margins north of 75%. Service companies live at 30–50%. A White Glove × AI company looks like a hybrid: services-flavored in year one, software-margined in year three, as the AI absorbs more of the volume. Investors and acquirers have had a hard time pricing that shape. They’re getting better at it.
Think about the downstream impacts. It’s a beautiful thing that Thoma, Vista, and Silver Lake invested for so long in the stability of large ACVs on three-year contracts with 5% escalators. That worked. It made a lot of money. Now suppose someone accepts that they’re going to run a services business (really a “SaS” business, Services as a Software, though we have a different proprietary internal name for it that isn’t presented here) and take all the margin out of the industry. How exactly do you break those three-year contracts and compete on your side? You renegotiate one and suddenly the house of cards really slips. These things are all related and these people all know each other; what anyone pays isn’t a secret, after all. It’s like a really tan jacket without a logo in Milan: if you know it’s a Brunello, you know. If you don’t, you just think that guy looks great over there and wonder why.
The Anthropic / MCP angle
A quick aside, because it matters for where this goes.
When Anthropic released the Model Context Protocol in November 2024, and OpenAI, Google, Microsoft, Bloomberg, and the rest of the ecosystem rallied around it, they were quietly endorsing the architecture we’re describing. MCP assumes a world where AI agents read from structured, permissioned data sources — not from PDFs and screenshots. That world only works if someone has built the structured data layer underneath. Anthropic added MCP support specifically for financial services and insurance organizations last quarter. The protocol is open. The data underneath has to be assembled. That assembly is exactly the work we’re describing.
What this means for the buyer
If you’re evaluating an “AI” vendor for family-office or RIA operations work in 2026, here are the questions that separate the real ones from the demos:
(1) Show me your exception workflow. How does a human review a flagged item, and how does that review feed back into the model?
(2) Show me your audit trail. For any given number on the dashboard, can I trace it back to the source document, the extraction step, the human review (if any), and the timestamp?
(3) Who on your team has actually closed a family-office quarter? Names. Backgrounds.
(4) What happens when the IRS asks a question about a number from three years ago?
The vendors who can answer those questions cleanly are the ones running the model that works. The ones who pivot to “our AI is very advanced” are running the model that doesn’t.
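For the audit-trail question in particular, it helps to picture what a good answer looks like in data terms. The sketch below is one possible shape, under our own assumptions. The field names, the sample document name, and the `trace` helper are hypothetical, not any vendor’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Provenance:
    source_document: str                   # file name or document ID
    page: int
    extraction_step: str                   # pipeline step that produced the value
    extracted_at: datetime
    reviewed_by: Optional[str] = None      # None means no human review
    reviewed_at: Optional[datetime] = None


@dataclass(frozen=True)
class DashboardNumber:
    label: str
    value: float
    provenance: Provenance


def trace(number: DashboardNumber) -> List[str]:
    """Walk one dashboard number back to its source, step by step."""
    p = number.provenance
    lines = [
        f"{number.label} = {number.value:,.2f}",
        f"source: {p.source_document}, page {p.page}",
        f"extracted by: {p.extraction_step} at {p.extracted_at.isoformat()}",
    ]
    if p.reviewed_by is None or p.reviewed_at is None:
        lines.append("human review: none")
    else:
        lines.append(
            f"human review: {p.reviewed_by} at {p.reviewed_at.isoformat()}"
        )
    return lines


if __name__ == "__main__":
    number = DashboardNumber(
        label="Aspen Ranch LLC ordinary income",
        value=12500.0,
        provenance=Provenance(
            source_document="example_k1.pdf",  # made-up file name
            page=1,
            extraction_step="k1_box_extraction",
            extracted_at=datetime(2026, 3, 2, 14, 5, tzinfo=timezone.utc),
            reviewed_by="reviewer_a",
            reviewed_at=datetime(2026, 3, 3, 9, 30, tzinfo=timezone.utc),
        ),
    )
    print("\n".join(trace(number)))
```

If a vendor can produce the equivalent of `trace` for any number you point at, including one from three years ago, the rest of the conversation gets much shorter.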
The reframe
Humans alone are too expensive and don’t scale. AI alone is too sloppy and not defensible. The combination — engineered, with real domain experts, with a logged handoff, producing a structured data layer rather than a report — is what works. That is the operational thesis. Everything else in this series sits on top of it.
Real Work. Real Moat. Oh — and that is scalable, actually.