AI Spend in Financial Services: Cost Just Became a Governance Layer

On 19 May 2026 Databricks shipped AI spend controls in Unity AI Gateway: budgets for AI usage set per user, per use case, per workspace, or per account. The feature is useful. The more interesting signal is where Databricks put it. Cost stopped being a line on a cloud bill and became something you govern, next to access and audit.

Views are my own and do not represent IBM. This piece reflects personal analysis of public information; nothing here references confidential client work.

The short version

Databricks added AI spend controls to Unity AI Gateway on 19 May 2026: budget alerts per user, per use case, per workspace, and per account, with every request logged to Unity Catalog system tables priced in DBUs, not just token counts.
Agentic AI cost does not grow in a straight line. Fan-out, tool calls, and retry loops make it super-linear. Databricks names the failure mode plainly: a retry loop can multiply a bill tenfold overnight.
For financial services that is a governance gap, not a finance one. Banks already govern who can reach which data and which decisions get logged. Spend was the dimension that lived on a separate bill, discovered a month late.
Putting budgets in the gateway makes spend a governed dimension: attributable to a desk or use case, capped where a risk owner signs off, queryable in the same place as access and audit.
Caps are guardrails, not a strategy. The real work is unit economics: cost per decision, cost per resolved case. The institution that wins defends its AI spend rather than simply minimising it.

The cost curve was never a line

Most AI budgets rest on a straight-line assumption: price per token, times expected volume, plus a margin. That arithmetic held when AI meant a person typing into a chat box. It does not hold for agents.

An agentic workflow does not make one model call. A single user request enters an orchestrator, fans out to specialist agents, and each of those calls models and tools, often several times over. Then add retries. When a step fails and the system tries again with more context, the next attempt costs more than the last. Databricks describes the failure mode directly in its announcement: retry logic that multiplies cost tenfold overnight, and developer experimentation across multi-agent scenarios that burns a monthly budget in days.
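To make the arithmetic concrete, here is a minimal Python sketch of the two cost models. Every name and number in it is an illustrative assumption of mine, not a Databricks figure: four specialist agents, three calls each, two retries per call, and context that doubles on each retry. It models the worst case, where every call exhausts its retries.

```python
def linear_tokens(requests: int, tokens_per_call: int) -> int:
    """Chat-style budget: one model call per request."""
    return requests * tokens_per_call


def agentic_tokens(requests: int, agents: int, calls_per_agent: int,
                   tokens_per_call: int, retries: int,
                   context_growth: int) -> int:
    """Fan-out budget, worst case: every call uses all of its retries."""
    # Attempt k (k = 0 is the first try) resends the context grown by
    # context_growth**k, so each retry costs more than the one before it.
    tokens_with_retries = sum(
        tokens_per_call * context_growth ** k for k in range(retries + 1)
    )
    return requests * agents * calls_per_agent * tokens_with_retries


# Hypothetical workload: 1,000 requests at 2,000 tokens per call.
chat = linear_tokens(1_000, 2_000)
agentic = agentic_tokens(1_000, agents=4, calls_per_agent=3,
                         tokens_per_call=2_000, retries=2, context_growth=2)
print(agentic // chat)  # → 84
```

The same request volume lands at 84 times the straight-line estimate under these assumptions. The exact multiple is not the point. The point is that it comes from multiplying branches and retries, which a price-times-volume budget never sees.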

Fig. 1: One request enters the orchestrator and fans out to specialist agents. Each agent calls models and tools; the reasoning agent retries. The meter fills as the billable calls multiply. Agentic cost is a branching tree, not a line.

The shape that matters is in the diagram. One request goes in. Dozens of billable calls come out, and the branches you did not plan for are the expensive ones. This is not a reason to avoid agents. It is a reason to stop budgeting them like chat. IDC has started calling the gap an "AI infrastructure reckoning" and projects that large enterprises will run AI infrastructure costs up to 30 percent over estimate by 2027. Those estimates are not wrong because people are careless. They are wrong because the cost model is linear and the system is not.

Why this lands differently in financial services

Every large institution I work with already governs three things well: who can reach which data, what each system is allowed to do, and what gets written to an audit trail. Those controls are mature because regulators require them to be.

Cost was the exception. AI spend showed up weeks later on a cloud invoice: aggregated, hard to attribute, owned by nobody in particular until finance went looking. In a bank that is not a billing inconvenience. It is a control gap. A risk committee cannot approve a system whose spend is unbounded and unattributed. Procurement cannot model a contract around a number that moves tenfold without warning. The second line cannot sign off on a process it cannot see.

So in practice the cost question has been slowing agentic AI in financial services more quietly, and more effectively, than model accuracy ever did. Teams could prove the agent worked. They could not prove what it would cost at scale, to whom, and with what ceiling. "It depends on usage" is not an answer an investment committee accepts.

Key insight

In a regulated institution, an unattributed cost is a control gap, not a billing detail. A number nobody owns is a number a risk committee cannot approve.

Spend moved into the control plane

Here is what Databricks actually did, and why the placement is the point. AI spend controls are not a separate cost tool. They sit inside Unity AI Gateway, the same layer that already governs which models and tools an agent can reach and logs every call it makes.

Budgets can be set at four levels: per user, to catch one developer's runaway agent before it reaches the bottom line; per use case, so that a coding-agent budget, for example, raises an alert once spend passes a set figure; per workspace, so production and sandbox are funded separately; and per account, as a single ceiling across every model and provider. Every request is logged to Unity Catalog system tables and priced in DBUs, with provisioned throughput, pay-per-token usage, and external provider token costs calculated for you. Spend can be grouped by identity, endpoint, model, provider, or tag.
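The four tiers can be pictured with a small Python sketch. This is not the gateway's API or its system-table schema. The record shape, the names UsageRecord and breached_budgets, and the budget figures are all hypothetical, chosen only to show how one logged request counts against four budgets at once.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical shape of one logged request (not the real schema)."""
    user: str
    use_case: str
    workspace: str
    account: str
    dbus: float


LEVELS = ("user", "use_case", "workspace", "account")


def spend_by_level(records):
    """Sum DBUs once per tier, keyed by (level, name)."""
    totals = defaultdict(float)
    for record in records:
        for level in LEVELS:
            totals[(level, getattr(record, level))] += record.dbus
    return dict(totals)


def breached_budgets(records, budgets):
    """Return the (level, name) keys whose spend exceeds their budget."""
    totals = spend_by_level(records)
    return sorted(
        key for key, limit in budgets.items() if totals.get(key, 0.0) > limit
    )


records = [
    UsageRecord("ana", "coding-agent", "sandbox", "acct", 60.0),
    UsageRecord("ben", "coding-agent", "prod", "acct", 50.0),
    UsageRecord("ana", "kyc", "prod", "acct", 10.0),
]
budgets = {
    ("user", "ana"): 100.0,
    ("use_case", "coding-agent"): 100.0,
    ("account", "acct"): 500.0,
}
print(breached_budgets(records, budgets))
# → [('use_case', 'coding-agent')]
```

No single user is over budget and the account ceiling is nowhere near, yet the use case has crossed its line. That is the case an account-level cap alone would miss.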

Fig. 2: Requests pass through the gateway into four budget tiers. Each tier tracks spend against a ceiling; the per-use-case gauge crosses its threshold and an alert fires. Cost is now governed in the same place as access and audit.

Read that as an architecture statement, not a feature list. Cost has joined access and audit as a governed dimension. It is attributable to an identity. It carries policy. It is queryable in the same place as everything else a regulator might ask about. For a financial institution that matters more than the dollar figures, because the bank already has the committees, the owners, and the review cadence for governed dimensions. It never had them for a cloud bill.

Guardrails are not a strategy

A budget alert is a good control. It is not a cost strategy, and the difference is worth being honest about.

A cap tells you when to stop. It does not tell you whether the spend was worth it. An agent that costs 40,000 dollars a month and removes two million dollars of manual review is wildly underfunded. An agent that costs 4,000 dollars a month and produces work nobody trusts is overfunded at any price. A budget alone cannot tell those two apart. It will happily throttle the first and protect the second.

The work a spend control does not do for you is unit economics. What is one agent interaction worth? The FinOps community has started using phrases like "cost per thought" and a token budget per project. In financial services the unit should be sharper than that: cost per onboarding decision, cost per resolved dispute, cost per surveillance alert cleared. Once spend is attributable, which is exactly what the gateway now makes possible, those numbers become computable. Until you compute them, a budget is just the place where the surprise stops being visible.
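Once spend is attributed, the unit-economics step is a division. The sketch below is a minimal Python illustration under stated assumptions: the two monthly spend figures echo the examples above, while the use-case names and the outcome counts are hypothetical numbers I made up for the example.

```python
from typing import Optional


def cost_per_outcome(spend: dict[str, float],
                     outcomes: dict[str, int]) -> dict[str, Optional[float]]:
    """Divide attributed spend by accepted outcomes, per use case.

    Returns None where no outcome was accepted, because a unit cost
    is undefined when the agent produced nothing anyone trusted.
    """
    unit_costs: dict[str, Optional[float]] = {}
    for use_case, dollars in spend.items():
        count = outcomes.get(use_case, 0)
        unit_costs[use_case] = dollars / count if count > 0 else None
    return unit_costs


# Monthly spend in dollars; outcome counts are hypothetical.
monthly_spend = {"dispute-resolution": 40_000.0, "draft-summaries": 4_000.0}
accepted_outcomes = {"dispute-resolution": 8_000, "draft-summaries": 0}

print(cost_per_outcome(monthly_spend, accepted_outcomes))
# → {'dispute-resolution': 5.0, 'draft-summaries': None}
```

Five dollars per resolved dispute is a number a committee can compare against the cost of manual review. The None is the more useful result: it marks spend with no defensible unit at all, which a budget cap by itself would never flag.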

The institution that wins agentic AI is not the one that spends the least. It is the one that can attribute spend to value and defend it to a committee.

What I would tell a financial-services team

Five things, in order.

1. Instrument before you cap. Turn on the system-table logging and watch real spend by use case for a few weeks before you set a single budget. A cap set on a guess is just a different guess.
2. Attribute from day one. Tag every agent and endpoint with a desk, a use case, and an owner. Spend you cannot attribute is spend you cannot defend, and that is where the audit conversation goes wrong.
3. Set the budget where a risk owner signs. The level that matters is not the account ceiling. It is the per-use-case line, because that is where a named person accepts the cost and the risk together.
4. Treat a retry loop as an incident. A tenfold overnight jump is not a billing event. It is a system that failed in a way that happened to be expensive. Alert on it, review it, fix the cause.
5. Make the system table the audit artefact. When the second line or an examiner asks what the AI cost and who incurred it, the answer should be a query against Unity Catalog, not a spreadsheet assembled under pressure.
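The retry-loop point above can be sketched as a simple spike check on daily spend per use case. This is one possible approach under my own assumptions, not a Databricks feature: the function name, the seven-day trailing median as a baseline, and the tenfold factor are all illustrative choices.

```python
from statistics import median


def spend_spikes(daily_spend: list[float], factor: float = 10.0,
                 window: int = 7) -> list[int]:
    """Return indexes of days whose spend is at least `factor` times
    the median of the preceding `window` days."""
    spikes = []
    for day in range(window, len(daily_spend)):
        baseline = median(daily_spend[day - window:day])
        # Skip days with no baseline spend to avoid flagging a cold start.
        if baseline > 0 and daily_spend[day] >= factor * baseline:
            spikes.append(day)
    return spikes


# Hypothetical series: a steady week, one tenfold day, then back to normal.
history = [100.0] * 7 + [1000.0, 100.0]
print(spend_spikes(history))  # → [7]
```

A median baseline is used here so that one expensive day does not drag the baseline up and hide the next one. Whatever the detection rule, the output should feed the incident process, not the billing queue.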

The bill became a governance problem, and that is good news

It is tempting to read AI spend controls as a finance story: another dashboard, another alert. For financial services it is better than that.

A cost that lives on a cloud bill is a number you react to. A cost that lives in the control plane, attributed and capped and logged, is a number you govern. Governance is the thing banks are genuinely good at. They have the committees, the owners, the audit cadence, and the muscle memory. The reason agentic AI spend has been hard is that it sat outside all of that. Moving it inside does not make AI cheaper. It makes AI approvable, and for most institutions approvable is the constraint that has been binding all along.

Reminder: This reflects my personal analysis and opinions. It does not represent the views, strategy, or endorsement of IBM, Databricks, Microsoft, or any other organization. All trademarks belong to their respective owners.

Want to talk architecture?

I work with financial services teams designing and governing agentic AI on Databricks and Microsoft Fabric. Happy to compare notes.