How Revolut runs AI at scale

With Nikolay Donets, Head of Machine Learning Engineering at Revolut, at RAAIS 2026.

Jun 25, 2026

Revolut’s AI assistant, AIR, can break down a customer’s spending, answer support questions, route a voice call, and pull in live financial context. At RAAIS 2026, though, Nikolay Donets, who leads machine learning engineering at the company, made the case that the assistant is the easy part. The model itself, he argued, is no longer where the difficulty lives.

The difficulty is in the control plane around it: one gateway, one governance layer, measurable fallbacks, cost controls, layered human review, and a way to run all of it inside a regulated bank that serves more than 70 million customers across over 40 countries. Revolut ships more than 200 products and has handled over a trillion dollars in transactions, with a machine learning model now in the path of almost every one of them. The leverage, in Donets’s telling, has moved from the model to everything around it.

Four constituencies, one bottleneck

For years, Revolut’s AI was classical machine learning: fraud and transaction models shipped through three internal libraries, one each for training, serving, and performance monitoring. Then, in 2022, the ground moved. Vendors began exposing large models behind an API, and suddenly you did not have to train anything to build something. Generative use cases started growing exponentially while the classical models kept running underneath.

Donets spent as much time on the people problem this created as on the technical one. Four internal groups pull in different directions: researchers who want compute and freedom to explore; builders who want one common API and to ship today; operators who want predictability, rollbacks, and cost under control; and a compliance function that owns human-in-the-loop controls, security audit, and data sovereignty. Left to themselves, every product team solves the same problems its own way, and governance fragments into tribal knowledge spread across hundreds of teams. That is expensive, and it does not scale.

Govern the use case, build one gateway

Rather than govern each model one by one, Revolut made two moves that changed the shape of the problem. The first shifted the unit of governance to the AI use case, a move that lines up with the EU AI Act’s use-case-based view of risk, so that one set of risks, budgets, and rules can cover several models at once and match policy to context. The second put a single gateway at the center of the company, with the governance layer on top of it, rather than shipping capability as libraries each team installs for itself.
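To make the first move concrete, here is a minimal sketch of what governing by use case rather than by model could look like. Everything in it is an assumption for illustration: the `UseCasePolicy` fields, the risk-tier labels, the model names, and the budget figures are hypothetical, not Revolut's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCasePolicy:
    """One governance record per use case (all fields are hypothetical)."""
    name: str
    risk_tier: str                    # illustrative label, not a legal category
    monthly_budget_usd: float
    allowed_models: tuple[str, ...]   # several models share one set of rules
    human_review_required: bool


# Made-up policies: the use case is the key, the models hang off it.
POLICIES = {
    "support_chat": UseCasePolicy(
        "support_chat", "limited", 50_000.0, ("small-model", "large-model"), False
    ),
    "credit_decision_assist": UseCasePolicy(
        "credit_decision_assist", "high", 5_000.0, ("large-model",), True
    ),
}


def authorize(use_case: str, model: str, spent_usd: float) -> bool:
    """Allow a call only if the use case is registered, the model is
    approved for it, and the use case is still inside its budget."""
    policy = POLICIES.get(use_case)
    if policy is None:
        return False
    return model in policy.allowed_models and spent_usd < policy.monthly_budget_usd
```

Because the check is keyed on the use case, swapping one approved model for another does not require a new risk assessment, which is the point of moving the unit of governance.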

There is a tradeoff: libraries push reliability onto whichever product team owns the service, whereas a central gateway makes one team responsible for everyone. Even so, libraries are costly to improve: every improvement means cutting a release, then persuading hundreds of busy teams to upgrade and absorb breaking changes they never wanted. With one gateway, the central team ships the improvement once and every product inherits it at, in Donets’s phrase, “zero effort.” Compliance and monitoring move to the same place. As a result, Revolut runs roughly twice as many generative use cases as classical ML ones, all off that single platform.

What breaks when the model is someone else’s

Once you are renting frontier models rather than training your own, you inherit failure modes you do not control. Pay-as-you-go providers run at around 98.5% uptime, which sounds high until you count the dead service it implies for a scaled, global business: about 11 hours in a 30-day month. So Revolut wires a fallback chain into every generative product: if the primary model degrades or stops responding, traffic rolls to the next, and the next. Slightly degraded service beats no service at all.
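The fallback idea, and the arithmetic behind the uptime figure, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Revolut's gateway: `call_with_fallback`, `AllModelsFailed`, and `downtime_hours` are hypothetical names, and each model is represented by a plain Python callable.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def call_with_fallback(
    prompt: str,
    chain: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, callable) in order; return the first success
    together with the name of the model that actually answered."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure rolls over
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def downtime_hours(uptime: float, days: int = 30) -> float:
    """Hours of downtime implied by an uptime fraction over `days` days."""
    return (1.0 - uptime) * days * 24
```

Returning the name of the model that answered matters as much as the answer: it is the hook for any per-model accounting. For scale, `downtime_hours(0.985)` works out to roughly 10.8 hours over a 30-day month.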

Subtler, and more painful, was a failure they could not see at all. Because the platform watched only inputs and outputs at the interface, a model buried in the fallback chain quietly stopped working and nobody noticed. “Everything was fine, uptime was high enough, but the model itself was not functional,” Donets said. Or, as one of his slides put it: without per-model visibility, a model doing nothing looks exactly like one that works.
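One way to picture per-model visibility is a counter kept for each model in the chain rather than for the interface as a whole. The sketch below is an assumption about how such a check could work, not a description of Revolut's monitoring; `PerModelStats` and the 0.9 threshold are hypothetical.

```python
from collections import Counter


class PerModelStats:
    """Track calls and successes per model, not just at the interface."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()
        self.successes: Counter = Counter()

    def record(self, model: str, ok: bool) -> None:
        self.calls[model] += 1
        if ok:
            self.successes[model] += 1

    def unhealthy(self, expected_models, min_success_rate: float = 0.9) -> list[str]:
        """Flag models with a low success rate, and also models with no
        observed calls: no evidence is not the same as evidence of health."""
        flagged = []
        for model in expected_models:
            calls = self.calls[model]  # Counter returns 0 for unseen keys
            if calls == 0 or self.successes[model] / calls < min_success_rate:
                flagged.append(model)
        return flagged
```

A fallback that live traffic rarely reaches will show up as having no calls, which suggests probing it with synthetic requests so that its status is known before it is needed.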

Money was the other lesson, and an easier one to swallow. Teams reach for the newest and most expensive model by reflex, but most workloads are over-provisioned, and right-sizing the model to the task cuts cost by as much as eight times with no loss in quality. Donets’s rule: do not default to the newest model in production. Measure first, then use the smallest model that clears the bar.
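Donets's rule translates into a simple selection step: score each candidate on your own evaluation set, then take the cheapest one that clears the bar. The sketch below uses cost as a stand-in for model size; the `Candidate` type, the model names, and every number in the usage example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_1k_requests: float  # hypothetical unit cost
    eval_score: float            # measured on the team's own evaluation set


def smallest_that_clears(
    candidates: list[Candidate], quality_bar: float
) -> Optional[Candidate]:
    """Return the cheapest candidate whose measured score meets the bar,
    or None if nothing does."""
    passing = [c for c in candidates if c.eval_score >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_1k_requests)


# Invented numbers, purely to show the selection logic.
candidates = [
    Candidate("large", 12.0, 0.93),
    Candidate("medium", 3.0, 0.92),
    Candidate("small", 1.0, 0.80),
]
choice = smallest_that_clears(candidates, quality_bar=0.90)
```

With these made-up figures the function picks `medium`: the small model misses the bar, and the large one clears it at four times the price for a negligible gain.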

A note on the org chart

Underneath the platform sits an org chart doing as much of the work as the code. Revolut is flat and built as a matrix: AI engineers are embedded in product teams, each staffed to ship end to end, with a functional line back to the platform. Standards and tooling flow down; field requirements flow up to Donets’s central group, which sets direction and pushes compliance rules out. He called the product teams “our forward-deployed engineers,” the mechanism by which one team’s hard-won experience becomes everyone’s. The architecture, as one slide noted, ends up shaped like the org chart.

From Rita to AIR

Where all of this lands is a single product that has been running for years. It began as Rita, a support chatbot built on intent models and pre-filled scenarios - the “slot machine” era - that frustrated as often as it helped. In 2022 the team tested large models, BLOOM and BLOOMZ at 176 billion parameters, and found they worked. The first thing they put into production was mundane: paraphrasing a multi-screen FAQ into a short, relevant answer. LLM-based Rita reached production in Q2 2023, then rolled out country by country, Europe first and Japan the hardest, finishing around Q1 2025.

Voice came next, and it runs on a simple pipeline: audio is transcribed, a small LLM decides whether to answer directly or hand off to the full multilingual chatbot, and an end-to-end response comes back in under two seconds. It now runs in 20 countries, handles around 25,000 calls a month, and resolves a customer’s problem roughly eight times faster than a human agent. AIR, the latest layer, followed in Q2 2025 and pulls in transactional data: it can break down your spending, propose hotels inside a budget computed from your own history, or explain why a stock is moving. Across the arc from the old chatbot to today, the share of cases resolved without a human climbed from 17% to 80%, Net Promoter Score went from low to high, and the financial impact, Donets said, ran into double-digit millions of pounds. AIR began rolling out in the UK in April 2026, where Revolut says it has 13 million customers.
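The voice pipeline described above has a simple shape: transcribe, let a small model choose a route, then either answer or hand off. A minimal sketch follows, with every stage passed in as a callable. The function name, the `"direct"` route label, and the stub stages are assumptions; the real system's interfaces are not described in the talk.

```python
from typing import Callable


def handle_call(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    route: Callable[[str], str],
    answer_directly: Callable[[str], str],
    full_chatbot: Callable[[str], str],
) -> tuple[str, str]:
    """Transcribe the audio, ask the small router model which path to take,
    and return (path_taken, response)."""
    text = transcribe(audio)
    decision = route(text)
    if decision == "direct":
        return "direct", answer_directly(text)
    # Anything the router did not clearly mark as direct goes to the full chatbot.
    return "handoff", full_chatbot(text)


# Stub stages standing in for speech-to-text and the two models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_route(text: str) -> str:
    return "direct" if "opening hours" in text else "complex"


result = handle_call(
    b"what are your opening hours",
    fake_transcribe,
    fake_route,
    lambda t: "We are available 24/7 in the app.",
    lambda t: "Handing over to the full assistant.",
)
```

Defaulting to the handoff when the router returns anything unexpected is a deliberate choice in this sketch: a confused router then costs latency rather than a wrong answer.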

Where human oversight is mandatory

Holding all of it up is the monitoring layer. Revolut stores every input and output and runs a panel of LLM “judges” against live traffic, one of them dedicated to hallucination. The panel currently scores 9 to 12 mandatory metrics, a number that is rising. Behind it sit human review teams that sample chats and transcripts, and the blunt signal of Twitter and Reddit when something goes badly wrong.
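A judge panel of this kind can be reduced to a small loop: score each stored exchange on every mandatory metric and escalate anything below threshold to a human. The sketch below is illustrative only. The metric names, thresholds, and stub judges are hypothetical, and in a real system each judge would be an LLM call rather than a lambda.

```python
from typing import Callable

Judge = Callable[[str, str], float]  # (user_input, model_output) -> score in [0, 1]


def run_judges(
    user_input: str,
    model_output: str,
    judges: dict[str, Judge],
    thresholds: dict[str, float],
) -> dict:
    """Score one exchange on every metric and flag it for human review
    if any score falls below that metric's threshold."""
    scores = {name: judge(user_input, model_output) for name, judge in judges.items()}
    failed = sorted(name for name, score in scores.items() if score < thresholds[name])
    return {"scores": scores, "failed": failed, "needs_human_review": bool(failed)}


# Stub judges with fixed scores, standing in for LLM-based evaluators.
judges = {
    "hallucination": lambda q, a: 0.4,
    "tone": lambda q, a: 0.9,
}
thresholds = {"hallucination": 0.8, "tone": 0.7}
report = run_judges("Where is my refund?", "It arrived yesterday.", judges, thresholds)
```

Here the made-up hallucination score falls below its threshold, so the exchange is marked for human review even though the tone metric passes.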

Above all of it sits a hard line: no decision that can change someone’s life is made by an AI system. That position got tested in the room. An audience member who works on regulated healthcare AI pushed back - in his field, he said, “humans make that process unsafe,” so an AI judge might be the safer choice. Donets gave ground on the evidence, agreeing that machines “provide more stable and better help to users,” but not on the principle: the critical calls still do not go to the model. Asked how soon that might change, he did not hedge: “This year, definitely no.”

The frontier gets the headlines, but shipping AI inside a regulated bank across 40 countries is won or lost on the plumbing beneath it: one gateway, the right unit of governance, a fallback for when the vendor fails, and a human who still gets the last word.
