AI readiness isn’t about adopting tools—it’s about building an architecture that captures events, consolidates institutional memory, and exposes data in forms AI can reason over. Message bus, lakehouse, and AI data tier together create that foundation.
This is the third and final post in a series about why AI initiatives struggle in banking and mortgage—and what to do about it.
In the first post, I argued that AI isn't failing because of AI. It's failing because most institutions lack the data foundation that AI requires. Systems don't understand each other. There's no shared semantic model. Data is fragmented across dozens of SaaS platforms with no unified view.
In the second post, I introduced event-driven architecture as the solution to integration sprawl. By wrapping existing APIs and publishing business events to a central message bus, institutions can create real-time visibility, decouple their systems, and—critically—produce a persistent stream of meaningful business events.
But I ended that post with a caveat: the message bus isn't the destination. It's the highway.
Today, I want to show you where that highway leads.
Let me start with the complete pattern, then walk through each layer.
Three layers. Each one builds on the one below. Each one serves a distinct purpose. Together, they create an institution that's not just AI-curious, but AI-ready.
Let's work our way up.
We covered this in detail last time, but let me recap why it matters:
The message bus is where your operational systems publish business events—Loan Application Submitted, Document Received, Payment Missed, Customer Contacted. These events flow in real time, expressed in a common vocabulary that abstracts away vendor-specific schemas.
The message bus solves the integration problem. Systems no longer need to know about each other. They publish and subscribe to events through a shared infrastructure that decouples them from point-to-point dependencies.
But here's what I didn't emphasize enough last time: the message bus also solves the data capture problem.
Every event that flows through the bus can be persisted. Stored. Indexed. Retained for as long as you need it.
This is fundamentally different from point-to-point integration, where data moves between systems and then vanishes. With an event-driven architecture, you're not just integrating—you're recording. Every significant thing that happens in your business becomes a durable fact that can be analyzed, queried, and learned from.
The message bus is the data collection mechanism that most institutions don't realize they need.
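To make the publish-and-persist idea concrete, here is a minimal sketch in Python. Everything in it is illustrative: the event type, field names, and in-memory bus stand in for a real broker such as Kafka or a managed cloud equivalent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class BusinessEvent:
    event_type: str   # e.g. "LoanApplicationSubmitted" (illustrative name)
    entity_id: str    # the loan, account, or customer the event concerns
    payload: dict
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class MessageBus:
    """Toy in-memory bus: every published event is delivered AND persisted."""
    def __init__(self):
        self.subscribers = {}   # event_type -> list of handlers
        self.event_log = []     # the durable record of everything that happened

    def subscribe(self, event_type: str, handler: Callable):
        self.subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event: BusinessEvent):
        self.event_log.append(event)  # persist first, then fan out to subscribers
        for handler in self.subscribers.get(event.event_type, []):
            handler(event)

bus = MessageBus()
received = []
bus.subscribe("LoanApplicationSubmitted", received.append)
bus.publish(BusinessEvent("LoanApplicationSubmitted", "loan-123",
                          {"amount": 350_000, "channel": "online"}))
```

The key line is the one in `publish`: the event is appended to a durable log before any subscriber sees it, which is what turns integration traffic into a permanent record.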
Now we get to something that might be unfamiliar if you haven't been tracking the data infrastructure landscape: the lakehouse.
For years, organizations faced a choice between two data storage paradigms:
Data warehouses were structured, governed, and optimized for business intelligence. They were great for reporting and dashboards but rigid—you had to define your schema upfront, and storing unstructured data (documents, images, logs) was awkward at best.
Data lakes were flexible. You could pour any data into them—structured, semi-structured, unstructured—and figure out the schema later. But they often became "data swamps"—vast repositories of raw information with poor governance, unreliable quality, and no clear way to derive value.
The lakehouse is the architectural pattern that combines the strengths of both: the low-cost, flexible storage of a data lake with the schema management, transactional reliability, and governance of a data warehouse, so structured records and unstructured documents can live side by side and still be queried dependably.
For a financial institution, the lakehouse becomes the enterprise memory—the place where everything you know about your business comes together.
What goes into it?
Event history from the message bus. Every business event, persisted over time. The complete record of what happened, when, and in what sequence.
Structured operational data. Loan records, account information, customer profiles, transaction histories—the traditional data warehouse contents.
Semi-structured data. API payloads, system logs, configuration data, metadata from various platforms.
Documents. Loan applications, income verification, appraisals, disclosures, correspondence. Not just metadata about documents—the documents themselves, made searchable and analyzable.
External data. Market rates, property valuations, demographic information, credit bureau data—the third-party information that enriches your internal view.
This isn't just storage. It's consolidation. For the first time, you have a single place where the complete picture of your business can be assembled and queried.
When someone asks "show me everything we know about this customer," the answer exists in one place. When someone asks "what was the sequence of events that led to this loan defaulting," the answer can be reconstructed. When someone asks "how does this borrower compare to similar borrowers we've served," the comparison can actually be made.
The lakehouse is the system of record for analytics and the system of context for AI.
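The "sequence of events" question above can be sketched as a simple query over persisted events. This toy uses an in-memory list of hypothetical event records; a real lakehouse would answer the same question with SQL over columnar tables.

```python
# Toy "lakehouse" query: reconstruct one entity's history from persisted events.
# Event types and dates below are invented for the example.
events = [
    {"entity_id": "loan-9", "type": "PaymentMissed",     "at": "2024-03-01"},
    {"entity_id": "loan-9", "type": "CustomerContacted", "at": "2024-03-05"},
    {"entity_id": "loan-1", "type": "PaymentReceived",   "at": "2024-03-02"},
    {"entity_id": "loan-9", "type": "PaymentMissed",     "at": "2024-04-01"},
]

def history(entity_id, events):
    """All events for one entity, in the order they occurred."""
    return sorted((e for e in events if e["entity_id"] == entity_id),
                  key=lambda e: e["at"])

timeline = [e["type"] for e in history("loan-9", events)]
```

Because every event carries an entity and a timestamp, the "what led to this?" question reduces to a filter and a sort, no matter which system originally produced each event.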
Now we reach the layer where AI actually lives.
The lakehouse contains your data, but raw data—even consolidated raw data—isn't what AI models consume. They need data that's been organized, enriched, and structured for their specific needs.
The AI data tier sits on top of the lakehouse and provides four critical capabilities:
A semantic model defines the core entities that matter to your business and how they relate to each other.
For a bank or mortgage lender, these entities might include Customers, Households, Accounts, Loans, Properties, Applications, Interactions, and Documents.
The semantic model doesn't just list these entities—it defines how they connect. A Customer belongs to a Household. A Customer can have multiple Accounts. A Loan is secured by a Property. An Application generates Interactions and requires Documents.
Why does this matter? Because AI needs to traverse these relationships to answer meaningful questions.
"Which customers in this household have products with us?" requires understanding the Customer-Household relationship.
"What documents are we still waiting for on this application?" requires understanding the Application-Document relationship.
"How have we interacted with this customer in the past 90 days?" requires understanding the Customer-Interaction relationship.
Without a semantic model, AI has to rediscover these relationships every time it runs. With one, the relationships are first-class objects that AI can navigate fluently.
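Here is a small sketch of what "relationships as first-class objects" can look like. The entity and field names are simplified, hypothetical versions of the model described above.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    name: str
    received: bool

@dataclass
class Application:
    app_id: str
    required_documents: list = field(default_factory=list)

    def outstanding_documents(self):
        """The Application-Document relationship, navigable in one call."""
        return [d.name for d in self.required_documents if not d.received]

@dataclass
class Customer:
    customer_id: str
    applications: list = field(default_factory=list)

# Hypothetical data for one borrower.
app = Application("app-42", [
    Document("pay stub", received=True),
    Document("bank statement", received=False),
])
customer = Customer("cust-7", applications=[app])

# "What documents are we still waiting for on this application?"
waiting = customer.applications[0].outstanding_documents()
```

Because the Customer-Application-Document chain is modeled explicitly, the question becomes a direct traversal rather than something an AI system has to reverse-engineer from raw tables each time.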
Machine learning models don't consume raw data—they consume features. A feature is a derived, computed attribute that's useful for prediction.
Examples for a financial institution might include a borrower's debt-to-income ratio, the number of missed payments over the past twelve months, or the days since the last customer contact.
A feature store is a centralized repository where these computed features are stored, versioned, and served to models. It solves several problems:
Consistency. The same feature is computed the same way everywhere, whether it's used for batch model training or real-time inference.
Reusability. A feature computed for one model can be reused by other models without recomputation.
Freshness. Features can be updated as new events arrive, keeping them current for real-time applications.
Governance. You can track which features are used by which models, understand their lineage, and ensure they're computed from authoritative sources.
Building features manually for every AI initiative is one of the reasons those initiatives take so long and cost so much. A feature store makes the investment in feature engineering reusable across the organization.
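A minimal sketch of the feature store idea follows. The store, the feature name, and the computation are all invented for illustration; real platforms add storage backends, point-in-time correctness, and serving infrastructure.

```python
class FeatureStore:
    """Toy feature store: named, versioned feature definitions computed once,
    then served consistently to any model (training or inference)."""
    def __init__(self):
        self.definitions = {}   # feature name -> (version, compute function)
        self.values = {}        # (feature name, entity id) -> computed value

    def register(self, name, version, compute_fn):
        self.definitions[name] = (version, compute_fn)

    def materialize(self, name, entity_id, raw_data):
        """Compute a feature from raw data and persist it for reuse."""
        version, fn = self.definitions[name]
        self.values[(name, entity_id)] = fn(raw_data)

    def get(self, name, entity_id):
        return self.values[(name, entity_id)]

store = FeatureStore()
# Hypothetical feature: share of payments made on time.
store.register("on_time_payment_rate", "v1",
               lambda payments: sum(p["on_time"] for p in payments) / len(payments))

payments = [{"on_time": True}, {"on_time": True},
            {"on_time": False}, {"on_time": True}]
store.materialize("on_time_payment_rate", "loan-123", payments)
rate = store.get("on_time_payment_rate", "loan-123")
```

The point of the pattern: the computation lives in one registered, versioned definition, so every model that asks for `on_time_payment_rate` gets the same number computed the same way.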
This is where things get particularly relevant for generative AI and large language models.
An embedding is a numerical representation of something—a document, a customer interaction, a product description—that captures its semantic meaning. Two things that are similar in meaning will have embeddings that are close together in vector space.
Why does this matter? Because it enables semantic search and retrieval.
Traditional search is keyword-based. If you search for "income verification," you find documents that contain those exact words. But what about documents that talk about "proof of earnings" or "salary documentation"? Keyword search misses them.
Semantic search using embeddings finds documents based on meaning, not just words. This is how modern AI systems retrieve relevant context—they embed the question, find documents with similar embeddings, and use those documents to inform their response.
For a financial institution, embeddings unlock capabilities like:
Intelligent document retrieval. "Find all documents related to this borrower's employment history"—even if they're titled inconsistently.
Similar customer identification. "Show me customers who look like this one"—based on behavioral patterns, not just demographic fields.
Contextual search across interactions. "What have we discussed with this customer about refinancing?"—searching across call transcripts, emails, and chat logs.
Policy and procedure lookup. "What's our process for handling this exception?"—finding relevant guidance even when the question doesn't match the policy title.
The AI data tier includes infrastructure for computing, storing, and querying these embeddings at scale.
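A toy illustration of semantic search: the three-dimensional "embeddings" below are hand-made for the example, with axes loosely meaning income-related, property-related, and identity-related. A real system would get high-dimensional vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity: how close two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings for four hypothetical documents.
documents = {
    "W-2 form":              [0.90, 0.10, 0.20],
    "proof of earnings":     [0.95, 0.05, 0.10],
    "home appraisal report": [0.10, 0.90, 0.10],
    "driver's license scan": [0.10, 0.10, 0.90],
}

def semantic_search(query_vec, docs, top_k=2):
    """Rank documents by similarity to the query vector, return the best matches."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]),
                    reverse=True)
    return ranked[:top_k]

# Query: "income verification", embedded near the income axis.
results = semantic_search([1.0, 0.0, 0.1], documents)
```

Note what happens: the query never mentions "W-2" or "earnings," yet both income documents rank first, because retrieval is by meaning (vector proximity) rather than by matching keywords.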
Finally, the AI data tier provides the infrastructure to actually run AI models—both the foundation models (large language models) and custom models trained on your data.
This includes:
- API endpoints for model inference
- Orchestration for multi-step AI workflows (agentic AI)
- Guardrails for safety and compliance
- Monitoring for model performance and drift
- Versioning for model lifecycle management
This is where AI stops being a research project and starts being an operational capability.
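As a sketch of how inference, guardrails, and monitoring fit together: every name below (the model stub, the blocked-term check, the metrics dict) is invented for illustration, standing in for a real serving stack.

```python
def moderate(text):
    """Toy guardrail: refuse responses containing terms we must never expose.
    A real guardrail layer would check PII, policy, and compliance rules."""
    blocked_terms = ["ssn"]
    return not any(term in text.lower() for term in blocked_terms)

def fake_model(prompt):
    # Stand-in for a real inference call (foundation model or custom model).
    return f"Summary for: {prompt}"

metrics = {"requests": 0, "blocked": 0}  # toy monitoring counters

def infer(prompt):
    """Inference endpoint sketch: call the model, apply the guardrail,
    record metrics for monitoring."""
    metrics["requests"] += 1
    response = fake_model(prompt)
    if not moderate(response):
        metrics["blocked"] += 1
        return "[response withheld by guardrail]"
    return response

ok = infer("loan application app-42")
withheld = infer("customer SSN lookup")
```

Even in this toy form, the shape is the operational one: the model call is wrapped, never exposed raw, and every request leaves a trace that monitoring can aggregate.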
I've been fairly technical in this post, so let me ground this in practical outcomes. What can you actually do when this architecture is in place?

Agentic AI refers to AI systems that can take autonomous action—not just answer questions, but execute multi-step tasks.
With the architecture I've described, you can deploy agents that take on real operational work—monitoring event streams, retrieving context from the lakehouse, and acting through the same APIs your systems already expose.
These aren't chatbots. They're digital workers that can handle complex, multi-step processes that previously required human coordination across multiple systems.
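A minimal sketch of that multi-step shape. Both helper functions are stand-ins for real lakehouse queries and system APIs, and the hard-coded data is invented for the example.

```python
def fetch_outstanding_documents(loan_id):
    # Stand-in for a query against the lakehouse / semantic model.
    return {"loan-123": ["bank statement"]}.get(loan_id, [])

def send_borrower_request(loan_id, docs):
    # Stand-in for an action taken through an operational system's API.
    return f"Requested {', '.join(docs)} for {loan_id}"

def run_agent(loan_id):
    """Toy agent: a multi-step task (check state, then act), not a one-shot answer.
    A production agent would plan with an LLM and call real system APIs."""
    steps = []
    missing = fetch_outstanding_documents(loan_id)
    steps.append(("check_documents", missing))
    if missing:
        steps.append(("contact_borrower", send_borrower_request(loan_id, missing)))
    else:
        steps.append(("mark_complete", loan_id))
    return steps

trace = run_agent("loan-123")
```

The difference from a chatbot is visible in the return value: the agent produces a trace of actions taken across systems, not just a text reply.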
Documents are the lifeblood of lending—and the bane of operations teams.
With documents stored in the lakehouse, embedded for semantic understanding, and connected to the semantic model, you can automate much of the intake, classification, and review work that consumes operations teams today.
Document review is one of the highest-volume, most time-consuming activities in mortgage lending. Intelligent automation can reduce cycle times dramatically while improving accuracy.
Most banks and lenders know very little about their customers as people. They know them as account holders, as borrowers, as users of specific products—but not holistically.
The architecture I've described creates a unified customer view: every account, loan, document, and interaction assembled around the customer as a person, not a product.
This is how you compete with fintechs that were born with unified data architectures while you're burdened with decades of system sprawl.
Risk management in most institutions is retrospective—you find out about problems after they've occurred. The event-driven architecture changes this.
When every significant event flows through the message bus in real time, you can detect risk as it emerges, spotting patterns like a missed payment or an unusual sequence of account activity the moment the events occur rather than weeks later in a batch report.
Risk and compliance often feel like they're at odds with growth and innovation. A solid data architecture makes them enablers rather than obstacles.
I've described a target architecture that might feel distant from where you are today. Let me offer some perspective on the journey.
This is not a big bang transformation. You don't need to build all three layers simultaneously or completely before deriving value. Each layer can be built incrementally, and each increment delivers benefits.
Start with high-value event types. You don't need to publish every possible event to the message bus on day one. Start with the events that matter most—probably loan lifecycle events if you're a lender, or account and transaction events if you're a depository. Prove the pattern with a focused scope.
Let the lakehouse grow organically. As events start flowing, they feed the lakehouse. As you identify high-value use cases, you add the data sources they require. The lakehouse becomes more comprehensive over time, not through a massive upfront loading effort, but through incremental expansion.
Build the AI data tier based on use cases. Don't try to build a complete semantic model of your entire business upfront. Start with the entities and relationships required for your first AI use cases. Expand the model as you expand your AI capabilities.
Use managed services where possible. The major cloud platforms offer managed message bus services, managed lakehouse platforms, and managed AI infrastructure. You don't need to build and operate everything from scratch. Focus your engineering effort on the business logic—the event definitions, the semantic models, the features—not on infrastructure operations.
Expect 18-24 months to foundational capability. Institutions that approach this systematically can have a functioning message bus, a growing lakehouse, and initial AI capabilities within 18-24 months. That's not instant, but it's far faster than the multi-year transformation programs that have burned so many organizations.

And critically: you're not building this instead of doing AI. You're building this so that AI actually works.
Let me close with the core message of this series:
Every financial institution I talk to wants to leverage AI. They see the potential. They feel the competitive pressure. They've allocated budget. They've hired talent.
But most of them are trying to succeed at AI without addressing the architectural foundation that AI requires. They're launching pilots that solve the data problem in narrow, bespoke ways—and then wondering why those pilots don't scale.
AI readiness is not a tooling decision; it's an architectural one. It's the decision to stop treating data as a byproduct of operations and start treating it as a strategic asset.

It's the decision to replace integration sprawl with integration architecture.
It's the decision to build a shared language for your enterprise—business events, semantic models, common vocabularies—instead of letting each system speak its own dialect.
It's the decision to invest in a data foundation that makes every future AI initiative faster, cheaper, and more likely to succeed.
This is the work that separates institutions that talk about AI from institutions that actually operationalize it.