It was the early 2010s. I had become a bit obsessed with YouTube on my first iPhone and remember watching Eric Evans in a video talking about “Domain-Driven Design” (DDD) — prompting me to get his book.
As someone who came into technology roles from a business consulting background, the concept made total sense. Understanding and capturing the domain knowledge, or the business context, is the heart of any successful software project. It created a way of thinking about the building blocks of software as usable, interconnected products.
In the realm of data architecture, however, thinking in terms of the smallest domain-based components is not the norm. Though the idea is gaining traction, the land of data lakes, warehouses, lakehouses, and marts makes domain-driven design and microservices approaches something of a rarity.
And so, with the opportunity to test out domain-based data products at a global fintech company via a data mesh implementation, we did just that. As a result, here's what we learned about where you can apply DDD to the Data as a Product concept.
We start with the DDD idea that those using the data for their day-to-day responsibilities know what they need. Here's how it's applied:
DDD says that the domain model should be the heart of the software development process — meaning everyone involved in the project, from developers to domain experts to business users, should use a common (or “ubiquitous” as defined by Evans) language to talk about the domain. This language should be as specific and unambiguous as possible so that everyone is on the same page and there’s no confusion.
DDD also teaches engineers to design a software system that reflects the complexity and richness of the specific business domain. The system should have capabilities to handle all the different ways that the domain can be used while remaining easy to understand, maintain, and update as the domain evolves.
DDD is a powerful approach to software development that can help you create meaningful systems valuable to the business user. While a bit more complex than other approaches, it’s worth the effort if you want to create an impactful system that improves business performance.
In data mesh design, we can apply the same concept of bounded contexts from DDD. A data domain is like a self-contained unit responsible for storing, processing, and governing analytic data sets. Each data domain aligns with a specific business capability, such as customer segmentation or finance, and has its own dedicated team that owns and is accountable for the data.
In other words, the people who are using the data to do things like forecast, build models, or understand profitability levers are the ones who know the data best and, therefore, should be responsible for overseeing it — a contrast to the traditional approach, where data is managed by a central team of engineers.
The data mesh approach empowers the people using the data — enabling better data quality, more agility, and faster time to insights.
When we build a domain-based data solution, we don’t just focus on one domain at a time. We also look at how the different domains in an organization interact with each other. For example, a holistic customer record might include data from several parts, like segmentation, historical buying patterns, and IDs across products.
We work with the data users to understand how they think about data and how they want to use it. Then, we build data domains that meet their needs. We also create interoperability between domains so that data can flow freely between them — allowing our customers to get never-before-seen insights into their data.
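To make the holistic customer record concrete, here is a minimal sketch of interoperability between domains. It assumes every domain keys its data on a shared customer ID; the domain names, fields, and the `customer_record` helper are hypothetical and only illustrate the idea, not the implementation we used.

```python
# Each domain publishes its own view, keyed by a shared customer ID (an assumption).
segmentation = {"c-1": {"segment": "small business"}}
buying_patterns = {"c-1": {"orders_last_year": 14}}
product_ids = {"c-1": {"card_id": "K-77", "loan_id": "L-12"}}


def customer_record(customer_id: str, *domain_views: dict[str, dict]) -> dict:
    """Merge whatever each domain knows about one customer into a single record."""
    record = {"customer_id": customer_id}
    for view in domain_views:
        # A domain that has no data for this customer simply contributes nothing.
        record.update(view.get(customer_id, {}))
    return record


record = customer_record("c-1", segmentation, buying_patterns, product_ids)
```

The shared key is what does the work here: each domain stays in charge of its own data, and the combined record is assembled only when someone needs it.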
In one implementation, we had a very complex business problem. Data was spread across multiple business functions, each holding a different definition of a key metric depending on the function. We wanted to create a holistic perspective of the data that would show the total customer view.
We achieved this by focusing on the interoperable product — creating a data domain and its associated metrics that could be understood and used by all business functions. We worked closely with the business to understand their needs and empower our data experts to organize the data in a meaningful way.
The result: a holistic view of the data that allows the business to make better decisions that impact its customers. Executives and data scientists alike can now see trends and patterns they have never seen before, ultimately improving their products and services.
In Domain-Driven Design (DDD), a bounded context is a conceptual area of focus within a software system. It’s like an imaginary line that divides the system into smaller, more manageable parts — helping teams work independently on different parts of the system while still keeping everything consistent and integrated.
Each bounded context has its own vocabulary, concepts, and rules. This helps to ensure that everyone on the team is working with the same understanding of the domain while helping avoid confusion and errors.
Bounded contexts often align with business capabilities. For example, a customer management bounded context might focus on the concepts of customers, orders, and invoices. A product management bounded context, on the other hand, might focus on the concepts of products, features, and pricing.
The boundaries between bounded contexts are not always clear-cut. There may be overlap between them that must be addressed for communication and coordination purposes. However, by clearly defining the boundaries between bounded contexts, we can make it easier to develop and maintain a complex software system.
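One way to picture two bounded contexts that overlap on the same customer is the sketch below. The class names, fields, and the `to_priced_account` translation function are all hypothetical; the point is only that each context keeps its own model and the overlap is handled by an explicit translation at the boundary.

```python
from dataclasses import dataclass


# Customer management context: a customer is someone with orders and invoices.
@dataclass(frozen=True)
class ManagedCustomer:
    customer_id: str
    name: str
    open_invoices: int


# Product management context: the same customer matters only for pricing.
@dataclass(frozen=True)
class PricedAccount:
    customer_id: str
    pricing_tier: str


def to_priced_account(customer: ManagedCustomer, tier_lookup: dict[str, str]) -> PricedAccount:
    """Translate at the boundary instead of sharing one model across contexts."""
    return PricedAccount(
        customer_id=customer.customer_id,
        # Fall back to an assumed default tier when the customer has none assigned.
        pricing_tier=tier_lookup.get(customer.customer_id, "standard"),
    )
```

Neither team has to adopt the other's vocabulary; only the translation function needs to know both.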
Data products are similar to bounded contexts but focus on the data itself. A data product is a well-defined, self-describing data asset that serves a specific business need. It has business context and relevance, which allows data scientists to use it to build machine learning and artificial intelligence (AI) models.
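As a rough illustration of what "self-describing" could mean in practice, the sketch below attaches the business context to the data asset itself. The `DataProduct` class, its fields, and the example values are assumptions made for illustration, not a standard or the structure used in our implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str          # the business capability the product belongs to
    owner: str           # the team accountable for the data
    description: str     # the business need the product serves
    schema: dict[str, str] = field(default_factory=dict)  # column -> business meaning

    def describe(self) -> str:
        """Return the business context a data scientist would otherwise ask around for."""
        cols = ", ".join(f"{col} ({meaning})" for col, meaning in self.schema.items())
        return f"{self.domain}/{self.name}, owned by {self.owner}: {self.description} Columns: {cols}"


product = DataProduct(
    name="customer_segments",
    domain="customer segmentation",
    owner="segmentation team",
    description="One row per customer with its current segment.",
    schema={"customer_id": "shared customer key", "segment": "current segment label"},
)
```

Calling `product.describe()` gives a consumer the owner, purpose, and column meanings in one place, which is the context the following paragraphs describe data scientists hunting for.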
Data products are the building blocks to creating robust, maintainable, and domain-centric data solutions. To achieve these benefits, you must model data products to:
Without a data product view, data scientists are stuck fending for themselves — spending hours asking around an organization and searching through data sets that don’t have any business context to try and guess which data attributes are relevant. Making matters worse, they can’t give feedback to the data providers about the validity or use of the data without going through a lot of red tape.
This is a huge time waster for data scientists, who end up transforming and preparing data for single-use cases. By a commonly cited estimate, as much as 80% of analytic time goes to finding, cleaning, and validating data. Imagine how many more insights they could derive if they had a data product view that gave them the context they needed and the ability to collaborate with the data providers.
A data product view is a way of organizing data around a specific business need. It’s like a map that shows the data scientist where to find the information and includes the business context for the data, so the data scientist doesn’t have to guess what it means.
They can also give feedback on the data product in a continuous improvement loop, which is a powerful tool to ensure the accuracy and usability of data, along with the metrics and definitions of their use as the business evolves.
Applying the concept of bounded contexts to data products is a great way to create data solutions that are easy to find, understand, maintain, and evolve. By decomposing data assets into manageable data sets with relevant business context for specific business use, you get better utilization of the data asset.
These data products are more valuable, relevant, and verifiable through observability functions to affirm data quality. By creating data products, you get to reap these benefits:
One example of the application of data products comes from leading a team to create a solution to redefine risk analysis across an enterprise. We wanted to move away from a single-lens, loss-focused model to a more holistic profitability view that considered all the different domains (risk, finance, sales, product, etc.).
As we built the solution, we needed to achieve a few objectives:
This prompted us to establish and test data mesh as a proof of concept to see if we could separately define data products unique to each domain that could then be brought together for a holistic view of profitability. The solution allowed us to create reusable analytic views of data that could be accessed interchangeably. It could connect and move different products, similar to Legos, each time it generated a new profitability picture. We made the data products easily accessible through self-service tools for data scientists and engineers.
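The Lego analogy can be sketched as a function that accepts any combination of domain data products and assembles a profitability picture from them. Everything here is an assumption for illustration: the product names, the per-customer amounts, the sign convention, and the simple additive formula are hypothetical stand-ins, not the model we built.

```python
from typing import Mapping

# Hypothetical per-domain data products: amount per customer, signed so that
# revenue is positive and losses or costs are negative.
sales_revenue = {"c-1": 1200.0, "c-2": 300.0}
risk_losses = {"c-1": -150.0, "c-2": -400.0}
finance_costs = {"c-1": -200.0, "c-2": -50.0}


def profitability(*products: Mapping[str, float]) -> dict[str, float]:
    """Sum the contribution of every supplied data product per customer."""
    totals: dict[str, float] = {}
    for product in products:
        for customer_id, amount in product.items():
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


# The single-lens, loss-focused view uses one product...
loss_only_view = profitability(risk_losses)
# ...while the holistic view snaps several products together.
holistic_view = profitability(sales_revenue, risk_losses, finance_costs)
```

Because each product exposes the same shape, swapping one in or out changes the picture without changing the code that assembles it.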
Data mesh is just one example of bringing the productized concept to the data management field. That said, depending on complexity, data culture, and executive buy-in, it may not be the "bull's eye" solution for every company. Still, it’s a great place to start as we embrace exploring these concepts.