Data mesh roundtable — Data Product Design

By now, most people know the four cornerstones of data mesh: domain-driven ownership, data as a product, a self-serve data platform and computational governance. Our earlier roundtable discussions already covered how-tos for domain-driven ownership and computational governance. And because self-serve data platforms are the technological side of data mesh, plenty of content about them can be found elsewhere.

But data as a product? How do you do that? How do you design such a data product? This was still a blank spot, and therefore the topic of the Data Mesh Learning Community roundtable of May 2024. Our regular hosts Karin Hakansson, Amy Raygada, Andrew Sharp and I were joined by Paolo Platter on this occasion.

Meetup announcement — image by Data Mesh Learning community

What is a data product?

When designing a data product, it helps to understand what a data product actually is. A data product is an atomic unit of data, software and metadata: an independently deployable unit that includes the business logic of a given business domain. Historically, we have seen many organizations where the code was owned by a central data team while the data and metadata were owned by a business team. Within a data product, the tight coupling of software and metadata makes this split impossible. Technical and data ownership are brought together and assigned to a business domain.
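To make the "atomic unit" idea concrete, here is a minimal sketch of a data product descriptor. The class, its field names and the example values are all hypothetical and not taken from any particular data mesh platform; the only point is that data, code and metadata sit in one object with one owning domain.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """One deployable unit: data, code and metadata under a single owner."""

    name: str
    owner_domain: str        # the business domain owning all three parts
    version: str
    transformation: str      # hypothetical path to the code with the business logic
    schema: dict[str, str]   # column name -> type, describing the data
    metadata: dict[str, str] = field(default_factory=dict)


# Hypothetical example: the customer product of a subscriptions domain.
subscription_customers = DataProduct(
    name="customers",
    owner_domain="subscriptions",
    version="1.0.0",
    transformation="transform/customers.sql",
    schema={"customer_id": "string", "plan": "string"},
    metadata={"description": "Active subscribers, one row per customer"},
)
```

Because the descriptor is versioned as a whole, a change to the schema, the transformation or the metadata is always a change to the same unit, made by the same owner.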

Technical and data ownership are brought together — Photo by Priscilla Du Preez 🇨🇦 on Unsplash

The biggest difference from a dataset is the inclusion of the business logic. You know that the relevant transformations, or explanations, have been applied. As this business logic is bound to a given business domain, several business domains can each end up with their own version of a similar data product. Imagine your company offers a subscription model for one service, a one-off payment for another product and a freemium service in a third business domain: you can easily end up with three distinct data products about customers.

Phases of a data product

Crafting a product always consists of a few distinct phases. First you design the product, so that it fulfills a need of your consumer and is feasible for the producer to create. The next phase is the actual creation, or in the context of data or software engineering: writing code. When you have a valid V1, you enter the deployment phase. This is where governance, more precisely computational governance, comes into play. Typically, security information should be checked at deploy time: you don’t want to discover security issues at a later stage.
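A deploy-time check of this kind could look roughly like the sketch below. The descriptor layout and both rules (an owner must be set, and every column tagged as PII must declare a mask) are assumptions made for illustration; a real computational policy would encode whatever your governance body has agreed on.

```python
def deploy_time_violations(descriptor: dict) -> list[str]:
    """Return the policy violations that should block a deployment."""
    violations = []
    # Hypothetical rule 1: a data product must name its owning domain.
    if not descriptor.get("owner_domain"):
        violations.append("missing owner_domain")
    # Hypothetical rule 2: every column tagged as PII must declare a mask.
    for column, spec in descriptor.get("schema", {}).items():
        if spec.get("pii") and not spec.get("mask"):
            violations.append(f"PII column '{column}' has no mask")
    return violations


# Hypothetical descriptor submitted by a domain team.
descriptor = {
    "name": "customers",
    "owner_domain": "subscriptions",
    "schema": {
        "customer_id": {"type": "string"},
        "phone": {"type": "string", "pii": True},
    },
}

violations = deploy_time_violations(descriptor)
if violations:
    print("deployment blocked:", violations)
```

Running the check in the deployment pipeline means the unmasked phone column is caught before any consumer can read it.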

Once the product is available to everyone, you enter two continuous phases. You immediately start with the operational effort of managing a software product at runtime, and quite soon you enter a change management lifecycle, either project-based or truly inspired by product management.

Change management demands clear communication — Photo by Nick Fewings on Unsplash

As in every software engineering project, it is this change management lifecycle that demands good communication between producers and consumers. Small changes such as adding fields can result in new minor versions; breaking changes such as removing fields result in new major versions and require a deprecation path. This path should be fixed in time and coded in a computational policy. In the end, you want your consumers to have the time to switch over, but also to feel the need to do so.
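The versioning rule described above can be sketched in a few lines. The three-part version string and the 90-day deprecation window are assumptions for illustration; the roundtable did not prescribe a numbering scheme or a window length.

```python
from datetime import date, timedelta

# Assumed policy value: how long an old major version stays available.
DEPRECATION_WINDOW = timedelta(days=90)


def next_version(current: str, old_fields: set[str], new_fields: set[str]) -> str:
    """Derive the next version number from a change in the set of fields."""
    major, minor, _patch = (int(part) for part in current.split("."))
    if old_fields - new_fields:      # a field was removed: breaking change
        return f"{major + 1}.0.0"
    if new_fields - old_fields:      # fields were only added: minor change
        return f"{major}.{minor + 1}.0"
    return current                   # no change in the set of fields


def retirement_date(successor_released: date) -> date:
    """Date on which the previous major version is switched off."""
    return successor_released + DEPRECATION_WINDOW


added = next_version("1.2.0", {"id", "phone"}, {"id", "phone", "email"})
removed = next_version("1.2.0", {"id", "phone"}, {"id"})
```

Here `added` is a minor bump and `removed` a major one. A fixed `retirement_date` gives consumers time to switch over while still making it clear that they have to.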

Consistency and/or flexibility: output ports

We have already established that you can have multiple versions of the same or similar data products: similar products belonging to different domains, and multiple versions of the same product along a deprecation path. On the other hand, you want to strive for consistency and a lean landscape to lower the maintenance burden.

Where relevant and possible, multiple output ports are assigned to the same data product. Think about adding filters (e.g. you can only see the data from your country) or masks (e.g. you are not allowed to see phone numbers). Such attribute-based access control (ABAC) rules, in combination with AD groups, allow you to reuse the same logic and data product multiple times without increasing your maintenance effort.
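As a rough illustration of output ports built on one data product, the sketch below applies a country filter and a column mask at read time. The `OutputPort` class, the `read` function, the group names and the sample rows are all hypothetical; a real platform would enforce such rules in its query engine or access layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPort:
    name: str
    required_group: str                   # AD group allowed to read this port
    filter_by_country: bool = False       # only show rows from the user's country
    masked_columns: tuple[str, ...] = ()  # columns whose values are hidden


def read(rows: list[dict], port: OutputPort, user: dict) -> list[dict]:
    """Return the rows of one data product as seen through one output port."""
    if port.required_group not in user["groups"]:
        raise PermissionError(f"{user['name']} may not read port '{port.name}'")
    visible = []
    for row in rows:
        if port.filter_by_country and row["country"] != user["country"]:
            continue
        visible.append(
            {col: "***" if col in port.masked_columns else value
             for col, value in row.items()}
        )
    return visible


# One data product, stored once (made-up sample rows).
customers = [
    {"customer_id": "c1", "country": "BE", "phone": "+32 000 00 00"},
    {"customer_id": "c2", "country": "NL", "phone": "+31 000 00 00"},
]

# Two output ports on top of the same data and logic.
local_sales = OutputPort("local_sales", "sales", filter_by_country=True)
analytics = OutputPort("analytics", "analysts", masked_columns=("phone",))

alice = {"name": "alice", "groups": {"sales"}, "country": "BE"}
bob = {"name": "bob", "groups": {"analysts"}, "country": "NL"}

sales_view = read(customers, local_sales, alice)    # only the BE row
analytics_view = read(customers, analytics, bob)    # both rows, phone masked
```

Adding a third audience means adding one more `OutputPort` definition, not a third copy of the customer data.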

Balance consistency and flexibility, both in your data product and in your data platform design — Photo by Gustavo Torres on Unsplash

By the way, the same balancing exercise between consistency and flexibility takes place in your data platform design, where you want to offer a single platform that gives every domain team the toolset it desires, without introducing a multitude of tools to maintain. Unlike the data products, which are built by many domain data teams, this data platform is built by a single data platform team and supports a limited set of output ports, such as a data warehousing technology, a dashboarding technology and plain APIs.

How to design your data product?

As a data product manager, you have to take two sets of restrictions into account: those of your domain and those of the data platform. Ideally, the first is defined quite strictly and the second allows for a lot of flexibility.

When designing your data product, you take the needs of your customers into account. Is your data product desired by someone? For source-oriented data products, the answer is quite often yes, as these are the building blocks for downstream data products. For those downstream data products, it’s a bit more complicated. Some data products are really customer-facing, think of a recommender engine; others are internal-facing. Among the internal-facing ones, the end-user data products are again easier: someone might have asked you for a specific report. Intermediate data products, such as a customer 360 view, are probably the hardest. Still, whatever you add to your data product should be desired by a customer.

When you have established a need for your product, you can start to design a data product fulfilling this need. Your direct user should be able to use the product as is. Hence all required business logic should be included in the product and all relevant metadata should be available close to the product. As an atomic unit, both should be updated together, which results in combined ownership.

Once the product is live, you think about lifecycle management, versions and deprecation paths, as well as maintaining the product at runtime. To limit this maintenance effort, take the possibility of multiple output ports into consideration in your design. Having multiple ABAC policies in place is easier to maintain than multiple versions of the same data.

What’s next? Roundtable on data catalogs

Your data product will be registered in a data catalog, combined with its metadata. Want to learn more about data catalogs? Join our roundtable on data catalogs with Ole Olesen-Bagneux.

Ways to Participate

Check out our Meetup page to catch an upcoming event. Let us know if you’re interested in sharing a case study or use case with the community.