A summary of the discussion during our Data Mesh Learning community panel held in June
In June, the Data Mesh Learning community hosted a panel discussion about data mesh products, including how to define, create, manage, and evaluate them.
To start, the speakers explored the definition of a data product. They recognized the value and impact that data products bring to various aspects of an organization’s operations.
Participants stressed the importance of collaboration, comparing data mesh to a car assembly process. Working across teams and sharing data enables teams to create more significant outcomes and iterate faster, similar to how a car can be built more efficiently with standardized components.
The discussion then shifted to a strategic and user-focused approach to data product design. The panelists recommended a user-centric approach along with a cycle of discovery, technical implementation, and design.
To close the conversation, the speakers shared ideas on how to evaluate and potentially retire data products based on their value and usage.
Data Mesh Products in a Data Mesh World
Panelists:
- Jill Maffeo – Senior Product Manager, Vista
- Miguel Morgado – Senior Product Owner – Digital Products, OneWeb
- Omar Khawaja – Head of Business Intelligence, Roche
Moderator:
- Sanjeev Mohan – Principal, SanjMo
Watch the Replay
Read the Transcript
Download the PDF or scroll to the bottom of this post
Ways to Participate
To catch an upcoming event, check out our Meetup page.
Let us know if you want to share a case study or use case with the community.
Data Mesh Learning community resources
- Engage with us on Slack
- Organize a local meetup
- Attend an upcoming event
- Join an end-user roundtable
- Help us showcase data mesh end-user journeys
- Sign up for our newsletter
- Become a community sponsor
You can follow the conversation on LinkedIn and Twitter.
Transcript
Sanjeev Mohan (00:00):
Welcome, everyone, to this exciting panel discussion. We've got an exciting topic today. We will be talking about data products in practice in a data mesh world. What we mean by that is that we have real-life practitioners on this call; these are people who are living and breathing data products every day. By the way, they are also in the data mesh space, but data products can be handled independently of data mesh, and that's why we are going to talk about data products specifically. Once we've done some introductions and talked about the current environment, the idea for this meetup is to really pick the brains of Jill, Omar, and Miguel and dig into how they are taking the space forward, so we can learn from their practical examples and leave with some actionable steps. So with that, I'm gonna hand it over to Jill so she can do an introduction. Then we'll go to Omar and then Miguel. So Jill, please introduce yourself.
Jill Maffeo (01:16):
Hi everyone. I'm Jill. Right now I'm at Vista; as a company, we work with small business owners to be their marketing and design partners. The majority of my early career has been spent there as an individual contributor on the analytics side. I came up in data, and over the last four or so years, that focus has shifted into data product ownership, and now most recently into product ownership generally, with a data component as well. So, a mixed team of software engineers and data folks. So that's me.
Sanjeev Mohan (01:50):
Very nice. Thank you, Jill, for coming on this. Omar.
Omar Khawaja (01:55):
Yeah. Hi everyone. And first of all, thanks, Sanjeev, for including me in the panel discussion. It's great to be here together with Miguel and Jill. A little bit about myself: similar to the other panelists, I've also spent a lot of time in IT, specifically in data and analytics. My journey with data products is all within the context of data mesh, which started back in 2020-2021. So I am looking forward to sharing what we have done recently. Also, I've recently taken over a role where, and it's like drinking your own champagne, I'm taking the lead for the data domain of our own informatics team. So I'll see if I can share a few things with respect to that. I'm looking forward to it. Thank you.
Sanjeev Mohan (02:43):
Great. Miguel?
Miguel Morgado (02:45):
Good day, everyone. My name is Miguel, and I work for OneWeb. OneWeb is a global satellite telecommunications company headquartered in London, but we have offices all around the world. For you guys in the US, we are the equivalent of SpaceX's Starlink. There are just two companies in the world that do low Earth orbit satellite communications: one is Starlink, from Elon Musk, and the other is OneWeb. I'm a senior product owner for the data and performance service at OneWeb. And of course, we deal with vast amounts of data daily.
Sanjeev Mohan (03:28):
Very nice. I am so intrigued by the fact that Miguel is in London, Omar is in Basel, Switzerland, Jill is in Boston, and I'm on the US West Coast. So we've covered quite a bit of the territory that OneWeb's satellites cover; we're just missing Asia, Australia, and Africa. As far as my background is concerned, I have been in the data space for almost three decades now. I started my career in the database management system space, went into consulting, and then a few years ago I joined Gartner. I spent almost five years there as a research analyst in the data and analytics space. This year I've been smitten not just by data products, but by optimization efforts, because of the macroeconomic conditions we are in, and now with LLMs the space is super exciting. However, I wanna stick to data mesh. Omar, you said you started your journey in 2020-2021, so you were probably one of the first people to make this foray into data mesh. Can you share with us where you are today in your data mesh journey? How are you organized, what processes do you have in place, and what technologies constitute your data mesh?
Omar Khawaja (05:02):
Sure, Sanjeev, that's correct. It was the time when I was coming up with our strategy for BI and analytics for Roche Diagnostics. That's where we started our journey. And if we haven't forgotten, it was the peak of the pandemic, when we were still struggling to figure out how to work in a virtual world. But it had its positives: there was this whole idea that if we keep on doing the same things, like building data lakes and data warehouses, it won't make any difference. So what different things are we going to do? I strongly believe that the data topic is always about people, process, and technology. Hmm. So it's not just about moving things from on-premises to the cloud, and I'll talk about technology a little bit. I'm glad that you're bringing it up front.
(05:50):
People don't talk about this. I think it plays an important role, and that was the motivation to move ahead with this. We didn't know what those four principles were to the level that we know now. For example, in our experience, luckily some of the smart people in the company had already, under the umbrella of data governance, defined some of the data domains. So that was a very good starting point for us. But in terms of bringing the concept to life, you can only do that once you have those data products in the hands of users to create value. And that's what we have been doing. We started with one and a half teams, and there is a story there as well: why one and a half teams?
(06:38):
But now, I was talking to one of my colleagues today, and I think we have crossed approximately a hundred-plus teams all across the company. We are not just in one division or in one data domain, but everywhere, in different shapes and forms, ranging from very many source-oriented data products to many consumer-oriented data products. We are still learning around all four principles. It has been a learning journey, and I can only encourage people not to wait for a perfect moment when they know everything. That will never happen. In terms of technology, our key goal in choosing any of the components of the platform was: what capabilities do we need in that platform so it can serve as self-service for the data product teams?
(07:32):
Because, unlike previous practices, we did not want the data platform team to do anything to do with what's going on inside the data products, for instance the pipelines, the aggregations, the KPI calculations, or the monitoring. So we started with just three components. The first was: how do you get the data in? So an ELT/ETL type of tool that brings the data into our cloud platform; we have a big presence of Snowflake, for instance. And then: how can we show the insights out of the data product? Even in the beginning, we had multiple variations, ranging from our BI tool, like Tableau, to custom development. In our first six months, we did not have integrations in place with a number of other things, like our data cataloging solution, and we did not have fitness functions, et cetera. But now we have made a lot of progress, which we can share on a number of other topics today.
Sanjeev Mohan (08:35):
What do you use for your data catalog, for ingestion, for data management and ops? What are those products?
Omar Khawaja (08:44):
Depending on the data modality, the type of data, we have different variations of data ingestion tools, including Talend, for example; we have a CDC tool from Flink. We are also talking about some digital tools, even a very unique setup that we have with Domo, for example. We also have script-based tools, where we had the data on some S3 blob storage and did not need to use anything else. Our goal is to bring the data into a place like Snowflake, where you can reuse the data without copying it. As far as cataloging is concerned, we use Collibra at the moment. On top of that, we have our custom search portal, which provides a better user experience for people.
Sanjeev Mohan (09:30):
And how do you do orchestration, automation, all of that?
Omar Khawaja (09:34):
It's a combination of things. Primary orchestration is within Snowflake itself, because right now people can write literally any type of code, from Python to Java to Scala, for example, or SQL statements themselves. We are also using dbt, not dbt Cloud, but the libraries themselves. So that is also one of the key components of our transformation process. In some cases, it's even Talend, where it works.
Sanjeev Mohan (10:01):
Yeah. Very nice. Jill, I'm gonna ask you about data mesh. As Omar said, it started with four principles. In your data mesh journey, how close are you to those four principles? Did you deviate and make some changes that you felt were more practical?
Jill Maffeo (10:22):
When we were thinking about picking up data mesh, we had just gotten our first CDO at Vista, and they were bringing a lot of the data mesh principles on board. We'd had a central BI team previously. And we were looking to figure out, to Omar's point about data domains, where we needed to organize to make sure that a lot of the products the BI team had been looking after were re-homed, and also to think a little bit differently about some of the work that the BI team had been doing. So when we think about the principles, we've transitioned a little bit more into a federated governance space. We do have a group of folks who are trying to help us with architecture questions and also with frameworks across the data domains.
(11:10):
Also, when you're starting out and thinking about data as a product, you have that question of, okay, data is a product, so a data product is a table, a data product is a view. And that's a little more tangible. But once you develop these data domains and these data teams, you start to push a little further on that definition, and you get to maybe dashboards, and maybe an API, and then you start to think more broadly: a data product is something that can assist with a data-driven decision and/or make a financial impact. And there's also maybe a group, similar to platform technology, that's enabling additional data products to do things. So we definitely had to think about data as a product in terms of what the definitions around that are.
(11:59):
What does that mean? What does it mean for us? And one of the things with the implementation that is still a bit of a work in progress for us: with the central BI team, when you had a use case that spanned multiple domains, there was a bit more of an entry point to say, hey, this is a central resource that is combining these things. When we're thinking about the data domains, interoperability plays a huge role. We need to be able to move towards an environment where data domains and data teams aren't re-siloing themselves, but are actually trying to work together to achieve these use cases. One of the great shifts is just the mindset change.
(12:49):
When you have data PMs, you're no longer thinking about just curating data sets in that BI space and answering singular questions. I think the shift that we've been working through is that a data PM is also an advocate, and a data PM is thinking about discovery in terms of who can use my data and how they can use my data. So it's creating more visibility into the data that we have, which requires adoption and literacy, which is a separate topic. But yeah, I think that was a nice piece that we found.
Sanjeev Mohan (13:22):
Yeah, no, this is great. We've gone straight into data products. We were gonna ease into that, but that's great. One thing that stood out for me: the reason data mesh became so popular is that there was this feeling that a centralized team, whether a centralized BI team or a data engineering team, is getting overburdened. So we need to take that knowledge, that work, and move it into the domains, as long as it's a self-contained domain. If it's cross-domain, then, you know... But I didn't hear that from you. Do you have separate decentralized teams and a centralized BI team, or has the organization stayed the same?
Jill Maffeo (14:07):
We do have individual data product teams, and we don't really have that central BI team anymore; that happened through the iterations. So we've retired the central BI team and moved a lot of that responsibility to the individual data teams. One of the pieces that has been a constant iteration for us, which Zhamak has talked about being a really critical piece if you're thinking about adopting data mesh, is aligning data as an organization with technology as well, so that you really get end-to-end responsibility from data production to data transformation and data use. We started as a new data organization, a little bit unto ourselves. We were also replatforming our website at the same time, so technology was very heads-down in that transition. We've consistently worked toward a closer relationship with the technology team, and now they're under one leader. That's enabled a little more speed, clarity, and ownership. And I think that's always been the goal: to get further down that path of end-to-end ownership of data so that we don't have the overburdening of a central team.
Sanjeev Mohan (15:20):
And Jill, what technologies do you deploy?
Jill Maffeo (15:24):
We use a variety of technologies. I had come from a team that was doing a lot of data ingestion, and you'd mentioned data ingestion a bit. So we use anything from BigQuery for our Google Analytics stuff, to Funnel.io for some of our third-party marketing data. We use Databricks for some of our generic stuff. We use AWS as well to move things back and forth. We stage a lot of our data in Snowflake. We also use Looker across the organization. So you have to think about where you're able to, again, share, and get those "the sum is greater than its parts" opportunities to share data; Snowflake and Looker help us do that right now. We do have a catalog; right now we're working with Alation. We also hold some of our documentation in Confluence. I'm sure I'm missing some, but that's the baseline.
Sanjeev Mohan (16:16):
Great. I love this variety. You've got AWS, you've got BigQuery, you've got Snowflake, Looker, so you have quite a rich ecosystem there. I'm gonna move on to Miguel. Miguel has a very different business: Roche is healthcare, Vista is B2B marketing, and Miguel, you're in a very interesting satellite business. Please tell us about your data mesh implementation.
Miguel Morgado (16:48):
Our data mesh is whatever works best for us. We don't have data mesh implemented by the rules. We started with the rules, and at the moment we have, let's say, a hybrid approach. Let's start with the stack. We are a Snowflake powerhouse. Our data is mainly stored in Snowflake.
Sanjeev Mohan (17:16):
How big is yours?
Miguel Morgado (17:18):
Sorry?
Sanjeev Mohan (17:18):
How big is it? How many...
Miguel Morgado (17:21):
In money, or in tenants? We have 37 tenants, and we are the biggest telecommunications company among Snowflake's clients for data ingestion. We currently ingest 70 billion rows of data each day. So our stack includes Snowflake for the database. In terms of orchestration, data quality, and so on, we are big fans of DataOps.live. That started at the beginning with the team. We love DataOps.live for a couple of reasons, but mainly because of the quality. We are in the business of selling data, not only producing and analyzing data, so all our data needs to be quality data. We don't have two levels of quality; we just have one, the good one. In terms of the journey, we started with data mesh back in February 2021, with just two tenants at that time.
(18:31):
Most people were only starting to talk about data mesh in 2021, at least at the beginning, like Omar. We had the need for two of our main business units, the satellite data unit and the ground networks, with all their sites around the world, to start sharing data and collaborating. So collaboration is key. I think the main value of data mesh is collaboration, but it also brings lots of efficiency. We stopped duplicating use cases. We stopped duplicating data. As you can imagine, we have satellites already in space, which are fairly uniform, but we have ground networks around the globe, in different time zones, collecting data. Standardization was really important. So, as I said, we have 37 tenants in our data mesh, divided by domains. Think about sales, marketing, satellite data, ground network, et cetera. All the data is shared between them.
Sanjeev Mohan (19:50):
Each domain is physically a tenant?
Miguel Morgado (19:54):
Yes.
Sanjeev Mohan (19:55):
I see. So you started with two domains, satellite and ground, and now you've gone up to 37 domains.
Miguel Morgado (20:02):
37 domains. Yeah. Organizational domains, if you want.
Sanjeev Mohan (20:05):
Organizational domains. Each domain has a Snowflake tenant where all their data is kept.
Miguel Morgado (20:12):
Yes. So they have their own warehouses, and as you can imagine, we have lots of warehouses. We implemented what they called at the time a modern data architecture. I don't know if it's modern now, in 2023. So we have separation of ingestion, creation, calculation, and consumption, all with different warehouses. We have warehouses for consumption, and then we can go really deep, with warehouses just for machine learning and warehouses specifically for dashboards, for example. In terms of dashboards, again, we have been trying to standardize everything from the beginning. We started with Power BI and Tableau, which were a little limiting for us, so we moved to Grafana, and currently we are investing a lot in Snowflake Streamlit apps.
(21:18):
And we have more than 1,000 dashboards in Grafana. Oh, wow. Now, the thing is, none of those dashboards were actually built by my team, my core team. This is all self-service. Again, in terms of data mesh, we are not pure. We have a couple of domains that act as their own PO. They are their own architects. They have their own data scientists, data engineers, data owners, subject matter experts, data stewards, and so on. So they basically run on their own within their tenants, although they need to follow the standards; we have governance in place that they need to follow. But then we have another approach for small tenants, where our core team does some of the management. So it's kind of a hybrid approach to data mesh.
Sanjeev Mohan (22:18):
I love this approach, because when you have a clear-cut domain that's big enough, you go through the whole decentralized model: as an owner, they're responsible for quality, data products, all that. But when you have a new, fledgling, or small unit, you have a centralized team that can bring them along on the journey.
Miguel Morgado (22:40):
Yeah. As you guys already mentioned, data mesh is not about applications; it's about people. It's the trust between different teams, so they can share data and learn, and share knowledge. As I've said to you a couple of times, I look at data mesh like a library, where you have knowledge divided by teams, and people build on that knowledge.
Sanjeev Mohan (23:06):
That is great. Miguel, I want to stay with you for a second, because I'm curious. You've explained so clearly how your organization is set up and some of the technology pieces, but I wanna dive deeper into data products. How many data products do you have? What is a data product physically? What's the manifestation of a data product? You mentioned a thousand Grafana dashboards; is each one a product, or a combination of these?
Miguel Morgado (23:39):
We have a lot of products. I don't know the number off the top of my head, but we have lots of products. We are in the business of monetizing our data. So we look at products at the organizational domain level first; let's imagine satellite data. Inside satellite data, of course, we have pipelines and warehouses dedicated to certain models, because we want to know exactly how much each model costs to run. And of course we have different security levels depending on the data product. Some are internal, some are external.
Sanjeev Mohan (24:17):
Oh, what is a data product?
Miguel Morgado (24:20):
Sorry?
Sanjeev Mohan (24:20):
How does it look?
Miguel Morgado (24:22):
How do they look? Sometimes they look like an API, sometimes like a data share, and sometimes like a web application. The definition of data products we picked up is, well, at least in my limited capacity to talk about data products: any product that has data is a data product. You create some value that makes life simpler for people, and you can sell it, or other people can get value from it. For me, that's a data product. So, okay, I'm not sure if... yeah.
Sanjeev Mohan (25:01):
I'm gonna move to Omar real quick, because Omar has some clear-cut views on this. By the way, one thing I want viewers to take away from what Miguel said is that Miguel's use case is very unique, because the data products are being monetized. That's very interesting. Not everybody creates data products for monetization, but OneWeb does. So Omar, I'm gonna ask you the same question. What is a data product? How many do you have? And then Jill, we'll move to you.
Omar Khawaja (25:38):
Yeah. I think this is one of the first steps that confuses people: what is a data product? Even the definition in the book, for example, is very comprehensive, but not simple to understand. Let's put it this way: when you start talking about architecture quanta, that doesn't get through to the normal population, like me. So I had my own struggles, but I do want to call out, Sanjeev, that your blog on this topic from a few months ago was absolutely brilliant. It's simple and easy to understand, and I wish we had known that two, two and a half years ago. It's always easy to understand things in retrospect. We also defined the same thing: what is our data product definition? There were some key aspects, but to simplify things, it's data and technology combined in one wrapper.
(26:41):
And it's managed like a product. This is the softer part of things, the people and process part. Unlike our traditional projects, where there is a delivery team and then you hand things over to an operational team, those things are not there. Plus, we also double-clicked into the data product itself. Okay, fine, it's data. It can be different types of data: structured, semi-structured, or completely unstructured, like images. One of its output ports, as the term is used in data mesh, can be your visualization. It can be just an API, or it can just be an ODBC port, and that's it, which means the data product team is not responsible for visualization and is not provisioning any of that. There is a team of analysts, or data scientists, who will just consume the data in different shapes and formats.
(27:39):
So we have all of those varieties with us. And this is a discussion we should have: we have these varieties just like products outside the data world have varieties and variations. Whether it's toothpaste, French fries, ketchup, or a car, you have those variants. Not all of them have the same things, and it's okay <laugh> to have that. Depending on the organization's needs and the domain's needs, you can have that. For example, when we started, there was an inclination that everything is a data product. So every single thing became a data product: okay, this dashboard is a data product, or this table is a data product.
(28:36):
It may be, and that's where you go a little into the details: does it have those seven characteristics of a data product? Are you making it easy to find? Is it valuable on its own? And if you are treating it as a product, what value is it delivering? Who's using your data product? How many users are there, and do you know the personas who use your product? That's where product thinking comes in. And with some of these hard facts, when you try to implement them and go into execution, that's where you sort out what is a data product and what is not. I would say it's important to call out that this is maybe one of the governance bodies' activities: to help define what really is a data product. Otherwise, we are just in the business of a slide-renaming exercise, and there will be no value delivered by this initiative, because we have not essentially changed any behaviors. We need to really change things to make things different.
Sanjeev Mohan (29:52):
See, this is my big concern: if everything that has value is a data product, then everything could be a data product. In fact, just now you've expanded my own definition by adding an ODBC port, and I'm like, wait, that's a data product? Okay, so we'll come back to that.
Omar Khawaja (30:10):
It's an output port, but not a data product on its own. But for structured data, I know people have a love-hate relationship with SQL, but I'm a big fan. I was listening to a blog and it was like, hey, SQL can be cute. And I said, yes, it is. It's easy to understand; what's wrong with it? It is a query language for structured data, so let's use it and expose the data so that people can access it.
Sanjeev Mohan (30:43):
It could even be a GraphQL API on a table; there's no SQL in this case. But actually, my friend Guy Adams of DataOps.live likes to say that it's less about what a data product is. It's more about something you harped on quite a bit, Omar: how it is built. That product management capability is what distinguishes a data product from a non-data product, although physically... Jill, what do you have to say? What's a data product in your organization?
Jill Maffeo (31:21):
I hear components across Miguel's and Omar's answers as well, in terms of the pieces you've found ending up as data products along the way. I can give some concrete examples for us, and the impact these data products have across a value chain, because we talk about value and about the different roles of data products. Here at Vista, we think about a platform or enablement sector of products. We talk about financial products: unlike Miguel, we're not monetizing them in the external market per se, but we're able to tie them much more directly to financial impact. And then we also have products that help us make decisions. They may not be as close to that financial impact, but they're still very valuable for us for strategic purposes.
(32:09):
In my former domain, and I also love hearing this, especially in Miguel's space, and I'd love to hear Omar's opinions on it at a different point, our domains have grown and moved with the business a little bit. So it's nice to hear that other teams are experiencing this kind of movement and growth within domains. But I was in an omnichannel domain, and we were working on optimization, measurement, et cetera. In my team's portfolio, we did have some enablement products. For instance, we were pulling in a lot of the third-party marketing data. So a lot of our ingestion pipelines, and the output of those pipelines, specific tables that were used by other teams to develop data products, were a data product. That portfolio of ingestion pipelines was a data product. We also had... oh, please.
Sanjeev Mohan (33:02):
An ingestion pipeline is a data product?
Jill Maffeo (33:04):
We were thinking about it along the lines of a data product, because we did have consumers coming to us and saying, we need this new conversion event added to this particular display vendor's data stream. So we needed to make sure we were keeping our downstream consumers happy, that our data quality was kept up, and that these were visible ways to engage with our other teams. And then we also had an actual taxonomy. We had a taxonomy that helped us describe marketing placements: email 123 was for this language, this locale, this purpose, and went out on this date. So it was a taxonomy, from an omnichannel perspective, that pulled together how we tag and track the metadata around the marketing placements that are live in market.
(33:56):
So that taxonomy and the resulting table helped all of the reporting teams and the modeling teams understand, and have a level playing field on, what was in market and what its purpose was. That taxonomy and those mapping tables were also specifically products, and those were on my team. Then you have another team who’s using that information to model out bidding scenarios, like for paid search or for display, and trying to automate a lot of the bidding decisions that we’re making. That team had a product around intelligent return on ad spend. And then another team had a product that was doing attribution, and that attribution team was pulling in information from a lot of the placement mapping we were doing, from the channel data that we were consolidating, and also from what the bidding team was outputting. So you see this value chain: the pipelines are being used by other teams, and their outputs are consumed to help us strategically but also to help with financial inputs. That’s a little bit of a value chain of data products from my previous team.
Sanjeev Mohan (35:06):
I’m really curious: how are these other teams, like the attribution team, discovering the work that another team has done? Is there some catalog of data products? Are these derived data products? Like, you have built one data product, and then the attribution team is able to discover it and use it to create their own data product? Is that a pipeline kind of thing going on?
Jill Maffeo (35:34):
We were lucky in the way that our particular domain was organized, in the sense that the PMs who were building towards these goals of optimization and measurement were aligned under the same leader. So we were able to march forward in the same direction. But in terms of discovery, we do have Confluence pages and we do have Alation, but a lot of it also came down to the PM’s responsibility to drive awareness of your data and its connectivity to other data and other domains. I was constantly in communication with a domain that was more about promotions and pricing, because obviously there’s a connective-tissue point there: yes, an email went out the door, but with what coupon, with what promotion? If we’re not able to integrate these datasets and have those meaningful discussions, then the mesh is kind of failing, because we’re isolating into “you own all the promotion data and we own all the marketing data,” and never the twain shall meet. So you have to have those conversations on an ongoing basis to make sure that there is interoperability across domains.
Sanjeev Mohan (36:42):
Very nice. So I’m going to go to Miguel, because I have two burning questions, and we are on very tricky ground, if I may: we’ve already spent over 30 minutes just talking about what a data product is, and I suspect we could spend the whole hour just on that. But I want to shift to two important questions. What are the tangible benefits that you get out of data products that you did not have before? I’m going to go through all of you, and then we’ll come to how you built it. Let’s start with Miguel. You mentioned reduction of data silos, but from a business point of view, what did you get out of data products that you did not have before?
Miguel Morgado (37:27):
Yeah. I could mention the efficiencies and cost reduction, and clarity between teams, but the main win when you implement a data mesh, for me, is collaboration. Collaboration is key. A data product is a product. Think about a car: the tires are a product, the windows are a product, the doors are a product. All these are products, but then all these teams creating products work together, share data between them, and build a bigger thing, which is a car. Across the 37 tenants in our data mesh, we have teams that produce certain datasets as a data product that has value on its own and can be discovered. We use data.world as our data catalog. Then other teams from other tenants, because they can discover this data, use it and create something bigger, and then bigger again. At some point it comes back around, so the data producers become data consumers: producers, consumers, and then consumers again. So it is collaboration between teams. Again, data mesh is a shift in the staff’s mindset and in the way you work; it is more of a cultural shift. So it’s...
Sanjeev Mohan (39:02):
Interesting. Yeah. So what you’re saying is that a data product introduces, to use the car analogy, the factory model: if you’ve got a lot of these data products that you can discover and collaborate on, then you can build a car within minutes rather than months, because you have standardized components which are also products.
Miguel Morgado (39:25):
Yes, correct.
Sanjeev Mohan (39:27):
Omar, what would you say are the biggest benefits that you have seen at Roche?
Omar Khawaja (39:35):
There are many, and I would build upon what Miguel said. One way of looking at it is: what was happening when we did not have data products? In that context, the traditional approaches have been very siloed, single-use-case solutions with no reusability whatsoever. A very simple example: people just want dashboard after dashboard after dashboard. Everybody goes to their source data, or whichever data source <laugh> is available to the poor guy who’s making the dashboard, and that’s it, the job is finished. At the end of the day, once you have those insights, nobody cares, because it is the team behind the scenes that is really going through that painful process. And that team is no longer only in IT, in the central BI department or the data and analytics team. All this digitalization and digitization, and the awareness of data across not just IT but all business roles, means that people are getting more and more data savvy. Hence the pain is spreading, as data scientists, analysts, analytics engineers, and all those other roles and titles want access to the data they need so that they can make informed decisions and act on them.
(41:05):
Yeah. Without these data products, and ownership sitting specifically in these data domains, everybody who was trying to help unfortunately became a bottleneck. Then, once you take this approach of decentralizing the creation of data products into the domains while staying connected to the hub (it’s a hub-and-spoke model that we have), Mm-hmm, we have empowerment, we have ownership, and we do not have bottlenecks, since we have automated a number of things. Not on day one, but after one and a half years of our continuous journey, I would say even longer. Now we know how many domains are working on which data products, so there is the visibility you were asking about earlier, and that Jill was also explaining. As part of our DataOps pipeline, you can’t push things to production unless your metadata, following the predefined data product template, is pushed into the catalog, and nobody is writing that integration code every time.
(42:08):
It’s a matter of entering the fields in a configuration file. That’s where the automation comes in, that’s where the simplification comes in, and that’s where the platform-as-a-product concept, the third principle of data mesh, comes in. I do want to make a shameless plug: I personally think there is a lot happening in many companies, but, frankly speaking, we are still missing the whole idea of how you create these data products, because we talk about publishing these things when they are already developed, done and dusted.
Sanjeev Mohan (42:49):
Omar, hold that thought, because that is the most critical question I want to come to. But first I want to ask Jill whether she agrees with the benefits that Miguel and Omar have mentioned. I’m sold, by the way; both of you are very convincing. But I want to see what Jill has to say, and whether there is anything we are missing.
Jill Maffeo (43:11):
Yeah. Having come up as an analyst in a world where there was a central BI team, there are absolutely things that I miss about that, in the sense that there are these guardians of quality, somebody I can leave a use case with, and then they work with the upstream teams to try to accomplish those things. But the change to data product management, I think, necessitates a shift in responsibility and accountability. And I absolutely see what Omar is talking about here too, in the sense that the people who know the most about the data need to be much closer to its use cases. That means you are sometimes able to see the forest for the trees on a lot of the use cases that come in. You’re able to say, this is similar enough to that, so without inviting a certain amount of scope creep, we can actually deliver something that will please multiple people at the same time and maybe serve a greater use case.
(44:13):
And so I’m not going to say that the transition away from central BI is easy. It is not. Finding the balance of the hub and spoke, and of governance in terms of what accelerates teams and what slows them down, is also hard. We want to give teams a lot of freedom to deliver in a way that works for them, but sometimes you get such a diffusion of technologies or approaches that interoperability, or the collaboration that Miguel spoke to, can actually become much more difficult. So finding that balance of “these are the systems we want to work in; within that framework, work how you want to” is definitely an ongoing investment, and something we’re continuously iterating on. But I’m definitely sold on the benefits as well. It just took...
Sanjeev Mohan (45:08):
And how do you measure it? Is your IT leadership or business management saying, since we got rid of the centralized BI team (and I’m so glad you mentioned that it was hard), here are tangible benefits that we can measure: sales went up, or we are turning around products and marketing literature for the business faster? Are there some ROI KPIs?
Jill Maffeo (45:41):
We are definitely at the point where we are trying to measure individual products in terms of their uniqueness, their usefulness, and their impact on the bottom line. I mentioned adoption a little bit before; we’re starting to tap into how many people are using a product, et cetera. I don’t want to speak for every company, and I know some folks have launched some really nice data literacy programs. And I see the comment about easy data product creation and use coming from Scott. What I’d like to see is a metric that opens it up to the business and asks: how have data products, or the way we’re working with data right now, made it easier for you not only to understand the metrics you need to know, but to know which ones you’re using? Can we get to a point where we’re actually increasing that literacy and that level of comfort? And to Omar’s point, we tend to come out with these produced data products, so how do we get people outside of the data organizations really engaged with them? We’re tiptoeing into adoption metrics, but I’d love to get to literacy metrics.
Sanjeev Mohan (47:09):
Literacy. So that’s important for you. Okay. Now we’re going to go to Omar, who is itching to answer, and we are all keen to hear: how do you build a data product?
Omar Khawaja (47:20):
Yeah, it’s very simple, right? You put it in a box and it’s a new product. No, I’m joking, of course. In all sincerity, I think it requires a lot of passion and ownership to create a data product, like other products. Mm-hmm. Similar to other products, it’s all about that ownership, the softer part, which has been completely missing. I’ve been on both sides of this. Once upon a time I was an MIS manager, and I would say, hey, where is the data? Just get out of my way, I want to create a report. That was 20 years ago. Now, on the other side, I’m thinking about what outcome I want to achieve within that domain for the end users, which will result in a benefit that in one way or another will impact your top line, your bottom line, or some other business metric that you have.
(48:18):
And it can be in any function, which means that the journey of creating data products starts with the outcome and the end users in mind. What am I trying to do? It’s a completely non-technical topic: nothing to do with technology, nothing even to do with data. What is the problem I’m solving? For example, if I’m a commercial organization, is there a problem that I don’t have visibility into how much my team is selling? Or I have a very big digital program, and I don’t have visibility into how the omnichannel engagement or investment is performing, or what the return on that investment is. Or it can be as complex as needing some kind of image monitoring program so that I can optimize my production line. Or it has something to do with monitoring very specific research data, which will result in faster insight for decision making.
(49:19):
It can be so many different examples, right? So what you’re saying is, step one, the thing that makes a data product different, is that you always start with the business outcome, not with data and technology. Exactly. And once you know who you are solving for, like in a typical product discovery approach, you will have some kind of hypothesis that option A, option B, or option C might be the proper option, because you don’t know what your data product looks like at that time. So before you invest, you have the domain experts, in the shape of the business and IT in this context, coming together and deciding what we are going to go after. Only after following that discovery process, applied to these data products, do you put a small team in place that starts the technical journey. And that’s step two, yes.
(50:14):
Well, many steps in between, but at a higher level, yes. And then it’s like a cycle: I want to do this, I want to have this outcome. Take a user-centric approach, apply the persona, look at what kind of insights it might be. It might be a dashboard, it might be an alert, it might be an email that is generated out of your data product. It could be all of these things. It might be an integration where your insights just generate a score that appears on a customer profile page in your CRM. It might be as simple as that; I’m not saying that is the only thing. And then you look at which systems and data entities are in play. And to your point, if you are starting the journey, there is no place where you can find other data products to reuse.
(51:05):
But once you are mature, you should be looking at some kind of marketplace: has somebody else already created those data entities or that dataset? Then I can simply plug it in and reuse it to generate the outcome I need. Or do I really have to look deep into my domain, or into some other domains, to get that data product or dataset created? It’s a cycle. It’s a cycle, yeah. But one last thing, Sanjeev: I think we completely missed the design aspect of it in our practices before. We don’t have it; it’s nonexistent right now, not just in the company itself, but in the industry in general. Look at any technical offering, whether it’s this tool or data ingestion: where am I designing this? Is it on a piece of paper? And imagine the time spent to reach the point of development. In three months or six months, depending on the speed of your organization, nobody knows how many domains and how many teams are working on how many problems. So, a call-out to the whole community out there: we have done something internally. We lovingly call it Data Mesh Design Studio, and it starts the journey from step zero, not based on any technology, but really based on design principles: what are we going to solve, and who are we going to solve it for? So...
Jill Maffeo (52:41):
I think that speaks a little bit to both of the questions we’ve had, too, in terms of how you avoid low-value data products and how you figure out the scope of a data product, which goes to the question of whether it is global or local. I agree with Omar: a lot of it comes down to a discovery phase. We briefly talked about experimentation in a conversation I had with Scott, and he brings up the point there too: are there ways we can experiment with how different individual products need to be? How different should they be? And that goes to the question about a global organization spanning multiple countries: do you have a data product for each country?
(53:23):
I think that’s when you have to have those trade-off conversations in a discovery phase. Is having something for this one country actually low value, when having it for the global organization increases its value because so many more people have access to it? And, to the point about combining some of those use cases, where can you reduce some of the tech debt of having multiple versions of the truth out there? Those are the things to consider in the discovery phase to make sure you’re aiming for something that’s high value.
Sanjeev Mohan (53:55):
Yeah, it’s so valuable to hear your thoughts, Jill. In fact, we were just discussing the very question Scott Hirleman has asked. Scott’s question is: what guardrails or brakes prevent products that are low value or don’t have a reasonable use case? We were talking before this call, and Miguel, you were talking about this concept of a magic quadrant that you create for data products. Can you educate us on how you determine low-value or high-value products?
Miguel Morgado (54:27):
In my team, we always look at this from a financial point of view. A data product is a product, and each product costs money; products are not free. So, first, we don’t develop every product our engineers ask us to develop, and then we monitor the usage and the value. It is all about the value. Imagine the Gartner quadrant. We have, in <inaudible>, the value of a product or data product, and then you have the cost. If it is high cost and high value, you know, right, I’m OK with it. If it is low, sorry, if it is high cost and low value, then we have a problem. We are very pragmatic about retiring data products.
(55:29):
We already have experience from the past of data products that were really valuable, but only during a certain period of time. Let me remind everyone: at OneWeb, we started back in 2020 putting our constellation of satellites up and building the ground network, and went through the stage of building the network. So certain products were needed at a certain stage of the company’s maturity, if you want, and later on were not needed. We keep track of the usage and value of each product, and we retire products. Again, with the analogy of a car: if someone developed a tire with a square shape, it probably seemed a good idea because someone wanted a tire as a data product, but then it’s not really valuable, or no one is using it, or its value is decreasing over time because people start using something else. It also works in the opposite direction. We have lots of experimentation running as data products. We have data products that don’t start as data products: someone in a certain area creates something that acts as a data product, it becomes really successful, it catches our attention because it has lots of users, and then we move it to production as a proper data product. So, two ends of things, but we like the analogy of high value and high cost, or low cost and low value.
Sanjeev Mohan (57:19):
Yeah, that is great. So Omar, I want to ask you: what is your strategy for retiring data products? You’ve talked quite a bit about the whole life cycle. You build, you test, you deploy, and then, as per Miguel, you measure the usage. At some point, if the value falls behind, do you retire data products?
Omar Khawaja (57:42):
In full transparency, so far I have not come across a case, or reached a stage, where we have retired a data product. Having said that, we are monitoring usage very, very closely. We know the share requests, we know the pipelines, et cetera. We have also started to issue alerts if there is a stale dataset sitting there that nobody has accessed. That’s because you might have certain data that does not change over time but is reused many times in a day, a month, or a year. In such cases you might not get any new records, but the data is reused all the time. Reference data is an example: it doesn’t change very rapidly, but it is accessed many, many times. So if there is a team managing that reference data and making it available for everybody to reuse, they might want to know whether it is still being used. Hence those usage metrics are quite a good criterion for assessing whether something is being used or not. But we have not retired anything yet, so I cannot share any further practical tips. In the old days of servers, I used to just shut down the server and see, as I call it, the scream test: how many people come running to you saying, oh, the server is down.
Sanjeev Mohan (59:06):
That is great. Not
Omar Khawaja (59:07):
The case.
Sanjeev Mohan (59:08):
Thank you so much. There’s so much more I want to talk to you about. Maybe we’ll do a sequel soon, part two: the return of data products. But I want to thank all of you for joining. For the listeners, please follow them on LinkedIn if you don’t already, because you’ll get a wealth of information. They’re thought leaders in this space, and I learn a lot from them every day. Thank you so much for joining, and hope to see you soon. Take care. Take care.