A summary of our Data Mesh Learning community roundtable discussion on August 10
This month, the Data Mesh Learning community hosted a roundtable discussion with data mesh practitioners to discuss decoding data quantum.
Some of the questions posed during the discussion included:
- Should AI models reside within the data product or the data quantum itself?
- What is the definition of a data product?
- How can we ensure we’re speaking the same vocabulary when embarking on a project?
- Is data quantum just jargon or is it a key part of the data mesh puzzle?
Facilitators:
- Jean-Georges Perrin, Senior Data, AI, and Software Consultant / President and Co-founder, AIDA User Group
- Scott Hirleman, Founder and CEO of Data Mesh Understanding
Participants included:
- Yuliia Tkachova, Co-founder and CEO of Masthead Data
- Tom De Wolf, Senior Architect and Innovation Lead at ACA Group, Host of Data Mesh Belgium Meetup
- Michael Toland, Senior Product Management Consultant & Coach, Pathfinder Product Labs
- Paul Cavacas, Senior Manager, Data Platforms at Ocean Spray Cranberries
- Samia Rahman, Director of Data and AI at Seagen
- Andrey Goloborodko, Data Engineer at Wrike
- Daniel Twomey, Collaborative Project Manager
- Austin Kronz, Director of Data Strategy at Atlan
Watch the Replay
Read the Transcript
Download the PDF or scroll to the bottom of this post
Additional Discussion Resources:
- The Next Generation of Data Platforms is the Data Mesh
- Data as a Product vs Data Products. What are the differences?
Ways to Participate
Check out our Meetup page to catch an upcoming event.
Let us know if you’re interested in sharing a case study or use case with the community.
Data Mesh Learning Community resources
- Engage with us on Slack
- Organize a local meetup
- Attend an upcoming event
- Join an end-user roundtable
- Help us showcase data mesh end-user journeys
- Sign up for our newsletter
- Become a community sponsor
You can follow the conversation on LinkedIn and Twitter.
Transcript
Jean-Georges Perrin (00:07):
Ooh, plenty of people.
Scott Hirleman (00:09):
<laugh>.
Jean-Georges Perrin (00:14):
Hey, Tom <laugh>. Hi. Good evening.
Scott Hirleman (00:18):
Hey, Michael. Yes, Paul. Hey, Scott <laugh>. Hello, Tom. So, and we’ve got Yuliia as well.
Jean-Georges Perrin (00:27):
Yeah. Hey, me, we got the
Scott Hirleman (00:29):
Whole gang here. Yeah. So I think getting started on this topic, I was actually just talking with somebody today over LinkedIn about what is a data product? What is a data quantum? And how frustrating that has been as a topic for a lot of people, because it devolves very, very quickly. So, JGP, you and I did an episode specifically on this for Data Mesh Radio. You wrote a very widely shared and liked article on Medium about this. So I’d love to hear from you what it is and maybe what it isn’t. And, you know, how do you think about that relative to Zhamak Dehghani’s view of it, of, you know, it’s the data and the code required to run that data, including the metadata and the blah, blah, blah. And that becomes the input port and the output port, and how that just starts to devolve around how confusing it is for folks. So I’d love to hear how you’re thinking about it, and then we could talk about what other people are feeling and seeing out there.
Jean-Georges Perrin (01:48):
Okay. I was hoping to learn something today, and that’s <laugh>, but okay. So I don’t wanna sound arrogant like that, but no, the thing is, for me, it’s been a struggle as well: okay, what’s a data product? What’s a data quantum? So basically what I end up doing is that when I’m speaking to business, it’s a data product. When I’m talking to implementation, it’s a data quantum. And that’s how I kind of decided to break them apart. Okay. So they’re kind of the same thing, but one is the implementation of the other.
Scott Hirleman (02:37):
Sorry, do you have a specific definition of what is a data product? Because this has been the, the thing, you know, when I first started digging into data as a product, data products, somebody told me a data product is just a product that is heavily backed by data. And so that’d be like an ML thing or anything like that, versus like, because like, and when you are talking to these different people, what is it, right? Like when, when the view of a data product for somebody on the business side, it’s just like, this is some information that you have easy access to, to leverage for what you need to do versus the architect side. So like, do you have phrasing or anything that you’ve kind of showed people around that?
Jean-Georges Perrin (03:20):
I also have a very simplistic definition of a data product. For me, a data product is a dataset with a data contract, but it can be multiple data sets with multiple data contracts. But the thing is, at the core of it, a data product is a dataset. But that’s kind of the technical part, right? Because when you’re talking to someone and say, hey, it’s a dataset with a data contract, they say, really, is that what you’re trying to sell me? Then it has all these characteristics, like what Zhamak described in the book, like, you know, the DATSIS acronym. But the thing is, for me, it’s still, at the very base, a dataset with a data contract.
(04:06):
And the thing is, why am I trying to minimize it? It’s because when you’re talking to people, especially in the data engineering field, and you’re going to data engineers and you say, hey, we’re going to build data products, the first thing they do is freak out. Okay? So then you say, no, don’t freak out. It’s the same thing you’ve been doing. We’re just going to add a data contract. Right? Okay. So I know it’s oversimplifying and there are a lot of things that come after, but if you start freaking them out, then you lose them. At least that’s my take on it.
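To make the "dataset with a data contract" idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the class names `DataContract` and `DataProduct`, the column-to-type contract shape, and the sample rows are hypothetical and do not come from the discussion or from any particular data contract standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical contract: expected column names mapped to Python types."""
    name: str
    columns: dict[str, type]


@dataclass
class DataProduct:
    """A dataset (here, a list of row dicts) paired with its contract."""
    contract: DataContract
    rows: list[dict] = field(default_factory=list)

    def violations(self) -> list[str]:
        """Return a readable list of the ways rows break the contract."""
        problems = []
        for i, row in enumerate(self.rows):
            for column, expected in self.contract.columns.items():
                if column not in row:
                    problems.append(f"row {i}: missing column '{column}'")
                elif not isinstance(row[column], expected):
                    problems.append(
                        f"row {i}: '{column}' is not {expected.__name__}"
                    )
        return problems


# Illustrative data only.
contract = DataContract("customers", {"customer_id": int, "email": str})
product = DataProduct(contract, [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": "2"},  # wrong type and missing email
])
print(product.violations())
```

The dataset itself is unchanged from what a data engineer already produces; the contract is the only addition, which is the point JGP makes above.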
Scott Hirleman (04:41):
That’s kind of what I’ve heard too. Michael, I know you’ve been working in a large organization across a whole bunch of different things, so I’d love to hear how you’re thinking about it. But JGP, I think that is something that has resonated with a lot of people, saying it’s really easy to confuse people. So maybe try not to get too technical and too esoteric in the definition <laugh>.
Michael Toland (05:07):
Yeah. I think I’ve had better success in,
Jean-Georges Perrin (05:18):
Did we just lose Michael.
Scott Hirleman (05:19):
Yeah, I lost Michael as well. <laugh>
Michael Toland (05:23):
His first data product. I think of a data product as really any specific amount of data that provides real contextual value to the business and serves an end-customer focus in the way that any other product would. That seems to gravitate, but it’s still sort of myopic and intangible. But that’s sort of what I’ve been focusing on, which some people are like, well, how is that not just data as a product? I’m like, that’s a great question I need my own answer to.
Scott Hirleman (05:59):
Yeah, I think that’s, that. I think we get too specific and trying to dig too deep into this, you know, to, Tom, you’ve been working with a lot of clients as well. Like, when you’re trying to talk to, are you, are you giving different stakeholders different, like actual definitions or like, how, how are you going about that and and what are you finding that resonates with people?
Tom De Wolf (06:21):
Well, I do like the distinction that Jean-Georges is talking about: the data product, when you talk to non-technical people, is more about the principle of data as a product, that you have to have a contract and those things. And the quantum is indeed really how do you implement such a thing? How do you bring that data product or quantum to life, have its own lifecycle, and really technically integrate it with the different data tools that you want to use? And yeah, not many of those tools have the data quantum or data product quantum baked in yet, maybe. So what we see is that you do need some kind of layer of platform engineering to really bring that quantum to life and make it have its lifecycle, automate things, provision things in data warehouse tech or other places that you want to go into, and that way have something in your architecture that really encapsulates that code, metadata, data, and all those characteristics.
Scott Hirleman (07:39):
Are you finding that the tooling easily allows you to encapsulate those in one space? ’Cause what I’ve kind of heard is people going, oh, I have to go and populate the catalog and I have to do this. Or do we have to build the thing that makes it so that they can just do the code? And also I’d love to understand formatting, right? Like when I think of code, I don’t think of good formatting. So <laugh> like metadata just looking like one giant block of text coming into the catalog is not gonna be that great. What do you mean by formatting? Just before? So, well, like, when I’m thinking about just typing in code, what does it look like in the actual data catalog? So, you know, somebody who’s really technical and making an API call, they’re gonna be used to just kind of the DOS-type screen of just getting text. But when you have a business user, you have somebody that’s less technical, how are you thinking about that?
Tom De Wolf (08:43):
Well, it depends on who is developing the data product. If it’s a data product developer, which is an engineer who knows how to code, then that can be just an IDE, like any other development environment. When I talked about the platform layer, it’s really an orchestration layer. Sorry,
Scott Hirleman (09:07):
You’ve just got some echo going and I don’t know why <laugh>, ah,
Tom De Wolf (09:10):
Okay <laugh>. The platform layer is more the orchestration layer that brings that data quantum to life. But it doesn’t mean for me that everything has to be in one tool in one place. So it can still go into the catalog, go into the data warehouse or another type of tech that you want to use, and in those places reserve a kind of segment for that data product or a segment for that output port of that data product. So that’s how we approach it.
Scott Hirleman (09:45):
And by the way, I mean, thank you, Tom, but by the way, for anybody who hasn’t been on before who does wanna chat, just feel free to, you know, turn on your camera and raise your hand or anything like that. But Yuliia, I was actually, I just wanted to talk to you about this because as a vendor, these terms just keep getting circled around and around and around. So I’d love to hear how customers, or prospects and things, are actually thinking about this, and are they as tied up as everyone else seems to be around this topic?
Yuliia Tkachova (10:22):
Oh, yeah. Thank you so much. Hi everyone. So I wanna touch base on data products, because eventually it means different things to different people. And what is interesting enough, because again, as a vendor, I actually observe how data teams are using data. In fact, when it comes to defining what data products are, I would actually lean towards Mike’s definition, where he highlights that this is data that is put to use by business users and that actually generates some value. Why do I emphasize that? Because I can actually see how many data sets data engineers are muting, and they don’t wanna hear about any anomalies or errors in those data sets. So that’s why, in my opinion, defining any data set as a data product doesn’t necessarily make a lot of sense, because eventually, like, nobody gives a shit what is happening in those data sets. I’m sorry for my French, I’m so sorry <laugh>.
Scott Hirleman (11:40):
Are you French or something? Yeah, yeah, especially with JGP on, calling it French is the best. But I do think, like, there was an article from Monzo Bank a while back, and they were super excited about their lineage. They had 2,500 people in total, and they had 4,500 tables in production. And they were like, we have lineage for absolutely everything. And it was like, why do you have that many things? Like, this is terrifying to me. So Andrey, you’ve had your hand up for a bit, so would love to get your view as well.
Andrey Goloborodko (12:21):
Yeah, thank you. Actually, we thought about what we want to count as a data product and what we don’t see as a data product. And we came up with two really simple criteria. First, like Yuliia said, we don’t want to see as a data product something that wasn’t intended for use. So we say that if someone put the intention that this set of data should be used by someone else, then this can be a data product. If not, it cannot be, so all unfinished work or internals are not considered data products. And second, which is harder to understand, but I believe it leads us to a clearer environment: we require that data products are developed to meet a specific business objective. So eventually, all data you have from the data warehouse should have some business objective, because otherwise no one would do it, right? But having this in the definition leads our customers, it leads our data product producers, to put this business objective definition in their documentation and in the specifications of the data product. So that’s it.
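Andrey's two criteria lend themselves to a simple check. The Python sketch below is hypothetical: the `DataProductSpec` fields and the function name are assumptions made for illustration and are not how Wrike implements it.

```python
from dataclasses import dataclass


@dataclass
class DataProductSpec:
    """Hypothetical specification record for a candidate data product."""
    name: str
    intended_for_consumers: bool   # criterion 1: meant to be used by others
    business_objective: str = ""   # criterion 2: documented objective


def qualifies_as_data_product(spec: DataProductSpec) -> tuple[bool, list[str]]:
    """Apply both criteria and explain any failure."""
    reasons = []
    if not spec.intended_for_consumers:
        reasons.append("not intended for use by others")
    if not spec.business_objective.strip():
        reasons.append("no documented business objective")
    return (not reasons, reasons)


# Illustrative candidates only.
staging = DataProductSpec("tmp_orders_staging", intended_for_consumers=False)
churn = DataProductSpec(
    "customer_churn",
    intended_for_consumers=True,
    business_objective="Reduce churn by flagging at-risk accounts",
)
print(qualifies_as_data_product(staging))
print(qualifies_as_data_product(churn))
```

Returning the reasons alongside the verdict nudges producers to write the business objective into the specification, which is the side effect Andrey describes.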
Scott Hirleman (13:57):
That’s interesting. ’cause I think like Roche has talked about this a little bit, but they have, like when you said what is, is this a data product or is this not, I mean, we kind of have the, do we have that lifecycle idea as well? Like, it’s not that this is just a static table, it’s that this is evolving and that that business use is evolving. and, but Roche talked about, I think they’ve got like 550 things that could be different termed as a, a data product, but only a certain number, you know, a hundred, 150 or whatever are actually qualified as full data products. Because there’s a bunch of these things that are in that kind of work in process or, Hey, this might be interesting. Can I find a consumer for it? Hey, or Hey, this is data on the inside, we’re consuming it, but we’re exposing it in case it should be data on the outside. Somebody else might wanna use it. Like, I think that’s really a helpful aspect. Michael, you you wanted to, to toss in more as well here?
Michael Toland (14:57):
Well, I think I more have a question on do people have a differentiation between data as a product versus data product? Because for me, I view data as a product, as like physical data set that feeds maybe a data product or potentially sometimes the data set itself can be the data product depending on what the value it is providing. and I don’t know if anyone has ever articulated for me a clear answer that fully I can intuit. And I’m just curious if folks here have thoughts on that.
Scott Hirleman (15:32):
I am not going to give a thought because I will take the rest of the time and another hour. ’Cause, like, there’s a reason Data Mesh Radio is on the Data as a Product Podcast Network. And I wanted to launch it as the data as a product podcast and not the data product podcast, but specifically that. So Tom and JGP. JGP, let’s go to you ’cause you haven’t talked for quite a while, so I wanna make sure we get you as the main host.
Jean-Georges Perrin (15:58):
No, no, so sorry, Tom, but for me it’s the same thing. Okay. Because the thing is, otherwise this is going to be super complicated: data product equals data as a product. And I stand by that. Otherwise, yeah, otherwise it’s getting ugly. Tom, your turn <laugh>.
Tom De Wolf (16:17):
Well, I don’t fully agree with that in the sense that for me, data as a product is more about the, the, the way of thinking about what you are making as valuable. And things that Andrey said that it has to be linked to use case, a business value and those things, and we talk about the data product as really the yeah, the, the, the, the implemented solution, the quantum, the architectural component that yeah, realizes that data as a product part. So it’s the principle and the way of thinking about designing it and on the other hand the architectural component, but it’s a way of, of looking at it. Yeah.
Scott Hirleman (17:00):
Well, and for me, I’m not gonna go super deep into this, but for me it’s also a cultural approach. So it’s not just relative to the single data product, it is the cultural approach as to how do you think about treating data in a productized way like you do software development. So JGP, Austin, whichever…
Jean-Georges Perrin (17:23):
Yeah, but just, and maybe we should go to something else because otherwise we’re going to spend the whole session on that. But what I’m thinking is that when you are in software, you’ve got to adopt a product thinking mindset, right? And that’s the big major difference. So you’re just adding data before it. So you’ve got data product and you’ve got data product thinking, and you don’t have data as a product. Okay. I’m finished. I’m not saying anything anymore on this one. <laugh> Austin.
Austin Kronz (18:02):
Yeah. I, I, I was disagreeing first and then I think as, as <inaudible> spoke, I, I started to agree a little bit more like data as a product is just meant to trigger that mindset that this is something that should grow and evolve and be associated with business value. The output of that, I think is either where we, internally at my company as well as at our customers, everyone has a different definition, right? Like a Snowflake table might be a data product to them because they want to make sure that table has clear ownership, that table has clear descriptions that table has clear, et cetera. However, some of them do have data product managers who are the ones actually doing the stakeholder management, the ones that are actually making sure that we are documenting that table because it’s associated with this key business use case.
(18:52):
And so it’s kind of an interesting emergence of roles with the whole data product manager thing. I had a post on LinkedIn like last week where we do have this interesting thing where it’s either you have data products or you don’t. And I feel like no one wants to talk about bad data products or data products that you should kill. And my argument is, whether or not you read the data mesh book and now you say the words data product, data as a product, if you’re a functioning company that exists, you have data products, right? <laugh> You may not acknowledge them. They may not be the best, they may not be optimal. There may be a lot of clutter. Like, you know, you were saying, why do you have 4,000 tables? Well, we’re at a unique position at our company to see customers with hundreds of thousands of data assets.
(19:43):
And it’s like, like you said, why? It’s like, well, that was a project six months ago. That person started it, they left. The ones that are actually utilized are very few. And so, getting to that number, I think, is part of being a good product manager of data. It’s basically saying it’s one thing to put a lot of effort into these data products that are driving value, but for an engineering stakeholder that I also care about as a data product manager, I should also be encouraged to clean up that clutter, to optimize our data warehouse, to optimize our data stores, et cetera. So, I mean, I think they go hand in hand. It really is about adopting that mindset, but it seems like people forget that part of that mindset is acknowledging the bad products <laugh> and being able to clean those up as well.
Scott Hirleman (20:30):
Yeah, I mean, there’s tech debt within a product and then there’s tech debt that is the product. And you sunset things. And I still haven’t met a single person in data mesh that has sunset a data product. And Michael was part of a panel recently and one of the participants kept talking about data generation. Like that’s the type of thing, when you have a data generation strategy, a data sourcing strategy, about what data do you need to create? Not what data do you have? What data do you need to create? That’s where I start to think about data as a product. That’s where you start to talk about data product marketing. You start to talk about actually going and talking to your constituents and not just saying, I’m waiting for them to come to me.
(21:14):
But you go to them and you go, like, how do we create this value? How do we create this value six months from now? So I’m gonna start creating, I’m gonna start sourcing this data, so when you’re actually ready to ask this question, and it’s gonna be an important business question, I’ve created this stuff for you and we’re flowing as that type of thing. ’Cause you have that roadmap, not just to the data product itself, but to your entire set. Like, a lot of what you’re talking about is literally just, how do you think about this enterprise information set? And how do you think about where do we wanna go? And exactly what you’re talking about of how do we prune these crappy things that people are relying on that aren’t good, or that people aren’t using at all? Or, like, how do we actually have those conversations when everybody has historically been incredibly reluctant to delete any data? So JGP, sorry, I’ve been talking the whole time while you’ve got your hand up <laugh>.
Jean-Georges Perrin (22:10):
No, no, it’s fine. We all know you and love you, Scott, even when you’re taking a third of the time, and we’ve got the stats. But no. So what I wanted to ask the audience, first, to quickly answer Michael’s question: yeah, Michael, you can push your controversial questions in chat, it’s all okay. We still love you as well. But 4,000 tables, I think this is okay. I know some companies are talking about something like 90,000 pipelines, so you can imagine the number of tables after, you know, or before or whatever. So I don’t think that’s shocking. I’d like to drive the discussion a little bit to what’s inside a data quantum. Okay. ’Cause that was kind of the topic of the day. What the hell, or what the “beep”, is a data quantum? So that’s why I would like to drive it a little bit towards this discussion. And now some people have raised hands and they just <crosstalk>
Scott Hirleman (23:20):
You scared ’em off. You scared ’em off.
Jean-Georges Perrin (23:21):
I scared them off. Okay. So, so anyone, what, what’s inside your data quantum?
Scott Hirleman (23:28):
Or like, Paul, you’re planning on building these. What are you planning on starting them as? Are you planning on starting ’em as a table or are you planning on starting ’em as a dataset, or how are you thinking about that?
Paul Cavacas (23:39):
So my initial plans on them really are both sets. So there’ll be data that’s kind of almost like the data on the inside, the source-aligned data that really pretty much maps to where it comes from. But then the real goal of what we’re trying to get to is the data set. So basically think of, you know, it might be a customer one, it might be a product one, it might be an invoices one, but then the true one that’s gonna really add value would be the dataset that kind of merges them all together and has links between them. So what I’m planning on inside would be, so I have big plans. I don’t know how I’ll get to all of them, but in there we’ll have all the details of kind of, from my definition, so the definition of the data product, it’ll have in there the code that needs to be deployed, where it needs to be deployed, how it needs to be deployed, and kind of the steps that can run and actually process it on a daily basis.
(24:36):
So like if they have to run SLO checks, the SLO checks are inside of that as well. So that’s what I’m in the process of building on right now, the ability to do that. And that’s all defined by the definition. So the quantum, that’s what I basically think of it as: my quantum is my definition, which includes both the public-facing part, like this is what business people care about, you know, the definitions, tables, the context, and then the technical part has a different section of it, which has all of the, like, this is the repo that it belongs to, and these are the Airflow DAGs that might be involved inside of it. So that’s my 50,000-foot view of what I’m doing.
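A quantum that is "the definition", with a public-facing section and a technical section, could be modelled along these lines. This Python sketch is an assumption-heavy illustration: every class, field, repository URL, DAG name, and SLO check name is invented for the example, and it does not call Airflow or any real deployment tool.

```python
from dataclasses import dataclass, field


@dataclass
class PublicSection:
    """What consumers and business people care about."""
    description: str
    tables: list[str]


@dataclass
class TechnicalSection:
    """What producers care about: where the code lives and what runs."""
    repo: str
    airflow_dags: list[str] = field(default_factory=list)
    slo_checks: list[str] = field(default_factory=list)


@dataclass
class QuantumDefinition:
    """Hypothetical single definition that describes the whole quantum."""
    name: str
    public: PublicSection
    technical: TechnicalSection

    def daily_steps(self) -> list[str]:
        """List the steps to run each day: pipelines first, then SLO checks."""
        steps = [f"run_dag:{dag}" for dag in self.technical.airflow_dags]
        steps += [f"check_slo:{check}" for check in self.technical.slo_checks]
        return steps


# Illustrative values only.
invoices = QuantumDefinition(
    name="invoices",
    public=PublicSection("Invoices linked to customers", ["invoices"]),
    technical=TechnicalSection(
        repo="https://example.com/data/invoices.git",
        airflow_dags=["load_invoices"],
        slo_checks=["freshness_under_24h"],
    ),
)
print(invoices.daily_steps())
```

Keeping both sections in one definition means a platform can derive deployment and daily processing from the same object that consumers read, which matches the self-contained unit Paul describes.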
Scott Hirleman (25:17):
Like that. ’cause it’s basically like, it’s what people care about and that that needs to be contained in there. And if it’s people that are consumers care about this, if it’s the people that are producers care about this. So this is the self-contained unit, which I think is Zhamak’s view as well. Samia, you’ve had your hand up for a while too as well.
Samia Rahman (25:36):
Yeah, no, plus one to what was just mentioned. To me, I think I heard that from Michael earlier as well. The quantum is the infrastructure, the code, and the data, right? That’s how Zhamak defines it. And I’ve seen that consistently in all the implementations I’ve done. You are creating the infrastructure unit in your catalog. You are creating it in your warehouse, in your data lake storage. ’Cause you might need different polyglot or multimodal formats of access. Some people want it on Snowflake, others just want it in a flat file, right? They are data scientists. They wanna get to their feature engineering approaches. So to me, the quantum, or the platform, really should offer the bootstrapping of that quantum with all the attributes being satisfied, right? Access management is a big thing. I’ve seen data usage guidelines as necessary if you’re in biotech like me, where you have to call out: this finance data set can only be used for transparency reporting and not for clinical trial design. There’s a whole, like, no-no in those aspects. So to me it really is that quantum that brings everything together as a whole so that it can be operated and used safely by all the personas that need to work with that data product.
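The data usage guideline Samia mentions (a finance data set allowed for transparency reporting but not for clinical trial design) can be written down as a purpose-of-use check. The Python sketch below is a hypothetical illustration: the dataset key, the purpose labels, and the function name are assumptions, and a real policy engine would be considerably richer.

```python
# Hypothetical guideline table: dataset name -> purposes it may be used for.
USAGE_GUIDELINES = {
    "finance_dataset": {"transparency_reporting"},
}


def is_use_permitted(dataset: str, purpose: str,
                     guidelines: dict[str, set[str]] = USAGE_GUIDELINES) -> bool:
    """Allow a use only if the purpose is explicitly listed (default deny)."""
    return purpose in guidelines.get(dataset, set())


print(is_use_permitted("finance_dataset", "transparency_reporting"))
print(is_use_permitted("finance_dataset", "clinical_trial_design"))
```

The default-deny choice is deliberate in this sketch: a dataset with no recorded guideline permits nothing until someone documents its intended use.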
Scott Hirleman (27:06):
Well, and then those policies, that policy as code, is really important. But at the beginning of your journey <laugh>, policy as code, it’s not gonna be very good. So do you have to start with simple use cases, and, like, your definition evolves over time? Or
Samia Rahman (27:21):
Yeah, I actually put together a data product lifecycle. I think for your first MVP, you’ve identified that you have an Oracle ERP system, right? And you want x data sets from it about suppliers. To me you just need to get it findable, accessible, and secure. Those are simple enough constraints by which you can allow for exploration. So your first MVP is: give me an explorable data product, then I can figure out how I’m going to wrangle the data and make it usable for whatever the intended use is. So the data product will go through various maturity levels as you go through the life cycle. And you don’t end up with 4,000 Oracle data sets, right? You end up with intended or fit-for-use data marts. In the financial line of business, data marts are very popular. They’ll start modeling the data with the context of their intended use. And then that ties into the BI reports and so on. So to me, the maturity level will keep increasing, where it’s explorable, then it’s usable by one use case, then it’s reusable by many. But not all data products need to be reusable. They’re still reducing decision making from two weeks to two hours, ’cause you’ve automated it for that single business outcome.
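The maturity progression described here (explorable, then usable by one use case, then reusable by many) can be sketched as a small classifier. This is a hypothetical Python illustration: the `Maturity` enum, the function name, and the use of a simple use-case count are assumptions, not Samia's actual lifecycle model.

```python
from enum import IntEnum
from typing import Optional


class Maturity(IntEnum):
    """Hypothetical maturity levels, ordered from least to most mature."""
    EXPLORABLE = 1   # findable, accessible, and secure
    USABLE = 2       # serves one use case
    REUSABLE = 3     # serves many use cases


def maturity_level(findable: bool, accessible: bool, secure: bool,
                   use_cases_served: int) -> Optional[Maturity]:
    """Return the level reached, or None if the MVP constraints are unmet."""
    if not (findable and accessible and secure):
        return None
    if use_cases_served <= 0:
        return Maturity.EXPLORABLE
    if use_cases_served == 1:
        return Maturity.USABLE
    return Maturity.REUSABLE


print(maturity_level(True, True, True, 0))
print(maturity_level(True, True, True, 1))
print(maturity_level(True, True, False, 3))
```

A product that stays at `USABLE` is not a failure in this model, which reflects the remark that not all data products need to be reusable.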
Scott Hirleman (28:53):
Yeah, no, I think very much that that’s where it goes. But where it starts is that exactly, if it’s <laugh>, JGP,
Jean-Georges Perrin (29:03):
No, a little more of a question to Samia. So when you say you’ve got a documented data usage, which, I love this idea. Is it just text or is it something you actually enforce?
Samia Rahman (29:18):
The desire is always to enforce. Right now, depending on where you are in your maturity curve, enforcing it by policy is always desired. But to me, it can be part of your runbook before you go live or even before you start designing; you want to make sure of the intended use. There’s a lot of legal jargon sometimes, when you purchase data sets from different organizations or you use data from different systems, that you need to adhere to. And that’s something I learned over the last year in biotech, where you have to pay a lot of, and you have to pull in legal to understand that, parse that, and then codify it. That’s level zero. And then eventually, hopefully we can automate that so that at bootstrap of, like, the next use case with those same source data sets, we are thinking along the same lines and it’s guided and we’re safely going out to the next feature that we wanna develop.
Scott Hirleman (30:19):
Yeah. And I’m seeing some people try to do exactly that codify if they can, but it’s also, I’m seeing some people just go, we’re gonna have friction around this dataset when somebody wants access. We have to know, you have to give us a lot of information where we don’t just have automated access control because there is a lot of bad ways that this could be used and there’s some good ways that it could be used. So Tom, you’ve had your hand up for a while.
Tom De Wolf (30:45):
Yeah, I just want to piggyback on the, what is your first MVP of your data mesh platform. And what we do as well is really focus on that data product as a first step, and the lifecycle of that. ’Cause you have to have something to build those policy enforcement things, the discovery of things, on top of. And if you don’t have that, then you have a very diverse landscape of, yeah, data warehouse, data lake tech, like all the different kinds of systems. And putting a layer on top, an abstraction on top, makes it possible to make it more uniform and get it to the discovery, to the policy enforcement. So start with that quantum and make sure that’s in your platform. That’s sort of my guideline for them.
Scott Hirleman (31:35):
Yeah, “minimum viable mesh” is a topic that has come up and is so difficult to to deal with. Austin, you’ve had your hand up for a while. Data products. Yeah, I was, go ahead.
Austin Kronz (31:45):
Yeah, no, I just had a question. You mentioned enforcement and it’s something that comes up all the time, especially talking about data products, but also especially data contracts now too. Oops, sorry, I saw a notification. Enforcement sounds great. And then I think what we end up talking about is, like, alerting, right? You know, it’s like an alert or an exception table, and it still ends up being that someone is manually gonna go in there and actually deal with this. Like, did we build a good, or is enforcement the wrong word <laugh>? Or have you seen use cases where there actually is a possibility for automated enforcement or more programmatic enforcement, or is it still that we’re kind of just getting better at automated alerting at scale?
Scott Hirleman (32:33):
Curious. Yeah, it’s funny. Reliability engineering practices in data are, it’s like, I don’t understand how nobody’s doing reliability engineering around data. They’re doing engineering around data reliability, which is quality, but like, you know, Samia mentioned runbooks. When you say runbooks and data, people’s eyes just go, what, we have to do runbooks now for these alerts? Like if I’m getting an alert, I should know what that is. And the automated enforcement, I haven’t heard, you know, like Righto and people like that are trying to do that stuff around like security and that, but I’m not hearing of people doing those very well. <laugh> I’m hearing of people going, we’re testing it out with stuff where if we get it wrong, it’s not that big of a deal, versus, like Samia said, all of a sudden you get, you know, an extra million dollar bill from your vendor, or you find out when they come in and they audit you, and then it’s huge. So Samia’s had her hand up for a while, and then JGP and Yuliia.
Samia Rahman (33:33):
Yeah, I think it’s happening. If it’s not happening in your data team, your security, risk, and compliance team is doing it, right? They are doing zero trust architecture, and you should be partnering with them if you’re not. I’ve seen folks who, using privileged access management, can say this line of business and this group of people only have access to these financial data sets and the source systems, right? Even in your software systems or operational systems, the same concepts go in. So to me, enforcement and alerting is happening effectively. I’ve seen it consistently over the last decade. Without that, you can’t be SOX compliant. That’s the number one thing all organizations have to invest in. And there are very specific rules around how you manage access to data and enforce policies on even what columns are visible to who. So to me, those are just foundational things that have been around for a while.
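[Editor's illustration] A minimal sketch of the kind of rule Samia describes, where a group only sees certain data sets and certain columns. The `POLICIES` table, the group and data set names, and the `read_rows` function are all hypothetical; real deployments would typically rely on the access controls of the warehouse or a privileged access management tool rather than application code like this.

```python
# Hypothetical policy table: (group, dataset) -> columns that group may see.
POLICIES = {
    ("finance", "ledger"): {"account_id", "amount", "posted_at"},
    ("marketing", "ledger"): {"posted_at"},
}


def read_rows(group, dataset, rows):
    """Return rows with only the columns the group is allowed to see.

    Raises PermissionError when no policy grants the group any access,
    so the default is to deny rather than to allow.
    """
    allowed = POLICIES.get((group, dataset))
    if allowed is None:
        raise PermissionError(f"{group} has no access to {dataset}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]
```

The point of the sketch is that the check runs automatically on every read, which is enforcement in the sense discussed above and not just alerting.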
Scott Hirleman (34:36):
Yeah, I mean, if the only people you’re satisfying with your platform, if you think your only constituents are your producers and your consumers, you’re headed for trouble. You’re headed for a lot of legal trouble, you’re headed for a lot of legal friction, you’re headed for somebody bringing the hammer down, right? Like you’ve gotta figure out how to bring those people into the conversation very far left in the actual development process. JGP, you’ve had your hand up for the longest.
Jean-Georges Perrin (35:07):
And I’d like to understand the comment you made, so I don’t know if you’re going to talk about that, but you made a comment in the chat. Like, my take is that a quantum will be different from company to company based on, obviously, what defines the actual data product. This is something I kind of fundamentally disagree with, but I’d like to hear your take on that.
Yuliia Tkachova (35:35):
Well, so I guess we should give the flexibility to organizations to define their data products, what actually matters to the business. This is where I’m coming from. And let’s say for one business unit, it’s gonna be a data set. So this data set consists of tables that are updated on a batch cadence. It totally could be a table. I see it like this. But let’s say they have a machine learning model deployed in production, then their quantum could be even the event.
Jean-Georges Perrin (36:25):
But, and this is, so I think we’re actually in agreement, but in the case of this AI model, okay, would you put the AI model in the data product itself or in the data quantum itself? Okay.
Yuliia Tkachova (36:39):
Yeah. No, no. The model consists of smaller pieces that should be defined as quantums, while the model is actually a product that delivers value by itself. That’s why it’s a product. This is my take. Yeah.
Jean-Georges Perrin (36:55):
Yeah. Okay.
Austin Kronz (36:57):
Sorry, just to, I completely agree with you, because if you have a data product mindset, let’s say you build a customer churn analysis, which is just the diagnostic of who left, right? It’s a dashboard that tells me this. Well, I could technically improve the customer churn product by creating an actual machine learning model that now proactively tells me this person might churn, or these are the key drivers as to why they churn. And to me, that is still the same product with different, I guess we would call it quantum.
Yuliia Tkachova (37:32):
Yeah, yeah,
Jean-Georges Perrin (37:35):
Yeah. It’s a sidecar. It’s a sidecar you put in your quantum, right? The thing is, what is the implementation? And when you look at the three planes that Zhamak described, the infrastructure plane, for example, we don’t care about. The data product plane, that’s what we care about, and we can actually build the interfaces to this world, but the product behaves in the same way. So yes, the products are different because they have different outcomes, but the implementation or the behavior of the quantum is the same. I think we, we kind of, yeah,
Yuliia Tkachova (38:09):
On the same page,
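[Editor's illustration] One way to read Jean-Georges's point that products differ in outcome while the quantum behaves the same is a shared interface with different implementations behind it. The class names, the `describe` and `read` methods, and the threshold rule standing in for a real machine learning model are all assumptions made for this sketch, not definitions agreed on in the discussion.

```python
from abc import ABC, abstractmethod


class DataQuantum(ABC):
    """Hypothetical uniform behavior: every quantum can describe itself
    and serve its output, whatever is implemented inside."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner

    def describe(self):
        # Same metadata shape for every quantum, useful for discovery.
        return {"name": self.name, "owner": self.owner,
                "kind": type(self).__name__}

    @abstractmethod
    def read(self):
        """Return the quantum's output as a list of records."""


class BatchTableQuantum(DataQuantum):
    def __init__(self, name, owner, rows):
        super().__init__(name, owner)
        self.rows = rows

    def read(self):
        return list(self.rows)


class ChurnModelQuantum(DataQuantum):
    """A simple threshold rule stands in for a trained model here."""

    def __init__(self, name, owner, customers, threshold):
        super().__init__(name, owner)
        self.customers = customers
        self.threshold = threshold

    def read(self):
        return [{"customer_id": c["customer_id"],
                 "may_churn": c["days_inactive"] > self.threshold}
                for c in self.customers]
```

A platform written against `DataQuantum` would not need to know whether it is serving a batch table or model predictions, which is the uniformity the speakers converge on.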
Scott Hirleman (38:11):
A lot of people have said that their internal definition of data product is so convoluted that they’ve started to call them different things. So like mesh data product or, yeah, Austin, Shane Gibson calls them an information product because it’s not about just the data, right? The data is the ones and the zeros. So if you don’t have any information or like, there’s all of these things, and data product is an overloaded term. And so, you know, people have a different immediate conception. So some people call ’em something different. You know, some people call ’em data quantum, some people call ’em, you know, mesh data products. Some people call ’em all this stuff because it just gets crazy, right? And that’s where you’ve gotta actually just ask the other person, what is your definition? What are we talking about <laugh>? So that I’m not saying data product and you’re saying data product and we mean completely different things.
Jean-Georges Perrin (39:07):
Well, the thing is, just to jump on this vocabulary thing, okay? It’s a big part. And I don’t know if any of you read the whole ITIL (Information Technology Infrastructure Library) part, the ITIL, you know, standard library, blah, blah, blah, that the British government set up. I just read the very beginning of it. But ITIL says the first thing you should state is your vocabulary. Okay? You start a project, you set up the vocabulary, so everybody aligns to that. And maybe this is something where, as a group, we need to have a more formalized definition as well. Okay? And that would help the industry as a whole.
Scott Hirleman (39:43):
I’m finding that one of the key things for doing data mesh is the last 15, 20% of a meeting is just going, okay, did we understand each other? What did you mean by this? And <laugh>, are we in agreement on next steps? And that, that saves days of work on every single rev, even though it feels like, well, of course we’re on the same page. And then you start to talk <laugh> and it doesn’t, it’s not at all <inaudible>.
Yuliia Tkachova (40:14):
I don’t wanna take up everyone’s, you know, time. But the thing is, it’s not just in data mesh or in data development, you know, it’s through the entire organization. Nailing down the definitions will help to reduce miscommunication, you know, shorten the time of the meetings. And it’s so helpful to establish this shared understanding about words that actually should be beyond data and, you know, things like that.
Scott Hirleman (40:44):
It’s, I mean, it’s just human communication end of the day. Yeah, yeah.
Yuliia Tkachova (40:48):
Basic things, right?
Jean-Georges Perrin (40:53):
Okay. On this basic things comment, maybe this is where we should wrap up. Unless someone has, you know, we initially set half an hour. We never stay half an hour, but maybe 45 minutes should be the cutoff, the hard stop. But I don’t want to break. If anyone wants to add a last comment, please do. Yeah, Tom.
Tom De Wolf (41:12):
Yeah, I was thinking maybe defining what a data product is and how it is implemented is also part of the federated governance principle. So that in a certain setting in which you are going to realize data mesh, you’re going to talk to each other and agree on, okay, this is what we call a data product. Because you need some kind of things that are uniform to make them interoperable so that they can be connected to each other. So the extreme of that, where everyone can define it on their own, will not make it interoperable with the rest of the organization. So maybe it’s part of the governance part.
Jean-Georges Perrin (41:58):
I think that’s an excellent topic for our next chat, which will not be next week, unfortunately. I have something, but the week after. Things happen. But yeah, I think that’s an excellent topic, and I hope you will be there to animate it as well. So with all that said,
Scott Hirleman (42:21):
And if people can’t get enough of me, before then, if they can’t wait another week, I am doing the great data mesh debate, or great data debate, with Austin next week. So <laugh>,
Jean-Georges Perrin (42:34):
Have fun, have fun with them. Oh yeah. Guys, thank you so much for joining again, and see you not next week, but the week after. Bye.
Scott Hirleman (42:43):
Thanks everybody.