Bounding Data Products

A summary of our Data Mesh Learning community roundtable discussion on August 24.

The Data Mesh Learning community hosted a roundtable discussion with data mesh practitioners to discuss bounding data products. Some of the questions posed during the discussion included: 

  • How do you determine the bounds of a data product? 
  • At what point in your data product journey should you bound data? 
  • How many data products should you have per domain?
  • How do you evolve your data products or domains?
  • Can your team’s size affect your data boundaries?

Facilitators:

  • Jean-Georges Perrin, Senior Data, AI, and Software Consultant & President and Co-founder of the AIDA User Group
  • Scott Hirleman, Founder and CEO of Data Mesh Understanding 

Participants included: 

  • Yuliia Tkachova, Co-founder and CEO of Masthead Data & Community Team Lead for C2C 
  • Matt Harbert, Data Architect at Direct Supply
  • Daniel J. Twomey, Vice President & Financial Advisor at Morgan Stanley
  • Samia Rahman, Director of Enterprise Data Strategy & Governance at Seagen
  • Alpesh Doshi, CEO and Founder of Fintricity and Kendra Labs

Watch the Replay

Read the Transcript

Scroll to the bottom of this post

Additional Discussion Resources: 

Ways to Participate

Check out our Meetup page to catch an upcoming event. Let us know if you’re interested in sharing a case study or use case with the community.

Data Mesh Learning Community resources

You can follow the conversation on LinkedIn and Twitter.

Transcript

Jean-Georges Perrin (00:08): Hello everybody. Lots of people. Hey, Yuliia.

Yuliia Tkachova (00:14): Hi. Happy to see you guys.

Jean-Georges Perrin (00:17): Happy to see you.

Scott Hirleman (00:20): Are you back in Canada? Are you still in Europe, or

Yuliia Tkachova (00:24): Yeah, let’s disclose it publicly. I’m back in Toronto.

Scott Hirleman (00:31): Well, I figure if you’re going to whatchamacallit next week, that makes sense.

Yuliia Tkachova (00:37): It’s called Next, in San Francisco. Yeah, so let’s let everybody know.

Scott Hirleman (00:43): You’re hosting an event. I think that’s kind of,

Yuliia Tkachova (00:47): Yeah, so please join me for happy hour in San Francisco next Wednesday.

Scott Hirleman (00:52): Yeah, so JGP, you wanted to do bounding data products. This can mean 8 million different things and it can also mean very, very specific things. So I wanted to understand, even before we jump into chatting about this stuff, what do you think this means? What do you think that this becomes? What is this exercise for you? Is this at the start of a data product build? Is this something that you have to continually do? How do you think about the bounds of a data product?

Jean-Georges Perrin (01:29): So the question I usually get from a lot of people is really, what’s the scope of your data product? And the quick answer is, it’s a domain. Okay. And then the next question is, how do you define the domain? And the discussion continues, continues, continues. So that’s why I’d like to understand a little bit from the community here: how do you define the bounds? How do you define the borders of your data product, and what are your best practices around that? So yeah, it can be 8 million things, and that’s why I think these roundtables are so exciting. It’s just because it can go in eight different directions.

Scott Hirleman (02:24): So you said one thing there that I think came through in our episode of Data Mesh Radio too: your perspective is that one domain equals one data product. A lot of other people think of it differently, and it depends on what you mean by domain, because you can have a very high-level domain that’s got thousands of people in it. You’re not going to have a single data product for that, or you’re going to have a subdomain of a subdomain of a subdomain. And so people start to say, how many data products should I have per domain? When people are asking you that, what is your response there? Because when people ask me that, I can tell them what I’ve heard from different people, but it becomes a really difficult question because people want a copy-paste kind of answer versus “it depends.” So when people are asking you that, and you’re writing your book now on implementing data mesh, how does that work?

Jean-Georges Perrin (03:34): So my view on it is, I’ve got a strict position on it, saying that one data product equals one domain, and then I’m pushing the discussion more toward the domain. So when you’re discussing with people, you say, hey, marketing is a domain, but honestly, you can’t have one single data product for all of marketing. You’ve got to create these subdomains. And some companies I’ve discussed with say, oh, we’ve got this hierarchical list of domains. It was made by enterprise architects, and they’ve looked at all the structure of our top-level domains, and then we’ve got all the subdomains.

(04:30)
And the thing is, the technology itself is agnostic to that. Whether you’ve got five tables or 20 million tables in your data product, nobody really cares. You still want to be able to manage it. So it’s more difficult with 20,000 versus five, but the domain itself, for me, is this small entity, this rather small entity, that still brings value to the enterprise. So I link the domain to the value it brings, and that’s how I do it. But I’m curious about how other people do it.

Scott Hirleman (05:21): Yeah. Well, and it almost sounds like you’re working backwards. You’re saying, here’s the scope of a data product, and then this should be my domain, so that way there are clear boundaries. Or which do you start with?

Jean-Georges Perrin (05:36): I usually start with the use case and the problem statement. So what are you trying to solve, and what are you trying to accomplish, really? Are you going to have this customer use case, or are you working on a supplier case, or are you trying to build this very complex situation? So it’s really when you’re thinking about why you need the data for this. I’m driving it from the use case, aligning it with the data product, and it doesn’t mean that that’s the only way to do it. There’s also the producer-aligned domain, but I’m usually thinking more about what the customer wants. What are you delivering? Where is the value you’re bringing? From my perspective at least, there’s usually less value for the customer directly in a producer-aligned or source-aligned data product versus a use-case-aligned data product.

Scott Hirleman (06:59): That was going to be my next question. When I’m talking to people, the producer-aligned or the source-aligned data products, the ones that are kind of closer to the source systems’ data and things like that, they’re kind of all over the map, and everybody seems to be kind of okay with things being all over the map as to size and things like that. But then with the use case, it is very much like you’re creating a data product for that. But a lot of times the point of the data product is for multiple potential users. So you don’t want to say this is only for that consuming domain. But yeah, I’d love to hear how other people are looking at this too. Is there anybody out there that is kind of starting to build their own data products, or has a view on this, that wants to chip in here? Because this has been a big question for a lot of people: what should my data product look like?

Matt Harbert (08:01): The approach that I’m starting to take is I’m kind of leaving it to the domain team to decide how many data products they want to create within their domain. I feel like because they know the data best, they would also know best how to divide and conquer from a data product standpoint. But to the point that was already made, maybe what they’re really doing is creating subdomains within their domain, but I hadn’t thought of it that way before.

Scott Hirleman (08:36): But when you turn it over to them and say, you should know how many, how many do they create? What are you seeing back from that? Are you seeing that people, I mean, obviously they don’t necessarily create all of their data products immediately, but are you seeing that they’re creating many, or are you seeing that they’re creating a few, or are you seeing that they’re even creating kind of none? I’ve seen that with a couple of implementations where somebody will be like, this domain just doesn’t have enough data to even justify, I think, a data product in and of itself for the domain.

Matt Harbert (09:10): It’s early days for our implementation. So as of right now, the domains that have embraced this are really creating their first data product. And so we aren’t seeing multiples yet, but I can picture a situation where we might end up starting to see multiples in certain domains as we move forward. And to your point, there will be some domains, like you said, that it may not make sense for there to be any data products at all. I think that’s even covered in the books, right?

Scott Hirleman (09:40): Yeah, JPMorgan Chase talked about that on a panel where they’re like, we went into these domains and literally this domain doesn’t have enough data for us to justify decentralizing the data into this domain. But then domains are like, well, shouldn’t that just be us? And the central team could just manage everything, so it becomes kind of push and pull. But yeah, Yuliia, you’ve had your hand up for a while.

Yuliia Tkachova (10:04): I actually have a question. I do not understand why JGP is actually implying that one domain has to have just one data product. For me, it sounds like a limit, and I don’t necessarily understand why it’s supposed to be this way.

Jean-Georges Perrin (10:23): I’m not saying it’s supposed to be this way, it’s just the rule I’m following.

Yuliia Tkachova (10:29): Why do you have this rule then?

Jean-Georges Perrin (10:32): Because I’m stubborn.

Yuliia Tkachova (10:36): But maybe there is a reasoning inside.

Jean-Georges Perrin (10:41): No, maybe I oversimplified my reading of Zhamak’s book and I said, okay, well, that’s how I understand it and that’s how I’m seeing it. But what I find is that it’s a lot simpler to explain when you’re going around and doing all this advocacy, all this discussion with the data product owners. They understand the domains. They don’t really understand the data product. Okay. So yes, as Matt was saying, I could say that there are multiple data products within a domain, but I think it’s easier to slice the domain and make people think by domain and say, okay, well, I can slice this. Okay, this is marketing for, I don’t know, Europe, for example. Or this is this kind of product marketing specifically, or this branch, or this activity. So I think it’s easier to have the discussion that way, and maybe I’m stubborn and I should change my mindset and say, well, it’s okay to have multiple data products for one domain. But I felt it was kind of easy to have some rules like that, which are maybe a little bit basic or just trivial, something like: one data product, one domain, and then you push it.

(12:22)
It was also about pushing the burden onto the domain, really. I think that’s, as I’m thinking more about it, I

Yuliia Tkachova (12:29): Agree. I agree that this is a nice starting point where everybody can understand the concept and actually use it, but I’m not sure that we necessarily need to limit ourselves to one domain equals one data product. That’s why I was like, is it supposed to be this way? Okay, I’m glad to know that this is where you started with your data users, in a way.

Scott Hirleman (12:52): One thing that I think about JGP’s rule, or his approach to this, is that within a data mesh implementation, there are so many questions, and if you can just remove four or five questions from the board and put some hard lines around it, even if you go, this isn’t the right approach in the long run, this is how we’re going to do it right now. So that way somebody goes, okay, these are the bounds of my data product, so that I don’t have to go, do I have to create 43 data products, and do I need to map them out at the start, and how many exactly should I have, and blah, blah, blah, versus just do this, just do this.

Yuliia Tkachova (13:39): It makes sense. Totally makes sense. I got it. It’s just baby steps and we are at this stage right now, but it’s not a limit for the future. And this is what I wanted to clarify. That’s it. Yeah,

Scott Hirleman (13:50): Yeah, exactly. But I think that’s good clarification. Sorry, Daniel, did you want to add any

Daniel J. Twomey (13:55): Yeah, sorry, I went all formal. You can call me Dan. It’s like, my mother and my grandmother called me Daniel. We didn’t implement a data mesh, but we did implement a domain-oriented data warehouse in a healthcare space. And to be honest, I would find a single product per domain almost limiting. And if I explain it like somebody thinking in terms of a clinical process, think about going to a physician. I need to follow the encounter data, I want to follow the procedures, and they could be considered subdomains within, but I’m creating a whole series of data sets, or data products eventually, that would answer different types of questions. Now, we had some domains that were very small, like providers, patients, et cetera. Those were pretty minimal, and I would agree with you, you could get down to a single dataset. But when you got into some of the transactional domains, it actually opened up quite a bit. And I would almost say that if we had 80 different subjects that we were managing, and if I created a domain out of each of those, it would almost be overwhelming in the other direction.

Jean-Georges Perrin (15:26): How many domains, if I can ask, Daniel?

Daniel J. Twomey (15:31): We had about 15, roughly. Some of them were mastered domains at that time. So I don’t want to get into the dogma of whether we do master data management or not. But we had the basic things. So you had your patients, your providers, you had organizations, people as you needed to, but then we also had clinical domains. We were looking at provider scheduling, provider utilization domains, actually setting up some of those different encounters. And there was a whole series of different clinical measures that we were trying to address at the back end, including the creation of a customer, sorry, a patient 360. So we were trying to get to closing some of the gaps in care and evaluating from the other domains some of the data that had come along the way, but I would almost see it as a bit overwhelming. Our clinical domain was actually pretty big. It was dozens of subjects.

Jean-Georges Perrin (16:34): Many

Yuliia Tkachova (16:35): Products. I’m sorry. I’m sorry. How many products do you have? 15?

Daniel J. Twomey (16:39): Like I said, it wasn’t a straight data mesh implementation, but if I look at the volume of consuming artifacts that we created, it was in the magnitude of dozens because it was a pretty big space. Right. Okay.

Scott Hirleman (16:59): Yeah, I mean, Roche has talked about this in their implementation, and I don’t know if it’s only Roche Diagnostics, I haven’t talked to Omar Khawaja super, super deep on this, but he’s talked about how they had 21 domains, and I think it might be for Roche Diagnostics. That’s a decent number, but I think that part of the company is 20,000 to 30,000 people. So you think about some of these domains having a thousand-plus people at the high level of what is a high-level domain. But JGP is talking about those subdomains, and that is where you start to say, Dan, exactly what you’re talking about: what do we lump together so we don’t have just this smattering of tiny data products, versus do we try and do this one data warehouse that becomes one ginormous, not even microlith, it’s just a lith. Maybe not a monolith, but it’s a lith.

(18:00)
It just reminds me of 10 Things I Hate About You: I know you can be overwhelmed, and you can be underwhelmed, but can you ever just be whelmed? But that concept of how do you find what size you’re doing? I mean, JGP, how did you think about that? Was it number of data sets? Was it complication? Or was it just what is necessary to serve this one use case? And then as additional use cases come on, you might add additional things to it. When the topic is the boundary, how did you find those boundaries?

Jean-Georges Perrin (18:43): And that’s where I’d like to be very careful. We are thinking about data products. We’re not thinking about data sets. Okay, and I’m not targeting anything you said, but the thing is, when you’re thinking about a dataset, the structure of a dataset is pretty firm, but the idea of a data product is to be a little bit more flexible than this dataset. Okay? So my vision of that is that a data product can have multiple data sets, and a data product can have multiple versions of a given dataset as well. Okay? So then based on that, you can carry this evolution of the data. So for example, Yuliia comes to me and asks me, hey, please build a data product that answers this use case, and I say, hey, well, I can’t do that. Okay, you are the expert in this data, so congratulations.

(19:44)
You’re promoted to data product owner. And then we build this thing and it becomes, let’s say sales, let’s say raw transactions for a banking application. And that’s version one of it. But then Dan comes to me and says, I want to have the aggregation of the same data Yuliia asked for, but it’s just the aggregation by country, or by country and time, or whatever we want. Okay? So for me, it’s still the same domain, it’s still transactions, and we can have these multiple data sets within the same data product. This is not incompatible, and that’s where I’m not directly linking a data product to a dataset; I’m rather linking a data product to a domain. So creating this interface between the logical world and the physical world as well: the physical world being the dataset, the logical world being the domain.
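Editor’s sketch: the idea above, one logical data product per domain holding several physical datasets and several versions of each, can be illustrated roughly as follows. This is a minimal Python sketch under stated assumptions, not anything shown during the roundtable; the class names, dataset names, and columns (`DataProduct`, `Dataset`, `raw_transactions`, `transactions_by_country`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Physical side: one concrete, versioned dataset (hypothetical model)."""
    name: str
    version: int
    columns: tuple[str, ...]


@dataclass
class DataProduct:
    """Logical side: one data product, aligned here with a single domain."""
    domain: str
    datasets: dict[tuple[str, int], Dataset] = field(default_factory=dict)

    def publish(self, dataset: Dataset) -> None:
        # A new dataset or a new version is added next to the existing ones;
        # older versions stay available to consumers that still rely on them.
        key = (dataset.name, dataset.version)
        if key in self.datasets:
            raise ValueError(f"{dataset.name} v{dataset.version} already published")
        self.datasets[key] = dataset

    def latest(self, name: str) -> Dataset:
        # Return the highest published version of the named dataset.
        versions = [d for (n, _), d in self.datasets.items() if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions, key=lambda d: d.version)


# One "transactions" data product serving two different requests.
transactions = DataProduct(domain="transactions")
transactions.publish(Dataset("raw_transactions", 1, ("id", "country", "amount")))
transactions.publish(Dataset("raw_transactions", 2, ("id", "country", "amount", "currency")))
transactions.publish(Dataset("transactions_by_country", 1, ("country", "total_amount")))
```

In this sketch the aggregated dataset lives in the same product as the raw one, so the domain stays the unit people reason about while the datasets underneath can evolve.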

Daniel J. Twomey (21:12): And I hear you there. It’s a give and take in either direction. It almost feels like, if I think of it from a consuming perspective, does the data product itself get too overloaded for me to consume?

Scott Hirleman (21:32): Or evolve? That’s the real question, because if you actually want it to be a product, and not what we’ve had with the warehouse, where this thing has been put out and it can never change versus evolve. But Carlos Saona at eDreams ODIGEO, on episode 150 of Data Mesh Radio, was talking about how he went off and read Zhamak’s article and tried to find people for a year and tried to get people to talk data mesh, and he couldn’t find anybody. So he just kind of went off and did his own implementation, and they went with really, really small data products, and they don’t have enough data products or domains or things like that to justify that concern of, do you have a mesh that is just littered with these tiny data products, which then hurts discoverability unless you have incredibly good documentation and incredibly good tooling around discoverability. So you’re kind of balancing those two: do I overload and make it hard to evolve, or do I focus on these small little contained things? And I’m finding people succeeding with both. But what I’m seeing people succeed a little bit more with is what JGP is talking about, trying to collapse it down a little bit more than you’d think, because otherwise you’re creating these mini warehouses, and those mini warehouses are too rigid to evolve, and that seems to be something that’s coming up. So, Dan, go ahead.

Daniel J. Twomey (23:16): What precludes the evolution, Scott, just out of curiosity?

Scott Hirleman (23:21): So the more tight coupling you have versus loose coupling, and within a data product, a lot of the time, the datasets are more tightly coupled than when you think about microservices and that loose coupling. So the more that you have things that are very, very tightly one-to-one connected, instead of connected just via standards that are more global standards, the harder it is to evolve these things, because you go, oh, okay, this representation is no longer really great, so I’m going to change the way that this table works or this dataset works. And if that dataset is super tightly coupled to something else, so you can have collections of data sets and it’s not a big deal, but if they’re tightly coupled, then that evolution of changing a data set can break everything inside the data product. So you want to make sure, I don’t know how people are managing that coupling within data products. I’d love to have that conversation with people, but their eyes glaze over the second I ask that question.

Daniel J. Twomey (24:27): That’s where I actually conceptually go to the separation. We actually had an experience like that where we had one larger subject that had to do with medication administrations. So it was an overloaded subject. It had the order, the fulfillment or the dispense, and the administration of a medication, and we found it was overloaded. We needed to separate it, but by dealing in granular subjects, we were actually able to peel the piece parts apart and provide a migration path for our customers so that they didn’t break at that point. Now they implemented similar interfaces in that respect, which I would expect of a data product. And we had loose ties into other subject domains. So I could put the patient, I could put the facility, I could put the provider links in there. I didn’t lose that capability, but that’s where I actually like the granular a little bit further, because the bigger it gets, the less I can change it.

Scott Hirleman (25:47): That’s exactly what I’m saying: the more things get tightly coupled. JGP, you never have to hold your hand up. JGP, you’re,

Jean-Georges Perrin (25:56): It’s your thing. It’s fine, it’s fine. I think it’s more like an order thing. But one thing I forgot about when you asked the question, Yuliia, about why one-to-one, is that what I’ve also noticed, when we’re thinking about the hierarchy of domains in an enterprise, is that yes, you can start with 15 or 20 or whatever top-level domains, and then you’ve got this hierarchy. But then if you’ve got a one-to-n kind of relationship between your domain or your subdomain and your data product, then you’ve got to manage one hierarchy of domains plus, eventually, a hierarchy of data products as well. Okay. So if you have this one-to-one, then you only have one hierarchy to deal with. And in some companies I’ve been in, depending on the M&A strategy, mergers and acquisitions, you already have an organization with its hierarchy, and then you’ve got another hierarchy, and some people are trying to map the hierarchy of the domains between the different companies. And then you add the complexity of, okay, I’m going to add data products on top of that. That’s kind of what I wanted to add as well. I forgot to say it before.

Scott Hirleman (27:28): Essentially your way just removes an order of complexity, a potential place for complexity. And so what somebody else might call a domain is not necessarily what you would call a domain in your thing, but it removes yet another thing of, okay, now for this domain you have to figure out should you have one or should you have 10 or how many should you have underneath that, versus just, let’s clear yet another question and answer it for you. So I don’t think that that’s necessarily how people will look at it in the long run, but I do agree that it prevents more complications for people. So Samia, were you turning on your camera to unmute, or

Samia Rahman (28:21): Yeah, I just wanted to add, I think everything you guys shared resonates a lot in the biotech or the life sciences space. On the number you mentioned, 21 domains: life sciences has very well-bounded domains, which is a privilege in my opinion. Not all companies have that, right? Molecule to market: you have the research domain, it’s very well bounded, it maps into a specific line of business, whereas commercial sales of a drug is well bounded in commercial sales, and clinical trial operations and clinical are similar to the healthcare that Daniel was mentioning. I see each of those at the enterprise level; that’s the enterprise domain. So research has a research data repository or a data product. It’s a collection of subdomains within it, right? Because in research you have the chemistry, the physics associated with how they deal with the molecules. It’s also very unstructured in that space.

(29:25)
And so in that particular domain, they can have n subdomains. And that’s where the art comes in: how big is it, and where does this really belong? Where does the source data get generated? Is it coming from the lab, by the researchers, or is it a post-analysis, right? So to me, the definition of a data product and that boundary is going to vary in research, while in commercial it’s a lot more obvious, where you have sales, marketing; those are the typical roll-off-the-tongue discrete sub-boundaries within that domain, and you can manage those data products. And all of that goes back to, to me, guys, what we’re talking about is identifiers, and that interoperability is the key to the evolution you’re seeking. So if you have identifiers built into each of these domains that allow the domains to stitch together, then it becomes easier to get those aggregate insights.

(30:30)
So in commercial it would be sales, marketing insights, and then finance has payments that go out to some of these folks. So when you tie all of that together, aggregated, it goes into a report in biotech, or sorry, a compliance report that goes out in the biotech space. So to me, at the end of the day, that identification of the boundary and the definition is going to vary from company to company. And the other reflection I’ve had is that the data product boundary and the infrastructure boundary is another interesting relationship, because there is a cost if I say I’m going to make every data product so small that it’s each dataset, and each dataset has a warehouse or a container in my data lake. Those are some of the trade-offs people have to make. So in Snowflake, for example, you have the beauty or the luxury that you can just create views on top of views, so you get a lot of flexibility and you can just say, okay, for this particular line of business, research, you get one warehouse; for finance, you get one warehouse; go build your data marts.

(31:52)
And in that data mart, you’re creating those sub data products, right? Again, versioning, et cetera, et cetera. So sorry, I’m just sharing a few loose thoughts there from all the various implementations I’ve seen.
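Editor’s sketch: the point about identifiers can be made concrete with a small example. Assuming two domains each publish rows carrying the same identifier, an aggregate report can be stitched together without either domain knowing the other’s internals. The field names (`hcp_id`, `units_sold`, `payment_usd`) and the `stitch` helper are hypothetical and only illustrate the idea.

```python
def stitch(left, right, key):
    """Inner-join two lists of row dicts on a shared identifier."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Keep only rows whose identifier appears on both sides.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Output of a (hypothetical) commercial sales data product.
sales = [
    {"hcp_id": "H1", "units_sold": 40},
    {"hcp_id": "H2", "units_sold": 15},
]

# Output of a (hypothetical) finance payments data product.
payments = [
    {"hcp_id": "H1", "payment_usd": 250},
    {"hcp_id": "H3", "payment_usd": 90},
]

# The shared identifier is the only contract between the two domains.
report = stitch(sales, payments, key="hcp_id")
```

Here only `H1` appears in both domains, so the stitched report contains a single combined row. If the domains used different identifiers, this join would need a mapping table, which is the interoperability cost being described.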

Scott Hirleman (32:09): I don’t think you should apologize. Yeah, Sam, always good to hear from you. And one of the things that you talked about in there is a little bit of this concept that I see a lot in the financial services space, in banks and stuff, which is that they talk about virtual data products: data products that are not necessarily always composed, that are a recipe. When somebody is going to pull it, it says, with these five or six different components, you do some filters and stuff, and then it pulls that together, especially from source-aligned data products, instead of always having that put together, when a lot of times those are very, very expensive to put together, or very rarely used, or there are very, very specific use cases.

(33:09)
My interview with Mahmoud Yassin from ABN AMRO talked about that. My interview with God, the Ani from Bank of America also talked about that, that they’re doing these virtual views, because I think that’s a really important point: sometimes these are not real data products, but recipes for when people are going to continuously pull the same thing. But you don’t want to actually compile that, because it can get incredibly expensive to be always doing all of the analytical prep work when sometimes people aren’t using it. I especially talked about a tax data product for Fidelity or something like that. That’s the financial service that I use. They don’t need to be compiling that full data product every month, even if they need the information to flow through as to what transactions they did and what taxes were paid and what weren’t. But they don’t need to be pulling that every month. They only need to create a statement once a year. And so that’s more of a business process than necessarily an analytics one, but somebody might be pulling analytics as to what were our tax profiles and how could we potentially lower our taxes for our users and things like that. But exactly what you’re talking about.
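Editor’s sketch: a “virtual” data product in the sense described above can be thought of as a stored recipe that is only evaluated when a consumer asks for it. The Python below is a minimal illustration under that assumption; `VirtualDataProduct`, `load_transactions`, `tax_recipe`, and the sample rows are all hypothetical and do not describe how any of the companies mentioned implement this.

```python
from typing import Callable, Iterable


class VirtualDataProduct:
    """A recipe over source-aligned inputs, evaluated only on demand."""

    def __init__(
        self,
        sources: dict[str, Callable[[], Iterable[dict]]],
        recipe: Callable[[dict[str, list[dict]]], list[dict]],
    ) -> None:
        self.sources = sources
        self.recipe = recipe
        self.builds = 0  # how many times the product was actually compiled

    def read(self) -> list[dict]:
        # Nothing is materialized until a consumer calls read().
        self.builds += 1
        inputs = {name: list(load()) for name, load in self.sources.items()}
        return self.recipe(inputs)


def load_transactions() -> list[dict]:
    # Stand-in for a source-aligned data product's output port.
    return [
        {"account": "A", "year": 2023, "tax_paid": 10.0},
        {"account": "A", "year": 2023, "tax_paid": 5.0},
        {"account": "B", "year": 2023, "tax_paid": 7.5},
        {"account": "A", "year": 2022, "tax_paid": 3.0},
    ]


def tax_recipe(inputs: dict[str, list[dict]]) -> list[dict]:
    # Filter to one tax year, then total the tax paid per account.
    totals: dict[str, float] = {}
    for row in inputs["transactions"]:
        if row["year"] == 2023:
            totals[row["account"]] = totals.get(row["account"], 0.0) + row["tax_paid"]
    return [{"account": a, "tax_paid": t} for a, t in sorted(totals.items())]


tax_statement = VirtualDataProduct({"transactions": load_transactions}, tax_recipe)
statement = tax_statement.read()  # compiled once, when the yearly statement is needed
```

The trade-off in this sketch is that every `read()` pays the full compile cost, which suits rarely used or expensive products and would be a poor fit for something queried constantly.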

Samia Rahman (34:28): Yeah. I’ll just add, to me, analytics is part of the data product. Those views are output ports, right? Virtual output ports, if you want to label it virtual. To me, at the end of the day, it’s just an implementation detail. But as a consumer of the data product, when I say I want my global transparency report, that report, behind the scenes, is pulling and running your recipes and analytics, but I am getting a version of truth at a point in time. And all of that, end to end, has to have integrity and versioning, et cetera, for compliance purposes. So to me, the data product versioning and all those notions of trustworthiness, et cetera, are all built into that one bounded context.

Matt Harbert (35:20): One thing, go right ahead. No, no, go ahead, Matt. Go ahead. I was going to say, one other nuance that I wanted to call out is sort of the human element to the bounding. So for example, I only have so many human beings that I can divide into domain teams. And so even if within those domain teams I have to have subdomains, it’s going to be the same human beings doing it. So it might as well be one domain team with multiple data products.

Samia Rahman (35:49): Yeah, Matt, you bring up a very good point. In my observation, like our finance domain team, or not finance, our commercial domain team, they’re not allowed to see clinical trial domain data. So our architecture is influenced by the team and the data they’re allowed to manage. And then you put in guardrails of access controls, et cetera. You can allow for interoperability for limited attributes, but team size also impacts that boundary.

Scott Hirelman (36:23):Yeah, I mean, hin sun at JP Morgan Chase said they were trying to decentralize to all of their domains, and then they found some domains that were too small where it just didn’t make any sense to decentralize the data into them because they didn’t have enough data and they didn’t have enough heads. So it was like we’re going to have to spend all of our time on these people from doing their jobs to just doing data. So then they’re not doing their actual value add of the domain as well as the data. So they moved that back into small centralized teams that are handling four or five of these domains. They might have a couple or two or three of those teams because there are these domains where that’s the only thing that really makes any sense. And so I think pragmatism is something that Ack doesn’t talk about a ton in her book.

(37:16)
She does when you actually speak with her, when you actually have a conversation with her. But she couldn’t be like, and here’s the pragmatic approach for every single aspect. It’s like, here’s what it would look like in a world where it was perfect. It’s kind of that economics idea of holding all else equal. And it’s like, that can’t actually happen, but that pragmatism is about figuring out what works for you without cutting corners and just going, well, this is the easy route versus this is the route that’s going to have the biggest impact, and kind of testing and figuring that out. But Matt, it’s exactly what you just said. Sometimes I just have to do what I have to do. And Paul Cavacas from Ocean Spray has talked about this. He’s talked about, I’m never going to be able to fully decentralize into these domains because they just don’t have the need to, but we’re pushing more and more of the ownership to shared because that’s what’s going to drive value.

(38:14)
And then maybe three, four years down the road, we can decentralize, but we can’t provide them with the tooling and the training and the information to do this, and it’s not really justified except in a couple of domains. So us trying to go full data mesh right at the start doesn’t make sense. So that balance, that pragmatism, is part of why I’ve done so much stuff with Data Mesh Radio, and this is why I always grill JGP as much as I do, because he made decisions from pragmatic points of view, and he’s not saying this is what everybody has to do, but he made calls and he’s willing to go out there and put his neck on the line and say, I made these calls and I think they were right, and here’s why. And so I think that kind of thing is really important, and there’s a reason why I’ve had you on multiple panels as well as an episode. So

Jean-Georges Perrin (39:16):I want to go back to what Samia said and what she also texted. It’s an art, and I really wish that at some point, collectively, we manage to turn the art into some science. Don’t get me wrong, I love arts, see my wall. But the thing is, I’d like to have something with a little bit more methodology, and I’m not saying let’s go back to Codd and, sort of, normal forms, et cetera, but maybe there is a way that we can explore it together over time and collect these best practices to bring a little bit more of this science and less art. But it’s very young as well.

Samia Rahman (40:08):Yeah, I think domain-driven design has been around for over 20 years. I can’t remember when the first concept came about, but it’s always been called an art. There’s the methodology. There are activities you can do to identify those boundaries, right? There’s ubiquitous language, event storming, those kinds of activities. When it comes to the data analytics space, we do data modeling. Even data modeling is an art at the end of the day. And to me, the methodology is maybe those activities that can help you arrive at an answer, but it will always keep changing, in my opinion. Your business is evolving continuously, right? That dimension you modeled is only good for the year, and then next year, hey, you added another thing about another product, and your dimension has to change. So to me, I struggle with saying there is a science or a methodology. Really, it’s a series of activities and teams being pragmatic along the way to capture it. I think historically, everyone wants that methodology, but I haven’t seen anyone come out with one because it just varies so much.

Jean-Georges Perrin (41:36):And maybe we should,

Scott Hirleman (41:40):I was going to say, there’s no right. There’s good enough for now. If there’s a couple of things that I would say at the beginning of every data mesh document, everything internally, every Confluence page, it should be: there’s no right, there’s good enough for now, and you’re not the only one having this question or this challenge. Everybody else is too, it’s just that nobody’s talking about it. Those are the two things that I try and tell everybody.

Jean-Georges Perrin (42:11):But to bounce back on that, so the first thing is, yeah, let’s keep it an art, because if it’s a science, ChatGPT is going to get us out of a job. And the second thing is, data mesh being this collection of data products enables us to have this iterative process, which we don’t always have the luxury of. So actually we can evolve and make the evolution. I think we’re at time. Okay, roughly we said, yeah, we said 30 minutes. It’s 1:45. It’s 45 minutes. I think it was another great episode. Thanks everybody for joining. See you next week. We’ve got a few surprises for September, so stay tuned. Look at LinkedIn, and it’s going to bring a lot of fun, let’s say even more fun, in this chat. So I’m happy to see friends and new faces as well. Should we

Scott Hirleman (43:22):Have a stupid sign off tagline, like same mesh time, same mesh channel next week,

Jean-Georges Perrin (43:31):Or do you want to meet mini mesh? Okay. Appreciate the conversation. Thank you.

Scott Hirleman (43:40):Thanks everybody.

Jean-Georges Perrin (43:41):Mesh next week. Bye-bye.

Samia Rahman (43:43):Bye folks.
