The Benefits & Pitfalls of Data Mesh: Considerations Before Implementation

Sep 28, 2023 by Melissa Logan

A summary of our Data Mesh Learning community roundtable discussion on September 27.

The Data Mesh Learning community hosted a conversation between data engineer and architect, podcaster and co-author of “Fundamentals of Data Engineering” Joe Reis, and Zhamak Dehghani, creator of data mesh and founder of Nextdata.

In this roundtable, Joe and Zhamak explored one of the biggest obstacles users face along their journey to pursue data mesh – change management. The conversation explored what it takes to implement data mesh effectively, including the pitfalls to avoid as well as the best practices to follow.

Watch the Replay

Read the Transcript

Download the PDF or scroll to the bottom of this post.

Ways to Participate

Check out our Meetup page to catch an upcoming event. Let us know if you’re interested in sharing a case study or use case with the community.

Data Mesh Learning Community Resources

Engage with us on Slack
Organize a local meetup
Attend an upcoming event
Join an end-user roundtable
Help us showcase data mesh end-user journeys
Sign up for our newsletter
Become a community sponsor

You can follow the conversation on LinkedIn and Twitter.

Transcript

Speaker 1 (00:09):Okay, so welcome everybody to our Data Mesh Learning virtual meetup. Excited to have you here. My name is Melissa Logan. I’m part of the Data Mesh Learning staff. If you are new to the Data Mesh Learning community, welcome, we are very glad to have you here. Data Mesh Learning is a community of about 8,000 data pros who are on their data mesh journey, whether they’re just getting started, just learning about it, or more advanced on their journey. Our mission is to help provide knowledge transfer and resources for you as you go through your journey. That includes a range of different things, like hosting these virtual meetups on different topics. We also have a Slack where people have conversations, ask questions, and give advice about challenges or successes they’ve had with data mesh. We just launched a new, eponymous Medium publication called Data Mesh Learning that has some writers on there to share their best practices as well.

(01:09):And there are other programs on the website: end-user roundtables, Data Mesh Days, et cetera. So if you haven’t looked at that, please do: datameshlearning.com. We just went to Big Data London last week. We held our very first in-person meetup at that event. It was really heartening to see the progress people are making with data mesh. Just some really fantastic user stories from ITV, HelloFresh and others. And I am very excited to welcome our guests today. It’s bound to be a really interesting conversation. Two well-known folks in the data space who truly need no introduction, but I’ll do it anyway. We have Zhamak Dehghani, who is the creator of data mesh, wrote the book on data mesh, and is also the founder of Nextdata, where she’s working on a data mesh native tool set. And we have Joe Reis, who is host of the Monday Morning Data Chat, co-author of “Fundamentals of Data Engineering,” working on a new course with Coursera, and has a new eponymous show, The Joe Reis Show.

(02:12):Zhamak and Joe will be exploring one of our favorite topics, one of the biggest obstacles users face along their data mesh journey, which is change management. This will be an open conversation among data friends. If you have questions, please put them in the chat and we’ll be happy to answer as many as we can within the allotted time. And to kick us off, I just wanted to share some data from a community survey we recently ran in the Data Mesh Learning community. We asked a range of questions about buy-in. One of the questions was around confidence and alignment, and we asked what that looked like before your data mesh implementation started and after. And what we saw is that people had low to mid-level confidence and alignment before implementation started, but it completely seesawed, and they started to have very high levels of alignment after just seeing the success as they began their data mesh implementation. So that was fantastic to see, and it really shows how crucial trust is to change management, crucial to this process. I would love to hear today, how does data mesh change these low levels of trust that exist in organizations today? How can we engender that trust sooner? Presumably the more success stories we see, the more I hope that changes. But I’m going to turn it over to the experts to dig into this and many more questions. So Zhamak and Joe, over to you.

Speaker 2 (03:39):Thank you, Melissa. Thank you.

Speaker 3 (03:42):How’s it going, Zhamak?

Speaker 2 (03:43):It’s going well. It’s 9:09 AM where I am and I’ve been doing a data mesh tutorial since 6:00 AM, and I jumped over and moonlighted on a BART data representation in Berlin, and I’m back here physically in the same space again doing this panel. So it’s been a busy morning.

Speaker 3 (04:06):Sounds like it, sounds like it. Well, this will be fun. We’ll keep it pretty casual. We always do. So, I mean, interesting survey results: confidence and alignment before and after. I mean, you’ve worked in data mesh, I’m guessing. So how important is trust? I think you’ve probably seen the gamut of different types of implementations at different levels. What have you seen with respect to trust?

Speaker 2 (04:37):I mean, I can unpack this. By the way, trust is one of my most beloved topics to talk about because it’s just so fundamental to exchanging anything, and in this case, exchanging data that you are going to use to make decisions that have pretty important consequences. If you’re doing drug development, you’re going to use data that is going to affect the lives of many others. Across that spectrum, trust is so fundamental. If I want to talk about trust, maybe let’s categorize it a little bit. It shows itself in many different scenarios. One is, I would say, the trust around: does this thing even work? Should we even bother? Is this real? Is this a thing we should do? And I can talk about what my experience has been and where it has really struck a chord with people, and can talk about the trust in terms of even beginning the journey.

(05:45):And I was really curious to see Melissa’s result, and I feel good about what happened. So that’s promising. The other piece of trust, since I think we should unpack this in perhaps two buckets, is trust around the data itself. How does data mesh change the existing lack of trust that fundamentally exists between the data consumers and data producers, and try to make that relationship stronger by removing the middleman, working directly with the people that are sourcing the data, having data product owners? I think we can unpack this. So we can perhaps start with the first one, the trust in the approach and in the possibilities, and how that can be institutionalized iteratively in the organization as part of your transformation. And then we can talk about how it actually changes our relationship in terms of trust in the data itself. Does that sound good?

Speaker 3 (06:45):Yeah, let’s have a chat. It’s good.

Speaker 2 (06:47):Yeah, so I think in terms of what I have seen, and I have this double personality, like a split personality right now that I live with. One personality is kind of the messenger of data mesh, someone who evangelizes this topic and created it, and who is unbiased toward technology and tooling. And then I have this other split personality, which is now a tool maker.

(07:13):I try to provide tools, so I see the world from these two different perspectives. So I’m sorry if I sometimes switch roles in my answer. Recently I’ve been focusing more on the tool side of it, and on the people who have been marginalized and haven’t really been in those core data sharing user flows. They always need the data, but they haven’t been involved in producing the data, and I’ve been looking at how they are now establishing trust with this new model. What I found is that a lot of previous approaches have been very technology driven.

(07:59):So warehouse technology, because we had to be able to correlate data across many dimensions at scale, or lakes, so that we can store unstructured data. So a lot of the technology-driven initiatives and changes were brought into an organization from the technology department. They were brought in by the CTO or CDO. Data mesh, on the other hand, really resonated with those domain data hackers, people that work in the domains day in and day out and need to work with data. So right now I’m working with a really large pharmaceutical company. I’ve worked with drug researchers, people with three PhDs, like chemistry, medicine and so on, working with data to discover, basically, matches between drug compounds and gene mutations. They’re working with data day in and day out, but they’ve been pretty much outside the whole data sharing, data engineering, tech-driven data stuff. And they have been using scrappy tools. They have a hard time discovering data, and they have a hard time finding correlations between data across different domains. So they established trust in the idea of data mesh because it landed on their desk as an approach, as a paradigm that is about empowering them first and foremost.

(09:23):It wasn’t brought in as a technology, top down, as yet another tool to use. So I think that’s the first step in trust: tapping into the incentives and pain points of the people that are actually, materially going to make an impact on the business using data, finding data, sharing data and so on. So I think fundamentally there is inherent trust, at least in the hope of change, among people that have been marginalized.

Speaker 3 (09:56):That’s an interesting perspective. I’ve spent my entire career on the opposite end, where I’ve been in data. That’s all I’ve known, all I’ve ever done. And so it’s interesting, because what I’m hearing is more of a bottom-up approach versus top down. I think you and I in a lot of ways are almost diametric opposites, but we’re very good friends nonetheless. But the conversations are always funny. I tend to view things just through the traditional world that I come from. But I can tell you where trust has nosedived, where it’s collapsed. I can tell you where things have succeeded and where trust exists. So there’s the old trope from Gartner that 85% of data projects fail. This was actually a projection they made a long time ago that everyone took as, oh, 85%, we’ll just call it that. And it’s always been 85% for 85,000 years or something. I’m joking.

(10:53):But I can tell you the lack of trust typically does come from this sort of battering ram approach, where they’re like, well, we have this new initiative here, or the data team has to do initiatives because that’s what they’ve been mandated to do. And so it very much becomes this sort of Conway’s law exercise of teams doing things because they’re incentivized to do them without, I think, acknowledging the broader context or the groups that, as you point out, are marginalized. And so that’s an inherent schism, I feel, but it’s really the top-down nature of just how it has been for ages. And I think certain development groups, software development groups, might try and take a bottom-up approach. And I think we’ve seen wins like that in the software world for sure. A lot of the big innovations in software have really come organically from devs themselves. And so

Speaker 2 (11:44):Yeah, in fact I think Melissa just put a comment in the chat for us: the data engineers push back on data mesh. And I a hundred percent agree with that, because data engineers are not in the domains today. They are in the centralized data team. They try to facilitate this data sharing in some way or another, but the business outcome is often associated with someone else in the domain, and they have tools and technologies that have been optimized fully for what their job is: move data from A to B.

(12:27):So of course they’re going to push back, because their job is going to be reshaped and their position is going to change. They need to find a life, hopefully, in a domain team. So I completely get the pushback, because there is inherently a lack of trust on their part that a domain expert will be able to do what they do today. And that’s not what we’re saying, and that’s not what we’re saying data mesh is about. Think about the next generation. Forget about the modern data stack; what is the postmodern data stack that allows the data hackers to do the data work without the really specialized experience and knowledge that data engineers have today to move data from A to B, and pushes that down to the actual technology to automate? So I completely get why the pushback is there from the data engineering in the middle.

Speaker 3 (13:25):But data engineers, I think, have traditionally operated in a “data X” sort of world. So X could be a warehouse, could be a lake, could be a lakehouse, whatever. I think those definitely serve their purposes for the intended use cases, but it’s hard for people to see beyond them. It’s the old Upton Sinclair quote, that sometimes you’re paid not to understand something. And that, I think, is very evident. I wrote the book on data engineering. That’s not to say I’m religiously attached to the idea. I mean, I remember back in February at the Starburst Datanova event, I actually had to take the side in a debate that data mesh would eventually make data engineering obsolete. And after I talked to you about why some of the arguments were there, I mean, there are some sound arguments as to why this would be. It’s more of a convergence of skillsets and paradigms, really, in support of providing data to end consumers and empowering the producers of that data.

(14:24):So what I’m seeing right now is there’s actually a massive shift-left movement going on within data, where everyone is trying to couple data more closely with software engineering. A lot of the root causes that I see as a data engineer really stem from the fact that I’m on the receiving end of data. And so my trust is already rock bottom in the data I’m getting from development teams, application teams, third-party APIs and so forth. So my expectations are already super low, and so are those of everybody else downstream from me. I think inherently the problem is our expectations are so low that this is part of the trust issue. I think when people hear about another data thing like data mesh, it’s: great, is this going to be the data warehouse that you didn’t actually build for me over the past five years? Because that’s where a lot of projects are.

Speaker 2 (15:13):And again, the question is where do you incrementally establish that trust? Where are the insertion points to change behavior, so trust can be established? Actually, back when I started Nextdata, as a startup you also have to think about where your insertion point is. Who is your ICP, kind of the most desirable customer and user that you have? And I was very idealistic as the creator of data mesh. I thought, okay, we’re going to go and empower software engineers that are building applications, because they are the source of data. We’re going to fix this damn thing from upstream, from the source. And very quickly I realized software engineers don’t care. They’re not at the point where they’re subjected to the pain of consuming data, so they don’t actually care about this data mesh. For them it’s: give me just one button to press or one command line to automatically create data products. I don’t care about this thing. Let me just run my e-commerce app on my website. What’s this data about? So I think to establish trust, you’ve got to find the insertion point where the people are most incentivized, and they are right in the kind of domains that are in the middle. They are perhaps technologists, whether they’re software engineers, or maybe they’re data analysts or data scientists. These are people that are working with data toward a business outcome, and they’re both consuming data and producing data.

(17:01):They’re not the folks that are just building a dashboard and that’s it. It’s: I’m developing a new drug and I’m reusing data from clinical research and drug compounds and genes and so on, and I’m now producing, I don’t know, a generic brand of a particular medicine. So I’m both consuming and producing data. My day-in and day-out life is to work with data. I’m a domain expert first and a data expert second, so let’s start there. So I think that’s the one, in my opinion. Again, if you ask me tomorrow, maybe my opinion changes. But when I also look at Roche and other places where large multinational organizations managed to move the needle, they moved the needle by going to a domain that was extremely data oriented, manufacturing, for example drug manufacturing. And they just needed to move fast and needed to empower that data sharing even within their domain, with the people that they had, not the big data engineering team. So I think that’s how we slowly establish trust.

Speaker 3 (18:14):Yeah, I tend to agree with this. Again, back to the 85% failure rate: at this point in the industry, what I keep seeing are these tropes recycled about how we just need to try a bit harder, and I don’t think that this is the approach that we should be taking anymore. I do feel like approaches like data mesh, for example, basically flip the entire funnel, the current practices, on its head. It’s something I’ve been writing quite a bit about. I feel like if I’m a data engineer right now, I can’t blame the lack of data tooling. There are thousands of wonderful tools and companies and vendors out there that can solve conceivably any problem I have. I don’t think that that’s a problem. I think education and best practices and knowledge of those, that’s a huge, huge problem in our field that also exists in software engineering and probably everywhere in business, to be frank. But I’ve started to notice this, Zhamak, recently, where if I meet a company or I talk to people, I’m always trying to figure out, as an example, where’s Excel being used a ton? Those are disenfranchised people, because they’re coming up with workflows and basically duct taping things in spreadsheets right now. And so, to your point, you’re talking about scientists who are really just doing the best that they can to do their jobs.

(19:43):That seems like it would be a good place to find some wins, right? Because I think we’ve scratched the surface, we’ve combed the earth for every conceivable place where we could get wins. You tried software engineers first, thinking that would be a great idea. You come to find out, hey, they work in two-week sprints and they really don’t care about any extra work right now. So if you want to make it easy for ’em, please. If not, just

Speaker 2 (20:05):Yeah, yeah, exactly. Yeah. So I think trust is established by seeing value delivered and change happen really fast. I mean, I’m just so surprised at how low the bar is when you go to the actual domains that are working with data. The bar is high for data engineering, moving data around, but to actually work with data and get value, the bar is very, very low. So I think that’s where it is. And I’ve noticed, we have a super scrappy prototype, and even with that, people are really seeing, hands on keyboard, all these processes that they had in place: managing metadata somewhere else so that somebody can produce the data, managing the documentation for the data somewhere else so that people can understand what the data is. And even that documentation was static, so they couldn’t really interact with it. Managing the code somewhere else, the data in some other files.

(21:12):It’s just a kitchen sink of different tools these folks need to use to do something super basic and simple. There is just so much potential to reshape that and provide tooling that changes behavior and adds value and then opens up possibilities. And then you do that one, two, three times and you have that critical mass of people and critical mass of data products that now can be surfaced and easily searched and understood. And here you now create that beacon for change, right? Yeah, there’s a wonderful kind of paradigm, or I guess approach. I think it’s older than what IDEO proposed, but folks like Bryan Walker at IDEO created this movement-based change: that you create change by moving, not talking, not showing, the action of moving,

Speaker 3 (22:10):Just doing it.

Speaker 2 (22:12):So

(22:13):It’s just creating change and trust. Enabling people to do things that weren’t possible for them to do is how you build trust, is how you show value. And it can’t be in the same data team. A lot of these initiatives start with that centralized team; it has to be somewhere new, somewhere in the organization where the possibility wasn’t there, right? In the domains. And which domain depends on your organization. Maybe your organization actually is, I don’t know, Etsy or Spotify or one of these very data-centric digital app teams where software engineers actually are the best people to now be part of the data conversation, or not. It really depends on your organization.

Speaker 3 (23:06):Yeah, I mean, a couple of threads here. You mentioned companies like Spotify, and I always make this joke that if you want to know where data is going, just look at what software engineering has been doing for the past 10 or 15 years, and the practices just keep getting adopted. So that’s part of the maturity with data. But to harken back actually to 1989, a mutual friend, Bill Inmon, had come up with the data warehouse. He was telling me that when he came up with the idea and unleashed it on the world, everyone hated it, actually. It fell flat and it took time, right? People were so used to querying transactional systems, and this was a new paradigm. It’s like, why do I need this separate analytical store and these data marts, whatever? So, I mean, you’ve got to remember, back then what he was proposing was about as game-changing, I think, as what data mesh is today.

(23:57):That’s why I wanted you two to meet. He actually has a very similar path, but it eventually won out, and it was through people experiencing the impacts of what a data warehouse could provide, and that leveled up analytics and BI and so forth through the nineties and two thousands. But then times moved on, practices have moved on. The web, the internet, changed everything. And so everything is web scale, and practices, everything, evolved. And so what got us here won’t really get us there, I think. But again, the arguments in the data world, I mean, we’re still stuck talking about, should I model my data in a star schema, or should I use a data warehouse or data lake? And I am personally a bit over these conversations. Okay, so again, back to trust, back to massive failure rates in the data field. If what we’re doing isn’t working, maybe it’s time to look in the mirror and figure out what could work. And it’s not going to be an incremental approach, I don’t think. It will have to be, by necessity, something that we haven’t done before, I would say. We’ve been trying the same things for decades and we’re still complaining about it. Yeah.

Speaker 2 (25:08):And in fact, if I just piggyback on what you just said, should I model my data in a warehouse star schema or a flat file or whatnot? If you go back to the notion of trust, if you are producing the data as a product and it’s about establishing trust with the consumer of this data,

(25:28):Do they really care about your star schema or file, or do they really care about what this data semantically is about? What is this data representing? So some esoteric column names in a warehouse probably don’t tell them, and facts and dimension tables probably don’t tell them, what this data is about. That’s just the topology of your data underneath, which the machine should understand. I don’t think humans should care about that. What the human cares about, as part of establishing trust before even trying to use the data in a warehouse, is: what is this data about? What is it modeling? What concept in the universe is it modeling? What language is it using to model this concept? What are the relationships in this model? What are the semantic tags or concepts? Is this private information? What policies govern this data? And then, what is the shape of this data?

(26:23):For me to understand it, so that I can trust it, so I can use it, I need to understand: if I’m, let’s say, building a personalization-driven recommendation engine and I’m looking at the sales data, am I seeing data that is kind of biased? It only has data from e-commerce, but it forgot all the data about retail, what’s happening in the retail shop, and a completely different demographic is going to the retail shop. So there is a lot that goes into modeling something and providing information beyond just the plain old schema, whatever the schema is, so that a data user can discover, understand, and trust, and then they will go to the table or to the file, with whatever tool they want to use to get to the data. So the conversation needs to really level up from concerns that matter to the machine and shouldn’t really matter to the human that is going to use the data, or at least decide to use the data. And yes, then you use the machine to use the data, right? Use the program to use it.
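
To make the point above concrete, here is a minimal sketch, not anything described in the conversation, of what a data product’s human-facing description might look like in Python. Every name in it (`DataProductDescriptor`, `coverage_gaps`, the field names, the sample values) is a hypothetical chosen for illustration. It simply records the concept, tags, policies and coverage Zhamak lists, and checks for the e-commerce-only gap from her sales example.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical human-facing description of a data product (illustrative only)."""
    name: str
    concept: str                      # what real-world concept the data models
    owner_domain: str                 # domain accountable for the data
    semantic_tags: list[str] = field(default_factory=list)
    policies: list[str] = field(default_factory=list)
    covered_channels: set[str] = field(default_factory=set)
    schema: dict[str, str] = field(default_factory=dict)  # column -> plain-language meaning


def coverage_gaps(product: DataProductDescriptor, needed_channels: set[str]) -> set[str]:
    """Return the channels a consumer needs that the product does not cover."""
    return needed_channels - product.covered_channels


# Assumed sample values, mirroring the e-commerce vs. retail shop example.
sales = DataProductDescriptor(
    name="sales-orders",
    concept="Completed customer purchases",
    owner_domain="e-commerce",
    semantic_tags=["sales", "contains-personal-data"],
    policies=["mask customer email for non-owners"],
    covered_channels={"e-commerce"},
    schema={"order_id": "unique purchase identifier", "amount": "order total"},
)

# A recommendation team that needs both channels can see the gap before using the data.
print(sorted(coverage_gaps(sales, {"e-commerce", "retail"})))  # ['retail']
```

The design choice here is only that the description lives alongside the schema rather than replacing it, so a consumer can judge fitness for use before touching a table or file.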

Speaker 3 (27:39):Before we get to questions, I got a question on machines, and this will be I think everyone’s dying to know, what do you think about large language models, generative AI with data mesh?

Speaker 2 (27:50):Yeah, I mean, they coexist. In fact, I would say when I started Nextdata, the vision for the company was building a world where AI and ML are empowered by responsibly and equitably shared, secure data. So the idea was that if you want to get to this world, this future where machines make all the decisions and have all the insights and ML is running our lives, ML models, whether they’re large or small models,

(28:27):Running our lives, to get to that future, there are two paths ahead of us. One is the path where you have a really large pile of data that you take to your compute to train the model, and then you use it. The other path is that the data, and the data ownership and responsibility, stay with whatever company or entity has that data, and you bring the model or parts of the model to the data to compute. So the ownership of the data remains with the people. And data mesh is just a stepping stone to that future. And if you squint and think about the impact of these two very divergent paths at a societal level, the question is: who’s going to have the power? The power is going to be with the people who have the data.

(29:19):So in a centralized world, the power is going to be with the very few that manage to collect as much of the data as they can. And in the other world, the power is distributed to all of the owners of the different parts of the data. So this is abstract, I know. I’m not talking about a specific technology, but at an abstract level, I’m in the camp of distributed power, not centralized power. So I thought, with Nextdata building tools for data mesh, we have got five or 10 years to get there, and then gen AI happened and that vision of the company just moved forward. So I think at the fundamental level, the seduction of large language models is going to have a massive impact on the approach to data ownership. And what can possibly happen is that we kind of throw our arms in the air and say, look, just dump the damn thing into a lake and we slap a model on top of it and we get some level of probabilistic accuracy, maybe it’s 80% correct or 70%, and maybe that’s okay, because the data is still not high quality, it’s not going to the source.

(30:35):So there’s a delay in the data to actually get it. And it’s always incomplete because you’re always in the business of data shoveling. But that is a reality. We may be so seduced by this idea that we forget the kind of responsible, decentralized data sharing. So that’s a very possible future. But coming back from science fiction and future prediction to reality, I think gen AI and data mesh have two relationships with each other. One is data mesh at the service of gen AI, and the other is gen AI at the service of data mesh. So data mesh at the service of gen AI is a very real thing. I think anybody building a data mesh data product should think about gen AI or predictive AI, whatever version of AI we have. Data products should be able to feed the training of machine learning models, large or small.

(31:33):So that doesn’t matter. And I think in reality, what’s going to happen is that you have some foundational models and you bring them to your organization, which is what a lot of AI startups are doing, and they’re retraining or creating new memory for these machine learning models on the data. And that data could be a set of data products. So I think that’s just a given. That has to happen. Otherwise, I mean, if data can’t support training machine learning models, it’s just useless. So that’s that. And then the other part of it is gen AI at the service of data mesh. And I think there are great examples of where it can be applied: discoverability. I mean, I’m assuming anything beyond just a chatbot on top of your query writing. But really I think discovery of data products, with rich metadata exposed by each of them, and being able to ask intelligent questions to discover your data, is one space. Generating even the data products from the data that exists is another. So there are possibilities both ways. So you got the full answer?
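
As a rough illustration of the discovery idea above, the following Python sketch ranks data products by how much of a plain-language question overlaps with the metadata each product exposes. It is a deliberately simple keyword stand-in, not a language model, and not a tool mentioned in the conversation; the catalog entries and the `discover` and `tokens` helpers are all assumptions made up for the example.

```python
import re

# Hypothetical catalog: each data product exposes its own descriptive metadata.
CATALOG = [
    {"name": "sales-orders",
     "description": "Completed customer purchases from the e-commerce site",
     "tags": ["sales", "orders"]},
    {"name": "store-visits",
     "description": "Foot traffic counts per retail shop",
     "tags": ["retail", "visits"]},
    {"name": "compound-assays",
     "description": "Lab assay results for drug compounds",
     "tags": ["research", "compounds"]},
]


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def discover(catalog, question, top_k=2):
    """Rank products by word overlap between the question and their metadata."""
    asked = tokens(question)
    scored = []
    for product in catalog:
        meta = (tokens(product["name"])
                | tokens(product["description"])
                | tokens(" ".join(product["tags"])))
        score = len(asked & meta)
        if score > 0:
            scored.append((score, product["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


print(discover(CATALOG, "Which data covers retail shop visits?"))  # ['store-visits']
```

A gen AI version would presumably swap the word-overlap score for something semantic, but the dependency is the same either way: the answer can only be as good as the metadata each data product publishes.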

Speaker 3 (32:43):Yeah, thanks. I was actually joking on the Monday Morning Data Chat yesterday that there’s probably going to be a transformer model for data mesh that just translates between different things, and that giant model is the data mesh. I’m just joking, actually. But anyway, enough about that. You want to take some questions?

Speaker 2 (33:02):Where do we see the questions?

Speaker 3 (33:04):So they’re in the YouTube chat. I’ll put the link in the chat here, but you should turn your volume down if you click the link too, or it’ll just start blaring. In the chat here, Larry Mcco, I think I’m pronouncing your name right, I’m sorry if I’m not, has a few questions. On Melissa’s mention of the study and the confidence levels during data mesh implementation, he would imagine some of that is due to early wins. Where can a fledgling team get some of those wins that Melissa was alluding to? I know it’s a bit of an open-ended question, but in your experience, what are some ways that you’ve seen teams get wins with data mesh?

Speaker 2 (33:48):Yeah, I mean, right in front of my eyes right this moment, well, not this moment, but really the wins are if you put enough of the plumbing, enough of the framework and enough of the foundation pieces in place so that the data that was hidden in Excel sheets and in some random S3 buckets and was tribally shared is now systematically shared through data products from domains and across the domains within a business unit, let’s say. Even that is where the big wins are. The big wins won’t be in your data infrastructure or data engineering team, because that is still going to be localized within that tribe of data engineers. I don’t think that’s where the big wins are. Big wins are where, with even minimal technology change, you shift tribal knowledge sharing to systematic data product sharing, starting from domains,

Speaker 3 (35:05):Let’s see here. Gustavo asks, do we foresee data engineers embedded in product engineering, cross-functional teams creating and updating data products related to the features that they’re building?

Speaker 2 (35:18):Yeah, I think in the interim, in the absence of technology that is serving those marginalized users, yes, I think so. I think what I see is, in fact, we’re working with a retail organization here in the Bay Area, and their e-commerce team is one of the big evangelists within the organization. And in the absence of having anyone and everyone, like analysts, generate data products, they have a very sophisticated data engineering team. And what they have done is they’ve created ways of encapsulating pipelines and a whole lot of other things, even their own: the whole Airflow instance in the Docker container with a pipeline for a particular data product, and the data product is a Docker container. They’ve done really amazing stuff. But this is data engineers doing this work. These are not the actual data users. So I think in the interim that’s what’s needed.

(36:19):But if we continue down this path where data sharing requires this really high level of engineering and specialization, I don’t think that’s the right path. And that’s generally what happens, right? Even with technology, we still have software engineers building applications, but we have raised the bar to some degree. People go to a few months of one or another coding bootcamp and start coding applications. So we have these guided experiences for people to do the right thing. We now have test-driven development. We have automation, CI/CD; there’s a lot of tooling that guides the experience of developers to do the right thing.

Speaker 3 (37:04):And

Speaker 2 (37:04):I think the same thing would happen with data.

Speaker 3 (37:07):I think it will. And I’m going to chapter 11 of the book, where I talk about the future of data engineering, and I have this thing called titles and responsibilities will morph, and let me just read a couple things to you, story time. So the boundaries between software engineering, data engineering, data science and ML engineering are increasingly fuzzy. At some point they will converge. I think it’s due to the fact that as time goes on, the necessity of integrating data, and now AI, and applications will just be a reality. And so software engineers are going to need to know data stuff, data engineers are going to need to know software stuff and so forth. And so some people will push back on me on this argument, but I think that I’m increasingly being proved correct. For example, data tooling vendors are now integrating into the ML space.

(37:55):The ML space is increasingly integrating into data, and software is becoming both. And so I think it’s a matter of time before the convergence. So to answer Gustavo’s question about data engineers embedded in product engineering, cross-functional teams, this is sort of happening implicitly depending on the company you’re at. I would say this isn’t a universal claim, but there is a willingness of data engineers, I think, to explore stuff; there’s a number of data engineers I know who’ve read the data mesh book who I think have adopted it as sort of a path forward. The friction really is when you get back to, again, change management. We’ve operated in this world that we’ve been in since 1989, which is the centralized data warehouse. But again, you’ve got to understand it wasn’t always this way. It was very much the transactional operational systems that were at the helm of everything; then a couple of people back in the eighties, nineties spearheaded the data movement. But again, as data becomes more front and center with everything, you mentioned this in your book, you predicted this will happen with data products, the data engineer will become a data product developer, software engineers and so forth. So I think it’s just

Speaker 2 (39:10):There is a generational, I mean, if you really look into the future, I think the next generation of folks that are coming, they’re in high school today. I actually have a high schooler, I think in year nine or something, next door. And she’s learning Python and interfacing with large language models. That’s what she’s learning as her first programming kind of experience. So even the experience of this next generation of kids coming out of schools and universities, what they recognize even as programs of the future, these programs are not logic-driven if-else. I remember when I first learned a programming language, I first learned logic and then learned programming. So these are actually data-driven, not logic-driven, applications. They’re not imperative applications anymore. So that change is happening, and it’s going to happen really fast. So the shape of the next workforce, the next technologies, will reshape beyond what we’re even discussing and imagining between you and me here.

Speaker 3 (40:20):Oh yeah, for sure. I picture a world where even the notion of data mesh is considered quite quaint: that was cute, but that was so, like, 2020s, we’ve moved beyond that. So do you see a question? I don’t know if you have the,

Speaker 2 (40:35):Tom had a question there around data mesh and trust: is trust in data mesh not linked to making sure that it’s not too intrusive to existing investments that will be done in a warehouse or lakehouse, and visibility through evolution? Absolutely. So I think, beyond trust, there is a concept for when you are moving from an existing model and innovating and going to a new model, and it’s called the bridge of familiarity: yes, I can build this target futuristic state that has no connection and no bridge to the reality we are in, and that’s very intrusive and people don’t trust it. It’s alien. I can’t trust this alien future that you’re showing me. So I think what happens, just pragmatically, with a lot of data mesh implementations is, first and foremost, data mesh intentionally was proposed as an approach, not as a technology.

(41:37):I could have put an architecture diagram with icons of applications and systems in my writing, but I intentionally chose not to do that, because I wanted us to think about first principles and what impact we wanted those principles to have, and then adapt it to the technology. So as a result of that, the very first generation of data mesh became a feature of our warehouse, a feature of the stuff that we had, because that’s the best we can do and that’s not intrusive, and it’s just a kind of layer smeared on top of the stuff. We have a lot of catalogs smearing a layer and calling it a data mesh. It really is not. But it’s an incremental improvement. So I think that creates some comfort and trust that I don’t have to throw my 50 million investments out the window just because a new paradigm has come to exist.

Speaker 3 (42:27):And it’s something I’ve noticed in the vendor space, especially as I was walking around Big Data London last week and looking at a lot of the vendor exhibits. Everyone’s got a data product now, which is really just a way of data sharing of schemas and views and so forth. But I mean, I kind of look at it as the comfort blanket, like maybe data mesh lite, the sugar-free version, but it’s a way of getting there. But I think maybe, I don’t know if you used the term mesh washing once, but I think there’s the one that’s described in your book, which I use as the canonical version, but then vendors are going to vendor, though, and now you’re a vendor. So

Speaker 2 (43:13):Yeah, that’s the dark side, and I have to be very careful and be explicit about which hat I’m putting on when I say something, right? So you are right, the data product is the most tangible thing that we can touch and feel and put a finger on, and it’s the easiest thing to latch onto what you had before. So it’s really easy to put a pool of metadata together and say, no, it’s a data product because it’s not just the data, it has the name of a domain or the name of the owner, and just call it a data product. Or it’s very easy, I guess, to say I was doing queries and creating views, and then just add another bunch of metadata to it and call a view a data product. So I think it is an incremental step, but fundamentally, the reason the first pillar was the first pillar, and not the last pillar, was domain ownership: with decentralized and distributed ML training and distributed analysis, ownership remains where it remains. It doesn’t matter what the technology is; what really matters is that the lifecycle is managed independently from the source, by a different team and different groups with different cadence. For that to happen, we need to go beyond just a digital product.

(44:39):We need to think about how these are governed. Right now I have this big challenge with even our own technology. I don’t want to build governance. It’s really, really hard. I don’t want that to be the first thing we work on, but I need to be able to integrate a solution that is fundamentally distributed and agnostic to the technology underneath, and bring governance to these data products. And that’s really hard, because there is no open standard around policy in the data space. There’s tons of proprietary vendors where all they do is convert some sort of a policy declaration and push it down, translate it to a grant statement, constantly marshal up and down. This is it. This is nonsense. The amount of million- and billion-dollar companies that are doing something that should have been an open standard, frankly. Are we doing that with, are we doing that with [unclear]? We defined this [unclear] and we got on with our lives. So it’s ridiculous how hard it is to actually bring this concept to life. And I completely understand. As a vendor, you find the quickest path to money; you’ve got money from investors and you’ve got to respond to them. And some of these harder problems never get addressed.
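
As a rough illustration of the translation step described above, the following Python sketch converts a vendor-neutral policy declaration into SQL `GRANT` statements for one backend. The `AccessPolicy` shape, the `to_grant_statements` function and the product-to-table mapping are assumptions made up for this example; no open standard for such declarations is implied, which is exactly the gap raised in the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy declaration; not an existing standard or vendor format."""
    data_product: str          # logical data product name
    role: str                  # role that receives access
    actions: tuple[str, ...]   # e.g. ("select",)


ALLOWED_ACTIONS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def to_grant_statements(policy, table_for_product):
    """Translate one declaration into SQL GRANT statements for a single backend.

    Identifiers are inserted as-is; a real system would validate or quote them.
    """
    table = table_for_product[policy.data_product]
    statements = []
    for action in policy.actions:
        action = action.upper()
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        statements.append(f"GRANT {action} ON {table} TO {policy.role};")
    return statements
```

With a shared policy language, each platform would implement only a translation like this one, instead of every vendor defining both its own declaration format and its own push-down logic.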

Speaker 3 (46:04):Well, governance is an interesting issue. I was having dinner last week with a very prominent person in the data governance space, and he was lamenting that it’s a hard sell because the ROI isn’t always so obvious and it’s just a ton of work. If you really like quagmires, that’s a great place for you to go: just run through mud all day, that’s all you’re doing, and quicksand and other things like that. But it is hard. And I told him at this point, somebody had a post about maybe how large language models could help data governance. I’m like, hell, at this point, I think you guys should be able to take any Hail Mary you can get to make this move along, because at this point it’s hard. So I think the entire governance discussion needs to be, I think, raised with everybody. I feel like this is one of the big cruxes of data mesh, and not just data mesh, but the entire industry in general. There is no governance right now. And I would say for this to work, you kind of would need that. I think that’s one of the pillars of data mesh, for example, the federated computational governance layer. But in general, it’s a lack of governance that’s just playing hell in every area of practitioners’ lives right now.

Speaker 2 (47:13):And it’s not that we haven’t done this. We went from on-prem, secure by perimeter. Let’s take one aspect of governance: security. We went from building stuff in a data center, putting a firewall, a big fat firewall, in front of everything, and assuming everything inside is secure and everything outside is enemy and insecure. It’s not secure. And we went to the world that said, oh, my one data center turned into 50 different services across three different clouds. That notion just fell apart immediately. And we said, okay, now we need a new paradigm, zero trust architecture: nothing and nowhere is secure. So let’s push security to every little touch point, every little node.

(48:01):Then very quickly out of that, we realized we need to create some standards, because a person coming to access these services has an identity, needs to have an identity that works across many different platforms, because these services are spread across them. We created OAuth, we created a whole bunch of standards, and I think that’s what’s necessary here. And you’re right that maybe people can’t see immediate value in that, so then maybe the money is not spent, but people see pain, and people pay a lot of money for painkillers. There is still a market, but it requires a body or a foundation, I guess. Melissa is on the call, to keep her ears open, about a foundation around decoupling the aspects of governance. Governance is a very loaded and scary word, so start simple, start with just access control for data in a distributed and standardized way, and define a policy language. And I don’t know, it’s beyond my ability, frankly, to pull all those people together to do it, but that’s what needs to happen.

Speaker 3 (49:20):Well, yeah, you have new initiatives like the EU AI Act, which is going to be voted on in December, and that’s going to be game-changing for everybody. For everybody in the audience, if you don’t know about the EU AI Act, please get educated on this. This will have a massive, massive, massive impact on everything in the universe. I think probably more than GDPR; to some extent, consider it an extension of GDPR, but your datasets will have some sort of, they’ll be regulated now. So there’s a lot of nuance. If you aren’t paying attention to this, please put it on your radar. So

Speaker 2 (49:55):Yeah, I mean, I think that’s great. But we also see what happened with GDPR, these sorts of big regulations. We did a lot of lipstick on the pig; now I have to click a freaking hundred cookie acceptance buttons before,

Speaker 3 (50:11):Oh, just in Europe, I spent half my time just clicking reject-cookies buttons. It was great. That’s my web experience, but that’s the reality of it. It’s like when I was in Paris giving a talk earlier this spring, somebody came up to me and said, you do realize that the US innovates and the EU regulates. And I’m like, that is the most perfect way of putting this, but everyone gets to deal with the consequences of this. But that’s another thing you’re going to have to figure out with governance, right? It’s like, okay, so how does this impact everything else that I was doing? But you’re absolutely right. GDPR, I remember when that first came out in 2018; May 25th, 2018 was the day. I remember all my friends who didn’t pay attention to it, they’re like, oh, crap, I guess I’ll just delete all my data that’s older than 90 days. I’m like, please don’t do that. But they did it anyway because it’s, yeah,

Speaker 2 (50:56):Regulation is one tool, but I think we need more than that to, oh

Speaker 3 (51:00):Yeah, I agree.

Speaker 2 (51:01):To create these kinds of standards, the changes in the data space. It doesn’t play well. The data vendors, there’s a lot of point-to-point partnership going on for sales and entry to market, but not really collaboration for creating standards. I guess we have to grow, going back to your first point around maturity of the software industry, with the data industry right behind it. I think creating collaborative standards and open standards, that happens a lot in the software space. I think that’s kind of where we need to get to in the data space as well. Put our insecurities aside.

Speaker 3 (51:39):I agree a thousand, a thousand million percent on that one. Yeah. Awesome. And back to the governance, too. I mean, you’re seeing these regulations come about because people don’t trust what’s been going on with data. That’s precisely why this is happening. It’s like, you let centralization run amok with people’s data and this is what happens. You’re about to get the hammer dropped on you. So if we have time for one more question here. I know we’ve got about three minutes left. You can go pick one.

Speaker 2 (52:15):There was a question around the tooling that I feel is missing in our open source community. I think we just talked about it: an open standard for defining policies that can be enforced or validated, applied to data products or data. Full stop. There you go.

Speaker 3 (52:35):There you go. I agree. Well, awesome. Well, thanks to the audience for everything. Thanks, Zhamak. Yeah,

Speaker 2 (52:42):Thanks. Thanks everyone on the call and on the chat. It’s wonderful.

Speaker 1 (52:47):Yeah, really great to have you both here. I just want to mention there was another survey point that echoed what you said about the small wins. So I think the data very much proves out what you’re seeing in practice about creating trust and change. The majority said the most successful approach was starting small and growing incrementally. I think that’s probably no surprise. 0% said a big bang approach is the right way to go. So get those small wins, try to get in there slow and show success. We do have a white paper,

Speaker 2 (53:18):Sorry, can I just say one thing? There is this expression that my previous company used, and I steal it all the time: think big, start small, move fast. So start small doesn’t mean you can’t have this big audacious plan and goal. It’s just that how you get there is through smaller, faster-moving pieces.

Speaker 1 (53:40):Definitely. Yeah, for sure. And we do try to share as many of those small wins and case studies on the Data Mesh Learning site to show people, like, yeah, you can get success. This is what ROI looks like, and hopefully inspire people to start their journey. And we have a white paper coming out. It’s going to be community written. It’s summarizing the data from the survey and aggregating some best practices that we are hearing today and others from around the community. So that’ll be on our website and on the Slack and all that good stuff sometime in October. Thank you both for the wonderful conversation. Fantastic to have you both on. Anyone out there, if you want to hear more conversations like this, just join the meetup group and you’ll be notified of them. And please enjoy your data mesh journey. Thank you both. Appreciate it. Have a great day. Thank you.

Speaker 2 (54:29):Bye-bye

Speaker 4 (54:30):Bye. Take care.