
Next Generation Data Mesh Governance

Data Mesh, with its decentralized/federated approach, offers new opportunities for data governance. But what needs to change to allow data governance to adapt to Data Mesh? This session with Charlotte Ledoux, author of the book “Data Governance, Where to Start,” discusses what “next generation” data mesh governance looks like.



Read the Transcript

Speaker 1: 00:00  Thank you very much to the Data Mesh Learning community for hosting us on this panel. Today’s discussion, as Paul mentioned, is going to be about data governance, or more specifically, data governance for data mesh. So let me set the stage for today’s discussion. As you probably know, data mesh is being implemented in enterprises globally. But as your data mesh grows, and by the way, that’s a great place to be, the governance of your enterprise data mesh and the data that’s in it becomes a crucial consideration. So what are we as practitioners supposed to do about that? Well, one option is to look at traditional data governance practices. Now, when I say traditional, I don’t necessarily mean anything’s bad with that, but it does imply that it serves a current need. Data mesh, though, with what I would characterize as a somewhat new decentralized and federated approach, probably offers a bunch of new opportunities for data governance. So we clearly have data mesh principles that emphasize decentralized federation, but should data governance be centralized or federated? And if so, how? These are all things we’re going to talk about today. Our hope, this group here, Charlotte, JGP, and myself, is that you as a data practitioner gain some insights that’ll help you implement your organization’s data mesh. Now, as Paul mentioned, we have two wonderful guests today, Charlotte Ledoux and Jean-Georges Perrin. I’m going to ask each of you to introduce yourself. So Charlotte, tell us what you do for a living.

Speaker 2: 01:34  Yes, hi everyone. Happy to be here again. So I’m Charlotte. I’m currently a freelancer working on data governance topics, and right now I’m working for PANORA as a data governance manager in Paris. That’s it.

Speaker 1: 01:56  Well, Charlotte, you wrote an ebook just recently. Tell us about that. I think the audience will probably benefit from it.

Speaker 2: 02:03  Yeah, I had a lot of questions about data governance, and especially about where to start, because when you’ve just gotten the job or are just starting on this topic, it can seem very complicated and very blurry, with a lot of things to do. So I wrote a short ebook to help you start your program, and I tried to share some of my own experience in it.

Speaker 1: 02:34  Perfect. Now, a man who needs no introduction, Jean-Georges Perrin. I’m still, sir, going to ask you for an introduction. So tell us what you do for a living and what’s keeping you busy these days, JGP?

Speaker 3: 02:46  Oh gosh, I don’t even know what I’m doing for a living these days, but what’s keeping me busy, I’m going to announce that very soon, so I can actually give a little bit of a premiere here. I’m the chief innovation officer at a startup company called ABI Data. So that’s the coolest job in the world, right? Because you get to play with a lot of innovation and define future standards and things like that, which ties a little bit to what I’m doing on the side, which is chairing the technical steering committee of a Linux Foundation project called Bitol. Bitol is about creating standards for modern data engineering, like data contracts or data products. Of course, this ties a lot into data governance, because we are all living in this interconnected world, and that’s why I was super happy that you invited me on this panel to discuss this. And I’m writing a book with you, if you forgot, called Implementing Data Mesh. I’m the slow writer of the group, so if we miss the date for the scheduled publication, it’s on me, but my chapters have morphed quite a bit.

Speaker 1: 04:14  Fair enough, JGP. Just real quickly, as for me, my name’s Eric Broda. I run a small boutique consulting company in Toronto, Canada. We focus on data mesh and generative AI. We really focus on financial services and payments firms, although we’ve done work in retail and a variety of others. Our mission really is quite simple: we want to accelerate a firm’s data mesh and generative AI journey. I’ve also worked with international nonprofits in the climate space, where I’m leading a team to build their climate data catalog and a global geospatial data mesh. It’s a lot of words for trying to keep track of a lot of climate data. And as JGP mentioned, I’m having a wonderful time writing a book with my buddy. It’s coming out in, I think, September if the stars align, and ready on the shelf in time for that very important Christmas buying period.

05:16  So anyway, without any further ado, on an administrative note: each of the panel members here is talking on their own behalf and not for their employers or clients, so we get to hear the unvarnished truth. That’s a wonderful thing. Now, for the audience, we encourage you to submit questions on the stream and in chat, and we will try to address some of them at opportune times in the discussion. So let’s get started, and we’re going to start with the most obvious question. I’ll probably start with Charlotte on this one. You’ve written the ebook on data governance. What is data governance, and what are these data governance practices? Just to level set for everybody.

Speaker 2: 05:56  So I like to start with an analogy, because otherwise, when people hear “data governance,” they look at you and they don’t really know what it means, and it seems complicated. So let’s start with an analogy for everyone. For me, data governance is like this: you have a house, and it’s quite messy. You need to renovate your house, either to sell it or to live in it. So what will you do? You will check what’s in the house. You will find all the items; that’s the discovery part of what’s in the house. Then you will think about what you’re going to keep in the house and what you’re going to sell; basically, this is defining the policies for the different items of the house. Then you can create categories for your items, what’s in the kitchen, what’s in the living room; this is basically the metadata, with the different categories. And you can go very far with this analogy. You can ask, which items can I rely on? Which ones are safe? That’s defining some kind of certification on the quality of your items. So yeah, overall, data governance is here to help you live in a space where everything is in order and everyone can live together in a good way in the house.

Speaker 1: 07:44  I love the analogy on the house. It actually resonates very well. I’m going to use that by the way, with some of my clients. But Charlotte, now that we’ve kind of level set on the definition, what typical challenges do big organizations experience with data governance?

Speaker 2: 08:01  So yeah, you’re going to face many challenges, starting with defining your framework: what are the main pillars you want to tackle, and how? Mostly what we find is that it’s about defining roles and responsibilities around data in the organization. You’re going to have another topic about defining the policies and guidelines on how to manage data. You will have the processes around data between the different entities, who’s doing what. And you’re going to have the tech enablers part, which is more about the tooling you want to use to manage data and bring it to the right quality level. So those are the main things you’re going to face, and then each pillar is going to have its own challenges. But overall, I would say this framework is here to serve only one goal: how do you make data a valuable asset for the organization?

Speaker 1: 09:16  Love it. I love it. So JGP, over to you. Now, in my experience, I’ve seen many centralized data governance teams. Is that a common practice? You’ve been at many large enterprises in your past, PayPal included. Do you typically see centralized data governance teams?

Speaker 3: 09:42  In my experience, either they have a data governance team and it’s centralized, or they don’t have a data governance team at all. That’s really what I’ve seen. I think by definition it’s this enterprise block, at the enterprise level of the company. So it’s kind of a shared resource: your data stewards are there, your data curators might be associated with it, and all the leadership is at this level. That’s mostly what I’ve seen.

Speaker 1: 10:21  Charlotte, you’ve been around many, many organizations, talking about data governance and helping them with data governance. JGP has seen centralized; I’ve seen centralized. Is that the common practice in your experience?

Speaker 2: 10:36  Yeah, I mean, usually it starts... so most organizations start with a centralized team, but right now, where I am, it’s a federated type of organization, which is really cool, but really hard as well.

Speaker 1: 10:56  Okay, so that’s the first time I’ve actually heard from somebody who has actually worked in a federated data governance world. But let me come back to the centralized piece, and then I want to talk about, Charlotte, what works better in a federated model. But first, JGP, back to you and your experience with a centralized model. What works well, and what maybe doesn’t work well, in that model?

Speaker 3: 11:27  It really depends on the company. What I’ve seen not working well is a tool-first approach. You buy a big vendor’s tool, I’m not naming one, but some tools from Belgium for example, and you hope that there’s going to be adoption, and that doesn’t work. I’ve not experienced it working. What I’ve seen working is when there’s a lot of education, a lot of explaining to the teams why data governance is good. Most data engineers and data users see data governance as a burden, so you’ve got to change this mindset. When you’ve got a good leader, or a great leader, who can turn this around by showing the benefit and showing that it’s everybody’s responsibility, then in that situation you may have a win. I don’t think it’s sufficient, but you may have a win.

Speaker 1: 12:38  So Charlotte, now, when you think about your federated model, like I said, you’re one of the few folks I’ve heard of actually working in a federated data governance model. Why did they end up in that place? I presume they were centralized at some point, but why did they end up in that federated place? Is it working well, and what are the things in that situation that are not working?

Speaker 2: 12:59  Yeah, so it’s also because, in this example, it’s a big company. You have subsidiaries in many countries; it’s an international company. So if you do the central version, it’s super hard to manage all the countries from a central point, because you’re going to face cultural differences and market specificities that are very local. It’s basically impossible to manage everything centrally unless you’re a superhero, which I’m not. So what they did is: we at the central team are here to define guidelines and a framework that should be common to all countries. But what we are also here for is to onboard and upskill the people who are then working locally in the different markets. So I spend a lot of time hiring profiles for the different markets, onboarding them, and upskilling them so they can take the framework that we built centrally and deploy it locally in each market.

Speaker 1: 14:19  Okay, perfect. So hearing what both of you have said, I’m going to try to dramatically simplify this. There are probably two major capabilities, probably more, but I’ll break it down into two major capabilities for data governance: one is policy definition, and the other is policy enforcement. It’s what I like to call policy and policing: who does the policy work, and who does the policing work? So JGP, let’s start with your centralized model. Who does the policy, and who does the policing?

Speaker 3: 15:00  What I’ve seen is that both are done by the central team. That’s why I believe a lot in the federated model; I’ve never seen it in practice, and that’s where Charlotte is definitely a leader. I think the policy should really be designed together. Actually, I’ve seen that, but it was not a completely federated model where everybody shares responsibility; it worked when the data stewards and the data governance team were having really serious discussions with the teams, as Charlotte was saying. One of the drivers for data governance is regulation. In Europe, where Charlotte is, you’ve got 27 countries and different layers of regulation, like GDPR at the European level, for example. In the US, we’ve got a lot of different regulations as well. When it comes to financial data, for example, what I’ve seen is that you don’t have the same rules in, let’s say, the US and Australia, and if you’re not aligning that, if you’re not involving your local teams, that’s failure. When you’re thinking about healthcare, depending on the population and the type of work you’re doing, you don’t have the same policies either. That’s also where you need to involve different teams to have different policies that adapt to the different use cases. It’s not straightforward, but if you’ve got one central thing, you’re just going to make enemies.

Speaker 1: 16:59  So, Charlotte, obviously, I think you said the policy development is done centrally. How do they handle the policy enforcement, the policing, in your federated model?

Speaker 2: 17:12  Yeah, that’s where it can get hard. So basically, the local teams receive the policy from the central team, and then they need to enforce it locally, but they need resources to do that. One person locally cannot enforce policies in every system in every business area. So the idea is that the central team also helps with the convincing part, where you need to convince the top management locally that they need to recruit resources, which could be data engineers or data quality analysts, to enforce these policies. This is the tricky part: you need to do this exercise like you would present something to a VC to convince them; it’s more or less the same. You need to convince them that they need more resources and that they need to be on board. So that’s also our role as the central team: to make sure that locally they get the sponsorship and the required resources to enforce the policy.

Speaker 1: 18:45  Okay, perfect. Now I think at this point we’ve probably set the stage for how data governance is done today: typically centralized, or maybe what I’ll call federated-light, Charlotte, where you have regional groups actually taking care of some data governance capabilities. And what we found in each of these is that the policy and the policing kind of blend together. Now I want to shift into what this means for data mesh. So let’s dig into data mesh a little bit. We know the principles: data products are the quantum, if you will, in a data mesh, and a data mesh effectively provides the way for these data quantums, these data products, to interact. But every data product has a boundary. Every data product has an owner. It’s typically self-serve, and it has this thing called federated governance, which really means there’s a bunch of stuff, you can read it in Zhamak’s wonderful book, but effectively federated governance says something along the lines that the data product owner has a pretty significant responsibility for governing their data product. So how, in a data mesh world with our data products and data owners, do we apply these data mesh principles to data governance? Specifically, who does the policy stuff and who does the policing? So JGP, let’s start this time with you.

Speaker 3: 20:10  That’s an easy question, sure. So, in my experience, when you build a data mesh and deploy it, you’ve got to set some priorities. What I often say is that the four principles that Zhamak gave need to move along the same way, or in a similar way. You cannot just say, well, I’m not going to do the federated computational governance because it’s too complicated. You’ve got to go with it; you’ve got to bring some of it. In what I would say was our most successful implementation of a data mesh, we brought it with us, but we didn’t finalize it up to what we wanted it to be. But the central team, because you still have a central team even when you’re in a federated topology, you want the central team, the central data governance team, was one of the easiest to convince to be on board with the project, because they could see that they cannot scale.

21:32  The thing is: am I going to have 2,000 data stewards to make sure that everything is in place, or am I going to make sure that this is part of the regular tasks of a data engineer and a data scientist? That way, it became a bit of an easy sell for them to say, hey, we are offsetting this burden from you to the data engineers and the data scientists who were consuming our data products. So who was doing the policy, and who was doing the policing? The policies at this stage were still being defined at the central level. The policing, however, was not completely shifted to the people; it was moving toward what Zhamak describes as policy as code, where the tool is in charge of the policing. So in a way, you design your policies centrally, the policing is done in a federated way at the data quantum level, and the results are reported back to both parties: to the central data governance team and to the owners of the data product slash data quantum. That was where we were going.
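The policy-as-code pattern JGP describes, centrally defined rules evaluated locally at each data quantum with results reported back to both sides, can be sketched in a few lines. Everything below (the `Policy` class, the rule names, the descriptor fields) is invented for illustration; real implementations typically delegate this to a policy engine such as Open Policy Agent.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: the central team declares policies as small,
# testable rules; each data product (the "quantum") evaluates them
# locally and reports the results to both the product owner and the
# central governance team.

@dataclass
class Policy:
    name: str
    check: Callable[[dict], bool]  # takes a data product descriptor

# Defined once, centrally, for the whole mesh.
CENTRAL_POLICIES = [
    Policy("has_owner", lambda dp: bool(dp.get("owner"))),
    Policy("columns_classified",
           lambda dp: all("classification" in c
                          for c in dp.get("columns", []))),
]

def evaluate(data_product: dict) -> dict:
    """Run every central policy against one data product, locally."""
    return {p.name: p.check(data_product) for p in CENTRAL_POLICIES}

# Example descriptor, evaluated at the edge by the owning team.
dp = {
    "owner": "customer-domain-team",
    "columns": [
        {"name": "email", "classification": "PII"},
        {"name": "signup_date", "classification": "public"},
    ],
}
report = evaluate(dp)  # sent to both the owner and central governance
```

The point of the sketch is the division of labor: the rule definitions live in one central place, while evaluation happens wherever the data product lives.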

Speaker 1: 23:08  That absolutely resonates with me. If I were to paraphrase: there’s a role for the central organization because they have a window into the myriad of rules and regulations across the entire business landscape, independent of regions, and that’s their job. They pay attention to GDPR in the EU. They also pay attention to financial regulations in the United States. They have to have that global view and create the policies and standards that are required for the enterprise. But the thing you mentioned is that data mesh has a very strong opinion on how we can police, how we can actually enforce those policies. It says, in line with the principles, that it’s the data product owner’s responsibility, and maybe it’s the data engineers, the data scientists, or even governance specialists on their data product team who actually carry that out. So Charlotte, over to you. Does this resonate with you as a data governance specialist? Is this something, as JGP mentioned, that a centralized governance team would like, because we shift the burden to those who are closest to the actual data, perhaps? What are your thoughts on that?

Speaker 2: 24:26  Yeah, it’s a very good point to make sure that, progressively, the domain teams are more autonomous and can carry this enforcement of policies, for sure. I think at first, in traditional data governance, what we used to do was define data domains. So basically we had data domains, like a product domain or a customer domain, these kinds of things, defined to manage all the product data of the company, all the customer data. Most of the time, these domains focus on master data. But I think these domains are super useful because they then need to serve the different use cases, which are basically the data products. So it’s interesting to have been able to define these kinds of domains as well as the data product domains. I don’t know if I’m being very clear here, but it’s important, because in the data domains you can indeed have some very specific roles, like a data manager who can make sure that, really close to the source of the data, we enforce the right checks and the right policies, while when you are in the data product domain, with the data product owner, you’re a bit further along in the lifecycle of the use case.

26:12  And of course, the data product owner is going to have a role as well, and they must be aware of and upskilled on the goals and on what they need to check regarding data governance. But I think the roles need to be different between the data domains that are close to the sources and the data product domains.

Speaker 1: 26:37  So I want to ask a follow-on question, but I’m going to set the stage here. Now, I may be a little bit of a data governance heretic on this, so my apologies upfront. But when I see the data governance organization structure, I see words like data steward and data manager, and then I see a great word in there, the data owner, but it’s typically somebody in the business, perhaps, who doesn’t have an understanding of the technology landscape. What ends up happening is, from the outside looking in, although I do know a fair amount about data governance, just as an outside observer it sounds complicated. And as an outside observer, but sometimes even as a practitioner inside the four walls working with data governance, it’s hard to figure out where the buck stops, who actually gets to make the decisions.

27:33  Now here’s the question. In data mesh, and in particular with data products in a data mesh, to give it a trite phrase, the data owner is the king or queen: they make the decisions, the buck stops there, they own their domain. Ownership means everything from the roadmap, the strategy, the evolution, the funding, what it does and what it doesn’t, the scope, the boundary. The data owner reigns supreme. So what does that mean? How does a traditional governance organization, where we have a lot of these roles, translate into the data mesh world, where effectively there’s one owner for data governance and one owner for the data product, they’re one and the same, and they may delegate it, but it’s delegated locally within the data product? How does that resonate with you? It strikes me that data mesh is onto something: those principles have let us get to the point where we actually recognize the importance and supremacy, if you will, of that data product owner, including governance. Thoughts?

Speaker 3: 28:45  I think I should take this one.

Speaker 1: 28:47  For it. jj V,

Speaker 3: 28:49  You invited... so, I’m French and American, so I think I can speak about that with some very fitting parallels. First, you invited two French people to your panel; we have a long tradition of not getting along very well with queens and kings.

29:17  So it’s probably not the analogy you want to bring here. But the second thing is, that’s why the American part of me says: look at how the US is structured. Okay, let’s not look at the politics, I don’t go into politics, but let’s look at the structure of the United States. It’s the United States of America. So at one level you’ve got the states, and each state is focusing on its priorities; its policies are really focused on the needs of the state itself and the citizens of the state, or the commonwealth in some cases in the US. The federal level is looking at the entire country and saying, okay, we keep border protection, we keep the army, we keep things like that. And I think that’s where you can see where the value is when you’re thinking in a federated way.

30:36  It doesn’t mean that one is subordinate to the other; they work together. So the state is defining its priorities. Looking at Florida, for example, one of the priorities is that the water is rising and Miami is sinking, and we see some catastrophes, because they have different needs than in Louisiana, where, and you know the climate thing pretty well, Eric, they’ve been hit by hurricane after hurricane after hurricane, or where I am in upstate New York, where we can have bad weather; for example, we had electricity cut a few times this winter because we had storms. It’s not the same need. So the focus of each state when it comes to, let’s say, climate change is going to be different. Same domain, but different ways of tackling it and different priorities. And I think this is a very strong thing, because if you had a guy in DC, for example, saying, okay, I want to allocate some money to climate change and I’m going to do it from DC, I don’t know the consequences in California.

31:52  I don’t know the consequences in Florida, I don’t know the consequences in upstate New York. So you’ve got to work in this federated way to gather the information and make more educated decisions. A concrete example, and Austin was asking a question about a policy, an example of a policy: data retention. In the US, I think financial data has a retention period of seven years. It’s 10 years in Australia. So, same example: if you’ve got a data governance person in charge in the US, they’ll say, oh, okay, it’s going to be seven years worldwide. And then you get into problems when you go local, because you’re not following the local rules. That goes back to the examples Charlotte was giving about local regulation, but it could also vary by business.
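JGP’s retention example boils down to a jurisdiction-aware lookup instead of one global default. The seven- and ten-year figures below are the ones he cites in the discussion; the table structure and function name are illustrative only, not legal guidance.

```python
# Illustrative sketch of jurisdiction-aware retention, using the figures
# cited above (7 years for US financial data, 10 for Australia). The
# numbers and structure are for illustration, not legal guidance.

RETENTION_YEARS = {
    ("financial", "US"): 7,
    ("financial", "AU"): 10,
}

def retention_for(domain: str, jurisdiction: str) -> int:
    """Resolve the locally applicable retention period instead of
    applying one global default decided at headquarters."""
    try:
        return RETENTION_YEARS[(domain, jurisdiction)]
    except KeyError:
        # No silent global fallback: escalate to the local governance
        # team rather than guessing.
        raise ValueError(f"no retention policy for {domain}/{jurisdiction}")
```

Applying the US value worldwide, the failure mode JGP warns about, would amount to hardcoding `7` and skipping the lookup entirely.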

Speaker 1: 32:47  No, the analogy is excellent. So let’s just use the USA as the example, but I think it applies just as well to each and every country we all live in. What you have is a federated group, the government of the USA in this case, that manages and defines policies for global or national concerns. And then the states have policies for local concerns within their boundaries, but they also have to implement the national concerns, whether it’s climate-oriented tax policy or pollution policy, whatever the case may be. Which actually is a fantastic analogy for, I think, how data mesh would view the world. So if I were to generalize completely: the enterprise, again by analogy to the US, has to define the policies that are applicable to everybody, privacy standards, whatever the case may be. The data product, which is the equivalent of the states in JGP’s analogy, is responsible for implementing the global policies and also defining the policies that are unique and specific to its needs. So that, to me, resonates very well. Oh, sorry, JGP.

Speaker 3: 34:07  Just on the analogy: where you’re saying that the state is a data product, I don’t fully agree with that part. The data product could be the same across the 50 states, but the policies that are applied to the data product could make it different. For example, you’ve got a patient, and you’ve got all the records for that patient in your data product; this is going to be the same data product in all the different states. But in California you will have a little layering of CCPA that will change the handling a little bit. Intrinsically, for me, it’s still the same data product.

Speaker 1: 35:00  No, I think that’s a very good distinction. So I don’t know if the analogy starts to break down, but I get the fact that there are some common data products that apply to everybody and should be accessible and used. So I want to talk a little bit about the actual responsibilities of this data product owner, whether they’re the federated version or the state version, if you will. So, Charlotte, data mesh emphasizes, among other things, bringing agile practices to data. But importantly, data mesh emphasizes local autonomy, the local autonomy of the data product and its own data product owner. So when we look at governance, what responsibilities does the data product owner actually have?

Speaker 2: 35:54  Yeah, so, in a federated model, again, you can imagine that the different data product owners have specific committees and moments to discuss the common policies proposed by the central team. They need to make sure they’re aligned with them, validate them, and then guarantee that these policies are actually respected within their data product. To me, that’s their number one role. Also, what I see in this role is some kind of coaching role as well, in the sense that this person must be driving adoption around data governance, around why we need to take care of data. It’s important, when you enter things manually, not to mess it up. So I think it’s also their role to make sure that, within their team, everyone is on board, aware, and adopts the data governance mindset.

Speaker 1: 37:39  Okay, perfect. So we have some questions from the audience; I think this is a perfect time to interject. I’ll start with one that came from Paul, and this one I’m going to put to JGP, for obvious reasons, with the Bitol project and the Open Data Contract Standard. Data contracts obviously play a role in data governance, but for the audience, tell us how they overlap and how they mix and match.

Speaker 3: 38:12  So obviously, we can do data governance without data contracts; that’s a given. What value do data contracts add? Data contracts have a wealth of information in them. Obviously, people think, oh, there’s a schema, but there are also all the SLAs and all the definitions. One of the things we have in ODCS, the Open Data Contract Standard, is what we call authoritative links, which link fields or tables or any entity to their definition in an external repository, like Collibra for example, or GitHub. So you’ve got this richness of information that can be used, for example, to define lineage. When you’ve got all this information in one place, if you consider it your source of metadata, your source of truth for metadata, then you can leverage it and port it into your data governance as an asset. For me, the responsibility of the data product owner is to keep the data contract up to date. It is one feed of information that goes toward your data governance, or your enterprise data governance. So it is kind of the vehicle; I would say it’s almost the preferred vehicle for sharing data governance information within the enterprise.
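To make the description above concrete, here is a toy data contract carrying a schema, an SLA, and authoritative links, together with a simple conformance check. The field names, the team name, and the URL are invented for this sketch; the real ODCS defines its own, much richer YAML structure, so treat this only as an illustration of the idea.

```python
# Illustrative only: a toy data contract with a schema, an SLA, and
# authoritative links. Field names and the URL are hypothetical; the
# real Open Data Contract Standard (ODCS) is far richer.

contract = {
    "dataset": "customer_orders",          # hypothetical dataset name
    "owner": "orders-domain-team",
    "schema": {"order_id": str, "amount": float},
    "sla": {"freshness_hours": 24},
    "authoritative_links": [
        # Links a field to its definition in an external repository.
        {"field": "order_id",
         "definition": "https://example.com/glossary/order-id"},
    ],
}

def conforms(record: dict, contract: dict) -> bool:
    """True when every field declared in the contract's schema is
    present in the record with the declared type."""
    return all(
        field in record and isinstance(record[field], ftype)
        for field, ftype in contract["schema"].items()
    )

ok = conforms({"order_id": "A-1", "amount": 19.99}, contract)
missing = conforms({"order_id": "A-2"}, contract)
```

Because the contract bundles schema, quality expectations, and pointers to authoritative definitions in one document, it can serve as the single metadata feed into enterprise governance that JGP describes.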

Speaker 1: 39:57  Perfect, I love it. I’m going to come back to data contracts in a minute, but I want to take a question from Robert Pattini, hopefully I got the name right; my apologies if I messed it up. So this is the question from Robert: apart from regulation, from your experience, how do you sell data governance? And in particular, how do you sell data mesh governance? Charlotte, tell us your perspective, and then JGP.

Speaker 2: 40:20  Very good one. So yeah, you need to find the drivers, and you’re right, a lot of the time it’s about regulation and compliance, because it’s an easy way to scare top management when you say, okay, you risk paying millions if you don’t comply. So that’s an easy one, for sure. But apart from that, you can find stories about AI or data analytics projects that went really badly because of bad data quality, for example. It can be a failure, or people just spending a huge amount of time on data cleaning because nothing is going well, from data collection to data preparation. That’s another way. And then I would say maybe this one is more about efficiency: you want to be more operationally efficient in delivering projects. But another driver you could use is business decisions, meaning you could collect the needs they might have to make better decisions; they might need to gather an external source of data, or something that would be super useful for them in a dashboard to make decisions.

42:06  And that could be also a good driver for data governance because you’re going to have the business supporting it from day one.

Speaker 1: 42:14  Yeah, I like that answer. So to paraphrase, you can do data governance and sell it after the fact, after you’ve been in the news for some data security gap, which typically comes with a multi-billion dollar market cap loss and typically the resignation of several executives. So that’s one way, not the right way, perhaps. The other way is we can be proactive. One thing I heard is we sell it based on the additional quality, to JGP’s point around contracts. And as we all know, if you’re going anywhere near generative AI, data quality is the precursor to getting good outputs from your large language models. But I would say the simple truth of the matter is, if you’re proactive and inject data quality into the process, your BI reports that communicate your sales hold up. I can’t remember exactly, but I think it was one of the big ride-hailing apps that stated their earnings incorrectly; they were off by a half billion dollars, I think, which caused quite an uptick in their market cap until the CEO came out and corrected it. So there are ways to inject data quality into the process, all the way through your BI capability, all the way through your reporting, et cetera. I would suggest that with the data mesh approach, making it part of the accountability of the data product owner, we could probably do it for significantly less cost. Charlotte, am I dreaming on that, or do you think there’s an opportunity with the data product alignment in data mesh to actually reduce the cost of data governance and increase its efficacy?

Speaker 2: 44:07  Oh, you mean the data products and the data mesh approach?

Speaker 1: 44:11  By embedding the so-called stewards, the managers, and the owner, as an empowered owner with a clear boundary, et cetera, into the data product. My proposition is that data governance can become much more effective and efficient. Am I dreaming?

Speaker 2: 44:27  Yeah, it reminds me, I think it was a few days ago, last week I did a post on LinkedIn saying that data governance teams will disappear.

Speaker 1: 44:41  I can guess how that went over. Tell us what happened.

Speaker 2: 44:46  There was kind of a huge debate in the comments, but it was interesting, actually, because what I was saying is, if we do our job, the different teams will be completely autonomous. There will be some committees where they can gather and make sure they’re still aligned on the common policies and standards, and then it should work, no? But lots of people were disagreeing and saying that we will still need a central team, some kind of police, to perform audits and make sure that everything is respected. But to me at least, when you think about the federated model with a central team, it should indeed decrease costs, because you don’t need this huge central team to manage everything. You give the skills to the people within each team and each domain. So yeah, it should decrease the cost, but I haven’t calculated it for my current customer.

Speaker 1: 46:12  I want to come back to a question from Austin Kranz. There are two questions, but I want to focus on one particular aspect: the second one was that many of these so-called processes should be automated, and that they require behavioral change. So that’s the question I want to answer, but I want to hear the answer from you. I’m going to set the stage, though, for an approach that resonates very well for me around this automation aspect, and I want to give you an example of a governance process that works extremely well, one that every single one of us, just as a result of our daily lives, probably feels the benefits of. And it’s very simple. In the US it’s called ANSI; it used to be the American Standards Association, now it’s the American National Standards Institute, I think. In Canada we have the Canadian Standards Association, and in Europe, there’s the European Committee for Standardization.

47:14  One thing they do is manage a process by which you get a product certified. So when you look at your toaster, in Canada anyway, I don’t know about the US and the EU, there’s a little mark that says CSA, and if it says CSA, it means it’s gone through rigorous product testing, and for a toaster, anyway, you’re guaranteed that it’s not going to burn your house down when you’re making some toast. What they do is turn governance on its head, at least centralized governance, and adopt an approach that would probably work very well for data products. For any given product, you request certification. There’s a central group that says these are the relevant regulatory bodies, and here are the test criteria you need to go through to acquire certification.

48:09  But once you do, which is your entire obligation, and you prove that you’re certified, you get to bear that brand, and immediately the consumers of that product, the toaster, have confidence that it’s going to work. In other words, it engenders trust. And ANSI is a very lean organization. The EU organization is similarly lean, as is the Canadian Standards Association, extremely lean. There are no product stewards out there. Everything goes back to the product owner, the company that owns the product. So here’s the question, with respect to Austin’s automation point: if everything is within the scope and power of the data product owner, and we have data contracts, should we not be able to have an automated mechanism for checking the data against the data contracts and actually publishing the certification status of a data product? In other words, the policing goes away, and now we’re doing certification. If you think about it, there are no police except when somebody violates the law. There’s no police aspect to the EU organization, the ANSI organization, or the Canadian Standards Association. It is a certification exercise. Let’s start with JGP. Does that resonate with you? Can that be automated, and if so, would it make a difference?
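[Editor’s note: the automated contract-checking mechanism described above could be sketched, purely illustratively, as below. The contract fields, dataset names, and check logic are all invented for the example; a real pipeline would run such checks in CI or on a schedule.]

```python
# Sketch: automated "certification" of a data product by checking its
# records against the schema declared in its data contract. If the checks
# pass, the product can publish a certified status instead of being policed.

def certify(records, contract):
    """Return (certified, issues) after checking records against the contract."""
    issues = []
    required = [f["name"] for f in contract["schema"] if f.get("required")]
    for i, row in enumerate(records):
        for field in required:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing required field '{field}'")
    return (len(issues) == 0, issues)

contract = {"schema": [{"name": "order_id", "required": True},
                       {"name": "order_total", "required": True}]}
good = [{"order_id": "A1", "order_total": 10.0}]
bad = [{"order_id": "A2"}]  # missing order_total

print(certify(good, contract))  # passes: eligible for certification
print(certify(bad, contract))   # fails: issues listed, certification withheld
```

The design point mirrors the toaster analogy: the central body only defines the test criteria; running the checks and earning the mark stays with the product owner.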

Speaker 3: 49:36  I think your vision is a good one. So, for example, you’re talking about CSA, or UL for electrical things here in the US. A lot of these things are declaration based: a product follows the statements, the requirements, that the organization has dictated, but those are the minimum sets of requirements. Like in a power strip, you should not connect these two things, or the wires should be separated, or they should be grounded, or whatever. We as an industry have set those minimum standards, right? So is it the future role of an enterprise data governance team to set these minimum policies, in a way? Maybe. Is it something that governments are doing? Yes, when you look at GDPR, CCPA, HIPAA, and others. What these organizations do today is publish these things in pretty boring English documents, but are we going to be able to transform these HIPAA policies into policies as code, and just take them and incorporate them directly into your data product, or as part of your data contract? Your data contract could reference these policies as code. That would be kind of the ideal. It would be great, to be honest. But we are not there yet.
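[Editor’s note: the “policies as code” idea JGP describes might look, very roughly, like the sketch below. The policy logic is invented for illustration and is not actual HIPAA text; the names are hypothetical.]

```python
# Sketch of "policies as code": a regulation-inspired rule expressed as a
# checkable function, which a data contract references by name. The rule
# shown is a made-up stand-in, NOT a real regulatory requirement.

def no_phi_leaves_region(contract):
    """If the product contains PHI, it must declare an allowed storage region."""
    if contract.get("contains_phi"):
        return contract.get("storage_region") in {"us-east", "us-west"}
    return True

POLICIES = {"no_phi_leaves_region": no_phi_leaves_region}

contract = {
    "dataset": "patient_visits",
    "contains_phi": True,
    "storage_region": "eu-central",        # violates the rule above
    "policies": ["no_phi_leaves_region"],  # policy referenced by name
}

violations = [name for name in contract["policies"]
              if not POLICIES[name](contract)]
print(violations)
```

The interesting property is the one JGP points at: once a regulation is executable, the contract can reference it directly, and checking compliance becomes part of the same automated certification step.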

Speaker 1: 51:32  No, absolutely. Charlotte, I know you have to run, so feel free to break away, but any thoughts on what JGP mentioned, or on this idea, before you go? I know you have to leave a little early.

Speaker 2: 51:42  No, I agree with your point of view, and I think within an organization you could figure out a way to certify your data products. I recommend that you check out the Midas certification at Airbnb, a certification they have put in place around standards on data quality and data stewardship for each of the products available in their data portal, which is the discovery tool for all their data sets and data products. I think it’s super interesting. I guess it was probably the engineering team who built this certification process, and now people want to get their data sets certified, because when you do a search on the data portal, it shows a score and a certification level, and so you really want your product to be certified. So I thought what they did was really interesting.

Speaker 1: 53:05  Awesome. Charlotte, I know you have to leave, so I wish you well. JGP, I want to continue the vision a little bit, so I’m going to use the states of the USA analogy again. If you have a policy that you wish to maintain within the state, maintain it within the data product. In other words, if you don’t need to share your data product with anybody, you can make whatever rules and regulations you want. But if you want to share, and this is extending the discussion we just had a moment ago, if I want to actually share my data and make it reusable, then and only then do I need to go through certification. In other words, if you don’t want to share your data, go lightweight, go whatever way you want. But if you want to share your data, that’s where the certification comes in. So, best of both worlds: if you need to share data, there should be an explicit level of trust and confidence in that data. Thoughts before we close?

Speaker 3: 54:04  No, I think one thing we need to keep in mind as well is that we are in an iterative process. The whole idea of data mesh is to bring data products and product thinking to data engineering. I think that’s one of the big factors, maybe eventually the biggest factor. So when you’re thinking about that, you say, hey, I can start dirty. I just want to throw a few things together, and then I’ve got my data product. And then later, oh, I realize this is actually using healthcare data that is leaving the country, data that happens to contain PII or PHI information. Then I can add those policies on top of it, and maybe at some point reach a certification that is either government based or internal, enterprise governance based. But I am with you, Eric. The only thing I would add is: think about this iterative process. You can start dirty, and you can have a version one that is the MVP of your data product. Then you’ve got your version two, and your version five is the certified one, the state-of-the-art, flashy one with all the champagne that goes with it.

55:30  But yeah, I think you’re pretty visionary in a lot of ways and I think this is a visionary version of it. Yeah, definitely. Perfect.

Speaker 1: 55:39  And by the way, you’ll see some of that in the book, a little quick plug. So we are at time; we have one minute for me to wrap up. A few things: we did not get to all the questions from the audience. Audience, please go to our Slack channel, the Data Mesh Learning community Slack, and put your questions there. I will answer them, the Data Mesh Learning MVPs will find a way to as well, and there’s also an 8,000-member community that can answer your questions. So please do that. First off, I want to say thank you to the audience. I’m hoping that you got a bit of understanding of what traditional data governance is, and you saw that the data mesh principles are not just words on a page. They’re actually implementable. When you combine them with things like data contracts or a certification perspective, all of a sudden magic can happen.

56:37  I think you can start to see that, as JGP mentioned, we bring agile to data engineering and make things faster, better, cheaper, if you will. And data governance is one of the areas where we can have a distinct impact by applying our principles in the way we discussed, around the whole idea of data products and the capabilities embedded within them; there’s a huge opportunity for data governance. I’m hoping that’s what you got out of this discussion. And once again, thank you very much for your time and participation. And with that, we’ll call it a day. Bye now.

Speaker 3: 57:13  Don’t forget, you can listen to more of Eric and myself. We have a fireside chat with AK next week for AI on Pi Day, and Eric has two great talks at AI on Pi Day, 24 hours of AI and data, organized by a nonprofit I’m on the board of. So yeah, if you’re not fed up with us, see you next week.

Speaker 1: 57:37  Bye now.

Data Mesh Learning Community Resources


Ways to Participate

Check out our Meetup page to catch an upcoming event. Let us know if you’re interested in sharing a case study or use case with the community.