
Engineering a Data Mesh Platform for an Ecosystem Across Organisations

Tom De Wolf is an experienced hands-on architect and serves as the innovation lead, spearheading new innovative initiatives for future ACA Group projects and offerings. His expertise lies in cloud native platform engineering and ‘Data Mesh’ platform architectures, leveraging his background in software engineering and experience in designing various architectures, including software, microservices, data platforms, and IoT solutions. Tom draws inspiration from his knowledge of evolutionary architectures, domain-driven design, team topologies, platform thinking, platform engineering, and cloud-native technologies, among others. In addition, Tom is the founder and host of the Data Mesh Belgium meetups. His academic background as a Ph.D. holder in Computer Science has taken him around the globe, where he has presented at numerous international conferences.

ACA Group has implemented a Data Mesh Platform within the Flemish cultural sector in collaboration with the Flemish Department of Culture, Youth and Media (CJM) and publiq. Instead of supporting a single organization, the challenge here was to support a whole ecosystem of teams across many organisations within the cultural sector. In this talk, we explain why a Data Mesh solution was fit for purpose for enabling a data-driven digital transformation in the sector. As an example, we dive into some meshes of data products and data use cases. On the technical side, a cloud-native platform engineering architecture was envisioned, designed, implemented and evangelised. After the talk, the audience will have insight into this real-world data mesh case and the architecture behind it. This Data Mesh Platform received the Flanders Digital Award 2023 for Best Digital Transformation Project.

Speakers:

  • Tom De Wolf – Senior Architect and Innovation Lead, ACA Group

Watch the Replay

Read the Transcript

Speaker 1: 00:00  Thanks everybody and welcome to this online webinar. I'm indeed going to talk about how we at ACA Group engineered a data mesh platform not only for one organization but for a whole ecosystem across organizations, more specifically how we enable data products in the Flemish cultural sector. And this was done together with the government, more specifically the Department of Culture, Youth and Media of Flanders, and with one of the major players within that sector called publiq. I'm Tom De Wolf, architect and innovation lead at ACA Group, specializing in data mesh expertise and doing data mesh consulting and projects for a number of teams and sectors. First of all, I want to answer the question why a data mesh is fit for purpose in the cultural sector, and the main goal of the government is actually to enable a data-driven digital transformation of that cultural sector.

01:08  But there's already a lot of data that is scattered everywhere but actually difficult to access and combine. There are many organizations, each of them having a number of source systems where the data is. There are also various types of data: on cultural activities, on cultural infrastructure such as theaters, on the performers that act in those theater performances, ticketing information, or archiving information from museum collections or libraries and so on. And at the same time there's limited knowledge and tooling available to do data sharing, reuse of that data and analysis of that data. So that's a missed opportunity, and that's where data mesh enters: to provide those tools and to enable and stimulate that sharing of data as a product, access to it, and reuse of the data for new and invigorating use cases. Now why data mesh then? Well, when you talk about the whole sector, then there is really a need for decentralization.

02:20  The picture you see here on the screen is actually a data product landscape, well actually an IT landscape, that we did in the beginning of the project, where we sat down with a number of stakeholders within the sector and for each of them identified the source systems they currently have and went through them to also identify what potential data products would be. On this picture, the source systems are depicted as boxes and the data products as hexagons, and the colors you see here define different domains and different organizations, so actually different ownership of those applications and data products. What you can conclude from this is that there is a scalability challenge all over this: you have the scale of data sources that are available, the scale of data use cases, which are all a bit stuck because access to data is difficult, and there is certainly a scale of different organizations that can provide the data or want to use that data.

03:28  And what we do want to avoid is to put some central data team into the sector and ask them to do all the data work, because that will certainly end up with that typical bottleneck we want to avoid. And what we also want to avoid is that every organization needs to build up their own data infrastructure, because that would skyrocket the total cost of ownership within the sector, which we want to keep lower by having a shared data mesh platform. All this to foster an open ecosystem across these organizations and stimulate the data exchange and innovation on that data, to end up with a more data-driven cultural sector. To explain this a bit more specifically for the cultural sector, I want to show you a video that was created by publiq, the partner in this project, and that is going to explain in a fun way what the need for a data exchange platform is.

Speaker 2: 04:33  It's possible to look at the cultural sector as one big family, a family that loves beautiful, interesting, life-enriching experiences and loves the audience. The audience is the cake at family gatherings. But what happens if there's not enough cake to go around? Everyone focuses on getting the biggest piece possible. There are other approaches: we could put our energy into making the entire cake bigger so there will be enough for everyone. This sounds simple. It is. The cultural sector has been building an exchange platform. This platform will help us all increase the total participation at our cultural events, but it will only work if we all share our data. An example: audience insight. Every cultural center knows its viewing public, but why does this viewing public attend events, and who are the people that don't come? Why do they stay away? By sharing data, we will have much broader information and clearer answers.

05:35  For example, we will be able to see which segments of the public are not being served in our regions, where our greatest chances for growth are, and we will then be able to benchmark, you name it. And so you get real insights into how to adapt your program or your marketing to grow and make the cake a little bit bigger. Another example: interaction with the public. Right now the cultural sector is split into silos. Every cultural center reaches its own clients through its own channels. The digital world is moving in the opposite direction. Providers are coming together on a giant platform where their clients know they will find something they like. It's just like where we go to book flights and hotels. If we share our data, we could make a hundred percent user-friendly one-stop shop for the cultural sector, one that knows what you like and where you will be able to book your tickets directly, so that you more frequently choose not to stay at home watching Netflix. This is how we make our cake bigger.

06:34  Final example: simplified administration. We now spend hours on end filling in one report for one department and a different report for the next, following the stipulations of this department and then the different stipulations of that one. If we all share our data, it would be possible to automate a lot of these tasks. The result: all this time would become free to work on the real work, the cultural sector would become more attractive, and our cake would once more grow. Sharing data on an exchange platform means reaching a bigger audience, having a larger cake to share, and that means a happier cultural family.

Speaker 1: 07:17  So I think with that and what I told you previously, it's clear why a data exchange platform and a data mesh approach are fit for purpose in this sector. I also want to dive a bit into the added value on a business level for the cultural sector by showing you some example use cases, or ideas for example use cases. So we're talking about an exchange platform; 'uitwisselingsplatform' is the Dutch translation of exchange platform. It's an exchange platform with all those data products within it, and there are different types of organizations that can share and reuse that data. We have the producers of cultural activities, or the organizers, and also the venue operators that want to reuse and share data themselves and want to benefit from that data exchange by having new and better cultural experiences. You have the service providers, such as for example ticketing vendors, that have data that's interesting to analyze.

08:25  Talking about analysis, there are the research and education institutions that are really searching for data on cultural participation to do their research, and of course there are the governments that want to align their policy based on the data that's available, so that they can better optimize policy towards what the cultural sector needs. And in general there are those data product developers that want to share a data product and the data product consumers that want to reuse it, and all of this forms some use case pillars that were also mentioned in the video: one, to improve the insights into who the audience is; another, to improve the interaction with the audience; to simplify the administration; and also to stimulate new and innovative services for that cultural sector.

09:21  For audience interactions, it's about making data more available within the flow of finding cultural activities. For example, if ticket availability based on ticketing systems can be made more available and shared, that would add value. The same goes for working with recommendations based on preferences and previously visited events; those suggestions can pop up in the already existing platforms, but they can also be used for marketing automation. Another example is where policy makers need an overview of and insights into where all the different cultural infrastructures are located: where are the theaters, where are the museums, the libraries, the camping sites for institutions and so on. They can use that to find, for example, gaps in certain areas and to align their policy, to maybe invest a bit more in those areas of the country. Another one is a tool for audience insights. For example, if a museum had insights into where its visitors are coming from, then it's also clear that some areas might be underrepresented and are an unserved audience for that museum, so they can target their marketing towards those areas, but also have information on what type of visitors are coming and maybe segment those visitors into categories so that you can detect what the promising segments are for new types of events and so on.

11:07  So all of these are examples of potential use cases that add value for the cultural sector, and as I said in the beginning, we started by creating that data product landscape, and that really proved to be an inspiring artifact throughout the whole project. So I would recommend doing that for all new data mesh journeys, to make it very concrete for the business what that means in terms of data products and use cases. And needless to say, we detected that this is only the tip of the iceberg of potential data products and use cases: there are already around 50 data products on this map, but that is only from the few organizations that we talked to. So there's still more to come. That's about the why and the added value for the cultural sector. In the next two sections I want to dive into the user experience on this platform, and I want to do that from two personas.

12:12  One is the data product developer that wants a self-service experience for sharing data products. And the other one is the data product consumer that wants a self-service experience for reusing data products. So when we have a developer, what are the steps in this developer experience? First of all, the organization that he or she is part of, and the domain it represents, need to be onboarded, and then the platform takes over to bootstrap some kind of ownership perimeter with security measures on top of it. Then the developer for that organization can be onboarded, which again makes the platform automate the user creation and permissions that are needed on the different components of the platform. Then a thorough analysis and design of a data product can happen, after which the platform enables you to register that data product, so that the automation of that platform, the self-serve tooling, will automatically generate a code repository for the data product developer to start developing his data product transformation logic in.

13:19  When that's done, the building, deployment and also provisioning of the different data technologies that are needed for a data product are handled by the platform, also enabling discovery in the catalog by provisioning the metadata towards that catalog. And finally, there is orchestrating, operating and governing the whole lifecycle of that data product, so that it's executed when it needs to be executed and so on, and the ability for that data product developer to observe what's happening, to monitor the data products that he owns, and to evolve data products with new versions to maintain them and make them better. In the next screens I'm going to go through most of these steps by showing you screenshots from the actual user experience so that you get an idea of what that means. First of all, onboarding of a company and its users is done of course through a landing page, and login is enabled by the identity provider infrastructure of the government itself, which provides this for every citizen of Flanders, and there's the necessary documentation that focuses on sharing data. Most of the screens that you'll see are in Dutch because the target audience is the Flemish cultural sector, but I'll try to explain as well as possible what it means and what it is that you see on screen.

15:00  We talked about analysis and design of data products. For that, it's not the platform that's doing this; you use your regular artifacts, like for example the data product canvas, to analyze what the input ports are for a certain data product. So this is an example of a data product about museum visits, which takes its input from the source system, and we identified that it needed two types of output ports: one in the format of linked data, and another one in the format of queryable tables in an analytical data store, ClickHouse in this case. Also in the design, a breakdown of the different concepts is done, and there are indications of who will be the consumer or which use case will benefit from it, in this case an audience finder use case to have those audience insights for museums. When that analysis is done, you can go to the platform as a developer and start creating a data product.

16:07  So you have a small form that you can fill in to enter the name of the data product, a description to state what the data in that data product is about, select the domain that will own that data product, and select in which programming language you are going to write your transformation logic. A second step allows you to enter some more details, mostly about the ownership information and who to contact for business questions or technical questions on this data product. And then the developer can click the register button, which will make the platform go and scaffold and generate a Git code repository for that data product, in which you can see a number of main files and the transformer directory. One important file is the data product YAML, which is, when I open one, the specification of a data product. It sort of also includes the data contract information, but it has that general descriptive information that was already entered in the registration form.

17:27  But the developer can then edit this and, for example, indicate when the transformation or the data product should be scheduled using a cron schedule, indicate if it needs a large memory allocation or rather a small one, indicate what the input ports are, in this case an example of a connection to an Elasticsearch source system, and what output port is needed, or whether multiple output ports are needed, for that data product, in this case a linked data output port for this activities data product. Within the transformer directory there will be the project with the code in it, and I'll show you later how that looks, well, even now. So then the developer can start using their regular IDE to develop the transformation logic of that data product. And as I said, in the transformer directory you have in this case a regular Java project in which the transformation can be written.
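
To make that specification concrete, here is a minimal sketch of what such a data product YAML could look like; the field names, values and structure are hypothetical and only illustrate the kind of information described above (cron schedule, memory sizing, input and output ports), not the platform's actual schema.

```yaml
# Hypothetical data product specification; names and fields are illustrative only.
name: cultural-activities
description: Cultural activities enriched and published as reusable data
domain: events
owner:
  business: activities-team@example.org
  technical: dev-team@example.org
schedule: "0 4 * * *"        # cron expression: run the transformation daily at 04:00
resources:
  memory: small              # small or large memory allocation for the transformation job
inputPorts:
  - name: activities-source
    type: elasticsearch      # connection to a source system
outputPorts:
  - name: activities-linked-data
    type: linked-data        # published through the platform's triple store
```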

18:36  But we also support Python; this is an example, and you have a default, quite common Python project in there. And in this example you also see, for example, that in the data product specification there is information like column definitions that can be added, which then reflects the data contract that you're going to adhere to on the output ports. That way the data product developer can run and debug the data product. To enable that fast local development cycle, we also developed a data product CLI, and that data product CLI allows you to run the data products locally without having too much infrastructure on your laptop, but still, for example, enabling you to easily connect to upstream data products. So if you're developing a data product that is consuming data from another one, then you can, from your local development environment, connect to the output ports of that data product so that you can actually test your own transformation of the data, and do this in a secure way: the CLI also allows you to store the credentials you need for that in a secure way.
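
As an illustration of the kind of transformation logic and data contract checking described here, the following is a generic Python sketch, not the platform's actual SDK; the column names and the contract dictionary are made up for the example.

```python
# Generic sketch of a data product transformation with a declared data contract;
# column names and types are hypothetical, and the platform wiring is omitted.
import pandas as pd

# Columns promised on the analytical output port (the "data contract").
CONTRACT_COLUMNS = {"visit_date": "datetime64[ns]", "museum_id": "string", "visitor_count": "int64"}

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw ticket scans into daily visit counts per museum."""
    out = (
        raw.assign(visit_date=pd.to_datetime(raw["scanned_at"]).dt.normalize())
           .groupby(["visit_date", "museum_id"], as_index=False)
           .size()
           .rename(columns={"size": "visitor_count"})
    )
    # Fail fast if the output would violate the declared contract.
    missing = set(CONTRACT_COLUMNS) - set(out.columns)
    if missing:
        raise ValueError(f"output violates the data contract, missing columns: {missing}")
    return out.astype(CONTRACT_COLUMNS)[list(CONTRACT_COLUMNS)]
```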

19:55  When that's done, we're talking of course about building and deploying the data product. Building is happening automatically, so it's GitOps style: if a change is committed to the Git repository, then behind the scenes the build pipelines will activate and build a new version of that data product. And then for deploying, you can go to this developer portal to select the version you want to deploy, provide the secrets you need, for example to connect to source systems, and then deploy that version, which will make sure that the platform provisions everything that's needed for that data product when the data product is executed. Of course, there are also logs and information about what's happening during a run of that data product, and for that you also have access to the transformation logs of each data product.

20:59  When you do this multiple times, you of course, as an owner of data products or a company that owns data products, have a complete list of all the data products that you own, with status information. Some might be in error, some not yet deployed, others are perfectly fine. You also get information on when it was last updated successfully, but also, if you click through to the details, when it last failed and with which message, and access to the history of this information, including for example which version is deployed. So that's a bit of the experience for the data product developer: you can easily create new data products, develop the transformation logic, and have a single entry point towards the platform for a self-service experience and to easily evolve your data products. When we talk about the data product consumer, we're talking about a self-service experience for reusing data products.

22:09  And that of course also starts with the ability to onboard the organization and the consumer. But after that, it's important to have the data product catalog available to discover the data products, to understand a data product based on the metadata, to determine whether you trust it based on the lineage of that data product, and if that's okay, to request the credentials to access the data product, which the platform can then generate and provide. When you have those, you can start exploring the data within that data product, and if that all seems fit for purpose for reusing that data in your use case, you can start connecting the systems that you want to reuse it in and consume the data product through its output ports. So again, there is a landing page and documentation, no longer talking about producing data products but about consuming data products, giving information about how to authenticate and which formats of data are available. The first entry point is of course the data catalog. For that we're using DataHub, which has the data product as a first-class entity within its catalog. So you can search these data products, have a view of the domains that are available within the data mesh, and for each domain see which data products are in there and who the owners of those data products are.

23:48  Of course, then you want to understand a data product. This is the full list as of the last time I looked: we were at about 23 data products currently, and still growing, so that's nice. When you go into one of those data products, you will see the metadata that was entered by the developer through the registration form, and eventually in the data product specification that's within the code repository, and you will see all that meta information also represented within the catalog. So the source of truth of the metadata for us is that YAML file with the data product specification within the repository that's owned by the developer owning the data product.

24:34  Then you can drill down to the output ports. In this case there's one type, linked data, available, and for an output port you can also find out what the URL or address is on which you can consume the data of that data product. Trusting data products: as I said, for that the lineage is important. So this is a view of the lineage at the data product level, where you can see what the upstream data products are of the data that you want to reuse. So that's also available. And for exploring data, our platform uses JupyterHub to provide a Jupyter Notebook experience, which makes it easy to connect to the output ports of those data products and start doing your first analysis of the data, and to explore how the data looks and whether it's suitable. And you can easily use more graph-oriented libraries to start exploring the linked data formats that we have in our platform, so that you can also do things with knowledge graphs and the like.
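
As a sketch of what such a first exploration from a notebook might look like, the snippet below queries a linked data output port over SPARQL; the endpoint URL and credentials are placeholders, and the platform's exact access mechanism may differ.

```python
# Exploratory query against a (hypothetical) SPARQL endpoint of a linked data output port.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = "https://example.org/data-products/activities/sparql"  # placeholder output port address
sparql = SPARQLWrapper(endpoint)
sparql.setCredentials("consumer-user", "consumer-secret")  # credentials issued by the platform
sparql.setReturnFormat(JSON)
sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")  # peek at the first few triples

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```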

25:50  And when that's all okay, there is connecting and consuming. It means knowing the endpoint to connect to and the credentials to use to do that securely. The last thing to say about that is that our platform actually supports, at this moment, four types of output ports that are each targeted at specific use cases, to provide native access for that use case. The first one is for analytical data, so queryable tables, which is based on ClickHouse running in the Kubernetes environment. For more unstructured data, we're using object storage in Amazon S3. Linked data is also an important output port: the Flemish government is focusing a lot on defining data standards based on linked data and on semantic web technologies, so also for our data mesh platform one type of output port is linked data, and we use triple stores, more specifically Apache Jena Fuseki, as the backing technology for that type of output port. And the last one, added recently, supports geospatial data using the appropriate standards, and GeoServer is the technology supporting that. That one, for example, enables those map-like views where you have the geographic distribution of those cultural infrastructures, or other types of visualizations on maps.
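
To make the analytical output port type tangible, here is a hedged sketch of how a consumer could query such a ClickHouse-backed port once the platform has issued credentials; the host, table and column names are invented for illustration.

```python
# Hypothetical consumption of the analytical (queryable tables) output port via ClickHouse.
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="dataproducts.example.org",  # placeholder address taken from the catalog
    username="consumer-user",
    password="consumer-secret",
    secure=True,
)
result = client.query(
    "SELECT museum_id, sum(visitor_count) AS visits "
    "FROM museum_visits GROUP BY museum_id ORDER BY visits DESC LIMIT 5"
)
for museum_id, visits in result.result_rows:
    print(museum_id, visits)
```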

27:36  This provides a nice bridge towards the platform architecture. For that, we really took a thorough platform engineering approach towards data mesh, where on the one hand you want to support the whole lifecycle of a data product. This is a compacted version of the two developer and consumer lifecycles I've shown earlier. We've built that data product management application, or the portal that you saw in the screenshots, which enables you to register new data products. The orchestration of the platform is done completely within Kubernetes; I'll come back to that a bit later. We use Git as our source code solution, and development can be done in Java, in Python with the CLI, but also in a more web-oriented language. The reason that was added is that we noticed that in the cultural sector there are a lot of web developers present, so having a language that's more aligned with their expertise is certainly a plus. Building and deploying is done using cloud-native technologies like Cloud Native Buildpacks, Tekton and Flux; I'm not going to explain each of them, just name-dropping a few here. And then the automation of provisioning everything is also done by using Kubernetes operators. Again, the application that we created acts as a single pane of glass, also enabling monitoring. DataHub, as I told you, is the catalog, Jupyter notebooks are for exploring, and then other tools can connect to these output ports to actually build reports, train AI models, or serve other use cases.

29:35  What we did in this project is to bring the abstraction of a data product alive. And for that, it's important to define what we mean by a data product. Here we're not talking about a dashboard, a report or an AI model as a data product. We're really talking about an architectural component that's tasked with making data in itself reusable in multiple ways. So it's an independently deployable component that has high functional cohesion: you have a data product, for example, about cultural activities and another one about the tickets that are sold. And that data product includes all the structural elements for it to function properly. So we have the input ports that can be defined, the output ports, the transformation logic, and the metadata, which can be made discoverable through the discovery port. Observability of metrics and logs is done within the platform. And there's also a control port, in the sense that the platform is orchestrating the execution of the data products, be it based on a cron schedule or be it based on an upstream data product being updated, so that the downstream data product is triggered to also update itself. Technology-wise, I already showed you the four output port types, the languages that are used, and the catalog that's used for this. So this is a bit of a mapping of the technologies onto the anatomy of a data product.

31:27  So what we did here is to raise the level of abstraction of the self-service towards that data product abstraction. A data product has all those structural elements, input and output ports, transformation code, but actually it's even more detailed than that. There are a lot of technical components involved with one data product: we talked about the Git repository, an entry in the data catalog, potential secrets for credentials to access source systems, the metadata itself, the job to schedule and execute a container that contains the transformation code, but also, for the different output ports, a bucket in object storage, a dataset in ClickHouse with its tables, a graph in a triple store, or a dataset in GeoServer for geospatial data, plus the cron schedule itself, the build and deploy pipelines, and provisioning the address to access the output ports with credentials to do this securely. So there's a lot that happens when you want to create and handle a data product. What we don't want to do is to put the developer experience completely to the right, at the level of the individual components that I just mentioned. We want to push the developer experience completely to the left, so that the developer is actually thinking and interacting with the platform in terms of creating that data product, specifying the data product, building it, deploying it, and monitoring it at that abstraction level.

33:12  And the component in our platform that's responsible for bringing that abstraction alive is the platform orchestrator. That one provides self-service APIs, and on top of these APIs the experience planes can be built: the catalog for the consumer, but also the developer portal for the developer. And on the other hand, the orchestrator brings the abstraction of a data product alive by automating all the infrastructure and technology beneath the platform and by actually hiding complexity from the users, so that the cognitive load of those developers and those domain teams that are going to use your data mesh platform is lowered. What we use for this is Kubernetes. Kubernetes is more than just orchestrating containers; you can think of it as actually being a cloud-native framework or platform to engineer platform orchestrators. It's also an API-driven framework, and it is extensible. For providing those self-service APIs and defining those abstractions, there's something that's called a custom resource definition, which allows you to define custom APIs on top of Kubernetes. And on the other hand, to automate things, there's the Kubernetes operator pattern that we apply a lot, which allows us to act upon interactions with those APIs and orchestrate everything underneath.
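
To illustrate the custom resource definition and operator pattern mentioned here, the following is a hypothetical custom resource for a data product; the API group, kind and fields are invented for this sketch and are not the platform's actual API.

```yaml
# Hypothetical DataProduct custom resource; an operator watching this kind would
# provision the Git repository, pipelines, output port storage and catalog entry.
apiVersion: datamesh.example.org/v1alpha1
kind: DataProduct
metadata:
  name: museum-visits
  namespace: domain-heritage
spec:
  owner: heritage-team
  language: python
  schedule: "0 4 * * *"              # cron schedule for the transformation job
  inputPorts:
    - name: ticket-scans
      type: elasticsearch
  outputPorts:
    - name: museum-visits-analytical
      type: clickhouse
    - name: museum-visits-linked-data
      type: linked-data
```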

34:50  So a data product is more than just a dataset. We want to simplify all this by raising the level of abstraction and lowering the cognitive load, so that you can specify a data product like is shown here, which we also showed earlier. Now, in this presentation I'm not going to go further into the details of the technical realization of this platform. For that, you can use this QR code to end up at a presentation I did last year on the inner workings of this platform; there are more technical details over there. What I want to end with is a view on how the operating model of a data mesh is mapped onto the cultural sector, more specifically the team topology that we used, and to talk a bit about the governance of this platform.

35:51  So the typical operating model for a data mesh, which is also described by Zhamak Dehghani, is having a platform team responsible for that self-service data platform, which provides the self-service APIs that the business domain teams can use, those being cross-functional teams that will own the applications and the data products. You can have an enabling team that does consulting and provides examples and best practices to help those business domains. And you have the federated computational governance, which is represented by a team of domain representatives that try to define the policies that need to be automated by the platform. That's it in a nutshell, but how does that work in the cultural sector? The platform ownership is actually anchored within that sector: one of the major players, publiq, is taking on the ownership of the platform, and within that platform team we as ACA Group also try to support publiq in this task and guide them in the further evolution of the platform.

36:59  Different other organizations are taking ownership of the domains that exist within the cultural sector. And here it's maybe important to note that in the extreme form of the data mesh theory, the actual owners of the source systems, all the way to the outskirts of the sector, should own the data products. But what we see in the cultural sector is that there are a lot of very small organizations that have their own systems but do not have the IT capabilities themselves. Independently of the analytical world, for making operational platforms and applications available, there are already a number of intermediary sector players in the sector that take on ownership in the name of those smaller organizations. So that's also what's happening for the data products and for owning the data products and their domains: it might not be every organization that owns its data products, but at least you have a divide and conquer across multiple intermediary organizations that will own the data product sharing and reuse, enabling use cases for the whole sector. Concerning the enabling team, what is happening is that the platform team is also taking on the enabling role to empower those sector players and by that means raise their data maturity. And concerning governance, well, we're talking about the whole sector, so the governance here is also led by the government, more specifically the Department of Culture, and by involving major players within the cultural sector. So that's a bit of the operating model and how we mapped it.

39:17  The last thing is federated governance. There are a lot of aspects to governance, but the main message I want to give here is that I think it's very important to focus your governance on two pillars, and that's also what's happening in this journey. One is interoperability: making sure that the data products can connect to each other, but also that the data within the data products can be correlated with each other. For that, there is a lot of work being done by the Flemish government to define data standards, not one for the whole government, but a data standard for each sector and for each subdomain within that sector. And they do that based on semantic linked data definitions, which they call OSLO data standards. So that's one focus area: make sure that everything stays working together. And another focus area is to have governance that stimulates collaboration, for example to get representatives of domains together to brainstorm on interesting use cases and new business cases for the cultural sector. And this is all facilitated, both the collaboration focus and the interoperability focus with those data standards, by the Flemish government, by defining and organizing working groups of those sector representatives. When policies come out of this, they can be automated and supported by the platform where possible. For example, one policy can be that each data product has to have at least one output port of type linked data that's compliant with a data standard, so that you make sure that all data is at least shared under those standards.
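
A policy like that can be checked automatically against the data product specification. The sketch below shows one way such a check could look; the field names and the standard identifier are hypothetical, and this is not the platform's actual policy engine.

```python
# Hedged sketch of an automated governance check on a parsed data product specification.
def check_linked_data_policy(spec: dict) -> list[str]:
    """Return violations of the 'at least one standards-compliant linked data output port' policy."""
    violations = []
    linked_ports = [p for p in spec.get("outputPorts", []) if p.get("type") == "linked-data"]
    if not linked_ports:
        violations.append("data product declares no linked data output port")
    elif not any(p.get("dataStandard") for p in linked_ports):
        violations.append("no linked data output port declares a data standard")
    return violations

# Example with a hypothetical spec that satisfies the policy.
spec = {
    "name": "cultural-activities",
    "outputPorts": [
        {"name": "activities-ld", "type": "linked-data", "dataStandard": "OSLO-example-standard"},
        {"name": "activities-tables", "type": "clickhouse"},
    ],
}
assert check_linked_data_policy(spec) == []
```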

41:28  So with that, we end up with a data product ecosystem across organizations, in which multiple types of organizations having multiple types of data can combine and exchange their data. Their data no longer lives in silos, and they no longer work only in silos, but they make that cake bigger for the whole sector and enable those innovative services based on data. And with that we can take the first steps towards that data-driven cultural sector. I intentionally say the first steps, because it's not because you implement a data mesh operating model and a data mesh platform and a data mesh governance and all those principles that from day one you will have a fully data-driven sector. It's a journey, and the journey has already been going on for a couple of years, but there is still a journey ahead towards the future. And it was nice to see that this project got the Flanders Digital Award last year in the category of best digital transformation project; if you want to know more about that, you can also use the QR code shown on screen. So with that, I'm at the end of my presentation. If you want to contact me or get in touch, please use the email or the LinkedIn profile shown on the screen, and I can only thank you for listening and hope it was useful.

Speaker 3: 43:01  Great, that was awesome. Thanks so much, Tom. We have a few questions, and these are all questions that I had written down as well. So, from Sean McClintock: is there a way to indicate the prior version of a data product as being deprecated, and its retirement schedule?

Speaker 1: 43:21  Currently, no, that's not yet implemented, but it's not that hard to think how that could be added to the platform, because we're already tagging versions of a data product. But mainly the capability that has to be built in is that versioning also carries through onto the output ports, I think, so that, for example, for one data product you can have two versions of the same output port, then gradually let consumers of the data switch over to the new version, and then decommission the output port that is no longer used. That would be, I think, the answer to that question.

Speaker 3: 44:05  Okay, awesome. And then Sean has another question: does DataHub support exposing any incidents on the data product affecting its usage? For example, failed refresh, shape changes, unexpected size variance, et cetera?

Speaker 1: 44:22  Yeah, so one thing which was also quickly shown in the screenshots: we have a freshness indicator on the data and also a status. If it's green and active, then nothing is wrong, but if it fails, then it will get another indication and the freshness indicator will also lag behind. So that's something you can see as a consumer of the data products.

Speaker 3: 44:53  Okay. So yeah, this is a question, and I don't know how much you can answer this, but yeah, impressive. This is from Mark Vermillion: impressive self-service platform, but as these platforms do not exist as off-the-shelf solutions, I wonder how many years of development work were required to get this to a certain maturity level.

Speaker 1: 45:16  Thinking back, I think about one to two years of development with a team of about six, but those six were a combination of two software engineers that engineered the platform, two data engineers providing the expertise, and two cloud engineers working on the infrastructure. So that's a bit of a vague answer, but it gives an idea.

Speaker 3: 45:50  Yeah, I mean, all the questions I was thinking of, you then answered them, so it seems like it's a pretty well thought out platform. So another comment slash question from Robin Adam on the presentation: how much time on average does it take for you to onboard a new domain on the platform, and what do you see as the biggest challenge to get a new domain started?

Speaker 1: 46:16  How much time it takes? Well, the consultancy answer: it depends, and it depends also on how complex the data they have is, how many source systems they have, and so on. But basically, just onboarding them on the platform, making sure they have their users and access to the documentation, that's a matter of hours or minutes; there's nothing that is holding that back. Of course you want to do a thorough intake, so the platform team or enabling team, in this case publiq, will also go into interaction with that organization to explain things to them and make sure that they think about the needed skills they have to have in their team. For example, when it concerns linked data, you have to have some kind of knowledge about that. But we also check which language they currently use most when they have developers in their organization. And if, for example, that would be a language that we currently do not yet support, I think we can add some languages quite easily, because we're using those Cloud Native Buildpacks and they already embed a number of languages to use in containers. So that could be an evolution of the platform when you onboard a new team that needs other tech.

Speaker 3: 47:56  Okay, great. Thanks. Yeah, I was also wondering, well, there was another question from Mark about getting buy-in. Did you all create that video specifically as a pitch for getting buy-in, and what else was needed, or what other approaches did you take for getting buy-in from the different organizations?

Speaker 1: 48:22  Well, there are different phases of course in the data mesh journey, and this video has only existed, I think, for maybe a month or two, I don't know exactly. So it wasn't created at the beginning to get buy-in from the pilot organizations, so to say. But there, what I showed earlier, creating that data product landscape together with representatives of those organizations, some key players, really helped to show what the potential is of a data mesh platform. It also triggered ideas and inspired people to go talk to others, and that made it quite concrete for all those involved, because at the beginning, when you start explaining data mesh, it is quite abstract, certainly if you just read the book and nothing more. So we think it's important to try to make it as concrete as possible: not only for the business stakeholders, by going into that data product landscape and identifying the data and the data products that are potentially interesting, but also make it concrete on a technical level, so that you can already do a proof of concept and start trying out those things, so that you can actually show something to the stakeholders; that helps. And also make it concrete on what it means for their day-to-day job, the operating model, how it will work.

50:02  So you have to work on these things in parallel, I think. You indeed need more than the nice video that was shown; convincing people and organizations is also a journey, and it's mainly about focusing on what's in it for them.

Speaker 3: 50:23  Sure, yeah. And that goes into what I was also going to ask, which is: since this platform has been implemented, what types of ROI have you seen? Is it in terms of speed of getting data? I know those metrics are sometimes hard to define, but are there any things you can share?

Speaker 1: 50:48  Well, I don't know if they are currently really measured, but in terms of return on investment, as I said, I think somewhere in the slides, we're currently at 23 data products that are going live on it, and about, I think, maybe five or six organizations that are working on it, including the ones that are going to start to work on it. So that already is, for the government, a return on investment. That's a big difference compared to the situation before, where all the data was within the systems and couldn't be accessed. Now it's becoming visible, the data can be explored, can be accessed. So that's mainly the return on investment from the government's perspective. Of course, it's also a long-term vision of that government, so it's not about the hard return on investment in euros or dollars; it's more about a return for the whole community or the country.

Speaker 3: 52:08  Okay. Yeah. Well, great. Are there any other questions from anybody? I'll wait just a little bit; there's a little bit of a delay in the stream. But yeah, this was a wonderful presentation, I definitely learned a lot. This definitely seems like a really well thought out platform; you answered most of the questions I had as I was writing them down. So I think that's it in terms of questions, but I really appreciate you taking the time to speak with us. It was super interesting. You're welcome. And yeah, we look forward to maybe having you back in a year or so, and you can give us an update on any new lessons or any changes that have been made.

Speaker 1: 52:54  Yeah, who knows? Yeah.

Speaker 3: 52:56  Alright, great. Well yeah, thanks again Tom. I really appreciate it.

Speaker 1: 53:01  Thank you.

Speaker 3: 53:02  Alright, have a nice day everybody. Thanks.

Speaker 1: 53:05  Bye-Bye

Speaker 3: 53:06  Bye.

 

Data Mesh Learning Community Resources

 

Ways to Participate

Check out our Meetup page to catch an upcoming event. Let us know if you’re interested in sharing a case study or use case with the community.