Data Cataloging in the Age of Data Mesh

Aug 07, 2023 by Melissa Logan

A summary of our Data Mesh Learning community roundtable discussion on August 3.

The Data Mesh Learning community hosted a roundtable discussion with data mesh practitioners to discuss data cataloging.

Some of the questions posed during the discussion included:

Data cataloging: What is it, what’s the point, and who is it for?
Is everyone building something on top of their data catalog?
How do you use generative AI for discovery?
Can a data catalog be a single pane of glass?

Facilitators:

Jean-Georges Perrin, Senior Data, AI, and Software Consultant & President and Co-founder of the AIDA User Group
Scott Hirleman, Founder and CEO of Data Mesh Understanding

Participants included:

Ole Olesen-Bagneux, Author of The Enterprise Data Catalog, Host of The Data Democracy podcast, and Chief Evangelist at Zeenea
Tom De Wolf, Senior Architect and Innovation Lead at ACA Group, Host of Data Mesh Belgium Meetup
Jen Tedrow, Executive Director, Pathfinder Product Labs
Eric Broda, Senior technology executive, delivery specialist, and architect

Watch the Replay

Read the Transcript

Scroll to the bottom of this post

Additional Discussion Resources:

The Enterprise Data Catalog by Ole Olesen-Bagneux

Ways to Participate

Check out our Meetup page to catch an upcoming event.

Let us know if you’re interested in sharing a case study or use case with the community.

Data Mesh Learning Community resources

Engage with us on Slack
Organize a local meetup
Attend an upcoming event
Join an end-user roundtable
Help us showcase data mesh end-user journeys
Sign up for our newsletter
Become a community sponsor

You can follow the conversation on LinkedIn and Twitter.

Transcript

Jean-Georges Perrin (00:01): Hello, everybody. I’ve got some friendly faces. And Tom. Hi.

Scott Hirleman (00:12): Rude, JGP. Rude.

Jean-Georges Perrin (00:14): Okay, sorry.

Ole Olesen-Bagneux (00:18): Hello.

Scott Hirleman (00:19): Hey.

Jean-Georges Perrin (00:19): Hello.

Ole Olesen-Bagneux (00:20): Hello. And happy to meet you.

Jean-Georges Perrin (00:27): Oh, it’s a great pleasure. I can’t see you. I don’t know if your camera is dead or something, but I’m really happy you’re joining today.

Ole Olesen-Bagneux (00:36): Let me see here. And happy to see you too, Scott. Long time.

Scott Hirleman (00:43): Yeah, long time. Yeah. So JGP, you’re the one who kind of chose this topic. Why don’t you get us started with your general thoughts on this, and then we can open that up to other folks, especially Ole, who literally wrote a book on this topic.

Jean-Georges Perrin (00:59): Well, and I was teasing Tom, but Tom also did some very interesting work on cataloging. Anyway, we’ve got a great audience today. So yes, I wanted this topic because I’m still puzzled whether we should use an external tool or build our own specific tool for data products. And based on that, what people think, what people see out there, and what cataloging even means for the audience. So that’s a little bit of what I’d like to explore today with this group.

Scott Hirleman (01:50): Ole, you wrote the book on it. Would you want to give us your definition?

Ole Olesen-Bagneux (01:56): Definitely. Can you still not see me though?

Jean-Georges Perrin (02:01): Nope.

Ole Olesen-Bagneux (02:02): It’s been a while since I used Zoom. I apologize. I have my video on, I can see myself. Is this better?

Scott Hirleman (02:16): It’s all right. Just go with it. It’s totally fine. We had this with Paolo last week too.

Jean-Georges Perrin (02:22): You’re going to be anonymous for today.

Ole Olesen-Bagneux (02:26): Yeah. Okay, no problem. My definition of a data catalog, well, that’s actually pretty simple. I have this definition in my book that it’s a structured overview of the data in your company at a metadata level. So basically you can pull all kinds of metadata from all kinds of sources, and that will allow you to have an overview of what kind of data you have in which systems. So at its core, that’s the data catalog.

Jean-Georges Perrin (03:08): Yeah, go ahead.

Ole Olesen-Bagneux (03:10): I was looking for the camera button. So maybe, Jean, you were saying something else in particular in regard to the conversation today and data catalogs? You had a question of some sort.

Jean-Georges Perrin (03:29): No, not specifically. And because you don’t have the camera on, I just wanted to show everybody. Of course, when you’re doing that, it doesn’t work. The Enterprise Data Catalog. Okay. So there’s nothing like a little promotion. Okay. Thanks, Ole.

Ole Olesen-Bagneux (03:58): Thank you. Thank you.

Jean-Georges Perrin (04:00): You’re very welcome. And to be honest, I haven’t read it yet. Okay. But it’s on my reading list. So Tom, I know you’ve also been doing very interesting things with data catalogs. Do you want to add something else there?

Tom De Wolf (04:18): Well, what we did when we were trying to find a suitable data catalog for a data mesh solution was to look for something where the data product entity is already built in, so that it can be composed of multiple data sets and have the metadata, the owner information, and all that. And when we started, we didn’t find any catalog that already had it. So we started with the DataHub project and customized it ourselves, which was doable. And yeah, I think a few months ago the project itself also put the data product entity in their solution. So it is going that way, but I don’t know if other people have already seen other catalogs that have the data product entity defined in them.
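
The data product entity Tom describes, one product composed of multiple data sets and carrying owner information and metadata, could be modeled roughly as below. This is only a minimal Python sketch: the class names, field names, and sample values are hypothetical and are not taken from DataHub or any other catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical minimal dataset record; fields are illustrative only.
    name: str
    location: str


@dataclass
class DataProduct:
    # One data product groups several datasets under a single owner
    # and a shared set of descriptive metadata.
    name: str
    owner: str
    description: str = ""
    tags: list = field(default_factory=list)
    datasets: list = field(default_factory=list)

    def dataset_names(self):
        # Names of the datasets that make up this product.
        return [d.name for d in self.datasets]


orders = DataProduct(
    name="orders",
    owner="sales-team",
    description="Order events and daily order aggregates",
    datasets=[
        Dataset("orders_raw", "s3://example-bucket/orders_raw"),
        Dataset("orders_daily", "s3://example-bucket/orders_daily"),
    ],
)
```

The point of the sketch is that the catalog's unit of discovery becomes the product, with datasets as its parts, rather than each dataset being listed on its own.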

Scott Hirleman (05:17): What I’ve heard from so many people on the podcast and stuff, anytime you ask about data catalogs, is that everybody is building something custom on top of it. Nothing out there as is seems to be extended far enough into this. It seems like there are needs, and I don’t know if those are custom needs per organization or if it’s just that the data catalogs aren’t there. I don’t know. Jen, you’ve worked on a couple of things. Have you seen kind of the same thing, where you’re just having to build stuff on top?

Jen Tedrow (05:49): Yeah, or if you have the right partner, asking for a lot of custom development, which I know nobody likes to do, but I think that a lot of folks just have different needs within the organization. And Scott, as you know, I spent a lot of time evaluating a lot of different tools for the implementation of the data mesh that I worked on. And I think the first step is understanding exactly what it is that you need or that your users need from the tooling. And that’s going to vary from organization to organization. For example, in the organization where I was working, of course we wanted the metadata, we wanted all of the standard things, but the business context was very important for understanding how to leverage that data. And with some of the tooling there was a barrier for some of the very non-technical folks to get access to that tool. So the user experience was something that was really important for us, because we wanted to lower the barriers and make sure that the folks with that context and information could easily add it, because once they understand the data, it becomes more valuable and more usable. And that was really one of our biggest goals. But again, if you’re working with mainly a very technical, tech-savvy audience, that might be a very different goal for your organization and your needs.

Scott Hirleman (07:16): I was literally going to ask, do you actually think that your needs are that different from other organizations? The needs that keep coming through is yes there. It’s like when people talk about lineage: lineage is very technical. It’s never the actual transformations and the why and the business logic of the lineage of what’s happened. And that’s where I’m thinking technical tools built for technical people aren’t lowering the bar to access. I mean, JGP, you were doing one in a very, very technical implementation. So I’d like to hear what yours was, for basically data scientists. Did you have those same needs, or was that not really the same thing?

Jean-Georges Perrin (08:02): I think when we approached data cataloging, we just wanted to... so basically the why of the data catalog is to ease discovery. So that was our main goal. You can imagine all the things like finding out business context or defining business rules, but that’s almost like data governance. Our thing was really focusing very strictly on data discovery. So making sure you’ve got a powerful search engine to give a kind of Google experience. And we achieved that. Okay. That’s why I’m very curious about the... We custom developed it. We custom built it. Okay. So we could leverage any fields we wanted. We could extend it to match the data contracts or the data product definitions that we wanted. We actually incorporated the feedback loop directly in the search engine as well. Okay. So I don’t know. I like to build; some people kind of define me as a builder. But the thing is, I’d rather have an approach like Tom did, trying to integrate with something that already exists. But what is the definition of this element you want to be searchable? Okay. So our idea was to index data contracts and to expose them with additional information like reliability on top of that. So that’s the approach we took. To answer your question. Eric, you want to say something?

Eric Broda (10:02): Yeah, I was going to say we are implementing data mesh in this global climate environment for a nonprofit. And it’s all based off of data mesh. And the catalog’s a central piece. We call it a registry, but we took quite a different approach. And the reason being, and I want to get people’s feedback on this because there’s probably different ways of solving this problem, but as many of you probably know, there’s literally thousands of different sources of climate data out there, and every one is owned by a different individual or group or entity. What that really means is, unlike an enterprise, there is actually no formal body or formal mechanism to get these folks to coordinate, nor to enforce any level of coordination. So unlike an enterprise, where you can have a central group that says, thou shalt do that, we have zero opportunity. And I think even in the enterprise, my experience is that “thou shalt follow the standards” is problematic at times.

(11:02)
So we took a very different approach, and what we said is we wanted to make sure that anytime a data product is created, the onboarding process into the registry was drop-dead simple. And we felt it needed to be drop-dead simple because otherwise people would not do it. And we wouldn’t be able to encourage NASA, for example, a great, huge data source. How would we ever get NASA to do our bidding? We would never be able to do that. So we had to figure out a different approach. So this is the approach we actually took. We said, first off, the guiding principle is the data product is the book of record for all metadata about the data product, all data descriptions, tags, everything, and seeing as it’s the book of record for it, it’s the best source of it. So we said all of the stuff that would normally be in a catalog should not be centralized; it should stay in the data product.

(11:57)
So how do we actually do discovery, as Jean-Georges mentioned? Well, we use generative AI. So the only thing we keep, what we seek to get, is this: we have a vector database and some gen AI capability. And what we do is we literally point to the website where the data exists and we scrape it. And all we look for is a description. Well, there’s usually tons and tons of information, but we boil it down to a single description, some keywords, tags. We can even create knowledge graphs and taxonomies if we want, all gen AI based, with zero interaction from the actual data product owner. But here’s what we now do: we use that Google-like interface, and all we do is search those descriptions that we’ve got automatically. Literally within five minutes: the moment we identify a data source, five minutes later they’re in the catalog and they actually have a description.

(12:50)
So we use the gen AI capability to search and find the data product we actually want, and then we point people to it, and to the extent that they have a discoverable interface, we point ’em to that discovery interface. So what ends up happening is, the problem that we had was that there’s no way to centralize this. There’s no way of actually getting these folks to send us all the metadata, with all the problems that are associated with that. So we took a very different, completely decentralized approach, and it actually works quite well. So I’d welcome feedback. That solves an extreme level of independence that most organizations may not have. But that’s something that we found we had to do. And I think there’s some direct analogs for enterprises too, should they have that approach as a result of that need. So anyway, I’ll pause there and get some feedback. Welcome ideas.
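
The registry Eric describes keeps only a short description per data product and a link back to it, and searches those descriptions. A minimal Python sketch of that shape follows. Note the substitution: instead of gen AI summaries and a vector database, it uses plain word counts and cosine similarity as a stand-in, and the `Registry` class, its methods, and the example URLs are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a crude stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Registry:
    """Stores only a URL and a short description per data product."""

    def __init__(self):
        self.entries = []  # list of (url, description, vector)

    def register(self, url, description):
        # In the approach described, the description would be generated
        # from the scraped site; here it is simply passed in.
        self.entries.append((url, description, Counter(tokenize(description))))

    def search(self, query, top_k=3):
        # Rank entries by similarity to the query and return their URLs,
        # so the user is sent straight to the data product itself.
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), url) for url, _, vec in self.entries]
        scored = [(s, u) for s, u in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [u for _, u in scored[:top_k]]


registry = Registry()
registry.register(
    "https://example.org/ocean",
    "Daily sea surface temperature measurements for the global ocean",
)
registry.register(
    "https://example.org/forest",
    "Annual forest cover and deforestation statistics by country",
)
hits = registry.search("sea surface temperature")
```

Everything beyond the description and the link (schemas, access details, discovery interfaces) stays with the data product, which is the design choice being argued for.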

Scott Hirleman (13:42): And Tom, before we jump to you and JGP and stuff, one question I would add on top of this is, do you have a single pane of glass? Can the data catalog be the single pane of glass? I know we want it to be, but all the observability data, all that stuff, is that all flowing into the catalog? So Tom, I want to hear your feedback and how that flows through.

Eric Broda (14:04): Yeah, well, let me just answer that. The way we did it, Scott, is, the catalog, we actually have two versions of it: a production version, and we have this demo. And the demo is pure, it’s just a Streamlit app, to be honest with you, with a very simple query interface that connects into a gen AI engine, a vector database gen AI engine. And all it does is it returns and points you to a link. So our catalog is actually super-duper simple. It’s literally a little bit more than a Google-like interface. Beyond that, it’s not much more than that. So anyway, I’ll pause there.

Tom De Wolf (14:42): Well, what I want to say is that I agree with the fact that you should try, where possible, to keep the metadata with the data products, and also keep the owners responsible for filling that in. And what we’ve also done is to then let the platform automate the discovery of a new data product and push all the metadata into the catalog. So it’s not a pull mechanism like Ole described in the beginning, but more pushing it into what’s then a data product entity in that catalog.
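
The push mechanism Tom contrasts with pull can be pictured as follows: the data product carries its own metadata, and the platform pushes that metadata into the catalog whenever it deploys the product. This is a minimal Python sketch under those assumptions; `Catalog`, `Platform`, `upsert`, and `deploy` are hypothetical names, not the API of DataHub or any specific platform.

```python
class Catalog:
    """Holds one entity per data product, keyed by product id."""

    def __init__(self):
        self.entities = {}

    def upsert(self, product_id, metadata):
        # Store a copy so later edits to the product's own metadata
        # only reach the catalog through another push.
        self.entities[product_id] = dict(metadata)


class Platform:
    """Deploys data products and pushes their metadata to the catalog."""

    def __init__(self, catalog):
        self.catalog = catalog

    def deploy(self, product_id, metadata):
        # Provisioning of the data product itself would happen here.
        # The metadata is owned by the product; the platform only forwards it.
        self.catalog.upsert(product_id, metadata)


catalog = Catalog()
platform = Platform(catalog)

metadata = {"owner": "sales-team", "description": "Order events"}
platform.deploy("orders", metadata)

# A redeploy with updated metadata replaces the catalog entity.
metadata["description"] = "Order events and daily aggregates"
platform.deploy("orders", metadata)
```

In a pull setup the catalog would instead crawl sources on its own schedule; with push, the catalog entry changes exactly when the product is deployed.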

Ole Olesen-Bagneux (15:18): I feel like chipping in, but I should respect the order. I just can’t find the hand button either. It’s been quite a while since I used Zoom. Apologies.

Scott Hirleman (15:28): Go ahead. I was going to call on you specifically. We brought you in as our specific guest on this topic. You’re the subject matter expert because you’ve gone and talked to a lot of these people, so I’d love to hear your thoughts, but also, you’ve talked to tons of people on this. What are other people saying, even if they’re disagreeing with you?

Ole Olesen-Bagneux (15:47): Oh, well, yes, of course, the disagreements. I’m also happy to be hearing and speaking with Eric again; Eric Broda and I had a few exchanges while I wrote the book. And of course, data catalogs can either pull or push metadata into them. Both setups exist, and they also exist in harmony. I also describe DataHub in my book in relation to push-based data catalogs, if you will. Knowledge graph data catalogs, there are not that many, but they exist, and, as I guess the majority of the listeners here know, they have this flexible metamodel that allows for extension of the metamodel and for defining, okay, what are the components inside the data catalog? And you could definitely define a data product as such if you have a knowledge-graph-based data catalog. That of course would to some extent not mirror the vision that you, Eric, put forward, and what is also stipulated or written in the data mesh book by Zhamak Dehghani, because data and metadata have to stay together in her vision, and that is a vision, of course, that many follow if they want to implement data mesh. Piethein Strengholt, the author of Data Management at Scale, has a somewhat different definition. He actually differs from Zhamak Dehghani in saying that data and metadata should not be kept together as a product.

(17:46)
Back to your question, Scott, you’ll find different interpretations of what a data product is, what a data mesh is, and so forth. But there’s no doubt that Zhamak Dehghani advises keeping data and metadata together in the product, not in a central registry. I’d say that for many companies... so, to be completely honest, one of the reasons why I wrote the book was that I thought this technology was very, very difficult. And after I wrote the book, many people have reached out to me saying that their data catalog implementation has failed. I have no scientific backing for this, but I assume that more than 50% of data catalog implementations fail. That is my assumption. I think one of the reasons why is that the setup is too complex. The metamodel is either static or very, very deep. People have very little understanding of what lies outside of the technical scope, in the sense of what is good information architecture, what is good data architecture, what does the knowledge of my company actually look like? And so I think those are some of the key issues here to address and always to keep in mind.

Scott Hirleman (19:16): Yeah, I think that insight about the failures, or that you have organizations that have five different data catalogs spread throughout. So JGP, you’ve had your hand up for quite a while. I wanted to hear kind of what you’ve been thinking on that.

Jean-Georges Perrin (19:29): I wanted to react just to Eric, and I know Eric and I talk very often, but not on this topic. So when you’re searching for something, are you making some kind of a distributed query to all your data products and saying, hey, do you have a customer in your metadata?

Eric Broda (19:56): No, you do it.

Jean-Georges Perrin (19:57): Oui.

Eric Broda (19:58): Actually, so we tried to address, it’s kind of interesting that Ole mentioned the problems. Those are exactly the problems I’ve personally seen and that we try to address. They are complex. You have to always move this metadata and it’s always out of sync. I’m exaggerating, it’s not always out of sync, but you get my drift. On top of that, what we found is these catalogs were made for technical people, not for business people. So what we did is we took a completely different approach. We said our audience is literally anybody, including a technical audience, but also including a business person who’s looking for some data. So the only thing we keep in our vector database, our gen AI engine, is literally the scraped site. Wherever that data happens to be, at least in this area, everything’s a URL. We scrape the site and effectively translate it all into texts and tags that go into a marketplace tag.

(20:54)
It’s a bunch of tiles on a webpage effectively once you’ve done the search. So what ends up happening is we actually have no overhead and the only thing we keep is that description because our goal is to get them as fast as possible to the data product. I’ll be honest with you, every data product has a different discovery interface, a different level of complexity, a different level of simplicity, a different set of attributes. Every data product is different. So what we said is all we want to do is point them, get ’em the fastest way as possible, the simplest way as possible to get them to the actual data product where the unique aspects of that data product can surface it. Because again, our problem may be a little different where we have thousands of data sources and one of the single biggest problems is actually finding the data, let alone once you find it, consume it, but you can’t find it.

(21:49)
So we solve that. We optimize for that discovery and delegate all of the low-level stuff around how to consume it to the actual data product. So to answer your question, JGP, we have next to nothing in our database other than what we scraped from the site automatically, with no interaction from the data product owner. I mean, they can send it to us if they want, but most don’t. And all we do is summarize it and keep that brief summary in there, enough that we can actually search using the vector database, the semantic search capability that comes with it, to point to the actual data product. So a very different approach, very much trying to solve some of the problems that Ole mentioned and staying true to what I think is foundational and just a visionary aspect of what Zhamak came up with. It puts the onus and the delegation where the actual insight and knowledge actually resides. So anyway, that’s the approach we took.

Scott Hirleman (22:47): I’d love to hear what people think about what the point of the data catalog is. What I saw when I was managing AWS costs was that I could never get a single engineer to actually log into the AWS console. Never. They just never did, right? So they would never see that their teardown scripts didn’t work and there was all this stuff that was left. You could only get them to work at the CLI. Is the data catalog for everybody, or do you need something where somebody can do an API call and they get that information and that’s what works for them, and the data catalog is the lower bar and you have it for the non-technical folks? Is that the right way or not? Personally, I don’t know. I haven’t really seen those conversations, but it’s something that constantly is stuck in my head. And we’ve got Ole with his hand up, and then Andrea had said, what if the catalog were not primarily for the users but rather for the platform services that utilize it to automate product lifecycle management processes?

(24:05)
I think that’s almost a different aspect of the platform. I dunno. Anyway, I’m not going to give my thoughts on this. I’m not deep enough on this, but Ole you’ve got your hand up so I’d love to hear your thoughts and then anybody else’s as well. If you’re talking, you’re still on mute.

Ole Olesen-Bagneux (24:32): Thank you, Scott. Apologies. We can’t have you both muted and not on video. That’s too bad. No, so I was just saying that I think Andrea was actually asking a very good question here, and I definitely advise on how to do lifecycle management of data sources and data objects via a data catalog. It can be done. Not many data catalogs have specialized in this field, but it is quite obviously a good idea to do it. On a more general level, to answer your question, Scott, also because I need to be mindful of time: I think that depending on the data catalog, you could consider it to have many end users, also reaching far beyond data engineers and data scientists. And in other cases it can be a very, very technical setup. We have to keep in mind that it’s not as if this product category is protected. So if you’re looking at Purview or AWS Glue, I think it recently changed its name, but these are very technical data catalogs. They are very easy to use if you are in a single-cloud setup. But they are quite technical, with low user-friendliness, and other data catalogs are very user-friendly, to the point where they’re trying to push it to make all employees use it for everyday tasks. So I think you find a lot of variety and you can’t really answer that categorically.

Scott Hirleman (26:23): I mean, a provocative question that I don’t think we can answer in the amount of time left, but is data cataloging a feature? Is it a product? That’s something a lot of the data catalog folks have said: data catalog is simply a feature at this point, and we have to build out into far broader-reaching capabilities. I don’t know if you’re seeing the same. Tom, you had some great thoughts on this, so we’re hoping to hear from you, and Jen as well.

Tom De Wolf (26:51): Well, I wanted to pick up on the aspect of which users the data catalog is suited for, and what we’ve seen is that there are at least two categories around the data mesh. On the one hand there are data product developers, who more need a listing of their data products and sort of a developer portal, with all the technical details, including debugging and log features. But you could also see that as a catalog. I’ve seen Backstage, for example, being used to realize such things. And on the other side there are the data consumers, who might not be technical, and maybe they need a completely different kind of catalog and need insight into where the data comes from, and lineage, trust, and all those things. So maybe it’s also a matter of splitting those two personas and trying to come up with different solutions for both.

Jen Tedrow (27:57): I would agree.

Tom De Wolf (27:58): Thinking about it.

Jen Tedrow (28:00): Yeah, no, I think that it’s understanding the jobs to be done for the specific situation and understanding your personas. That’s what we did. It was really just: who’s using this, who needs to use it, what’s working or not working now, because we’ve tried this before. To Scott’s point, we have all these other solutions running around, and I mean, to be fair, I’ve seen instances in much smaller organizations where they’ve used tools like Excel or Sheets or Confluence and that worked for them. That was fine for their needs. So I think that it could be a product, it could be a feature, it could be a product that extends to a feature because it feeds back into the data product, and any iteration is fine as long as you’re meeting the jobs to be done. And the first step to that is understanding that.

Scott Hirleman (28:45): I think kind of what you’re talking about is premature optimization to scale, like, hey, this 200-person company is spending $500,000 on a catalog when they have seven data products. It’s like, no, you can just book time with that person. So JGP, we’re coming up on regular time. I’m happy to go over a bit, but I wanted to make sure we’ve got some closing thoughts around your thing, or if you’ve got additional questions to ask folks.

Jean-Georges Perrin (29:19): It’s always kind of similar. It is almost impossible to get some kind of consensus; it’s like we’re five people and we’ve got six opinions. But I’m curious if we can actually try to drive a consensus saying that the source of truth for a data catalog in a data mesh is the data product.

Tom De Wolf (29:50): Plus one. Yeah.

Jen Tedrow (29:51): Hot take, but I’m opposed to it.

Jean-Georges Perrin (29:55): Absolutely agree. Okay. So at least we can say that we are in agreement there. So I think at least that’s where we’re going. And then whether it’s a feature or tool or something, my concern is, and when I was listening to Tom and Jen, I love the approach you have, but isn’t there a risk that the data catalog becomes the tool for everything, like the Swiss Army knife of your data mesh? And that’s something I’m a little bit worried about.

Jen Tedrow (30:33): I could definitely see the concern and understand where you’re coming from. I think that would be difficult. At least, again, I’m speaking from really one or two experiences, so it’s not the be-all and end-all, but there are so many other... even the assessment process, it was multi-vendor. I mean, we were looking across observability, discoverability, data quality, automation. There are so many things, and we were looking for a Swiss Army knife. We were like, is there something out there that can do all of these things? And there definitely wasn’t. So maybe that could be, or it could be extensible. I mean, in my scenario, we ended up having to purchase several different tools and we were augmenting with our own solution as well. So I don’t know, I guess I’m having a hard time wrapping my head around that just from my limited experience. But I could see why you would be concerned about that, just trying to shove everything into one tool.

Scott Hirleman (31:36): I’d love it to be at least a single pane of glass. That would be nice if somebody had,

Jen Tedrow (31:41): That’s their own part,

Scott Hirleman (31:42): Right? Well, because every tool has trapped metadata, every tool. So if you want to dig into the quality, you have to go into this tool. If you want to dig into the observability, you have to dig into this tool, if you want to. Yeah, I’m going to shut up now, but yes, exactly. If we can at least have that, that would be great. But nothing can even do that.

Jen Tedrow (32:04): That was what we found too.

Tom De Wolf (32:05): We were actually betting on the place of the Swiss Army knife, the one that connects everything, being the platform itself, which we’re also engineering and automating, and which makes it possible to integrate the different tools for what they do best. So the catalog is then only intended to be used for the metadata and for providing insights into that. Other tools are for other things. But I do agree that a lot of tools have the tendency to be a Swiss Army knife themselves, and that makes it hard.

Eric Broda (32:45): One of the things that I haven’t heard too many people mention, but it’s a really big deal with a lot of my clients, is that we are using gen AI, so ChatGPT or the engines, the large language models, behind it. And we actually let it, again, ingest this information. I’ll give you one example. For one of my clients, we ingest like an 800-page document that explains how to integrate with the product, and then we say, gen AI, tell me how to integrate, and it actually gives me an OpenAPI spec, or it will give me Python code to actually access it. It’ll actually tell me how to write a FastAPI Python service to consume and process that. And we actually have the ability to have it analyze data also, which is for us still very early stage.

(33:50)
But the other stuff I mentioned, we actually let gen AI do a lot of the heavy lifting. Tell me about my data, tell me what I can do with the data. Tell me about the metadata. In other words, we actually don’t rely on any individual to structure it all. We just take in whatever we can get, and almost all of it has a set of developer documentation. Almost all of it is in a different format. Some of it is OpenAPI specs, some not, but it all can be translated with gen AI. So anyway, my advice, Jen and Tom, is, as opposed to looking for tools that do it, there’s some general-purpose stuff, admittedly early stage, gen AI, and it is just mind-blowing what it can actually do with very limited manual intervention.

Jen Tedrow (34:41): I love that. I think the challenge comes when you’re working in a highly regulated industry with a lot of proprietary data sets that aren’t publicly available or accessible. I love that for being able to go out there and get available data on a website. I think that’s an amazing solution for that because, again, you’re not relying on people doing that. You’re going out and getting it. But a lot of the industries that are implementing data mesh are finance, healthcare, highly regulated, and a lot of the data’s internal, proprietary, not publicly

Eric Broda (35:20): Available. A lot of those are my clients too. What we do though is, I just use ChatGPT because everybody knows what it is, but we use local models from Hugging Face that run within the four corners of the on-prem environment or on their cloud, and they’re almost as good as ChatGPT. So we can actually keep all of that information inside the four walls of the company. In fact, my clients require it. And then what we found too is we don’t have to go out to external websites. Almost everything is available on SharePoint, on Confluence, or even some other places, but it’s all accessible using a URL. Even a PDF file can be a URL, and because we can ingest just about anything, all of a sudden gen AI becomes, like I said, our superpower. And I don’t think a lot of people realize how powerful it is, but I’m telling you, it’s mind-blowing what it can do. It’s really going to shake the foundations as people start to see how easy it can make things.

Jen Tedrow (36:22): I’m pleased to hear that you’re able to do that. I just kind of assumed, given some of the experiences that I’ve had, that some internal security person would balk at that. But I think that there’s a lot of potential around leveraging that. I mean, like I say, you have to make it easy. You have to make it so easy. If you want people to actually do anything, leverage something, use it, get value out of it, then don’t make it hard. Don’t put up barriers. You have to lower them. And anything that pulls in the necessary information that helps people understand and find data is going to be helpful. And as long as you can do that in a way that’s compliant, safe, secure, that’s only going to help you achieve your goals and get value out of that data. So I love that.

Scott Hirelman (37:14):Do you think that we could leverage gen AI to ask the questions for people to actually write better documentation? It can write the documentation, but can it help actually interview people and have them help write the documentation? We don’t have business logic that ever shows up in the documentation. It just never happens.

Eric Broda (37:34):But here’s the thing, guys, you can, but what we found is the world is awash with documentation. It’s just in 98 different formats. What we found is gen AI understands it all. And as opposed to trying to write documentation in a particular format, we just ingest it. And admittedly, some of the data, some of the information you’re looking for may be missing. But what we found more often than not is, in almost every enterprise, whether on the public website or inside the four walls of the company, the data is there. It’s in 98 different formats, sometimes PDFs, sometimes Word documents, but gen AI normalizes, to use that term, all that stuff and makes it easy to get information. Like I said, it was remarkable when we could get this PDF document that had no OpenAPI specifications and just ask it to write me an OpenAPI specification for this particular function. And it did. Then we said, right, tell me how to consume the code in Python. And it did. It’s remarkable what you can do with the stuff that just exists out there, as is.

Jean-George Perrin (38:50):I think we’re never going to invite Eric anymore because otherwise we’re going to talk only about gen AI. Okay. So I think we’re about at time, even if we usually go over by like 10, 15 minutes. But if someone has a pressing question, please, please do. I think we need to revisit this data cataloging thing and talk more about the features, because what I’m getting from this is Eric has great use cases that can be solved with gen AI. When I’m hearing Jen, I mean, it’s a little bit different because she’s working in a closed environment, in a regulated environment, and I’ve worked with companies where you’re not sending anything to OpenAI, period. So it’s not a solution for everything. But I’d like to understand more about the features of the data catalog. What are the features you’re looking for? Is it just this kind of Google interface? Is it something else? What is the back end? What information are you referencing? And I think this could be an interesting follow-up conversation at some point, not

Eric Broda (40:19):Today. I’d love to talk more about that. We should definitely do that.

Jean-George Perrin (40:21):Yeah. Yeah. And I think we need to make sure that Ole is also on the call when we do this one. So guys, Scott, Tom, Jen, Ole’s gone, Eric, Andrea, thanks everybody for participating and sharing your knowledge. And we’ll see each other next week. Next week is “WTF is a data quantum.”

Eric Broda (40:51):Awesome. Thanks so much everyone. Okay, thank you.

Jean-George Perrin (40:55):See you guys. Bye-Bye.