David Arturi: Today I have the pleasure of speaking with Dr. Tiffany Perkins-Munn, Managing Director and Head of Data and Analytics for JPMorgan Chase marketing. Dr. Perkins-Munn is a member of the editorial board for CDO Magazine, a recipient of the prestigious CDO Global Data Power Women Award and the AI 100 award from AIM Research, and has been recognized as one of the top 100 most influential people in data by DataIQ. Additionally, she was a top 10 finalist in the Merrill Lynch Global Markets Innovation program. As head of marketing data and analytics at JPMorgan Chase, she has built a reputation for solving complex business problems through operational improvements, financial insights and strategic recruiting. We’re thrilled to have her with us today. Thank you so much for joining us, Dr. Perkins-Munn.
Tiffany Perkins-Munn: Thank you for having me. I really appreciate you taking the time. One of the things that’s near and dear to my heart is making these kinds of topics accessible to the broader population, so that everyone in every industry, in every field, no matter how large or small their organization, really understands data, its utility, and the impact that it has on their success and how they drive their metrics.
David Arturi: Awesome. That’s great. And I guess, just jumping into it: we’ve previously discussed how Gen-AI is serving as the catalyst to finally solve the data challenge. The data challenge is something we’ve all been aware of, as you mentioned, for as long as memory serves, but now it’s really imperative for us to solve it. Today, I’d like to focus on the commitment to solving the data challenge that you mentioned, and the why. So where do we get started, what are the steps to clean up and maintain accessibility, and then we can roll into AI. So, question one: can you please share your perspective on the impact AI has had on the need for data cleansing, and what obstacles are typically faced when trying to pair an AI initiative with data management practices?
Tiffany Perkins-Munn: Yeah, it’s interesting, because in my perspective, anyway, AI has created somewhat of a paradoxical situation in data cleansing and management. On the one hand, AI promises to automate and improve data cleansing processes, and that’s what we want it for. But on the other hand, it has inadvertently increased the need for more rigorous data cleaning practices. I think this paradox stems from a few factors. Just to give you concrete examples, first there’s data need versus data quality. As most people know, AI models, especially deep learning algorithms, require vast amounts of data to perform effectively. What this has created is this need for people to collect everything; we have this mentality in some organizations. But the sheer volume and velocity of data collected often compromises its quality, and that necessitates more intensive cleaning efforts. So that’s one paradox. Another is what I call the amplification effect. AI models can amplify small errors or biases present in training data. So what would have been a minor data quality issue in traditional analytics can become a significant problem when it’s fed into an AI system, and that magnifies the importance of thorough data cleaning. And then the last thing, which I think people need to understand, is that complexity breeds obscurity. As AI systems become more complex, it becomes harder to trace the impact of data quality issues on outputs, and that kind of opacity increases the need for preventative data cleansing, because post hoc error detection becomes more challenging. I would say that’s the paradox of the situation. I’m happy to talk about innovative approaches to reconciliation, if that’s something you’d like to talk about here.
David Arturi: Yeah, I definitely think it is. What we see a lot, or as we understand it, is that everybody kind of just grew their data without a true strategy or plan, and then hit the inflection point you’re speaking about, where it’s like, okay, now what do we do with it? So now it’s, okay, we’ve got this database, Snowflake, Databricks, whatever it is. And I constantly talk to people who say, I don’t know what to do, these systems don’t talk, right? So yeah, if you could talk through that approach, that’d be very interesting.
Tiffany Perkins-Munn: Yeah, just in terms of what I’ve seen across the street, I think there are several things that you can do to reconcile those paradoxes that we were just talking about. One key one is AI-driven data governance: developing AI systems that can dynamically adjust data governance rules based on the evolving needs of the AI models and, obviously, shifting regulatory landscapes as well. So AI-driven data governance processes are really key. Another one is continuous data quality monitoring. It sounds like, yeah, of course, but the key is, how do you implement real-time data quality assessment systems that can actually detect and flag issues as they emerge in AI-driven processes? That’s like the Holy Grail. And then there’s federated learning for data management. You want to explore federated learning approaches that allow AI models to learn from distributed data sets without centralizing them, because that might ease some of the data management burden. Those are just a few of the approaches to reconciliation that I’ve seen.
David Arturi: Well, your second point especially is interesting, because we will talk to firms, and they do one big data cleansing, spending all this money and all these resources, and then they go back to doing what they were doing. And then that same vendor is in there 17 times. It doesn’t really solve the problem if you solve it once and then just go back to doing what you were doing. So, if we could elaborate on that, what kind of practices have you found successful for maintaining that golden record, or whatever you want to call it, the perfect set?
Tiffany Perkins-Munn: Maintaining the perfect set of data.
David Arturi: Yeah, once you’ve cleansed it, let’s say, like just maintaining that high quality.
Tiffany Perkins-Munn: Yeah, it’s interesting, because I think that part of what we have to do is really think about how we enhance data cleaning and quality management. We know that AI can be effective in assisting with cleaning data and managing data quality. But how do you keep it going and maintain it over time? I think there are lots of ways, but I’ll talk about a few of them here. The first is all about automated data profiling and anomaly detection. AI algorithms, particularly unsupervised learning models, can automatically profile data sets and detect anomalies or outliers. That kind of capability is crucial for identifying inconsistencies in data formats or values, detecting potential errors or unusual patterns that might indicate data quality issues, and continually monitoring data streams for real-time anomaly detection.
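The automated anomaly detection described above can be illustrated with a very small Python sketch. It is not from the interview and does not use an unsupervised learning model: a simple median-based outlier rule (the modified z-score) stands in for one. The function name `flag_anomalies`, the 3.5 threshold, and the sample order totals are all assumptions made for illustration.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the indices of values that look like outliers.

    Uses the modified z-score, which is based on the median absolute
    deviation (MAD). The 3.5 cutoff is a commonly used rule of thumb,
    chosen here as an assumption, not a value from the interview.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No measurable spread, so this rule cannot score anything.
        return []
    return [i for i, dev in enumerate(abs_dev)
            if 0.6745 * dev / mad > threshold]

# Hypothetical order totals with one suspicious entry.
order_totals = [52.0, 48.5, 50.1, 49.9, 51.3, 5000.0, 47.8]
print(flag_anomalies(order_totals))  # prints [5]
```

Running a check like this on every incoming batch, rather than once, is one modest way to approximate the continuous monitoring the speaker has in mind.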
Automated profiling and anomaly detection is just one ongoing way that you keep your data clean. You can also do intelligent data matching and deduplication. Machine learning algorithms can improve the accuracy and efficiency of data matching and deduplication processes. Fuzzy matching algorithms, as most people know, can identify and merge duplicate records even when they contain slight variations or errors, and AI can learn from human decisions on matching to improve its accuracy over time. Those techniques are particularly valuable for maintaining clean customer databases or merging data sets from multiple sources.
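As a rough illustration of fuzzy matching for deduplication, here is a minimal Python sketch built on the standard library’s `difflib.SequenceMatcher`. It is an assumption-laden example rather than the speaker’s method: the helper names, the 0.85 similarity threshold, and the sample customer names are hypothetical, and the threshold is fixed rather than learned from human matching decisions.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    The threshold is a hypothetical starting point; in practice it
    would be tuned against matches that people have reviewed.
    """
    kept = []
    for record in records:
        if not any(similarity(record, k) >= threshold for k in kept):
            kept.append(record)
    return kept

# Hypothetical customer names with spelling and formatting variations.
customers = ["Jon Smith", "John Smith", "Maria Garcia",
             "john  smith.", "Wei Chen"]
print(deduplicate(customers))
```

Comparing every record against every kept record is fine for a small pilot data set, but it scales poorly; larger data sets usually need a blocking step first so that only plausible pairs are compared.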
David Arturi: Okay, awesome. Yeah, that makes sense. Thank you. Just to go back to what you started with, I know you wanted to help a lot of firms figure out where to get started, and these are firms of all sizes. So I guess the question, and it’s something companies are always asking us, is: how do I get started? Where do I get started? What does that look like? What do you typically advise?
Tiffany Perkins-Munn: So, you raise an excellent point about the ubiquity of data and the challenges of data cleansing, and I think prioritizing where to start is crucial for effective data management. In terms of the guidance that I would offer, I wouldn’t say this is a stair-step method; I would say, choose the method that works for you. But mainly, you start with a business-driven approach rather than trying to clean all the data at once. You focus on the data that directly impacts your most critical business process or decision, and that will ensure that your efforts have immediate, tangible value. You could also assess data criticality and impact. You could evaluate your data sets based on things like how frequently they’re used, their impact on key business decisions, the potential cost of errors in this data, and regulatory requirements or compliance needs. That’s another method that you might use to determine where you get started. Obviously, everyone knows about the 80/20 rule: 80% of the value comes from 20% of your data, so identifying that crucial 20% and prioritizing it for cleansing could be another way. And I also think people should not try to boil the ocean. So, begin with a pilot project. Select a manageable data set that’s important but not mission critical. That will allow you to refine your processes and demonstrate value before tackling more sensitive data.
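The criticality assessment described above can be sketched as a simple weighted score in Python. Everything specific here is an assumption for illustration: the 1 to 5 rating scale, the weights, the `DatasetProfile` class, and the example data sets are hypothetical, and only the four criteria themselves come from the interview.

```python
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    """Ratings from 1 (low) to 5 (high) for one data set."""
    name: str
    usage_frequency: int   # how often the data is used
    decision_impact: int   # impact on key business decisions
    error_cost: int        # potential cost of errors in this data
    regulatory_need: int   # regulatory or compliance requirements

# Hypothetical weights; each organization would set its own.
WEIGHTS = {
    "usage_frequency": 3,
    "decision_impact": 3,
    "error_cost": 2,
    "regulatory_need": 2,
}

def priority_score(profile):
    """Weighted sum of the four ratings."""
    return sum(w * getattr(profile, field) for field, w in WEIGHTS.items())

def rank_for_cleansing(profiles):
    """Order data sets from highest to lowest cleansing priority."""
    return sorted(profiles, key=priority_score, reverse=True)

# Hypothetical data sets and ratings.
profiles = [
    DatasetProfile("customer_contacts", 5, 4, 3, 4),
    DatasetProfile("web_clickstream", 4, 2, 1, 1),
    DatasetProfile("loan_applications", 3, 5, 5, 5),
]
for p in rank_for_cleansing(profiles):
    print(p.name, priority_score(p))
```

A ranking like this is only a starting point for discussion. For a pilot, the speaker’s advice suggests picking a data set that scores well but is not the most sensitive one on the list.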
David Arturi: I think that’s a great point. Oftentimes we see companies try to boil the ocean, and they invest all this money and the return’s not there, and then they’re like, what did we just do?
Tiffany Perkins-Munn: And people who want to go right into AI/ML could focus on the data that feeds into AI/ML models. If you’re using those models, you can prioritize cleaning the data that they rely on, because clearly, with poor data quality, you know, it’s garbage in, garbage out.
David Arturi: Exactly. It’s funny, that comes up in almost every conversation one way or another: garbage in, garbage out.
Tiffany Perkins-Munn: Well, the idea, I think, is just to remember that it’s an ongoing process. It’s not a one-time event. You start with a focused approach, you demonstrate value, and then you gradually expand your efforts.
David Arturi: Right. I think that’s an interesting part of this; at least from our experience, there’s a lot of internal marketing that has to go on with using these AI tools and ML and all this kind of stuff. You have to prove value up the ladder. So, to your point, I think it’s critical to find that right use case, start small, prove it works, and then get the buy-in internally, because when you try to do it all at once, it’s too much, it’s too big of a bite. So, I’m going to move on to the next question here. Preparing and blending data from diverse sources is critical for accurate and holistic customer insights. Could you please shed some light on some of the internal or external sources organizations typically encounter?
Tiffany Perkins-Munn: There’s an extensive list, but I’ll just give you some of the examples that I’ve come across over the course of my career. The most obvious internal data sources are customer relationship management (CRM) systems: contact information, purchase history, customer service interactions, sales pipeline data. There are also ERPs, enterprise resource planning systems, where you can get information about order details, inventory levels, shipping information and financial transactions, and point of sale systems, where you get transaction details, product preferences and in-store behavior data. Then there are some of the most obvious, like website analytics. Who’s viewing the page? Who’s clicking through? How much time are they spending on the site? Are they doing the call to action and converting? Same idea with mobile app usage data. Are people engaging with the app? Are they doing in-app purchases? What features are they using, and how are they using them? There are customer support logs: people call in, so there are support ticket details, chat logs, call center recordings. There’s our old faithful, email, obviously, with open rates, click-through rates and subscription preferences. Many firms and organizations have loyalty programs. So how are people accruing points? How are they redeeming? What are some of the members’ preferences? What are the program engagement levels? There are feedback surveys, like customer satisfaction surveys, net promoter score, which everyone’s big into, and product feedback. Then there are the external sources. There are the obvious ones, like the social media platforms. How often is the brand mentioned? There’s sentiment analysis and user-generated content. Are you engaging with influencers? There’s lots of different third-party research, like industry reports, competitor analysis or information on market trends.
Public records, like census data, property records, business registrations, things like that. Credit bureau information, like credit scores, financial history and debt levels. Weather data, geolocation data. Economic indicators, like GDP growth, unemployment, consumer prices. And just thinking broadly, news and media, and government open data, like public health statistics or transportation data. I say all of that to say, without being exhaustive, that there is a myriad of internal and external data that you can use, and you have to be creative about it. Now, will all that data speak specifically to your customers? You never know, right? Because there are challenges in blending all of that data. Things like data format inconsistencies. How often is the data updated? If you get external data, and one set of data is updated quarterly and the other set is updated yearly, you’re going to have discrepancies. I think there’s also privacy, which is a big issue in the data space: privacy and compliance concerns around bringing in these different data sets. There’s also a common issue that a lot of people tackle, which is, how do you blend structured and unstructured data? How do you handle real-time and batch data? This is the Holy Grail. Let’s say you have information on one person. I have all this information, David, that I just mentioned to you, internal and external, and half of the information is conflicting and comes from conflicting data sources. Blending it requires robust data integration strategies. You must have advanced analytic capabilities and a clear understanding of data privacy regulations and ethical considerations.
David Arturi: Thank you, Dr. Perkins-Munn, for joining me today and sharing your very valuable insights. I look forward to catching up with you soon; hopefully, we’ll talk in the near future. And thank you to everyone who listened to our discussion today. To learn more about Lydonia, please visit Lydonia.ai. And for more interviews like this one, please visit CDOMagazine.tech. Thank you all so much, and we’ll see you soon.