Understanding the Importance of Data Health in Salesforce

Today on the Salesforce Admins Podcast, we talk to Mehmet Orun, GM and Data Strategist at PeerNova. Join us as we chat about why data health is easier than you think and what you can do to get started.

You should subscribe for the full episode, but here are a few takeaways from our conversation with Mehmet Orun.

Healthy data drives business outcomes

We talk a lot about getting your data ready for AI, but there’s a simpler question you need to ask yourself: is your data driving business outcomes? After all, AI insights are only as good as the data they’re based on.

That’s why I’ve been looking forward to this episode with Mehmet Orun. He recently gave a presentation about all this and more, entitled “Harnessing AI: Strategic Planning & Data Best Practices for Salesforce Success,” and I was able to grab him for a quick conversation about how you can improve data health in your org.

Questions for a foundational data health check

If you’re cooking, you want to make sure that you have the basic ingredients and enough space on your countertop. And the same is true with your org. You need to have your data health squared away before you can cook up something tasty.

For Mehmet, a foundational data health check starts with asking three questions:

  1. Do you have any objects that are close to or past their limits?
  2. Are you retaining too much data in your CRM that you don’t use?
  3. Do you have unintentional duplicates in your solution and do you know where they come from?

You want to zero in on which data matters for which specific business need. You don’t need it to be perfect, you just need a solution that is good enough to do what you want it to do.

How to get started with data cleanup

Every org is going to have some duplicates, and Mehmet recommends thinking through a few things about how data works in your business before you merge everything. Is there a business reason to have duplicate records? Do you have other information in objects or fields that can help you decide whether to match or merge?

Above all, Mehmet wants you to know that achieving good data health in your org isn’t as difficult or time-consuming as it sounds. There are free data profiling tools on AppExchange that can help you get most of the way there. So what are you waiting for?

There’s a lot more great stuff from Mehmet about what to look for when you’re doing a data health checkup, so be sure to listen to the full episode. And don’t forget to subscribe to hear more from the Salesforce Admins Podcast.

Full show transcript

Mike:
We talk a lot about data readiness and getting ready for AI, but let’s take a step back. Is your data really driving business outcomes? So that’s what we’re going to talk about today on the podcast, and I am bringing in Mehmet Orun, who is the GM and Data Strategist at PeerNova. I mean, just looking through his LinkedIn profile, he has a ton of publications and a ton of patents. I actually don’t think I’ve ever had anybody on the podcast that has had patents. And I should have asked him about that. So spoiler, I don’t ask him about patents. But we’re going to talk about getting your data ready to drive business outcomes. You know what? Even if you’re not ready to use AI, this is still a good podcast for it. So with that, let’s get Mehmet on the podcast. So Mehmet, welcome to the podcast.

Mehmet:
Thank you, Mike. It’s a true pleasure to be here.

Mike:
Yeah, well, you ran into a colleague of mine at World Tour London. And well, I mean everybody’s talking about AI and you’re talking about AI and data. But before we get into that, why don’t you give me a little bit of a brief history of how you got into the Salesforce ecosystem?

Mehmet:
So before I was a partner, I was a Salesforce employee. Before I was an employee, I was a customer. I worked for Genentech, which is a biotech company, for a period of time. And what was interesting about Genentech was our CEO was a scientist. We looked at problems like they were clinical trials. You formed a hypothesis. In a safe way, you chose to assess if that hypothesis was going to be true or not. And then we would look at how can we solve it at greater scale. What that meant was when we were getting ready to launch a new set of products, and the enterprise architecture was going to be shifting from 150 or so disconnected applications, this is 20 years ago by the way, and the story today may sound much the same for many customers and companies, we wanted to bet on a new CRM solution, rather than the homegrown or the older technology ones.
And Genentech became the first life sciences company to choose Salesforce. Because the idea of not needing to spend time just working on an upgrade, rather than solving business problems, made a lot of sense to us. There were a few challenges, like a contact model didn’t really work for life sciences, because we are really engaging with a doctor or a prescriber who may teach at a university hospital, they may see patients at a different facility, they may have their own practice. By the way, this is why person account was born. If you’re curious about the trivia, happy to dive into the details.

Mike:
You need one of those shirts. “I’m the reason person accounts exist.”

Mehmet:
Yeah, I’m not sure how popular it may be, but maybe I’ll submit to shirt force.

Mike:
Yeah, you never know. Might try. So I’ve had Liz Helengo on the podcast, we’ve talked about data quality. And you have a great presentation out there, Best Practices for AI Ready Salesforce Data. Do you think people’s Salesforce data is AI ready?

Mehmet:
From what I have seen, and I do engage with many organizations still, neither the data nor metadata is AI ready, the vast majority of the time. Now the question of readiness is interesting because it depends on how far you want to go. What is it that you’re trying to solve or accomplish? If you just want to see if you can get recommendations, it’s a proof of technology, great. You can definitely use it. If you’re trying to get consistent answers based on reliable data, and make sure it is behind the trust layer, at a minimum, organizations need to do an assessment of the current state of their data and metadata, and make sure that their architecture is going to meet their needs, not just today, but on an ongoing basis.

Mike:
One of the questions that you ask, and I think this is pretty paramount, because anytime we talk about data and data cleanliness, it’s, “Oh, I’ve got to look at everything.” And there could be some objects that have 200 or 300 fields on them. Lord knows why, but there’s a lot of fields, right? Because we’re capturing everything. One of the things that you point out is, how do I know if it’s good enough data to drive business outcomes? And I think that second part, that clarifying part, is really important. Because when we’re looking at data, yes, we need to look at everything. But what is the data that we really need to have perfected to drive a business outcome? So what should admins be looking at?

Mehmet:
Before diving into data that matters to business outcomes, one of the things I suggest is what is the foundational data health of your org in general? And I use cooking or dance analogies. Usually I’m [inaudible 00:05:27].

Mike:
I use cooking analogies too.

Mehmet:
Great. So if I’m getting ready to cook a big meal, I want to make sure I have the right ingredients, and the ingredients I have are also fresh. They haven’t passed their expiration date. I want to make sure that I have enough space on my countertop. Not everything has to be cleaned, not everything has to be put away. I don’t need to have every single ingredient up there, but I need to have just enough. So when I say a foundational data health check, we should always know, do we have any objects that are close to or past their limits? You mentioned 200 or 300 fields. I have seen 900 custom fields, which is the upper limit.

Mike:
I was trying to be nice.

Mehmet:
The Salesforce platform is incredibly flexible. We can add packages from AppExchange, which we install [inaudible 00:06:21] custom fields at times. And then after a while processes change, people change, new people come in, we stop using fields that we used to. Or perhaps fields were added, but we weren’t quite sure what they were going to be. User adoption had gaps. I think you can find many parts of this, but if your org is more than five years old, your foundational objects (account, contact, case, opportunity) probably have 25% of custom fields or more that haven’t been used in the last one or two years. So one aspect of foundational data health is about understanding whether any of your objects are nearing or at their limits. Number two is, are you retaining too much data in your CRM org? Because that is going to be part of what data you want to act on.
If you have rubbish data or if you have data that has outlived its usefulness, archiving solutions are great. And the third piece to be mindful of is, do you have unintentional versus intentional duplicates in your solution? Just looking at those three areas is going to give you a sense of data consistency, data completeness, and data relevance risks. Once we look at that, then it is a matter of looking at what the fields are that matter, what the data is that matters to a specific business need at a point in time. I’m happy to dive into more details, but do you have any questions on the foundational data health outline I just gave?
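
Editor’s sketch:
The first check Mehmet describes (objects near their limits, fields nobody has used recently) can be roughed out on exported records. This is a minimal Python sketch, not a Salesforce API call: the 900-field limit is the figure quoted in the conversation, and the field names and sample rows are hypothetical.

```python
from datetime import date, timedelta

# Upper limit quoted in the conversation; confirm the real limit for your edition.
CUSTOM_FIELD_LIMIT = 900


def near_limit(custom_field_count, limit=CUSTOM_FIELD_LIMIT, threshold=0.9):
    """True when an object uses at least `threshold` of its custom-field allowance."""
    return custom_field_count >= limit * threshold


def stale_fields(records, fields, today, years=2):
    """Return fields that are blank on every record created in the last `years` years."""
    cutoff = today - timedelta(days=365 * years)
    recent = [r for r in records if r["CreatedDate"] >= cutoff]
    return [
        f for f in fields
        if not any(r.get(f) not in (None, "") for r in recent)
    ]


# Hypothetical exported rows; the custom field names are made up for illustration.
accounts = [
    {"CreatedDate": date(2018, 3, 1), "Legacy_Code__c": "A1", "Region__c": "EMEA"},
    {"CreatedDate": date(2023, 6, 1), "Legacy_Code__c": "", "Region__c": "AMER"},
    {"CreatedDate": date(2024, 1, 15), "Legacy_Code__c": None, "Region__c": "APAC"},
]

print(near_limit(850))
print(stale_fields(accounts, ["Legacy_Code__c", "Region__c"], today=date(2024, 6, 1)))
```

A field that shows up as stale is only a candidate for review; it may still drive code or integrations, so it is worth confirming with the business before retiring it.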

Mike:
Well, I think you mentioned duplicates. So I’m an admin, I’m looking at my data, and I find duplicates. Where should I start talking to understand are these intentional and good, are these intentional and bad, do I need to deduplicate? What are the types of questions, who are the stakeholders that I should be looking at to understand if we should have duplicates in our system, let alone not even talking about looking at other systems?

Mehmet:
Yeah. When I talk to people, let’s say that you’re an admin for… You can make up a scenario.

Mike:
Sure.

Mehmet:
So how did you find out about the duplicate problem, and can you describe to me what problem these records are causing for your end users? The reason I start with that question is I am listening for the answer that tells me whether stakeholder impact is well understood, and what the nature of that impact is, which can really help drive the type of solution we could put in place. Time to value is something that’s going to be quite important, as well as seeking to avoid nonreversible fixes. Because many solutions are not going to be 100% right. Especially when it comes to match [inaudible 00:09:23] type scenarios.
A common challenge is, let’s say that it’s a call center operation and we have a lot of contacts, but the data is distributed, which means it may be out of date, information may be incomplete. I would often ask the question, “So what if, regardless of how many duplicates you have, every single record you click on shows the exact same transactional history? Would that solve your business need?” Or if it’s a marketing challenge and they are concerned about consent and compliance, and they are unsure about which of these values they should pick, I would ask a question, “That’s great. Do you have the policies in place on how you would approach these different related records?” And the question that I get incomplete answers to most often is, “Do you know why you have these duplicates, and if you are supposed to have some of these duplicates by design?”

Mike:
I can only imagine the look on people’s faces when you ask that.

Mehmet:
Well, their examples help. I’ve written a few articles on that and I sent people pictures, and asked them how this could relate to their line of business. One of the things I love about how Salesforce talks about solutions is they put a person in the middle surrounded by the icons of the era. Every industry can use that mindset and think about their interaction with an individual or with an organization. The reality is, whether you’re a nonprofit, whether you’re a consumer company, whether you’re a B2B company, you are likely to encounter the same individual or same organization in more than one business context.

Mike:
Yeah, very true.

Mehmet:
There’s a high risk in being overzealous in approaching duplicates. I worked for Salesforce in the past, I worked for Genentech in the past, I work for PeerNova now, I’m involved in the Trailblazer Community. I, Mehmet, as a single human being, have at least four different business contexts in my engagement and relationship. So if you try to combine and merge all four of the records into one, first off, which email address do you take or keep, given the nature of the CRM data model? But what if some of the interactions were contact specific, account specific? Are we going to introduce more risks, or would we be better off recognizing all of these records are associated with one person, and then use the contact record when it makes sense, use the individual record when that makes sense? Is the example helpful?
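
Editor’s sketch:
The idea of matching records without merging them can be illustrated with a small grouping step. This is a sketch under stated assumptions: `Person_Key__c` is a hypothetical field holding whatever identifier you trust to mean "same human being", and the record Ids are invented. Each contact keeps its own account context; the function only reports which records belong together.

```python
from collections import defaultdict


def link_by_person(contacts, key_field="Person_Key__c"):
    """Group contact Ids that share a person key, leaving every record intact."""
    groups = defaultdict(list)
    for contact in contacts:
        key = (contact.get(key_field) or "").strip().lower()
        if key:  # records without a key are left unlinked
            groups[key].append(contact["Id"])
    # Only keys shared by two or more records describe a linked person.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical contacts: one person seen in three business contexts.
contacts = [
    {"Id": "003A", "Account": "Genentech", "Person_Key__c": "mehmet-orun"},
    {"Id": "003B", "Account": "Salesforce", "Person_Key__c": "Mehmet-Orun"},
    {"Id": "003C", "Account": "PeerNova", "Person_Key__c": "mehmet-orun "},
    {"Id": "003D", "Account": "PeerNova", "Person_Key__c": "jane-doe"},
]

print(link_by_person(contacts))
```

Because nothing is merged, the step is reversible: if the key turns out to be wrong, the link is dropped and the underlying records are unchanged.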

Mike:
No, it is, because I think that’s what a lot of people run into, is you run reports and you look at the data very, I don’t want to say abstractly, but you try to look at it very black and white, and say, “Well, there is four Mehmets, so we should merge them. There shouldn’t be four.” But you bring up a very important point, is the associated, let’s say account for this person, really brings context to what you were discussing with that person at the time. Which is lost when you merge it all. Because to your point, all of those activities would just merge together, and it’s like it wouldn’t make sense. It’s like, so we talked to them one minute about partnering and the next minute about this, and it’s like, wait, why was this happening? And you’re losing the context of where this individual contact was employed at. So I think that’s important. Those are the questions that people have to have, is yes, that is one person four times, but the context of what was our relationship with them is very important.

Mehmet:
One of the other aspects is when we’re looking at the records. I think people jump into, “Oh, I know they are duplicates because they have the same name and email, or they have the same name and address, or they have the same name and LinkedIn profile. Whatever it may be.” It is incredibly important to look at the object as a whole, to look at the fields as a whole, for three reasons. There may be fields, record types, types, some other custom field for classification, that actually indicate this person, this organization is playing a different role. That may be the basis of what else to include in a match rule. So if the context is different, you may want to match them, but you may not want to merge them. Or you still want to match them, but you want to create a unified profile in Data Cloud.
Number two, there may be other fields that you can use, that increases matchability of that particular record. When I talk about account matching, I often say account matching is not a string matching problem. You are not trying to match Salesforce to salesforce.com or Salesforce Inc. What you are trying to do is understand Salesforce in San Francisco at 1 Market Street, which is the old address, is the same location as the new headquarters. Salesforce in Bishopsgate, London is part of Salesforce corporate hierarchy, but it’s a distinct entity and subsidiary. By the way, Slack in San Francisco, completely different name, is also a legitimate distinct but related account record.
If you don’t have the depths of the B2B domain, let’s say that you’re a new admin, but you profile your account object, you may discover there are other fields that are not standard fields. They were brought in by a managed package, let’s say D&B Connect or BvD Connect, but then you see fields like DUNS number, global ultimate DUNS number, that have a high population rate, but low distinct rate. Maybe you can use these fields as part of your match rules also, and discover that you have a lot more attributes at your disposal than just name and contact points.
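
Editor’s sketch:
"High population rate, low distinct rate" can be made concrete with two ratios per field. The sketch below is illustrative only: the definitions (populated values over all records, distinct values over populated values) are one reasonable reading of the terms, and `Global_Ultimate_DUNS__c` is a hypothetical field name, not a confirmed package field.

```python
def profile_field(records, field):
    """Return (population_rate, distinct_rate) for one field.

    population_rate: share of records where the field is not blank.
    distinct_rate:   share of populated values that are unique.
    """
    values = [r.get(field) for r in records]
    populated = [v for v in values if v not in (None, "")]
    population_rate = len(populated) / len(values) if values else 0.0
    distinct_rate = len(set(populated)) / len(populated) if populated else 0.0
    return population_rate, distinct_rate


# Hypothetical account rows; the DUNS values are placeholders, not real numbers.
accounts = [
    {"Name": "Salesforce", "Global_Ultimate_DUNS__c": "111"},
    {"Name": "Salesforce UK", "Global_Ultimate_DUNS__c": "111"},
    {"Name": "Slack", "Global_Ultimate_DUNS__c": "111"},
    {"Name": "Another Corp", "Global_Ultimate_DUNS__c": "222"},
]

for field in ("Name", "Global_Ultimate_DUNS__c"):
    print(field, profile_field(accounts, field))
```

In this sample the name is unique on every row, while the hypothetical global ultimate identifier is fully populated but repeats, which is the pattern that suggests a field can tie related accounts together in a match rule.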

Mike:
Right. Yeah, it’s really diving deeper.

Mehmet:
Absolutely. And the third and final reason, and I ask this question to everyone, “Mike, what is your favorite fake email address or phone number?”

Mike:
I can’t tell you.

Mehmet:
Without exception, every single org I’ve analyzed had either invalid or fake contact points in it. What is invalid? Maybe it is sales@companyname.com, or support@companyname.com. They didn’t have an email address, it was a required field. They just put a group email, or perhaps they put their own. na@na.com. noemail@noemail.com. If we do not discover the data content that may also throw off our match result, not only may we over-merge where the contacts needed to be separate, we may actually incorrectly match and merge accounts and contacts in a way they should never be. So we started this from duplicate management. I know the session is for data reliability and not just for AI. At the end of the day, we want to discover what is knowable with statistical techniques, with data profiling, as much as we can. And once we determine that, we want to define what an experiment would look like, how would we know for certain this is the outcome we’re looking for, and then drive it forward?
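
Editor’s sketch:
Screening contact points before they feed a match rule might look like the following. The placeholder and shared-mailbox lists are assumptions seeded from the examples in the conversation; a real org would build its own lists from profiling results, and the syntax check here is deliberately crude.

```python
# Local parts seen in placeholder addresses such as na@na.com (assumed list).
PLACEHOLDER_LOCAL_PARTS = {"na", "noemail", "none", "test"}
# Local parts that usually indicate a group mailbox, not a person (assumed list).
SHARED_LOCAL_PARTS = {"sales", "support", "info"}


def classify_email(email):
    """Label an email as 'invalid', 'placeholder', 'shared', or 'ok'."""
    if not email or email.count("@") != 1:
        return "invalid"
    local, domain = email.strip().lower().split("@")
    if not local or "." not in domain:
        return "invalid"
    if local in PLACEHOLDER_LOCAL_PARTS:
        return "placeholder"
    if local in SHARED_LOCAL_PARTS:
        return "shared"
    return "ok"


samples = [
    "na@na.com",
    "Noemail@noemail.com",
    "sales@companyname.com",
    "supportedcompanyname.com",  # missing "@"
    "jane.doe@example.com",
]
for address in samples:
    print(address, "->", classify_email(address))
```

Only values labelled "ok" would be safe to use as a match key; the others are better excluded from matching than deleted, since they still tell you how the field is being used.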

Mike:
Yeah. No, you’re right. One thing you bring up, and I’m going to ask, maybe it’s a bit of a facetious question, but I’d be curious what your answer is. Do most organizations have someone responsible for data quality?

Mehmet:
I think most organizations have someone that cares about data quality, but that doesn’t mean they’re necessarily responsible or empowered.

Mike:
What’s the difference?

Mehmet:
I have been in orgs where let’s say there’s a data quality manager, it’s an independent role, it reports to the business, sounds great, but it is outside of the org hierarchy where the CRM administrator is reporting into. Even if they get along, if the CRM administrator cannot act on requests unless they are associated with a specific project task, there tend to be delays or friction. Because I don’t see a lot of organizations saying we need to launch a data quality initiative. Most of the initiatives are business initiatives where data quality assessments, verification, and as-needed improvement should be a part of it. But if your job is to ensure data quality is good, if you are not authorized to be able to initiate projects that can then be prioritized, you may not even be able to get an AppExchange package installed in a quick and timely manner.
Now on the flip side, you may be an admin and you have the rights and you are close to the system. You may not know that there are tools and techniques out there that help you discover whether that field that was so urgent that you just added and rolled out is being used at all. Tracking user adoption of fields being rolled out, picklist values being rolled out, is something admins ideally would and should do if they’re informed by effective techniques, and if their [inaudible 00:19:26] allow them to not just add a field but put in place the processes to monitor the usage of that field.
Honestly, one of the reasons I’m most excited to be on this podcast is to be able to talk about these things being not only possible, but fairly easy and not time-consuming. So we can broaden the conversation on how we make sure the good work admins put in is actually being impactful. And admins can be even more empowered to monitor what is being used, what is not being used, what is being used poorly. To be able to raise these to their stakeholders and drive that level of awareness, so they’re being more impactful on any line of business.

Mike:
No, I understand. Okay, so if I’m hearing this, depending on how big my backlog is and my requests are for new features, in your opinion, how much time should admins be spending ensuring data quality is happening in their org?

Mehmet:
I don’t know that I can answer that with number of days or percentage of time, as opposed to when should they look at their quality and act accordingly. Because each org is going to be a little bit different. One of the things I believe in, is if I’m a new admin, and you mentioned her earlier, she has a great LinkedIn post she did on what is the first thing you look at as an admin in a new org, and the answer is very broadly, “I like looking at, for the foundational objects, what can I tell about the usage in current plus one year versus the life of the object?” That’s a starting point data profiling scenario for me. And the reason is when I look at accounts, contacts, opportunities, and cases alone, or let’s throw in leads for good measure, it’s going to give me a sense of how well adopted this org is.
It’s a really good baseline. I want to know what fields are not used or no longer used, what fields appear to be used but not really used, because they only have the default value. The number of times I see 100% populated fields with one and only one value, is pretty significant. And to me that means it either is driving code somehow, or someone has set up a field with a default value and never looked at it again. I then look at what is my foundational health, and with the right tools on AppExchange, you can get much of these insights in a single business day. Then you have the ability to have a conversation with your manager, with your stakeholder, that is about starting the job and having an understanding of the foundational health. The other piece I look at, is if I’m starting on a new project, and my role as an admin is supporting the needs of that project, I’m going to focus on a scenario that is specifically for that.
Maybe we have HR cases, customer cases, and partner cases in our org, but this project is just about customer cases. I’m going to want to look at what can I tell about the cases that are coming in that have closed successfully or unsuccessfully, whatever is the definition for my business. And I want to look at the fields that are being consistently populated with high fidelity, and then compare the difference between successful and unsuccessful outcomes. The way admins can minimize the amount of time they are spending analyzing data is this: reports are great, but creating reports just on fill rates is incredibly time-consuming and not scalable. There are great free data profiling tools that are 100% native on AppExchange. Start with one and start running different scenarios to see what you can tell about the state of your data. And then the best way to make sure that you don’t have to keep checking is to set up a monitoring scenario.
Salesforce CFJ has been talking about the importance of profiling, cleanup, and monitoring for a long time. When I go to roundtables, I see almost no one monitoring their data reliability. And with Flow, with the right profiling tools, it is something you can very easily configure, and detect deviations, whether your fill rates are going down, or you used to capture an active picklist value and it’s no longer being picked up. Send a targeted alert based on understanding the fields that matter to a particular outcome. And I think three to six months from the beginning of this journey, people are going to start noticing a higher level of either user or admin engagement.
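
Editor’s sketch:
The monitoring idea (detect when a field’s fill rate drops) comes down to comparing two snapshots. This Python sketch is not a Flow or an AppExchange tool; the 10-point threshold, the field names, and the sample cases are all assumptions chosen for illustration.

```python
def fill_rate(records, field):
    """Share of records where `field` is not blank (0.0 for an empty list)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def fill_rate_alerts(baseline, current, fields, max_drop=0.10):
    """Return (field, before, after) for fields whose fill rate fell by more than max_drop."""
    alerts = []
    for field in fields:
        before = fill_rate(baseline, field)
        after = fill_rate(current, field)
        if before - after > max_drop:
            alerts.append((field, before, after))
    return alerts


# Hypothetical case snapshots from two periods.
last_quarter = [
    {"Status": "Closed", "Reason": "Billing"},
    {"Status": "Closed", "Reason": "Login"},
    {"Status": "Closed", "Reason": "Billing"},
    {"Status": "Closed", "Reason": "Shipping"},
]
this_quarter = [
    {"Status": "Closed", "Reason": "Billing"},
    {"Status": "Closed", "Reason": ""},
    {"Status": "Closed", "Reason": None},
    {"Status": "Closed", "Reason": "Login"},
]

print(fill_rate_alerts(last_quarter, this_quarter, ["Status", "Reason"]))
```

Limiting the `fields` list to the ones that matter for a specific outcome keeps the alerts targeted, which is the point Mehmet makes about sending alerts only for fields tied to a business need.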

Mike:
Yeah. I also like that you point out the idea of a data owner. I think that’s important. That’s something that admins, when they’re meeting with stakeholders, can sit down and really kind of empower one or two, maybe multiple people, within a team, along with the stakeholder, to really kind of be the overseer of that data. And these can be that next level admin, maybe people that are looking to move up in the organization, and take ownership of that. I think that’s a powerful idea.

Mehmet:
And the nice thing about what you mentioned, is it could be an admin who wants to increase their scope and impact. It could be somebody in a line of business. At Salesforce, for example, the owner for account and opportunity fits within sales operations.

Mike:
Makes sense.

Mehmet:
Yeah, because that is where you’re going to be closer to it. And if I recall, the ownership for contact and lead sat in the marketing organization. Because that is where you’re wanting to make sure you have a holistic understanding from lead to contact, and you’re also being consistent and compliant. For shared entities or when you are starting new, an admin would make sense, especially admins that are close to their business, and know what data matters or not. It is about increasing impact. And for anyone that doesn’t know, you can capture data owner along with data sensitivity as part of your object manager and CRM metadata. A lot of people do not seem to know these attributes exist.

Mike:
Yeah. As we kind of wrap things up, we talked a lot about the doing. And sometimes ironically we get caught up on the doing, and we forget to actually look at what the goal is. And so how do you define success when you’re doing data cleanup? I mean, I’m sure there’s multiple ways to define it, but what are things that admins should look at in terms of creating that definition of success so that they can show progress to the stakeholders that they’re making their way towards AI ready data?

Mehmet:
I was lucky to have a mentor that would say, “Unless you can define how you’re going to demonstrate success at the end of your project, you’re not going to start working on it.” Now, we don’t always have that luxury, but part of it is to be able to say what do we need to demonstrate differently. We started the conversation with duplicate management. And if people are seeing too many duplicates, and the concern is inconsistent data when they look at one record versus the other, perhaps the definition of success is they see consistent, complete, correct data. Which makes it not about merging anymore, by the way. It’s about data consistency and correctness, which is what is impacting the end users. And if you think about it that way, we can now start talking about hiding records over time. Because everyone is already looking at information the same way, rather than taking the riskier task of merging records and then worrying about, “Can I unmerge?”
If it’s about an AI outcome, how would we know end users are going to be able to rely on the information? AI is not just one flavor of technology. We have deterministic solutions, we have probabilistic solutions, right? We have Einstein Discovery as well as Einstein Copilot. So at the end of the day, can we define a process that is human repeatable, to then demonstrate how this is being automated at scale? This is one of the things that AI is very good at. If it is going to be about judgment calls, an AI may or may not be as good at it, so we need to look at what is that feedback loop that can also be provided back to an admin. And sometimes data readiness is about having just the right data and just the right metadata you need for completeness’ sake. Einstein Copilot leverages the field description metadata in finding what fields to look at for information. Sensitivity classifications are also important.
And sometimes you need to add a few additional fields in order to inform what AI could do for you. Just last week, I gave a brief presentation on what if we can leverage Copilot to inform end users that while their opportunity close probability is at 75%, because as you move it along the stage, it updates the probability percentage, AI could tell you when that opportunity was actually at risk. It says, actually your risk of closing on time was 50%, 75%, whatever it may be. The idea of adding formula fields that most admins know about, to assess record level data quality, is something you can actually define, and then feed into your prompts, so you can look at information completeness at the record level, and then use that to inform your end users. The key message here is sometimes data completeness is about knowing what to remove, and sometimes it’s about knowing what to add. And it all has to be about specific business use cases and specific business outcomes at the point of customer engagement.
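
Editor’s sketch:
A record-level data quality score of the kind a formula field might compute can be prototyped in a few lines. This is a Python illustration of the logic, not formula-field syntax; the list of fields that "matter" and the sample opportunity are hypothetical and would come from the specific business outcome being supported.

```python
def completeness_score(record, required_fields):
    """Percentage (0-100) of the required fields that are populated on one record."""
    if not required_fields:
        return 100
    filled = sum(1 for f in required_fields if record.get(f) not in (None, ""))
    return round(100 * filled / len(required_fields))


# Hypothetical fields that matter for forecasting, and one sample opportunity.
FORECAST_FIELDS = ["Amount", "CloseDate", "NextStep", "Primary_Contact__c"]
opportunity = {
    "Amount": 50000,
    "CloseDate": "2024-09-30",
    "NextStep": "",
    "Primary_Contact__c": None,
}

print(completeness_score(opportunity, FORECAST_FIELDS))
```

A score like this could sit beside the stage-driven probability so that users, or a prompt, can see when a record looks confident on paper but is missing the information that confidence should rest on.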

Mike:
Yeah. Oh, there’s never a simple answer, is there?

Mehmet:
Rarely, and I think this is what makes this a fulfilling journey. None of us have all the answers, but there are positive patterns and anti-patterns out there. I love reading the admin blog and listening to this podcast, I love reading articles on Salesforce Ben, going to Trailblazer community events, and in-person get-togethers. Because we share stories, we complain, but then we make suggestions on, “Have you considered this way of approaching it?” And this is how we keep learning and how we keep being better.

Mike:
Yeah, I would agree. I think it’s much like when earlier this year I talked to David about puzzle solving, and sometimes it’s like you literally just have to sit down, put the puzzle down, give your brain a break, and then come back to it refreshed, and with a different perspective, and that changes everything. So I would agree. Mehmet, thanks for coming on the podcast and talking about a different perspective to AI readiness, than what I’ve already covered. Because I feel there’s a lot to cover, so I appreciate you sharing your insights, and getting us hopefully AI ready.

Mehmet:
As you said, it is a journey. I hope this conversation helps all the listeners on what are some of the things to consider, right? It is not visual, we are not pointing to a roadmap. Much of this is really a mindset. And if anyone is curious about furthering the conversation, I am happy to be a part of that conversation. Feel free to reach out to me.

Mike:
So that was a fun conversation with Mehmet. I love the idea of a data owner. I don’t know why I haven’t thought of that. Somebody that works with the stakeholders in every department, and kind of owns the data, right? It’s like when you get a puppy, making sure that somebody is always going to keep their bowl of kibble full.
I guess the kibble is the data in this scenario. That’s the best I can come up with, but I really like that idea. I think, as Salesforce administrators, when we’re doing our quarterly check-ins with our stakeholders and talking about business objectives, that’s something we should start bringing up, and really having that conversation even with the larger organization, as we branch out and maybe bring in Data Cloud, and have the conversations with IT. Data owners. That’s the next thing we need to be talking about. But anyway, if you love this episode, and I did, I thought it was great. Because it’s more than just really reducing duplicates and figuring out good data and bad data, as you heard. But let’s go ahead and just share this episode. You do me a favor, just share it. Just click share in whatever podcasting app you’re listening to, and then that way you can send it to your friends who are maybe thinking about doing some data stuff.
I promise you everybody’s doing data cleanup. Now, Mehmet mentioned some things and some links. I’ll be sure to put those in the show notes as always. And of course, if you enjoyed this episode, there’s tons more episodes. Everything can be found at admin.salesforce.com, which is just your one stop for everything Salesforce admin, including a transcript of the show. Now, if you want to join the conversation, there is the Admin Trailblazer group, and that of course is in the Trailblazer community. Of course, the link is in the show notes there. So with that, until next week, all of you data fans, I will see you in the cloud.

 

 
