Spring ’15 introduced Duplicate Alerts and Blocking, powered by Data.com. (Check out The Essential Guide to Duplicate Alerts and Blocking). We had a chance to sit down with Andy Louca at Thomson Reuters to discuss their work around duplicate management. Andy and team participated in the pilot and beta of the Data.com Duplicate Alerts and Blocking and had some great insights to share on how they’re approaching this challenge.
The Duplicate Alerts and Blocking capability is just one of a number of approaches their team uses to manage data quality. Andy looks at it in terms of “creating an ecosystem around duplicate management.” As different scenarios have different needs their team leverages a number of tools, workflows, and processes to ensure overall quality.
Here are some of the tips Andy shared on duplicate management.
1. Know the scope of your problem
Your first step is knowing the scale of the problem. Andy describes it as going in “with your eyes open.” Get consensus throughout your business on which attributes matter most. You need all groups involved to agree on: what the minimum amount of data is; what counts as an active contact; and what distinguishes a lead from a contact. And remember, as Andy says, “what the business considers important is likely to change over time.”
With an agreed-upon standard in place, you can approach the challenge as their team did: test the data to see where the greatest problems actually are. Your investigation should drive what you tackle first. It’s important not to make those decisions based solely on the perceptions of those involved; make them based on the results of your testing.
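One way to run that kind of test is to profile your data and count how many records collide on the attributes the business agreed matter most. Here is a minimal sketch in Python; the record structure and field names (`Name`, `PostalCode`) are illustrative assumptions, not Thomson Reuters’ actual schema:

```python
from collections import Counter

def profile_duplicates(records, key_fields):
    """Count how many records share each candidate key.

    records: a list of dicts, e.g. loaded from a CSV export of accounts.
    key_fields: the attributes the business agreed are most important.
    Values are lowercased and stripped so trivial variations still collide.
    """
    counts = Counter(
        tuple((rec.get(field) or "").strip().lower() for field in key_fields)
        for rec in records
    )
    # Keys that appear more than once flag potential duplicates.
    dupes = {key: n for key, n in counts.items() if n > 1}
    extra_records = sum(n - 1 for n in dupes.values())
    return dupes, extra_records

# Hypothetical sample data for illustration.
records = [
    {"Name": "Acme Corp", "PostalCode": "10001"},
    {"Name": "ACME Corp ", "PostalCode": "10001"},
    {"Name": "Globex", "PostalCode": "60601"},
]
dupes, extra = profile_duplicates(records, ["Name", "PostalCode"])
# The two Acme rows collide on the same normalized key.
```

Running this over each candidate key set shows where duplication actually concentrates, which is the evidence the team used to decide what to tackle first.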
2. Clean, and keep cleaning
Cleaning is an ongoing process. Your data may come from many places (acquisitions, purchased lists, manual data entry, etc.), and data governance may have varied over time. Look at data quality holistically: it’s not just about duplicates; it’s about architecture, governance, and cleanup. Tools like Data.com help with data cleaning and enrichment. Thomson Reuters leverages the D&B D-U-N-S® number, provided by Data.com, to help maintain consistency, and that consistency helps with resolving duplicates.
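The value of a shared identifier like the D-U-N-S number is that records describing the same company can be grouped together regardless of how the name was typed. A minimal sketch of that grouping, assuming a hypothetical `DunsNumber` field on each account record:

```python
from collections import defaultdict

def group_by_duns(accounts):
    """Group account records by D-U-N-S number so records that share
    an identifier can be reviewed together as likely duplicates.

    The 'DunsNumber' field name is an assumption for illustration.
    Accounts with no D-U-N-S value each get their own bucket.
    """
    groups = defaultdict(list)
    for acct in accounts:
        duns = (acct.get("DunsNumber") or "").strip()
        key = duns if duns else f"no-duns:{id(acct)}"
        groups[key].append(acct)
    return groups

# Hypothetical sample: two name variants share one D-U-N-S number.
accounts = [
    {"Name": "Acme Corp", "DunsNumber": "123456789"},
    {"Name": "Acme Corporation", "DunsNumber": "123456789"},
    {"Name": "Globex", "DunsNumber": ""},
]
groups = group_by_duns(accounts)
```

Matching on a stable identifier sidesteps fuzzy name comparison entirely, which is why enrichment that fills in the identifier makes later duplicate resolution easier.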
3. Stop duplicates
Your rules need to reflect your organization’s needs, but here’s an idea of the rules Thomson Reuters is leveraging. For accounts, they match on name and postal code. Leads have two matching rules: the first is based on email; the other helps maintain quality for data coming from the marketing automation system and relies on first name, last name, and company name. For contacts, they leverage a unique email field hidden in the system, kept separate from their workflow around contacts.
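In Salesforce these rules are configured declaratively through Duplicate Management setup, but the matching logic they describe can be sketched in code to make the behavior concrete. The sketch below is an illustrative approximation of the rules as described, not the product’s actual matching engine, and the field names follow standard Salesforce object fields:

```python
def normalize(value):
    """Case- and whitespace-insensitive comparison key."""
    return (value or "").strip().lower()

def account_match(a, b):
    # Accounts: match on name plus postal code.
    return (normalize(a.get("Name")) == normalize(b.get("Name"))
            and normalize(a.get("PostalCode")) == normalize(b.get("PostalCode")))

def lead_match(a, b):
    # Rule 1: same email address.
    email_a = normalize(a.get("Email"))
    if email_a and email_a == normalize(b.get("Email")):
        return True
    # Rule 2 (for marketing-automation data): first name, last name,
    # and company name must all be present and equal.
    fields = ("FirstName", "LastName", "Company")
    return all(
        normalize(a.get(f)) and normalize(a.get(f)) == normalize(b.get(f))
        for f in fields
    )

# Example: a case-only difference in email still matches under rule 1.
is_dup = lead_match({"Email": "jane@example.com"}, {"Email": "JANE@example.com"})
```

Note the second lead rule requires all three fields to be populated; otherwise empty values would spuriously match each other.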
4. Measure and adjust
The further you go into the data, the more difficult the cleaning process becomes. It’s important to measure how effective you are at preventing new duplicates and how effective your cleaning actions are on the historical data. Andy’s advice is not to get hung up on the exact percentages you’re targeting; it’s more important to know what the biggest problems are and build in flexibility. When you’re looking at two duplicate records, deciding which data survives can be difficult. Rather than getting rid of duplicates right away, their team looks at which records are actually being used before determining the right course of action.
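That “which record survives” decision can be encoded as a scoring heuristic. The sketch below is one possible approach under stated assumptions (prefer recent activity, then completeness); it is not Thomson Reuters’ actual survivorship logic, and the `LastActivityDate` field name follows standard Salesforce usage:

```python
from datetime import date

def pick_survivor(duplicates):
    """Pick which duplicate record should survive a merge.

    Heuristic (an assumption for illustration): prefer the record with
    the most recent activity, breaking ties by how many fields are filled.
    """
    def score(rec):
        last_activity = rec.get("LastActivityDate") or date.min
        filled_fields = sum(1 for v in rec.values() if v)
        return (last_activity, filled_fields)
    return max(duplicates, key=score)

# Hypothetical pair: same person, different activity and completeness.
a = {"Id": "1", "Email": "x@example.com", "LastActivityDate": date(2015, 1, 5)}
b = {"Id": "2", "Email": "x@example.com", "Phone": "555-0100",
     "LastActivityDate": date(2014, 6, 1)}
survivor = pick_survivor([a, b])
```

Here the more recently used record wins even though the other has more fields populated, which mirrors the idea of looking at which records are actually getting used before deciding.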
“People underestimate the importance of having clean data,” is a sentiment Andy shared that we see frequently. In Andy’s case they’re working across verticals, giving them a very broad dataset that can be challenging to “get your arms around,” and his instance of Salesforce is just one of many across the enterprise. As Andy puts it, “the goal is to create a Salesforce environment everyone can use.” While many admins focus on functionality required to support the business, data is the “foundational piece.” As Andy says, “if you don’t have the right data, the other things are impossible.”