3 Types of Fields to Exclude From Your Salesforce Einstein Predictions

By Ayori Selassi | November 16, 2018

If you’re reading this, you’ve already begun to blaze your trail with AI and Salesforce Einstein. You understand the business value of using machine learning and data to make your business more predictive. You have probably also identified a use case or two from the Big Book of Predictions that could provide immediate value to your business. So now it’s time to build your first predictive model. A predictive model produces predictions that fill in missing information, which can then support an action, recommendation, or decision.

Fortunately, Salesforce Einstein provides the tools to make creating predictions easy, but as an Admin, analyst, or developer, you still bring special domain expertise to these models. It’s your job to decide what goes into the model and what stays out.

You’ve heard the term “garbage in, garbage out” before, right? Well, the same is true of predictive models. If you train your model on bad data, its insights will be bad as well. So let’s review three types of fields you should be careful to exclude from your models to get the best predictions and insights.

(Photo by Andrew Worley on Unsplash)

Sensitive Information

At Salesforce, Trust is our number 1 value, and proper handling of sensitive customer data is critical to maintaining that trust. Regulations such as GDPR have been designed to ensure consumers’ data is protected. But regulations alone aren’t enough. Everyone has a role to play in protecting customer data, including those in charge of building and maintaining predictive models inside the CRM.

When building any predictive model, take care to exclude sensitive data. So what qualifies as sensitive data? It depends on your business, but in general you’ll want to exclude things like government-issued identification numbers; financial information (such as credit or debit card numbers, any related security codes or passwords, and bank account numbers); racial or ethnic origin; political opinions; religious or philosophical beliefs; trade-union membership; information concerning health or sex life, including an individual’s physical or mental health and the provision or payment of health care; and any other data that is deemed too sensitive to include in a predictive model.
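Einstein has you pick fields in its own setup, but before you get there it can help to screen a long list of field API names for likely sensitive ones. Here is a minimal sketch, assuming you have exported your object’s field API names into a plain Python list. The keyword sets and the function names (`tokenize`, `flag_sensitive`) are hypothetical and only illustrate the categories above; a keyword match is a first-pass flag, not a substitute for a human review of each field.

```python
import re

# Hypothetical keywords per sensitive category; adjust these to your own org.
SENSITIVE_KEYWORDS = {
    "government_id": {"ssn", "passport", "license", "national"},
    "financial": {"card", "cvv", "password", "bank", "iban"},
    "demographic": {"race", "ethnicity", "religion", "political", "union"},
    "health": {"health", "medical", "diagnosis", "sex"},
}


def tokenize(api_name):
    """Split an API name such as 'Credit_Card_Number__c' into lowercase words.

    Splits on underscores and on lowercase-to-uppercase boundaries, so whole
    words are compared (e.g. 'International' does not match 'national').
    """
    words = re.split(r"_+|(?<=[a-z])(?=[A-Z])", api_name)
    return {w.lower() for w in words if w}


def flag_sensitive(api_names):
    """Return {field: [categories]} for fields containing a sensitive keyword."""
    flagged = {}
    for name in api_names:
        tokens = tokenize(name)
        categories = sorted(
            cat for cat, keywords in SENSITIVE_KEYWORDS.items() if tokens & keywords
        )
        if categories:
            flagged[name] = categories
    return flagged


fields = ["Name", "AnnualRevenue", "SSN__c", "Credit_Card_Number__c", "Religion__c", "LeadSource"]
flagged = flag_sensitive(fields)
# {'SSN__c': ['government_id'], 'Credit_Card_Number__c': ['financial'], 'Religion__c': ['demographic']}
```

Matching whole words rather than substrings keeps false positives down, but it will still miss sensitive fields with unexpected names, so treat the output as a checklist for review.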

Legacy or Stale Data

If you’re lucky, you have a governance process in place that defines every field tracked inside Salesforce and notes whether it holds legacy data. This governance process would clearly state how often each field is updated and how long its data stays useful (its shelf life). But if you’re like most people, that governance process is still on a wishlist somewhere. Whomp, whomp… That means you’ll have to find other ways to identify legacy or stale data, because when it comes to creating predictions, stale data is bad data. And you don’t want bad, stale data from the past powering your predictions about the future.
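Without a governance process, one rough signal of a legacy field is that it was routinely filled in on older records but is mostly blank on recent ones. Here is a minimal sketch of that heuristic, assuming you have exported records as Python dicts with a `CreatedDate` stored as a `datetime.date`. The function names, the one-year window, and the 50% drop threshold are all hypothetical choices for illustration, and this is only one signal among several (field history and business knowledge matter too).

```python
from datetime import date, timedelta


def fill_rate(records, field):
    """Fraction of records where the field is populated (not None or empty)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def likely_stale(records, fields, as_of, recent_days=365, drop_ratio=0.5):
    """Flag fields whose fill rate on recent records fell well below the older rate.

    recent_days and drop_ratio are hypothetical defaults; tune them to your data.
    """
    cutoff = as_of - timedelta(days=recent_days)
    recent = [r for r in records if r["CreatedDate"] >= cutoff]
    older = [r for r in records if r["CreatedDate"] < cutoff]
    stale = []
    for field in fields:
        old_rate = fill_rate(older, field)
        new_rate = fill_rate(recent, field)
        if old_rate > 0 and new_rate < old_rate * drop_ratio:
            stale.append(field)
    return stale


# Hypothetical sample records for illustration only.
records = [
    {"CreatedDate": date(2015, 3, 1), "Legacy_Region__c": "West", "Industry": "Retail"},
    {"CreatedDate": date(2016, 6, 1), "Legacy_Region__c": "East", "Industry": "Banking"},
    {"CreatedDate": date(2018, 9, 1), "Legacy_Region__c": None, "Industry": "Retail"},
    {"CreatedDate": date(2018, 10, 1), "Legacy_Region__c": None, "Industry": "Energy"},
]
stale = likely_stale(records, ["Legacy_Region__c", "Industry"], as_of=date(2018, 11, 16))
# ['Legacy_Region__c']
```

A flagged field isn’t automatically legacy; it’s a prompt to ask the business whether anyone still maintains it.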

(Photo by Daan Mooij on Unsplash)

Leakers

Okay, this one is less intuitive but equally important, so read carefully: a “leaker” is a field that contains information that would not be available at the time of prediction, and training your model on it causes what’s often called data leakage. In other words, it’s a piece of data that only shows up after the question you are trying to predict has already been answered. Leakers are tricky because, on paper, they look like highly correlated, really good predictive signals. But they produce unrealistically accurate predictions during training, and those predictions don’t hold up on live records where the answer isn’t known yet. It’s like bringing an answer sheet into an exam. To avoid leakers, remove any fields or field values from your model that would not be known at the time of the prediction.

The good news is that, as the domain expert, you can figure out which data is relevant to the process you are trying to predict a lot more easily than someone who isn’t as close to the business. To do this, consider constructing a timeline for the process you are trying to predict, and mark any data that gets populated after the outcome is known as a leaker. For example, say you want to build a prediction for lead conversion. You know that the “converted timestamp” is set after a lead is converted, and that before conversion the field is always blank. That means you should exclude the converted timestamp from your model, so it can’t leak into your predictive model and corrupt your predictions.
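The lead-conversion example suggests a simple data check to back up your timeline: a field that is always blank when the outcome hasn’t happened, but filled in when it has, is a strong leaker candidate. Here is a minimal sketch, assuming records exported as Python dicts that mimic the Lead fields `IsConverted` (the outcome) and `ConvertedDate` (the “converted timestamp”). The function names and the `max_negative_fill` threshold are hypothetical.

```python
from datetime import date


def fill_rate(records, field):
    """Fraction of records where the field is populated (not None or empty)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def suspected_leakers(records, fields, outcome_field, max_negative_fill=0.0):
    """Flag fields populated when the outcome happened but (almost) never otherwise."""
    positives = [r for r in records if r[outcome_field]]
    negatives = [r for r in records if not r[outcome_field]]
    if not positives or not negatives:
        raise ValueError("need records with and without the outcome")
    leakers = []
    for field in fields:
        if (fill_rate(positives, field) > 0
                and fill_rate(negatives, field) <= max_negative_fill):
            leakers.append(field)
    return leakers


# Hypothetical sample leads for illustration only.
leads = [
    {"IsConverted": True, "ConvertedDate": date(2018, 5, 2), "LeadSource": "Web"},
    {"IsConverted": True, "ConvertedDate": date(2018, 7, 9), "LeadSource": "Event"},
    {"IsConverted": False, "ConvertedDate": None, "LeadSource": "Web"},
    {"IsConverted": False, "ConvertedDate": None, "LeadSource": None},
]
leakers = suspected_leakers(leads, ["ConvertedDate", "LeadSource"], "IsConverted")
# ['ConvertedDate']
```

This only catches fields that go from blank to filled. A field whose value changes at the outcome (for example, a status that switches to something like “Converted”) won’t show up here, which is why walking through the timeline yourself still matters.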

Legacy, sensitive data and leakers, oh my! Whew, that was a lot! But guess what? It was totally worth it because now you are a lot more prepared to blaze your trail with Einstein. So let’s get cracking! Check out trailhead.einstein.com for a whole host of trails to help you get started.

And be sure to check out the #BeAnInnovator adventure to build your own AI-powered app with Salesforce Einstein! You can read all about how to get involved right here.


ABOUT THE AUTHOR

Ayori Selassi

Ayori is a Product Marketing Manager for Salesforce Einstein (AI for CRM) at Salesforce, and a patent-holding inventor. She wakes up inspired every day to lead growth-driving go-to-market strategies, especially for Einstein Prediction Builder, and to lead keynotes, demos, and workshops that make you say “wow.” She leads partner strategy for Einstein, works closely with SIs and ISVs, and loves presenting to developers and admins just as much as to executives in the Salesforce Innovation Center. She is also the founder of BoldForce, an employee resource or affinity group (https://www.salesforce.com/company/equality/ohanas/) for employees of African and African-American descent.
