AI for Admins blog series post on Einstein Case Classification

AI for Admins: What You Need to Know to Make Einstein Case Classification a Success

Many Salesforce Admins are looking for ways to help scale their organization’s customer service experience. This can include finding solutions to increased case volume and optimizing agents’ time so they can solve customer issues faster, instead of just triaging them. One way to accomplish all of this is with Einstein Case Classification. So let’s take a look at what you need to know to make this a success at your organization.

What is Einstein Case Classification?

Einstein Case Classification (ECC) is part of the Service Cloud Einstein suite of products aimed at empowering your service agents, alongside Einstein Article Recommendations, Einstein Reply Recommendations, and Einstein Case Wrap-Up (ECC’s little cousin, becoming generally available in the Summer ’21 release). ECC is included, with some limitations, in base Enterprise Edition and Unlimited Edition licenses.

By leveraging historical cases in your org, ECC builds a predictive model that’s used to recommend, pre-populate, or directly save field values for incoming cases. It saves your agents precious time, which they can then dedicate to helping customers resolve actual issues. ECC fast-tracks the triaging and routing steps of your response, greatly improving both agent productivity and the customer experience. With high volume, agent productivity gains add up quickly. One of our largest customers, for instance, reported 34,000 hours saved in four months!

It’s really easy to turn on ECC in your org; this Trailhead module walks through the steps. You can also try it out using one of our scratch orgs here (select Classify Citizen Requests under Service Cloud Einstein). Follow the instructions, and in no time you’ll be classifying a subset of the citizen requests the city of Baton Rouge received through its 311 service! (Thanks to the city of Baton Rouge for making the original dataset freely available through its Open Data BR initiative.)

Screengrab showcasing how you can turn on Einstein Case Classification.

How does Einstein Case Classification work?

If you followed along in your org or in the scratch org with data from our friends in Baton Rouge, you saw that setting up ECC is a quick process. With a few clicks, you configured it to learn to classify the Case Type and Parent Type of incoming emails. Note that you can select up to 10 fields to be classified and set up filters for training and scoring. And that’s it!

Behind the scenes, ECC trains its predictive model on the closed cases from the last six months, and it needs at least 400 of them to build a model successfully. Three fields are used: the case subject and the case description are the inputs, and the field to be predicted is the label the model learns from. The subject and description go through a natural language processing pipeline: the language of the case is identified first, then generic and language-specific text processing is performed (stemming, dropping stop words, etc.). Finally, word frequencies and co-occurrences are computed for the processed text. A separate model is built for each field. In fact, for each field, multiple machine learning models are built with different parameter values, and the best one is automatically selected in a “tournament.” This tournament happens monthly because models are retrained every 30 days, so they stay up to date if your business changes.
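To make those text-processing steps more concrete, here is a toy Python sketch of a subject-plus-description pipeline. It is not Salesforce’s implementation: language identification is skipped (English is assumed), and the stop-word list and suffix-stripping “stemmer” are hypothetical stand-ins for real language-specific tools.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny English stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "my", "to", "of", "and", "in", "on", "it"}


def crude_stem(word: str) -> str:
    """Strip one common English suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(subject: str, description: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and stem the subject plus description."""
    text = f"{subject} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def features(tokens: list[str]) -> tuple[Counter, Counter]:
    """Word frequencies, plus counts of word pairs that co-occur in this case.

    Summing these Counters over many cases gives corpus-level statistics."""
    freqs = Counter(tokens)
    unique = sorted(set(tokens))
    pairs = Counter(combinations(unique, 2))
    return freqs, pairs


tokens = preprocess("Streetlight not working", "The streetlight on my street is broken.")
print(tokens)  # ['streetlight', 'not', 'work', 'streetlight', 'street', 'broken']
```

Note how “working” and “work” would end up as the same feature, which is the point of stemming: different surface forms of a word count as one signal.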

You can monitor performance via the built-in Performance Dashboard or, for finer-grained analysis, via the new Einstein Case Classification Tableau CRM (TCRM) Performance Dashboard.

Screengrab showcasing what a Performance Dashboard looks like.

What is the typical timeline to roll out Einstein Case Classification?

The amount of time varies, but the steps are the same:

Timeline of implementation for Einstein Case Classification.

Assuming you have enough volume and a clear picture of which fields would be most helpful to predict, setup is pretty quick. Some customers are up and running with recommendations for agents in production in just a couple of weeks, while moving to automation requires more time. I recommend getting the business stakeholders on board right away, and identifying which KPIs you want to track to assess the success of the rollout from the very start.

You may want to begin in a sandbox to get a feel for the tool and iterate on your model. Refining your model is a key step that you don’t want to rush. It’s common to identify some issues with the data that can impact model performance: dirty data, duplicates, or intricacies of your case taxonomy.

You’ll need to move to a Production org to get qualitative feedback from agents, start measuring actual accuracy of the predictions, and measure ROI. Most customers start with a limited Pilot deployment to, say, five to 10 agents for a month, before deploying more broadly. Listen carefully to these initial sets of users. ECC is designed for them, so it has to make their lives easier! Start in recommendation mode, and monitor accuracy and agent feedback for some time before setting up auto-triage.

Which fields to set up for auto-triage, and which thresholds to choose for “Select Best Value” and “Auto-Triage,” depend on your business process. They also depend on the cost of a misrouted case versus the cost of a case waiting in a queue, potentially for a long time, to be reviewed. The lower the thresholds, the more cases get recommendations or are auto-triaged, but the more cases are misclassified. The higher the thresholds, the fewer cases get recommendations or are auto-triaged, but those that do are more likely to be classified correctly. Most customers choose a relatively low threshold for “Select Best Value,” since the agent can always review and correct the value, and a higher threshold for “Auto-Triage,” where a misclassification can be more costly. Note that thresholds are set per field, so you can choose different thresholds for different fields depending on how they are used.
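To see this trade-off in numbers, here is a minimal Python sketch. It assumes you have, for one field, the model’s confidence and whether the prediction turned out to be correct for a set of historical cases. The class, function names, and action labels are hypothetical and only illustrate the logic; they are not an ECC API.

```python
from dataclasses import dataclass


@dataclass
class FieldThresholds:
    """Hypothetical per-field settings mirroring the two thresholds discussed above."""
    select_best_value: float  # at or above: pre-populate the value for the agent to confirm
    auto_triage: float        # at or above: save the value directly on the case


def action_for(confidence: float, t: FieldThresholds) -> str:
    """Decide what happens to one field's prediction on one case."""
    if confidence >= t.auto_triage:
        return "auto-triage"
    if confidence >= t.select_best_value:
        return "select best value"
    return "leave for agent"


def coverage_and_accuracy(preds: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """preds holds (confidence, was_correct) pairs from historical cases.

    Returns the share of cases at or above the threshold, and the accuracy within that share."""
    kept = [correct for confidence, correct in preds if confidence >= threshold]
    coverage = len(kept) / len(preds) if preds else 0.0
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return coverage, accuracy


history = [(0.95, True), (0.80, True), (0.60, False), (0.40, False)]
for threshold in (0.5, 0.9):
    print(threshold, coverage_and_accuracy(history, threshold))
```

Raising the threshold from 0.5 to 0.9 on this made-up history drops coverage from 75% to 25% of cases while accuracy on the covered cases rises, which is exactly the tension you weigh when picking each field’s thresholds.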

Once this Pilot phase has been completed successfully, most organizations expand their deployment relatively quickly, sometimes gradually turning on different markets or groups. They also start to look at setting up auto-triage so that predicted field values are directly written to the case when confidence is high, which allows for automated case routing following your existing routing logic, saving even more time!

How do I measure success?

Overall, ECC’s goal is to save your agents time, which will also save you money. If you have high volume, ECC will save you a lot of both. One of our largest customers, a major shipping company, recently measured that they saved €1 million in their first four months after deploying ECC! How did they figure that out? They used our new Einstein Case Classification TCRM Performance Dashboard.

Another screengrab example of the Einstein Case Classification TCRM Performance Dashboard showcasing how much time and money was saved.

The dashboard is free and available on the AppExchange, so feel free to give it a spin (a Tableau CRM license is required). Besides revealing to your business stakeholders the value you are getting from ECC, the dashboard will also help you iterate on your model by providing additional insights into its accuracy, showing where errors are made, and letting you slice and dice by dimensions of your choice (for example, different markets or time frames). We’re actively working on a second version of this dashboard, so feel free to ping me directly if you have feedback on how to make it even more useful for you and your business!

Dashboard view showcasing classification outcome and top values.

Dashboard view showcasing actual values vs. top recommendations.

In addition, most customers track a subset of the following KPIs to measure the impact ECC has on their service center. Measure each one both before and after the rollout so you can compare.

These KPIs should go down after rolling out ECC:

  • Average handle time to triage a case
  • Average time to resolution
  • Number of cases escalated
  • Number of case transfers

And these should go up:

  • Number of cases triaged or auto-triaged daily/weekly/monthly
  • Number of cases resolved within service-level agreements (SLAs)
  • Case data completion
  • Customer Satisfaction (CSAT)/Net Promoter Score (NPS)
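If you export these KPIs for a period before and a period after the rollout, a short script can check whether each one moved in the expected direction. This is a minimal Python sketch; the KPI keys are hypothetical names you would map to your own reports. NPS is left out because it can be negative, which makes a percent change misleading.

```python
# Expected direction for each KPI, taken from the lists above. The keys are
# hypothetical names; map them to whatever your reports call these metrics.
EXPECTED_DIRECTION = {
    "avg_triage_handle_time": "down",
    "avg_time_to_resolution": "down",
    "cases_escalated": "down",
    "case_transfers": "down",
    "cases_triaged": "up",
    "cases_within_sla": "up",
    "case_data_completion": "up",
    "csat": "up",
}


def compare_kpis(before: dict[str, float], after: dict[str, float]) -> dict[str, tuple[float, bool]]:
    """Percent change per KPI and whether it moved in the expected direction."""
    report = {}
    for kpi, direction in EXPECTED_DIRECTION.items():
        if kpi not in before or kpi not in after or before[kpi] == 0:
            continue  # skip KPIs without a usable baseline
        change = (after[kpi] - before[kpi]) / before[kpi] * 100
        improved = change < 0 if direction == "down" else change > 0
        report[kpi] = (round(change, 1), improved)
    return report


print(compare_kpis({"avg_triage_handle_time": 4.0, "csat": 80.0},
                   {"avg_triage_handle_time": 3.0, "csat": 84.0}))
```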

What are some potential issues to keep in mind?

As mentioned earlier, lower performance often comes down to data issues. Here are some of the most common issues:

1. Fields with too many values

If a field has a lot of values (for example, think of a picklist with many possible values), it may be hard for ECC to predict it accurately. You often have distribution issues (some values have lots of examples while others have very few). On top of that, these fields tend to have a lot of cases with incorrect labels, as manual classification (by an agent) is hard to get right. It’s difficult for agents to pick between so many values, so they will make mistakes — and the model will learn those same mistakes and repeat them. It’s hard to give an exact number but, typically, I’ve observed degraded quality when the number of values exceeds 100. If you have more than that, it might be a good idea to review your case taxonomy and see if some values can be merged or eliminated. This will make your agents’ lives easier and improve ECC’s performance!
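Before setting up a field, it can help to look at how its historical values are distributed. This is a minimal Python sketch over a plain list of past values for one field; the 100-value ceiling comes from the rule of thumb above, while the 10-example cut-off for a “rare” value is a hypothetical choice you should tune.

```python
from collections import Counter


def label_report(values, max_values=100, min_examples=10):
    """Summarize how a field's historical values are distributed.

    max_values is the rough ceiling mentioned above; min_examples is a
    hypothetical cut-off for calling a value 'rare'."""
    counts = Counter(v for v in values if v)  # ignore blank/None labels
    total = sum(counts.values())
    return {
        "distinct_values": len(counts),
        "too_many_values": len(counts) > max_values,
        "rare_values": sorted(v for v, n in counts.items() if n < min_examples),
        "top_value_share": counts.most_common(1)[0][1] / total if total else 0.0,
    }


values = ["Pothole"] * 30 + ["Streetlight"] * 15 + ["Graffiti"] * 3 + ["", None]
print(label_report(values))
```

Values that show up as rare are good candidates to review when you clean up your case taxonomy, since the model has few examples to learn them from.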

2. Deep hierarchies

Dependent picklists are supported; however, hierarchies more than three levels deep can sometimes perform worse, because errors compound from level to level and less data is available to learn from at the bottom of the hierarchy. If performance on the lower-level fields is not high enough, consider keeping them in recommendation mode instead of auto-triage. Also, dependent fields may be auto-triaged less often, because the prediction for a dependent field is not always compatible with the predictions for its controlling fields.
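The compatibility problem is easier to see with numbers. The Python sketch below is an illustration, not ECC’s actual algorithm: it uses a hypothetical controlling-to-dependent mapping and picks the valid pair with the highest joint score, treating the two confidences as independent (a simplifying assumption).

```python
# Hypothetical controlling -> dependent picklist mapping (e.g., Type -> Sub-Type).
VALID_DEPENDENTS = {
    "Roads": {"Pothole", "Signage"},
    "Lighting": {"Streetlight Out", "Flickering"},
}


def best_compatible_pair(controlling_probs: dict[str, float],
                         dependent_probs: dict[str, float]):
    """Return the valid (controlling, dependent) pair with the highest joint score.

    The joint score multiplies the two confidences, i.e. it treats them as
    independent, which is a simplifying assumption."""
    best, best_score = None, 0.0
    for ctrl, p_ctrl in controlling_probs.items():
        for dep in VALID_DEPENDENTS.get(ctrl, set()):
            score = p_ctrl * dependent_probs.get(dep, 0.0)
            if score > best_score:
                best, best_score = (ctrl, dep), score
    return best, best_score


controlling = {"Roads": 0.9, "Lighting": 0.1}
dependent = {"Streetlight Out": 0.6, "Pothole": 0.35, "Signage": 0.05}
print(best_compatible_pair(controlling, dependent))
```

Here the top dependent value on its own (Streetlight Out) is not valid under the top controlling value (Roads), and the best valid pair scores about 0.315, well below either field’s top confidence. A combined score that low would rarely clear an auto-triage threshold, which is one way compounding errors show up in practice.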

3. Overlapping values

This is probably the most common data issue I’ve seen while working closely with many ECC customers over the past few years. You may have a few values that are too close to each other or too generic, so the agent does not know which one to pick. For example, you may have picklist values such as “General” or “Spam”, so agents tend to put all kinds of cases in there that should have been classified as one of the other values. That will make it harder for the ECC model to learn how to differentiate historical cases and thus harder to predict for new cases. The confusion matrix shipped with the Einstein Case Classification TCRM Performance Dashboard is a great way to pinpoint such issues. Once the problematic fields have been identified, you may want to address the issue by merging certain values or further splitting those that are too generic.
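The TCRM dashboard builds the confusion matrix for you; if you want to reproduce the idea on an export of actual versus predicted values, a minimal Python sketch looks like this. The sample values are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual: list[str], predicted: list[str]):
    """Rows are actual values, columns are predicted values."""
    labels = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


def most_confused(actual: list[str], predicted: list[str], top: int = 3):
    """Off-diagonal (actual, predicted) pairs, most frequent first."""
    mistakes = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return mistakes.most_common(top)


actual = ["Billing", "General", "General", "General", "Shipping", "Shipping"]
predicted = ["Billing", "Billing", "Billing", "General", "General", "Shipping"]
print(most_confused(actual, predicted))
```

Pairs that keep appearing at the top of this list, especially ones involving a catch-all value like “General,” point to the overlapping values worth merging or splitting.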

4. Incorrect values

However conscientious your agents may be, they will sometimes make mistakes. Often this is due to the issues mentioned above, such as fields with too many values or overlapping values. Maybe the process has changed and values are used differently than before. Or, they might simply be new to the job. For these reasons and possibly others, a portion of your historical data will be incorrect and will mislead ECC when it builds a model from that data. For one customer I worked with closely, incorrect values represented about 30% of their historical data, which had a significant impact on performance. This is a tough problem to solve, but you have a few options. You can address the underlying issues as proposed above, change the business process or agent training material to reduce the frequency of mistakes, reclassify those cases in bulk, or exclude these cases from training and then retrain your model. Also, with machine learning, time is your friend. Models get retrained regularly, so if you make changes that reduce the number of errors in the historical data, the model will pick this up and improve over time.

5. Handling of duplicate cases

How your organization handles duplicate cases can impact model performance. With many customers, agents who get duplicate cases correctly select the values for one instance — but for the others, they either leave the fields blank, leave the default values that are incorrect, or choose another value like “Spam” or “Duplicate”. This confuses the model because for every correct “ground truth” case, there can be multiple cases with incorrect values. Solve this issue by using another field to mark cases as duplicates (for example, a boolean “IsDuplicate” field) and excluding those from model training.
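In ECC you would exclude these cases with a training filter during setup. To sanity-check the effect on an export first, here is a minimal Python sketch of the filtering logic. It assumes case records as dicts keyed by the standard Case fields IsClosed and ClosedDate (simplified to a date here), plus the IsDuplicate checkbox suggested above (in a real org, a custom field would carry a `__c` suffix). The 183-day window approximates “the last six months,” and the check against the 400-case minimum mentioned earlier is a conservative rule of thumb.

```python
from datetime import date, timedelta

MIN_CLOSED_CASES = 400        # minimum stated earlier in this post
WINDOW = timedelta(days=183)  # rough approximation of "the last six months"


def training_cases(cases: list[dict], today: date, field: str) -> list[dict]:
    """Closed, non-duplicate cases closed within the window that have a value for `field`."""
    return [
        c for c in cases
        if c["IsClosed"]
        and not c.get("IsDuplicate", False)  # hypothetical checkbox suggested above
        and today - c["ClosedDate"] <= WINDOW
        and c.get(field)
    ]


cases = [
    {"IsClosed": True, "ClosedDate": date(2021, 5, 1), "Type": "Pothole"},
    {"IsClosed": True, "ClosedDate": date(2021, 5, 2), "Type": "Spam", "IsDuplicate": True},
    {"IsClosed": False, "ClosedDate": None, "Type": "Pothole"},
    {"IsClosed": True, "ClosedDate": date(2020, 1, 1), "Type": "Pothole"},
]
kept = training_cases(cases, today=date(2021, 6, 1), field="Type")
print(len(kept), len(kept) >= MIN_CLOSED_CASES)
```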

6. Modification of case subject or description by the agents

One thing to keep in mind is that ECC trains on the final state of a case, but you care about predictions when new cases come in. If your agents modify a case’s subject or description as part of the triage process, the text ECC learns from will differ from the text it sees when a new case arrives, so it will have a harder time predicting accurately on the initial state. For ECC to perform at its best, leave the subject and description of the case untouched and use other fields to record updates to the case.

7. Fields used differently by different parts of the business

It may be that the same field values are used for different purposes by different parts of your business, for example, for your B2B and B2C customers. Or, perhaps your service centers in different countries use different processes, resulting in different field values being assigned for similar cases. These are great examples of when to use different models, also called segments. With ECC, you can create up to five segments and prioritize them; for each case, the highest-priority segment whose filter matches is the one applied. I’ve seen some customers improve their accuracy by up to 20 percentage points after creating segments. Just be careful not to create segments that are too small, as this could lower the accuracy of your model.
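Here is a minimal Python sketch of priority-ordered segment matching, where the first segment whose filter matches wins. The segment names and the AccountType and Country fields are hypothetical; in ECC, segments and their filters are configured in setup, and how it treats a case that matches no segment is not covered here.

```python
from typing import Callable, Optional

# Hypothetical segments in priority order; each has a filter over case fields.
SEGMENTS: list[tuple[str, Callable[[dict], bool]]] = [
    ("B2B", lambda case: case.get("AccountType") == "Business"),
    ("France", lambda case: case.get("Country") == "FR"),
]
assert len(SEGMENTS) <= 5, "ECC allows up to five segments"


def pick_segment(case: dict) -> Optional[str]:
    """Return the first (highest-priority) segment whose filter matches the case."""
    for name, matches in SEGMENTS:
        if matches(case):
            return name
    return None  # no segment matched


print(pick_segment({"AccountType": "Business", "Country": "FR"}))  # B2B, by priority
```

Because a French business customer matches both filters, the order of the list decides which model scores the case, so put your most specific segments first.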

8. Cases in multiple languages

ECC supports making predictions on cases written in different languages. Some of our largest customers receive cases in dozens of different languages! The main thing to check is that the mix of languages in the training set Einstein learns from is similar to the mix in the cases you want predictions for. For example, don’t train only on Italian cases and hope that it will work in Dutch!
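One simple way to compare the two language mixes is total variation distance, sketched below in Python. It assumes each case already carries a language code (for example, from a language detector or a hypothetical field you populate yourself); what counts as “too different” is a judgment call.

```python
from collections import Counter


def language_shares(languages: list[str]) -> dict[str, float]:
    """Share of cases per language code."""
    counts = Counter(languages)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}


def language_gap(training: list[str], incoming: list[str]) -> float:
    """Total variation distance: 0.0 = identical mix, 1.0 = no languages in common."""
    p, q = language_shares(training), language_shares(incoming)
    return 0.5 * sum(abs(p.get(lang, 0.0) - q.get(lang, 0.0)) for lang in set(p) | set(q))


print(language_gap(["it"] * 10, ["nl"] * 10))  # 1.0: train on Italian, predict on Dutch
```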

Give it a try and rack up the benefits!

Now, you have a sense of what it takes to roll out ECC in your organization, the benefits you can expect, what to look out for, and how to measure success. Give ECC a try and see how it not only saves your agents time but also makes your customers happier with faster turnaround time!

Want to find out how else you can work smarter with Salesforce Einstein? Head over to our AI for Admins blog series. Each post features a different AI product or topic, with real-world examples.
