Ensuring AI Accuracy: 5 Steps To Test Agentforce

Artificial intelligence (AI) agents are intelligent systems that can reason and act on behalf of customers and employees. But to realize their full potential, we need to test and configure agents without disrupting live production environments. That’s where you as an admin come in. 

Building an effective AI agent isn’t just about deploying advanced models—it’s about ensuring these agents work reliably and consistently in real-world scenarios. Your goal is to get fast user feedback and use those signals to improve the accuracy and performance of AI agents in a production environment. 

Admins play a crucial role in a strong testing and evaluation (Evals) framework, and you can follow a step-by-step process to test Agentforce AI Agents using the Agentforce Testing Center. Let’s dive in!

What does it mean to test AI agents in Salesforce?

Autonomous AI agents are powerful but often encounter common pitfalls that can limit their effectiveness.

  • Inconsistent Quality: Variability in instruction data or insufficient testing can lead to unpredictable or incorrect responses.
  • High Costs: Poor agent performance often results in escalations or human intervention, driving up operational expenses.
  • Lack of Guardrails: Without safeguards, agents may produce biased or irrelevant responses, which can harm trust and user experience.
  • Performance Limitations: Agents may falter under edge cases, large-scale interactions, or specific business needs, causing reliability issues.
  • Slow Iteration Cycles: Without proper feedback loops, agents struggle to evolve with changing business demands.


By testing agents, you can avoid these pitfalls and anticipate all the ways customers and users might interact with an agent.

What does this mean for admins?

Admins building agents need to test all the different ways a customer might pose a question or interact with an agent. In addition to Agent Builder, which features a Plan Tracer for investigating an agent's reasoning process, the new Agentforce Testing Center coming in January 2025 allows teams to test topic and action selection at scale. Testing Center will be auto-enabled for all Agentforce customers to use in their sandbox and production instances.

Using natural language instructions, Testing Center can auto-generate hundreds of synthetic interactions—such as requests a customer may make when engaging with an Agentforce Service Agent—then test them in parallel to see how frequently they result in the right outcome. Admins can then use the test data to refine instructions so the expected topic is more frequently selected, improving the end customer experience.
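Conceptually, this amounts to fanning many utterances out to the agent in parallel and tallying how often the expected topic is selected. The sketch below is purely illustrative, not how Testing Center is implemented; call_agent is a hypothetical stand-in for whatever interface invokes your agent, not a Salesforce API.

# Conceptual sketch of parallel topic-selection testing.
# call_agent() is a hypothetical stand-in, NOT a Salesforce API.
from concurrent.futures import ThreadPoolExecutor

def call_agent(utterance: str) -> str:
    # Hypothetical: send an utterance to the agent, return the selected topic.
    raise NotImplementedError("Replace with your own integration.")

def run_batch(cases: list[tuple[str, str]]) -> float:
    # cases: (utterance, expected_topic) pairs; returns the pass rate.
    with ThreadPoolExecutor(max_workers=8) as pool:
        selected = list(pool.map(call_agent, [u for u, _ in cases]))
    passes = sum(got == want for got, (_, want) in zip(selected, cases))
    return passes / len(cases)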

Agent Testing and Monitoring isn’t just a checkbox; it’s the foundation for admins to build reliable, scalable, and trustworthy AI agents. Testing Center provides an innovative way to use AI to test AI, from creating a test dataset to evaluating thousands of test cases in a single job, giving you confidence that the agent you build is ready to be deployed to a production environment.

What should admins do next? A 5-step guide to testing Agentforce

The AI Agent Testing Loop is a step-by-step process to improve AI agents. It starts with creating test scenarios, selecting evaluation metrics, and running automated tests. Admins then validate results, and the feedback is used to refine the agent’s instructions and performance, ensuring better results with each cycle.

AI Agent Testing Loop.

1. Identify test scenarios and create test data

Start by understanding your agent’s scope and capabilities. Each agent includes topics (domains of expertise) and actions (tasks the agent performs). For example:

  • A General_CRM topic might handle tasks like querying or updating CRM data.
  • The QueryRecords action retrieves specific records from your CRM system.

In Agent Builder, you can customize an out-of-the-box agent for any industry and any use case. You define the job your agent does through topics, which include natural language instructions, additional guardrails, and a library of actions the agent can take using tools you already have in Salesforce, such as flows, Apex, MuleSoft APIs, and prompt templates.

Navigate to Topic Detail from Agent Builder. Then review the Description/Scope and other details that describe the topic’s capabilities, as well as when and under what circumstances the agent will evaluate it at run time.

Adding topics in Agent Builder.

AI-generated tests for Agentforce: Clicking the Batch Test button in Agent Builder auto-generates hundreds of synthetic interactions, such as requests a customer might make when engaging with an Agentforce Service Agent.

View of topics in Agent Builder.

Alternatively, you can create test cases directly from Testing Center. It uses the same approach of using AI to generate additional test scenarios, based on the agent’s metadata and the knowledge the agent has access to through Einstein Data Library.

A good test case dataset has three quality attributes:

  1. Volume: A sufficient number of test cases to ensure comprehensive coverage of different scenarios and edge cases.
  2. Diversity: A wide range of inputs, contexts, and variations to test the AI agent’s adaptability across real-world use cases.
  3. Quality: Well-defined, accurate, and relevant test cases aligned with the AI agent’s objectives.
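Before uploading a dataset, you can sanity-check its volume and per-topic coverage yourself. Here is a minimal Python sketch, assuming your CSV has an Expected Topic column (match the header and file names to your actual Testing Center template; both are assumptions here):

# Minimal sketch: report test-case volume and per-topic coverage.
# The "Expected Topic" header is an assumption; match your CSV template.
import csv
from collections import Counter

def coverage_report(path: str) -> None:
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    topics = Counter(row["Expected Topic"] for row in rows)
    print(f"Total test cases: {len(rows)}")
    for topic, count in topics.most_common():
        print(f"  {topic}: {count} cases")

coverage_report("agent_test_cases.csv")  # hypothetical file name

A topic that appears only a handful of times is a signal to generate or write more cases for it.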

2. Review the generated test cases

To ensure the data has coverage and diversity, review the generated test dataset and check that the expected topic column covers all the topics associated with the agent you want to test. To do this, select the agent, open it in Agent Builder, then go to Topic Details.

Topic Details for an agent in Agent Builder.

You can view the complete set of topics and actions evaluated for a request by entering it in Agent Builder and reviewing the response. For example, when “Query Acme” is input to the agent, observe that the EmployeeCopilot__GeneralCRM topic and the QueryRecords action were evaluated.

Topic Details including the Conversation Preview pane.

A test case validates how the agent processes input (utterances) and generates responses. Each test case includes:

  • Utterance: Input query to the agent
  • Expected Topics: Topics the agent should evaluate (for example: EmployeeCopilot__GeneralCRM)
  • Expected Actions: Actions the agent should execute (for example: QueryRecords)
  • Expected Outcome: The desired result described in plain language

Example test case:

Sample test dataset
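For illustration, a small test dataset in CSV form might look like the rows below. The header names are assumptions based on the fields above, and the second utterance is hypothetical; confirm the exact headers against the CSV template in Testing Center.

Utterance,Expected Topic,Expected Action,Expected Outcome
Query Acme,EmployeeCopilot__GeneralCRM,QueryRecords,Agent returns the Acme account record
Show me open cases for Acme,EmployeeCopilot__GeneralCRM,QueryRecords,Agent lists open cases related to the Acme account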

3. Run tests and evaluate results in Testing Center

Once the test data is ready:

  • Navigate to Testing Center in Salesforce Setup.
  • Create a New Test and upload your test case CSV file.
  • If you do not have test cases in CSV format, use the Batch Test option to generate test cases with AI. Enter the number of records you want to create, and the test cases will be ready for evaluation within a few minutes.
  • Click Save & Run to create the test and start executing the tests.

Creating a new test in Testing Center.

Testing Center: AI-assisted evaluation

  • Topic Evaluation: Verifies the agent selected the correct topic
  • Action Evaluation: Confirms the correct actions were executed
  • Outcome Evaluation: Compares expected versus actual responses using LLM-powered interpretations
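Outcome evaluation is essentially an “LLM as judge” pattern: a model compares the agent’s actual response against the expected outcome you described in plain language. The sketch below illustrates the idea only and is not Salesforce’s internal implementation; judge_llm is a hypothetical stand-in for any LLM completion call.

# Conceptual "LLM as judge" outcome evaluation.
# judge_llm() is a hypothetical stand-in for any LLM completion call.
def judge_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with your own LLM client.")

def evaluate_outcome(expected: str, actual: str) -> bool:
    prompt = (
        "Does the actual response satisfy the expected outcome? "
        "Answer PASS or FAIL.\n"
        f"Expected outcome: {expected}\n"
        f"Actual response: {actual}"
    )
    return judge_llm(prompt).strip().upper().startswith("PASS")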

You can view the results by selecting the Test detail view.

Testing Center showing all tests.

4. Perform human validation

While automated testing handles most scenarios, human validation ensures responses align with nuanced user expectations. This step catches subtle issues, like tone mismatches or context-specific inaccuracies.

View of Test Results in Testing Center.

5. Review results and iterate

Testing is an iterative process. Use test results to:

  • Refine prompts, topics, and actions.
  • Adjust Retrieval Augmented Generation (RAG) settings. 
  • Improve agent instructions for better accuracy.

A continuous feedback loop ensures your agent evolves and stays relevant as user needs and business goals change.

Frequently asked questions

How do I create test cases for custom actions?

Review the action’s scope in Agent Builder, then design scenarios based on expected user interactions.

How often should I run tests?

Run tests whenever you update topics, actions, or prompts to validate changes and maintain quality. For example:

  • After every change (bug fixes, feature updates, data changes)
  • Pre-deployment (batch testing)
  • Post-deployment (online evals for quality, trust, guardrails)

Can I analyze Testing Center results further?

Yes, Testing Center results are stored and can be analyzed using tools like Tableau or Salesforce Data Cloud reports.
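For example, if you export results to a CSV file, a per-topic pass-rate breakdown takes a few lines of pandas. The file and column names below are assumptions; adjust them to match the actual export.

# Sketch: per-topic pass rates from an exported results CSV.
# File and column names are assumptions; adjust to the actual export.
import pandas as pd

results = pd.read_csv("testing_center_results.csv")
results["passed"] = results["Outcome Evaluation"].eq("Pass")
print(results.groupby("Expected Topic")["passed"].mean().sort_values())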

Does running tests incur any cost to me?

Yes. Agentforce Testing Center uses AI to evaluate the test cases so you don’t have to do it manually. Running a test consumes Einstein Requests based on the number of test cases you are running. You can see the details of Einstein Request consumption in Digital Wallet.

Final thoughts

Testing is the foundation of building AI agents that are reliable, efficient, and trusted. By following this guide, you’ll ensure your Agentforce Agents consistently deliver exceptional results while adapting to evolving needs.

Thank you to Principal Product Designer Jon Moore and Senior Director of Product Management Deepak Mukunthu for collaborating on this article. 

Resources

AI and Agentforce: Governance Strategies for Your First AI Project