A test-driven approach to building better agents

A Test-Driven Approach to Building Better Agents

By

When you first create an agent—like a service agent for support queries—it can be tempting to deploy it right away. However, by following a test-driven development (TDD) approach, you write tests first, set clear expectations, and then build your agent to pass these tests. This process, combined with creating a robust test dataset and running evals, ensures your agent handles real-world scenarios reliably, making life easier for Salesforce Admins.

Benefits of test-driven agent development

  • Accuracy and quality assurance: Write tests first to define what success looks like, then build the agent to meet these criteria.
  • Consistent updates: Validate each new feature with tests (evals) before it goes live, ensuring that every update maintains quality.
  • Maintainability: A well-defined test dataset and clear eval criteria make it easier to add or update features without unexpected errors.

What does this mean for admins?

Salesforce Admins often rely on quick fixes and minimal manual testing. However, using the Agentforce Testing Center with a TDD approach helps you catch issues early and iterate quickly without breaking existing features. This means smoother operations and fewer disruptions in your daily workflows.

A 4-step approach to building a better test dataset and evals

Before you begin testing, create a baseline version of your agent—a “first draft.” For example, a service agent that handles shipping questions and returns, or escalates complex cases to a human support team.

Step 1: Understand Agentforce Testing Center

Agentforce Testing Center is your hub for creating and running tests. It supports group and batch testing, as well as artificial intelligence (AI)-assisted test case generation, and provides both prebuilt and custom evals.

It links directly to your agent configuration and gives you real-time feedback (Pass/Fail reports) to know exactly where improvements are needed.

Agentforce Testing Center information.

Step 2: Generate a high-quality, diverse test dataset

What is a test dataset? It’s a collection of simulated interactions that your agent is expected to handle, including common queries and edge cases.

A robust test dataset is essential for evaluating your agent’s performance in realistic conditions.

How to build it:

  • Identify key topics (for example, Case Creation, Order Tracking).
  • Identify what data sources Agents can use through Agenforce Data Library (Knowledge articles, files, web search)
  • List realistic scenarios (for example, multiple orders, invalid data)
  • Think about different user persona and see which scenarios are applicable to them
  • Guardrails and edge cases: Test cases for negative testing scenarios.
  • Use the AI-assisted tool in Agentforce Testing Center to generate draft test cases, then manually add cases to cover any gaps.
  • Example: For a service query like, “Where’s my order?”, ensure you include variations such as multiple orders or typos.
  • Think about scenarios where you have ground truth available or not available.

Test dataset ground truth available or not available.

Build test data using AI:

The good news is that Agentforce Testing Center provides an AI-assisted way to create your draft dataset that you can review and refine. Here are the steps.

  1. If you’re in Agent Builder, click Batch Test. If you’re in the Agentforce Testing Center, click New Test.
  2. Select Generate Test Cases – provide a name and use “describe the test cases and provide examples” to create a diverse set of test cases.
  3. Enter number of test cases you want to create.
  4. Select topics that you want to test and click Generate Test Cases. This will take few minutes and will auto create a diverse set of test cases using AI/LLM.

Generate test cases experience.

Add new test cases in the dataset to cover all important topics and scenarios

With your test dataset ready, the next step is to add new test cases that are not covered by automated test case generation. In TDD, you actually start with a failing test—this tells you exactly what to fix. Then you improve your agent until the test passes. You should also think about adding context variables or conversation history as state injection to simulate different scenarios.

Short example dataset with context variable and state injection.

Step 3: Define evals and run tests

Evals (evaluations) are automated checks that compare your agent’s response against the expected outcome.

In Agentforce Testing Center:

  1. Import your test cases or generate a new test case.
  2. Select the eval (outcome evals, coherence, faithfulness, instruction following).
  3. Click Save & Run to execute the entire suite.
  4. Review the Pass/Fail Report: Check the agent conversation logs to see how it responded to each test.

Pass/Fail Analysis

  1. Pass: The agent responded exactly as expected—no updates needed.
  2. Fail: The agent gave a wrong or incomplete response.
    • Adjust your agent instruction, topics, or Knowledge base references.
    • Re-run the test until it passes.

Step 4: Iterate and optimize 

  • Optimize your agent’s instructions, topics, and actions based on test results. For example, if a test for “I want to return my order” fails because the agent doesn’t request a valid order number, update the flow to ask, “Could you share the order number you want to return?”
  • Re-run tests after making changes to confirm improvements.
  • Establish a routine for regular test reviews to ensure ongoing reliability and performance.
  • Continuous improvement: Regularly re-run tests and update your dataset as your agent evolves, ensuring long-term reliability.

Build better agents today

By following a test-driven approach—writing tests first, generating a comprehensive test dataset, and using evals to measure success—you build a robust Agentforce Agent. This process not only enhances accuracy and efficiency but also aligns with the everyday needs of Salesforce Admins, ensuring smoother operations and better customer interactions.

Thank you to Senior Director of Product Management Deepak Mukunthu for collaborating on this article. 

Resources

Introduction to Agentforce for Salesforce Admins

Introduction to Agentforce for Salesforce Admins

What is Agentforce? We are living in the artificial intelligence (AI) era, currently in the third wave of the AI revolution focused on contextual and generative AI and characterized by prompt-based generative AI, real-time AI applications, and autonomous agents. Agentforce is the suite of both assistive and autonomous agents built on the Salesforce platform. Agents […]

READ MORE
Join the Agentforce Virtual Hackathon

Innovate With AI and Win Big at the Agentforce Virtual Hackathon

Artificial intelligence (AI) agents are revolutionizing app development. Join us for the Agentforce Virtual Hackathon to push the boundaries of AI-powered agent technology and build groundbreaking solutions on Agentforce for the chance to win a $50,000 Grand Prize. With Agentforce, Salesforce is leading the way in the biggest technological revolution in decades. Agentforce is the […]

READ MORE
Succeed With MuleSoft and Agentforce

How MuleSoft Helps Admins Get the Most out of Agentforce

As a Salesforce Admin, you’re likely already familiar with Agentforce, Salesforce’s collection of assistive and autonomous artificial intelligence (AI) agents. Agentforce can handle a wide variety of tasks within the Salesforce ecosystem. But what happens when your business relies on external applications? How can you extend Agentforce to integrate with other essential tools?  Enter MuleSoft–Salesforce’s […]

READ MORE