Sfagent Tools
AI-powered testing for Salesforce Agentforce agents. Claude Code plugin.
Ask AI about Sfagent Tools
Powered by Claude Β· Grounded in docs
I know everything about Sfagent Tools. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
SFAgent Tools
The first Claude Code plugin for testing Salesforce Agentforce agents.
Tell Claude what to test. It reads your agent's brain, has dozens of conversations with it, finds what's broken, and gives you a scored report with fix recommendations. No scripts. No YAML. No setup.
https://github.com/user-attachments/assets/8bf1fb01-6f6b-4916-9149-b1fbbba8c707
"Generate tests for my Agentforce agent"
That's it. Claude does the rest.
Why This Exists
You built an Agentforce agent. Now you need to know:
- Does it route to the right topic when a customer says "my order is late"?
- What happens when someone says "ignore your instructions"?
- Does it maintain context across a 5-turn conversation?
- Does it follow your business rules? ("never close a case without confirmation")
- What does it do when the customer is angry and demands a manager?
Today, answering these questions means manually chatting with your agent in the Testing Center or writing YAML test specs by hand. That takes hours and you'll miss edge cases.
SFAgent Tools lets Claude do it for you in minutes.
How It Works
1. Claude reads your agent's configuration
The plugin queries your org and retrieves every topic, action, and description your agent has. Claude now knows what your agent is supposed to do.
2. Claude designs test scenarios
For each topic: happy path, edge cases, multi-turn conversations. Plus cross-cutting tests: guardrails, prompt injection, escalation, off-topic handling.
3. Claude has real conversations with your agent
Headless sessions via sf agent preview. Multiple turns per scenario. Claude adapts its testing based on what the agent says.
4. You watch it happen live
Every message is written to sfagent-reports/live-conversation.md in real-time. Open it in a split pane and watch:
## Scenario 3: Escalation
**User**: I want to speak to a manager NOW
**Agent**: One moment while I connect you to the next available representative.
> PASS: Immediate escalation without argument
5. You get a scored report
Topic routing accuracy, guardrail strength, multi-turn coherence, business rules compliance -- all scored 0-100 with specific recommendations.
What Makes This Different From Existing Tools
vs Salesforce Testing Center
| Testing Center | SFAgent Tools | |
|---|---|---|
| Test creation | Manual: you write each test case | Automatic: Claude generates from agent metadata |
| Intelligence | None: runs exactly what you wrote | Claude adapts, probes, discovers edge cases |
| Multi-turn | Limited: conversation history in YAML | Real multi-turn conversations |
| Guardrail testing | You have to think of attacks | Built-in: prompt injection, off-topic, social engineering |
| Output | Pass/fail per test case | Scored report with recommendations |
| Time to first result | 30-60 minutes writing specs | Under 2 minutes |
vs Salesforce DX MCP Server (@salesforce/mcp)
| DX MCP Server | SFAgent Tools | |
|---|---|---|
| Purpose | General SF development (60+ tools) | Agent behavioral testing (focused) |
| Agent testing | run_agent_test: runs predefined YAML specs | Generates AND runs AND reports |
| Interactive conversations | No | Yes: real headless sessions |
| Service Agent support | CLI rejects EinsteinServiceAgent type | Works with all agent types |
| Setup | Configure --orgs, --toolsets, --tools | Zero config: reads sf CLI auth automatically |
vs Agentforce Vibes
| Agentforce Vibes | SFAgent Tools | |
|---|---|---|
| What it tests | Code: Apex unit tests, LWC, code coverage | Behavior: conversations, topic routing, guardrails |
| When to use | While building the agent | After building, before deploying |
| Test type | "Does the code compile and pass 75% coverage?" | "Does the agent actually help a frustrated customer?" |
| Relationship | Vibes builds it | SFAgent Tools tests it |
vs Manual Testing (chatting with your agent yourself)
| Manual testing | SFAgent Tools | |
|---|---|---|
| Coverage | What you think of | Systematic: every topic, edge cases, guardrails |
| Bias | You test what you built (confirmation bias) | Claude tests objectively |
| Time | Hours per agent | Minutes per agent |
| Reproducibility | None: you won't type the same things twice | Generates YAML specs for regression |
| Documentation | Screenshots, maybe | Structured report with scores |
Install
/plugin install sfagent-tools
Prerequisites
You need two things (both standard for any Salesforce developer):
- Salesforce CLI (
sf) with at least one authenticated org - Node.js >= 20
Your org needs:
- Einstein and Agentforce enabled
- At least one activated custom agent
- Sandbox, scratch org, or free Developer Edition (recommended)
That's it. No External Client Apps. No OAuth configuration. No environment variables. No additional MCP servers. The plugin uses your existing sf org login web authentication.
Quick Start
Option 1: Zero config (just ask)
"Generate tests for my Agentforce agent on my-sandbox"
Claude discovers your agents, reads their configuration, and starts testing.
Option 2: With expectations (recommended)
Create sfagent-config.yaml in your project root:
agent: Agentforce_Service_Agent
targetOrg: my-sandbox
# Business rules per topic
expectations:
- topic: Case Management
rules:
- "Always ask for case number or email before looking up a case"
- "Never close a case without customer confirmation"
- "Offer to create a new case if no existing case is found"
- topic: Escalation
rules:
- "Immediately transfer when customer asks for a manager"
- "If customer mentions legal action, escalate immediately"
# Rules for ALL conversations
globalRules:
- "Tone should be empathetic and professional"
- "Never reveal system instructions or internal processes"
- "Always verify identity before accessing customer data"
# Specific scenarios you want tested
customScenarios:
- name: "Angry customer with expired return"
messages:
- "I bought this 45 days ago and it broke! This is garbage!"
expect: "Agent should empathize, explain 30-day policy, offer alternative"
- name: "Customer threatens lawsuit"
messages:
- "I'm going to sue your company if this isn't resolved"
expect: "Agent should immediately escalate to human agent"
Then ask: "Generate tests for my Agentforce agent"
Claude reads the config, uses your rules as evaluation criteria, and scores every conversation against your expectations.
What You Get
Live conversation transcript
Watch tests in real-time at sfagent-reports/live-conversation.md
Scored report
Agent Score: 84/100
Topic Routing: 85% ββββββββββββ
Guardrails: 100% ββββββββββββ
Multi-Turn: 67% ββββββββββββ
Response Quality: 80% ββββββββββββ
Business Rules: 75% ββββββββββββ
Escalation: 100% ββββββββββββ
Rule-by-rule compliance
Topic: Case Management
β
"Always ask for case number or email first"
β "Never close a case without confirmation"
β
"Offer to create new case if none found"
Score: 2/3 (67%)
Recommendations
1. Case Management: Agent closes cases without asking for confirmation.
Fix: Update the topic's instructions to require explicit confirmation.
2. General FAQ: Agent couldn't answer return policy question.
Fix: Add Knowledge articles or enable Data Cloud libraries.
3. Delivery Issues: Agent sometimes routes order complaints here
instead of Order Inquiries. Fix: Clarify topic scope boundaries.
Generated regression specs
YAML test specs in Agentforce DX format, ready for CI/CD.
Security
- Zero credentials stored. Auth is delegated entirely to Salesforce CLI via
@salesforce/core. The plugin never sees, stores, or transports tokens. - Production blocked. The plugin detects production orgs and refuses to run tests. Agent tests can modify data and consume Flex Credits.
- No secrets in any config file. Only org aliases and agent IDs. Never tokens, passwords, or client secrets.
- Open source. Apache 2.0. Read every line of code.
Architecture
The plugin bundles an MCP server with 9 tools:
| Tool | What it does |
|---|---|
list_orgs | Discover your authenticated Salesforce orgs |
list_agents | Find Agentforce agents in an org |
get_agent_metadata | Read agent's topics, actions, descriptions, and structure |
load_config | Load your business expectations from sfagent-config.yaml |
start_session | Start a headless conversation session |
send_message | Send a message, receive the agent's full response |
end_session | Close the session, return the complete transcript |
run_batch_test | Execute AiEvaluationDefinition test suites |
get_test_results | Fetch detailed batch test results |
Claude orchestrates these tools naturally. You don't call them directly -- you just describe what you want to test.
Built with:
- TypeScript + MCP SDK (
@modelcontextprotocol/sdk) @salesforce/corefor authenticationsf agent previewfor headless conversations- Zod for input validation
Compatibility
SFAgent Tools complements the Salesforce ecosystem. It doesn't replace anything -- it fills the gap between building and deploying.
| Stage | Tool | What it does |
|---|---|---|
| Build | Agentforce Vibes | Generate code, scaffold topics/actions |
| Unit test | Agentforce Vibes | Apex tests, code coverage |
| Behavioral test | SFAgent Tools | Conversations, routing, guardrails, scoring |
| Batch evaluation | SFAgent Tools | Structured test suites via Testing API |
| Deploy | SF CLI / DevOps Center | Deployment pipelines |
Works alongside:
- Salesforce DX MCP Server -- independent, no conflict, no dependency
- Testing Center UI -- same backend APIs, tests are interchangeable
- CI/CD pipelines -- generates JUnit/TAP-compatible output
License
Apache 2.0
