Tester
Instrumented MCP server for testing client conformance to list_changed notifications, enabling verification of dynamic tool, resource, and prompt surfacing.
Installation
npx mcp-server-tester
@gleanwork/mcp-server-tester
A testing and evaluation framework for Model Context Protocol (MCP) servers. Write deterministic Playwright tests against your MCP tools, or run data-driven eval datasets, including LLM-based evaluation of tool discoverability.
Playwright Tests
The mcp Playwright fixture connects to your MCP server (stdio or HTTP) and exposes a high-level API for calling tools and asserting responses. Custom matchers keep assertions readable.
import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
test('read_file returns file contents', async ({ mcp }) => {
const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
expect(result).toContainToolText('Hello, world');
expect(result).not.toBeToolError();
});
test('server exposes required tools', async ({ mcp }) => {
const tools = await mcp.listTools();
expect(tools.map((t) => t.name)).toContain('read_file');
});
Playwright tests are fast, deterministic, and designed for CI. Use them for regression testing, schema validation, and protocol conformance. The framework includes built-in conformance checks for the MCP spec.
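To illustrate the kind of protocol-level check a Playwright test can make, the following standalone sketch validates a tool list. It checks for unique names, an inputSchema of type "object" (the MCP spec describes tool input schemas as JSON Schema objects of that type), and required arguments that are actually declared under properties. checkToolList and ToolInfo are hypothetical names, and this is not the framework's built-in conformance suite. In a real test you might pass it the result of await mcp.listTools(), assuming the entries follow the MCP Tool shape.

```typescript
// Simplified shape of one entry in an MCP tools/list result (hypothetical type).
type ToolInfo = {
  name: string;
  description?: string;
  inputSchema: { type: string; properties?: Record<string, unknown>; required?: string[] };
};

// Returns human-readable problems; an empty array means every check passed.
function checkToolList(tools: ToolInfo[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const tool of tools) {
    if (tool.name.trim() === "") problems.push("tool with an empty name");
    if (seen.has(tool.name)) problems.push(`duplicate tool name: ${tool.name}`);
    seen.add(tool.name);
    // Tool input schemas are expected to be JSON Schema objects of type "object".
    if (tool.inputSchema.type !== "object") {
      problems.push(`${tool.name}: inputSchema.type is "${tool.inputSchema.type}", expected "object"`);
    }
    // Every required argument should also be declared under properties.
    const props = tool.inputSchema.properties ?? {};
    for (const arg of tool.inputSchema.required ?? []) {
      if (!(arg in props)) problems.push(`${tool.name}: required argument "${arg}" is not declared`);
    }
  }
  return problems;
}

const sample: ToolInfo[] = [
  {
    name: "read_file",
    description: "Read a file",
    inputSchema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
  },
  { name: "read_file", inputSchema: { type: "object", required: ["path"] } },
];

// Reports the duplicate name and the undeclared "path" argument on the second entry.
console.log(checkToolList(sample));
```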
Available matchers:
| Matcher | Description |
|---|---|
| toMatchToolResponse | Response exactly matches expected value (deep equal) |
| toContainToolText | Response contains expected substrings |
| toMatchToolSchema | Response validates against a Zod schema |
| toMatchToolPattern | Response matches a regex pattern |
| toMatchToolSnapshot | Response matches a saved baseline |
| toBeToolError | Response is (or is not) an error |
| toHaveToolResponseSize | Response size is within bounds |
| toSatisfyToolPredicate | Response satisfies a custom function |
| toHaveToolCalls | LLM called the expected tools |
| toHaveToolCallCount | LLM made N tool calls |
| toPassToolJudge | LLM evaluates response quality against a rubric |
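To make a few of these matchers concrete, here is a standalone sketch of the checks that toContainToolText, toMatchToolPattern, and toBeToolError describe, written against a simplified version of the MCP CallToolResult shape (a content array plus an optional isError flag). The helper names toolText, containsToolText, matchesToolPattern, and isToolError are hypothetical. This is not the library's implementation, and how the real matchers treat non-text content is an assumption left out here (the sketch simply ignores it).

```typescript
// Simplified CallToolResult: content blocks plus an optional isError flag.
type TextContent = { type: "text"; text: string };
type OtherContent = { type: "image" | "audio" | "resource" | "resource_link" };
type ToolResult = { content: Array<TextContent | OtherContent>; isError?: boolean };

// Join the text of every text block, one block per line; other blocks are skipped.
function toolText(result: ToolResult): string {
  return result.content
    .filter((c): c is TextContent => c.type === "text")
    .map((c) => c.text)
    .join("\n");
}

// Is every expected substring present in the text output?
function containsToolText(result: ToolResult, expected: string | string[]): boolean {
  const haystack = toolText(result);
  const needles = Array.isArray(expected) ? expected : [expected];
  return needles.every((n) => haystack.includes(n));
}

// Does the text output match the pattern?
function matchesToolPattern(result: ToolResult, pattern: RegExp): boolean {
  return pattern.test(toolText(result));
}

// Did the server flag the result as an error?
function isToolError(result: ToolResult): boolean {
  return result.isError === true;
}

const ok: ToolResult = { content: [{ type: "text", text: "Hello, world" }] };
const failed: ToolResult = { content: [{ type: "text", text: "ENOENT: no such file" }], isError: true };

console.log(containsToolText(ok, "Hello, world")); // true
console.log(isToolError(ok)); // false
console.log(isToolError(failed)); // true
console.log(matchesToolPattern(failed, /ENOENT/)); // true
```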
Eval Datasets
Eval datasets let you define test cases as JSON files and run them with runEvalDataset(). Each case specifies a tool call and one or more assertions.
{
"name": "file-ops",
"cases": [
{
"id": "read-config",
"toolName": "read_file",
"args": { "path": "/tmp/config.json" },
"expect": {
"schema": "file-content",
"containsText": ["version", "name"]
}
},
{
"id": "read-readme",
"toolName": "read_file",
"args": { "path": "/tmp/README.md" },
"expect": {
"snapshot": "readme-snapshot"
}
}
]
}
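The dataset format above can be described with TypeScript types. The types below are inferred only from this example (EvalExpect covers just the schema, containsText, and snapshot fields it uses), so they are an assumption rather than the library's published types. validateDataset is a hypothetical helper showing sanity checks worth running before executing a dataset: duplicate case ids, cases with no assertions, and references to schemas that were never registered.

```typescript
// Hypothetical types inferred from the example dataset; the real ones may differ.
type EvalExpect = {
  schema?: string; // name of a Zod schema registered when loading the dataset
  containsText?: string[];
  snapshot?: string;
};
type EvalCase = { id: string; toolName: string; args: Record<string, unknown>; expect: EvalExpect };
type EvalDataset = { name: string; cases: EvalCase[] };

// Returns a list of problems; an empty array means the dataset looks usable.
function validateDataset(ds: EvalDataset, knownSchemas: Set<string>): string[] {
  const errors: string[] = [];
  const ids = new Set<string>();
  for (const c of ds.cases) {
    if (ids.has(c.id)) errors.push(`duplicate case id: ${c.id}`);
    ids.add(c.id);
    if (Object.keys(c.expect).length === 0) errors.push(`${c.id}: no assertions`);
    if (c.expect.schema !== undefined && !knownSchemas.has(c.expect.schema)) {
      errors.push(`${c.id}: unknown schema "${c.expect.schema}"`);
    }
  }
  return errors;
}

// The example dataset from above, as a typed object.
const fileOps: EvalDataset = {
  name: "file-ops",
  cases: [
    {
      id: "read-config",
      toolName: "read_file",
      args: { path: "/tmp/config.json" },
      expect: { schema: "file-content", containsText: ["version", "name"] },
    },
    { id: "read-readme", toolName: "read_file", args: { path: "/tmp/README.md" }, expect: { snapshot: "readme-snapshot" } },
  ],
};

console.log(validateDataset(fileOps, new Set(["file-content"]))); // []
// One error: read-config references a schema that was not registered.
console.log(validateDataset(fileOps, new Set<string>()));
```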
import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
import { loadEvalDataset, runEvalDataset } from '@gleanwork/mcp-server-tester';
import { z } from 'zod';
test('file operations eval', async ({ mcp }, testInfo) => {
const dataset = await loadEvalDataset('./data/evals.json', {
schemas: { 'file-content': z.object({ content: z.string() }) },
});
const result = await runEvalDataset({ dataset }, { mcp, testInfo });
expect(result.passed).toBe(result.total);
});
Supported assertion types:
| Type | Description |
|---|---|
| containsText | Response includes expected substrings |
| schema | Response validates against a Zod schema |
| regex | Response matches a pattern |
| snapshot | Response matches a saved baseline |
| judge | LLM evaluates response quality against a rubric |
| toolsTriggered | LLM called the expected tools (LLM host mode) |
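As a rough sketch, containsText and regex expectations could be evaluated against a tool's text output and rolled up into the passed and total counts that the test above compares. evaluateCase, summarize, and the representation of regex as a pattern string are assumptions for illustration only; the framework's real runner also handles schema, snapshot, judge, and toolsTriggered assertions.

```typescript
// Hypothetical, simplified shapes for illustration only.
type CaseExpect = { containsText?: string[]; regex?: string };
type CaseOutcome = { id: string; passed: boolean; failures: string[] };

// Check one response text against the expectations a case declares.
function evaluateCase(id: string, text: string, expect: CaseExpect): CaseOutcome {
  const failures: string[] = [];
  for (const needle of expect.containsText ?? []) {
    if (!text.includes(needle)) failures.push(`missing text: ${needle}`);
  }
  if (expect.regex !== undefined && !new RegExp(expect.regex).test(text)) {
    failures.push(`no match for /${expect.regex}/`);
  }
  return { id, passed: failures.length === 0, failures };
}

// Aggregate per-case outcomes into passed/total counts.
function summarize(outcomes: CaseOutcome[]): { passed: number; total: number } {
  return { passed: outcomes.filter((o) => o.passed).length, total: outcomes.length };
}

const outcomes = [
  evaluateCase("read-config", '{"version": "1.0", "name": "demo"}', { containsText: ["version", "name"] }),
  // Fails: "1.0" has only two numeric components.
  evaluateCase("semver", "version 1.0", { regex: "\\d+\\.\\d+\\.\\d+" }),
];
console.log(summarize(outcomes)); // { passed: 1, total: 2 }
```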
LLM host mode
In LLM host mode, a real LLM receives your server's tool list and a natural language prompt, then decides which tools to call. This tests whether your tool names, descriptions, and input schemas are clear enough for autonomous use, which is a different question from whether the tools return correct output.
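The first step for such a host is exposing the server's tools to the model. The sketch below shows one way to do that for the Anthropic provider: MCP tools carry name, description, and inputSchema, while the Anthropic Messages API expects name, description, and input_schema, so the conversion is essentially a field rename. It only builds a request body and does not send it. The function name, the sample tool, and the max_tokens value are illustrative, and this is not necessarily how the framework does it. The name and description strings are nearly all the model learns about each tool, which is why they drive discoverability.

```typescript
// MCP tool as listed by tools/list (simplified).
type McpTool = { name: string; description?: string; inputSchema: Record<string, unknown> };
// Tool definition in the shape the Anthropic Messages API accepts.
type AnthropicTool = { name: string; description?: string; input_schema: Record<string, unknown> };

// Rename inputSchema to input_schema; name and description pass through unchanged.
function toAnthropicTools(tools: McpTool[]): AnthropicTool[] {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.inputSchema,
  }));
}

// Hypothetical tool list; in a real run it would come from the server.
const mcpTools: McpTool[] = [
  {
    name: "read_file",
    description: "Read a UTF-8 text file and return its contents",
    inputSchema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
  },
];

// Request body for a Messages API call (built here, not sent).
const requestBody = {
  model: "claude-opus-4-20250514",
  max_tokens: 1024,
  tools: toAnthropicTools(mcpTools),
  messages: [{ role: "user", content: "Find the application config file and return its contents" }],
};

console.log(JSON.stringify(requestBody.tools, null, 2));
```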
{
"id": "find-config",
"mode": "mcp_host",
"scenario": "Find the application config file and return its contents",
"mcpHostConfig": {
"provider": "anthropic",
"model": "claude-opus-4-20250514"
},
"expect": {
"toolsTriggered": {
"calls": [{ "name": "read_file", "required": true }]
}
}
}
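One plausible reading of the toolsTriggered expectation is "every required tool appears among the calls the model made". The sketch below implements only that rule. The types, the extra tool names (list_directory, search_files), and the treatment of optional entries are assumptions; the framework may also check call order, arguments, or unexpected calls.

```typescript
// Hypothetical shapes mirroring the "toolsTriggered" expectation above.
type ExpectedCall = { name: string; required: boolean };
type ObservedCall = { name: string; input: Record<string, unknown> };

// Names of required tools that never appear among the observed calls.
// Optional entries never cause a failure in this sketch.
function missingRequiredCalls(expected: ExpectedCall[], observed: ObservedCall[]): string[] {
  const called = new Set(observed.map((c) => c.name));
  return expected.filter((e) => e.required && !called.has(e.name)).map((e) => e.name);
}

const expected: ExpectedCall[] = [{ name: "read_file", required: true }];

// Run 1: the model listed a directory first, then read the file.
const run1: ObservedCall[] = [
  { name: "list_directory", input: { path: "/app" } },
  { name: "read_file", input: { path: "/app/config.json" } },
];
// Run 2: the model searched but never read the file.
const run2: ObservedCall[] = [{ name: "search_files", input: { pattern: "config" } }];

console.log(missingRequiredCalls(expected, run1)); // []
console.log(missingRequiredCalls(expected, run2)); // [ 'read_file' ]
```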
LLM host mode makes real API calls and produces non-deterministic results. Use iterations to run a case multiple times and measure pass rate rather than expecting 100% on a single run. See the LLM Host Guide for configuration and cost management.
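Measuring a pass rate amounts to repeated runs and a ratio. The sketch below uses a scripted stand-in instead of a real LLM call so that it is deterministic. measurePassRate is a hypothetical helper, and the 0.8 threshold is an arbitrary example rather than a recommended value; the sketch is independent of the framework's own iterations setting, which the LLM Host Guide covers.

```typescript
type PassRate = { passes: number; iterations: number; rate: number };

// Run a non-deterministic check `iterations` times and report how often it passed.
// A real LLM call would be async; a synchronous callback keeps the sketch short.
function measurePassRate(runOnce: () => boolean, iterations: number): PassRate {
  if (!Number.isInteger(iterations) || iterations < 1) {
    throw new Error("iterations must be a positive integer");
  }
  let passes = 0;
  for (let i = 0; i < iterations; i++) {
    if (runOnce()) passes++;
  }
  return { passes, iterations, rate: passes / iterations };
}

// Deterministic stand-in for an LLM host case: fails one run out of five.
const scripted = [true, true, false, true, true];
let call = 0;
const fakeRun = (): boolean => scripted[call++ % scripted.length];

const result = measurePassRate(fakeRun, 5);
console.log(result); // { passes: 4, iterations: 5, rate: 0.8 }
console.log(result.rate >= 0.8); // true
```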
Installation
Requires Node.js 22+.
npm install --save-dev @gleanwork/mcp-server-tester @playwright/test
The Anthropic SDK is only needed for LLM-as-judge assertions or LLM host mode with the Anthropic provider:
npm install --save-dev @anthropic-ai/sdk
Quick Start
npx mcp-server-tester init
The CLI wizard creates a playwright.config.ts, example tests, and a sample eval dataset configured for your server. See the CLI Guide for all options.
Configuration
Point the framework at your MCP server in playwright.config.ts:
import { defineConfig } from '@playwright/test';
export default defineConfig({
testDir: './tests',
reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
projects: [
{
name: 'my-server',
use: {
mcpConfig: {
transport: 'stdio',
command: 'node',
args: ['server.js'],
},
},
},
],
});
For HTTP servers, set transport: 'http' and serverUrl. For servers that require OAuth, see the Transports Guide and CLI Guide for authentication setup, including CI/CD token management.
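The two transports map naturally onto a discriminated union on the transport field. The types below are a hypothetical sketch of the stdio and HTTP options named above (command/args and serverUrl); the library's real config type likely carries more fields, for example for OAuth, so treat these names as assumptions. The exhaustive switch means the compiler would flag a new transport that the helper forgets to handle.

```typescript
// Hypothetical config types for the two transports described above.
type StdioConfig = { transport: "stdio"; command: string; args?: string[] };
type HttpConfig = { transport: "http"; serverUrl: string };
type McpConfig = StdioConfig | HttpConfig;

// Summarize how a config would reach the server.
function describeConfig(config: McpConfig): string {
  switch (config.transport) {
    case "stdio":
      return `spawn: ${[config.command, ...(config.args ?? [])].join(" ")}`;
    case "http":
      // new URL throws on a malformed URL, so bad config fails early.
      return `connect: ${new URL(config.serverUrl).toString()}`;
    default: {
      // Only reachable if a new transport is added to McpConfig without a case here.
      const unreachable: never = config;
      throw new Error(`unknown transport: ${JSON.stringify(unreachable)}`);
    }
  }
}

const local: McpConfig = { transport: "stdio", command: "node", args: ["server.js"] };
const remote: McpConfig = { transport: "http", serverUrl: "http://localhost:3000/mcp" };

console.log(describeConfig(local)); // spawn: node server.js
console.log(describeConfig(remote)); // connect: http://localhost:3000/mcp
```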
Documentation
- Quick Start: detailed setup and configuration
- Expectations: all assertion types, including snapshot sanitizers
- LLM Host Simulation: tool discoverability testing
- API Reference
- Transports: stdio and HTTP configuration, OAuth
- CLI Commands: init, generate, login, token
- UI Reporter: interactive web UI for test results
- Development: contributing and building
- Migration Guide (v0.12 → v1.0): upgrading from pre-1.0 releases
AI Skills
Install AI skills to help your coding assistant generate tests, eval datasets, and MCP host evals:
npx skills add -g gleanwork/mcp-server-tester
This installs skills globally so they're available across all your projects. Four skills are included:
| Skill | Description |
|---|---|
| mcp-tester-guide | Framework reference: matchers, config, auth, anti-patterns |
| write-mcp-test | Generate direct-mode Playwright tests |
| write-mcp-eval | Generate data-driven eval datasets |
| write-mcp-host-eval | Generate LLM host simulation evals |
Compatible with Claude Code, Cursor, Windsurf, Copilot, and 40+ other AI agents.
Examples
The examples/ directory contains complete working examples:
- filesystem-server/: test suite for Anthropic's Filesystem MCP server, with 5 Playwright tests, 11 eval dataset cases, and Zod schema validation.
- sqlite-server/: test suite for a SQLite MCP server, with 11 Playwright tests and 14 eval dataset cases.
- basic-playwright-usage/: minimal Playwright patterns.
Known Limitations
The following MCP protocol features are not currently supported. Their omission is a deliberate scope decision, not a bug:
- MCP resources (listResources, readResource)
- MCP prompts (listPrompts, getPrompt)
- Server-to-client notifications
- Streaming tool responses (callTool waits for the complete response)
If any of these affect your use case, please open an issue.
License
MIT
