
Tester

Name: Tester
Rating: 3.0 (1 review)
Author: gleanwork

Instrumented MCP server for testing client conformance to list_changed notifications, enabling verification of dynamic tool, resource, and prompt surfacing.

0 installs

4 stars

Trust: 59 — Fair

Devtools

Installation

npx mcp-server-tester


Documentation

@gleanwork/mcp-server-tester

A testing and evaluation framework for Model Context Protocol (MCP) servers. Write deterministic Playwright tests against your MCP tools, or run data-driven eval datasets — including LLM-based evaluation of tool discoverability.

Playwright Tests

The mcp Playwright fixture connects to your MCP server (stdio or HTTP) and exposes a high-level API for calling tools and asserting responses. Custom matchers keep assertions readable.

import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';

test('read_file returns file contents', async ({ mcp }) => {
  const result = await mcp.callTool('read_file', { path: '/tmp/test.txt' });
  expect(result).toContainToolText('Hello, world');
  expect(result).not.toBeToolError();
});

test('server exposes required tools', async ({ mcp }) => {
  const tools = await mcp.listTools();
  expect(tools.map((t) => t.name)).toContain('read_file');
});

Playwright tests are fast, deterministic, and designed for CI. Use them for regression testing, schema validation, and protocol conformance. The framework includes built-in conformance checks for the MCP spec.

Available matchers:

Matcher	Description
`toMatchToolResponse`	Response exactly matches expected value (deep equal)
`toContainToolText`	Response contains expected substrings
`toMatchToolSchema`	Response validates against a Zod schema
`toMatchToolPattern`	Response matches a regex pattern
`toMatchToolSnapshot`	Response matches a saved baseline
`toBeToolError`	Response is (or is not) an error
`toHaveToolResponseSize`	Response size is within bounds
`toSatisfyToolPredicate`	Response satisfies a custom function
`toHaveToolCalls`	LLM called the expected tools
`toHaveToolCallCount`	LLM made N tool calls
`toPassToolJudge`	LLM evaluates response quality against a rubric

Eval Datasets

Eval datasets let you define test cases as JSON files and run them with runEvalDataset(). Each case specifies a tool call and one or more assertions.

{
  "name": "file-ops",
  "cases": [
    {
      "id": "read-config",
      "toolName": "read_file",
      "args": { "path": "/tmp/config.json" },
      "expect": {
        "schema": "file-content",
        "containsText": ["version", "name"]
      }
    },
    {
      "id": "read-readme",
      "toolName": "read_file",
      "args": { "path": "/tmp/README.md" },
      "expect": {
        "snapshot": "readme-snapshot"
      }
    }
  ]
}

import { test, expect } from '@gleanwork/mcp-server-tester/fixtures/mcp';
import { loadEvalDataset, runEvalDataset } from '@gleanwork/mcp-server-tester';
import { z } from 'zod';

test('file operations eval', async ({ mcp }, testInfo) => {
  const dataset = await loadEvalDataset('./data/evals.json', {
    schemas: { 'file-content': z.object({ content: z.string() }) },
  });
  const result = await runEvalDataset({ dataset }, { mcp, testInfo });
  expect(result.passed).toBe(result.total);
});

Supported assertion types:

Type	Description
`containsText`	Response includes expected substrings
`schema`	Response validates against a Zod schema
`regex`	Response matches a pattern
`snapshot`	Response matches a saved baseline
`judge`	LLM evaluates response quality against a rubric
`toolsTriggered`	LLM called the expected tools (LLM host mode)
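
The dataset shape in the JSON example above can be written down as TypeScript types, which makes it easy to lint a dataset before running it. This is a minimal sketch: the interfaces and the `lintDataset` helper are hypothetical, inferred only from the fields shown in the example, and are not the package's exported types.

```typescript
// Hypothetical types inferred from the JSON example; not part of the package API.
interface EvalExpect {
  containsText?: string[]; // substrings the response must include
  schema?: string;         // name of a schema registered when the dataset is loaded
  snapshot?: string;       // name of a saved baseline
  // Other assertion types (regex, judge, toolsTriggered) are omitted from this sketch.
}

interface EvalCase {
  id: string;
  toolName: string;
  args: Record<string, unknown>;
  expect: EvalExpect;
}

interface EvalDataset {
  name: string;
  cases: EvalCase[];
}

// Returns a list of problems; an empty list means nothing suspicious was found.
function lintDataset(dataset: EvalDataset): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const c of dataset.cases) {
    if (seen.has(c.id)) {
      problems.push(`duplicate case id: ${c.id}`);
    }
    seen.add(c.id);
    if (Object.keys(c.expect).length === 0) {
      problems.push(`case ${c.id} has no assertions`);
    }
  }
  return problems;
}

// The "file-ops" dataset from the example, expressed with the sketch types.
const sample: EvalDataset = {
  name: 'file-ops',
  cases: [
    {
      id: 'read-config',
      toolName: 'read_file',
      args: { path: '/tmp/config.json' },
      expect: { schema: 'file-content', containsText: ['version', 'name'] },
    },
    {
      id: 'read-readme',
      toolName: 'read_file',
      args: { path: '/tmp/README.md' },
      expect: { snapshot: 'readme-snapshot' },
    },
  ],
};
```

Running a check like this ahead of the eval catches duplicate ids and cases with an empty `expect` early, before any tool call is made.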

LLM host mode

In LLM host mode, a real LLM receives your server's tool list and a natural language prompt, then decides which tools to call. This tests whether your tool names, descriptions, and input schemas are clear enough for autonomous use — a different question from whether the tools return correct output.

{
  "id": "find-config",
  "mode": "mcp_host",
  "scenario": "Find the application config file and return its contents",
  "mcpHostConfig": {
    "provider": "anthropic",
    "model": "claude-opus-4-20250514"
  },
  "expect": {
    "toolsTriggered": {
      "calls": [{ "name": "read_file", "required": true }]
    }
  }
}
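
To make the `toolsTriggered` expectation concrete, the sketch below shows one plausible reading of it: every call marked `required` must appear among the tools the LLM actually invoked. The `ExpectedCall` type and `missingRequiredCalls` helper are hypothetical and illustrate the idea only; the framework's real matching rules may differ.

```typescript
// Hypothetical shape mirroring the entries under "toolsTriggered.calls".
interface ExpectedCall {
  name: string;
  required: boolean;
}

// Returns the names of required tools that never appear in the observed calls.
function missingRequiredCalls(expected: ExpectedCall[], observed: string[]): string[] {
  const called = new Set(observed);
  return expected
    .filter((call) => call.required && !called.has(call.name))
    .map((call) => call.name);
}

const expectedCalls: ExpectedCall[] = [{ name: 'read_file', required: true }];

// 'list_directory' is a made-up tool name used only for illustration.
const satisfied = missingRequiredCalls(expectedCalls, ['list_directory', 'read_file']);
const unsatisfied = missingRequiredCalls(expectedCalls, ['list_directory']);
```

Under this reading, extra calls the LLM makes along the way do not fail the case; only a missing required call does.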

LLM host mode makes real API calls and produces non-deterministic results. Use iterations to run a case multiple times and measure pass rate rather than expecting 100% on a single run. See the LLM Host Guide for configuration and cost management.
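
Measuring a pass rate rather than a single outcome can be sketched as below. How the framework reports per-iteration results is not shown here, so the `boolean[]` input and both helper names are assumptions made for illustration.

```typescript
// outcomes holds one boolean per iteration of the same case (assumed input shape).
function passRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) {
    throw new Error('no iterations recorded');
  }
  const passed = outcomes.filter(Boolean).length;
  return passed / outcomes.length;
}

// True when the observed pass rate is at least minRate (a fraction between 0 and 1).
function meetsThreshold(outcomes: boolean[], minRate: number): boolean {
  return passRate(outcomes) >= minRate;
}

// Four iterations with three passes.
const outcomes = [true, true, false, true];
const rate = passRate(outcomes); // 0.75
```

Asserting against a threshold such as `meetsThreshold(outcomes, 0.7)` keeps a single unlucky run from failing the build while still flagging tools the model rarely discovers.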

Installation

Requires Node.js 22+.

npm install --save-dev @gleanwork/mcp-server-tester @playwright/test

The Anthropic SDK is only needed for LLM-as-judge assertions or LLM host mode with the Anthropic provider:

npm install --save-dev @anthropic-ai/sdk

Quick Start

npx mcp-server-tester init

The CLI wizard creates a playwright.config.ts, example tests, and a sample eval dataset configured for your server. See the CLI Guide for all options.

Configuration

Point the framework at your MCP server in playwright.config.ts:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  reporter: [['list'], ['@gleanwork/mcp-server-tester/reporters/mcpReporter']],
  projects: [
    {
      name: 'my-server',
      use: {
        mcpConfig: {
          transport: 'stdio',
          command: 'node',
          args: ['server.js'],
        },
      },
    },
  ],
});

For HTTP servers, set transport: 'http' and serverUrl. For servers that require OAuth, see the Transports Guide and CLI Guide for authentication setup, including CI/CD token management.
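
As a sketch of the HTTP variant, a project entry might look like the following. Only `transport` and `serverUrl` come from the text above; the project name and the URL are placeholders, and any authentication settings are left out.

```typescript
// Hypothetical project entry for an HTTP server; add it to the `projects` array
// in playwright.config.ts. The name and URL are placeholders.
const httpProject = {
  name: 'my-http-server',
  use: {
    mcpConfig: {
      transport: 'http',
      serverUrl: 'http://localhost:3000/mcp',
    },
  },
};
```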

Documentation

Quick Start — detailed setup and configuration
Expectations — all assertion types including snapshot sanitizers
LLM Host Simulation — tool discoverability testing
API Reference
Transports — stdio and HTTP configuration, OAuth
CLI Commands — init, generate, login, token
UI Reporter — interactive web UI for test results
Development — contributing and building
Migration Guide (v0.12 → v1.0) — upgrading from pre-1.0 releases

AI Skills

Install AI skills to help your coding assistant generate tests, eval datasets, and MCP host evals:

npx skills add -g gleanwork/mcp-server-tester

This installs skills globally so they're available across all your projects. Four skills are included:

Skill	Description
`mcp-tester-guide`	Framework reference — matchers, config, auth, anti-patterns
`write-mcp-test`	Generate direct-mode Playwright tests
`write-mcp-eval`	Generate data-driven eval datasets
`write-mcp-host-eval`	Generate LLM host simulation evals

Compatible with Claude Code, Cursor, Windsurf, Copilot, and 40+ other AI agents.

Examples

The examples/ directory contains complete working examples:

filesystem-server/ — Test suite for Anthropic's Filesystem MCP server: 5 Playwright tests, 11 eval dataset cases, Zod schema validation.
sqlite-server/ — Test suite for a SQLite MCP server: 11 Playwright tests, 14 eval dataset cases.
basic-playwright-usage/ — Minimal Playwright patterns.

Known Limitations

The following MCP protocol features are not currently supported. These are deliberate scope decisions, not bugs:

MCP resources (listResources, readResource)
MCP prompts (listPrompts, getPrompt)
Server-to-client notifications
Streaming tool responses (callTool waits for the complete response)

If any of these affect your use case, please open an issue.

License

MIT
