Web Reader MCP
No description available
Ask AI about Web Reader MCP
Powered by Claude Β· Grounded in docs
I know everything about Web Reader MCP. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
web-reader-mcp
MCP server that fetches URLs and extracts clean, readable content as Markdown or plain text. Built on the Model Context Protocol so AI assistants (Claude, Cline, OpenCode, etc.) can read web pages.
Features
- Extracts main article content from any URL using Readability
- Converts HTML to clean Markdown via Turndown
- Extracts metadata (title, author, publish date, Open Graph tags)
- Optionally extracts all links from the page
- Optional JavaScript rendering via Playwright for SPAs and JS-heavy sites
- Exposes a single
webReadertool over MCP Streamable HTTP transport
Quick Start
bun install
bun run dev
The server starts on http://localhost:3001 (or PORT env var).
Tool: webReader
| Parameter | Type | Default | Description |
|---|---|---|---|
url | string (required) | β | URL to fetch |
return_format | "markdown" | "text" | "markdown" | Output format |
timeout | number (1β60) | 20 | Request timeout in seconds |
retain_images | boolean | true | Keep image references in output |
with_links_summary | boolean | false | Include a summary of all links |
render_js | boolean | false | Use Playwright to render JavaScript |
Example Response
{
"title": "Article Title",
"url": "https://example.com/article",
"content": "# Article Title\n\nArticle content in markdown...",
"excerpt": "A brief summary of the article.",
"byline": "Author Name",
"siteName": "Example",
"publishedTime": "2025-01-15T10:00:00Z",
"metadata": { "og:description": "..." }
}
MCP Client Configuration
Claude Desktop
{
"mcpServers": {
"web-reader": {
"url": "http://localhost:3001/mcp"
}
}
}
Cline (VS Code)
{
"mcpServers": {
"web-reader": {
"url": "http://localhost:3001/mcp",
"transportType": "streamable-http"
}
}
}
JavaScript Rendering (Optional)
By default, pages are fetched with a simple HTTP client (got). For JavaScript-heavy sites (SPAs, React apps), enable Playwright:
bun add playwright
npx playwright install chromium
Then set render_js: true when calling the tool. Note that Playwright requires ~300β500 MB RAM per browser tab.
Scripts
| Command | Description |
|---|---|
bun run dev | Development server with hot reload |
bun run build | Compile TypeScript |
bun run start | Production server (node dist/index.js) |
bun run typecheck | Type-check without emitting |
Docker
docker compose up --build
Runs on port 3000 with health checks enabled.
Architecture
src/
βββ index.ts # Express server + MCP session management
βββ types.ts # Shared TypeScript interfaces
βββ tools/
β βββ web-reader.ts # Tool registration (Zod schema + handler)
βββ lib/
βββ fetcher.ts # HTTP fetching (got)
βββ fetcher-playwright.ts # JS rendering (Playwright, optional)
βββ parser.ts # HTML β structured content (Readability)
βββ converter.ts # HTML β Markdown (Turndown)
Tech Stack
- Runtime: Node.js (ES2022)
- Package Manager: Bun
- HTTP Server: Express
- MCP SDK:
@modelcontextprotocol/sdkv1 - HTML Parsing: jsdom + Readability
- Markdown: Turndown
- HTTP Client: got
- JS Rendering: Playwright (optional)
License
MIT
