Web2md
Convert web pages to Markdown for LLM agents
web2md
An MCP (Model Context Protocol) server that converts web pages to clean Markdown.
Why Markdown?
LLM Token Savings: Raw HTML is bloated with tags, scripts, and boilerplate that waste context. web2md strips these out and returns only clean Markdown, which reduces the token count fed to your LLM. With the optional `summaryLevel` parameter, you can get an extractive summary instead of the full page, cutting token usage further when only an overview is needed.
Performance Example
claude fetch: $0.12 token cost, 9839 context tokens used
web2md (summaryLevel=3): $0.08 token cost, 5 context tokens used
Installation
Option 1: Claude Code Plugin (Recommended)
Installs the MCP server, the `web-summarize` skill, and the `/web2md` command in one step.
claude plugin marketplace add kevstevie/web2md
claude plugin install web2md
Requires Node.js 18+. Playwright (Chromium) is installed automatically on first `npm install`.
Option 2: Claude Code, MCP server only
Registers only the MCP server with Claude Code.
claude mcp add web2md -- npx -y web2md-mcp@latest
Requires Node.js 18+. Playwright (Chromium) is installed automatically on first `npm install`.
Option 3: Build from source
git clone https://github.com/kevstevie/web2md.git
cd web2md
npm install
npm run build
Getting Started (Source Build)
Skip this section if you installed via `claude plugin install` or npm.
Prerequisites
- Node.js 18+
Build
npm run build
Run
node dist/index.js
The server runs in STDIO mode and communicates via JSON-RPC over stdin/stdout.
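To make the STDIO transport concrete, here is a minimal sketch of how a client might frame a single tool call. The `tools/call` method and newline-delimited framing come from the MCP specification; the helper name `buildToolCall` is hypothetical and not part of web2md. A real session also begins with an `initialize` handshake, which MCP client SDKs normally handle for you.

```typescript
// Hypothetical helper (not part of web2md): builds one newline-delimited
// JSON-RPC 2.0 message asking an MCP server to run the webToMarkdown tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, url: string, summaryLevel?: number): string {
  const args: Record<string, unknown> = { url };
  // Omit summaryLevel entirely when the caller wants the full page.
  if (summaryLevel !== undefined) args.summaryLevel = summaryLevel;

  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "webToMarkdown", arguments: args },
  };
  // The MCP stdio transport separates messages with newlines.
  return JSON.stringify(request) + "\n";
}
```

A client would write the returned string to the server process's stdin and read the matching response (same `id`) from its stdout.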
Claude Desktop Configuration
After installing via Options 1–3, add the following to `claude_desktop_config.json`.
Via npm:
{
  "mcpServers": {
    "web2md": {
      "command": "npx",
      "args": ["-y", "web2md-mcp@latest"]
    }
  }
}
Via git clone:
{
  "mcpServers": {
    "web2md": {
      "command": "node",
      "args": ["/absolute/path/to/web2md/dist/index.js"]
    }
  }
}
Features
- Web Page Fetching - Fetches HTML from any public URL using Node.js built-in `fetch` with browser-like headers
- Playwright Support - Renders JavaScript-heavy pages (React, Vue, Angular, etc.) using Playwright (Chromium)
- Smart Content Extraction - Automatically finds the main content (`<main>`, `<article>`, `[role=main]`)
- HTML Cleanup - Removes scripts, styles, nav, footer, ads, and other non-content elements
- Markdown Conversion - Converts clean HTML to Markdown using cheerio + turndown
- Extractive Summarization - Summarizes content using TF-IDF + TextRank with Korean language support. No API key required.
- SSRF Protection - Blocks requests to private/internal IP addresses (127.0.0.1, 10.x, 192.168.x, etc.), including redirect chains and JS-issued requests
- Claude Code Plugin - Bundles the `web-summarize` skill (auto-invokes web2md on URL requests) and the `/web2md` slash command
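As an illustration of the SSRF protection described in the list above, the following is a minimal sketch of an IPv4 range check. It is an assumption about how such a check could look, not the project's actual `ssrf.ts`; the function name `isBlockedIPv4` is hypothetical, and IPv6 and DNS resolution are out of scope here.

```typescript
// Hypothetical sketch (not web2md's real implementation): returns true when
// an address must not be fetched. It fails closed, so any string that is not
// a well-formed dotted-quad IPv4 address is also treated as blocked.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4) return true;

  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return true;

  const [a, b] = octets;
  return (
    a === 0 ||                           // 0.0.0.0/8 ("this network")
    a === 10 ||                          // 10.0.0.0/8 (RFC 1918)
    a === 127 ||                         // 127.0.0.0/8 (loopback)
    (a === 169 && b === 254) ||          // 169.254.0.0/16 (link-local)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168)             // 192.168.0.0/16 (RFC 1918)
  );
}
```

To cover redirect chains, a check like this would need to run on the resolved address of every hop rather than only on the first URL.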
Tech Stack
| Component | Technology |
|---|---|
| Language | TypeScript 5 / Node.js 18+ |
| MCP | @modelcontextprotocol/sdk 1.x |
| Transport | STDIO |
| HTML Parsing | cheerio 1.x |
| JS Rendering | Playwright 1.49+ (Chromium) |
| Markdown Conversion | turndown 7.x |
| Summarization | TF-IDF + TextRank |
| Build | TypeScript (tsc) |
Available Tools
webToMarkdown
Fetches a web page and converts it to Markdown.
Use `summaryLevel` to get an extractive summary instead of the full page. This is ideal for reducing LLM token usage when only an overview is needed.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | String | - | The URL of the web page to fetch (http/https only) |
| `summaryLevel` | Int? | null | Summary level: 1 (most concise) to 5 (most detailed). Omit for full content. |
| `debug` | Boolean? | false | If true, prepends runtime debug logs (selected fetcher, status, elapsed time) to the returned text so MCP users can verify runtime behavior. |
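The parameter constraints in the table above can be expressed as a small validator. This is a sketch under assumptions: the `WebToMarkdownArgs` interface and `validateArgs` function are hypothetical names chosen for illustration, and the server's real validation logic and error messages may differ.

```typescript
// Hypothetical argument shape mirroring the parameter table.
interface WebToMarkdownArgs {
  url: string;
  summaryLevel?: number | null;
  debug?: boolean;
}

// Hypothetical validator: returns an error message, or null when args are OK.
function validateArgs(args: WebToMarkdownArgs): string | null {
  let parsed: URL;
  try {
    parsed = new URL(args.url);
  } catch {
    return "url is not a valid URL";
  }
  // Only http and https are accepted, per the table.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return "url must use http or https";
  }

  const level = args.summaryLevel;
  // null or undefined means "return the full content".
  if (level !== undefined && level !== null) {
    if (!Number.isInteger(level) || level < 1 || level > 5) {
      return "summaryLevel must be an integer from 1 to 5";
    }
  }
  return null;
}
```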
Token usage guide:
| Mode | summaryLevel | Token Usage |
|---|---|---|
| Full Markdown | null | Standard (HTML → clean MD) |
| Brief summary | 1 | Minimum (~10–20% of full) |
| Balanced summary | 3 | Moderate (~30–50% of full) |
| Detailed summary | 5 | Extended (~60–80% of full) |
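For rough budgeting, the percentages in the guide above can be turned into a token range. The sketch below is illustrative only: `estimateSummaryTokens` is a hypothetical helper, the fractions are copied from the table, and levels 2 and 4 return `undefined` because the guide does not list them.

```typescript
// Fractions of full-page tokens, copied from the token usage guide.
const SUMMARY_FRACTIONS: Record<number, [number, number]> = {
  1: [0.1, 0.2],
  3: [0.3, 0.5],
  5: [0.6, 0.8],
};

// Hypothetical helper: estimates a [min, max] token range for a summary,
// given the token count of the full Markdown output.
function estimateSummaryTokens(
  fullTokens: number,
  summaryLevel: number | null,
): [number, number] | undefined {
  // null means full Markdown, so the estimate is the full count itself.
  if (summaryLevel === null) return [fullTokens, fullTokens];

  const range = SUMMARY_FRACTIONS[summaryLevel];
  if (!range) return undefined; // level not covered by the guide

  return [Math.round(fullTokens * range[0]), Math.round(fullTokens * range[1])];
}
```

Actual savings depend on the page, so treat the result as a planning estimate rather than a guarantee.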
Example: full content:
# Example Domain
This domain is for use in illustrative examples in documents.
You may use this domain in literature without prior coordination or asking for permission.
[More information...](https://www.iana.org/domains/example)
Example: summarized (summaryLevel=1):
# Example Domain
This domain is for use in illustrative examples in documents.
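The summarized output above keeps the highest-ranked sentences. As a rough illustration of the TF-IDF + TextRank idea, here is a minimal sketch that ranks sentences by TF-IDF cosine similarity and a PageRank-style iteration. It is not the project's `summarizer.ts` or `textRank.ts`: the function names, the ASCII-only tokenizer, the damping factor of 0.85, and the 50 iterations are all assumptions, and section handling and Korean tokenization are left out.

```typescript
// Assumed tokenizer: lowercase ASCII words and digits only.
function tokenize(sentence: string): string[] {
  return sentence.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns the `count` highest-ranked sentences in their original order.
function summarize(sentences: string[], count: number): string[] {
  const n = sentences.length;
  if (n <= count) return [...sentences];
  const docs = sentences.map(tokenize);

  // Document frequency: in how many sentences each term appears.
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  // TF-IDF vector per sentence (smoothed IDF keeps weights positive).
  const vectors = docs.map((doc) => {
    const vec = new Map<string, number>();
    for (const term of doc) vec.set(term, (vec.get(term) ?? 0) + 1);
    for (const [term, tf] of vec) {
      vec.set(term, tf * Math.log(1 + n / (df.get(term) ?? 1)));
    }
    return vec;
  });

  const norm = (v: Map<string, number>) =>
    Math.sqrt([...v.values()].reduce((s, x) => s + x * x, 0));
  const cosine = (a: Map<string, number>, b: Map<string, number>) => {
    let dot = 0;
    for (const [term, w] of a) dot += w * (b.get(term) ?? 0);
    const denom = norm(a) * norm(b);
    return denom === 0 ? 0 : dot / denom;
  };

  // Similarity graph between sentences, without self-loops.
  const sim = vectors.map((a, i) =>
    vectors.map((b, j) => (i === j ? 0 : cosine(a, b))),
  );
  const rowSum = sim.map((row) => row.reduce((s, x) => s + x, 0));

  // Power iteration: each sentence receives score from similar sentences.
  let scores = new Array<number>(n).fill(1 / n);
  for (let iter = 0; iter < 50; iter++) {
    scores = scores.map((_, i) => {
      let incoming = 0;
      for (let j = 0; j < n; j++) {
        if (rowSum[j] > 0) incoming += (sim[j][i] / rowSum[j]) * scores[j];
      }
      return 0.15 / n + 0.85 * incoming;
    });
  }

  return scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, count)
    .map((x) => x.index)
    .sort((a, b) => a - b) // restore document order
    .map((i) => sentences[i]);
}
```

Sentences that share vocabulary with many others score higher, while a sentence with no overlapping terms only receives the baseline score and is dropped first.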
Project Structure
web2md/
├── plugins/
│   └── web2md/                      # Claude Code plugin
│       ├── .claude-plugin/
│       │   └── plugin.json          # Plugin manifest
│       ├── .mcp.json                # MCP server auto-config
│       ├── skills/
│       │   └── web-summarize/
│       │       └── SKILL.md         # Auto-invoke web2md on URL requests
│       └── commands/
│           └── web2md.md            # /web2md <url> slash command
└── src/
    ├── index.ts                     # MCP server entry point
    ├── tool/
    │   └── webToMarkdown.ts         # Tool definition and handler
    ├── fetcher/
    │   ├── types.ts                 # HtmlFetcher interface
    │   ├── staticFetcher.ts         # Node fetch() based static fetcher
    │   ├── playwrightFetcher.ts     # Playwright-based JS rendering fetcher
    │   └── index.ts                 # Factory: auto-select fetcher by runtime availability
    ├── converter/
    │   └── htmlToMarkdown.ts        # cheerio cleanup + turndown conversion
    ├── service/
    │   ├── summarizer.ts            # MarkdownSummarizer (section-based extraction)
    │   ├── textRank.ts              # TF-IDF + TextRank
    │   └── tokenizer/
    │       ├── types.ts             # Tokenizer interface
    │       ├── simpleTokenizer.ts   # English tokenizer (stop words filter)
    │       └── koreanTokenizer.ts   # Korean tokenizer (space-based + particle removal)
    └── utils/
        ├── ssrf.ts                  # SSRF URL validation (dns.resolve + private IP blocking)
        └── errors.ts                # InvalidUrlError, FetchFailedError
Limitations
- Maximum body size is 5MB by default.
- Sites protected by Cloudflare Bot Manager or similar anti-bot systems may return empty or blocked responses.
- Playwright requires browser binaries. They are installed automatically via `postinstall` when you run `npm install`.
