Fast Web Fetch
Fetch web pages fast and convert them into LLM-ready context
fastwebfetch
Fastwebfetch scrapes web pages with Patchright and converts the extracted HTML into cleaned Markdown. It relies on the markdownify, mdformat, tqdm, pylatexenc, and patchright packages.
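The scrape-then-convert pipeline can be illustrated with a stdlib-only toy converter (the real tool uses markdownify for the conversion and mdformat for cleanup; this sketch only shows the shape of the HTML-to-Markdown step):

```python
from html.parser import HTMLParser

class MiniMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter; handles only <h1> and <p>."""
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")   # heading marker
        elif tag == "p":
            self.out.append("\n")   # paragraph break

    def handle_data(self, data):
        self.out.append(data)

def html_to_md(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    return "".join(parser.out).strip()
```

The real converter additionally handles links, lists, tables, and LaTeX (via pylatexenc), then normalizes the result with mdformat.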
How to use (one-shot)
The fastest way to run once and get Markdown output:
```shell
uv sync
uv run main.py init
uv run main.py "https://example.com" > output.md
```
If you already ran init, just run:
```shell
uv run main.py "https://example.com" > output.md
```
Or, without cloning the repo:

```shell
uvx --from "git+https://github.com/trotsky1997/Fast-Web-Fetch.git" python -m main init
uvx --from "git+https://github.com/trotsky1997/Fast-Web-Fetch.git" python -m main "https://example.com" > output.md
```
MCP quickstart
Run the MCP server directly from the repo URL:
```shell
uvx --from "git+https://github.com/trotsky1997/Fast-Web-Fetch.git" python -m mcp_server
```
Example client config (stdio server):
```json
{
  "servers": {
    "fastwebfetch": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/trotsky1997/Fast-Web-Fetch.git",
        "python",
        "-m",
        "mcp_server"
      ]
    }
  }
}
```
Features
- Patchright-based scraping with a persistent browser profile.
- Optional paywall bypass via `bypass-paywalls-chrome-clean`.
- Markdown cleanup with optional URL stripping and summary mode.
Requirements
- Python 3.11+ (see `.python-version`).
- `uv` for dependency management (recommended).
uv workflow
- Install uv (see https://astral.sh/uv for installers) or use `pip install uv`.
- Run `uv sync` to install all dependencies from `uv.lock`.
- Use `uv run main.py` to execute the script with uv-managed dependencies.
- When dependencies change, run `uv lock` (or `uv lock --upgrade`), then commit the updated `uv.lock` alongside `pyproject.toml`.
Initialization
- Run `uv run main.py init` as a one-shot setup command to install Chromium via Patchright, create the persistent profile directory, and validate that the bundled `extensions/bypass-paywalls-chrome-clean` assets exist.
- When the bypass extension is missing, init downloads it automatically from GitFlic; override the URL with `FASTWEBFETCH_PAYWALL_URL` if needed.
- The `init` command accepts the same `--browser-data-dir` as the scraper, plus `--channel` (defaults to `chromium`) to choose the Patchright browser and `--paywall-extension-dir` when you moved the bypass extension.
- Skip the automatic browser install with `--skip-chrome-install` (useful when the requested browser is already available). Re-running `init` is safe and simply updates the profile directory and reconfirms the assets.
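The idempotent filesystem side of init can be sketched with a hypothetical helper (names and paths are illustrative; the real `init` also installs the browser via Patchright and downloads the extension when missing):

```python
from pathlib import Path

def ensure_profile(browser_data_dir: str, extension_dir: str) -> bool:
    """Hypothetical sketch of init's filesystem checks: create the
    persistent profile directory (safe to re-run) and report whether
    the bypass extension's manifest.json is present."""
    Path(browser_data_dir).mkdir(parents=True, exist_ok=True)
    return (Path(extension_dir) / "manifest.json").exists()
```

Because the directory creation uses `exist_ok=True`, repeating the setup simply reconfirms state, which matches the documented safety of re-running `init`.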
Patchright scraping
- Install Chrome through Patchright with `patchright install chrome` (recommended), or install Chromium and run `uv run main.py init --channel chromium`.
- Run `uv run main.py <URL>` to scrape a page, convert it to Markdown, and stream the result to stdout; pass `--browser-data-dir` to reuse a persistent profile and `--no-headless` to show the UI.
- The scraper launches Patchright with a persistent context (`no_viewport=True`) and uses the Chrome channel by default.
- Enable paywall bypassing with `--enable-paywall-bypass`, or point to a custom directory via `--paywall-extension-dir`.
- Extensions are disabled in headless Chromium/Chrome, so avoid `--headless` when relying on paywall bypassing.
- See `extensions/bypass-paywalls-chrome-clean/README.md` for the paywall extension configuration, updates, and bundled MIT license.
- URLs are stripped from the Markdown output by default to avoid link noise; add `--keep-urls` if you need to preserve HTTP/HTTPS links in the converted content.
- Use `--summary-only` to generate a short model summary instead of full Markdown. The default checkpoint is `HuggingFaceTB/SmolLM2-135M-Instruct`, adjustable via `--summary-model`.
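The default URL stripping can be approximated with two regex passes (an illustrative sketch only; the actual cleanup in main.py may differ):

```python
import re

def strip_urls(markdown: str) -> str:
    """Drop HTTP/HTTPS links from Markdown, keeping the link text
    (sketch of the idea, not the tool's actual implementation)."""
    # [text](http...) -> text
    text = re.sub(r"\[([^\]]*)\]\(https?://[^)]*\)", r"\1", markdown)
    # remove bare URLs
    return re.sub(r"https?://\S+", "", text)
```

With `--keep-urls`, the tool skips this step and leaves links intact.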
Quickstart
```shell
uv sync
uv run main.py init
uv run main.py "https://example.com"
```
Development
- Use `uv run pytest` for tests once they exist.
- Format code with `uv run black main.py` and lint with `uv run ruff check main.py`.
MCP usage
Run via uvx (git path)
Use uvx to run the MCP server directly from a git URL without cloning:
```shell
uvx --from "git+https://github.com/trotsky1997/Fast-Web-Fetch.git" python -m mcp_server
```
Notes:
- Update the git URL to your fork or a specific ref if needed.
- `uvx` resolves dependencies from `pyproject.toml` in the repo.
- The server uses a persistent Patchright profile at `~/.fastwebfetch/patchright-profile` by default.
MCP config (generic JSON, copy/paste)
Use this JSON in any MCP client that supports stdio servers:
```json
{
  "servers": {
    "fastwebfetch": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/trotsky1997/Fast-Web-Fetch.git",
        "python",
        "-m",
        "mcp_server"
      ]
    }
  }
}
```
fetch_md tool
The MCP server exposes a single tool named fetch_md with these inputs:
- `url` (string, required): URL to scrape.
- `enable_paywall_bypass` (boolean, optional): Load the bundled bypass extension.
- `keep_urls` (boolean, optional): Keep HTTP/HTTPS URLs in the Markdown output.
- `summary_only` (boolean, optional): Return a short summary instead of full Markdown.
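For reference, a raw MCP `tools/call` request for fetch_md over stdio might look like this (the JSON-RPC framing is standard MCP; the argument names come from the schema above, and the values are example choices):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch_md",
    "arguments": {
      "url": "https://example.com",
      "keep_urls": false,
      "summary_only": false
    }
  }
}
```

Most MCP clients build this request for you; it is shown here only to make the tool's input shape concrete.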
