shuck-file
AI doesn't eat shells
Any file in, Markdown out: read only what matters.
shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files output directly; large files return a document map with section summaries, token counts, and actionable next steps, so agents only pull what they need.
Why shuck-file?
AI agents need a bridge that's context-aware:
- Small file → shuck report.docx → full Markdown on stdout
- Large file → shuck report.docx → document map with sections and extraction options
- Targeted extraction → shuck report.docx --sections s1,s3 → only what you need
- Search → shuck report.docx --grep "revenue" → find without reading everything
Supported Formats
| Format | Extension | Library | What's Preserved |
|---|---|---|---|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |
Installation
Via pip (recommended)
pip install shuck-file
This installs the shuck CLI command and the MCP server.
From source
git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .
Quick Start
# Convert a document
shuck report.docx
# Force full output (bypass map mode)
shuck large-report.pdf --all
# Search within a document
shuck report.pdf --grep "revenue"
Usage
Auto-Routing (default)
Small files output directly, large files return a document map.
# Small file → direct Markdown output
shuck document.pdf
# Large file → document map with sections table + next steps
shuck large-report.pdf
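The routing rule above can be pictured as a threshold check on an estimated token count. The sketch below is illustrative only: the 4-characters-per-token estimate, the `MAP_THRESHOLD_TOKENS` value, and the function names are assumptions made for this example, not shuck-file's actual internals.

```python
# Hypothetical sketch of auto-routing. The threshold, the token heuristic,
# and all names here are assumptions, not taken from shuck-file's source.
MAP_THRESHOLD_TOKENS = 2000  # assumed cutoff between "small" and "large"


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (a common heuristic)."""
    return max(1, len(text) // 4)


def route(text: str, force_all: bool = False) -> str:
    """Return "full" for direct Markdown output or "map" for a document map."""
    if force_all:  # mirrors the documented --all flag
        return "full"
    if estimate_tokens(text) <= MAP_THRESHOLD_TOKENS:
        return "full"
    return "map"


if __name__ == "__main__":
    print(route("short memo"))
    print(route("x" * 40_000))
    print(route("x" * 40_000, force_all=True))
```

The point of the design is that the caller never has to guess a file's size up front: the same command is safe on a one-page memo and on a 300-page report.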
Extraction Options
# Force full output (bypass map mode)
shuck report.pdf --all
# Extract specific sections
shuck report.pdf --sections s1,s3
# Tables only
shuck report.pdf --tables-only
# Search within document
shuck report.pdf --grep "revenue"
# Token budget (smart compression)
shuck report.pdf --budget 4000
# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
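When an agent or script drives shuck programmatically, it can help to assemble the command from structured options. The helper below is a minimal sketch that uses only the flags listed above; `build_shuck_command` and its parameter names are hypothetical and not part of shuck-file.

```python
import shlex
from typing import List, Optional, Sequence


def build_shuck_command(
    path: str,
    sections: Optional[Sequence[str]] = None,
    grep: Optional[str] = None,
    budget: Optional[int] = None,
    tables_only: bool = False,
    full: bool = False,
) -> List[str]:
    """Build an argv list from the extraction flags documented above."""
    cmd = ["shuck", path]
    if full:
        cmd.append("--all")
    if sections:
        cmd += ["--sections", ",".join(sections)]
    if tables_only:
        cmd.append("--tables-only")
    if grep is not None:
        cmd += ["--grep", grep]
    if budget is not None:
        cmd += ["--budget", str(budget)]
    return cmd


if __name__ == "__main__":
    cmd = build_shuck_command("report.pdf", sections=["s2", "s3"], budget=2000)
    print(shlex.join(cmd))
    # To run it for real (requires shuck on PATH):
    # import subprocess
    # markdown = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Passing an argv list to `subprocess.run` instead of a shell string avoids quoting problems when a search term contains spaces or quotes.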
Excel/CSV Specific
# Column headers and types
shuck data.xlsx --schema-only
# Headers + first N rows
shuck data.xlsx --sample 5
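To give a feel for what "headers + first N rows" means, here is a minimal standard-library sketch for CSV input. It is not shuck-file's extractor: `sample_as_markdown` is a hypothetical helper, and the exact output format of `--sample` may differ.

```python
import csv
import io


def sample_as_markdown(csv_text: str, n: int) -> str:
    """Render the header row plus the first n data rows as a Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, data = rows[0], rows[1 : n + 1]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in data:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    text = "region,revenue\nEMEA,120\nAPAC,95\nAMER,210\n"
    print(sample_as_markdown(text, 2))
```

A small sample like this is often enough for an agent to learn the column layout before deciding whether the full sheet is worth the tokens.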
Power User Subcommands
# Force map mode (even on small files)
shuck probe document.docx
# Force full extraction (alias for --all)
shuck pull document.docx
Output Control
# Write to file
shuck document.pdf -o output.md
# Write to directory (auto-named)
shuck document.pdf -d ./converted/
# Skip YAML frontmatter
shuck document.pdf --no-frontmatter
# List supported formats
shuck --formats
Map Mode Output
When a file is large, shuck returns a document map:
# Document Map: quarterly-report.pdf
**6 pages | ~12,400 tokens | 6 sections**
## Sections
| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...
## Next Steps
- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
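An agent can read the Sections table and decide which sections fit its context window. The sketch below parses rows shaped like the sample above and picks sections greedily in document order; `parse_sections` and `pick_within_budget` are hypothetical helpers, and the real map output may contain details this parser does not handle.

```python
from typing import Dict, List


def parse_sections(map_markdown: str) -> List[Dict]:
    """Extract id, title, type, tokens, and density from the Sections table."""
    sections = []
    for line in map_markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Keep only complete data rows whose first cell looks like s1, s2, ...
        if len(cells) != 5:
            continue
        if not cells[0].startswith("s") or not cells[0][1:].isdigit():
            continue
        sections.append(
            {
                "id": cells[0],
                "title": cells[1],
                "type": cells[2],
                "tokens": int(cells[3].replace(",", "")),
                "density": cells[4],
            }
        )
    return sections


def pick_within_budget(sections: List[Dict], budget: int) -> List[str]:
    """Pick sections in document order while they still fit the token budget."""
    chosen, used = [], 0
    for s in sections:
        if used + s["tokens"] <= budget:
            chosen.append(s["id"])
            used += s["tokens"]
    return chosen


SAMPLE = """\
| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
"""

if __name__ == "__main__":
    ids = pick_within_budget(parse_sections(SAMPLE), 4000)
    print(f"shuck quarterly-report.pdf --sections {','.join(ids)}")
```

With a 4,000-token budget, the sample rows yield s1 and s2 (450 + 2,800 = 3,250 tokens), which matches the second suggestion in the Next Steps list.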
MCP Server
shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.
Claude Code
claude mcp add shuck-file -- shuck-file
Or add to your project's .mcp.json:
{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
Cursor
Add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
Windsurf
Add to your MCP configuration:
{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
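If a configuration file already lists other MCP servers, the shuck-file entry has to be merged in, not pasted over the whole file. The following is a minimal sketch of that merge; `add_shuck_server` and `update_config_file` are hypothetical helpers, and the file location depends on the client.

```python
import json
from pathlib import Path


def add_shuck_server(config: dict) -> dict:
    """Return a copy of an MCP config with the shuck-file entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["shuck-file"] = {"command": "shuck-file", "args": []}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the file if it exists, add the entry, and write the result back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_shuck_server(existing)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    current = {"mcpServers": {"other": {"command": "other", "args": []}}}
    print(json.dumps(add_shuck_server(current), indent=2))
```

Existing server entries are preserved, and the input dictionary is left unmodified.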
Any MCP Client
shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:
- shuck: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
- list_formats: List supported document formats
Claude Code Plugin
Install as a Claude Code plugin for the /shuck skill:
claude plugin add /path/to/shuck-file
Architecture
src/shuck_file/
├── cli.py                # CLI entrypoint
├── server.py             # MCP Server (FastMCP)
├── core/
│   ├── router.py         # Auto-routing logic
│   ├── segmenter.py      # Document segmentation
│   ├── mapper.py         # Map mode renderer
│   ├── budget.py         # Smart compression
│   ├── grep.py           # In-document search
│   ├── frontmatter.py    # YAML frontmatter
│   └── models.py         # Data models
└── extractors/
    ├── base.py           # Base extractor ABC
    ├── docx_ext.py       # Word extractor
    ├── pdf_ext.py        # PDF extractor
    ├── xlsx_ext.py       # Excel extractor
    ├── pptx_ext.py       # PowerPoint extractor
    └── csv_ext.py        # CSV extractor
plugin/                   # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
License
MIT
