Sibyl
MCP servers that read handwritten notes in PDF form from Google Drive and convert them to Markdown
Installation
npx sibyl
Sibyl MCP Servers
AI-powered PDF processing and note management through the Model Context Protocol (MCP)
Sibyl provides two specialized MCP servers that enable LLM hosts (Claude Desktop, VS Code, etc.) to intelligently process PDFs and manage markdown notes with advanced capabilities like OCR, intelligent merging, and template-based creation.
Important: Licensing Notice
Before you begin: Sibyl uses MuPDF (via go-fitz) for PDF processing, which is licensed under AGPL v3. This means:
- Open source projects: Free to use under AGPL v3
- Commercial/proprietary use: Requires a commercial license from Artifex Software
- Network deployment: Must provide source code to users under AGPL v3
See detailed licensing information below before deploying in production.
Quick Start
Prerequisites
- Go 1.21+ - Install Go
- Google Cloud Account - For Google Drive API access
- Mathpix Account - For OCR processing (Sign up)
- MCP-compatible host - Claude Desktop, VS Code with MCP extension, etc.
1. Clone and Build
git clone https://github.com/your-username/sibyl.git
cd sibyl
make all
This creates:
- ./bin/pdf-server - PDF processing MCP server
- ./bin/notes-server - Note management MCP server
2. Set Up Credentials
Google Drive API Setup
- Go to Google Cloud Console
- Create a new project or select existing one
- Enable the Google Drive API
- Create service account credentials
- Download the JSON credentials file
- Share your Google Drive folder with the service account email
Mathpix OCR Setup
- Sign up at Mathpix
- Create a new app in your dashboard
- Note your App ID and App Key
3. Configure Environment
Create a .env file in the project root:
# Google Drive Configuration
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/google-credentials.json
GCP_FOLDER_ID=your-google-drive-folder-id
# Mathpix OCR Configuration
MATHPIX_APP_ID=your-mathpix-app-id
MATHPIX_APP_KEY=your-mathpix-app-key
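Before starting the servers it can help to confirm that all four variables are actually set. The following is a minimal sketch, not part of Sibyl: it only inspects the process environment (it does not parse the .env file itself), and the helper name `missingVars` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredVars lists the variables named in the .env example above.
var requiredVars = []string{
	"GOOGLE_APPLICATION_CREDENTIALS",
	"GCP_FOLDER_ID",
	"MATHPIX_APP_ID",
	"MATHPIX_APP_KEY",
}

// missingVars returns the names in required that are unset or blank.
// lookup is injected so the check can run against a fake environment.
func missingVars(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingVars(requiredVars, os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("missing configuration:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all required variables are set")
}
```

Export the variables in your shell (or load the .env file with a tool of your choice) before running the check.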
4. Configure Your MCP Host
Add the servers to your MCP host configuration:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"pdf-server": {
"command": "/full/path/to/sibyl/bin/pdf-server",
"args": [
"--credentials", "/path/to/google-credentials.json",
"--folder-id", "your-google-drive-folder-id",
"--mathpix-app-id", "your-mathpix-app-id",
"--mathpix-app-key", "your-mathpix-app-key",
"--ocr-languages", "en,fr,de",
"--log-level", "INFO"
]
},
"notes-server": {
"command": "/full/path/to/sibyl/bin/notes-server",
"args": [
"--notesFolder", "/path/to/your/notes",
"--logLevel", "INFO",
"--logFile", "notes-server.log"
]
}
}
}
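Relative command paths are a common reason an MCP host fails to launch a server. As a rough illustration, the sketch below parses a configuration shaped like the one above and reports servers whose command is not an absolute path. The type `hostConfig` and the function `relativeCommands` are hypothetical names, and only the fields shown in the example are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
	"sort"
)

// hostConfig mirrors only the fields used in the example configuration.
type hostConfig struct {
	MCPServers map[string]struct {
		Command string   `json:"command"`
		Args    []string `json:"args"`
	} `json:"mcpServers"`
}

// relativeCommands returns the sorted names of servers whose command
// is not an absolute path on the current operating system.
func relativeCommands(raw []byte) ([]string, error) {
	var cfg hostConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	var bad []string
	for name, srv := range cfg.MCPServers {
		if !filepath.IsAbs(srv.Command) {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad, nil
}

func main() {
	raw := []byte(`{"mcpServers": {
		"pdf-server":   {"command": "/full/path/to/sibyl/bin/pdf-server", "args": []},
		"notes-server": {"command": "./bin/notes-server", "args": []}
	}}`)
	bad, err := relativeCommands(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("servers with relative commands:", bad)
}
```

To check a real configuration, read the file with os.ReadFile and pass its contents to the function.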
5. Test the Setup
- Restart your MCP host (Claude Desktop, VS Code, etc.)
- Try asking: "What PDF documents do I have available?"
- Try asking: "Show me my notes in the projects folder"
Core Concepts
PDF Server Capabilities
The PDF server connects to your Google Drive and provides:
- Document Search: Find PDFs by name, content, or metadata
- PDF-to-Markdown Conversion: High-quality conversion using Mathpix OCR + visual analysis
- Image Processing: Converts PDFs to 150 DPI PNG images for visual analysis
- Multi-language OCR: Supports 40+ languages through Mathpix
- MCP Resources: Structured access to your document library
How it works:
- PDF → PNG: Converts PDF pages to high-quality images using MuPDF
- PDF → OCR: Extracts text using Mathpix OCR (handles formulas, tables, handwriting)
- PNG + OCR → LLM: Your LLM analyzes both visual and text data for optimal conversion
Notes Server Capabilities
The notes server manages your markdown vault with:
- Intelligent Reading/Writing: Full CRUD operations on markdown files
- Smart Merging: 5 merge strategies (append, prepend, date_section, topic_merge, replace)
- Merge Preview: See exactly what changes before applying them
- Template System: Pre-built templates for daily notes, meetings, research, projects
- Content Search: Full-text search across your entire vault
- MCP Resources: Structured exploration of your note collection
Available Tools
PDF Server Tools
| Tool | Description | Parameters |
|---|---|---|
| search_pdfs | Search Google Drive for PDF files | query (string), max_files (number) |
| convert_pdf_to_markdown | Convert PDF to Markdown using OCR + visual analysis | file_id (string) |
Notes Server Tools
| Tool | Description | Parameters |
|---|---|---|
| read_note | Read note content | path (string) |
| write_note | Create or overwrite a note | path (string), content (string) |
| merge_note | Merge content with existing note | path, content, strategy, title? |
| preview_merge | Preview merge operation | path, content, strategy? |
| list_notes | List notes in directory | path?, recursive? (boolean) |
| search_notes | Search note content | query (string), path? |
| get_note_templates | Get available templates | template_type? (string) |
| create_note_from_template | Create note from template | path, template_type, variables? |
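Your MCP host normally issues these calls for you, but seeing the wire format can make the tables easier to read. The sketch below builds a JSON-RPC 2.0 `tools/call` request, the message shape MCP uses for tool invocations. The Go type and function names are hypothetical, and the argument values are sample data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a JSON-RPC 2.0 request carrying an MCP tool call.
type request struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams names the tool and carries its arguments.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// newToolCall encodes a tools/call request for the given tool.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: tool, Arguments: args},
	})
}

func main() {
	body, err := newToolCall(1, "merge_note", map[string]any{
		"path":     "daily.md",
		"content":  "- Review sprint goals",
		"strategy": "append",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

Parameters marked with `?` in the tables are optional and can simply be left out of the arguments map.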
Merge Strategies
- append - Add content to end of file
- prepend - Add content to beginning of file
- date_section - Add as new dated section with timestamp
- topic_merge - Intelligent merging based on content topics
- replace - Replace entire file content
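To make the simpler strategies concrete, here is a minimal sketch of how append, prepend, date_section, and replace could behave on plain strings. It is an illustration only, not Sibyl's implementation: the heading format used for date_section is an assumption, and topic_merge is deliberately left out because its logic is not described here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// merge combines existing and incoming note text according to strategy.
// stamp is the timestamp text used by date_section.
func merge(existing, incoming, strategy, stamp string) (string, error) {
	switch strategy {
	case "append":
		return joinBlocks(existing, incoming), nil
	case "prepend":
		return joinBlocks(incoming, existing), nil
	case "date_section":
		// Assumed layout: a level-2 heading holding the timestamp.
		return joinBlocks(existing, "## "+stamp+"\n\n"+incoming), nil
	case "replace":
		return incoming, nil
	case "topic_merge":
		return "", errors.New("topic_merge is not covered by this sketch")
	default:
		return "", fmt.Errorf("unknown strategy %q", strategy)
	}
}

// joinBlocks joins two blocks with one blank line between them and a
// single trailing newline, skipping a block that is empty.
func joinBlocks(first, second string) string {
	first = strings.Trim(first, "\n")
	second = strings.Trim(second, "\n")
	switch {
	case first == "":
		return second + "\n"
	case second == "":
		return first + "\n"
	default:
		return first + "\n\n" + second + "\n"
	}
}

func main() {
	stamp := time.Now().Format("2006-01-02 15:04")
	out, err := merge("# Daily\n\n- Write report\n", "- Sprint planning at 10:00\n", "date_section", stamp)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```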
Note Templates
- daily - Daily note with focus areas, tasks, and reflections
- meeting - Meeting notes with agenda, attendees, and action items
- research - Research notes with citations, analysis, and conclusions
- project - Project planning template with goals, milestones, and progress
Usage Examples
Example 1: PDF Research Workflow
You: "I have a research paper about machine learning in my Drive. Can you convert it to notes and add it to my research collection?"
Assistant: I'll help you find and convert that research paper. Let me search your Google Drive first.
→ Calls search_pdfs(query: "machine learning")
I found 3 PDFs about machine learning:
- "Deep Learning Fundamentals.pdf" (2.1MB, modified yesterday)
- "ML Algorithms Survey.pdf" (896KB, modified last week)
- "Neural Networks Intro.pdf" (1.5MB, modified last month)
Which would you like me to convert?
You: "The first one"
Assistant: I'll convert "Deep Learning Fundamentals.pdf" to Markdown using our OCR and visual analysis process.
→ Calls convert_pdf_to_markdown(file_id: "1BxYz...")
Perfect! The PDF has been processed with Mathpix OCR and converted to high-quality images. I've extracted the text and can see the visual elements including equations, diagrams, and tables.
Here's the converted content... [shows converted markdown]
Would you like me to save this as a new research note?
You: "Yes, save it as 'Deep Learning Fundamentals Summary'"
Assistant: → Calls write_note(path: "research/Deep Learning Fundamentals Summary.md", content: converted_content)
Done! I've saved the converted paper as "Deep Learning Fundamentals Summary.md" in your research folder. The conversion includes all the mathematical formulas, diagrams, and structured content from the original PDF.
Example 2: Smart Note Merging
You: "I just took some meeting notes. Can you merge them with my daily note but keep things organized?"
Assistant: I can help merge your meeting notes. Let me preview how this would look with topic-based merging to keep everything organized.
→ Calls preview_merge(path: "daily.md", content: meeting_notes, strategy: "topic_merge")
Here's how the merge would look:
- Your existing "Tasks" section would be combined with new action items
- Meeting notes would be added under a new "Meetings" section
- No duplicate content would be created
Should I proceed with this merge?
You: "Perfect, go ahead"
Assistant: → Calls merge_note(path: "daily.md", content: meeting_notes, strategy: "topic_merge")
Done! Your meeting notes have been intelligently merged into your daily note with proper organization and no duplicates.
Example 3: Template-Based Note Creation
You: "Create a meeting note for tomorrow's sprint planning session"
Assistant: I'll create a meeting note using our meeting template.
→ Calls create_note_from_template(path: "meetings/sprint-planning-2025-08-10.md", template_type: "meeting", variables: {"TITLE": "Sprint Planning", "DATE": "2025-08-10", "ATTENDEES": "TBD"})
I've created "sprint-planning-2025-08-10.md" with a structured meeting template including:
- Agenda section
- Attendees list (you can fill this in)
- Discussion points
- Action items tracking
- Next steps
The note is ready for you to add specific agenda items before the meeting.
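The variables map in the call above suggests simple placeholder substitution. The sketch below shows one way that could work, assuming a `{{NAME}}` placeholder syntax. That syntax is a guess made for illustration, so check the output of get_note_templates for the format Sibyl actually uses. The function name `render` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches {{NAME}}, where NAME consists of upper-case
// letters, digits, or underscores. The syntax is an assumption.
var placeholder = regexp.MustCompile(`\{\{([A-Z0-9_]+)\}\}`)

// render substitutes vars into tmpl. Placeholders without a value are
// left in place and their names are returned in order of appearance.
func render(tmpl string, vars map[string]string) (string, []string) {
	var missing []string
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := m[2 : len(m)-2] // strip the surrounding braces
		if v, ok := vars[name]; ok {
			return v
		}
		missing = append(missing, name)
		return m
	})
	return out, missing
}

func main() {
	tmpl := "# {{TITLE}}\n\nDate: {{DATE}}\nAttendees: {{ATTENDEES}}\n"
	out, missing := render(tmpl, map[string]string{
		"TITLE":     "Sprint Planning",
		"DATE":      "2025-08-10",
		"ATTENDEES": "TBD",
	})
	fmt.Print(out)
	fmt.Println("unfilled placeholders:", len(missing))
}
```

Leaving unknown placeholders untouched, rather than blanking them, makes missing variables easy to spot in the created note.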
MCP Resources
Both servers provide MCP resources that enable intelligent exploration of your content without knowing specific file paths.
PDF Server Resources
- pdf://documents/ - Complete catalog of your PDF library with metadata
  - Example: Lists all PDFs with file sizes, modification dates, and direct links
Notes Server Resources
- notes://files/ - Your complete note collection with previews and tags
- notes://templates/ - Available note templates with descriptions
- notes://collections/ - Notes organized by folders and tags
Resource Benefits:
- Discovery: LLMs can explore without knowing file paths
- Efficiency: Batch metadata retrieval vs individual queries
- Context: Rich metadata helps LLMs make smarter decisions
- Navigation: Structured browsing of content hierarchies
Configuration Options
PDF Server Arguments
./bin/pdf-server --help
| Argument | Required | Description | Environment Variable |
|---|---|---|---|
| --credentials | Yes | Path to Google Cloud credentials JSON | GOOGLE_APPLICATION_CREDENTIALS |
| --folder-id | Yes | Google Drive folder ID to search | GCP_FOLDER_ID |
| --mathpix-app-id | Yes | Mathpix API App ID | MATHPIX_APP_ID |
| --mathpix-app-key | Yes | Mathpix API App Key | MATHPIX_APP_KEY |
| --ocr-languages | No | Comma-separated language codes (default: "en") | - |
| --log-level | No | Log level: DEBUG, INFO, WARN, ERROR (default: INFO) | - |
| --log-file | No | Log file path (default: stderr) | - |
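The table pairs four flags with environment variables. One common way to wire that up with Go's standard flag package is to use the environment value as the flag default, so an explicit flag wins. The sketch below follows that pattern. The precedence order is an assumption, and `pdfConfig` and `parseArgs` are hypothetical names rather than Sibyl's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// pdfConfig holds the options listed in the table above.
type pdfConfig struct {
	Credentials   string
	FolderID      string
	MathpixAppID  string
	MathpixAppKey string
	OCRLanguages  string
	LogLevel      string
	LogFile       string
}

// parseArgs reads flags from args, falling back to getenv for the four
// options that have an environment variable.
func parseArgs(args []string, getenv func(string) string) (pdfConfig, error) {
	var c pdfConfig
	fs := flag.NewFlagSet("pdf-server", flag.ContinueOnError)
	fs.StringVar(&c.Credentials, "credentials", getenv("GOOGLE_APPLICATION_CREDENTIALS"), "path to Google Cloud credentials JSON")
	fs.StringVar(&c.FolderID, "folder-id", getenv("GCP_FOLDER_ID"), "Google Drive folder ID to search")
	fs.StringVar(&c.MathpixAppID, "mathpix-app-id", getenv("MATHPIX_APP_ID"), "Mathpix API App ID")
	fs.StringVar(&c.MathpixAppKey, "mathpix-app-key", getenv("MATHPIX_APP_KEY"), "Mathpix API App Key")
	fs.StringVar(&c.OCRLanguages, "ocr-languages", "en", "comma-separated language codes")
	fs.StringVar(&c.LogLevel, "log-level", "INFO", "DEBUG, INFO, WARN, or ERROR")
	fs.StringVar(&c.LogFile, "log-file", "", "log file path (empty means stderr)")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Report the first required option that is still empty.
	required := []struct{ name, value string }{
		{"--credentials", c.Credentials},
		{"--folder-id", c.FolderID},
		{"--mathpix-app-id", c.MathpixAppID},
		{"--mathpix-app-key", c.MathpixAppKey},
	}
	for _, r := range required {
		if r.value == "" {
			return c, fmt.Errorf("%s is required", r.name)
		}
	}
	return c, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("folder=%s languages=%s log-level=%s\n", cfg.FolderID, cfg.OCRLanguages, cfg.LogLevel)
}
```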
Notes Server Arguments
./bin/notes-server --help
| Argument | Required | Description |
|---|---|---|
| --notesFolder | Yes | Path to your notes directory |
| --logLevel | No | Log level: DEBUG, INFO, WARN, ERROR (default: INFO) |
| --logFile | No | Log file path (default: stderr) |
Development & Testing
Running Tests
# Run all tests
go test ./...
# Run tests with coverage
go test -race -coverprofile=coverage.out -covermode=atomic ./pkg/...
go tool cover -html=coverage.out -o coverage.html
# Run specific package tests
go test ./pkg/pdfmcp
go test ./pkg/notes
# Run integration tests
go test ./tests/integration/...
Code Quality
# Static analysis
go vet ./...
staticcheck ./...
# Security scanning
gosec ./...
# Comprehensive linting
golangci-lint run --timeout=5m
Project Structure
sibyl/
├── cmd/                  # Main applications
│   ├── pdfserver/        # PDF MCP server entry point
│   └── noteserver/       # Notes MCP server entry point
├── pkg/                  # Reusable packages
│   ├── pdfmcp/           # PDF server implementation
│   ├── notes/            # Notes server implementation
│   ├── dto/              # Data transfer objects
│   └── utils/            # Shared utilities
├── tests/                # Testing infrastructure
│   ├── integration/      # End-to-end tests
│   └── testutils/        # Test helpers
├── examples/             # Usage examples and configs
└── bin/                  # Built binaries
Troubleshooting
Common Issues
PDF Server won't start:
- Check Google Cloud credentials file exists and is valid
- Verify Google Drive API is enabled in your GCP project
- Ensure service account has access to your Drive folder
- Confirm Mathpix credentials are correct
Notes Server can't find notes:
- Verify notes folder path exists and is readable
- Check that notes are in Markdown format (.md extension)
- Ensure proper file permissions
MCP Host can't connect:
- Use absolute paths in MCP configuration
- Restart your MCP host after configuration changes
- Check server logs for startup errors
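The notes-folder checks above can be scripted. This minimal sketch, independent of Sibyl's own code, verifies that a path is a readable directory and counts the .md files beneath it. The function name `countMarkdown` is hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// countMarkdown walks root and returns how many regular files have a
// .md extension (case-insensitive). It fails if root is missing, is
// not a directory, or cannot be read.
func countMarkdown(root string) (int, error) {
	info, err := os.Stat(root)
	if err != nil {
		return 0, fmt.Errorf("notes folder: %w", err)
	}
	if !info.IsDir() {
		return 0, fmt.Errorf("notes folder %q is not a directory", root)
	}
	count := 0
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr // surfaces permission problems
		}
		if d.Type().IsRegular() && strings.EqualFold(filepath.Ext(path), ".md") {
			count++
		}
		return nil
	})
	return count, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	n, err := countMarkdown(root)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d markdown files under %s\n", n, root)
}
```

A count of zero for a folder you expect to contain notes usually points at a wrong path or a different file extension.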
Log Files
Enable debug logging for troubleshooting:
# PDF Server
./bin/pdf-server --log-level DEBUG --log-file pdf-server.log [other args...]
# Notes Server
./bin/notes-server --logLevel DEBUG --logFile notes-server.log [other args...]
Advanced Usage
Custom OCR Languages
Mathpix supports 40+ languages. Specify multiple languages:
--ocr-languages "en,fr,de,es,zh,ja,ko"
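If you assemble this value in a script, it can be worth normalizing it first. The sketch below splits a comma-separated list, trims whitespace, lower-cases the codes, and removes duplicates, falling back to the documented default of "en". Whether the server itself tolerates spaces or upper-case codes is not stated here, so treat the normalization as a precaution. The function name `parseLanguages` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLanguages turns a comma-separated list into clean, unique,
// lower-case codes, preserving first-seen order.
func parseLanguages(raw string) []string {
	seen := make(map[string]bool)
	var langs []string
	for _, part := range strings.Split(raw, ",") {
		code := strings.ToLower(strings.TrimSpace(part))
		if code == "" || seen[code] {
			continue
		}
		seen[code] = true
		langs = append(langs, code)
	}
	if len(langs) == 0 {
		return []string{"en"} // documented default
	}
	return langs
}

func main() {
	langs := parseLanguages(" en, FR ,de,,en")
	fmt.Println(strings.Join(langs, ","))
}
```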
Batch Processing
Process multiple PDFs efficiently:
- Use the pdf://documents/ resource to get the complete file list
- Filter by date, size, or name patterns
- Process each with convert_pdf_to_markdown
- Use merge_note with the topic_merge strategy for intelligent consolidation
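The filtering step can be sketched as a plain function over file metadata. The `pdfFile` struct below is a hypothetical stand-in for whatever the pdf://documents/ resource returns, since the real field names are not shown in this document. Name patterns use the glob syntax of Go's path.Match.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// pdfFile is a hypothetical view of one catalog entry.
type pdfFile struct {
	ID       string
	Name     string
	Size     int64 // bytes
	Modified time.Time
}

// day parses a YYYY-MM-DD date and panics on malformed input.
// It exists only to keep the sample data readable.
func day(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

// selectFiles keeps files whose name matches pattern, whose size is at
// most maxSize bytes, and that were modified on or after since.
func selectFiles(files []pdfFile, pattern string, maxSize int64, since time.Time) ([]pdfFile, error) {
	var out []pdfFile
	for _, f := range files {
		ok, err := path.Match(pattern, f.Name)
		if err != nil {
			return nil, fmt.Errorf("bad pattern %q: %w", pattern, err)
		}
		if ok && f.Size <= maxSize && !f.Modified.Before(since) {
			out = append(out, f)
		}
	}
	return out, nil
}

func main() {
	files := []pdfFile{
		{ID: "1", Name: "Deep Learning Fundamentals.pdf", Size: 2_100_000, Modified: day("2025-08-08")},
		{ID: "2", Name: "ML Algorithms Survey.pdf", Size: 896_000, Modified: day("2025-08-01")},
		{ID: "3", Name: "Neural Networks Intro.pdf", Size: 1_500_000, Modified: day("2025-07-01")},
	}
	picked, err := selectFiles(files, "*.pdf", 2_000_000, day("2025-07-15"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range picked {
		fmt.Println(f.ID, f.Name)
	}
}
```

Each selected ID would then be passed to convert_pdf_to_markdown as file_id.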
Custom Templates
Create your own note templates by examining the built-in ones:
# Get template structure
curl -X POST http://localhost/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "get_note_templates", "arguments": {}}}'
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Ensure all tests pass: make test
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
Code Style
- Follow Go conventions and use gofmt
- Add tests for new functionality
- Update documentation for user-facing changes
- Use structured logging with slog
License & Legal Notice
Primary License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Important: Third-Party Licensing Dependencies
Sibyl uses MuPDF for PDF processing, which has specific licensing requirements:
MuPDF Licensing (AGPL v3)
- Library: github.com/gen2brain/go-fitz (Go wrapper for MuPDF)
- Underlying Library: MuPDF (licensed under AGPL v3)
- Licensor: Artifex Software, Inc.
AGPL v3 Requirements
If you use Sibyl in any of the following ways, your entire application MUST be licensed under AGPL v3:
- Network Services - Running Sibyl as a web service, API, or SaaS
- Modified Distribution - Distributing modified versions of Sibyl
- Integration - Incorporating Sibyl into other software
- Internal Use - Using Sibyl within an organization over a network
AGPL v3 Obligations
When AGPL applies, you must:
- License your entire application under AGPL v3 or compatible
- Provide complete source code to all users
- Include license notices and copyright information
- Ensure users can rebuild your application from source
Commercial Licensing Alternative
If you cannot comply with AGPL v3 requirements:
Contact Artifex Software for a commercial MuPDF license:
- Website: https://mupdf.com/licensing/
- Removes AGPL obligations
- Enables proprietary/commercial deployment
- Required for closed-source applications
Other Dependencies (Permissively Licensed)
All other dependencies use permissive licenses:
- github.com/mark3labs/mcp-go - MIT License
- github.com/joho/godotenv - MIT License
- google.golang.org/api - Apache License 2.0
- Go standard library - BSD-style License
License Compatibility Summary
| Use Case | License Required | Commercial License Needed |
|---|---|---|
| Open source project (AGPL v3) | Free | No |
| Internal company tool | AGPL v3 | No* |
| SaaS/Web service | AGPL v3 | No* |
| Proprietary software | Not possible | Yes |
| Commercial distribution | Not possible | Yes |
*As long as you comply with AGPL v3 source distribution requirements
Recommendation for Users
- Open Source Projects: Use Sibyl freely under AGPL v3
- Commercial Projects: Evaluate if AGPL v3 compliance is feasible
- Proprietary Products: Contact Artifex for commercial licensing
- Consulting: Consider legal review for complex licensing scenarios
For questions about licensing compliance, consult with a qualified legal professional.
Acknowledgments
- MCP Go SDK - MCP server implementation
- Mathpix - OCR processing service
- MuPDF - PDF processing and rendering
- Google Drive API - Document storage and access
