io.github.mlintangmz2765/scholar
Hardened Scholar MCP for deep academic research (Scopus, OpenAlex, Unpaywall) with PDF vision.
Scholar MCP Server
A Model Context Protocol (MCP) server providing structured access to scientific literature databases. It serves as a unified interface for Scopus, OpenAlex, Semantic Scholar, and Unpaywall, enabling AI agents to perform systematic paper discovery, author disambiguation, citation lineage tracking, and multimodal content extraction.
Core Capabilities
- Unified Literature Search
  - Semantic Scholar Integration: High-relevance search and detailed metadata, including AI-generated TLDRs (requires API key).
  - Scopus Integration: Targeted metadata retrieval via advanced Boolean syntax (requires API key).
  - OpenAlex Integration: Broad search across 250M+ works with abstract reconstruction.
  - Unpaywall Resolution: DOI-to-PDF cross-referencing across global Open Access repositories.
  - Sci-Hub Fallback (⚠️ Use with Caution): Automatic mirror resolution and parsing for paywall bypassing.
- Book Search & Extraction
  - Google Books & Open Library: Integrated search for book metadata, editions, and descriptions without API keys.
  - Library Genesis (Libgen): Search and extract full text from books directly using PyMuPDF. Features smart caching and token-saving strategies (TOC reading, targeted keyword searching, and page range extraction).
- Author Identification & Metrics
  - Instant author disambiguation and ID resolution via OpenAlex autocomplete.
  - Comprehensive profiles: H-index, i10-index, institutional affiliation history, and ORCID linkage.
  - Precision metrics from Elsevier (Scopus) for verified publication counts.
- Citation Lineage Tracking
  - Map research evolution through forward citations (citing works) and backward references (cited works).
- Structured & Multimodal Extraction
  - Text Extraction: Layout-aware parsing of OA PDFs using PyMuPDF.
  - Vision Rendering: Page-by-page PNG rendering for LLM-based analysis of charts, tables, and equations.
  - HTML Fallbacks: Extraction from web-based research resources via BeautifulSoup.
- Topic Mapping & Field Analysis
  - Concepts and domain hierarchy discovery to map research landscapes.
  - Batch metadata retrieval for high-throughput literature processing (up to 50 DOIs/request).
- Access Management & Fallbacks
  - Automated detection of closed-access content with human-in-the-loop instructions for manual uploads.
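The "abstract reconstruction" listed for OpenAlex exists because OpenAlex serves abstracts as an inverted index (the `abstract_inverted_index` field maps each word to the positions where it occurs) rather than as plain text. The sketch below shows one way to rebuild readable text from that structure. The function name `reconstruct_abstract` is hypothetical and this is not necessarily how the server implements it.

```python
from typing import Dict, List, Optional


def reconstruct_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> str:
    """Rebuild plain text from an OpenAlex-style inverted index (hypothetical helper)."""
    if not inverted_index:
        return ""  # OpenAlex returns null when no abstract is available
    # Flatten {word: [positions]} into (position, word) pairs.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


sample = {"Attention": [0], "is": [1], "all": [2], "you": [3], "need": [4]}
print(reconstruct_abstract(sample))
```

Words that occur several times simply carry several positions, so no special handling is needed for repeats.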
Architecture
graph TD
A[LLM Agent] -->|MCP Protocol| B(Scholar MCP Server)
B --> C{Database Router}
C -->|Primary| D[Scopus API]
C -->|Fallback| E[OpenAlex API]
C -->|DOI Resolver| F[Unpaywall API]
C -->|Citations| P[CrossRef API]
D --> G{Access Check}
E --> G
F --> G
P --> G
G -->|Open Access| H[PDF Buffer Download]
G -->|Closed Access| I[Human-in-the-Loop Prompt]
H --> J[PyMuPDF Text Extractor]
H --> K[PyMuPDF Vision Renderer]
J --> L[Return Context to LLM]
K --> L
I --> L
B --> M{Author Router}
M -->|Profile| N[OpenAlex Authors API]
M -->|Metrics| O[Scopus Author API]
N --> L
O --> L
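One way to read the Primary/Fallback edges of the Database Router in the diagram is "try Scopus first, and use OpenAlex if that fails". The following is a minimal sketch of that idea only. `route_search`, `scopus_stub`, and `openalex_stub` are hypothetical names, and the real router may apply different rules.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is any callable that takes a query and returns a list of records.
Backend = Callable[[str], list]


def route_search(query: str, backends: Sequence[Tuple[str, Backend]]) -> Tuple[str, list]:
    """Try each (name, backend) pair in order and return the first success."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(query)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real API clients.
def scopus_stub(query: str) -> list:
    raise PermissionError("HTTP 401")


def openalex_stub(query: str) -> list:
    return [{"title": f"result for {query}"}]


source, results = route_search(
    "transformers", [("scopus", scopus_stub), ("openalex", openalex_stub)]
)
print(source, results)
```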
Installation
Quick Start (via PyPI)
The fastest way to use the server is directly via PyPI:
pip install scholar-academic-mcp
Manual Setup (for Development)
# Clone the repository
git clone https://github.com/mlintangmz2765/Scholar-MCP.git
cd Scholar-MCP
# Setup virtual environment
python -m venv venv
.\venv\Scripts\activate # Windows
source venv/bin/activate # Unix
# Install in editable mode
pip install -e .
Environment Variables
| Variable | Required | Description |
|---|---|---|
| SCOPUS_API_KEY | Yes | Elsevier API key for Scopus search and author retrieval. |
| S2_API_KEY | No | Semantic Scholar API key for TLDRs and S2 graph access. |
| SCIHUB_MIRRORS | No | Comma-separated list of active Sci-Hub mirrors for PDF fallback. |
| LIBGEN_MIRRORS | No | Comma-separated list of active Library Genesis mirrors. |
| SCOPUS_INST_TOKEN | No | Institutional token for full abstract access via Scopus. |
| CONTACT_EMAIL | Yes | Email for OpenAlex/Unpaywall polite-pool API routing. |
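To illustrate how the required and optional variables in this table fit together, here is a small sketch that validates them and splits the comma-separated mirror lists. The helper `load_settings` and the shape of its return value are assumptions for illustration, not the server's actual configuration code.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("SCOPUS_API_KEY", "CONTACT_EMAIL")
OPTIONAL = ("S2_API_KEY", "SCIHUB_MIRRORS", "LIBGEN_MIRRORS", "SCOPUS_INST_TOKEN")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate required variables and parse optional ones (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        value = env.get(name, "")
        if name.endswith("_MIRRORS"):
            # Mirror lists are comma-separated; drop blanks and stray whitespace.
            settings[name] = [m.strip() for m in value.split(",") if m.strip()]
        else:
            settings[name] = value or None
    return settings


demo = load_settings({
    "SCOPUS_API_KEY": "your_scopus_api_key",
    "CONTACT_EMAIL": "your_email@domain.com",
    "SCIHUB_MIRRORS": "https://sci-hub.ru, https://sci-hub.st",
})
print(demo["SCIHUB_MIRRORS"])
```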
Configuration
Claude Desktop / Cursor
Add the following to your configuration file (e.g., claude_desktop_config.json):
{
  "mcpServers": {
    "scholar-academic-mcp": {
      "command": "scholar-academic-mcp",
      "env": {
        "SCOPUS_API_KEY": "your_scopus_api_key",
        "S2_API_KEY": "your_s2_api_key",
        "SCIHUB_MIRRORS": "https://sci-hub.ru,https://sci-hub.st",
        "LIBGEN_MIRRORS": "https://libgen.la,http://libgen.li",
        "SCOPUS_INST_TOKEN": "your_optional_inst_token",
        "CONTACT_EMAIL": "your_email@domain.com"
      }
    }
  }
}
Quick Start & Examples
Once configured, your AI agent can perform complex research workflows. Below are representative examples of tool inputs and structured outputs.
1. Literature Discovery (Scopus)
Prompt: "Find recent papers about 'Transformer architectures' published after 2022 using Scopus."
Tool Call: search_papers_tool(query="TITLE-ABS-KEY(Transformer architectures) AND PUBYEAR > 2022", limit=3)
Output:
Found 3 papers via Scopus:
- [SCOPUS_ID:85184...] Attention is All You Need? A Survey of Transformer Variants
Authors: Smith, J., Doe, A.
Date: 2024-01-15 | DOI: 10.1016/j.artint.2023.104012
2. Multimodal Content Analysis
Prompt: "I need to see the diagram for the neural network architecture on page 3 of this URL."
Tool Call: get_full_text_visual_tool(url="https://arxiv.org/pdf/1706.03762.pdf", max_pages=3)
Output:
[Text]"Successfully rendered 3 pages visually..."[Image](PNG data of page 1)[Image](PNG data of page 2)[Image](PNG data of page 3 - containing the architecture diagram)
3. Research Topic Mapping
Prompt: "Help me understand the subfields and domains related to 'Generative AI'."
Tool Call: search_topics_tool(query="Generative AI")
Output:
Found 1 topics for 'Generative AI':
- Artificial Intelligence
Hierarchy: Computer Science → Artificial Intelligence → Machine Learning
Works: 12,450 | Citations: 450,210
Description: A field of computer science that focuses on creating systems capable of generating...
Tools
The tables below list 27 tools across 8 categories:
Paper Discovery
| Tool | Signature | Description |
|---|---|---|
| search_papers_tool | (query, limit=5, use_scopus=True, sort_by="relevance") | Search papers via Scopus (Boolean syntax) or OpenAlex. Sort by cited_by_count or publication_year. |
| search_papers_s2_tool | (query, limit=5) | Search papers via Semantic Scholar. Note: strictly rate-limited to 1 request/sec. |
| get_paper_details_tool | (paper_id) | Fetch full metadata and abstract by Scopus ID, DOI, or OpenAlex ID (with automatic routing). |
| get_paper_details_s2_tool | (paper_id) | Fetch full metadata from Semantic Scholar, including AI-generated TLDRs. Accepts S2 ID or DOI. |
| search_titles_unpaywall_tool | (query, is_oa=None) | Search Unpaywall's database directly by title. Set is_oa=True for strictly OA results. |
| get_related_works_tool | (paper_id, limit=10) | Find related/similar papers using OpenAlex's bibliographic coupling. |
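The 1 request/sec limit noted for `search_papers_s2_tool` means a client that issues several searches in a row has to space them out. A minimal sketch of a minimum-interval limiter follows. The class `MinIntervalLimiter` is hypothetical and says nothing about how the server itself enforces the limit.

```python
import time


class MinIntervalLimiter:
    """Ensure consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock   # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None     # time at which the previous call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay


limiter = MinIntervalLimiter(interval=0.05)  # short interval just for the demo
for query in ["graph neural networks", "diffusion models"]:
    limiter.wait()
    print("would call search_papers_s2_tool with", query)
```

In real use the interval would be 1.0 seconds or slightly more, to leave headroom for clock jitter.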
Book Discovery & Extraction
| Tool | Signature | Description |
|---|---|---|
| search_books_tool | (query, limit=5, source="googlebooks") | Search for book metadata via Google Books or Open Library. |
| get_book_details_tool | (book_id, source="googlebooks") | Fetch complete book details, descriptions, and ISBNs. |
| search_libgen_tool | (query, limit=5) | Search Library Genesis for books to retrieve their download MD5 hashes. |
| interact_with_book_tool | (md5, action, keyword, start_page, end_page) | Smart extraction from Libgen. Actions: toc (Table of Contents), search (keywords), pages (range). |
Author Analytics
| Tool | Signature | Description |
|---|---|---|
| autocomplete_authors_tool | (name, limit=5) | Rapidly disambiguate author names and resolve OpenAlex Author IDs. |
| search_authors_tool | (name, institution=None, limit=5) | Detailed bibliometric profiles: H-index, i10-index, ORCID, and research concepts. |
| search_author_by_orcid_tool | (orcid) | Look up an author directly by ORCID (raw or URL format). |
| retrieve_author_works_tool | (author_id, limit=15) | Chronologically sorted publications for a given OpenAlex author. |
| get_author_profile_scopus_tool | (author_id) | Fetch precise Scopus-sourced h-index, citation counts, and affiliation. |
| get_author_profile_s2_tool | (author_id) | Fetch Semantic Scholar author profile (H-index, paper count, citations). |
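Since `search_author_by_orcid_tool` accepts an ORCID in either raw or URL form, both inputs need to be reduced to the same canonical identifier at some point. The sketch below shows one way to do that with a regular expression. `normalize_orcid` is a hypothetical helper, and it checks only the shape of the identifier, not its checksum.

```python
import re

# Four groups of four characters; the final character may be a digit or X.
# The identifier must be the whole string or follow a "/" (URL form).
_ORCID_RE = re.compile(r"(?:^|/)(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$", re.IGNORECASE)


def normalize_orcid(value: str) -> str:
    """Return the bare ORCID iD from a raw or URL-formatted input."""
    candidate = value.strip().rstrip("/")
    match = _ORCID_RE.search(candidate)
    if match is None:
        raise ValueError(f"not a recognizable ORCID: {value!r}")
    return match.group(1).upper()


print(normalize_orcid("https://orcid.org/0000-0002-1825-0097"))
print(normalize_orcid("0000-0002-1825-0097"))
```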
Citation Tracking
| Tool | Signature | Description |
|---|---|---|
| get_citations_tool | (paper_id, direction="references") | Retrieve forward citations or backward references via OpenAlex. |
Full-Text & PDF
| Tool | Signature | Description |
|---|---|---|
| get_full_text_tool | (url, start_page=None, end_page=None) | Extract text from an OA PDF or HTML page. Supports page range selection. |
| get_full_text_visual_tool | (url, max_pages=3) | Render PDF pages as images for Vision-capable LLMs. |
| fetch_pdf_text_unpaywall_tool | (doi) | All-in-one: resolve DOI via Unpaywall → download PDF → extract text. |
| get_scihub_link_tool | (doi) | Attempts to resolve a strict paywalled DOI to a free direct PDF link using Sci-Hub. |
| fetch_pdf_text_scihub_tool | (doi) | All-in-one bypass: resolve DOI via Sci-Hub → download PDF → extract text. |
Citation & Writing
| Tool | Signature | Description |
|---|---|---|
| get_bibtex_tool | (doi) | Generate a BibTeX entry for LaTeX via CrossRef content negotiation. |
| format_citation_tool | (doi, style="apa") | Format citation in APA, IEEE, Chicago, Harvard, Vancouver, MLA, or Turabian. |
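"Content negotiation" here means asking the DOI resolver for a specific representation of a record through the HTTP `Accept` header; `application/x-bibtex` requests BibTeX. The sketch below builds such a request with the standard library. `build_bibtex_request` and `fetch_bibtex` are hypothetical names, and the server's own client may differ (for example in retries or HTTP library).

```python
import urllib.request
from urllib.parse import quote


def build_bibtex_request(doi: str) -> urllib.request.Request:
    """Prepare a DOI content-negotiation request that asks for BibTeX."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Perform the request (network access required) and return the BibTeX text."""
    with urllib.request.urlopen(build_bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect the request without touching the network; the DOI is the one
# used in the sample output earlier in this document.
request = build_bibtex_request("10.1016/j.artint.2023.104012")
print(request.full_url, request.get_header("Accept"))
```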
Open Access Resolution
| Tool | Signature | Description |
|---|---|---|
| get_unpaywall_link_tool | (doi) | Resolve a DOI to all available OA locations via Unpaywall. |
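As a rough illustration of what DOI resolution via Unpaywall involves: the v2 REST API is queried per DOI with an `email` parameter, and the response carries a `best_oa_location` object plus an `oa_locations` list, each entry of which may include a `url_for_pdf`. The helpers `unpaywall_url` and `pick_pdf_url` below are hypothetical, and the sample record is invented for the demo.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 lookup URL for a DOI."""
    query = urlencode({"email": email})
    return "https://api.unpaywall.org/v2/" + quote(doi, safe="/") + "?" + query


def pick_pdf_url(record: dict) -> Optional[str]:
    """Return the first direct PDF link, preferring the best OA location."""
    locations = []
    if record.get("best_oa_location"):
        locations.append(record["best_oa_location"])
    locations.extend(record.get("oa_locations") or [])
    for location in locations:
        if location.get("url_for_pdf"):
            return location["url_for_pdf"]
    return None  # closed access, or only landing pages are available


# Invented sample shaped like an Unpaywall response.
sample_record = {
    "is_oa": True,
    "best_oa_location": {"url": "https://example.org/landing", "url_for_pdf": None},
    "oa_locations": [
        {"url_for_pdf": None},
        {"url_for_pdf": "https://example.org/paper.pdf"},
    ],
}
print(unpaywall_url("10.1016/j.artint.2023.104012", "your_email@domain.com"))
print(pick_pdf_url(sample_record))
```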
Topic Mapping & Batch Analysis
| Tool | Signature | Description |
|---|---|---|
| search_topics_tool | (query, limit=10) | Browse research topics/concepts. Returns fields, domains, and publication volume. |
| batch_lookup_tool | (dois: list[str]) | Batch-fetch metadata for multiple DOIs in a single call (max 50). |
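Because `batch_lookup_tool` accepts at most 50 DOIs per call, a longer reading list has to be split on the caller's side. A minimal sketch of that chunking, with case-insensitive de-duplication (DOIs are case-insensitive), is shown below. `chunk_dois` is a hypothetical helper and the DOIs are placeholders.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 50  # per-call limit stated for batch_lookup_tool


def chunk_dois(dois: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield de-duplicated DOIs in lists of at most `size` items."""
    seen = set()
    unique = []
    for doi in dois:
        cleaned = doi.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


# 120 placeholder DOIs plus one duplicate that differs only in case.
dois = [f"10.1234/example.{i}" for i in range(120)] + ["10.1234/EXAMPLE.0"]
batches = list(chunk_dois(dois))
print([len(batch) for batch in batches])
```

Each yielded list can then be passed as the `dois` argument of one `batch_lookup_tool` call.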
Technical Design & Reliability
Scholar MCP is engineered for precision and fault tolerance in high-stakes research environments, utilizing several layers of protection to ensure data integrity:
- Strict Data Contracts (Pydantic)
  - All upstream API responses are validated against Pydantic models before being returned to the agent.
  - Ensures a predictable, type-safe interface even if upstream database schemas change.
- Fault-Tolerant Networking (Tenacity)
  - Integrated exponential backoff using `tenacity` for transient HTTP errors (429, 5xx).
  - Configurable rate-limit awareness for Elsevier and OpenAlex "polite pool" routing.
- Resource Safety & Concurrency
  - Context-Managed Extractors: Automatic cleanup of PDF buffers and file descriptors.
  - Isolated Concurrency: Batch operations use `asyncio.gather` with localized exception handling to prevent session-wide failures.
- System Observability
  - Structured standard-error (`stderr`) logging provides execution visibility during the tool lifecycle without interfering with the MCP JSON-RPC protocol.
- Automated Verification
  - Comprehensive test suite using respx for deterministic API mocking, ensuring 100% coverage of edge cases without external network dependencies.
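The "isolated concurrency" point can be illustrated with `asyncio.gather(..., return_exceptions=True)`, which returns exceptions as ordinary values so that one failed lookup does not abort the rest of the batch. This is a sketch of the general pattern only. `fetch_one` and `fetch_many` are hypothetical stand-ins, and the server's actual error handling may be structured differently.

```python
import asyncio
from typing import List


async def fetch_one(doi: str) -> dict:
    """Hypothetical stand-in for a single metadata lookup."""
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    if doi.startswith("bad"):
        raise ValueError(f"lookup failed for {doi}")
    return {"doi": doi, "ok": True}


async def fetch_many(dois: List[str]) -> List[dict]:
    # return_exceptions=True makes gather hand back exceptions as values,
    # so a single failure neither cancels nor hides the other lookups.
    outcomes = await asyncio.gather(
        *(fetch_one(doi) for doi in dois), return_exceptions=True
    )
    results = []
    for doi, outcome in zip(dois, outcomes):
        if isinstance(outcome, Exception):
            results.append({"doi": doi, "ok": False, "error": str(outcome)})
        else:
            results.append(outcome)
    return results


results = asyncio.run(fetch_many(["10.1/a", "bad/doi", "10.1/b"]))
print(results)
```

Results come back in input order, so each outcome can be paired with the DOI that produced it.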
Project Structure
Scholar-MCP/
├── .github/workflows/   # GitHub Actions (CI & Releases)
├── scripts/             # Automation & Validation scripts
├── tests/               # Pytest suite (respx mocked)
├── server.py            # FastMCP tool entry point
├── api.py               # API Clients (Scopus, OpenAlex, Unpaywall, CrossRef)
├── extractor.py         # PDF/HTML Extraction & Rendering
├── models.py            # Pydantic Data Validation
├── server.json          # MCP Registry Manifest
├── pyproject.toml       # Python packaging configuration
├── requirements.txt     # Dependencies
├── VERSION              # Version tracking (v1.0.0)
├── LICENSE              # MIT License
├── README.md            # Documentation
├── .env.example         # Template for API keys
└── .gitignore           # Git exclusion rules
Troubleshooting
| Symptom | Cause | Resolution |
|---|---|---|
| HTTP 401 from Scopus | Standard API keys lack META_ABS view access. | Set SCOPUS_INST_TOKEN or use OpenAlex as fallback. |
| HTTP 403 on PDF download | Publisher anti-bot protection (Cloudflare, DataDome). | Provide the PDF manually to the LLM. |
| Empty Unpaywall results | Paper is behind a strict paywall with no OA copies. | Request the PDF from the author via ResearchGate or institutional access. |
| SCOPUS_API_KEY is not set | Missing environment variable. | Ensure .env is configured or pass via MCP client env block. |
Contributing
- Fork the repository.
- Create a feature branch (`git checkout -b feature/my-feature`).
- Commit your changes (`git commit -m 'feat: add new capability'`).
- Push to the branch (`git push origin feature/my-feature`).
- Open a Pull Request.
Please ensure all code follows PEP 8 conventions.
License
MIT License. See LICENSE for details.
Disclaimer: Automated querying of publisher APIs must comply with the respective Terms of Service of Elsevier, OpenAlex, and Unpaywall. Do not distribute API keys. Adhere to all applicable rate limits.
mcp-name: io.github.mlintangmz2765/scholar
