MCPCorpus: A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis
MCPCorpus is a comprehensive dataset for analyzing the Model Context Protocol (MCP) ecosystem, containing ~14K MCP servers and 300 MCP clients with 20+ normalized metadata attributes.
Dataset Overview
- Scale: ~14K MCP servers + 300 MCP clients
- Attributes: 20+ normalized metadata attributes per artifact
- Sources: GitHub, community hubs, package managers
- Applications: Ecosystem analysis, security research, interoperability studies
Structure
MCPCorpus/
├── Crawler/                        # Data collection tools
│   ├── Servers/                    # Server data
│   ├── Clients/                    # Client data
│   ├── github_info_collector.py    # GitHub metadata collector
│   ├── data_cleaner.py             # Data normalization
│   └── tool_extractor.py           # MCP tool extraction
└── Website/                        # Web search interface
    ├── server.py                   # Local web server
    ├── index.html                  # Search interface
    ├── mcpso_servers_cleaned.json
    └── mcpso_clients_cleaned.json
Quick Start
Explore Dataset
cd Website
python server.py
# Open http://localhost:8000
Access Data Programmatically
import json
import pandas as pd
# Load datasets (read as UTF-8 so non-ASCII metadata decodes consistently)
with open('Crawler/Servers/mcpso_servers_cleaned.json', 'r', encoding='utf-8') as f:
    servers = json.load(f)
with open('Crawler/Clients/mcpso_clients_cleaned.json', 'r', encoding='utf-8') as f:
    clients = json.load(f)
# Convert to DataFrame
servers_df = pd.DataFrame(servers)
clients_df = pd.DataFrame(clients)
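The attribute names in the cleaned JSON files are not listed in this README, so a useful first step after loading is to see which attributes exist and how often each one is filled in. The following is a minimal sketch using only the standard library. It assumes each file holds a JSON array of objects, as the `pd.DataFrame(servers)` call above implies. `load_records` and `attribute_coverage` are hypothetical helper names, not part of the MCPCorpus code.

```python
import json
from collections import Counter


def load_records(path):
    """Read one cleaned JSON file; assumed to contain a list of dicts."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def attribute_coverage(records):
    """Map each attribute name to the fraction of records with a non-empty value.

    None, empty strings, empty lists and empty dicts count as empty.
    """
    counts = Counter()
    for record in records:
        for key, value in record.items():
            counts[key] += 0  # register the key even if it is always empty
            if value not in (None, "", [], {}):
                counts[key] += 1
    total = len(records) or 1  # avoid dividing by zero on an empty file
    return {key: counts[key] / total for key in sorted(counts)}


# Example usage (path taken from the snippet above):
# servers = load_records("Crawler/Servers/mcpso_servers_cleaned.json")
# for name, ratio in attribute_coverage(servers).items():
#     print(f"{name}: {ratio:.1%}")
```

Attributes with low coverage are worth checking before they are used as filters or grouping keys in an analysis.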
Update Dataset (Optional)
# Collect new data
cd Crawler/Servers && python Server_request.py
cd ../Clients && python Client_request.py
# Add GitHub metadata
cd .. && python github_info_collector.py --token YOUR_GITHUB_TOKEN
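If you prefer not to type the token on the command line, the collector can be launched from a small Python wrapper that reads the token from an environment variable. This is only a sketch: the `--token` flag and the script name come from the command above, while the `GITHUB_TOKEN` variable name and the `build_collector_command` and `run_collector` helpers are assumptions made for illustration.

```python
import os
import subprocess
import sys


def build_collector_command(token, script="github_info_collector.py"):
    """Build the argument list for the collector using the --token flag."""
    if not token:
        raise ValueError("a GitHub token is required")
    return [sys.executable, script, "--token", token]


def run_collector(crawler_dir="Crawler", env_var="GITHUB_TOKEN"):
    """Run the collector inside crawler_dir with the token read from env_var.

    env_var is a hypothetical name; use whichever variable holds your token.
    """
    token = os.environ.get(env_var, "")
    command = build_collector_command(token)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(command, cwd=crawler_dir, check=True)


# Example usage, from the repository root:
# run_collector()
```

Passing the arguments as a list avoids shell quoting issues, and the early `ValueError` makes a missing token obvious before any request is sent.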
Citation
If you use MCPCorpus in your research, please cite it as:
@misc{lin2025largescaleevolvabledatasetmodel,
  title={A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis},
  author={Zhiwei Lin and Bonan Ruan and Jiahao Liu and Weibo Zhao},
  year={2025},
  eprint={2506.23474},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2506.23474},
}
