Datahill
An MCP service that automates data analysis through IPython sessions.
Installation
npx datahill
IPython Data Analysis MCP Server
A lightweight data analysis MCP (Model Context Protocol) tool built on a real IPython kernel. It provides a complete interactive Python data analysis environment with session management, data loading, and real-time data viewing.
🚀 Core Features
- Real IPython Environment: Built on IPython's InteractiveShell, supports all IPython features
- Multi-Session Management: Independent session spaces with isolated variables and persistent state
- Intelligent Data Loading: Supports CSV/Excel/JSON, with automatic encoding detection and smart variable naming
- Real-time Monitoring: Memory usage monitoring, variable management, and execution history tracking
- Complete Feature Support: Runs Python code, IPython magic commands, and system commands
- Smart Sampling: Column data views that stay usable on large datasets and avoid overflowing the model's context
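The smart sampling idea can be illustrated without pandas. The sketch below is only an assumption about what a "mixed" strategy might look like: it keeps the first values, the last values, and a random draw from the middle. The README does not document how sample_column_data implements method="mixed", so mixed_sample, its seed parameter, and the one-third split are all hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mixed_sample(values: Sequence[T], sample_size: int = 20, seed: int = 0) -> List[T]:
    """Return at most sample_size values: head, random middle picks, and tail.

    Hypothetical illustration only; not the server's actual algorithm.
    """
    n = len(values)
    if n <= sample_size:
        # Small columns are returned whole, so nothing is hidden from the caller.
        return list(values)

    head_n = sample_size // 3
    tail_n = sample_size // 3
    mid_n = sample_size - head_n - tail_n

    # Draw the middle picks by index, then sort them to preserve the original order.
    rng = random.Random(seed)
    middle_indices = sorted(rng.sample(range(head_n, n - tail_n), mid_n))

    head = list(values[:head_n])
    middle = [values[i] for i in middle_indices]
    tail = list(values[n - tail_n:])
    return head + middle + tail
```

The output size is bounded by sample_size regardless of how long the column is, which is the property that keeps a large column from flooding the client's context.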
📋 Feature List
17 Core Tool Functions
- Session Management
  - create_ipython_session - Create a new IPython session
  - list_ipython_sessions - List all active sessions
  - get_session_status - Get detailed session status
  - delete_ipython_session - Delete the specified session
- Code Execution
  - execute_code - Execute Python code, magic commands, and system commands
  - get_execution_history - Get the execution history
- Data Loading
  - load_csv_file - Load CSV files (automatic encoding detection)
  - load_excel_file - Load Excel files (supports .xlsx/.xls)
  - load_json_file - Load JSON files
- Data Operations & Viewing
  - list_dataframes - List all DataFrames in a session
  - get_dataframe_info - Get detailed DataFrame information
  - preview_dataframe - Preview DataFrame data
  - get_dataframe_summary - Get a statistical summary
  - sample_column_data - Smart sampling for viewing column data
- Memory & Variable Management
  - check_memory_usage - Check memory usage
  - get_variable_info - Get detailed variable information
  - clear_variables - Clear variables to free memory
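To make the session model concrete, here is a minimal sketch of per-session variable isolation. It is a stand-in, not the server's code: the real tool is described as using IPython's InteractiveShell, while this toy keeps one plain namespace dict per session and runs code with exec. SessionRegistry and its method names are hypothetical, and the "session_" plus eight hex characters id format is inferred from the example id shown in the usage guide.

```python
import uuid
from typing import Any, Dict, List


class SessionRegistry:
    """Toy stand-in for per-session shells: one namespace dict per session."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, Any]] = {}

    def create(self) -> str:
        # Assumed id format, modelled on the README's "session_a1b2c3d4" example.
        session_id = f"session_{uuid.uuid4().hex[:8]}"
        self._namespaces[session_id] = {}
        return session_id

    def execute(self, code: str, session_id: str) -> Dict[str, Any]:
        namespace = self._namespaces.get(session_id)
        if namespace is None:
            return {"success": False, "error": f"unknown session: {session_id}"}
        try:
            # Each session runs against its own dict, so names never leak across sessions.
            exec(code, namespace)
        except Exception as exc:
            return {"success": False, "error": repr(exc)}
        return {"success": True}

    def variables(self, session_id: str) -> List[str]:
        # exec() injects __builtins__ into the namespace; hide dunder names.
        names = self._namespaces[session_id]
        return sorted(name for name in names if not name.startswith("__"))

    def delete(self, session_id: str) -> bool:
        return self._namespaces.pop(session_id, None) is not None
```

State persists between execute calls on the same session id, and deleting a session drops its namespace so the variables it held can be garbage collected.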
🛠️ Installation & Configuration
Method 1: Direct Run with uvx (Recommended)
There is no need to clone the project; uvx can run the server directly from GitHub:
# Install uv, which provides the uvx command (if not already installed)
pip install uv
# Run MCP server directly
uvx --from git+https://github.com/Hillyess/dataHill.git DATA_MCP.py
Method 2: Local Installation for Development
# 1. Clone project
git clone git@github.com:Hillyess/dataHill.git
cd dataHill
# 2. Create virtual environment
conda create -n data-analyzer python=3.10
conda activate data-analyzer
# 3. Install dependencies
pip install -r requirements.txt
# 4. Test installation
python DATA_MCP.py
Configure MCP Client
Claude Desktop Configuration
Edit the Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Recommended Configuration (using uvx):
{
  "mcpServers": {
    "dataHill": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/Hillyess/dataHill.git",
        "DATA_MCP.py"
      ]
    }
  }
}
Local Development Configuration (if using Method 2):
{
  "mcpServers": {
    "dataHill": {
      "command": "python",
      "args": ["/path/to/your/DATA_MCP.py"],
      "env": {
        "PYTHONPATH": "/path/to/your/project"
      }
    }
  }
}
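If the configuration file already lists other MCP servers, pasting a whole new document would overwrite them. The following sketch merges a single entry into an existing file instead. It uses only the standard library; add_server and DATAHILL_ENTRY are hypothetical names introduced for this example, and the entry simply mirrors the recommended uvx configuration.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Mirrors the recommended uvx configuration from this README.
DATAHILL_ENTRY: Dict[str, Any] = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/Hillyess/dataHill.git", "DATA_MCP.py"],
}


def add_server(config_path: Path, name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one server entry into the config file, keeping all other keys."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the "mcpServers" section if missing, then add or replace this entry.
    config.setdefault("mcpServers", {})[name] = entry

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS, for example, the call would be add_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "dataHill", DATAHILL_ENTRY). Back up the file first, since the function rewrites it in place.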
📖 Usage Guide
Basic Workflow
# 1. Create session
create_ipython_session()
# Returns: {"success": true, "session_id": "session_a1b2c3d4", ...}
# 2. Load data
load_csv_file("data.csv", "session_a1b2c3d4", "df")
# 3. View data information
get_dataframe_info("df", "session_a1b2c3d4")
# 4. Smart sampling for data viewing
sample_column_data("df", "column_name", "session_a1b2c3d4", method="mixed", sample_size=20)
# 5. Execute analysis
execute_code("df.describe()", "session_a1b2c3d4")
# 6. Memory monitoring
check_memory_usage("session_a1b2c3d4")
# 7. Clean up session
delete_ipython_session("session_a1b2c3d4")
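Every later step in the workflow depends on the session id returned in step 1, so it is worth validating that response before going on. The helper below is a small sketch that assumes only the two fields shown in the example return value ("success" and "session_id"). The "error" key it reads on failure is a guess, and session_id_from is a hypothetical name.

```python
import json
from typing import Any, Dict, Union


def session_id_from(response: Union[str, Dict[str, Any]]) -> str:
    """Extract the session id from a create_ipython_session response.

    Accepts either the raw JSON text or an already-decoded dict.
    """
    payload = json.loads(response) if isinstance(response, str) else response

    if not payload.get("success"):
        # "error" is an assumed field name; the README does not show failure responses.
        detail = payload.get("error", "no error message")
        raise RuntimeError(f"session creation failed: {detail}")

    session_id = payload.get("session_id")
    if not session_id:
        raise RuntimeError("response has no session_id")
    return session_id
```

Failing early here gives a clearer error than passing a missing id to load_csv_file or execute_code.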
🔧 System Requirements
- Python: 3.8+
- Memory: 4 GB or more recommended (depends on data size)
- Operating System: Windows/macOS/Linux
- MCP Client: Claude Desktop or another MCP client that supports the stdio transport
📦 Dependencies
Core Dependencies
- fastmcp>=0.5.0 - MCP server framework
- ipython>=8.0.0 - IPython interactive environment
- pandas>=2.0.0 - Data processing and analysis
- numpy>=1.24.0 - Numerical computation foundation
Data Support
- openpyxl>=3.1.0 - Excel .xlsx file support
- xlrd>=2.0.0 - Excel .xls file support
System Monitoring
- psutil>=5.9.0 - Memory and system monitoring
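For a local install, the minimum versions listed above can be checked against the current environment with the standard library alone. This is a rough sketch: the version parsing is deliberately simplified (it reads only the leading numeric part, so "1.24.0rc1" is treated as 1.24.0) and is not a full implementation of Python's version rules. MINIMUMS, version_tuple, meets_minimum, and check_requirements are names made up for this example.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions copied from the dependency list in this README.
MINIMUMS: Dict[str, str] = {
    "fastmcp": "0.5.0",
    "ipython": "8.0.0",
    "pandas": "2.0.0",
    "numpy": "1.24.0",
    "openpyxl": "3.1.0",
    "xlrd": "2.0.0",
    "psutil": "5.9.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading dotted-number part of a version string into ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after zero-padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Return 'ok', 'missing', or a 'too old' note for each package."""
    report: Dict[str, str] = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report
```

Running check_requirements() inside the activated environment returns one status per package, which helps when the server fails to start after pip install -r requirements.txt.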
🤝 Contributing
- Fork this project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙋 Support & Feedback
- Issue Reports: GitHub Issues
- Feature Requests: GitHub Discussions
⭐ If this project helps you, please give us a Star!
