Data Query Builder MCP
MCP Server that turns natural-language questions into SQL queries on CSV data. Built with Python, FastMCP, and SQLite for Gemini CLI integration.
Ask AI about Data Query Builder MCP
Powered by Claude Β· Grounded in docs
I know everything about Data Query Builder MCP. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
Data Query Builder β MCP Server
Overview
An MCP server that turns natural-language data questions into SQL queries. Load any CSV file into an in-memory SQLite database, explore schemas, run read-only queries, and compute column statistics β all through AI-powered tool use via Gemini CLI.
Setup
cd Project
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
# source .venv/bin/activate
pip install "mcp[cli]"
No additional dependencies β sqlite3 and csv are part of the Python standard library.
Gemini CLI Configuration
Add the following to ~/.gemini/settings.json:
{
"mcpServers": {
"data-query-builder": {
"command": "C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\.venv\\Scripts\\python.exe",
"args": ["C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\server.py"]
}
}
}
After editing, relaunch Gemini CLI to pick up the new server.
Tools
load_csv
Description: Load a CSV file into a new SQLite table with auto-detected column types (INTEGER, REAL, TEXT).
| Parameter | Type | Required | Description |
|---|---|---|---|
file_path | str | Yes | Absolute or relative path to the CSV file |
table_name | str | Yes | Name for the new SQLite table |
Example: "Load the file sample_data.csv as a table called sales" β calls load_csv
list_tables
Description: List all tables currently loaded in the database with their row counts and column info.
| Parameter | Type | Required | Description |
|---|---|---|---|
| (none) |
Example: "What data do I have loaded?" β calls list_tables
describe_schema
Description: Show the full database schema β all tables, columns, and their data types.
| Parameter | Type | Required | Description |
|---|---|---|---|
| (none) |
Example: "What columns does the sales table have?" β calls describe_schema
run_query
Description: Execute a read-only SQL SELECT query and return formatted results. Rejects any write operations (DROP, DELETE, ALTER, INSERT, UPDATE) for safety.
| Parameter | Type | Required | Description |
|---|---|---|---|
sql | str | Yes | A SQL SELECT query |
limit | int | No | Max rows to return (default 50, max 500) |
Example: "Show me the average price per category" β calls run_query with SELECT category, AVG(price) FROM sales GROUP BY category
get_statistics
Description: Compute summary statistics for a column β count, min, max, mean, sum, and null count (numeric columns) or count, distinct, nulls (text columns).
| Parameter | Type | Required | Description |
|---|---|---|---|
table_name | str | Yes | Name of the table to analyze |
column | str | Yes | Column to compute statistics for |
Example: "Give me stats on the price column" β calls get_statistics with table_name="sales", column="price"
Resources
db://schema
Current database schema as JSON β lists all tables, their columns, and data types.
db://query-history
JSON list of all SQL queries executed this session, including SQL text, row count, and column names.
Security
run_queryrejects any SQL containingDROP,DELETE,ALTER,INSERT,UPDATE,CREATE, or other write keywords.- The database is in-memory only β no persistent changes to disk.
- A default row limit of 50 prevents excessively large outputs.
Limitations
- In-memory database: Data does not persist between server restarts. CSVs must be reloaded each session.
- CSV only: Does not support Excel, JSON, or other file formats directly.
- No joins at load time: Each CSV becomes a separate table; joins must be done via SQL queries.
- Type detection is heuristic: Based on the first 100 rows β mixed-type columns may be misdetected.
Test Scenarios
| Scenario | Expected Tool Sequence |
|---|---|
| "Load sales data and find highest revenue region" | load_csv β describe_schema β run_query |
| "Average price per product category?" | load_csv β run_query β get_statistics |
| "What tables are loaded and what columns do they have?" | list_tables β describe_schema |
Comparison Results
With vs. Without Tools
| Dimension | Without Tools | With Tools |
|---|---|---|
| Accuracy | Hallucinates numbers or uses only pasted data | Uses real computed values from SQL queries |
| Specificity | Generic advice about data analysis | Specific answers from actual data |
| Completeness | Limited to what's in the prompt | Can explore schema, run multiple queries |
| Confidence | Hedges and qualifies heavily | Cites exact tool results and row counts |
| Latency | One fast response | Multiple tool-call round-trips |
Prompting Strategy Comparison
| Aspect | Strategy 1 (Minimal) | Strategy 3 (Expert Workflow) |
|---|---|---|
| Tool calls triggered | Fewer, less targeted | More, systematically sequenced |
| Planning | Dives in immediately | States plan before acting |
| Synthesis quality | Shallow summary | Structured analysis citing tool outputs |
| Errors / dead ends | May skip schema exploration | Explores schema first, then queries purposefully |
