📦

Data Query Builder MCP

MCP Server that turns natural-language questions into SQL queries on CSV data. Built with Python, FastMCP, and SQLite for Gemini CLI integration.

0 installs

Trust: 34 — Low

Files

Ask AI about Data Query Builder MCP

I know everything about Data Query Builder MCP. Ask me about installation, configuration, usage, or troubleshooting.

0/500

Loading tools...

Reviews

Documentation

Data Query Builder — MCP Server

Overview

An MCP server that turns natural-language data questions into SQL queries. Load any CSV file into an in-memory SQLite database, explore schemas, run read-only queries, and compute column statistics — all through AI-powered tool use via Gemini CLI.

Setup

cd Project
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
# source .venv/bin/activate

pip install "mcp[cli]"

No additional dependencies — sqlite3 and csv are part of the Python standard library.

Gemini CLI Configuration

Add the following to ~/.gemini/settings.json:

{
  "mcpServers": {
    "data-query-builder": {
      "command": "C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\.venv\\Scripts\\python.exe",
      "args": ["C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\server.py"]
    }
  }
}

After editing, relaunch Gemini CLI to pick up the new server.

Tools

`load_csv`

Description: Load a CSV file into a new SQLite table with auto-detected column types (INTEGER, REAL, TEXT).

Parameter	Type	Required	Description
`file_path`	str	Yes	Absolute or relative path to the CSV file
`table_name`	str	Yes	Name for the new SQLite table

Example: "Load the file sample_data.csv as a table called sales" → calls load_csv

`list_tables`

Description: List all tables currently loaded in the database with their row counts and column info.

Parameter	Type	Required	Description
(none)

Example: "What data do I have loaded?" → calls list_tables

`describe_schema`

Description: Show the full database schema — all tables, columns, and their data types.

Parameter	Type	Required	Description
(none)

Example: "What columns does the sales table have?" → calls describe_schema

`run_query`

Description: Execute a read-only SQL SELECT query and return formatted results. Rejects any write operations (DROP, DELETE, ALTER, INSERT, UPDATE) for safety.

Parameter	Type	Required	Description
`sql`	str	Yes	A SQL SELECT query
`limit`	int	No	Max rows to return (default 50, max 500)

Example: "Show me the average price per category" → calls run_query with SELECT category, AVG(price) FROM sales GROUP BY category

`get_statistics`

Description: Compute summary statistics for a column — count, min, max, mean, sum, and null count (numeric columns) or count, distinct, nulls (text columns).

Parameter	Type	Required	Description
`table_name`	str	Yes	Name of the table to analyze
`column`	str	Yes	Column to compute statistics for

Example: "Give me stats on the price column" → calls get_statistics with table_name="sales", column="price"

Resources

`db://schema`

Current database schema as JSON — lists all tables, their columns, and data types.

`db://query-history`

JSON list of all SQL queries executed this session, including SQL text, row count, and column names.

Security

run_query rejects any SQL containing DROP, DELETE, ALTER, INSERT, UPDATE, CREATE, or other write keywords.
The database is in-memory only — no persistent changes to disk.
A default row limit of 50 prevents excessively large outputs.

Limitations

In-memory database: Data does not persist between server restarts. CSVs must be reloaded each session.
CSV only: Does not support Excel, JSON, or other file formats directly.
No joins at load time: Each CSV becomes a separate table; joins must be done via SQL queries.
Type detection is heuristic: Based on the first 100 rows — mixed-type columns may be misdetected.

Test Scenarios

Scenario	Expected Tool Sequence
"Load sales data and find highest revenue region"	`load_csv` → `describe_schema` → `run_query`
"Average price per product category?"	`load_csv` → `run_query` → `get_statistics`
"What tables are loaded and what columns do they have?"	`list_tables` → `describe_schema`

Comparison Results

With vs. Without Tools

Dimension	Without Tools	With Tools
Accuracy	Hallucinates numbers or uses only pasted data	Uses real computed values from SQL queries
Specificity	Generic advice about data analysis	Specific answers from actual data
Completeness	Limited to what's in the prompt	Can explore schema, run multiple queries
Confidence	Hedges and qualifies heavily	Cites exact tool results and row counts
Latency	One fast response	Multiple tool-call round-trips

Prompting Strategy Comparison

Aspect	Strategy 1 (Minimal)	Strategy 3 (Expert Workflow)
Tool calls triggered	Fewer, less targeted	More, systematically sequenced
Planning	Dives in immediately	States plan before acting
Synthesis quality	Shallow summary	Structured analysis citing tool outputs
Errors / dead ends	May skip schema exploration	Explores schema first, then queries purposefully