qsearch
Multi-engine search for AI agents. Trust scoring, local corpus, MCP-native. Self-hostable, BYOK.
I built this for my own daily research. After running 100+ research sprints, my agent kept hallucinating because it read 200-char snippets. qsearch gives it full content with multi-engine provenance, running locally, owned by me.
AI agents hallucinate when they read 200-character snippets instead of full pages; Stanford's 2024 audit of production RAG tools measured 17–33% hallucination rates. Existing search APIs hide which engines agreed on a result. Existing knowledge graphs are enterprise-priced or vendor-locked.
qsearch is the open-source search layer that gives agents full content with multi-engine provenance, running on your machine, owned by you, ready for MCP today.
v0.4.0 is live at qsearch.pro: multi-engine attribution, trust corpus with per-URL provenance (engines[], sweep_count, trust_score), corpus viewer at /ui, MCP-over-HTTP for Claude Code and any spec-compliant client.
Vision: docs/VISION.md · Technical spec: docs/TRUST_MESH.md · Architecture: docs/FEDERATION_ARCHITECTURE.md
Quick start
# 1. Clone
git clone https://github.com/theYahia/qsearch.git
cd qsearch
# 2. Get a Brave Search API key (BYOK, $5/mo for ~1000 queries)
# → https://brave.com/search/api/ → sign up → copy key
# 3. Configure
cp .env.example .env.local
# Set BRAVE_API_KEY=your_key
# Set SEARXNG_URL=http://localhost:8888 (for multi-engine attribution)
# 4. Start infrastructure (Meilisearch + Qdrant + SearXNG)
docker compose up -d
# 5. Install & run
npm install
npm start # → qsearch v0.4.0 on http://localhost:8080
# 6. (Optional) MCP server for Claude Code / Workbench / OpenClaw
npm run start:mcp # → http://0.0.0.0:8081
# 7. Test multi-engine attribution
curl -X POST http://localhost:8080/sweep \
-H "Content-Type: text/plain" \
--data-binary $'t1|self-hosted search engine\n'
# → parsed_snippets.md with "Engines: google, duckduckgo, brave (count=3)"
BYOK design: Brave key + SearXNG instance both stay on your machine. No data exfiltration.
How I use it daily
Every research sprint I run a dual sweep:
# Brave sweep (primary, authoritative)
python research/scripts/brave_sweep.py queries.txt _raw_data/topic_2026-04-28/brave/
# qsearch sweep (secondary, auto-indexes into corpus)
# The URL is quoted so the shell does not treat "?" as a glob character
curl -X POST "http://localhost:8080/sweep?topic=my_topic" \
-H "Content-Type: text/plain" --data-binary @queries.txt
After 10+ sprints on the same domain, /corpus/top?min_engines=3 shows which URLs survived multiple independent search engines across multiple sessions. Those are the ones I actually trust.
Why qsearch exists
Every AI agent today hits the same broken loop:
Agent → Tavily/Exa/Serper API → 200-char snippets → hallucinated answer
Three failures:
- Snippets aren't enough. Stanford's 2024 production RAG audit measured 17–33% hallucination on Lexis+ AI and Westlaw despite "hallucination-free" claims. On Wikipedia QA, full content beats snippet-RAG by +7.3pp (arXiv 2501.01880).
- No trust signal. Search APIs return ranked lists without telling you which engines agreed. SEO spam at position 3 looks identical to an authoritative source at position 4.
- No memory. Every search starts from zero. The same trash gets surfaced again. The same authority goes unrecognized.
qsearch addresses all three:
- Full content fetched and cleaned, not just snippets.
- engines[] field per result: Google + DDG + Brave + Qwant + Startpage attribution exposed (via SearXNG aggregation).
- Local corpus accumulates: every URL grows a trust profile across sweeps.
How it works
flowchart LR
A[Your agent] -->|query| Q[qsearch]
Q -->|fan out| B[Brave Search API]
Q -->|fan out| S["SearXNG\n(Google, DDG, Brave, Qwant, …)"]
B -->|results| Q
S -->|results + engines[]| Q
Q -->|index by URL| C["Local corpus\n(Meilisearch + Qdrant)"]
C -->|trust score| Q
Q -->|re-ranked + full content + provenance| A
style C fill:#fde68a,stroke:#d97706,color:#000
style Q fill:#93c5fd,stroke:#2563eb,color:#000
style S fill:#86efac,stroke:#16a34a,color:#000
The yellow node is your private corpus. URLs found by 5 engines + 3 sweeps + 4 topics get a trust score that emerges naturally: no human ranking, no centralized authority, no cloud round-trip.
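To make the accumulation idea concrete, here is a minimal sketch of how a per-URL trust profile could grow across sweeps. The field names `engines` and `sweep_count` come from this README; the `TrustProfile` class, the `record_hit` helper, and the scoring formula are hypothetical illustrations, not qsearch's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TrustProfile:
    """Per-URL record; class name and layout are illustrative only."""
    url: str
    engines: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)
    sweep_count: int = 0

    def trust_score(self) -> float:
        # Hypothetical formula, NOT qsearch's: each signal saturates at 5
        # so that no single dimension can dominate the score.
        e = min(len(self.engines), 5) / 5
        s = min(self.sweep_count, 5) / 5
        t = min(len(self.topics), 5) / 5
        return round((e + s + t) / 3, 3)


def record_hit(corpus: dict[str, TrustProfile], url: str,
               engines: list[str], topic: str) -> TrustProfile:
    """Merge one sweep's sighting of `url` into the corpus (merge-on-upsert)."""
    profile = corpus.setdefault(url, TrustProfile(url))
    profile.engines.update(engines)   # union of every engine seen so far
    profile.topics.add(topic)
    profile.sweep_count += 1
    return profile
```

Calling `record_hit` twice for the same URL with overlapping engine lists unions the engines and increments `sweep_count`, so the score only moves when independent evidence arrives.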
How qsearch compares
| | Tavily | Exa | Serper | Brave API | SearXNG | qsearch |
|---|---|---|---|---|---|---|
| Open source core | β | β | β | β | β | β |
| Full content (not snippets) | partial | partial | β | β | β | β |
| Multi-engine attribution | β | β | β | β | partial | β
(engines[]) |
| Persistent local corpus | β | β | β | β | β | β |
| Trust score per URL | β | β | β | β | β | β |
| Self-hostable | β | β | β | β | β | β |
| MCP-native | partial | β | β | β | β | β |
| BYOK upstream | β | β | β | N/A | β | β |
API (v0.4.0)
Search endpoints
| Endpoint | Description | Backend |
|---|---|---|
| POST /search | Web search + corpus first, trust-weighted re-rank | Brave or SearXNG |
| POST /sweep | Batch search via SearXNG (with engines[]) | SearXNG |
| POST /news | News search | Brave (requires key) |
| POST /context | Deep page extraction | Brave (requires key) |
| POST /index | Crawl URL or index local .md glob | Crawl4AI |
| GET /trust/:url | Trust score + provenance for any URL in corpus | - |
| GET /corpus/top | Top URLs ranked by trust (?limit=20&min_engines=3) | - |
| GET /corpus/stats | Corpus size + counts | - |
| GET /ui | Corpus browser: search, trust scores, provenance modal | - |
| GET /health | Service status | - |
/search accepts: query, n_results (1–20), freshness (pd/pw/pm/py), search_lang, country, corpus_first (default true), corpus_only (default false).
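A client can validate these parameters before sending anything. The sketch below assumes /search takes a JSON body, which this README does not state; the `build_search_payload` helper and its `n_results=10` default are hypothetical, while the parameter names and limits are the ones listed above.

```python
import json

# Freshness codes listed in the README
FRESHNESS = {"pd", "pw", "pm", "py"}


def build_search_payload(query, n_results=10, freshness=None,
                         corpus_first=True, corpus_only=False):
    """Validate /search parameters and return a JSON string.

    Assumption: the endpoint accepts a JSON body. The n_results default
    of 10 is arbitrary; the README only gives the 1-20 range.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    if not 1 <= n_results <= 20:
        raise ValueError("n_results must be between 1 and 20")
    if freshness is not None and freshness not in FRESHNESS:
        raise ValueError(f"freshness must be one of {sorted(FRESHNESS)}")

    payload = {
        "query": query,
        "n_results": n_results,
        "corpus_first": corpus_first,
        "corpus_only": corpus_only,
    }
    if freshness is not None:
        payload["freshness"] = freshness
    return json.dumps(payload)
```

Rejecting an out-of-range `n_results` locally saves a round trip and keeps the error message close to the caller.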
/sweep accepts: text/plain body with label|query lines (one per line). Auto-indexes results into Meilisearch with engines[] and engine_count filterable.
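The label|query body is easy to generate from a list of queries. This is a minimal sketch: `build_sweep_body` and `make_sweep_request` are hypothetical helper names, the `t1`, `t2`, ... labelling simply mirrors the examples in this README, and the request is built but not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_sweep_body(queries: list[str], prefix: str = "t") -> str:
    """Return one `label|query` line per non-empty query, newline-terminated."""
    lines: list[str] = []
    for query in queries:
        cleaned = " ".join(query.split())  # collapse whitespace and stray newlines
        if cleaned:
            lines.append(f"{prefix}{len(lines) + 1}|{cleaned}")
    return "".join(line + "\n" for line in lines)


def make_sweep_request(body: str, base_url: str = "http://localhost:8080",
                       topic: Optional[str] = None) -> Request:
    """Build (but do not send) the POST /sweep request."""
    url = f"{base_url}/sweep"
    if topic:
        url += "?" + urlencode({"topic": topic})
    return Request(url, data=body.encode("utf-8"),
                   headers={"Content-Type": "text/plain"}, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it to a running qsearch instance.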
Multi-engine attribution example
curl -X POST http://localhost:8080/sweep \
-H "Content-Type: text/plain" \
--data-binary $'t1|self-hosted search engine 2026\n'
Output excerpt (parsed_snippets.md):
**1. GitHub - searxng/searxng**
- URL: https://github.com/searxng/searxng
- Engines: google, duckduckgo, brave, qwant (count=4)
> A privacy-respecting, hackable metasearch engine...
**2. random-blog.io/seo-spam-2026**
- URL: https://random-blog.io/seo-spam-2026
- Engines: google (count=1)
> Best self-hosted search engines you must try...
URL #1 has engine_count=4: found by 4 independent engines. URL #2 has engine_count=1: found by only one. The trust signal is built into the data, not bolted on.
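If you want that signal in a script without going through Meilisearch, the excerpt format above is simple enough to parse directly. The sketch below assumes parsed_snippets.md always uses the "- URL:" and "- Engines: ... (count=N)" line shapes shown in the excerpt; `parse_snippets` and `high_trust` are hypothetical helpers, not part of qsearch.

```python
import re

URL_RE = re.compile(r"^- URL:\s*(\S+)")
ENGINES_RE = re.compile(r"^- Engines:\s*(.+?)\s*\(count=(\d+)\)")


def parse_snippets(markdown: str) -> list[dict]:
    """Extract url, engines and engine_count from a parsed_snippets.md excerpt."""
    results = []
    current_url = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        url_match = URL_RE.match(line)
        if url_match:
            current_url = url_match.group(1)
            continue
        engines_match = ENGINES_RE.match(line)
        if engines_match and current_url is not None:
            engines = [e.strip() for e in engines_match.group(1).split(",") if e.strip()]
            results.append({
                "url": current_url,
                "engines": engines,
                "engine_count": int(engines_match.group(2)),
            })
            current_url = None  # wait for the next "- URL:" line
    return results


def high_trust(results: list[dict], min_engines: int = 3) -> list[str]:
    """Keep only URLs seen by at least `min_engines` engines."""
    return [r["url"] for r in results if r["engine_count"] >= min_engines]
```

Run against the two-result excerpt above with the default threshold, `high_trust` keeps the searxng repository and drops the single-engine blog post.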
Filter by trust in Meilisearch
curl -H "Authorization: Bearer masterKey" \
"http://localhost:7700/indexes/qsearch_corpus/documents?filter=engine_count%20%3E%3D%203"
Returns only URLs found by 3+ engines: your high-trust subset.
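The percent-encoded filter in that URL is just `engine_count >= 3`. A small sketch for building it with a different threshold follows; the host and index name are the ones from the curl example, and `corpus_filter_url` is a hypothetical helper.

```python
from urllib.parse import quote


def corpus_filter_url(min_engines: int,
                      host: str = "http://localhost:7700",
                      index: str = "qsearch_corpus") -> str:
    """Build the Meilisearch documents URL filtered by engine_count."""
    expression = f"engine_count >= {int(min_engines)}"
    # quote() percent-encodes the spaces and the ">=" operator
    return f"{host}/indexes/{index}/documents?filter={quote(expression)}"
```

`corpus_filter_url(3)` reproduces the URL used in the curl command above.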
MCP integration
Claude Code
Add to ~/.claude/settings.json:
{
"mcpServers": {
"qsearch": {
"type": "http",
"url": "http://localhost:8081"
}
}
}
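If your settings file already lists other MCP servers, merge the entry in rather than pasting over the file. A minimal sketch, assuming the file is plain JSON with the `mcpServers` shape shown above; `add_qsearch_server` is a hypothetical helper that works on the file's text and leaves reading and writing to you.

```python
import json


def add_qsearch_server(settings_text: str, url: str = "http://localhost:8081") -> str:
    """Return settings JSON with a qsearch entry added under mcpServers.

    Existing keys and other servers are preserved; an empty string is
    treated as an empty settings file.
    """
    settings = json.loads(settings_text) if settings_text.strip() else {}
    servers = settings.setdefault("mcpServers", {})
    servers["qsearch"] = {"type": "http", "url": url}
    return json.dumps(settings, indent=2)
```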
Available tools:
- mcp__qsearch__web_search: web search via Brave or SearXNG
- mcp__qsearch__sweep: batch research sweep with multi-engine attribution
- mcp__qsearch__index_research: index local .md files by glob
- mcp__qsearch__news_search: news search (Brave key required)
- mcp__qsearch__context_search: deep page content (Brave key required)
Other MCP-over-HTTP clients
qsearch serves the MCP Streamable HTTP transport at / on port 8081. Compatible with Claude Desktop (HTTP mode), OpenClaw, and any spec-compliant MCP client.
Stack
| Component | Tech |
|---|---|
| Runtime | Node.js ≥20 |
| Web search | Brave Search API (BYOK) |
| Meta-search | SearXNG (self-hosted, optional) |
| Full-text corpus | Meilisearch v1.7 |
| Vector corpus | Qdrant v1.17.1 (Linux/macOS bare-runtime; offline on Windows) |
| Crawler | Crawl4AI 0.8.6 (Python subprocess) |
| Embedder (optional) | llama.cpp /v1/embeddings server |
| LLM cleaner (optional) | Any GGUF model via llama.cpp or local inference |
| MCP | @modelcontextprotocol/sdk |
| License | Apache-2.0 |
Roadmap
| Version | Feature | When |
|---|---|---|
| v0.3.1 | Multi-engine engines[] attribution + dual sweep + corpus + MCP | shipped |
| v0.4.0 | Trust layer: /trust/:url, /corpus/top, /ui viewer, trust-weighted re-rank, sort/pagination, corpus merge-on-upsert, snippet sanitization | shipped |
| v0.5 | Launch: awesome list PRs, MCP Registry publish, Show HN, newsletter distribution | in progress |
| v0.6+ | Optional federation (research direction; no timeline until v0.5 is validated) | open |
See docs/VISION.md for the full picture and why federation stays a research direction until we can ship it without overpromising.
Honest trade-offs
- Cold start. First sweep takes 5–10 seconds (engine fan-out + corpus indexing). Best run as a long-lived daemon.
- Vector search is blocked on Windows. Qdrant requires bare-runtime, which not all platforms support. Full-text search via Meilisearch works everywhere.
- SearXNG rate limits. Self-hosting is required because public instances get blocked by Google. Our docker-compose handles this.
- engines[] requires SearXNG. Pure-Brave mode still works but loses the multi-engine signal.
- Full content has a latency cost. ~31s vs ~3s for naive snippet retrieval (Bidirectional RAG study). qsearch makes this opt-in via the /context endpoint.
Follow
- Live demo: qsearch.pro
- Star: github.com/theYahia/qsearch
- X: @TheTieTieTies
License
Apache-2.0; see LICENSE. Independent. BYOK. Self-hostable. No vendor lock-in.
