qsearch

Multi-engine search for AI agents. Trust scoring, local corpus, MCP-native. Self-hostable, BYOK.

I built this for my own daily research. After running 100+ research sprints, my agent kept hallucinating because it read 200-char snippets. qsearch gives it full content with multi-engine provenance — running locally, owned by me.

AI agents that read 200-character snippets instead of full pages are prone to hallucination, and even production RAG tools measured 17–33% hallucination rates in Stanford's 2024 audit. Existing search APIs hide which engines agreed on a result. Existing knowledge graphs are enterprise-priced or vendor-locked.

qsearch is the open-source search layer that gives agents full content with multi-engine provenance — running on your machine, owned by you, ready for MCP today.

✅ v0.4.0 is live at qsearch.pro: multi-engine attribution, a trust corpus with per-URL provenance (engines[], sweep_count, trust_score), a corpus viewer at /ui, and MCP-over-HTTP for Claude Code and any spec-compliant client.

📖 Vision: docs/VISION.md · Technical spec: docs/TRUST_MESH.md · Architecture: docs/FEDERATION_ARCHITECTURE.md

Quick start

# 1. Clone
git clone https://github.com/theYahia/qsearch.git
cd qsearch

# 2. Get a Brave Search API key (BYOK, $5/mo for ~1000 queries)
#    → https://brave.com/search/api/ → sign up → copy key

# 3. Configure
cp .env.example .env.local
# Set BRAVE_API_KEY=your_key
# Set SEARXNG_URL=http://localhost:8888 (for multi-engine attribution)

# 4. Start infrastructure (Meilisearch + Qdrant + SearXNG)
docker compose up -d

# 5. Install & run
npm install
npm start            # → qsearch v0.4.0 on http://localhost:8080

# 6. (Optional) MCP server for Claude Code / Workbench / OpenClaw
npm run start:mcp    # → http://0.0.0.0:8081

# 7. Test multi-engine attribution
curl -X POST http://localhost:8080/sweep \
  -H "Content-Type: text/plain" \
  --data-binary $'t1|self-hosted search engine\n'
# → parsed_snippets.md with "Engines: google, duckduckgo, brave (count=3)"

BYOK design: Brave key + SearXNG instance both stay on your machine. No data exfiltration.

How I use it daily

Every research sprint I run a dual sweep:

# Brave sweep (primary, authoritative)
python research/scripts/brave_sweep.py queries.txt _raw_data/topic_2026-04-28/brave/

# qsearch sweep (secondary, auto-indexes into corpus)
curl -X POST "http://localhost:8080/sweep?topic=my_topic" \
  -H "Content-Type: text/plain" --data-binary @queries.txt

After 10+ sprints on the same domain, /corpus/top?min_engines=3 shows which URLs survived multiple independent search engines across multiple sessions. Those are the ones I actually trust.

Why qsearch exists

Every AI agent today hits the same broken loop:

Agent → Tavily/Exa/Serper API → 200-char snippets → hallucinated answer

Three failures:

Snippets aren't enough. Stanford's 2024 production RAG audit measured 17–33% hallucination on Lexis+ AI and Westlaw despite "hallucination-free" claims. On Wikipedia QA, full content beats snippet-RAG by +7.3pp (arxiv 2501.01880).
No trust signal. Search APIs return ranked lists without telling you which engines agreed. SEO spam at position 3 looks identical to an authoritative source at position 4.
No memory. Every search starts from zero. The same trash gets surfaced again. The same authority goes unrecognized.

qsearch addresses all three:

Full content fetched and cleaned, not just snippets.
engines[] field per result — Google + DDG + Brave + Qwant + Startpage attribution exposed (via SearXNG aggregation).
Local corpus accumulates — every URL grows a trust profile across sweeps.

How it works

flowchart LR
    A[Your agent] -->|query| Q[qsearch]
    Q -->|fan out| B[Brave Search API]
    Q -->|fan out| S["SearXNG\n(Google, DDG, Brave, Qwant, …)"]
    B -->|results| Q
    S -->|"results + engines[]"| Q
    Q -->|index by URL| C["Local corpus\n(Meilisearch + Qdrant)"]
    C -->|trust score| Q
    Q -->|re-ranked + full content + provenance| A

    style C fill:#fde68a,stroke:#d97706,color:#000
    style Q fill:#93c5fd,stroke:#2563eb,color:#000
    style S fill:#86efac,stroke:#16a34a,color:#000

The yellow node is your private corpus. A URL returned by 5 engines across 3 sweeps and 4 topics accumulates a higher trust score than one seen once. The score is derived from observed agreement, with no human ranking, no centralized authority, and no cloud round-trip.
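A minimal sketch of how a per-URL profile could accumulate across sweeps. The `engines` and `sweep_count` names follow the corpus fields mentioned in this README; the `topics` field, the `observe` method, and the additive `toy_trust_score` are hypothetical illustrations, not qsearch's actual schema or scoring formula.

```python
from dataclasses import dataclass, field


@dataclass
class UrlProfile:
    """Illustrative per-URL provenance record (not qsearch's real schema)."""
    url: str
    engines: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)  # hypothetical field
    sweep_count: int = 0

    def observe(self, engines: list[str], topic: str) -> None:
        """Merge one sweep's sighting into the existing profile."""
        self.engines.update(engines)   # union, so repeated engines count once
        self.topics.add(topic)
        self.sweep_count += 1

    def toy_trust_score(self) -> int:
        """HYPOTHETICAL score: more engines, sweeps, and topics rank higher."""
        return len(self.engines) + self.sweep_count + len(self.topics)


profile = UrlProfile("https://github.com/searxng/searxng")
profile.observe(["google", "brave"], topic="search")
profile.observe(["brave", "qwant"], topic="privacy")
print(sorted(profile.engines), profile.sweep_count, profile.toy_trust_score())
```

The point of the sketch is the merge step: each new sighting widens the profile rather than replacing it, which is what lets agreement build up over time.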

How qsearch compares

Feature	Tavily	Exa	Serper	Brave API	SearXNG	qsearch
Open source core	❌	❌	❌	❌	✅	✅
Full content (not snippets)	partial	partial	❌	❌	❌	✅
Multi-engine attribution	❌	❌	❌	❌	partial	✅ (`engines[]`)
Persistent local corpus	❌	❌	❌	❌	❌	✅
Trust score per URL	❌	❌	❌	❌	❌	✅
Self-hostable	❌	❌	❌	❌	✅	✅
MCP-native	partial	✅	❌	✅	❌	✅
BYOK upstream	❌	❌	❌	N/A	✅	✅

API — v0.4.0

Search endpoints

Endpoint	Description	Backend
`POST /search`	Web search + corpus first, trust-weighted re-rank	Brave or SearXNG
`POST /sweep`	Batch search via SearXNG (with `engines[]`)	SearXNG
`POST /news`	News search	Brave (requires key)
`POST /context`	Deep page extraction	Brave (requires key)
`POST /index`	Crawl URL or index local `.md` glob	Crawl4AI
`GET /trust/:url`	Trust score + provenance for any URL in corpus	—
`GET /corpus/top`	Top URLs ranked by trust (`?limit=20&min_engines=3`)	—
`GET /corpus/stats`	Corpus size + counts	—
`GET /ui`	Corpus browser — search, trust scores, provenance modal	—
`GET /health`	Service status	—
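Because `GET /trust/:url` carries a full URL inside the path, the client most likely needs to percent-encode it. The sketch below builds request URLs for `/trust/:url` and `/corpus/top`. It assumes the server decodes a percent-encoded path segment, which this README does not confirm; `trust_url` and `corpus_top_url` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080"  # port taken from the quick start


def trust_url(page_url: str) -> str:
    """Build the /trust lookup URL for a page (assumes percent-encoded path)."""
    # safe="" also encodes ":" and "/" so the page URL stays one path segment
    return f"{BASE_URL}/trust/{quote(page_url, safe='')}"


def corpus_top_url(limit: int = 20, min_engines: int = 3) -> str:
    """Build the /corpus/top URL with the documented query parameters."""
    params = urlencode({"limit": limit, "min_engines": min_engines})
    return f"{BASE_URL}/corpus/top?{params}"


print(trust_url("https://github.com/searxng/searxng"))
print(corpus_top_url())
```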

/search accepts: query, n_results (1–20), freshness (pd/pw/pm/py), search_lang, country, corpus_first (default true), corpus_only (default false).
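A small sketch of client-side validation for these parameters. It assumes `/search` takes a JSON body, which is not stated here; `build_search_payload` and the default of 10 results are hypothetical choices for illustration.

```python
import json

FRESHNESS_VALUES = {"pd", "pw", "pm", "py"}  # values listed for `freshness`


def build_search_payload(query, n_results=10, freshness=None,
                         corpus_first=True, corpus_only=False):
    """Validate the documented /search parameters and return a JSON string."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if not 1 <= n_results <= 20:
        raise ValueError("n_results must be between 1 and 20")
    if freshness is not None and freshness not in FRESHNESS_VALUES:
        raise ValueError(f"freshness must be one of {sorted(FRESHNESS_VALUES)}")

    payload = {
        "query": query,
        "n_results": n_results,
        "corpus_first": corpus_first,
        "corpus_only": corpus_only,
    }
    if freshness is not None:
        payload["freshness"] = freshness  # only sent when explicitly requested
    return json.dumps(payload)


print(build_search_payload("self-hosted search engine", n_results=5, freshness="pw"))
```

Rejecting out-of-range values before the request leaves the agent keeps failures local and easy to read.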

/sweep accepts: text/plain body with label|query lines (one per line). Auto-indexes results into Meilisearch with engines[] and engine_count filterable.
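The `label|query` body can be assembled from a list of pairs. This is a sketch: `build_sweep_body` is a hypothetical helper, and rejecting `|` in labels is an assumption about how the server splits each line.

```python
def build_sweep_body(queries: list[tuple[str, str]]) -> str:
    """Render (label, query) pairs as the text/plain body that /sweep expects."""
    lines = []
    for label, query in queries:
        # Assumption: the first "|" separates label from query, so labels must not contain it
        if "|" in label or "\n" in label or "\n" in query:
            raise ValueError(f"invalid label or query: {label!r}, {query!r}")
        lines.append(f"{label}|{query}")
    # Trailing newline mirrors the curl examples in this README
    return "\n".join(lines) + "\n"


body = build_sweep_body([
    ("t1", "self-hosted search engine"),
    ("t2", "mcp search server"),
])
print(body, end="")
```

The resulting string can be written to `queries.txt` and sent with `--data-binary @queries.txt`.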

Multi-engine attribution example

curl -X POST http://localhost:8080/sweep \
  -H "Content-Type: text/plain" \
  --data-binary $'t1|self-hosted search engine 2026\n'

Output excerpt (parsed_snippets.md):

**1. GitHub - searxng/searxng**
- URL: https://github.com/searxng/searxng
- Engines: google, duckduckgo, brave, qwant (count=4)
  > A privacy-respecting, hackable metasearch engine...

**2. random-blog.io/seo-spam-2026**
- URL: https://random-blog.io/seo-spam-2026
- Engines: google (count=1)
  > Best self-hosted search engines you must try...

URL #1 has engine_count=4 — found by 4 independent engines. URL #2 has engine_count=1 — found by only one. The trust signal is built into the data, not bolted on.
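To act on that signal in a script, the excerpt format can be parsed back into records. This sketch assumes the `- URL:` and `- Engines: ... (count=N)` line shapes shown in the excerpt are stable; `parse_snippets` is a hypothetical helper, not part of qsearch.

```python
import re

URL_LINE = re.compile(r"^- URL: (?P<url>\S+)$")
ENGINES_LINE = re.compile(r"^- Engines: (?P<names>.+?) \(count=(?P<count>\d+)\)$")


def parse_snippets(markdown: str) -> list[dict]:
    """Extract url, engines, and engine_count from a parsed_snippets.md excerpt."""
    results = []
    current_url = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        url_match = URL_LINE.match(line)
        if url_match:
            current_url = url_match.group("url")
            continue
        engines_match = ENGINES_LINE.match(line)
        if engines_match and current_url is not None:
            names = [n.strip() for n in engines_match.group("names").split(",")]
            results.append({
                "url": current_url,
                "engines": names,
                "engine_count": int(engines_match.group("count")),
            })
            current_url = None  # wait for the next "- URL:" line
    return results


excerpt = """\
**1. GitHub - searxng/searxng**
- URL: https://github.com/searxng/searxng
- Engines: google, duckduckgo, brave, qwant (count=4)
  > A privacy-respecting, hackable metasearch engine...

**2. random-blog.io/seo-spam-2026**
- URL: https://random-blog.io/seo-spam-2026
- Engines: google (count=1)
"""
high_trust = [r["url"] for r in parse_snippets(excerpt) if r["engine_count"] >= 3]
print(high_trust)
```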

Filter by trust in Meilisearch

curl -H "Authorization: Bearer masterKey" \
  "http://localhost:7700/indexes/qsearch_corpus/documents?filter=engine_count%20%3E%3D%203"

Returns only URLs found by 3+ engines — your high-trust subset.
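The percent-encoded filter in that URL is easier to maintain when generated. A minimal sketch: the host, index name, and `engine_count` field come from the curl example above, while `high_trust_documents_url` is a hypothetical helper name.

```python
from urllib.parse import quote

MEILI_URL = "http://localhost:7700"   # from the curl example
INDEX_UID = "qsearch_corpus"          # from the curl example


def high_trust_documents_url(min_engines: int) -> str:
    """Build the documents URL filtered to URLs seen by at least min_engines engines."""
    filter_expr = f"engine_count >= {int(min_engines)}"
    # safe="" encodes spaces, ">" and "=" so the filter survives as one query value
    return f"{MEILI_URL}/indexes/{INDEX_UID}/documents?filter={quote(filter_expr, safe='')}"


print(high_trust_documents_url(3))
```

Raising `min_engines` narrows the subset; the request still needs the `Authorization: Bearer` header shown in the curl command.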

MCP integration

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "qsearch": {
      "type": "http",
      "url": "http://localhost:8081"
    }
  }
}

Available tools:

mcp__qsearch__web_search — web search via Brave or SearXNG
mcp__qsearch__sweep — batch research sweep with multi-engine attribution
mcp__qsearch__index_research — index local .md files by glob
mcp__qsearch__news_search — news search (Brave key required)
mcp__qsearch__context_search — deep page content (Brave key required)

Other MCP-over-HTTP clients

qsearch serves the MCP Streamable HTTP transport at / on port 8081. It is compatible with Claude Desktop (HTTP mode), OpenClaw, and any spec-compliant MCP client.
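For a client written from scratch, the first message on this transport is a JSON-RPC `initialize` request sent via POST. The sketch below only constructs that request and does not send it. The protocol version string and client name are assumptions; which versions the qsearch server accepts is not stated here.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8081/"  # port from the quick start, path "/" as stated above


def build_initialize_request(client_name="example-client", client_version="0.0.1"):
    """Construct (but do not send) an MCP initialize request over Streamable HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: a revision the server supports
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and an SSE stream
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the handshake against a running server; a real client would then follow up with an `initialized` notification and `tools/list`.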

Stack

Component	Tech
Runtime	Node.js ≥20
Web search	Brave Search API (BYOK)
Meta-search	SearXNG (self-hosted, optional)
Full-text corpus	Meilisearch v1.7
Vector corpus	Qdrant v1.17.1 (Linux/macOS bare-runtime; offline on Windows)
Crawler	Crawl4AI 0.8.6 (Python subprocess)
Embedder (optional)	llama.cpp `/v1/embeddings` server
LLM cleaner (optional)	Any GGUF model via llama.cpp or local inference
MCP	`@modelcontextprotocol/sdk`
License	Apache-2.0

Roadmap

Version	Feature	When
v0.3.1	Multi-engine `engines[]` attribution + dual sweep + corpus + MCP	shipped
v0.4.0	Trust layer: `/trust/:url`, `/corpus/top`, `/ui` viewer, trust-weighted re-rank, sort/pagination, corpus merge-on-upsert, snippet sanitization	shipped
v0.5	Launch: awesome list PRs, MCP Registry publish, Show HN, newsletter distribution	in progress
v0.6+	Optional federation (research direction — no timeline until v0.5 validated)	open

See docs/VISION.md for the full picture and why federation stays a research direction until we can ship it without overpromising.

Honest trade-offs

Cold start. The first sweep takes 5–10 seconds (engine fan-out + corpus indexing). Best run as a long-lived daemon.
Vector search is unavailable on Windows. Qdrant runs here only as a bare runtime on Linux/macOS. Full-text search via Meilisearch works everywhere.
SearXNG rate limits. Self-hosting is required because public instances get blocked by Google. Our docker-compose handles this.
engines[] requires SearXNG. Pure-Brave mode still works but loses the multi-engine signal.
Full content has a latency cost. ~31s vs ~3s for naive snippet retrieval (Bidirectional RAG study). qsearch makes this opt-in via the /context endpoint.

Follow

🌐 Live demo: qsearch.pro
⭐ Star: github.com/theYahia/qsearch
🐦 X: @TheTieTieTies

License

Apache-2.0 — see LICENSE. Independent. BYOK. Self-hostable. No vendor lock-in.