---
title: GitRag Free MCP
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
startup_duration_timeout: 1h
models:
  - Qwen/Qwen3-Embedding-0.6B
  - jinaai/jina-reranker-v3
---
GitRag
GitRag is a Git-aware code indexing and retrieval toolkit.
It focuses on two jobs:
- indexing repository content into a vector-capable database
- retrieving relevant code snippets, with optional repo/branch filtering and MCP exposure
The current codebase includes:
- a Git-based indexer that can run in full or delta mode
- Tree-sitter-based chunking for code and structured text handling for common data files
- libSQL and PostgreSQL persistence backends
- a retriever with optional reranking
- an MCP server that exposes a search_code tool
What GitRag does well
- Indexes Git changes instead of blindly reprocessing everything on every run
- Understands deletions and removes deleted paths from the index
- Supports full re-indexing with resume-friendly behavior when combined with remote persistence
- Filters retrieval by repo and branch when you want narrower search scope
- Skips ignored paths via GITRAG_IGNORE
- Can serve search over MCP with authentication enabled by default
Configuration
GitRag selects its persistence backend with DB_PROVIDER.
libSQL
Use DB_PROVIDER=libsql and provide:
- DATABASE_URL (the code also accepts TURSO_DATABASE_URL)
- DB_AUTH_TOKEN (the code also accepts TURSO_AUTH_TOKEN)
PostgreSQL
Use DB_PROVIDER=postgres and provide:
- DATABASE_URL
- DB_AUTH_TOKEN
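The backend selection above can be pictured as a small resolver. This is a minimal sketch, not GitRag's code: resolve_db_config and DbConfig are hypothetical names, and it assumes DB_PROVIDER is mandatory, that the TURSO_* aliases apply only to libSQL, and that the generic names win when both spellings are set.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DbConfig:
    provider: str
    url: str
    auth_token: Optional[str]


def resolve_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Hypothetical helper: read DB settings the way this README describes them."""
    provider = env.get("DB_PROVIDER", "").strip().lower()
    if provider not in ("libsql", "postgres"):
        raise ValueError(f"DB_PROVIDER must be 'libsql' or 'postgres', got {provider!r}")

    url = env.get("DATABASE_URL")
    token = env.get("DB_AUTH_TOKEN")
    if provider == "libsql":
        # Assumption: generic names take precedence; TURSO_* are fallbacks only.
        url = url or env.get("TURSO_DATABASE_URL")
        token = token or env.get("TURSO_AUTH_TOKEN")

    if not url:
        raise ValueError("DATABASE_URL is not set")
    return DbConfig(provider=provider, url=url, auth_token=token)
```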
Excluding files
Use GITRAG_IGNORE with a comma- or semicolon-separated list of glob patterns.
Example:
export GITRAG_IGNORE="dist/**,build/**,*.min.js"
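To illustrate how such a value might be split and applied, here is a sketch using Python's fnmatch module. parse_ignore and is_ignored are hypothetical helpers; GitRag's own matcher may treat `**` or path separators differently.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def parse_ignore(value: str) -> List[str]:
    """Split a GITRAG_IGNORE-style value on commas or semicolons, dropping blanks."""
    return [p.strip() for p in re.split(r"[,;]", value) if p.strip()]


def is_ignored(path: str, patterns: List[str]) -> bool:
    # In fnmatch, "*" also matches "/", so "dist/**" covers nested files
    # and "*.min.js" matches at any directory depth.
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

With the example value, `dist/app/bundle.js` and `src/vendor/lib.min.js` would be skipped while `src/main.py` would be indexed.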
Indexing behavior
The indexer accepts:
- a required repo identifier
- --full for a full scan
- --branch for branch-aware indexing
- --from-sha and --to-sha for delta indexing across a commit range
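Delta mode has to turn a commit range into "paths to re-index" and "paths to drop". One way to do that is with `git diff --name-status -M`, sketched below. This is an illustration, not GitRag's implementation: parse_name_status and changed_paths are hypothetical, and the parser ignores the quoting Git applies to unusual file names (`-z` output would be more robust).

```python
import subprocess
from typing import List, Tuple


def parse_name_status(output: str) -> Tuple[List[str], List[str]]:
    """Split `git diff --name-status` output into (paths to upsert, paths to delete)."""
    upserts: List[str] = []
    deletes: List[str] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("D"):
            deletes.append(parts[1])
        elif status.startswith("R"):  # rename (e.g. R087): drop old path, index new one
            deletes.append(parts[1])
            upserts.append(parts[2])
        elif status.startswith("C"):  # copy: source stays, new path is indexed
            upserts.append(parts[2])
        else:  # A, M, T and similar: (re)index the path
            upserts.append(parts[1])
    return upserts, deletes


def changed_paths(from_sha: str, to_sha: str, repo_dir: str = ".") -> Tuple[List[str], List[str]]:
    out = subprocess.run(
        ["git", "diff", "--name-status", "-M", from_sha, to_sha],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return parse_name_status(out)
```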
For long runs, GitRag also supports a soft timeout via the SOFT_TIMEOUT environment variable. When the timeout is
exceeded, the indexer exits with code 75, which is useful for retry-based workflows.
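Exit code 75 is EX_TEMPFAIL in the sysexits.h convention, meaning "temporary failure, try again". A retry-based workflow can therefore re-launch the indexer until it exits with anything else. The sketch below assumes you pass whatever command you normally use to start the indexer, and relies on the resume-friendly behavior described above so that each attempt makes progress; run_until_done is a hypothetical name.

```python
import subprocess
import sys
from typing import Sequence

SOFT_TIMEOUT_EXIT = 75  # exit code documented for a soft timeout


def run_until_done(cmd: Sequence[str], max_attempts: int = 5) -> int:
    """Re-run `cmd` while it exits with code 75; return the final exit code."""
    code = SOFT_TIMEOUT_EXIT
    for attempt in range(1, max_attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code != SOFT_TIMEOUT_EXIT:
            return code  # success or a real failure: stop retrying
        print(f"attempt {attempt}: soft timeout, retrying", file=sys.stderr)
    return code  # still timing out after max_attempts
```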
GitHub Action
The repository includes a composite GitHub Action in action.yml for running the indexer.
Basic example:
name: GitRag Indexing
on:
  push:
    branches: ["master"]
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dorphalsig/gitrag@master
        with:
          repo: ${{ github.repository }}
          db_provider: libsql
          database_url: ${{ secrets.TURSO_DATABASE_URL }}
          db_auth_token: ${{ secrets.TURSO_AUTH_TOKEN }}
          branch: ${{ github.ref_name }}
Optional inputs supported by the action:
- full_index
- soft_timeout
If full_index is not enabled, the action passes the Git commit range to the indexer so it can process changes
incrementally.
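For a push trigger, GitHub records the range in the event payload's `before` and `after` fields, and a brand-new branch arrives with an all-zero `before` SHA. The sketch below shows one way to derive the range from that payload, falling back to a full index when there is no usable base. It is an assumption about how such logic could look, not a copy of action.yml; commit_range and load_push_event are hypothetical.

```python
import json
import os
from typing import Mapping, Optional, Tuple


def commit_range(event: Mapping) -> Optional[Tuple[str, str]]:
    """Return (from_sha, to_sha) for a push event, or None when a full index is needed."""
    before, after = event.get("before"), event.get("after")
    # An all-zero "before" SHA means the branch is new: there is no base to diff against.
    if not before or not after or before.strip("0") == "":
        return None
    return before, after


def load_push_event() -> Mapping:
    # GitHub Actions writes the triggering event payload to this file.
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        return json.load(fh)
```

Diffing against `before` also requires that commit to be present locally, which a default (shallow) actions/checkout may not provide.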
MCP server
GitRag ships an MCP server package that exposes a search_code tool backed by the retriever.
The current server implementation creates a FastMCP server whose search_code tool returns:
- structured result objects
- a markdown rendering of the matched snippets
- XML-safe formatted snippets for downstream consumers
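To show why both renderings are useful, here is a sketch that turns hits into markdown for humans and escaped XML for tools. The Hit fields and the output layout are hypothetical; GitRag's actual result objects and formats may differ. The XML side uses the standard library's escape and quoteattr so that `<`, `>` and `&` in code cannot break the document.

```python
from dataclasses import dataclass
from typing import List
from xml.sax.saxutils import escape, quoteattr

FENCE = "`" * 3  # markdown code fence, built so it does not end this block


@dataclass
class Hit:  # hypothetical result shape
    repo: str
    branch: str
    path: str
    content: str
    score: float


def to_markdown(hits: List[Hit]) -> str:
    parts = []
    for h in hits:
        header = f"### {h.path} ({h.repo}@{h.branch}, score {h.score:.3f})"
        parts.append(f"{header}\n\n{FENCE}\n{h.content}\n{FENCE}")
    return "\n\n".join(parts)


def to_xml(hits: List[Hit]) -> str:
    # escape() handles &, < and > in text; quoteattr() quotes and escapes attribute values.
    items = [
        f"<snippet path={quoteattr(h.path)} repo={quoteattr(h.repo)}>{escape(h.content)}</snippet>"
        for h in hits
    ]
    return "<results>" + "".join(items) + "</results>"
```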
MCP auth setup
Authentication is enabled by default.
To run the authenticated MCP server, set:
- SCALEKIT_ENVIRONMENT_URL
- SCALEKIT_CLIENT_ID
- SCALEKIT_RESOURCE_ID
Optional:
- MCP_BASE_URL: overrides the callback/base URL used by the Scalekit provider. If unset, GitRag falls back to http://127.0.0.1:8000/mcp.
Auth behavior is resolved in this order:
- GITRAG_MCP_DISABLE_AUTH=1 disables auth
- otherwise, GITRAG_MCP_REQUIRE_AUTH=true|false can explicitly enable or disable auth
- if neither variable is set, auth stays enabled
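The precedence rules and settings above can be written down as a few small functions. This is a sketch of the documented behavior, not GitRag's source: auth_enabled, base_url and missing_auth_settings are hypothetical, and case-insensitive parsing of true/false (with unknown values falling back to "enabled") is an assumption.

```python
import os
from typing import List, Mapping

DEFAULT_BASE_URL = "http://127.0.0.1:8000/mcp"  # documented fallback
REQUIRED_AUTH_VARS = ("SCALEKIT_ENVIRONMENT_URL", "SCALEKIT_CLIENT_ID", "SCALEKIT_RESOURCE_ID")


def auth_enabled(env: Mapping[str, str] = os.environ) -> bool:
    if env.get("GITRAG_MCP_DISABLE_AUTH") == "1":
        return False  # explicit opt-out wins over everything else
    require = env.get("GITRAG_MCP_REQUIRE_AUTH")
    if require is not None:
        value = require.strip().lower()
        if value == "true":
            return True
        if value == "false":
            return False
        # Assumption: unrecognised values fall through to the secure default.
    return True


def base_url(env: Mapping[str, str] = os.environ) -> str:
    return env.get("MCP_BASE_URL") or DEFAULT_BASE_URL


def missing_auth_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of required Scalekit variables that are unset or empty."""
    return [name for name in REQUIRED_AUTH_VARS if not env.get(name)]
```

A startup check could call missing_auth_settings() only when auth_enabled() is true, and refuse to start if the list is non-empty.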
For local development, disabling auth can be useful while wiring up clients or tests. It should not be treated as the default production setup.
Minimal server wiring
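from gitrag_mcp_server.server import create_mcp_server
from Retriever import Retriever
# Build the retriever with your own settings (constructor arguments elided here).
retriever = Retriever(...)
# Wrap the retriever in a FastMCP server that exposes the search_code tool.
mcp = create_mcp_server(retriever=retriever)
# Serve over SSE on port 8000; host "0.0.0.0" listens on all network interfaces.
mcp.run(transport="sse", port=8000, host="0.0.0.0")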
Retrieval
The retriever returns the top matching chunks for a query. It can:
- search with vector + text persistence backends
- limit results with top_k
- scope results by repo and branch
- rerank results with a configurable model set via the reranker environment variable (defaults to jinaai/jina-reranker-v3)
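The retrieval steps listed above (scope, vector recall, optional rerank, truncate to top_k) can be sketched as an in-memory pipeline. This is a brute-force stand-in for illustration only: in GitRag the vector search runs in the persistence backend and reranking uses a model such as jina-reranker-v3. Chunk, cosine, search and the candidates parameter are hypothetical, and rerank is any callable you supply.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Chunk:  # hypothetical stored chunk
    repo: str
    branch: str
    path: str
    text: str
    vector: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_text: str,
    query_vec: Sequence[float],
    chunks: List[Chunk],
    top_k: int = 5,
    repo: Optional[str] = None,
    branch: Optional[str] = None,
    rerank: Optional[Callable[[str, List[Chunk]], List[Chunk]]] = None,
    candidates: int = 50,
) -> List[Chunk]:
    # 1. Scope by repo/branch before scoring.
    pool = [c for c in chunks
            if (repo is None or c.repo == repo) and (branch is None or c.branch == branch)]
    # 2. Vector recall: rank by cosine similarity and keep a wider candidate set than top_k.
    pool.sort(key=lambda c: cosine(query_vec, c.vector), reverse=True)
    pool = pool[:max(candidates, top_k)]
    # 3. Optional second-stage reranking with the query text, then truncate.
    if rerank is not None:
        pool = rerank(query_text, pool)
    return pool[:top_k]
```

Keeping more candidates than top_k before reranking gives the reranker room to promote results the embedding score placed lower.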
Supported content
Current tests and code cover:
- source files chunked through Tree-sitter
- markdown
- JSON / JSONL
- YAML
- XML
- TOML
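A simple way to picture how these content types are routed is a dispatch on file suffix, with data files handled structurally and everything else sent to a code chunker. The sketch below is hypothetical: the mapping, chunker_for and jsonl_chunks are illustrative, the "one record per chunk" JSONL rule is an assumption, and a real indexer would likely skip files that have no Tree-sitter grammar rather than default to code.

```python
import json
from pathlib import PurePosixPath
from typing import List

# Hypothetical mapping from file suffix to a chunking strategy name.
STRUCTURED = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".json": "json",
    ".jsonl": "jsonl",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".xml": "xml",
    ".toml": "toml",
}


def chunker_for(path: str) -> str:
    suffix = PurePosixPath(path).suffix.lower()
    # Anything not recognised as a data or markup file is treated as source code.
    return STRUCTURED.get(suffix, "tree-sitter")


def jsonl_chunks(text: str) -> List[str]:
    # Assumption: one chunk per JSON record, re-serialised with sorted keys.
    return [json.dumps(json.loads(line), sort_keys=True) for line in text.splitlines() if line.strip()]
```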
License
MIT.
