FastMCP Multi LLM Tool
Multi-LLM FastMCP Orchestration System
This project demonstrates the Multi-Domain Local Server Architecture pattern using FastMCP 2.0, where multiple local Ollama LLMs act as specialized tools within a unified async context.
Architecture Overview
The system uses FastMCP's in-memory transport to create "virtual microservices": each LLM gets its own FastMCP server instance with dedicated tools, and all servers are orchestrated within a single Python process.
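from fastmcp import Client, FastMCP
# Each LLM is a separate domain-specific server
gemma3_server = FastMCP("gemma3_LLM")
llama3_server = FastMCP("llama3.1_LLM")
qwen_server = FastMCP("qwen2.5_LLM")
deepseek_server = FastMCP("deepseek_LLM")
# Passing a server instance to Client uses FastMCP's in-memory transport
gemma3_client = Client(gemma3_server)
llama3_client = Client(llama3_server)
qwen_client = Client(qwen_server)
deepseek_client = Client(deepseek_server)
# All work together in a shared async context
async def main():
    async with gemma3_client, llama3_client, qwen_client, deepseek_client:
        ...  # Orchestrate multiple models simultaneously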
Key Features
- Zero Network Overhead: All MCP client-server communication happens in-memory
- Shared Async Context: All LLM servers are alive simultaneously
- Parallel Execution: Run multiple models concurrently
- Clean Separation: Each model has its own semantic boundary
- FastAPI Integration: REST API wrapper for web access
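The parallel-execution feature can be pictured with a minimal sketch that uses only `asyncio`. The `fake_model` coroutine and the short model names are hypothetical stand-ins for the real Ollama-backed tool calls, which this README does not show.

```python
import asyncio


async def fake_model(name: str, prompt: str, delay: float) -> str:
    """Hypothetical stand-in for one LLM tool call; sleeps instead of calling Ollama."""
    await asyncio.sleep(delay)
    return f"{name}: analysis of {prompt!r}"


async def run_all(prompt: str) -> dict[str, str]:
    """Start every model call at once and wait for all of them to finish."""
    names = ["gemma3", "llama3", "qwen", "deepseek"]
    # gather() runs the coroutines concurrently and returns results in argument order.
    results = await asyncio.gather(
        *(fake_model(name, prompt, delay=0.01) for name in names)
    )
    return dict(zip(names, results))


answers = asyncio.run(run_all("patient-123"))
```

Because the calls overlap, the total wait is roughly that of the slowest model rather than the sum of all of them.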
Available Models
- Gemma3 4B: Fast, efficient for quick analysis
- Llama 3.1 8B: Balanced performance and quality
- Qwen 2.5 14B: Strong instruction following
- DeepSeek R1 7B: Advanced reasoning capabilities
Quick Start
- Ensure Ollama is running with required models:
ollama list # Should show gemma3:4b, llama3.1:8b, qwen2.5:14b-instruct-q4_K_M, deepseek-r1:7b
- Install dependencies:
pip install -r requirements.txt
- Run the orchestration demo:
python multi_llm_fastmcp.py
- Start the FastAPI server:
python fastapi_multi_llm.py
- Test async coordination:
python test_multi_llm.py
API Endpoints
- GET / - API overview
- GET /models - List available LLM models
- GET /prompts - List available prompt types
- GET /resources - List available data resources
- POST /analyze - Analyze data with specified models
- POST /compare - Compare all models on same task
- POST /consensus - Get multi-model consensus
- GET /health - Check all LLM server health
Example Usage
# Analyze medical data with multiple models
curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "prompt_type": "medical_analyst",
    "resource_uri": "resource://medical/patient-123",
    "models": ["gemma3", "deepseek"]
  }'
# Compare all models on code review
curl -X POST http://localhost:8000/compare \
  -H "Content-Type: application/json" \
  -d '{
    "prompt_type": "code_reviewer",
    "resource_uri": "resource://code/sample-function"
  }'
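The same /analyze call can be made from Python. The sketch below uses only the standard library and reuses the host, port, and field names from the curl example above. The helper name `build_analyze_request` is hypothetical, and the response format is not documented here, so the sketch stops short of parsing it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_analyze_request(
    prompt_type: str, resource_uri: str, models: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request with a JSON body."""
    payload = {
        "prompt_type": prompt_type,
        "resource_uri": resource_uri,
        "models": models,
    }
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request(
    "medical_analyst", "resource://medical/patient-123", ["gemma3", "deepseek"]
)
# To send it (requires the FastAPI server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```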
Architecture Benefits
- Transactional Semantics: If any client fails inside the shared async context, all client contexts are exited and cleaned up together
- Natural Orchestration: Weave between different models in natural Python flow
- Type Safety: Full Pydantic validation throughout
- Scalability: Easy to add new models or domains
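The cleanup behavior described under Transactional Semantics comes from Python's `async with` statement itself. A minimal sketch, with `fake_client` as a hypothetical stand-in for a real client context:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def fake_client(name: str):
    """Hypothetical stand-in for a client context; records open/close events."""
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")


async def orchestrate() -> None:
    async with fake_client("gemma3"), fake_client("deepseek") as failing:
        # Simulate a failing model call inside the shared context.
        raise RuntimeError(f"{failing} failed")


async def main() -> None:
    try:
        await orchestrate()
    except RuntimeError:
        pass  # the error reaches the caller only after both contexts have closed


asyncio.run(main())
```

After the run, `events` shows both contexts opened and then closed in reverse order, even though the body raised.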
Project Structure
/
├── main.py                 # Original FastAPI shell
├── multi_llm_fastmcp.py    # Core multi-LLM orchestration
├── fastapi_multi_llm.py    # FastAPI REST wrapper
├── test_multi_llm.py       # Async coordination tests
├── requirements.txt        # Python dependencies
└── README.md               # This file
Future Enhancements
- Add more specialized prompts for different domains
- Integrate with MCP servers from .claude.json
- Add streaming responses
- Implement model voting/consensus algorithms
- Add performance monitoring and metrics
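As one possible starting point for the voting/consensus item, a simple majority vote over model answers could look like the sketch below. This is an assumption about how such an algorithm might work, not a description of the existing /consensus endpoint. The function name `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers: dict[str, str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of models that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize lightly so "Yes" and "yes " count as the same vote.
    votes = Counter(text.strip().lower() for text in answers.values())
    winner, count = votes.most_common(1)[0]
    return winner, count / len(answers)


winner, agreement = majority_vote(
    {"gemma3": "Yes", "llama3": "yes", "qwen": "no", "deepseek": "yes "}
)
```

Exact-match voting only suits short, categorical answers; free-form text would need some form of similarity scoring instead.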
