
Multi-LLM FastMCP Orchestration System

This project demonstrates the Multi-Domain Local Server Architecture pattern using FastMCP 2.0, where multiple local Ollama LLMs act as specialized tools within a unified async context.

Architecture Overview

The system leverages FastMCP's in-memory transport to create "virtual microservices": each LLM gets its own FastMCP server instance with dedicated tools, and all servers are orchestrated within a single Python process.

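from fastmcp import FastMCP, Client

# Each LLM is a separate domain-specific server
gemma3_server = FastMCP("gemma3_LLM")
llama3_server = FastMCP("llama3.1_LLM")
qwen_server = FastMCP("qwen2.5_LLM")
deepseek_server = FastMCP("deepseek_LLM")

# Passing a server object to Client selects the in-memory transport
gemma3_client, llama3_client = Client(gemma3_server), Client(llama3_server)
qwen_client, deepseek_client = Client(qwen_server), Client(deepseek_server)

async def main():
    # All work together in shared async context
    async with gemma3_client, llama3_client, qwen_client, deepseek_client:
        ...  # Orchestrate multiple models simultaneously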

Key Features

In-Memory MCP Transport: Client-server communication happens in memory with no network hop (each model call still goes to the local Ollama server)
Shared Async Context: All LLM servers stay alive simultaneously
Parallel Execution: Run multiple models concurrently
Clean Separation: Each model has its own semantic boundary
FastAPI Integration: REST API wrapper for web access
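A minimal, standard-library sketch of the "Parallel Execution" idea. `ask_model` is a hypothetical stand-in for calling one of the FastMCP LLM servers (in the project that call would go through a FastMCP client to Ollama); here it only sleeps and echoes, so the snippet runs on its own.

```python
import asyncio
import time


async def ask_model(model: str, prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for one LLM server call; sleeps to mimic inference latency."""
    await asyncio.sleep(delay)
    return f"{model}: answer to {prompt!r}"


async def ask_all(models: list[str], prompt: str) -> dict[str, str]:
    # gather() schedules every coroutine at once, so wall time is roughly
    # that of the slowest call rather than the sum of all calls.
    answers = await asyncio.gather(*(ask_model(m, prompt) for m in models))
    return dict(zip(models, answers))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(ask_all(["gemma3", "llama3", "qwen", "deepseek"], "hello"))
    print(results)
    # Expect roughly 0.2s here rather than 0.8s, since the four calls overlap.
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Passing `return_exceptions=True` to `asyncio.gather` would let one failing model report its error without discarding the other answers, which may suit a side-by-side endpoint such as `/compare`.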

Available Models

Gemma3 4B: Fast, efficient for quick analysis
Llama 3.1 8B: Balanced performance and quality
Qwen 2.5 14B: Strong instruction following
DeepSeek R1 7B: Advanced reasoning capabilities
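The API examples refer to models by short names such as `gemma3` and `deepseek`, while Ollama identifies them by full tags. A hypothetical registry could bridge the two. The keys `llama3` and `qwen` are assumptions made by analogy with the client names, and `resolve_models` is illustrative, not the project's code.

```python
from typing import Optional

# Hypothetical registry: "gemma3" and "deepseek" appear in the API examples;
# "llama3" and "qwen" are assumed by analogy with the client names.
OLLAMA_TAGS: dict[str, str] = {
    "gemma3": "gemma3:4b",
    "llama3": "llama3.1:8b",
    "qwen": "qwen2.5:14b-instruct-q4_K_M",
    "deepseek": "deepseek-r1:7b",
}


def resolve_models(requested: Optional[list[str]] = None) -> list[str]:
    """Map short model names to Ollama tags; None means every registered model."""
    if requested is None:
        return list(OLLAMA_TAGS.values())
    unknown = [name for name in requested if name not in OLLAMA_TAGS]
    if unknown:
        raise ValueError(f"unknown model(s): {', '.join(unknown)}")
    return [OLLAMA_TAGS[name] for name in requested]


print(resolve_models(["gemma3", "deepseek"]))  # ['gemma3:4b', 'deepseek-r1:7b']
```

Rejecting unknown names up front gives callers a clear error instead of a failure deep inside an Ollama call.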

Quick Start

Ensure Ollama is running with the required models:

ollama list  # Should show gemma3:4b, llama3.1:8b, qwen2.5:14b-instruct-q4_K_M, deepseek-r1:7b

Install dependencies:

pip install -r requirements.txt

Run the orchestration demo:

python multi_llm_fastmcp.py

Start the FastAPI server:

python fastapi_multi_llm.py

Test async coordination:

python test_multi_llm.py
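The model check in the first Quick Start step can be scripted. This is a minimal sketch that assumes the first line of `ollama list` output is a header and the first whitespace-separated column of each following line is the model tag (as in `gemma3:4b`). `missing_models` and `REQUIRED` are hypothetical names, not part of the project.

```python
import shutil
import subprocess

# Tags listed in the Quick Start comment.
REQUIRED = {"gemma3:4b", "llama3.1:8b", "qwen2.5:14b-instruct-q4_K_M", "deepseek-r1:7b"}


def missing_models(ollama_list_output: str) -> set[str]:
    """Return required tags absent from `ollama list` text (first column = model tag)."""
    installed = set()
    for line in ollama_list_output.splitlines()[1:]:  # skip the header row
        if line.strip():
            installed.add(line.split()[0])
    return REQUIRED - installed


if __name__ == "__main__" and shutil.which("ollama"):
    proc = subprocess.run(["ollama", "list"], capture_output=True, text=True)
    if proc.returncode == 0:
        print(missing_models(proc.stdout) or "all required models installed")
    else:
        print("ollama list failed; is the Ollama server running?")
```

Any tag it reports can be downloaded with `ollama pull <tag>`.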

API Endpoints

GET / - API overview
GET /models - List available LLM models
GET /prompts - List available prompt types
GET /resources - List available data resources
POST /analyze - Analyze data with specified models
POST /compare - Compare all models on same task
POST /consensus - Get multi-model consensus
GET /health - Check the health of all LLM servers
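This README does not say how `/consensus` combines answers, and model voting is still listed under Future Enhancements. One simple option is majority voting over normalized answers. The sketch below is an assumption about how that could work, not the project's implementation; `majority_vote` and the `quorum` parameter are hypothetical.

```python
from collections import Counter
from typing import Optional


def majority_vote(answers: dict[str, str], quorum: float = 0.5) -> Optional[str]:
    """Return the answer given by more than `quorum` of the models, else None.

    Answers are compared after trimming whitespace and lowercasing, which only
    makes sense for short, categorical outputs (e.g. "yes"/"no" or a label).
    """
    if not answers:
        return None
    normalized = [a.strip().lower() for a in answers.values()]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner if votes / len(normalized) > quorum else None


print(majority_vote({"gemma3": "Yes", "llama3": "yes ", "qwen": "no", "deepseek": "yes"}))  # yes
```

A strict "more than half" rule returns `None` on a 2-2 split, so a caller can fall back to showing every answer instead of picking one arbitrarily.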

Example Usage

# Analyze medical data with multiple models
curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "prompt_type": "medical_analyst",
    "resource_uri": "resource://medical/patient-123",
    "models": ["gemma3", "deepseek"]
  }'

# Compare all models on code review
curl -X POST http://localhost:8000/compare \
  -H "Content-Type: application/json" \
  -d '{
    "prompt_type": "code_reviewer",
    "resource_uri": "resource://code/sample-function"
  }'
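The same requests can be sent from Python without extra dependencies. This sketch uses `urllib.request` from the standard library and assumes the server listens on `localhost:8000`, as in the curl examples, and replies with a JSON body. `build_request`, `post_json`, and the 300-second timeout are illustrative choices, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port taken from the curl examples


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl calls."""
    return urllib.request.Request(
        f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 300.0) -> dict:
    # Generous (arbitrary) timeout: several local models may run before the reply.
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return json.load(resp)
```

With `fastapi_multi_llm.py` running, `post_json("/analyze", {"prompt_type": "medical_analyst", "resource_uri": "resource://medical/patient-123", "models": ["gemma3", "deepseek"]})` should mirror the first curl call.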

Architecture Benefits

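Transactional Cleanup: If any client fails, every client already opened in the shared async with block is closed as it unwinds (this is cleanup only; work already done is not rolled back)
Natural Orchestration: Switch between models using ordinary async Python control flow (await, loops, conditionals)
Type Safety: Full Pydantic validation throughout
Extensibility: Adding a model or domain means adding another FastMCP server and its client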
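The cleanup guarantee comes from how `async with a, b, c:` unwinds: it behaves like nested blocks, so if one context fails to open, the ones already opened are exited in reverse order and the exception still propagates. This sketch uses a hypothetical `FakeClient` in place of real FastMCP clients to make that order visible.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for a FastMCP client; records open/close order."""

    def __init__(self, name: str, log: list[str], fail_on_enter: bool = False):
        self.name, self.log, self.fail_on_enter = name, log, fail_on_enter

    async def __aenter__(self):
        if self.fail_on_enter:
            raise ConnectionError(f"{self.name} failed to start")
        self.log.append(f"open {self.name}")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.log.append(f"close {self.name}")
        return False  # never swallow the exception


async def demo() -> list[str]:
    log: list[str] = []
    a, b = FakeClient("gemma3", log), FakeClient("llama3", log)
    c = FakeClient("qwen", log, fail_on_enter=True)
    try:
        async with a, b, c:
            log.append("unreachable")
    except ConnectionError:
        log.append("error propagated")
    return log


print(asyncio.run(demo()))
# ['open gemma3', 'open llama3', 'close llama3', 'close gemma3', 'error propagated']
```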

Project Structure

/
├── main.py                   # Original FastAPI shell
├── multi_llm_fastmcp.py      # Core multi-LLM orchestration
├── fastapi_multi_llm.py      # FastAPI REST wrapper
├── test_multi_llm.py         # Async coordination tests
├── requirements.txt          # Python dependencies
└── README.md                 # This file

Future Enhancements

Add more specialized prompts for different domains
Integrate with MCP servers from .claude.json
Add streaming responses
Implement model voting/consensus algorithms
Add performance monitoring and metrics