LLM Use
LLM orchestration toolkit for agent workflows: planner + workers + synthesis, optional router (LLM + learned fallback), supports OpenAI/Anthropic/Ollama/llama.cpp, real scraping with caching, MCP server integration, and a TUI chat UI.
Installation
npx llm-useAsk AI about LLM Use
Powered by Claude · Grounded in docs
I know everything about LLM Use. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
Universal LLM orchestrator for running a “planner + workers + synthesis” flow across multiple providers (Anthropic, OpenAI, Ollama, llama.cpp). It chooses between single‑shot or parallel execution, aggregates costs, and stores session logs locally.
Highlights
- Provider‑agnostic: mix cloud and local models.
- Cost tracking per run with a breakdown.
- Session history saved to
~/.llm-use/sessions. - Works fully offline with Ollama.
- Optional real web scraping + caching.
- Optional MCP server (via PolyMCP).
- TUI chat mode with live logs.
Requirements
- Python 3.10+
- Optional provider SDKs:
anthropic,openai requests(for Ollama HTTP calls)- Ollama installed and running for local models
- Optional:
beautifulsoup4for scraping - Optional:
polymcp+uvicornfor MCP server
Installation
pip install requests
# Optional: cloud providers
pip install anthropic openai
# Optional: Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Optional: scraping
pip install beautifulsoup4
# Optional: MCP server
pip install polymcp uvicorn
# Optional: Playwright (dynamic scraping)
pip install playwright
playwright install
# Install as a package (editable)
pip install -e .
Quick Start (Local Only)
ollama pull llama3.1:70b
ollama pull llama3.1:8b
python3 cli.py exec \
--orchestrator ollama:llama3.1:70b \
--worker ollama:llama3.1:8b \
--task "Research AI from 5 sources"
Quick Start (Hybrid)
export ANTHROPIC_API_KEY="sk-ant-..."
ollama pull llama3.1:8b
python3 cli.py exec \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker ollama:llama3.1:8b \
--task "Compare 10 products"
TUI Chat
python3 cli.py chat \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker ollama:llama3.1:8b
MCP Server (PolyMCP)
python3 cli.py mcp \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker ollama:llama3.1:8b \
--host 127.0.0.1 \
--port 8000
Install Extras (Helper)
python3 cli.py install --all
Usage
Basic
python3 cli.py exec \
--orchestrator <provider>:<model> \
--worker <provider>:<model> \
--task "your task"
Router (Cheap Model to Skip Orchestration)
python3 cli.py exec \
--router ollama:llama3.1:8b \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker openai:gpt-4o-mini \
--task "Explain TCP in 5 bullets"
Router via llama.cpp Local Path
python3 cli.py exec \
--router-path /path/to/your/router/model \
--llama-cpp-url http://localhost:8080 \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker openai:gpt-4o-mini \
--task "Explain TCP in 5 bullets"
If the router model fails or is unavailable, it falls back to a heuristic router.
Heuristic Router Rules (No Hardcoded Keywords)
By default the heuristic uses only length + URL signals. You can add your own patterns in router_rules.json (or set LLM_USE_ROUTER_RULES to a custom path).
Learned Router (Lightweight ML)
The router also learns from past tasks by storing (task, mode) pairs and using cosine similarity on token vectors. This is local, cheap, and improves routing over time. Clear the cache to reset (~/.llm-use/cache.sqlite).
Parallel Worker Control
python3 cli.py exec \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker anthropic:claude-3-5-haiku-20241022 \
--max-workers 8 \
--task "Summarize 20 documents"
Disable Cache
python3 cli.py exec \
--orchestrator openai:gpt-4o \
--worker openai:gpt-4o-mini \
--no-cache \
--task "Draft a brief memo"
Real Scraping (Workers)
python3 cli.py exec \
--orchestrator openai:gpt-4o \
--worker openai:gpt-4o-mini \
--enable-scrape \
--task "Find 3 sources about X and summarize them"
Dynamic Scraping (Playwright)
python3 cli.py exec \
--orchestrator openai:gpt-4o \
--worker openai:gpt-4o-mini \
--enable-scrape \
--scrape-backend playwright \
--task "Find 3 sources about X and summarize them"
Stats
python3 cli.py stats
Router Reset (Clear Learned Memory)
python3 cli.py router-reset
Router Export / Import
python3 cli.py router-export --out router_examples.json
python3 cli.py router-import --in router_examples.json
The export includes created timestamp and optional confidence if available.
Python Package
pip install -e .
llm-use exec --orchestrator ollama:llama3.1:70b --worker ollama:llama3.1:8b --task "Hello"
Concrete Examples (Agent Support)
These examples show how to use the orchestrator as the “brain” that delegates work to cheaper or local workers.
Multi‑source research with final synthesis
python3 cli.py exec \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker openai:gpt-4o-mini \
--task "Collect 8 reliable sources on X and produce a pros/cons summary"
Concurrent document analysis (agent brief)
python3 cli.py exec \
--orchestrator openai:gpt-4o \
--worker openai:gpt-4o-mini \
--max-workers 6 \
--task "Analyze 6 documents and return an executive brief with risks and opportunities"
Privacy‑first local pipeline (offline agent)
python3 cli.py exec \
--orchestrator ollama:qwen2.5:72b \
--worker ollama:mistral:7b \
--task "Extract requirements from internal notes and produce a checklist"
Brainstorm + validation (creative + critic)
python3 cli.py exec \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker ollama:llama3.1:8b \
--task "Generate 20 ideas, then pick the top 5 with brief rationale"
Best Practices for Agents
- Define the expected output format in the task (bullets, table, JSON).
- Avoid vague tasks: ask for decomposition and synthesis with clear criteria.
- Use cheaper workers for data gathering and a stronger orchestrator for synthesis.
- Set
--max-workersbased on rate limits and the number of subtasks. - For sensitive data, prefer Ollama or isolated environments.
File/CSV Examples (Prompt‑In‑File)
If your agent works on structured inputs, it helps to include the content directly in the prompt.
Summarize a local file
python3 cli.py exec \
--orchestrator openai:gpt-4o \
--worker openai:gpt-4o-mini \
--task "Summarize in 5 bullets the content of this file:\n\n$(cat notes.txt)"
CSV analysis (schema + insights)
python3 cli.py exec \
--orchestrator anthropic:claude-3-7-sonnet-20250219 \
--worker anthropic:claude-3-5-haiku-20241022 \
--task "Analyze the CSV below, describe the schema and 3 insights:\n\n$(cat data.csv)"
JSON output for agent pipelines
python3 cli.py exec \
--orchestrator ollama:llama3.1:70b \
--worker ollama:llama3.1:8b \
--task "Extract requirements in JSON with keys: title, priority, rationale:\n\n$(cat requirements.md)"
Providers and Models
The following model names are recognized out of the box. You can also pass custom models with provider:model.
Anthropic
claude-3-5-haiku-20241022claude-3-7-sonnet-20250219claude-4-opus-20250514
OpenAI
gpt-4o-minigpt-4oo1
Ollama
llama3.1:70bllama3.1:8bqwen2.5:72bmistral:7b
llama.cpp (OpenAI-compatible server)
Use llama_cpp:<model> with a llama.cpp server that exposes /v1/chat/completions.
Python API
from llm_use import Orchestrator, ModelConfig
orch = Orchestrator(
orchestrator=ModelConfig(name="llama3.1:70b", provider="ollama"),
worker=ModelConfig(name="llama3.1:8b", provider="ollama")
)
result = orch.execute("Your task")
print(f"Cost: ${result['cost']:.6f}")
print(result["output"])
Cost Notes
Costs are estimated using provider list prices per million tokens and token counts returned by the SDKs. For Ollama, cost is zero by default. Token usage for Ollama is estimated from word counts.
Troubleshooting
Ollama not found
ollama serve
ollama list
Missing API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
Testing
pip install pytest
pytest
License
MIT
