AI Agent Prod Deployment
From MCP server to GPU-accelerated AI agent | A Google Cloud deployment series using FastMCP, ADK, Ollama, and Gemma 3.
AI Agent from Production to Deployment
A three-part project on Google Cloud that goes from a simple MCP server to a GPU-accelerated AI agent, with every component deployed on Cloud Run.
Each part is self-contained with its own README, but they build on each other.
What is Built
| Part | Folder | What is built | Key tech |
|---|---|---|---|
| 1 | 1-mcp-server/ | A secure, production-ready MCP server that exposes zoo animal data as tools for LLMs | FastMCP, Cloud Run, Gemini CLI |
| 2 | 2-adk-agent/ | A multi-agent zoo tour guide that uses the MCP server + Wikipedia | ADK, SequentialAgent, LangChain, Cloud Run |
| 3 | 3-gpu-agent/ | A GPU-accelerated Gemma agent with elasticity testing | Ollama, Gemma 3 270M, NVIDIA L4, Cloud Run |
Architecture Overview
Prerequisites
Before starting, make sure you have:
- A Google Cloud account with billing enabled (free trial available)
- The gcloud CLI installed and authenticated
- Python 3.13+
- uv, the Python package manager used throughout (install)
All parts can run in Google Cloud Shell, which comes with all of the above pre-installed. Click here to open it.
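If you are working on a local machine rather than Cloud Shell, a quick check along these lines can flag missing tools before you start. This is a minimal sketch: the function name `missing_prerequisites` is hypothetical, and it only checks the Python version and whether `gcloud` and `uv` are on the `PATH`. It does not verify billing or that `gcloud` is authenticated.

```python
import shutil
import sys


def missing_prerequisites(python_version=None, which=shutil.which):
    """Return the names of prerequisites that appear to be missing.

    `python_version` and `which` are injectable so the check can be
    exercised without changing the real environment.
    """
    if python_version is None:
        python_version = sys.version_info[:2]

    missing = []
    # The README asks for Python 3.13 or newer.
    if tuple(python_version) < (3, 13):
        missing.append("Python 3.13+")
    # Both CLIs must be discoverable on the PATH.
    for tool in ("gcloud", "uv"):
        if which(tool) is None:
            missing.append(tool)
    return missing


print(missing_prerequisites() or "all prerequisites found")
```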
Getting Started
Clone the repo and navigate into it:
git clone https://github.com/YOUR_USERNAME/zoo-mcp-on-cloudrun.git
cd zoo-mcp-on-cloudrun
Then start with Part 1:
cd 1-mcp-server
cat README.md
Cost Estimate
| Part | Approximate cost |
|---|---|
| Part 1 | < $1 USD |
| Part 2 | < $1 USD |
| Part 3 | ~$2–4/hr while GPU is running (NVIDIA L4) |
Each part's README includes a cleanup section that deletes the resources it created, so you avoid ongoing charges.
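To get a feel for the Part 3 figure, the hourly range can be multiplied by how long the GPU service stays up. The sketch below assumes the approximate $2 to $4 per hour range from the table; the function name is hypothetical, and actual billing depends on region, configuration, and current pricing.

```python
def gpu_cost_range(hours, low_rate=2.0, high_rate=4.0):
    """Return a (low, high) USD estimate for running the GPU service.

    The default rates are the approximate hourly range quoted in the
    cost table, not an official price.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return (hours * low_rate, hours * high_rate)


low, high = gpu_cost_range(1.5)
print(f"1.5 GPU hours: roughly ${low:.2f} to ${high:.2f}")
```

For a 90-minute session this gives roughly $3 to $6, which is why cleaning up promptly matters most for Part 3.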
Project at a Glance
Part 1: Deploying a secure MCP server on Cloud Run
Building a Model Context Protocol server using FastMCP that exposes zoo animal data as tools. Deploying it to Cloud Run with authentication required, then connecting to it using Gemini CLI.
Concepts involved: MCP concepts, FastMCP, deploying from source on Cloud Run, IAM-based auth, service accounts.
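As a rough illustration of "zoo animal data as tools", the sketch below keeps the tool logic in plain functions and registers them with FastMCP in a separate step. The sample data, tool names, and server name are hypothetical placeholders, not the contents of the real `server.py`, and `build_server` assumes the `fastmcp` package is installed.

```python
# Hypothetical sample data; the real server.py in 1-mcp-server/ defines its own.
ZOO_ANIMALS = [
    {"name": "Leo", "species": "lion", "enclosure": "Savanna"},
    {"name": "Nala", "species": "lion", "enclosure": "Savanna"},
    {"name": "Pip", "species": "penguin", "enclosure": "Arctic Cove"},
]


def get_animals_by_species(species: str) -> list[dict]:
    """Return every animal whose species matches, ignoring case."""
    wanted = species.strip().lower()
    return [a for a in ZOO_ANIMALS if a["species"] == wanted]


def get_animal_details(name: str) -> dict:
    """Return the record for one animal by name, or an empty dict if unknown."""
    wanted = name.strip().lower()
    for animal in ZOO_ANIMALS:
        if animal["name"].lower() == wanted:
            return animal
    return {}


def build_server():
    """Register both functions as MCP tools (requires the fastmcp package)."""
    # Imported lazily so the plain functions above stay usable without it.
    from fastmcp import FastMCP

    mcp = FastMCP("zoo-animals")
    mcp.tool()(get_animals_by_species)
    mcp.tool()(get_animal_details)
    return mcp
```

Keeping the logic in ordinary functions means it can be unit-tested without starting a server. Starting the server is left out here; on Cloud Run the listening port is supplied through the `PORT` environment variable.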
→ Go to Part 1
Part 2: Building and Deploying an ADK Agent that uses an MCP Server
Building a multi-agent zoo tour guide using Google's Agent Development Kit (ADK). The agent uses the MCP server from Part 1 as its toolset, augmented with the Wikipedia API for general knowledge. Deploying the agent to Cloud Run.
Concepts involved: ADK agents, SequentialAgent, MCPToolset, LangchainTool, state management, adk deploy.
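The idea behind a sequential multi-agent workflow with shared state can be shown without ADK at all. The following is a plain-Python sketch of the concept, not ADK's API: each step is a hypothetical function that reads from and writes to a shared state dictionary, and the researcher uses canned facts in place of real MCP or Wikipedia tool calls.

```python
from typing import Callable

State = dict[str, str]
Step = Callable[[State], None]


def greeter(state: State) -> None:
    # Record what the visitor asked about so later steps can read it.
    state["topic"] = state["user_input"].strip().lower()


def researcher(state: State) -> None:
    # Stand-in for tool calls (MCP server, Wikipedia); uses canned facts.
    facts = {"lion": "Lions live in groups called prides."}
    state["research"] = facts.get(state["topic"], "No facts found.")


def formatter(state: State) -> None:
    # Turn the collected research into the final reply.
    state["answer"] = f"About {state['topic']}: {state['research']}"


def run_sequential(steps: list[Step], user_input: str) -> State:
    """Run each step in order, passing the same state dict to all of them."""
    state: State = {"user_input": user_input}
    for step in steps:
        step(state)
    return state


result = run_sequential([greeter, researcher, formatter], "Lion")
print(result["answer"])
```

In ADK the same shape is expressed with `SequentialAgent` and its sub-agents, with session state playing the role of the dictionary here.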
→ Go to Part 2
Part 3: Deploying an ADK agent to Cloud Run with GPU
Deploying a GPU-accelerated Gemma 3 model via Ollama on Cloud Run, then wiring it up to an ADK agent. Running elasticity tests with Locust to observe how both services handle load independently.
Concepts involved: GPU on Cloud Run, Ollama, LiteLlm, FastAPI + ADK, Locust load testing, autoscaling behavior.
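Before wiring the model into an agent, it can help to see what a direct request to an Ollama backend looks like. The sketch below builds, but does not send, a POST to Ollama's `/api/generate` endpoint using only the standard library. The base URL is Ollama's local default, the `gemma3:270m` model tag is an assumption, and the helper name is hypothetical. An authenticated Cloud Run service would also need an identity token header, which is omitted here.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, model="gemma3:270m"):
    """Build (but do not send) a POST to Ollama's /api/generate endpoint."""
    # "stream": False asks for a single JSON reply instead of a chunked stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("http://localhost:11434", "Name one zoo animal.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` against a running backend would return a JSON body whose `response` field holds the generated text.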
→ Go to Part 3
Project Structure
ai-agent-prod-deployment/
├── README.md                   # You are here
├── .gitignore
├── 1-mcp-server/
│   ├── README.md
│   ├── server.py               # FastMCP zoo server with 2 tools
│   ├── Dockerfile
│   └── pyproject.toml
├── 2-adk-agent/
│   ├── README.md
│   ├── zoo_guide_agent/
│   │   ├── __init__.py
│   │   └── agent.py            # Multi-agent workflow (greeter → researcher → formatter)
│   ├── requirements.txt
│   └── .env.example
└── 3-gpu-agent/
    ├── README.md
    ├── ollama-backend/
    │   └── Dockerfile          # Gemma 3 270M via Ollama
    └── adk-agent/
        ├── production_agent/
        │   ├── __init__.py
        │   └── agent.py        # Gemma-powered conversational agent
        ├── server.py           # FastAPI server with ADK integration
        ├── Dockerfile
        ├── elasticity_test.py  # Locust load test
        ├── pyproject.toml
        └── .env.example
