gemini-image-video-mcp
Gemini Image and Video Generator
Gemini AI MCP Server
A comprehensive Model Context Protocol (MCP) server that integrates Google's Gemini AI capabilities for image generation, video creation, text processing, and image analysis. Built specifically for smithery.ai deployment with full TypeScript support.
Features
Image Generation
- Nano Banana Model (Gemini 2.5 Flash Image) - Fast, efficient image generation
- Imagen 3/4 Models - High-fidelity, realistic image creation
- Multiple Styles - Natural, artistic, photorealistic, cartoon, anime
- Flexible Aspect Ratios - 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, 21:9
- Batch Processing - Generate multiple images simultaneously
- Image Editing - Edit existing images with text prompts
Video Generation
- Veo 3.1 Model - State-of-the-art video generation
- High Quality Output - 720p and 1080p resolution options
- Flexible Duration - 4-8 second videos
- Multiple Styles - Natural, cinematic, artistic, animation
- Image-to-Video - Transform static images into dynamic videos
- Camera Movements - Static, pan, zoom, tilt, tracking
- Batch Video Generation - Create multiple videos efficiently
Text Processing
- Gemini 2.5 Flash - Advanced language model for text generation
- Customizable Parameters - Temperature, max tokens, top-p, top-k
- Multiple Use Cases - Creative writing, analysis, summaries
- Image Analysis - Computer vision capabilities with OCR
- Batch Analysis - Process multiple images simultaneously
Media Management
- Reference Image Upload - Upload and register images for future use
- Media Library - List and manage all generated content
- Download Management - Download individual or batch media files
- Organized Storage - Categorize and tag media items
- Metadata Support - Rich metadata for better organization
Installation
Prerequisites
- Node.js 18+
- npm or yarn
- Google Gemini API key
Quick Setup
1. Clone or download the project
   cd gemini-mcp-server
2. Install dependencies
   npm install
3. Set your API key
   export GEMINI_API_KEY="your-gemini-api-key-here"
4. Build the project
   npm run build
5. Start the server
   ./run.sh
Deployment to smithery.ai
Method 1: Direct Deployment
1. Upload to smithery.ai
   - Upload the project directory to smithery.ai
   - The platform will automatically detect the mcp-server.json configuration
   - Set the GEMINI_API_KEY environment variable in smithery.ai
2. Test the deployment
   - Use smithery.ai's testing interface
   - Verify all tools are available and functional
Method 2: Local Testing
1. Set environment variables
   export GEMINI_API_KEY="your-api-key"
   export LOG_LEVEL=debug
2. Run locally
   ./run.sh
Available Tools
Image Generation Tools
generate_image_nano_banana
Generate images using the Nano Banana model for fast, efficient results.
{
"prompt": "A serene mountain landscape at sunset",
"style": "natural",
"aspectRatio": "16:9",
"quality": "standard"
}
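The JSON above is only the tool's argument object. Over MCP, a client wraps it in a JSON-RPC 2.0 `tools/call` request. A minimal sketch of that wrapping follows; the `buildToolCall` helper and the request `id` are illustrative and not part of this server.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const toolCall = buildToolCall(1, "generate_image_nano_banana", {
  prompt: "A serene mountain landscape at sunset",
  style: "natural",
  aspectRatio: "16:9",
  quality: "standard",
});

// Serialized form that a client would send over its MCP transport.
const wirePayload = JSON.stringify(toolCall);
```

The same wrapper applies to every tool listed below; only the tool name and the argument object change.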
generate_image_imagen
Create high-quality images using Imagen 3/4 models.
{
"prompt": "A photorealistic portrait of a person reading a book",
"model": "imagen4",
"style": "photorealistic",
"quality": "high"
}
batch_generate_images
Generate multiple images in one request.
{
"model": "nano-banana",
"items": [
{
"prompt": "A red apple on a white background",
"style": "photorealistic"
},
{
"prompt": "A green apple on a white background",
"style": "photorealistic"
}
]
}
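When many prompts share one style, the batch payload can be assembled programmatically. The sketch below assumes the style values are the lowercase forms of the styles listed under Features; `buildBatch` is a hypothetical client-side helper, not a function exported by this server.

```typescript
// Assumed style values: lowercase forms of the styles named in the feature list.
type ImageStyle = "natural" | "artistic" | "photorealistic" | "cartoon" | "anime";

interface BatchItem {
  prompt: string;
  style: ImageStyle;
}

interface BatchImageArgs {
  model: string;
  items: BatchItem[];
}

// Hypothetical helper: turn a list of prompts into a batch_generate_images payload.
function buildBatch(model: string, prompts: string[], style: ImageStyle): BatchImageArgs {
  const items = prompts
    .map((p) => p.trim())
    .filter((p) => p.length > 0) // drop blank prompts before sending
    .map((p): BatchItem => ({ prompt: p, style: style }));
  if (items.length === 0) {
    throw new Error("batch needs at least one non-empty prompt");
  }
  return { model: model, items: items };
}

const appleBatch = buildBatch(
  "nano-banana",
  ["A red apple on a white background", "  ", "A green apple on a white background"],
  "photorealistic"
);
```

Here `appleBatch` matches the payload shown above: the blank prompt is dropped and two items remain.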
edit_image_with_prompt
Edit existing images using AI.
{
"imageUrl": "https://example.com/image.jpg",
"editType": "modify",
"prompt": "Change the background to a beach setting"
}
Video Generation Tools
generate_video_veo
Generate videos using Veo 3.1.
{
"prompt": "A cat playing in a sunny garden",
"duration": 8,
"resolution": "1080p",
"style": "natural"
}
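The feature list states that Veo videos run 4-8 seconds at 720p or 1080p. A client can check those limits before sending a request, as in the sketch below. The server's own validation rules are not shown in this README, so treat `validateVideoArgs` as an illustrative pre-check only.

```typescript
interface VideoArgs {
  prompt: string;
  duration: number; // seconds
  resolution: string;
  style?: string;
}

// Limits taken from the feature list above; the server may enforce others as well.
const ALLOWED_RESOLUTIONS = ["720p", "1080p"];
const MIN_SECONDS = 4;
const MAX_SECONDS = 8;

// Hypothetical pre-check: returns human-readable problems, empty when the args look fine.
function validateVideoArgs(args: VideoArgs): string[] {
  const problems: string[] = [];
  if (args.prompt.trim().length === 0) {
    problems.push("prompt must not be empty");
  }
  // Written as a negated range test so that NaN is rejected too.
  if (!(args.duration >= MIN_SECONDS && args.duration <= MAX_SECONDS)) {
    problems.push("duration must be between 4 and 8 seconds");
  }
  if (ALLOWED_RESOLUTIONS.indexOf(args.resolution) === -1) {
    problems.push("resolution must be 720p or 1080p");
  }
  return problems;
}

const videoProblems = validateVideoArgs({
  prompt: "A cat playing in a sunny garden",
  duration: 8,
  resolution: "1080p",
  style: "natural",
});
```

For the example arguments above, `videoProblems` is an empty array.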
image_to_video
Convert images to videos.
{
"imageUrl": "https://example.com/portrait.jpg",
"prompt": "The person turns their head slightly to the left",
"cameraMovement": "pan",
"duration": 6
}
batch_generate_videos
Generate multiple videos efficiently.
{
"items": [
{
"prompt": "Ocean waves on a beach",
"style": "cinematic"
},
{
"prompt": "Forest with falling leaves",
"style": "artistic"
}
]
}
Media Management Tools
upload_reference_image
Upload images for future use.
{
"imageUrl": "https://example.com/style-image.jpg",
"title": "Art Style Reference",
"description": "Impressionist art style for reference",
"category": "art"
}
list_generated_media
Browse your media library.
{
"limit": 10,
"offset": 0
}
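To walk a larger library, a client can issue successive calls with an increasing `offset`. The sketch below assumes that `offset` counts items rather than pages and that the total item count is known to the caller. Neither is confirmed by this README, and `pageRequests` is a hypothetical helper.

```typescript
interface ListArgs {
  limit: number;
  offset: number;
}

// Hypothetical helper: argument objects needed to page through `total` items.
function pageRequests(total: number, limit: number): ListArgs[] {
  if (limit <= 0) {
    throw new Error("limit must be positive");
  }
  const pages: ListArgs[] = [];
  for (let offset = 0; offset < total; offset += limit) {
    pages.push({ limit: limit, offset: offset });
  }
  return pages;
}

// 25 items at 10 per page need offsets 0, 10 and 20.
const mediaPages = pageRequests(25, 10);
```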
download_media
Download generated content.
{
"mediaId": "gemini_12345_example"
}
delete_media
Remove media files.
{
"mediaId": "gemini_12345_example",
"confirmDelete": true
}
Text Processing Tools
generate_text
Generate text content.
{
"prompt": "Write a creative story about AI and creativity",
"model": "gemini-2.5-flash",
"temperature": 0.8,
"maxTokens": 1000
}
analyze_image
Analyze image content.
{
"imageUrl": "https://example.com/photo.jpg",
"analysisType": "analyze",
"prompt": "Identify the main objects and describe the scene"
}
batch_analyze_images
Analyze multiple images.
{
"imageUrls": [
"https://example.com/img1.jpg",
"https://example.com/img2.jpg"
],
"analysisType": "describe",
"comparisonMode": true
}
Architecture
Project Structure
gemini-mcp-server/
├── src/
│   ├── index.ts                  # Main server implementation
│   ├── constants.ts              # Configuration and schemas
│   ├── gemini-client.ts          # Gemini API integration
│   ├── logger.ts                 # Logging utilities
│   └── tools/                    # Tool implementations
│       ├── index.ts              # Tool registry
│       ├── registry.ts           # Tool registration logic
│       ├── image-generation.ts   # Image tools
│       ├── video-generation.ts   # Video tools
│       ├── media-management.ts   # Media tools
│       └── text-processing.ts    # Text tools
├── dist/                         # Compiled JavaScript
├── package.json                  # Dependencies and scripts
├── tsconfig.json                 # TypeScript configuration
├── run.sh                        # Startup script
├── mcp-server.json               # Smithery.ai configuration
└── README.md                     # This file
Key Components
- GeminiAPIClient - Handles all communication with Google's Gemini API
- Tool Registry - Centralized tool registration and execution
- Progress Tracking - Real-time updates for long-running operations
- Error Handling - Comprehensive error management and reporting
- Media Management - Storage and organization of generated content
Environment Variables
| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Your Google Gemini API key |
| LOG_LEVEL | No | Logging level (debug, info, warn, error) |
| NODE_ENV | No | Environment (development, production) |
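The sketch below shows one way these variables might be read at startup. The defaults (`info` and `development`) and the `loadConfig` name are assumptions for illustration, since the README does not state how the server handles unset values.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface ServerConfig {
  apiKey: string;
  logLevel: LogLevel;
  nodeEnv: string;
}

const LOG_LEVELS: LogLevel[] = ["debug", "info", "warn", "error"];

// Hypothetical loader: takes the environment as a plain object so it is easy to test.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = (env["GEMINI_API_KEY"] || "").trim();
  if (apiKey.length === 0) {
    throw new Error("GEMINI_API_KEY is required");
  }
  // Assumed default: fall back to "info" when LOG_LEVEL is unset or unrecognized.
  const requested = env["LOG_LEVEL"] || "info";
  const matches = LOG_LEVELS.filter((level) => level === requested);
  const logLevel: LogLevel = matches.length > 0 ? matches[0] : "info";
  // Assumed default: treat an unset NODE_ENV as "development".
  return { apiKey: apiKey, logLevel: logLevel, nodeEnv: env["NODE_ENV"] || "development" };
}

const exampleConfig = loadConfig({ GEMINI_API_KEY: "your-api-key", LOG_LEVEL: "debug" });
```

In a Node.js process the argument would typically be `process.env`.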
Testing
Manual Testing
- Start the server: ./run.sh
- Use any MCP-compatible client to connect
- Test individual tools with various parameters
- Verify error handling and edge cases
API Health Check
The server includes a built-in health check that verifies Gemini API connectivity.
Security Considerations
- API keys are never logged or stored permanently
- All user inputs are validated using Zod schemas
- Network requests have timeout and retry mechanisms
- Error messages don't expose sensitive information
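The README does not describe the retry policy itself. As an illustration only, the sketch below computes the delay schedule for a capped exponential backoff, a common choice for retrying network requests. The function name and all numeric values are assumptions.

```typescript
// Hypothetical helper: delay in milliseconds before each retry.
// The delay doubles each time and never exceeds capMs.
function backoffDelays(maxRetries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

// Four retries starting at 500 ms, capped at 3000 ms.
const retrySchedule = backoffDelays(4, 500, 3000);
```

With these illustrative values, `retrySchedule` is `[500, 1000, 2000, 3000]`: the fourth delay would be 4000 ms but is capped.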
API Models Used
Image Generation
- gemini-2.5-flash-image - Nano Banana model for fast generation
- imagen-3.0-generate-002 - Imagen 3 for high quality
- imagen-4.0-generate-preview-06-06 - Imagen 4 for best quality
Video Generation
- veo-3.1-generate - Veo 3.1 for standard video generation
- veo-3.1-fast-generate - Veo 3.1 Fast for quicker results
Text Generation
- gemini-2.5-flash - Latest Gemini 2.5 Flash model
- gemini-2.0-flash-exp - Experimental Gemini 2.0 Flash
Contributing
This MCP server is designed to be extensible. To add new tools:
- Create a new tool file in src/tools/
- Implement the tool using the UnifiedTool interface
- Register the tool using registerTool()
- Add comprehensive documentation and examples
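The steps above can be sketched as follows. The README names `UnifiedTool` and `registerTool()` but does not show their signatures, so the field names, the synchronous `execute` method, and the `runTool` helper are assumptions. The real interface in `src/tools/` may differ, for example by being asynchronous.

```typescript
// Assumed shape of a tool; the project's actual UnifiedTool interface may differ.
interface UnifiedTool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => string;
}

// Simple in-memory registry keyed by tool name.
const toolRegistry: Record<string, UnifiedTool> = {};

function hasTool(toolName: string): boolean {
  return Object.prototype.hasOwnProperty.call(toolRegistry, toolName);
}

// Assumed behavior: reject duplicate names instead of silently overwriting.
function registerTool(tool: UnifiedTool): void {
  if (hasTool(tool.name)) {
    throw new Error("tool already registered: " + tool.name);
  }
  toolRegistry[tool.name] = tool;
}

// Hypothetical dispatcher: look up a tool by name and run it.
function runTool(toolName: string, args: Record<string, unknown>): string {
  if (!hasTool(toolName)) {
    throw new Error("unknown tool: " + toolName);
  }
  return toolRegistry[toolName].execute(args);
}

// Example tool that would live in its own file under src/tools/.
registerTool({
  name: "echo_prompt",
  description: "Returns the prompt it was given (illustration only)",
  execute: (args) => "prompt: " + String(args["prompt"]),
});

const echoResult = runTool("echo_prompt", { prompt: "hello" });
```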
License
MIT License - see LICENSE file for details.
Support
For issues, questions, or contributions:
- Check the documentation above
- Review the tool examples
- Test with different parameters
- Verify your API key has appropriate permissions
Built with ❤️ for the MCP community and smithery.ai
