Gemini TTS MCP
High-fidelity Text-to-Speech (TTS) for Gemini CLI & MCP. Powered by Gemini 2.5 Flash/Pro with 30+ premium voices and natural language style control.
Gemini TTS MCP Server
An MCP (Model Context Protocol) server that brings high-quality Text-to-Speech (TTS) capabilities to your AI assistants using Google's Gemini 2.5 TTS models.
Features
- High-Fidelity Audio: Generate natural-sounding speech using Gemini 2.5 Flash and Pro TTS preview models.
- Natural Language Control: Guide the voice style using natural language (e.g., "Say cheerfully", "Whisper softly").
- Multiple Voices: Choose from 30+ prebuilt voices (Aoede, Zephyr, Puck, Charon, and more).
- Direct Local Saving: Optionally save the generated audio directly to a .wav file on your machine.
- Base64 Output: Returns raw audio data for flexible handling by the AI client.
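A client that receives the Base64 output may need to make it playable itself. The sketch below assumes the decoded bytes are headerless 16-bit little-endian mono PCM at 24 kHz, which is the format the Gemini TTS API is documented to return. This README does not say whether the server already wraps the data in a WAV container, so treat the format as an assumption and skip this step if the decoded bytes already start with `RIFF`. `pcmToWav` is a hypothetical helper name, not part of this server.

```typescript
// Hypothetical helper: wrap raw PCM samples in a minimal 44-byte WAV (RIFF) header.
// Assumed input format: 16-bit little-endian, mono, 24 kHz. Adjust if the server differs.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
  bitsPerSample: number = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second of audio
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  // Write an ASCII tag such as "RIFF" at the given byte offset.
  const tag = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  tag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  tag(8, "WAVE");
  tag(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  tag(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  wav.set(pcm, 44); // samples follow the header unchanged
  return wav;
}
```

In Node.js, the Base64 string can be decoded with `Buffer.from(base64, "base64")` before calling the helper, and the result written with `fs.writeFileSync`.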
Setup
Prerequisites
- Gemini API Key: Obtain an API key from Google AI Studio.
- Node.js: Ensure you have Node.js 18+ installed.
Installation as Gemini CLI Extension
You can install this directly into Gemini CLI:
gemini extensions install https://github.com/notsointresting/gemini-tts-mcp
The CLI will prompt you for your GEMINI_API_KEY during installation and store it securely in your system keychain.
Manual Installation (Development)
- Clone this repository:
  git clone https://github.com/notsointresting/gemini-tts-mcp.git
  cd gemini-tts-mcp
- Install dependencies:
  npm install
- Build the project:
  npm run build
- Install locally:
  gemini extensions install .
Configuration (Standalone MCP)
The server requires the GEMINI_API_KEY environment variable to be set.
For Claude Desktop
Add the following to your claude_desktop_config.json:
{
"mcpServers": {
"gemini-tts": {
"command": "node",
"args": ["/path/to/gemini-tts-mcp/build/index.js"],
"env": {
"GEMINI_API_KEY": "your_api_key_here"
}
}
}
}
Rate Limits
Gemini TTS models are currently in Preview, and rate limits are more restrictive than standard text models.
Gemini API (Google AI Studio)
Limits vary by usage tier. Below are typical preview limits:
| Tier | Requests Per Minute (RPM) | Tokens Per Minute (TPM) | Requests Per Day (RPD) |
|---|---|---|---|
| Free | 2 RPM | 32,000 TPM | 50 RPD |
| Tier 1 | 5 RPM | 100,000 TPM | 2,000 RPD |
| Tier 2 | 10 RPM | 200,000 TPM | 5,000 RPD |
| Tier 3 | 15 RPM | 500,000 TPM | 10,000 RPD |
Check your active limits in Google AI Studio Settings.
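With limits this low, a client that sends several requests in a row may want to pace them. The following is a minimal sketch of that arithmetic, using the RPM figures from the table above as inputs. The tier keys and the `minIntervalMs` and `waitBeforeNext` helpers are hypothetical names chosen for illustration, and your actual limits may differ from these typical values.

```typescript
// RPM values copied from the table above; they are typical preview limits, not guarantees.
const RPM_BY_TIER = { free: 2, tier1: 5, tier2: 10, tier3: 15 } as const;
type Tier = keyof typeof RPM_BY_TIER;

// Smallest gap (in ms) between requests that keeps an evenly paced client at or under the RPM limit.
function minIntervalMs(tier: Tier): number {
  return Math.ceil(60000 / RPM_BY_TIER[tier]);
}

// Hypothetical helper: how long to wait before sending, given when the previous request went out.
function waitBeforeNext(tier: Tier, lastSentAtMs: number, nowMs: number): number {
  return Math.max(0, lastSentAtMs + minIntervalMs(tier) - nowMs);
}
```

On the Free tier this works out to one request every 30 seconds. Even pacing does not address the daily (RPD) or token (TPM) limits, which would need separate counters.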
Vertex AI (Google Cloud)
Vertex AI generally provides higher base quotas:
- Requests: 60 RPM (per project).
- Input Size: Max 4,000 bytes per text field.
- Output Duration: Max 655 seconds (~10.9 minutes) of audio per request.
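Because the Input Size limit above is counted in bytes rather than characters, long or non-ASCII text may need to be split before it is sent. Below is a rough sketch of one way to do that, assuming the limit applies to the UTF-8 encoding of the text. `utf8Bytes` and `chunkByBytes` are hypothetical helpers, not functions provided by this server.

```typescript
// Count the UTF-8 byte length of a string without relying on platform APIs.
function utf8Bytes(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      n += 1;
    } else if (c < 0x800) {
      n += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        n += 4; // a surrogate pair encodes one code point in 4 bytes
        i++; // skip the low surrogate
      } else {
        n += 3; // lone surrogate, encoded as a 3-byte replacement
      }
    } else {
      n += 3;
    }
  }
  return n;
}

// Hypothetical helper: greedily pack whitespace-separated words into chunks of at most maxBytes.
// Runs of whitespace are collapsed to single spaces. A single word larger than maxBytes
// is emitted as its own oversized chunk rather than being split mid-word.
function chunkByBytes(text: string, maxBytes: number = 4000): string[] {
  const chunks: string[] = [];
  let current = "";
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (current === "" || utf8Bytes(candidate) <= maxBytes) {
      current = candidate;
    } else {
      chunks.push(current);
      current = word;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Splitting on whitespace keeps words intact, but a production version would probably prefer sentence boundaries so that each audio segment ends on a natural pause.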
Tools
1. generate_speech
Generates speech from the provided text.
Parameters:
- text (required): The text to convert to speech.
- voice (optional): The prebuilt voice name (default: Aoede).
- model (optional): The Gemini model to use (gemini-2.5-flash-preview-tts or gemini-2.5-pro-preview-tts).
- outputPath (optional): Local path to save the .wav file (e.g., ./output.wav).
Example Usage: "Generate speech for 'Hello world' using the voice Zephyr and save it to hello.wav"
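For clients that call the tool programmatically rather than through a natural-language prompt, the example above would roughly correspond to an MCP `tools/call` request. The sketch below builds that JSON-RPC 2.0 message from the parameters listed above. The interface names and `buildGenerateSpeechRequest` are hypothetical, and the shape of the server's response is not documented here, so it is left out.

```typescript
// Argument names mirror the Parameters list above; only `text` is required.
interface GenerateSpeechArgs {
  text: string;
  voice?: string;
  model?: "gemini-2.5-flash-preview-tts" | "gemini-2.5-pro-preview-tts";
  outputPath?: string;
}

// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "generate_speech"; arguments: GenerateSpeechArgs };
}

// Hypothetical helper: build the request an MCP client would send to this server.
function buildGenerateSpeechRequest(id: number, args: GenerateSpeechArgs): ToolCallRequest {
  if (args.text.trim() === "") {
    throw new Error("text must not be empty");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "generate_speech", arguments: args },
  };
}

// Equivalent of: "Generate speech for 'Hello world' using the voice Zephyr and save it to hello.wav"
const request = buildGenerateSpeechRequest(1, {
  text: "Hello world",
  voice: "Zephyr",
  outputPath: "hello.wav",
});
```

Serializing `request` with `JSON.stringify` gives the message an MCP client would write to the server's transport. In practice an MCP client library handles this framing for you.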
2. list_voices
Lists all available prebuilt voices supported by the server.
Supported Voices
Aoede, Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat.
License
MIT
