Gemini Tts MCP

🎙️ High-fidelity Text-to-Speech (TTS) for Gemini CLI & MCP. Powered by Gemini 2.5 Flash/Pro with 30+ premium voices and natural language style control.

Gemini TTS MCP Server

An MCP (Model Context Protocol) server that brings high-quality Text-to-Speech (TTS) capabilities to your AI assistants using Google's Gemini 2.5 TTS models.

Features

High-Fidelity Audio: Generate natural-sounding speech using Gemini 2.5 Flash and Pro TTS preview models.
Natural Language Control: Guide the voice style using natural language (e.g., "Say cheerfully", "Whisper softly").
Multiple Voices: Choose from 30+ prebuilt voices (Aoede, Zephyr, Puck, Charon, and more).
Direct Local Saving: Optionally save the generated audio directly to a .wav file on your machine.
Base64 Output: Returns the generated audio as Base64-encoded data, so the AI client can decide how to handle it.
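As a rough illustration of handling the Base64 output on the client side, the sketch below decodes the payload and wraps it in a WAV header. It assumes the payload is raw 16-bit mono PCM at 24 kHz, which is how the Gemini API documentation's TTS examples treat the audio; this README does not state whether the server returns raw PCM or a finished WAV file, so check before relying on it. `pcmBase64ToWav` is a hypothetical helper, not part of this server.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper. Assumption: `base64Pcm` is raw PCM audio
// (16-bit, mono, 24 kHz by default). If the payload is already a
// complete WAV file, decode it and skip the header step.
function pcmBase64ToWav(
  base64Pcm: string,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Buffer {
  const pcm = Buffer.from(base64Pcm, "base64");
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;

  // Standard 44-byte RIFF/WAVE header for uncompressed PCM.
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the audio data

  return Buffer.concat([header, pcm]);
}
```

The returned buffer can be written to disk with `fs.writeFileSync`, or passed to any player that accepts WAV data.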

Setup

Prerequisites

Gemini API Key: Obtain an API key from Google AI Studio.
Node.js: Ensure you have Node.js 18 or later installed.

Installation as Gemini CLI Extension

You can install this directly into Gemini CLI:

```
gemini extensions install https://github.com/notsointresting/gemini-tts-mcp
```

The CLI will prompt you for your GEMINI_API_KEY during installation and store it securely in your system keychain.

Manual Installation (Development)

Clone this repository:

```
git clone https://github.com/notsointresting/gemini-tts-mcp.git
cd gemini-tts-mcp
```

Install dependencies:
```
npm install
```
Build the project:
```
npm run build
```
Install locally:
```
gemini extensions install .
```

Configuration (Standalone MCP)

The server requires the GEMINI_API_KEY environment variable to be set.
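When wrapping or launching the server from your own script, it can help to fail fast if the key is missing. The following is a minimal sketch of such a check; `requireApiKey` is a hypothetical helper and is not taken from this server's source.

```typescript
// Hypothetical helper, not part of this server's code.
// Takes the environment as a parameter so it is easy to test;
// in a real script you would pass `process.env`.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "GEMINI_API_KEY is not set; export it before starting the server.",
    );
  }
  return key;
}
```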

For Claude Desktop

Add the following to your claude_desktop_config.json:

```
{
  "mcpServers": {
    "gemini-tts": {
      "command": "node",
      "args": ["/path/to/gemini-tts-mcp/build/index.js"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
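If your `claude_desktop_config.json` already lists other servers, the entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on the parsed config object. The type names and `withGeminiTts` are hypothetical and exist only for illustration; the `gemini-tts` key, `command`, `args`, and `env` fields mirror the snippet above.

```typescript
// Hypothetical types describing only the parts of the config used here.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Returns a new config with the gemini-tts entry added or replaced.
// Other servers and unrelated settings are preserved; the input
// object is not mutated.
function withGeminiTts(
  config: ClaudeDesktopConfig,
  serverPath: string,
  apiKey: string,
): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "gemini-tts": {
        command: "node",
        args: [serverPath],
        env: { GEMINI_API_KEY: apiKey },
      },
    },
  };
}
```

Read the existing file with `JSON.parse`, pass the result through this function, and write it back with `JSON.stringify(config, null, 2)`.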

Rate Limits

Gemini TTS models are currently in Preview, and rate limits are more restrictive than standard text models.

Gemini API (Google AI Studio)

Limits vary by usage tier. Below are typical preview limits:

| Tier | Requests per minute (RPM) | Tokens per minute (TPM) | Requests per day (RPD) |
|---|---|---|---|
| Free | 2 | 32,000 | 50 |
| Tier 1 | 5 | 100,000 | 2,000 |
| Tier 2 | 10 | 200,000 | 5,000 |
| Tier 3 | 15 | 500,000 | 10,000 |

Check your active limits in Google AI Studio Settings.
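With limits this low, a client that sends requests in a burst will hit errors quickly. One simple approach is to space requests evenly: at 2 RPM, for example, that means at least 30 seconds between calls. The sketch below illustrates the idea; `minIntervalMs` and `RequestPacer` are hypothetical names, nothing here is part of this server, and it covers only the per-minute limit, not the daily or token limits.

```typescript
// Minimum spacing between requests for a requests-per-minute limit.
function minIntervalMs(rpm: number): number {
  if (!Number.isFinite(rpm) || rpm <= 0) {
    throw new RangeError("rpm must be a positive number");
  }
  return Math.ceil(60_000 / rpm);
}

// Hypothetical pacer: hands out evenly spaced send slots.
class RequestPacer {
  private readonly intervalMs: number;
  private nextAllowedAt = 0;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Reserves the next slot and returns how many milliseconds the caller
  // should wait before sending. `now` is injectable so the logic can be
  // exercised without real timers.
  reserve(now: number = Date.now()): number {
    const startAt = Math.max(now, this.nextAllowedAt);
    this.nextAllowedAt = startAt + this.intervalMs;
    return startAt - now;
  }
}
```

A caller would create `new RequestPacer(minIntervalMs(2))` for the free tier, then sleep for the value returned by `reserve()` before each request.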

Vertex AI (Google Cloud)

Vertex AI generally provides higher base quotas:

Requests: 60 RPM (per project).
Input Size: Max 4,000 bytes per text field.
Output Duration: Max 655 seconds (~10.9 minutes) of audio per request.
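The input limit is measured in bytes, not characters, so text with accented or non-Latin characters reaches it sooner than its length suggests. Below is a minimal sketch of splitting long text into chunks that each fit under a byte limit, breaking only at whitespace. `splitByBytes` is a hypothetical helper; this README does not say whether the server chunks input itself, and joining the resulting audio clips is left to the caller.

```typescript
// Hypothetical helper: splits text into chunks whose UTF-8 size does not
// exceed `maxBytes`, breaking only at whitespace so words stay intact.
// Joining the returned chunks reproduces the original text exactly.
function splitByBytes(text: string, maxBytes = 4000): string[] {
  const encoder = new TextEncoder();
  const chunks: string[] = [];
  let current = "";
  let currentBytes = 0;

  // Each piece is a word plus its trailing whitespace (or leading whitespace).
  for (const piece of text.match(/\S+\s*|\s+/g) ?? []) {
    const pieceBytes = encoder.encode(piece).length;
    if (pieceBytes > maxBytes) {
      throw new RangeError("a single word exceeds the byte limit");
    }
    if (current !== "" && currentBytes + pieceBytes > maxBytes) {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += piece;
    currentBytes += pieceBytes;
  }

  if (current !== "") chunks.push(current);
  return chunks;
}
```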

Tools

1. `generate_speech`

Generates speech from the provided text.

Parameters:

text (required): The text to convert to speech.
voice (optional): The prebuilt voice name (default: Aoede).
model (optional): The Gemini model to use (gemini-2.5-flash-preview-tts or gemini-2.5-pro-preview-tts).
outputPath (optional): Local path to save the .wav file (e.g., ./output.wav).

Example Usage: "Generate speech for 'Hello world' using the voice Zephyr and save it to hello.wav"
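Most users will invoke the tool through a natural language prompt like the one above, and the MCP client builds the request. For anyone calling the server programmatically, the sketch below shows roughly what the underlying request looks like, following the MCP `tools/call` method. The argument names come from the parameter list above; `GenerateSpeechArgs` and `buildGenerateSpeechCall` are hypothetical names used only for illustration.

```typescript
// Argument shape, taken from the parameter list in this README.
interface GenerateSpeechArgs {
  text: string;
  voice?: string;
  model?: "gemini-2.5-flash-preview-tts" | "gemini-2.5-pro-preview-tts";
  outputPath?: string;
}

// Hypothetical helper that builds a JSON-RPC 2.0 request for the
// MCP `tools/call` method. Sending it over a transport is not shown.
function buildGenerateSpeechCall(id: number, args: GenerateSpeechArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "generate_speech",
      arguments: args,
    },
  };
}

// Equivalent of the example prompt above.
const exampleCall = buildGenerateSpeechCall(1, {
  text: "Hello world",
  voice: "Zephyr",
  outputPath: "hello.wav",
});
```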

2. `list_voices`

Lists all available prebuilt voices supported by the server.

Supported Voices

Aoede, Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Sulafat.

License

MIT