@apappascs
Contributor

This commit adds Text-to-Speech support for Google GenAI (Gemini) with the following components:

API Client Layer

  • GeminiTtsApi: Low-level REST client for Gemini TTS API
  • Support for single-speaker and multi-speaker (conversational) TTS
  • PCM audio format (s16le, 24kHz, mono)
  • Request/response POJOs with Jackson annotations
  • Convenience factory methods for simpler API usage
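
Because the audio is described as raw PCM (s16le, 24kHz, mono), most players will not open the bytes directly. A minimal sketch of one way a caller might wrap such bytes in a WAV container follows. The class `PcmToWav` and its methods are hypothetical helpers for illustration and are not part of this PR; only the sample format is taken from the description above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of this PR): prepends a canonical 44-byte
// RIFF/WAVE header to raw PCM audio.
final class PcmToWav {

    // Format stated in the PR description: signed 16-bit little-endian, 24 kHz, mono.
    static final int SAMPLE_RATE = 24_000;
    static final int CHANNELS = 1;
    static final int BITS_PER_SAMPLE = 16;
    static final int BLOCK_ALIGN = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per sample frame
    static final int BYTE_RATE = SAMPLE_RATE * BLOCK_ALIGN;        // bytes per second

    private PcmToWav() {
    }

    // Returns a new array containing the WAV header followed by the PCM data.
    static byte[] wrap(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + pcm.length);                 // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                              // fmt chunk size for plain PCM
        buf.putShort((short) 1);                     // audio format 1 = uncompressed PCM
        buf.putShort((short) CHANNELS);
        buf.putInt(SAMPLE_RATE);
        buf.putInt(BYTE_RATE);
        buf.putShort((short) BLOCK_ALIGN);
        buf.putShort((short) BITS_PER_SAMPLE);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(pcm.length);                      // size of the PCM payload
        buf.put(pcm);
        return buf.array();
    }

    // Playback duration of a raw PCM buffer in seconds.
    static double durationSeconds(byte[] pcm) {
        return pcm.length / (double) BYTE_RATE;
    }
}
```

At this format one second of audio occupies 48,000 bytes (24,000 samples of 2 bytes each), so the wrapped output is always the input length plus 44 bytes.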

Model Layer

  • GeminiTtsModel: Spring AI TextToSpeechModel implementation
  • GeminiTtsOptions: Builder-based configuration with runtime overrides
  • Support for 30+ voices across 24+ languages
  • Prompt-based style control (accent, pace, delivery)
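
The "builder-based configuration with runtime overrides" pattern can be pictured roughly as below. This is a sketch under assumptions: `SketchTtsOptions`, its two fields, and the `merge` method are hypothetical stand-ins and do not reproduce the actual `GeminiTtsOptions` API, which may expose different properties and merge rules.

```java
// Hypothetical stand-in for an options class (not the real GeminiTtsOptions).
final class SketchTtsOptions {

    private final String model;
    private final String voice;

    private SketchTtsOptions(String model, String voice) {
        this.model = model;
        this.voice = voice;
    }

    String model() {
        return model;
    }

    String voice() {
        return voice;
    }

    static Builder builder() {
        return new Builder();
    }

    // Runtime values win when set; unset (null) fields fall back to the defaults.
    static SketchTtsOptions merge(SketchTtsOptions defaults, SketchTtsOptions runtime) {
        if (runtime == null) {
            return defaults;
        }
        return new SketchTtsOptions(
                runtime.model != null ? runtime.model : defaults.model,
                runtime.voice != null ? runtime.voice : defaults.voice);
    }

    static final class Builder {

        private String model;
        private String voice;

        Builder model(String model) {
            this.model = model;
            return this;
        }

        Builder voice(String voice) {
            this.voice = voice;
            return this;
        }

        SketchTtsOptions build() {
            return new SketchTtsOptions(model, voice);
        }
    }
}
```

In this sketch, defaults would typically come from configuration properties, while a per-request options object overrides only the fields it sets, for example changing the voice while keeping the configured model.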

Spring Boot Integration

  • Auto-configuration with properties binding
  • Dedicated starter: spring-ai-starter-model-google-genai-tts
  • Configuration prefix: spring.ai.google.genai.tts
  • Conditional bean creation based on spring.ai.model.audio.speech property

Signed-off-by: Alexandros Pappas <apappascs@gmail.com>
@apappascs force-pushed the feature/google-genai-tts branch from f5abaa9 to b716b59 on December 21, 2025 10:40
@apappascs
Contributor Author

cc: @ddobrin
