14 changes: 9 additions & 5 deletions .github/workflows/cicd.yml
@@ -35,16 +35,20 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
ruby-version: ['3.1', '3.2', '3.3', '3.4', 'jruby-10.0.1.0']
rails-version: ['rails-7.1', 'rails-7.2', 'rails-8.0']
ruby-version: ['3.1', '3.2', '3.3', '3.4', 'jruby-10.0.2.0']
rails-version: ['rails-7.1', 'rails-7.2', 'rails-8.0', 'rails-8.1']
exclude:
# Rails 8 requires Ruby 3.2+
- ruby-version: '3.1'
rails-version: 'rails-8.0'
- ruby-version: '3.1'
rails-version: 'rails-8.1'
# JRuby only supports up to 7.1 right now
- ruby-version: 'jruby-10.0.1.0'
- ruby-version: 'jruby-10.0.2.0'
rails-version: 'rails-8.1'
- ruby-version: 'jruby-10.0.2.0'
rails-version: 'rails-8.0'
- ruby-version: 'jruby-10.0.1.0'
- ruby-version: 'jruby-10.0.2.0'
rails-version: 'rails-7.2'

steps:
@@ -200,4 +204,4 @@ jobs:
fi
}
env:
GEM_HOST_API_KEY: "${{secrets.RUBYGEMS_AUTH_TOKEN}}"
GEM_HOST_API_KEY: "${{secrets.RUBYGEMS_AUTH_TOKEN}}"
1 change: 1 addition & 0 deletions .gitignore
@@ -48,6 +48,7 @@ build-iPhoneSimulator/
# for a library or gem, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
Gemfile.lock
gemfiles/*.lock
# .ruby-version
# .ruby-gemset

6 changes: 6 additions & 0 deletions Appraisals
@@ -17,3 +17,9 @@ appraise 'rails-8.0' do
gem 'rails', '~> 8.0.0'
end
end

appraise 'rails-8.1' do
group :development do
gem 'rails', '~> 8.1.0'
end
end
5 changes: 5 additions & 0 deletions README.md
@@ -74,6 +74,11 @@ RubyLLM.embed "Ruby is elegant and expressive"
RubyLLM.transcribe "meeting.wav"
```

```ruby
# Text to speech
RubyLLM.tts "Hello, welcome to RubyLLM!"
```

```ruby
# Moderate content for safety
RubyLLM.moderate "Check if this text is safe"
96 changes: 96 additions & 0 deletions docs/_core_features/text-to-speech.md
@@ -0,0 +1,96 @@
---
layout: default
title: Text to Speech
nav_order: 7
description: Convert text to speech
redirect_from:
- /guides/audio-transcription
- /guides/transcription
---

# {{ page.title }}
{: .d-inline-block .no_toc }

v1.9.0+
{: .label .label-green }

{{ page.description }}
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* How to generate speech from text.
* How to save audio files.
* How to select different voices.
* How to access raw audio data.
* How language support works.

## Basic Text to Speech

Generate audio with the global `RubyLLM.tts` method:

```ruby
audio = RubyLLM.tts("Hello, welcome to RubyLLM!")
```

## Save Audio File
You can save the generated audio to a file.
With OpenAI, the audio is saved as an MP3 file.

```ruby
audio = RubyLLM.tts("This is a text to speech example.", provider: :openai, model: "gpt-4o-mini-tts")
audio.save("example.mp3")
```

With Gemini, the audio is saved as a raw PCM file.

```ruby
audio = RubyLLM.tts("This is a text to speech example.", provider: :gemini, model: "gemini-2.5-flash-preview-tts")
audio.save("example.pcm")
```

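You can convert the PCM file to MP3 using ffmpeg. The input flags describe the raw data as signed 16-bit little-endian samples (`-f s16le`) at 24 kHz (`-ar 24000`) with one channel (`-ac 1`):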

```bash
ffmpeg -f s16le -ar 24000 -ac 1 -i example.pcm example.mp3
```
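
If ffmpeg is not available, raw PCM can also be made playable by prepending a WAV header. The following is a minimal sketch, not part of RubyLLM: `pcm_to_wav` is a hypothetical helper, and its defaults assume the same format as the ffmpeg flags above (16-bit little-endian, 24 kHz, mono).

```ruby
# Hypothetical helper (not a RubyLLM API): wraps headerless PCM bytes in a
# minimal 44-byte WAV header so common players can open the result.
# Defaults assume 16-bit little-endian mono samples at 24 kHz.
def pcm_to_wav(pcm, sample_rate: 24_000, channels: 1, bits_per_sample: 16)
  pcm = pcm.b                                    # treat the input as binary
  block_align = channels * bits_per_sample / 8   # bytes per sample frame
  byte_rate = sample_rate * block_align          # bytes per second

  header = [
    "RIFF", 36 + pcm.bytesize, "WAVE",           # RIFF chunk
    "fmt ", 16, 1, channels, sample_rate,        # fmt chunk, format 1 = PCM
    byte_rate, block_align, bits_per_sample,
    "data", pcm.bytesize                         # data chunk
  ].pack("a4Va4a4VvvVVvva4V")

  header + pcm
end

# Usage sketch, assuming example.pcm was saved as shown above:
# File.binwrite("example.wav", pcm_to_wav(File.binread("example.pcm")))
```

WAV is uncompressed, so the resulting file is larger than the MP3 that ffmpeg would produce.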

### Select Voice
You can specify different voices. Supported voices for OpenAI
are `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, and `verse`.

For Gemini, see the [Gemini voices](https://ai.google.dev/gemini-api/docs/speech-generation#voices) list.

```ruby
# Using a specific voice
voice = "ash"
audio = RubyLLM.tts("Hello, this is #{voice}'s voice.", voice: voice)
```
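
To catch a mistyped voice name before making an API call, you could check it against the OpenAI list above. This is only a sketch: the `OPENAI_VOICES` constant and the `validate_openai_voice!` helper are hypothetical and not provided by RubyLLM, and the list will go stale if OpenAI adds or removes voices.

```ruby
# Voice names copied from the OpenAI list in this guide.
# Both the constant and the helper are hypothetical, not RubyLLM APIs.
OPENAI_VOICES = %w[alloy ash ballad coral echo fable onyx nova sage shimmer verse].freeze

# Returns the voice name as a String, or raises ArgumentError if it is unknown.
def validate_openai_voice!(voice)
  name = voice.to_s
  return name if OPENAI_VOICES.include?(name)

  raise ArgumentError,
        "Unknown OpenAI voice #{name.inspect}; expected one of: #{OPENAI_VOICES.join(', ')}"
end

# Usage sketch:
# audio = RubyLLM.tts("Hello!", voice: validate_openai_voice!(:ash))
```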

### Access Audio Data
You can access the raw audio data:

```ruby
audio = RubyLLM.tts("Accessing raw audio data.")
audio.data # => binary audio data (MP3 for OpenAI, PCM for Gemini)
```
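
Because raw PCM has a fixed number of bytes per second, its playback length can be estimated directly from the byte count. The sketch below uses a hypothetical helper, `pcm_duration_seconds`, and assumes the 24 kHz, 16-bit, mono format implied by the ffmpeg flags earlier in this guide. It does not apply to MP3 data, which is compressed.

```ruby
# Hypothetical helper (not a RubyLLM API): playback length in seconds of
# headerless PCM audio. Assumes 24 kHz, 16-bit (2 bytes per sample), mono.
def pcm_duration_seconds(data, sample_rate: 24_000, channels: 1, bytes_per_sample: 2)
  data.bytesize.to_f / (sample_rate * channels * bytes_per_sample)
end

# Usage sketch, assuming `audio` came from a Gemini TTS call:
# puts pcm_duration_seconds(audio.data)
```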

### Language Support
OpenAI and Gemini detect the language automatically from the text provided.
Previously, Gemini let you specify the language manually.

## Next Steps

* [Chatting with AI Models]({% link _core_features/chat.md %}): Learn about conversational AI.
* [Image Generation]({% link _core_features/image-generation.md %}): Generate images from text.
* [Error Handling]({% link _advanced/error-handling.md %}): Master handling API errors.

5 changes: 5 additions & 0 deletions docs/index.md
@@ -138,6 +138,11 @@ RubyLLM.embed "Ruby is elegant and expressive"
RubyLLM.transcribe "meeting.wav"
```

```ruby
# Text to speech
RubyLLM.tts "Hello, welcome to RubyLLM!"
```

```ruby
# Moderate content for safety
RubyLLM.moderate "Check if this text is safe"