Posted on Feb 7

Creating a Text-to-Speech AI Agent in JavaScript using OpenAI API

#webdev #javascript #programming #beginners

Introduction

Have you ever wanted to convert text into speech using AI? OpenAI’s Text-to-Speech (TTS) API lets developers generate high-quality speech from text. In this post, we will build a simple AI-powered TTS agent in JavaScript using OpenAI’s API. By the end, you'll have a working program that converts text into speech, saves it as an audio file, and plays it back.

Prerequisites

Before we begin, ensure you have the following:

Node.js installed (download it from nodejs.org)
An OpenAI API key (create one in your OpenAI account dashboard)
Basic knowledge of JavaScript

Step 1: Install Dependencies

We will use axios to interact with OpenAI’s API and play-sound to play the generated audio.

npm install axios play-sound

Step 2: Writing the TTS Function

We will create a function that:

Sends a request to OpenAI’s TTS API
Saves the generated audio
Plays the audio file

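const axios = require('axios');
const player = require('play-sound')(); // shells out to a system player such as afplay or mpg123
const fs = require('fs');

// Prefer an environment variable; the placeholder is only a fallback.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY || 'your-api-key';

async function textToSpeech(text) {
  try {
    // Request the audio as raw bytes rather than parsed JSON.
    const response = await axios.post(
      'https://api.openai.com/v1/audio/speech',
      { model: 'tts-1', input: text, voice: 'alloy' },
      {
        headers: {
          'Authorization': `Bearer ${OPENAI_API_KEY}`,
          'Content-Type': 'application/json'
        },
        responseType: 'arraybuffer'
      }
    );

    // Save the audio bytes, then play the file.
    const filePath = 'output.mp3';
    fs.writeFileSync(filePath, response.data);
    console.log('Playing audio...');
    player.play(filePath, (err) => {
      if (err) console.error('Playback error:', err);
    });
  } catch (error) {
    // Error bodies also arrive as bytes because of responseType, so decode them.
    console.error('Error:', error.response ? Buffer.from(error.response.data).toString() : error.message);
  }
}

textToSpeech("Hello, this is an AI-generated voice!");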

Step 3: Running the Script

Save the file as tts.js and run it using:

node tts.js
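If you would rather not edit the file every time you want different text, one option is to read the text from the command line. The sketch below assumes a hypothetical helper named textFromArgs (it is not part of the script above) and falls back to the original greeting when no text is given.

```javascript
// Sketch: take the text to speak from the command line, e.g.
//   node tts.js "Good morning, world"
// `textFromArgs` is a hypothetical helper, not part of the original script.
function textFromArgs(argv, fallback = 'Hello, this is an AI-generated voice!') {
  // argv[0] is the node binary and argv[1] the script path; the rest is user input.
  const text = argv.slice(2).join(' ').trim();
  return text === '' ? fallback : text;
}

// In tts.js, the last line could then become:
// textToSpeech(textFromArgs(process.argv));
```

Joining the remaining arguments with spaces means the script behaves the same whether or not the text is quoted in the shell.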


Customization

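Change the Voice: OpenAI provides multiple voices such as alloy, echo, and fable. Change the voice field in the request body to try a different one.
Integrate into a Web App: Call the API from a backend route and have your React/Next.js frontend request the audio from that route, so the API key never reaches the browser.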
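Both ideas can be combined in a small backend. The following is a minimal sketch, not a production server: the /speak route, the createServer and buildSpeechRequest helpers, and the fallback-to-alloy behavior are all assumptions made for illustration. It uses Node's built-in http module and the global fetch available in Node 18+ instead of axios, purely to keep the example dependency-free. The voice list includes the voices named above plus others listed for tts-1 at the time of writing; check the current API reference before relying on it.

```javascript
// Sketch only: a tiny Node backend that proxies TTS requests so the API key
// stays on the server. Route name and helper names are illustrative assumptions.
const http = require('http');

// Assumed voice list for tts-1; verify against the current API reference.
const VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];

// Build the JSON body for /v1/audio/speech, falling back to 'alloy'
// when the requested voice is not in the list above.
function buildSpeechRequest(text, voice = 'alloy') {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('text must be a non-empty string');
  }
  return {
    model: 'tts-1',
    input: text,
    voice: VOICES.includes(voice) ? voice : 'alloy',
  };
}

// Hypothetical route: POST /speak with a JSON body like
// { "text": "Hello", "voice": "echo" }. Responds with the audio bytes.
function createServer(apiKey) {
  return http.createServer(async (req, res) => {
    if (req.method !== 'POST' || req.url !== '/speak') {
      res.writeHead(404).end();
      return;
    }
    try {
      // Collect the request body before parsing it as JSON.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      const { text, voice } = JSON.parse(Buffer.concat(chunks).toString('utf8'));

      // Forward the request to OpenAI (global fetch requires Node 18+).
      const upstream = await fetch('https://api.openai.com/v1/audio/speech', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(buildSpeechRequest(text, voice)),
      });

      // Relay the upstream status and body (audio on success, JSON on error).
      const body = Buffer.from(await upstream.arrayBuffer());
      res.writeHead(upstream.status, {
        'Content-Type': upstream.ok ? 'audio/mpeg' : 'application/json',
      });
      res.end(body);
    } catch (err) {
      // Bad JSON, empty text, or a network failure all end up here.
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: err.message }));
    }
  });
}

// To run it: createServer(process.env.OPENAI_API_KEY).listen(3000);
```

A frontend could then POST to /speak and play the returned bytes in an audio element, without ever seeing the API key.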

Conclusion

With just a few lines of JavaScript, we have built a working AI-powered text-to-speech agent on top of OpenAI’s API. It can support accessibility features, automation, or projects built just for fun. Try it out and add realistic AI voices to your own projects!