TJ Durnford
Connecting LLMs to Twilio: A Step-by-Step Guide

Integrating OpenAI's LLM with Twilio Using Vercel AI SDK

In this guide, we'll walk through the process of integrating OpenAI's language models with Twilio's Conversation Relay using the Vercel AI SDK. This integration allows you to create a virtual voice assistant that can handle user queries and provide information via a phone call. We'll cover setting up the project, configuring Redis, and running the project. Additionally, we'll explain how the bufferTransform function helps in sending larger chunks of data to Twilio, avoiding the inefficiency of sending one token at a time.

Prerequisites

  • Node.js and npm installed on your machine.
  • A Twilio account.
  • An OpenAI API key.
  • A Redis instance for managing conversation state.

Step 1: Setting Up the Project

First, create a new directory for your project and initialize it with npm:

```
mkdir twilio-openai-integration
cd twilio-openai-integration
npm init -y
```

Install the necessary dependencies:

```
npm install ai express express-ws redis twilio @ai-sdk/openai uuid ws dotenv
npm install --save-dev typescript @types/node @types/ws @types/express-ws @types/express
```

Step 2: Project Structure

Create the following file structure:

```
twilio-openai-integration/
│
├── managers/
│   └── ConversationManager.ts
│
├── types/
│   └── twilio.ts
│
├── utils/
│   └── bufferTransform.ts
│
├── .env
└── index.ts
```

Step 3: Environment Configuration

Create a .env file in the root of your project and add your environment variables:

```
OPENAI_API_KEY=your-openai-api-key
PORT=5000
REDIS_URL=redis://localhost:6379
SERVER_DOMAIN=your-server-domain.example.com
```

Note that `SERVER_DOMAIN` should be your server's publicly reachable hostname without a protocol prefix, since the code interpolates it directly into `wss://` and `https://` URLs. For local development, a tunnel domain (for example, from ngrok) works well, because Twilio must be able to reach your server over the public internet.

Step 4: Implementing the Server

In index.ts, implement the server logic:

```typescript
import express from "express";
import ExpressWs from "express-ws";
import VoiceResponse from "twilio/lib/twiml/VoiceResponse";
import { CoreMessage, streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { v4 as uuid } from "uuid";
import { type WebSocket } from "ws";
import "dotenv/config";

import { ConversationManager } from "./managers/ConversationManager";
import { EventMessage } from "./types/twilio";
import { bufferTransform } from "./utils/bufferTransform";

const app = ExpressWs(express()).app;
const PORT = parseInt(process.env.PORT || "5000");

const welcomeGreeting = "Hi there! How can I help you today?";
const systemInstructions =
  "You are a virtual voice assistant. You can help the user with their questions and provide information.";

app.use(express.urlencoded({ extended: false }));

app.post("/call/incoming", async (_, res) => {
  const response = new VoiceResponse();
  response.connect().conversationRelay({
    url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    welcomeGreeting,
  });
  res.writeHead(200, { "Content-Type": "text/xml" });
  res.end(response.toString());
});

app.ws("/call/connection", (ws: WebSocket) => {
  const sessionId = uuid();

  ws.on("message", async (data: string) => {
    const event: EventMessage = JSON.parse(data);
    const conversation = new ConversationManager(sessionId);

    if (event.type === "setup") {
      // Add welcome message to conversation transcript
      const welcomeMessage: CoreMessage = {
        role: "assistant",
        content: welcomeGreeting,
      };
      await conversation.addMessage(welcomeMessage);
    } else if (event.type === "prompt") {
      // Add user message to conversation and retrieve all messages
      const message: CoreMessage = { role: "user", content: event.voicePrompt };
      await conversation.addMessage(message);
      const messages = await conversation.getMessages();

      const controller = new AbortController();

      // Stream text from OpenAI model
      const { textStream, text: completeText } = await streamText({
        abortSignal: controller.signal,
        experimental_transform: bufferTransform,
        model: openai("gpt-4o-mini"),
        messages,
        maxSteps: 10,
        system: systemInstructions,
      });

      // Iterate over text stream and send messages to Twilio
      for await (const text of textStream) {
        if (controller.signal.aborted) {
          break;
        }
        ws.send(
          JSON.stringify({
            type: "text",
            token: text,
            last: false,
          })
        );
      }

      // Send last message to Twilio
      if (!controller.signal.aborted) {
        ws.send(
          JSON.stringify({
            type: "text",
            token: "",
            last: true,
          })
        );
      }

      // Add complete text to conversation transcript
      const agentMessage: CoreMessage = {
        role: "assistant",
        content: await completeText,
      };
      void conversation.addMessage(agentMessage);
    } else if (event.type === "end") {
      // Clear conversation transcript when call ends
      void conversation.clearMessages();
    }
  });

  ws.on("error", console.error);
});

app.listen(PORT, () => {
  console.log(`Local: http://localhost:${PORT}`);
  console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
});
```

Explanation

  • Express and WebSocket Setup: We use express-ws to handle WebSocket connections, which are essential for real-time communication with Twilio's Conversation Relay.
  • Twilio VoiceResponse: This sets up a Twilio call and connects it to our WebSocket endpoint.
  • WebSocket Handling: We handle different types of events (setup, prompt, end) to manage the conversation state and interact with the OpenAI model.
  • OpenAI Integration: We use the Vercel AI SDK to stream text from OpenAI's model, transforming it with bufferTransform to send larger chunks.
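The server imports `EventMessage` from `types/twilio.ts`, which isn't shown in the article. A minimal sketch of the Conversation Relay message shapes the handler relies on is below; only `type` and `voicePrompt` are actually used by the server code, and the remaining field names are illustrative assumptions, so check Twilio's Conversation Relay documentation for the full message schema:

```typescript
// types/twilio.ts — minimal shapes for the Conversation Relay WebSocket events
// handled by the server. Fields other than `type` and `voicePrompt` are
// assumptions, included only to make the discriminated union concrete.
interface SetupMessage {
  type: "setup";
  sessionId: string;
  callSid: string;
}

interface PromptMessage {
  type: "prompt";
  voicePrompt: string; // the caller's transcribed speech
  lang?: string;
}

interface EndMessage {
  type: "end";
}

export type EventMessage = SetupMessage | PromptMessage | EndMessage;
```

Because `EventMessage` is a discriminated union on `type`, TypeScript narrows the event inside each `if` branch, so `event.voicePrompt` is only accessible in the `"prompt"` branch.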

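The server also imports `ConversationManager` from `managers/ConversationManager.ts`, which the article doesn't show. Below is a minimal sketch of a Redis-backed transcript store with the same method names the server calls (`addMessage`, `getMessages`, `clearMessages`). For clarity it takes the Redis client as a constructor argument, whereas the article's version presumably creates its own client internally from `REDIS_URL` (the server constructs it with only a session ID); treat the design as an assumption, not the author's implementation:

```typescript
// managers/ConversationManager.ts — a sketch of a Redis-backed transcript
// store. Each session's messages live in a Redis list under a per-session key.

// Minimal message shape; the real code uses CoreMessage from the "ai" package.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// The subset of the node-redis v4 client the manager needs. In production,
// pass a connected client from createClient({ url: process.env.REDIS_URL }).
interface RedisLike {
  rPush(key: string, value: string): Promise<number>;
  lRange(key: string, start: number, stop: number): Promise<string[]>;
  del(key: string): Promise<number>;
}

export class ConversationManager {
  constructor(
    private readonly sessionId: string,
    private readonly redis: RedisLike
  ) {}

  private get key(): string {
    return `conversation:${this.sessionId}`;
  }

  // Append one message to the end of the session's transcript list.
  async addMessage(message: Message): Promise<void> {
    await this.redis.rPush(this.key, JSON.stringify(message));
  }

  // Read the full transcript back in insertion order.
  async getMessages(): Promise<Message[]> {
    const raw = await this.redis.lRange(this.key, 0, -1);
    return raw.map((item) => JSON.parse(item) as Message);
  }

  // Drop the transcript when the call ends.
  async clearMessages(): Promise<void> {
    await this.redis.del(this.key);
  }
}
```

Storing each message as a JSON-serialized list entry keeps appends O(1) and lets multiple server processes share conversation state through Redis, which is why the article lists a Redis instance as a prerequisite.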
Step 5: Implementing bufferTransform

In utils/bufferTransform.ts, implement the buffer transformation logic:

```typescript
import { StreamTextTransform, TextStreamPart } from "ai";

export const bufferTransform: StreamTextTransform<any> = () => {
  let buffer = "";
  let threshold = 200;

  return new TransformStream<TextStreamPart<any>, TextStreamPart<any>>({
    transform(chunk, controller) {
      if (chunk.type === "text-delta") {
        buffer += chunk.textDelta;
        if (buffer.length >= threshold) {
          controller.enqueue({ ...chunk, textDelta: buffer });
          buffer = "";
          if (threshold < 5000) {
            threshold += 200;
          }
        }
      } else {
        controller.enqueue(chunk);
      }
    },
    flush(controller) {
      if (buffer.length > 0) {
        controller.enqueue({ type: "text-delta", textDelta: buffer });
      }
    },
  });
};
```

Explanation

  • Buffering: The bufferTransform function accumulates text tokens into a buffer. Once the buffer reaches a certain size (threshold), it sends the accumulated text as a single chunk.
  • Dynamic Threshold: The threshold increases gradually to optimize the size of the chunks being sent, improving efficiency by reducing the number of WebSocket messages.
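To see the buffering behavior in isolation, here is a standalone sketch of the same idea operating on plain strings instead of the AI SDK's `TextStreamPart` objects (the growth step and the `makeBuffer` helper are illustrative; the real transform starts at 200 characters and grows toward a 5000-character cap):

```typescript
// Standalone sketch of dynamic-threshold buffering: small tokens accumulate
// until the buffer crosses the threshold, then flush as one chunk, and the
// threshold grows so later chunks are larger.
function makeBuffer(initialThreshold = 5) {
  let buffer = "";
  let threshold = initialThreshold;
  const out: string[] = [];
  return {
    push(token: string) {
      buffer += token;
      if (buffer.length >= threshold) {
        out.push(buffer); // emit one consolidated chunk
        buffer = "";
        threshold += 5; // grow the threshold (the real transform caps at 5000)
      }
    },
    flush() {
      if (buffer.length > 0) out.push(buffer); // emit any trailing partial chunk
      return out;
    },
  };
}

const b = makeBuffer();
for (const token of ["Hel", "lo ", "wor", "ld,", " ho", "w a", "re ", "you", "?"]) {
  b.push(token);
}
console.log(b.flush()); // → [ "Hello ", "world, how a", "re you?" ]
```

Nine tiny tokens become three WebSocket messages, and each successive chunk is allowed to be larger than the last, which is exactly the trade-off the transform makes: low latency for the first audible chunk, fewer messages once speech is already flowing.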

Step 6: Running the Project

Ensure your Redis instance is running and accessible. Then, start your server:

```
npm run build
node dist/index.js
```
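The build step assumes TypeScript compiles the project into `dist/`; the article doesn't show a compiler config, so here is a minimal `tsconfig.json` sketch matching that layout:

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "outDir": "dist",
    "esModuleInterop": true,
    "strict": true
  },
  "include": ["**/*.ts"]
}
```

This also assumes a `"build": "tsc"` entry under `scripts` in your `package.json`, so that `npm run build` invokes the compiler.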

Your server should now be running, ready to handle incoming calls and relay conversations through Twilio.

Conclusion

By following these steps, you've set up a system that integrates OpenAI's language models with Twilio's Conversation Relay, using the Vercel AI SDK. This setup allows for efficient communication by buffering text tokens and sending them in larger chunks, enhancing the performance of your virtual voice assistant.

Full Code on GitHub

You can view the full code for this project on GitHub: GitHub Repository
