
Jethro Larson


Streaming ChatGPT API responses with Python and JavaScript


It took me a while to figure out how to get a Python Flask server and a web client to stream OpenAI chat completions, so I figured I'd share what worked.

```python
from flask import Flask, stream_template, request, Response
import openai
from dotenv import load_dotenv
import os

load_dotenv()  # put these values in an .env file parallel to this file
openai.organization = os.environ.get("OPENAI_ORG")
openai.api_key = os.environ.get('OPENAI_API_KEY')


def send_messages(messages):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True
    )


app = Flask(__name__)


@app.route('/chat', methods=['GET', 'POST'])
def chat():
    if request.method == 'POST':
        messages = request.json['messages']

        def event_stream():
            for line in send_messages(messages=messages):
                print(line)
                text = line.choices[0].delta.get('content', '')
                if len(text):
                    yield text

        return Response(event_stream(), mimetype='text/event-stream')
    else:
        return stream_template('./chat.html')


if __name__ == '__main__':
    app.run()
```
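For reference, the `.env` file the server loads would look something like this (the variable names come from the code above; the values are placeholders, use your own keys):

```
OPENAI_ORG=org-your-org-id
OPENAI_API_KEY=sk-your-api-key
```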

chat.html

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Chat</title>
  </head>
  <body>
    <h1>Chat</h1>
    <form id="chat-form">
      <label for="message">Message:</label>
      <input type="text" id="message" name="message">
      <button type="submit">Send</button>
    </form>
    <div id="chat-log"></div>
    <script src="{{ url_for('static', filename='chat.js') }}"></script>
  </body>
</html>
```

You can't use EventSource for this, because EventSource only supports GET requests; since we want to POST the conversation, this example uses the Fetch API and reads the response body stream instead.

chat.js

```javascript
const form = document.querySelector("#chat-form");
const chatlog = document.querySelector("#chat-log");

form.addEventListener("submit", async (event) => {
  event.preventDefault();

  // Get the user's message from the form
  const message = form.elements.message.value;

  // Send a request to the Flask server with the user's message
  const response = await fetch("/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: [{ role: "user", content: message }] }),
  });

  // Create a TextDecoder to decode the streamed response bytes
  const decoder = new TextDecoder();

  // Set up a reader on the response body stream
  const reader = response.body.getReader();
  let chunks = "";

  // Read the response stream chunk by chunk and append to the chat log
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } buffers multi-byte characters split across chunks
    chunks += decoder.decode(value, { stream: true });
    // textContent avoids interpreting model output as HTML
    chatlog.textContent = chunks;
  }
});
```
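One subtlety when accumulating streamed text: a multi-byte UTF-8 character can arrive split across two network chunks, which is why `TextDecoder.decode` should be called with `{ stream: true }` inside a read loop. Python's standard library has an equivalent incremental decoder; this small stdlib-only sketch shows the behavior:

```python
import codecs

# Simulate a UTF-8 string arriving in arbitrary byte-sized chunks,
# splitting the 3-byte character "€" across two reads.
data = "price: €5".encode("utf-8")
chunks = [data[:8], data[8:]]  # the euro sign straddles the boundary

decoder = codecs.getincrementaldecoder("utf-8")()
text = ""
for chunk in chunks:
    text += decoder.decode(chunk)  # buffers incomplete byte sequences
text += decoder.decode(b"", final=True)

print(text)  # → price: €5
```

A naive `chunk.decode("utf-8")` per chunk would raise a `UnicodeDecodeError` on the first chunk here; the incremental decoder holds the partial bytes until the rest arrives, just like the browser's streaming decoder.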

Obviously this is not an optimal chat user experience, but it'll get you started.

Top comments (3)

mercm8 • Edited

I tried this and it worked great running on localhost, but when I tried deploying it to my makeshift webserver (rpi / nginx) it stopped streaming and waited for the response stream to finish before the message appeared. Any idea why?

edit: I needed to add an `X-Accel-Buffering: no` response header, changing the code to

```python
response = Response(event_stream(), mimetype='text/event-stream')
response.headers['X-Accel-Buffering'] = 'no'
return response
```
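Alternatively, the same fix can live in the nginx config instead of the application code; a sketch of a proxy location with buffering disabled (the path and upstream port are assumptions for a typical local Flask setup):

```
location /chat {
    proxy_pass http://127.0.0.1:5000;  # assumed address of the Flask app
    proxy_buffering off;               # stream responses instead of buffering them
}
```

The `X-Accel-Buffering: no` header achieves the same per-response, which is handy when you can't touch the nginx config.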

Brayden Moore

Exactly what I was looking for. Also dig the way you write code. E.g. one app route with an if/else rather than one for POSTs and another just to display the template. Nice.

Jethro Larson

Just going for the most concise article :)