
Surya

How I Built a Natural Language to SQL Translator Using Ruby and GPT-4

Introduction

Building dynamic reports often requires writing SQL queries, which can be inefficient and difficult to scale in a production environment. To streamline this process, I built a solution using Large Language Models (LLMs) that generates reports from natural language queries.

I implemented this in Ruby on Rails 💎, my preferred language, but the approach can be applied in any language, such as Python 🐍, Java ☕, or JavaScript 📜.


Overall Workflow

Here's the overall workflow:

User Query ➡️ Table Identification ➡️ SQL Generation ➡️ SQL Execution ➡️ Result Delivery

We separate the process into two clear steps:

  1. Table Identification
  2. SQL Generation and Execution

The Problem

Users need to retrieve data insights from databases without writing SQL queries manually. Such a system should:

  1. Understand user intent 🧠 – Identify the relevant database table.
  2. Generate the appropriate SQL query 📝 – Convert the natural language query into executable SQL.
  3. Fetch and return results in a structured format 📈 – Output the data in an easy-to-consume format.

This approach ensures a seamless experience for non-technical users while leveraging the power of LLMs to dynamically generate reports.


The Solution

I built a modular system that consists of three primary components:

1. Llm::Chat - The OpenAI Chat API Wrapper

This class handles communication with OpenAI's Chat API.

```ruby
require 'net/http'
require 'json'

module Llm
  class Chat
    OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"
    API_KEY = ENV['OPENAI_API_KEY']

    def initialize(payload:)
      @payload = payload
    end

    def call
      response = request_openai
      parse_response(response)
    end

    private

    def request_openai
      uri = URI(OPENAI_API_URL)
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true

      request = Net::HTTP::Post.new(uri.path, headers)
      request.body = @payload.to_json
      http.request(request)
    end

    def headers
      {
        "Content-Type" => "application/json",
        "Authorization" => "Bearer #{API_KEY}"
      }
    end

    # Extracts choices[0].message.content; returns nil on malformed JSON
    # or an unexpected response shape instead of raising.
    def parse_response(response)
      JSON.parse(response.body).dig("choices", 0, "message", "content")&.strip
    rescue JSON::ParserError
      nil
    end
  end
end
```
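The parsing assumes the standard Chat Completions response shape. A quick way to sanity-check that extraction offline is to feed it a stubbed response body; the JSON below is a hand-written sample, not a real API reply:

```ruby
require 'json'

# Hand-written sample in the Chat Completions response shape.
body = {
  "choices" => [
    { "message" => { "role" => "assistant", "content" => "  users  " } }
  ]
}.to_json

# Same extraction the wrapper performs: choices[0].message.content, stripped.
content = JSON.parse(body).dig("choices", 0, "message", "content")&.strip
puts content # => users
```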

Prompts Used

We use predefined prompts to maintain consistency across interactions with the LLM.

```ruby
module Prompts
  TABLE_IDENTIFICATION = "Given a user query, determine the most relevant table or tables from [users, departments, tickets]. If the query involves multiple tables (e.g., grouping users by department), return a comma-separated list of table names. Only return the table name(s) with no extra text."

  SQL_GENERATION = "Generate a MySQL query based on the table structure: %{table_structure}. Support queries involving multiple tables where applicable (e.g., grouping users by department). Only return the SQL query as plain text with no formatting, explanations, or markdown."
end
```
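The `%{table_structure}` placeholder in SQL_GENERATION is filled in with Ruby's format operator before the prompt is sent. A minimal illustration (the table DDL here is a shortened stand-in):

```ruby
# String#% with a keyword hash substitutes each %{...} placeholder.
template = "Generate a MySQL query based on the table structure: %{table_structure}."
prompt = template % { table_structure: "CREATE TABLE users (id INT PRIMARY KEY)" }

puts prompt
# => Generate a MySQL query based on the table structure: CREATE TABLE users (id INT PRIMARY KEY).
```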

2. Identifying the Relevant Table

The first step is to determine the database table that best matches the user's query. This is handled by TableIdentifier.

```ruby
class TableIdentifier
  def initialize(query:)
    @query = query
  end

  def call
    chat
  end

  private

  def chat
    Llm::Chat.new(payload: chat_payload).call
  end

  def chat_payload
    {
      model: "gpt-4",
      messages: [
        { role: "system", content: Prompts::TABLE_IDENTIFICATION },
        { role: "user", content: @query }
      ],
      max_tokens: 100
    }
  end
end
```
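Because the prompt may return a comma-separated list (e.g. "users, departments"), it can help to normalize the response into an array before using it. A hypothetical helper, not part of the original code:

```ruby
# Hypothetical helper: normalizes the LLM's table response into an array.
# "users, departments" -> ["users", "departments"]
def parse_table_names(response)
  response.to_s.split(",").map(&:strip).reject(&:empty?)
end

parse_table_names("users, departments") # => ["users", "departments"]
parse_table_names("tickets")            # => ["tickets"]
```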

Example Table Structures

For reference, here are example table structures:

Users Table 👥

```sql
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255),
  status VARCHAR(50),
  department_id INT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
```

Departments Table 🏢

```sql
CREATE TABLE departments (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  manager_id INT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
```

Tickets Table 🎫

```sql
CREATE TABLE tickets (
  id INT PRIMARY KEY,
  user_id INT,
  subject VARCHAR(255),
  status VARCHAR(50),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
```

3. Generating and Executing the SQL Query

Once the table is identified, the next step is generating a valid SQL query. This is done using ReportGenerator.

```ruby
require "#{Rails.root}/lib/llm"

class ReportGenerator
  def initialize(query:)
    @query = query
  end

  def call
    report
  end

  private

  def report
    [
      { type: "text", data: "Here is your report" },
      { type: "table", data: ActiveRecord::Base.connection.select_all(query).to_a }
    ]
  end

  # The identifier may return several comma-separated tables, so fetch
  # the structure of each one and join them for the prompt.
  def table_structure
    table_names.map do |name|
      ActiveRecord::Base.connection.execute("SHOW CREATE TABLE #{name}").first[1]
    end.join("\n\n")
  end

  def table_names
    TableIdentifier.new(query: @query).call.split(",").map(&:strip)
  end

  def query
    Llm::Chat.new(payload: query_payload).call
  end

  def query_payload
    {
      model: "gpt-4",
      messages: [
        { role: "system", content: Prompts::SQL_GENERATION % { table_structure: table_structure } },
        { role: "user", content: @query }
      ],
      max_tokens: 1000
    }
  end
end
```

Example Usage 🛠️

With this setup, generating a report is as simple as making a method call:

```ruby
ReportGenerator.new(query: "count of inactive users").call
ReportGenerator.new(query: "list of active users").call
ReportGenerator.new(query: "number of users per department").call
```
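Each call returns the array built in ReportGenerator's report method: a text entry followed by a table entry whose data is an array of row hashes, as select_all returns them. A small sketch of rendering such a result, using a hand-written sample report rather than a live database:

```ruby
# Sample report in the shape the generator returns; the row data here is
# hand-written, not fetched from a database.
report = [
  { type: "text",  data: "Here is your report" },
  { type: "table", data: [{ "COUNT(*)" => 42 }] }
]

report.each do |part|
  case part[:type]
  when "text"
    puts part[:data]
  when "table"
    part[:data].each { |row| puts row.map { |k, v| "#{k}: #{v}" }.join(", ") }
  end
end
```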

For a query like "count of inactive users", the system will generate:

```sql
SELECT COUNT(*) FROM users WHERE status = 'inactive';
```

For a query like "number of users per department", the system will generate:

```sql
SELECT d.name, COUNT(u.id)
FROM users u
JOIN departments d ON u.department_id = d.id
GROUP BY d.name;
```

Disclaimer ⚠️

The prompts may require tweaking through trial and error to achieve optimal results, and you will likely need to adjust them for your database schema and specific reporting requirements.


Benefits of This Approach 🚀

  1. No manual SQL writing ✅ – Users can generate reports using natural language.
  2. Adaptability 🔄 – The prompts can be extended to support new tables or more complex queries.
  3. Security 🔒 – Restricting the prompt to a known list of tables limits what the model can query, though generated SQL should still be validated before execution.
  4. Scalability 📈 – Works across multiple datasets without custom development for each request.
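Since the generated SQL is executed directly against the database, a simple guard helps enforce the security point above. A hypothetical read-only check, not part of the original code, that rejects anything other than a single SELECT statement:

```ruby
# Hypothetical guard: allow only single-statement SELECT queries through.
FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|grant)\b/i

def safe_select?(sql)
  statement = sql.to_s.strip
  statement.match?(/\Aselect\b/i) &&
    !statement.match?(FORBIDDEN) &&
    !statement.sub(/;\s*\z/, "").include?(";") # reject stacked statements
end

safe_select?("SELECT COUNT(*) FROM users")  # => true
safe_select?("DROP TABLE users")            # => false
safe_select?("SELECT 1; DELETE FROM users") # => false
```

A keyword denylist like this is a coarse filter; running reports through a read-only database user is a stronger complement.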

Conclusion 🎯

By leveraging LLMs, we can automate the translation of user intent into SQL queries, making data retrieval seamless and efficient. This approach removes the need for manual query writing while staying adaptable as schemas and reporting needs evolve.

Would you implement a similar solution in your application? Let me know your thoughts!
