Most LLMs and SLMs are not designed for calculations (setting aside reasoning models such as OpenAI's o1 or o3). Just imagine the following dialogue:
- Company: Today is Wednesday; you can return the delivery parcel within 24 hours.
- Client: Okay, let's do it on Tuesday.
Are you sure the next AI response will be correct? As a human, you understand that next Tuesday is six days away, while 24 hours is just one day. Most LLMs, however, cannot reliably handle such logic; their responses are non-deterministic.
This issue worsens as the context grows. If you have 30 rules and a conversation history of 30 messages, the AI loses focus and makes mistakes easily.
Common Use-Case
- You're developing an AI scheduling chatbot or AI agent for your company.
- The company has scheduling rules that are frequently updated.
- Before scheduling, the chatbot must validate customer input parameters.
- If validation fails, the chatbot must inform the customer.
What Can We Do?
Combine traditional code execution with LLMs. This idea is not new but remains underutilized:
- OpenAI integrates this feature into its Assistants API, but not into the Completions API.
- Google recently introduced code interpreter capabilities in Gemini 2.0 Flash.
Our Solution Tech Stack
- Docker (Podman)
- LangGraph.js
- Piston
Code Interpreter Sandbox
To run generated code securely, you need a sandbox. The most popular cloud code interpreters are e2b and, as mentioned above, the offerings from OpenAI and Google.
However, I was looking for an open-source, self-hosted solution for flexibility and cost-effectiveness. Two good options stood out:
- Piston
- Jupyter
I chose Piston for its ease of deployment.
Piston Installation
It took me a while to figure out how to add a Python execution environment to Piston.
0. Enable cgroup v2
For Windows WSL, this article was helpful.
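As a rough sketch for WSL2 (the linked article is the authority; details may vary by Windows build), cgroup v2 is usually forced via the kernel command line in %UserProfile%\.wslconfig:

[wsl2]
kernelCommandLine = cgroup_no_v1=all

After saving the file, run wsl --shutdown from PowerShell and reopen your distribution.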
1. Run a Container
docker run --privileged -p 2000:2000 -v d:\piston:'/piston' --name piston_api ghcr.io/engineer-man/piston
2. Checkout the Piston Repository
git clone https://github.com/engineer-man/piston
3. Add Python Support
Run the following command:
node cli/index.js ppman install python
By default, this command uses your container API running on localhost:2000 to install Python.
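To verify the installation, query Piston's runtimes endpoint; Python should now appear in the returned list:

curl http://localhost:2000/api/v2/runtimes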
Example Code Execution
Using the Piston Node.js Client:
import piston from "piston-client";

const codeInterpreter = piston({ server: "http://localhost:2000" });
const result = await codeInterpreter.execute('python', 'print("Hello World!")');
console.log(result);
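The resolved result mirrors Piston's execute response. A minimal sketch of reading it (exact fields can vary between Piston versions):

// result looks roughly like:
// { language: 'python', version: '...', run: { stdout: 'Hello World!\n', stderr: '', code: 0 } }
if (result.run && result.run.code === 0) {
  console.log(result.run.stdout); // "Hello World!"
}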
AI Agents Implementation
We're going to use some advanced techniques:
- Graph and subgraph architecture
- Parallel node execution
- Qdrant for storage
- Observability via LangSmith
- GPT-4o-mini, a cost-efficient LLM
Refer to the LangSmith trace for a detailed overview of the flow:
https://smith.langchain.com/public/b3a64491-b4e1-423d-9802-06fcf79339d2/r
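Before diving into the steps, here is a minimal sketch of how such a graph could be wired with LangGraph.js. The node names, state fields, and stub bodies are illustrative assumptions, not the exact production code:

import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Illustrative state: the raw message, the two generated Python methods,
// and the validation errors produced by the sandbox run.
const StateAnnotation = Annotation.Root({
  userInput: Annotation<string>(),
  pythonParametersExtractionMethod: Annotation<string>(),
  pythonValidationMethod: Annotation<string>(),
  validationErrors: Annotation<string[]>(),
});

// Hypothetical node stubs; each returns a partial state update.
const extractParametersNode = async (_state: typeof StateAnnotation.State) => ({
  pythonParametersExtractionMethod: "# generated by the LLM (Step 1)",
});
const generateValidationNode = async (_state: typeof StateAnnotation.State) => ({
  pythonValidationMethod: "# generated from the stored rules (Step 2)",
});
const runSandboxNode = async (_state: typeof StateAnnotation.State) => ({
  validationErrors: [], // filled from the Piston run (Step 3)
});

const app = new StateGraph(StateAnnotation)
  .addNode("extractParameters", extractParametersNode)
  .addNode("generateValidation", generateValidationNode)
  .addNode("runSandbox", runSandboxNode)
  // Two edges fanning out from START make both generation nodes
  // run in parallel; runSandbox fires once both have finished.
  .addEdge(START, "extractParameters")
  .addEdge(START, "generateValidation")
  .addEdge("extractParameters", "runSandbox")
  .addEdge("generateValidation", "runSandbox")
  .addEdge("runSandbox", END)
  .compile();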
Step 1: Extract datetime-related scheduling parameters from user input
Examples: "tomorrow", "last Friday", "in 2 hours", "at noon".
We use a code interpreter to ensure reliable extraction, since LLMs can fail at this even when the current date and time are provided as context.
Example Prompt for Python Code Generation:
Your task is to transform natural language text into Python code that extracts datetime-related scheduling parameters from user input.

## Instructions:
- You are allowed to use only the "datetime" and "calendar" libraries.
- You can define additional private helper methods to improve code readability and modularize validation logic.
- Do not include any import statements in the output.
- Assume all input timestamps are provided in the GMT+8 timezone. Adjust calculations accordingly.
- The output should be a single method definition with the following characteristics:
  - Method name: `getCustomerSchedulingParameters`
  - Arguments: None
  - Return: A JSON object with the keys:
    - `appointment_date`: The day of the month (integer or `None`).
    - `appointment_month`: The month of the year (integer or `None`).
    - `appointment_year`: The year (integer or `None`).
    - `appointment_time_hour`: The hour of the day in 24-hour format (integer or `None`).
    - `appointment_time_minute`: The minute of the hour (integer or `None`).
    - `duration_hours`: The duration of the appointment in hours (float or `None`).
    - `frequency`: The recurrence of the appointment. Can be `"Adhoc"`, `"Daily"`, `"Weekly"`, or `"Monthly"` (string or `None`).
- If a specific value is not found in the text, return `None` for that field.
- Focus only on extracting values explicitly mentioned in the input text; do not make assumptions.
- Do not include print statements or logging in the output.

## Example:
### Input:
"I want to book an appointment for next Monday at 2pm for 2.5 hours."

### Output:
def getCustomerSchedulingParameters():
    """Extracts and returns scheduling parameters from user input in GMT+8 timezone.

    Returns:
        A JSON object with the required scheduling parameters.
    """
    def _get_next_monday():
        """Helper function to calculate the date of the next Monday."""
        # The "datetime" module is imported by the calling wrapper.
        current_time = datetime.datetime.utcnow() + datetime.timedelta(hours=8)  # Adjust to GMT+8
        today = current_time.date()
        days_until_monday = (7 - today.weekday() + 0) % 7  # Monday is 0
        return today + datetime.timedelta(days=days_until_monday)

    next_monday = _get_next_monday()
    return {
        "appointment_date": next_monday.day,
        "appointment_month": next_monday.month,
        "appointment_year": next_monday.year,
        "appointment_time_hour": 14,
        "appointment_time_minute": 0,
        "duration_hours": 2.5,
        "frequency": "Adhoc"
    }

### Notes:
Ensure the output is plain Python code without any formatting or additional explanations.
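A minimal sketch of generating the extraction method with this prompt, assuming @langchain/openai is used; extractionPrompt (the text above) and the sample customer message are placeholders:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

// extractionPrompt holds the prompt text shown above (placeholder variable).
const response = await model.invoke([
  ["system", extractionPrompt],
  ["human", "I want to book an appointment for next Monday at 2pm for 2.5 hours."],
]);

// The prompt instructs the model to return plain Python, so the content
// can be dropped straight into the graph state for Step 3.
const pythonParametersExtractionMethod = response.content as string;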
Step 2: Fetch Rules from Storage
Fetch the scheduling rules from storage, then transform them into a Python validation method, as sketched below.
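A hedged sketch of fetching the rules with the Qdrant JS client; the collection name and payload field are assumptions about the setup:

import { QdrantClient } from "@qdrant/js-client-rest";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Scroll through the (hypothetical) "scheduling_rules" collection.
const { points } = await qdrant.scroll("scheduling_rules", {
  limit: 30,
  with_payload: true,
});

// "rule_text" is an assumed payload field holding the natural-language rule.
const rules = points.map((p) => p.payload?.rule_text);

The rule texts are then passed to the LLM with a prompt analogous to Step 1, asking for a single validateCustomerSchedulingParameters method.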
Step 3: Run the Generated Code in the Sandbox
import { traceable } from "langsmith/traceable";

const pythonCodeToInvoke = `
import sys
import datetime
import calendar
import json

${state.pythonValidationMethod}

${state.pythonParametersExtractionMethod}

parameters = getCustomerSchedulingParameters()
validation_errors = validateCustomerSchedulingParameters(
    parameters["appointment_year"],
    parameters["appointment_month"],
    parameters["appointment_date"],
    parameters["appointment_time_hour"],
    parameters["appointment_time_minute"],
    parameters["duration_hours"],
    parameters["frequency"])
print(json.dumps({"validation_errors": validation_errors}))`;

// traceable() wraps the call so it shows up as a span in LangSmith.
const traceableCodeInterpreterFunction = traceable((pythonCodeToInvoke: string) =>
  codeInterpreter.execute('python', pythonCodeToInvoke, { args: [] }));
const result = await traceableCodeInterpreterFunction(pythonCodeToInvoke);
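The stdout of the run carries the JSON printed by the script. A sketch of surfacing the errors, assuming the run succeeded and produced the shape above:

// Parse the JSON the Python script printed to stdout.
const { validation_errors } = JSON.parse(result.run.stdout);
if (validation_errors.length > 0) {
  // The chatbot uses these messages to tell the customer which rules failed.
  console.log(`Validation failed: ${validation_errors.join("; ")}`);
}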
Potential Improvements
- Implement an iterative loop in which the LLM debugs and refines the generated Python code dynamically.
- Add a human-in-the-loop review step for the generated validation method.
- Cache generated code to avoid regenerating it on every request.
Final Thoughts
Bytecode execution and token-based LLMs are highly complementary technologies, unlocking a new level of flexibility. This synergistic approach has a bright future: AWS's recent Bedrock Automated Reasoning, for example, appears to offer a similar solution within their enterprise ecosystem, and Google and Microsoft will likely show us something similar very soon.