Table of Contents
- Introduction
- What is Pydantic and Why Should AI Developers Care?
- How Pydantic Makes AI Agents More Reliable
- Real-World Use Cases
- Best Practices for Using Pydantic with AI Agents
- A Simple Tutorial: Getting Started with Pydantic and AI
- Advanced Tips and Tricks
- Conclusion
Introduction
Have you ever built an AI agent that sometimes returns unpredictable data structures? Or perhaps you've dealt with the frustration of parsing JSON from a language model only to have your application crash because a field was missing or had the wrong type?
I've been there too! That's why today I want to talk about one of my favorite tools for taming the wild outputs of AI agents: Pydantic!
In this guide, I'll show you how Pydantic can transform your AI agent development from a game of chance into a reliable, robust process. Let's dive in!
What is Pydantic and Why Should AI Developers Care?
Pydantic is a Python library for data validation that uses type annotations. Think of it as a bouncer for your data - it checks IDs (types) at the door and makes sure only the right data gets into your application.
```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    is_active: bool
```
But why is this particularly useful for AI agents? Here's the thing:
AI models (especially LLMs) are amazing at generating content, but they're not always precise about formatting. They might return JSON where a number is accidentally a string (`"42"` instead of `42`), or they might forget a field entirely. Without validation, these small inconsistencies can cause big problems downstream in your application.
The AI Agent's Output Problem
Imagine asking an AI agent to return information about a product:
{ "name": "Super Widget", "price": "29.99", "in_stock": "true", "features": ["durable", "lightweight"] }
Notice the issues? `price` is a string instead of a float, and `in_stock` is a string instead of a boolean. Your application might crash when it tries to do math with that price or make a decision based on the stock status.
Pydantic to the rescue! It can automatically convert these types for you, so `"29.99"` becomes `29.99` (float) and `"true"` becomes `True` (boolean).
How Pydantic Makes AI Agents More Reliable
1. Automatic Type Conversion
Pydantic doesn't just validate - it tries to convert data to the right type when possible. This is perfect for AI outputs that are almost correct.
```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    features: list[str]

# Even though price and in_stock are strings, Pydantic will convert them
product = Product(
    name="Super Widget",
    price="29.99",
    in_stock="true",
    features=["durable", "lightweight"]
)

print(product)
# name='Super Widget' price=29.99 in_stock=True features=['durable', 'lightweight']
```
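In practice, an LLM's output usually arrives as a raw JSON string rather than keyword arguments. Here's a minimal sketch of the same coercion using `model_validate_json`, reusing the `Product` model defined above and the example JSON from earlier:

```python
raw = '{"name": "Super Widget", "price": "29.99", "in_stock": "true", "features": ["durable", "lightweight"]}'

# Parse the JSON string and coerce the almost-right types in one step
product = Product.model_validate_json(raw)
print(product.price + 1.0)  # 30.99 -- price is now a real float
print(product.in_stock)     # True  -- in_stock is now a real bool
```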
2. Clear Error Messages
When validation fails, Pydantic tells you exactly what went wrong:
try: Product(name="Broken Widget", price="expensive", in_stock=True, features="strong") except Exception as e: print(f"Error: {e}") # Error: 2 validation errors for Product # price # Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='expensive', input_type=str] # features # Input should be a valid list [type=list_type, input_value='strong', input_type=str]
These detailed errors make debugging so much easier when your AI agent returns unexpected data.
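If you want to act on those errors programmatically (for example, to feed them back to the model), catch `ValidationError` and inspect its structured details. A minimal sketch using the `Product` model from above:

```python
from pydantic import ValidationError

try:
    Product(name="Broken Widget", price="expensive", in_stock=True, features="strong")
except ValidationError as e:
    # errors() returns a list of dicts with the field location, message, and offending input
    for err in e.errors():
        print(err["loc"], err["msg"])
# ('price',) Input should be a valid number, unable to parse string as a number
# ('features',) Input should be a valid list
```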
3. Schema Generation for Guiding AI Outputs
One of my favorite Pydantic features for AI development is its ability to generate JSON schemas:
```python
print(Product.model_json_schema())
# {
#   "title": "Product",
#   "type": "object",
#   "properties": {
#     "name": {"title": "Name", "type": "string"},
#     "price": {"title": "Price", "type": "number"},
#     "in_stock": {"title": "In Stock", "type": "boolean"},
#     "features": {
#       "title": "Features",
#       "type": "array",
#       "items": {"type": "string"}
#     }
#   },
#   "required": ["name", "price", "in_stock", "features"]
# }
```
You can use this schema to guide your AI model, especially if you're using function calling with OpenAI or similar features with other providers. This dramatically improves the chances of getting correctly formatted responses!
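Even if you're not using function calling, you can paste the schema straight into your prompt so the model knows the exact shape you expect. A minimal, provider-agnostic sketch (the prompt wording is just an illustration, and `Product` is the model from earlier):

```python
import json

schema = json.dumps(Product.model_json_schema(), indent=2)

prompt = (
    "Extract the product from the text below and respond with JSON matching this schema:\n"
    f"{schema}\n\n"
    "Text: The Super Widget costs $29.99, is in stock, and is durable and lightweight.\n"
    "Return only the JSON object, with no extra commentary."
)

# Send `prompt` to whichever model/provider you use, then validate the reply:
# product = Product.model_validate_json(reply)
```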
Real-World Use Cases
Let's look at how developers are using Pydantic with AI agents in the wild:
OpenAI Function Calling + Pydantic
```python
from openai import OpenAI
from pydantic import BaseModel, Field

class WeatherInfo(BaseModel):
    location: str = Field(..., description="The city and state")
    temperature: float = Field(..., description="Current temperature in Celsius")
    condition: str = Field(..., description="Weather condition (sunny, cloudy, etc.)")

# Get the JSON schema for OpenAI
function_def = {
    "name": "get_weather",
    "description": "Get the current weather in a location",
    "parameters": WeatherInfo.model_json_schema()
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    functions=[function_def],
    function_call={"name": "get_weather"}
)

# Extract and validate the response
function_response = response.choices[0].message.function_call.arguments
weather = WeatherInfo.model_validate_json(function_response)
print(f"It's {weather.temperature}°C and {weather.condition} in {weather.location}")
```
This approach has two benefits:
- The schema guides the AI to produce properly structured output
- Pydantic validates that output as an extra safety measure
Multi-Agent Systems (CrewAI)
In systems where multiple AI agents collaborate, consistent data structures are critical. Frameworks like CrewAI use Pydantic to ensure agents communicate properly:
```python
from pydantic import BaseModel
from crewai import Agent, Task, Crew

class ResearchReport(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]

# Define a task with a Pydantic output schema
research_task = Task(
    description="Research the latest advancements in quantum computing",
    expected_output="A structured research report with key findings and sources",
    output_pydantic=ResearchReport
)

# When the agent completes the task, CrewAI validates the output against the model
```
Best Practices for Using Pydantic with AI Agents
After working with numerous AI projects, here are my top recommendations:
1. Start with a Clear Data Model
Define Pydantic models that capture exactly what you need from your AI agent. Be specific about types and constraints:
```python
from typing import Optional, Literal
from pydantic import BaseModel, Field, conint

class ProductRecommendation(BaseModel):
    product_name: str
    price_range: str = Field(..., pattern=r"^\$\d+-\$\d+$")  # Ensure format like "$10-$20"
    rating: conint(ge=1, le=5)  # Integer between 1-5
    category: Literal["electronics", "clothing", "home", "books", "other"]
    features: list[str] = Field(..., min_length=2, max_length=5)  # Between 2 and 5 features
    in_stock: bool
    shipping_days: Optional[int] = None
```
This detailed model serves as documentation and validation in one package.
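To see those constraints in action, here's a small sketch that validates one well-formed dict and one that breaks the rules (the values are made up for illustration):

```python
from pydantic import ValidationError

good = {
    "product_name": "Noise-Cancelling Headphones",
    "price_range": "$100-$150",
    "rating": 4,
    "category": "electronics",
    "features": ["wireless", "20-hour battery"],
    "in_stock": True,
}
print(ProductRecommendation.model_validate(good).rating)  # 4

bad = {**good, "price_range": "around a hundred dollars", "rating": 7}
try:
    ProductRecommendation.model_validate(bad)
except ValidationError as e:
    print(len(e.errors()))  # 2 -- pattern mismatch and rating out of range
```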
2. Handle Validation Errors Gracefully
Always wrap your Pydantic validation in try/except blocks:
```python
try:
    recommendation = ProductRecommendation.model_validate_json(ai_response)
    # Use the structured data
except Exception as e:
    # Log the error
    print(f"AI output validation failed: {e}")
    # Possible strategies:
    # 1. Use a fallback approach
    # 2. Re-prompt the AI with the error details
    # 3. Apply some fixes and retry validation
```
3. Consider Re-Prompting When Validation Fails
One powerful approach is to tell the AI exactly what went wrong and ask it to fix the response:
```python
def get_validated_response(prompt):
    for attempt in range(3):  # Try up to 3 times
        response = ai_model.generate(prompt)
        try:
            result = ProductRecommendation.model_validate_json(response)
            return result
        except Exception as e:
            if attempt < 2:  # Don't update prompt on the last attempt
                prompt += f"\nYour previous response had validation errors: {e}. Please fix them and try again."

    # If we get here, all attempts failed
    raise ValueError("Could not get valid response after multiple attempts")
```
This feedback loop helps the AI learn from its mistakes!
A Simple Tutorial: Getting Started with Pydantic and AI
Let's bring everything together with a simple tutorial. We'll create a movie recommendation agent that returns properly structured data:
Step 1: Define Your Pydantic Model
```python
from pydantic import BaseModel, Field
from typing import List, Optional

class MovieRecommendation(BaseModel):
    title: str
    year: int = Field(..., ge=1900, le=2030)
    genres: List[str] = Field(..., min_length=1)
    rating: float = Field(..., ge=0.0, le=10.0)
    director: str
    streaming_on: Optional[List[str]] = None
    description: str = Field(..., max_length=500)
```
Step 2: Create a Function to Get Recommendations from an AI
```python
def get_movie_recommendation(genre_preference, mood, decade_preference=None):
    # Construct a prompt for the AI
    prompt = f"""
    Suggest a movie based on the following:
    Genre preference: {genre_preference}
    Mood: {mood}
    Decade preference: {decade_preference or 'any'}

    Return the recommendation as a JSON object with the following fields:
    - title: the movie title
    - year: the release year (1900-2030)
    - genres: list of genres
    - rating: rating out of 10
    - director: the director's name
    - streaming_on: list of streaming platforms (if known) or null
    - description: brief description (max 500 chars)
    """

    # In a real application, you'd call your AI model here
    # For demonstration, let's pretend we got this response:
    ai_response = """
    {
        "title": "The Grand Budapest Hotel",
        "year": 2014,
        "genres": ["Comedy", "Drama", "Adventure"],
        "rating": 8.1,
        "director": "Wes Anderson",
        "streaming_on": ["HBO Max", "Disney+"],
        "description": "A writer encounters the owner of an aging high-class hotel, who tells him of his early years serving as a lobby boy in the hotel's glorious years under an exceptional concierge."
    }
    """

    try:
        # Parse and validate the AI response
        recommendation = MovieRecommendation.model_validate_json(ai_response)
        return recommendation
    except Exception as e:
        print(f"Error validating AI response: {e}")
        # In a real application, you might implement retry logic here
        return None
```
Step 3: Use the Recommendation in Your Application
```python
def display_recommendation(recommendation):
    if not recommendation:
        return "Sorry, couldn't generate a valid recommendation."

    return f"""
🎬 {recommendation.title} ({recommendation.year}) - {recommendation.rating}/10
Directed by: {recommendation.director}
Genres: {', '.join(recommendation.genres)}

{recommendation.description}

{f"Available on: {', '.join(recommendation.streaming_on)}" if recommendation.streaming_on else "Streaming info not available"}
"""

# Get and display a recommendation
user_genre = "sci-fi"
user_mood = "thoughtful"
user_decade = "2010s"

movie = get_movie_recommendation(user_genre, user_mood, user_decade)
print(display_recommendation(movie))
```
Advanced Tips and Tricks
Want to take your Pydantic + AI game to the next level? Here are some advanced techniques:
Custom Validators for Domain-Specific Rules
```python
from pydantic import BaseModel, Field, field_validator

class TravelRecommendation(BaseModel):
    destination: str
    best_months: list[str]
    budget_usd: int = Field(..., gt=0)

    @field_validator('best_months')
    @classmethod
    def check_valid_months(cls, months):
        valid_months = ["January", "February", "March", "April", "May", "June",
                        "July", "August", "September", "October", "November", "December"]
        for month in months:
            if month not in valid_months:
                raise ValueError(f"Invalid month: {month}")
        return months
```
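Here's a quick sketch of what that validator does at runtime (the values are illustrative):

```python
trip = TravelRecommendation(
    destination="Kyoto",
    best_months=["April", "November"],
    budget_usd=2500,
)
print(trip.best_months)  # ['April', 'November']

# A typo in a month name is rejected with our custom error message
try:
    TravelRecommendation(destination="Kyoto", best_months=["Aprill"], budget_usd=2500)
except Exception as e:
    print(e)  # ... Value error, Invalid month: Aprill ...
```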
Nested Models for Complex Data
```python
from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    country: str
    postal_code: str

class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Address

class BusinessListing(BaseModel):
    name: str
    category: str
    rating: float
    contact: Contact
    hours: dict[str, str]
```
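Nested models validate recursively, so a single `model_validate` call checks the whole tree. A minimal sketch with made-up data:

```python
listing = BusinessListing.model_validate({
    "name": "Corner Coffee Shop",
    "category": "cafe",
    "rating": 4.6,
    "contact": {
        "name": "Front Desk",
        "email": "hello@example.com",
        "address": {
            "street": "123 Main St",
            "city": "Oakland",
            "state": "CA",
            "country": "USA",
            "postal_code": "94607",
        },
    },
    "hours": {"mon": "8-5", "tue": "8-5"},
})

# Nested dicts are converted into full model instances
print(listing.contact.address.city)  # Oakland
```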
Working with LangChain's Pydantic Output Parser
```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from pydantic import BaseModel, Field

class AmazonProduct(BaseModel):
    name: str = Field(description="The product name")
    price: float = Field(description="The product price in USD")
    rating: float = Field(description="Rating from 1-5")
    reviews: int = Field(description="Number of reviews")

parser = PydanticOutputParser(pydantic_object=AmazonProduct)

prompt = PromptTemplate(
    template="Extract product information from this text:\n{text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model = OpenAI()

input_text = """
This amazing laptop is the MacBook Pro 16-inch, priced at $2,399. It has received excellent
feedback from customers, with a 4.8 star rating based on 3,842 reviews.
"""

output = model(prompt.format(text=input_text))
product = parser.parse(output)
```
Conclusion
Pydantic is more than just a validation library - it's your AI agent's best friend! By defining clear data models and validating inputs and outputs, you can:
- Make your AI applications more reliable
- Catch errors early before they cascade into bigger problems
- Guide your models to produce better-structured outputs
- Create self-documenting code that clearly specifies what data you expect
The next time you're building an AI agent, take the time to define your data models with Pydantic. Your future self (and your users) will thank you!
Have you used Pydantic with AI projects? Feel free to share your experiences in the comments!