
Elizabeth Fuentes L for AWS

Posted on • Edited on • Originally published at builder.aws.com

Building Strands Agents with a few lines of code: Implementing Observability with LangFuse

πŸ‡»πŸ‡ͺπŸ‡¨πŸ‡± Dev.to Linkedin GitHub Twitter Instagram Youtube
Linktr

Getting Started with Strands Agents: Build Your First AI Agent - FREE course

GitHub repository

This third part of the Building Strands Agents series focuses on implementing observability with LangFuse to monitor your agents in real-time.

🎯 Why Observability and Evaluation Matter

When you deploy agents in production, you need to answer these questions: Does your agent respond accurately? How long do responses take? Where are the bottlenecks? Which conversations fail, and why?

Without proper observability, you're flying blind. Your agents might be hallucinating, performing poorly, or wasting computational resources, and you won't know until users complain.
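To make the cost of flying blind concrete, here is a tiny framework-agnostic sketch of the kind of per-call metrics you'd want to capture; every name in it is illustrative and not part of the Strands SDK:

```python
import time
from collections import defaultdict
from functools import wraps

# Illustrative in-memory metrics store (not part of any SDK)
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})

def observed(name):
    """Record call count, error count, and latency for the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            metrics[name]["calls"] += 1
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[name]["errors"] += 1
                raise
            finally:
                metrics[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

@observed("recommend")
def recommend(query):
    # Stand-in for a real agent call
    return f"Here are some options for: {query}"

recommend("vegetarian restaurants")
print(metrics["recommend"]["calls"])  # 1
```

Real observability platforms collect exactly this kind of data (plus token counts and traces) without you writing any of it by hand, which is what the rest of this post sets up.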

Observability Components

The Strands Agents SDK includes built-in observability APIs. The following are the key observability data points:

Metrics - Essential for understanding agent performance, optimizing behavior, and monitoring resource usage.

Traces - A fundamental component of the Strands SDK's observability framework, providing detailed insights into your agent's execution.

Logs - Strands SDK uses Python's standard logging module to provide visibility into operations.

Evaluation - Essential for measuring agent performance, tracking improvements, and ensuring your agents meet quality standards. With Strands SDK, you can perform Manual Evaluation, Structured Testing, LLM Judge Evaluation, and Tool-Specific Evaluation.

OpenTelemetry Integration

Strands natively integrates with OpenTelemetry, an industry standard for distributed tracing. You can visualize and analyze traces using any OpenTelemetry-compatible tool. This integration provides:

  • Compatibility with existing observability tools: Send traces to platforms such as Jaeger, Grafana Tempo, AWS X-Ray, Datadog, and more.
  • Standardized attribute naming: Uses OpenTelemetry semantic conventions.
  • Flexible export options: Console output for development, OTLP endpoint for production.
  • Auto-instrumentation: The SDK creates traces automatically when you activate tracing.
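Because the integration follows OpenTelemetry conventions, the export target can usually be switched via the standard OTel environment variables rather than code changes. A minimal sketch (the endpoint is a placeholder; the variable names are standard OpenTelemetry, but confirm which ones your Strands version reads):

```python
import os

# Standard OpenTelemetry environment variables, read by OTel-based tracers.
# The endpoint below is a placeholder for a local OTLP/HTTP collector.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
os.environ["OTEL_SERVICE_NAME"] = "restaurant-agent"
```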

πŸ½οΈπŸ”Observability and Evaluation with Restaurant Agent

This tutorial uses the 06_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb notebook to demonstrate building a restaurant recommendation agent with observability and evaluation capabilities. This tutorial is designed for developers new to AI agents, observability, and evaluation.

⭐ Based on the code from 08-observability-and-evaluation/Observability-and-Evaluation-sample.ipynb of the Strands Agents Samples repository

What you'll Build

You'll create these key components:

  1. Local Vector Database: A searchable collection of restaurant information that your agent can query.
  2. Strands Agent: An AI assistant that can recommend restaurants based on user preferences.
  3. LangFuse: A tool that shows how your agent works and makes decisions.
  4. RAGAS: A framework that evaluates your agent's performance (covered in the next part).

πŸš€ Getting Started

Clone the sample repository:

```shell
git clone https://github.com/aws-samples/sample-getting-started-with-strands-agents-course
cd sample-getting-started-with-strands-agents-course/Lab6
```

Create and activate a virtual environment:

```shell
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install the required packages:

```shell
pip install -r requirements.txt
```

Each package serves a specific purpose:

  • langchain: Helps us build applications with language models
  • langfuse: Provides observability for our agent
  • ragas: Helps us evaluate our agent's performance
  • chromadb: A database for storing and searching vector embeddings
  • docx2txt: Converts Word documents to text
  • boto3: AWS SDK for Python, used to access AWS services, including Amazon Bedrock models
  • strands: Framework for building AI agents

βœ… Create Vector Database from Restaurant Data

You'll create a vector database using restaurant data files in the restaurant-data folder. These files contain information about different restaurants, their menus, and specialties.

To complete this step, run the corresponding cells in the notebook.
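Under the hood, a vector database answers "which stored texts are most similar to this query?" via nearest-neighbor search over embeddings. A dependency-free sketch with made-up three-dimensional "embeddings" (the notebook itself uses chromadb with real embedding vectors; the restaurant names here are invented):

```python
import math

# Toy 3-dimensional "embeddings" standing in for real model embeddings.
restaurant_vectors = {
    "Green Garden (vegetarian)": [0.9, 0.1, 0.0],
    "Smoky Grill (steakhouse)":  [0.1, 0.9, 0.2],
    "Pasta Corner (italian)":    [0.4, 0.3, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=1):
    """Return the k restaurants whose embeddings are closest to the query."""
    ranked = sorted(
        restaurant_vectors,
        key=lambda name: cosine(query_vector, restaurant_vectors[name]),
        reverse=True,
    )
    return ranked[:k]

print(search([1.0, 0.0, 0.1]))  # ['Green Garden (vegetarian)']
```

chromadb does the same thing at scale, with persistence, real embedding models, and approximate nearest-neighbor indexing.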

πŸ“Š Set Up Langfuse for Observability

LangFuse is an open-source observability platform specifically designed for LLM applications. It provides comprehensive tracking of your agent interactions, including:

  • Trace Analysis: Complete conversation flows from input to output
  • Performance Metrics: Response times, token usage, and cost tracking
  • Error Monitoring: Failed requests and exception handling
  • User Analytics: Conversation patterns and user engagement

The platform integrates seamlessly with popular frameworks and provides both hosted and self-hosted deployment options.
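If you take the OpenTelemetry route described earlier, Langfuse can also receive traces directly over OTLP; authentication is HTTP Basic auth built from your public and secret keys. A sketch (the `/api/public/otel` path and header format follow Langfuse's OpenTelemetry docs; the keys shown are placeholders):

```python
import base64
import os

# Placeholder keys -- use your project's keys from the Langfuse console.
public_key = "pk-lf-your-public-key"
secret_key = "sk-lf-your-secret-key"

# HTTP Basic auth: base64("public:secret"), per Langfuse's OTel setup docs.
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

# US-region endpoint; EU region uses https://cloud.langfuse.com instead.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://us.cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
```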

Follow the steps in the Langfuse console to create a new project and obtain your public and secret API keys.

Setting Up LangFuse Integration

First, configure your agent to send traces to LangFuse:

```python
import uuid

from langfuse import Langfuse
from strands import Agent

# Initialize the Langfuse client
langfuse = Langfuse(
    secret_key="your-secret-key",
    public_key="your-public-key",
    # host="https://cloud.langfuse.com"   # πŸ‡ͺπŸ‡Ί EU region
    host="https://us.cloud.langfuse.com",  # πŸ‡ΊπŸ‡Έ US region
)

# Create the restaurant recommendation agent with observability
restaurant_agent = Agent(
    name="Restaurant Recommendation Agent",
    model=model,
    tools=[search_restaurants],  # Give the agent access to our search tool
    system_prompt="""You are a helpful restaurant recommendation assistant.
    Use the search_restaurants tool to find information about restaurants based on user queries.
    Provide detailed recommendations based on the search results.
    If asked about restaurants that aren't in the database, politely explain
    that you can only provide information about restaurants in your database.
    Always be friendly, helpful, and concise in your responses.
    """,
    record_direct_tool_call=True,  # Record when tools are used
    trace_attributes={
        "session.id": str(uuid.uuid4()),  # Generate a unique session ID
        "user.id": "user-email-example@domain.com",  # Example user ID
        "langfuse.tags": [
            "Agent-SDK-Example",
            "Strands-Project-Demo",
            "Observability-Tutorial",
        ],
    },
)
```

βœ… Test the Agent with Tracing

Now let's test our agent with a simple query and see how it performs. The agent will use the search tool to find relevant information and then generate a response.

```python
# Test the agent with a simple query
response = restaurant_agent("I'm looking for a restaurant with good vegetarian options. Any recommendations?")
print(response)
```

βœ… Review the traces

After running the agent, you can review the traces in LangFuse:

  1. Go to the tracing menu in your LangFuse project.

  2. Select the trace you want to view.

  3. Examine how the agent processed the request, what tools it used, and what response it generated.


This gives you visibility into how your agent is working and helps you identify any issues or areas for improvement.

Stay tuned for the final part: Evaluation with RAGAS (Retrieval Augmented Generation Assessment), where we'll dive deep into measuring and improving your agent's performance using systematic evaluation metrics!

πŸ”— What's Next?

In the Evaluation with RAGAS part, we'll cover:

  • Setting up the RAGAS evaluation framework
  • Measuring faithfulness and answer relevancy
  • Automated performance assessment
  • Creating feedback loops for continuous improvement

πŸ“š Resources


Thank you!

GitHub repository

