This third part of the Building Strands Agents series focuses on implementing observability with LangFuse to monitor your agents in real time.
🎯 Why Observability and Evaluation Matter
When you deploy agents in production, you need to answer these questions: Does your agent respond accurately? How long do responses take? Where are the bottlenecks? Which conversations fail, and why?
Without proper observability, you're flying blind. Your agents might be hallucinating, performing poorly, or wasting computational resources, and you won't know until users complain.
Observability Components
The Strands Agents SDK ships with built-in observability APIs. These are the key observability data points:
Metrics - Essential for understanding agent performance, optimizing behavior, and monitoring resource usage (a short example follows this list).
Traces - A fundamental component of the Strands SDK's observability framework, providing detailed insights into your agent's execution.
Logs - The Strands SDK uses Python's standard logging module to provide visibility into operations (see the logging sketch after this list).
Evaluation - Essential for measuring agent performance, tracking improvements, and ensuring your agents meet quality standards. With the Strands SDK, you can perform Manual Evaluation, Structured Testing, LLM Judge Evaluation, and Tool-Specific Evaluation.
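For example, turning on the SDK's debug logs takes a couple of lines with the standard logging module; the "strands" logger name follows the SDK's documentation:

```python
import logging

# Enable debug-level logs for the Strands SDK's logger
logging.getLogger("strands").setLevel(logging.DEBUG)

# Route log output to the console with a simple format
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler()],
)
```

Likewise, each agent call returns a result that carries metrics. Here's a minimal sketch, assuming the result object exposes a metrics attribute as described in the SDK docs:

```python
from strands import Agent

agent = Agent()  # uses the SDK's default model settings

result = agent("What can you do?")

# Summarize token usage, event-loop cycles, tool calls, and latency
print(result.metrics.get_summary())
```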
OpenTelemetry Integration
Strands natively integrates with OpenTelemetry, an industry standard for distributed tracing. You can visualize and analyze traces using any OpenTelemetry-compatible tool. This integration provides:
- Compatibility with existing observability tools: Send traces to platforms such as Jaeger, Grafana Tempo, AWS X-Ray, Datadog, and more.
- Standardized attribute naming: Uses OpenTelemetry semantic conventions.
- Flexible export options: Console output for development, OTLP endpoint for production (see the sketch after this list).
- Auto-instrumentation: The SDK creates traces automatically when you activate tracing.
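As a concrete illustration, here is a minimal sketch of activating tracing with the SDK's telemetry helper. The StrandsTelemetry class and its setup methods follow the SDK's observability docs; the endpoint and service name are illustrative assumptions for a local OTLP collector:

```python
import os

# Standard OpenTelemetry environment variables (values are illustrative)
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
os.environ["OTEL_SERVICE_NAME"] = "restaurant-agent"

from strands.telemetry import StrandsTelemetry

telemetry = StrandsTelemetry()
telemetry.setup_otlp_exporter()     # send traces to the OTLP endpoint above
telemetry.setup_console_exporter()  # also print traces locally during development
```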
🍽️📊 Observability and Evaluation with a Restaurant Agent
This tutorial uses the 06_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb notebook to demonstrate building a restaurant recommendation agent with observability and evaluation capabilities. This tutorial is designed for developers new to AI agents, observability, and evaluation.
✅ Based on the code from 08-observability-and-evaluation/Observability-and-Evaluation-sample.ipynb of the Strands Agents Samples repository.
What You'll Build
You'll create these key components:
- Local Vector Database: A searchable collection of restaurant information that your agent can query.
- Strands Agent: An AI assistant that can recommend restaurants based on user preferences.
- LangFuse: A tool that shows how your agent works and makes decisions.
- RAGAS: A framework that evaluates your agent's performance (covered in the next part).
π Getting Started
Clone the sample repository:
```bash
git clone https://github.com/aws-samples/sample-getting-started-with-strands-agents-course
cd sample-getting-started-with-strands-agents-course/Lab6
```
Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
Install the required packages:
```bash
pip install -r requirements.txt
```
Each package serves a specific purpose:
- langchain: Helps us build applications with language models
- langfuse: Provides observability for our agent
- ragas: Helps us evaluate our agent's performance
- chromadb: A database for storing and searching vector embeddings
- docx2txt: Converts Word documents to text
- boto3: AWS SDK for Python, used to access AWS services such as Amazon Bedrock models
- strands: Framework for building AI agents
✅ Create a Vector Database from Restaurant Data
You'll create a vector database using restaurant data files in the restaurant-data folder. These files contain information about different restaurants, their menus, and specialties.
To complete this step, run the corresponding cells in the notebook. A minimal sketch of what those cells do follows.
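For a rough picture of what those cells do, here is a minimal sketch using chromadb and docx2txt. The restaurant-data folder name comes from this tutorial; the collection name and loading logic are illustrative assumptions, so the notebook's actual cells may differ:

```python
import os

import chromadb
import docx2txt

# In-memory ChromaDB instance using its default embedding function
client = chromadb.Client()
collection = client.create_collection(name="restaurants")

# Load each Word document from the restaurant-data folder and index it
for filename in os.listdir("restaurant-data"):
    if filename.endswith(".docx"):
        text = docx2txt.process(os.path.join("restaurant-data", filename))
        collection.add(
            documents=[text],  # ChromaDB embeds the raw text automatically
            ids=[filename],    # the file name doubles as a stable document ID
        )
```

The agent's search_restaurants tool can then be a thin wrapper over this collection; here's another sketch, assuming Strands' @tool decorator:

```python
from strands import tool

@tool
def search_restaurants(query: str) -> str:
    """Search the restaurant database for information matching the query."""
    results = collection.query(query_texts=[query], n_results=3)
    return "\n\n".join(results["documents"][0])
```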
🔍 Set Up LangFuse for Observability
LangFuse is an open-source observability platform specifically designed for LLM applications. It provides comprehensive tracking of your agent interactions, including:
- Trace Analysis: Complete conversation flows from input to output
- Performance Metrics: Response times, token usage, and cost tracking
- Error Monitoring: Failed requests and exception handling
- User Analytics: Conversation patterns and user engagement
The platform integrates seamlessly with popular frameworks and provides both hosted and self-hosted deployment options.
Follow the steps in the LangFuse console to create a new project and copy its public and secret API keys.
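Because Strands emits OpenTelemetry traces, a common way to route them to LangFuse is through its OTLP endpoint, authenticated with your project keys. Here's a minimal sketch following LangFuse's OpenTelemetry setup (the region host and keys are placeholders); set these before creating the agent so the SDK's tracer picks them up:

```python
import base64
import os

public_key = "your-public-key"
secret_key = "your-secret-key"

# LangFuse authenticates OTLP requests with Basic auth built from your project keys
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://us.cloud.langfuse.com/api/public/otel"  # US region
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
```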
Setting Up LangFuse Integration
First, configure your agent to send traces to LangFuse:
```python
import uuid

from langfuse import Langfuse
from strands import Agent

# Initialize the LangFuse client
langfuse = Langfuse(
    secret_key="your-secret-key",
    public_key="your-public-key",
    # host="https://cloud.langfuse.com"  # 🇪🇺 EU region
    host="https://us.cloud.langfuse.com"  # 🇺🇸 US region
)

# Create the restaurant recommendation agent with observability
restaurant_agent = Agent(
    name="Restaurant Recommendation Agent",
    model=model,  # the Bedrock model configured earlier in the notebook
    tools=[search_restaurants],  # give the agent access to our search tool
    system_prompt="""You are a helpful restaurant recommendation assistant.
Use the search_restaurants tool to find information about restaurants based on user queries.
Provide detailed recommendations based on the search results.
If asked about restaurants that aren't in the database, politely explain that you
can only provide information about restaurants in your database.
Always be friendly, helpful, and concise in your responses.
""",
    record_direct_tool_call=True,  # record when tools are used
    trace_attributes={
        "session.id": str(uuid.uuid4()),  # generate a unique session ID
        "user.id": "user-email-example@domain.com",  # example user ID
        "langfuse.tags": [
            "Agent-SDK-Example",
            "Strands-Project-Demo",
            "Observability-Tutorial"
        ]
    }
)
```
✅ Test the Agent with Tracing
Now let's test our agent with a simple query and see how it performs. The agent will use the search tool to find relevant information and then generate a response.
```python
# Test the agent with a simple query
response = restaurant_agent("I'm looking for a restaurant with good vegetarian options. Any recommendations?")
print(response)
```
✅ Review the Traces
After running the agent, you can review the traces in LangFuse:
- Go to the tracing menu in your LangFuse project
- Select the trace you want to view
- Examine how the agent processed the request, what tools it used, and what response it generated
This gives you visibility into how your agent is working and helps you identify any issues or areas for improvement.
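Beyond the UI, you can also pull traces programmatically for later analysis. Here's a sketch assuming the LangFuse v2 Python SDK's fetch_traces method (newer SDK versions expose a different API, so check your version's docs):

```python
# Fetch the most recent traces from the LangFuse project (v2 SDK)
traces = langfuse.fetch_traces(limit=5)

for trace in traces.data:
    print(trace.id, trace.name)
```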
Stay tuned for the final part: Evaluation with RAGAS (Retrieval Augmented Generation Assessment), where we'll dive deep into measuring and improving your agent's performance using systematic evaluation metrics!
🔜 What's Next?
In the Evaluation with RAGAS part, we'll cover:
- Setting up RAGAS evaluation framework
- Measuring faithfulness and answer relevancy
- Automated performance assessment
- Creating feedback loops for continuous improvement
📚 Resources
- Getting Started with Strands Agents: Build Your First AI Agent - FREE course
- Strands Agent Documentation
- Part 1: Basic Multi-Modal Processing
- Complete Code Examples
- Amazon Bedrock Documentation
- Getting Started with Strands Agents
Thank you!