
Elizabeth Fuentes L for AWS

Posted on • Edited on • Originally published at builder.aws.com

Building Strands Agents with a few lines of code: Implementing Observability with LangFuse

πŸ‡»πŸ‡ͺπŸ‡¨πŸ‡± Dev.to Linkedin GitHub Twitter Instagram Youtube
Linktr

Getting Started with Strands Agents: Build Your First AI Agent - FREE course

GitHub repository

This third part of the Building Strands Agents series focuses on implementing observability with LangFuse to monitor your agents in real-time.

🎯 Why Observability and Evaluation Matter

When you deploy agents in production, you need to answer these questions: Does your agent respond accurately? How long do responses take? Where are the bottlenecks? Which conversations fail, and why?

Without proper observability, you're flying blind. Your agents might be hallucinating, performing poorly, or wasting computational resources, and you won't know until users complain.
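To make the cost of flying blind concrete, here is a tiny framework-agnostic sketch of the kind of per-call metrics you'd want to capture; every name in it is illustrative and not part of the Strands SDK:

```python
import time
from collections import defaultdict
from functools import wraps

# Illustrative in-memory metrics store (not part of any SDK)
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})

def observed(name):
    """Record call count, error count, and latency for the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            metrics[name]["calls"] += 1
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[name]["errors"] += 1
                raise
            finally:
                metrics[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

@observed("recommend")
def recommend(query):
    # Stand-in for a real agent call
    return f"Here are some options for: {query}"

recommend("vegetarian restaurants")
print(metrics["recommend"]["calls"])  # 1
```

Real observability platforms collect exactly this kind of data (plus token counts and traces) without you writing any of it by hand, which is what the rest of this post sets up.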

Observability Components

The Strands Agents SDK includes built-in observability APIs. The following are the key observability data points:

Metrics - Essential for understanding agent performance, optimizing behavior, and monitoring resource usage.

Traces - A fundamental component of the Strands SDK's observability framework, providing detailed insights into your agent's execution.

Logs - Strands SDK uses Python's standard logging module to provide visibility into operations.

Evaluation - Essential for measuring agent performance, tracking improvements, and ensuring your agents meet quality standards. With Strands SDK, you can perform Manual Evaluation, Structured Testing, LLM Judge Evaluation, and Tool-Specific Evaluation.

OpenTelemetry Integration

Strands natively integrates with OpenTelemetry, an industry standard for distributed tracing. You can visualize and analyze traces using any OpenTelemetry-compatible tool. This integration provides:

  • Compatibility with existing observability tools: Send traces to platforms such as Jaeger, Grafana Tempo, AWS X-Ray, Datadog, and more.
  • Standardized attribute naming: Uses OpenTelemetry semantic conventions.
  • Flexible export options: Console output for development, OTLP endpoint for production.
  • Auto-instrumentation: The SDK creates traces automatically when you activate tracing.
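Because the integration follows OpenTelemetry conventions, the export target can usually be switched via the standard OTel environment variables rather than code changes. A minimal sketch (the endpoint is a placeholder; the variable names are standard OpenTelemetry, but confirm which ones your Strands version reads):

```python
import os

# Standard OpenTelemetry environment variables, read by OTel-based tracers.
# The endpoint below is a placeholder for a local OTLP/HTTP collector.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
os.environ["OTEL_SERVICE_NAME"] = "restaurant-agent"
```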

πŸ½οΈπŸ”Observability and Evaluation with Restaurant Agent

This tutorial uses the 06_Observability_with_LangFuse_and_Evaluation_with_RAGAS.ipynb notebook to demonstrate building a restaurant recommendation agent with observability and evaluation capabilities. This tutorial is designed for developers new to AI agents, observability, and evaluation.

⭐ Based on the code from 08-observability-and-evaluation/Observability-and-Evaluation-sample.ipynb of the Strands Agents Samples repository

What you'll Build

You'll create these key components:

  1. Local Vector Database: A searchable collection of restaurant information that your agent can query.
  2. Strands Agent: An AI assistant that can recommend restaurants based on user preferences.
  3. LangFuse: A tool that shows how your agent works and makes decisions.
  4. RAGAS: A framework that evaluates your agent's performance (covered in the next part).

πŸš€ Getting Started

Clone the sample repository:

```shell
git clone https://github.com/aws-samples/sample-getting-started-with-strands-agents-course
cd sample-getting-started-with-strands-agents-course/Lab6
```

Create and activate a virtual environment:

```shell
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install the required packages:

```shell
pip install -r requirements.txt
```

Each package serves a specific purpose:

  • langchain: Helps us build applications with language models
  • langfuse: Provides observability for our agent
  • ragas: Helps us evaluate our agent's performance
  • chromadb: A database for storing and searching vector embeddings
  • docx2txt: Converts Word documents to text
  • boto3: AWS SDK for Python, used to access AWS services, including Amazon Bedrock models
  • strands: Framework for building AI agents

βœ… Create Vector Database from Restaurant Data

You'll create a vector database using restaurant data files in the restaurant-data folder. These files contain information about different restaurants, their menus, and specialties.

To complete this step, run the corresponding cells in the notebook.
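Under the hood, a vector database answers "which stored texts are most similar to this query?" via nearest-neighbor search over embeddings. A dependency-free sketch with made-up three-dimensional "embeddings" (the notebook itself uses chromadb with real embedding vectors; the restaurant names here are invented):

```python
import math

# Toy 3-dimensional "embeddings" standing in for real model embeddings.
restaurant_vectors = {
    "Green Garden (vegetarian)": [0.9, 0.1, 0.0],
    "Smoky Grill (steakhouse)":  [0.1, 0.9, 0.2],
    "Pasta Corner (italian)":    [0.4, 0.3, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=1):
    """Return the k restaurants whose embeddings are closest to the query."""
    ranked = sorted(
        restaurant_vectors,
        key=lambda name: cosine(query_vector, restaurant_vectors[name]),
        reverse=True,
    )
    return ranked[:k]

print(search([1.0, 0.0, 0.1]))  # ['Green Garden (vegetarian)']
```

chromadb does the same thing at scale, with persistence, real embedding models, and approximate nearest-neighbor indexing.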

πŸ“Š Set Up Langfuse for Observability

LangFuse is an open-source observability platform specifically designed for LLM applications. It provides comprehensive tracking of your agent interactions, including:

  • Trace Analysis: Complete conversation flows from input to output
  • Performance Metrics: Response times, token usage, and cost tracking
  • Error Monitoring: Failed requests and exception handling
  • User Analytics: Conversation patterns and user engagement

The platform integrates seamlessly with popular frameworks and provides both hosted and self-hosted deployment options.
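If you take the OpenTelemetry route described earlier, Langfuse can also receive traces directly over OTLP; authentication is HTTP Basic auth built from your public and secret keys. A sketch (the `/api/public/otel` path and header format follow Langfuse's OpenTelemetry docs; the keys shown are placeholders):

```python
import base64
import os

# Placeholder keys -- use your project's keys from the Langfuse console.
public_key = "pk-lf-your-public-key"
secret_key = "sk-lf-your-secret-key"

# HTTP Basic auth: base64("public:secret"), per Langfuse's OTel setup docs.
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

# US-region endpoint; EU region uses https://cloud.langfuse.com instead.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://us.cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
```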

Follow the steps in the Langfuse console to create a new project and obtain your public and secret API keys.

Setting Up LangFuse Integration

First, configure your agent to send traces to LangFuse:

```python
import uuid

from langfuse import Langfuse
from strands import Agent

# Initialize the Langfuse client
langfuse = Langfuse(
    secret_key="your-secret-key",
    public_key="your-public-key",
    # host="https://cloud.langfuse.com"   # πŸ‡ͺπŸ‡Ί EU region
    host="https://us.cloud.langfuse.com",  # πŸ‡ΊπŸ‡Έ US region
)

# Create the restaurant recommendation agent with observability
restaurant_agent = Agent(
    name="Restaurant Recommendation Agent",
    model=model,
    tools=[search_restaurants],  # Give the agent access to our search tool
    system_prompt="""You are a helpful restaurant recommendation assistant.
    Use the search_restaurants tool to find information about restaurants based on user queries.
    Provide detailed recommendations based on the search results.
    If asked about restaurants that aren't in the database, politely explain
    that you can only provide information about restaurants in your database.
    Always be friendly, helpful, and concise in your responses.
    """,
    record_direct_tool_call=True,  # Record when tools are used
    trace_attributes={
        "session.id": str(uuid.uuid4()),  # Generate a unique session ID
        "user.id": "user-email-example@domain.com",  # Example user ID
        "langfuse.tags": [
            "Agent-SDK-Example",
            "Strands-Project-Demo",
            "Observability-Tutorial",
        ],
    },
)
```

βœ… Test the Agent with Tracing

Now let's test our agent with a simple query and see how it performs. The agent will use the search tool to find relevant information and then generate a response.

```python
# Test the agent with a simple query
response = restaurant_agent("I'm looking for a restaurant with good vegetarian options. Any recommendations?")
print(response)
```

βœ… Review the traces

After running the agent, you can review the traces in LangFuse:

  1. Go to the tracing menu in your LangFuse project.

  2. Select the trace you want to view.

  3. Examine how the agent processed the request, what tools it used, and what response it generated.


This gives you visibility into how your agent is working and helps you identify any issues or areas for improvement.

Stay tuned for the final part: Evaluation with RAGAS (Retrieval Augmented Generation Assessment), where we'll dive deep into measuring and improving your agent's performance using systematic evaluation metrics!

πŸ”— What's Next?

In the Evaluation with RAGAS part, we'll cover:

  • Setting up the RAGAS evaluation framework
  • Measuring faithfulness and answer relevancy
  • Automated performance assessment
  • Creating feedback loops for continuous improvement

πŸ“š Resources


Thank you!

GitHub repository

