
Conversation


@doronkopit5 doronkopit5 commented Nov 3, 2025

Fixes TLP-1126

Important

Introduces OpenTelemetry MCP server with multi-backend support, LLM-focused tools, and comprehensive configuration and testing.

  • Behavior:
    • Implements OpenTelemetry MCP server with Jaeger, Tempo, and Traceloop backends.
    • Supports CLI/HTTP transports, configuration loading, and LLM-focused tools (trace/span search, get-trace, LLM usage, model stats, expensive/slow traces, tool listing, error discovery).
  • Configuration:
    • config.py: Manages server and backend configurations using environment variables (a minimal sketch follows this list).
  • Models:
    • models.py: Defines data models for traces, spans, and filters.
  • Server:
    • server.py: Main entry point, initializes server and tools, handles backend creation.
  • Tools:
    • Implements tools for searching traces/spans, listing services/models/tools, and analyzing errors, expensive/slow traces, and LLM usage.
  • Utilities:
    • utils.py: Provides utility functions like ISO timestamp parsing.
  • Tests:
    • Adds tests for models and Traceloop backend in test_models.py and test_traceloop.py.
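
As a companion to the Configuration bullet above, here is a minimal sketch of env-driven config loading, assuming pydantic v2 and python-dotenv. The class shape and the OTEL_MCP_* variable names are illustrative, not necessarily those used in config.py.

```python
import os

from dotenv import load_dotenv
from pydantic import BaseModel, HttpUrl


class BackendConfig(BaseModel):
    backend_type: str  # "jaeger" | "tempo" | "traceloop"
    url: HttpUrl
    api_key: str | None = None

    @classmethod
    def from_env(cls) -> "BackendConfig":
        load_dotenv()  # pick up a local .env file if present
        return cls(
            backend_type=os.environ.get("OTEL_MCP_BACKEND", "jaeger"),
            # KeyError if unset; pydantic validates the string as an HTTP URL
            url=os.environ["OTEL_MCP_BACKEND_URL"],
            api_key=os.environ.get("OTEL_MCP_API_KEY"),
        )
```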

This description was created by Ellipsis for 76486fa. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features

    • Full MCP server with selectable Jaeger/Tempo/Traceloop backends, CLI/HTTP transports, and many LLM-focused tools (trace/span search, get-trace, LLM usage, model stats, expensive/slow traces, tool listing, error discovery); tool registration is sketched after this list.
  • Configuration

    • Added example environment file, project metadata, and pinned Python 3.12 runtime.
  • Documentation

    • Major README overhaul and new CLAUDE guide.
  • Docker / Local

    • Multi-stage Docker image, .dockerignore, healthcheck, and local startup script.
  • CI / Tests

    • New CI workflow and expanded test suite with fixtures.
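
As referenced in the features bullet above, tools are exposed through FastMCP. A hedged sketch of registering one such tool, assuming the MCP Python SDK's FastMCP; get_backend and the backend's search method are placeholders, not the PR's actual code.

```python
import json

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("opentelemetry-mcp")


@mcp.tool()
async def search_traces(service_name: str, limit: int = 20) -> str:
    """Search recent traces for a service and return them as a JSON string."""
    backend = await get_backend()  # hypothetical lazy factory (see server.py)
    traces = await backend.search_traces(service=service_name, limit=limit)
    return json.dumps([t.model_dump(mode="json") for t in traces])
```
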
doronkopit5 and others added 2 commits November 2, 2025 16:50
…figurations, core server logic, and tools for querying OpenTelemetry traces. Added support for Jaeger, Tempo, and Traceloop backends, along with data models and testing framework setup.
@doronkopit5 doronkopit5 marked this pull request as draft November 3, 2025 08:08
@coderabbitai
Copy link

coderabbitai bot commented Nov 3, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a new OpenTelemetry MCP Server package: project metadata, packaging, Docker/CI, docs, env config, typed models/attributes, a backend abstraction with Jaeger/Tempo/Traceloop implementations, a client-side FilterEngine, FastMCP server/tools, utilities, tests, and local/startup scripts.

Changes

Cohort / File(s) / Summary
Infra & Editor
.dockerignore, .env.example, .python-version, .github/workflows/ci.yml, .vscode/settings.json
Add Docker ignore, example env vars for backend config, pin Python 3.12, CI workflow (lint/test/matrix build-test), and VSCode UI settings.
Project Build & Docker
pyproject.toml, Dockerfile, start_locally.sh
Add pyproject metadata, dependencies, console entry; multi-stage Dockerfile (builder/runtime, non-root user, healthcheck); local start script.
Documentation
README.md, CLAUDE.md
Add comprehensive README and CLAUDE.md describing architecture, backends, tools, configuration, and development workflow.
Package Init & Version
src/opentelemetry_mcp/__init__.py
Add package initializer with docstring and __version__ = "0.1.0".
Attributes & Models
src/opentelemetry_mcp/attributes.py, src/opentelemetry_mcp/models.py
Add SpanAttributes, SpanEvent, HealthCheckResponse, enums, Filter model/validation, SpanData, LLMSpanAttributes, UsageMetrics, TraceData, Trace/Span queries and summaries, computed properties and helpers.
Config
src/opentelemetry_mcp/config.py
Add BackendConfig and ServerConfig with dotenv loading, validators, from_env constructors, CLI override support and URL validation.
Backend Abstraction & Engine
src/opentelemetry_mcp/backends/__init__.py, .../base.py, .../filter_engine.py
Re-export BaseBackend; add abstract BaseBackend with lazy httpx client, async context management; add FilterEngine.apply_filters for client-side filtering (sketched after this table).
Backends (impl.)
src/opentelemetry_mcp/backends/jaeger.py, .../tempo.py, .../traceloop.py
Implement JaegerBackend, TempoBackend (TraceQL + OTLP parsing), and TraceloopBackend (API v2, hybrid filtering, conversions, health).
Server & Tools
src/opentelemetry_mcp/server.py, src/opentelemetry_mcp/tools/*, src/opentelemetry_mcp/tools/__init__.py
Add FastMCP server, lazy backend factory, Click CLI entry, and many tools (search_traces, get_trace, get_llm_usage, list_services, find_errors, list_models, model_stats, expensive/slow traces, list_llm_tools, search_spans).
Utilities
src/opentelemetry_mcp/utils.py
Add parse_iso_timestamp utility with Z handling and clear error messages (also sketched after this table).
Tests & Fixtures
tests/*
Add test header, fixtures, and unit tests for models and Traceloop backend behavior (tests/conftest.py, tests/test_models.py, tests/test_traceloop.py).
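
Two of the rows above lend themselves to short sketches. First, client-side filtering with dotted-field extraction, as described for filter_engine.py; the operator names and the plain-dict filter shape are assumptions, since the real engine works on Filter models:

```python
from typing import Any


def get_dotted(obj: dict[str, Any], path: str) -> Any:
    """Extract a nested value like 'attributes.model' by walking dotted keys."""
    cur: Any = obj
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


def apply_filters(spans: list[dict[str, Any]], filters: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only spans matching every filter (AND semantics assumed)."""

    def matches(span: dict[str, Any], f: dict[str, Any]) -> bool:
        value = get_dotted(span, f["field"])
        op = f["operator"]
        if op == "exists":
            return value is not None
        if op == "not_exists":
            return value is None
        if op == "eq":
            return value == f["value"]
        if op == "contains":
            return isinstance(value, str) and f["value"] in value
        raise ValueError(f"unsupported operator: {op}")

    return [s for s in spans if all(matches(s, f) for f in filters)]
```

Second, the utils.py row mentions Z handling in parse_iso_timestamp; a plausible minimal version (the actual error wording in the PR will differ):

```python
from datetime import datetime


def parse_iso_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, tolerating a trailing 'Z' (UTC)."""
    # datetime.fromisoformat only accepts 'Z' on Python 3.11+, so normalize
    # explicitly to stay unambiguous about the UTC offset.
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError as exc:
        raise ValueError(f"Invalid ISO-8601 timestamp: {value!r}") from exc
```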

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant MCP as MCP Server (FastMCP)
    participant Config as ServerConfig
    participant Backend as Backend (Jaeger/Tempo/Traceloop)
    participant Filter as FilterEngine
    User->>MCP: invoke tool (e.g. search_traces)
    MCP->>Config: load config / apply CLI overrides
    MCP->>Backend: lazy instantiate & health_check
    Backend-->>MCP: health response
    MCP->>Backend: search_traces(query)
    Backend->>Backend: split native vs client filters
    Backend->>Backend: call provider API (native filters)
    Backend-->>MCP: TraceData list
    MCP->>Filter: apply_filters(traces, client_filters)
    Filter-->>MCP: filtered traces
    MCP-->>User: return JSON result
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Areas needing extra attention:
    • src/opentelemetry_mcp/backends/{jaeger.py,tempo.py,traceloop.py} — API mappings, pagination, query translation, parsing, hybrid filtering, error handling.
    • src/opentelemetry_mcp/backends/filter_engine.py — operator semantics, dotted-field extraction, type conversions, EXISTS/NOT_EXISTS logic.
    • src/opentelemetry_mcp/models.py & src/opentelemetry_mcp/attributes.py — Pydantic validators, aliasing, LLM token extraction, computed properties (token extraction is sketched after this list).
    • src/opentelemetry_mcp/server.py — lazy backend lifecycle across event loops, health-check semantics, and tool error handling.
    • CI/Docker workflow and Dockerfile multi-stage build specifics.
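
The models.py item above flags LLM token extraction and computed properties; an illustrative pydantic v2 sketch, where the gen_ai.usage.* keys follow the OpenTelemetry GenAI semantic conventions but the model shape itself is assumed:

```python
from pydantic import BaseModel, computed_field


class UsageMetrics(BaseModel):
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @computed_field  # exposed as a regular field when serialized
    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @classmethod
    def from_attributes(cls, attrs: dict[str, int]) -> "UsageMetrics":
        # gen_ai.usage.* keys per the OpenTelemetry GenAI semantic conventions
        return cls(
            prompt_tokens=attrs.get("gen_ai.usage.input_tokens", 0),
            completion_tokens=attrs.get("gen_ai.usage.output_tokens", 0),
        )
```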

Possibly related PRs

Poem

🐇 I hopped through traces, tidy and quick,
I counted tokens, parsed spans, did the trick.
Jaeger, Tempo, Traceloop lined in a row,
I stitched their whispers into one bright flow.
A rabbit cheers — the traces grow 🌼

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "feat: Initial commit of OpenTelemetry MCP Server" accurately describes the PR's primary purpose: introducing the initial implementation of an OpenTelemetry MCP Server with multiple backends and tools.
  • Linked Issues Check: ✅ Passed. The PR implements the complete initial server setup from TLP-1126, including multi-backend support (Jaeger, Tempo, Traceloop), LLM-focused tools, configuration management, models, and comprehensive tests.
  • Out of Scope Changes Check: ✅ Passed. All changes are within scope: infrastructure files (.dockerignore, Dockerfile, CI), configuration (.env.example, config.py), core implementation (models, backends, tools, server), and supporting files (documentation, tests, scripts). No unrelated changes detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 98.46%, which is sufficient; the required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch dk/initial-server

Comment @coderabbitai help to get the list of available commands and usage tips.


@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 09e6e55 in 2 minutes and 56 seconds.
  • Reviewed 5352 lines of code in 27 files
  • Skipped 1 file when reviewing.
  • Skipped posting 17 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/openllmetry_mcp/backends/tempo.py:137 (see the sketch after this list)
  • Draft comment:
    Mismatch in key names when extracting operations. The code checks for 'rootServiceName' but then retrieves 'rootTraceName'; verify the intended key.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% Looking at lines 138-139, the code checks if "rootServiceName" in trace_result: and then does operations.add(trace_result.get("rootTraceName", "")). This is indeed suspicious - why check for the existence of one key but then retrieve a different key? This could be intentional if the author wants to only add operation names when a rootServiceName exists (perhaps to filter out certain traces), but it could also be a bug. The logic seems odd: if rootServiceName exists, add the rootTraceName. Without seeing the Tempo API documentation or the actual response format, I cannot be 100% certain this is wrong. However, the comment is pointing out a legitimate code smell that warrants investigation. The comment asks to "verify the intended key" which is asking the author to confirm/explain, which violates the rules about not asking authors to confirm intentions. The comment is phrased as "verify the intended key" which is asking the author to confirm their intention, which violates the rule against asking authors to explain or double-check. Additionally, without seeing the actual Tempo API response format, I cannot be certain this is a bug - it could be intentional logic where the presence of rootServiceName is used as a filter condition before extracting rootTraceName. While the phrasing does ask for verification, the underlying issue it identifies (checking for one key but retrieving another) is a legitimate code smell that could indicate a bug. However, since I cannot definitively prove this is wrong without more context about the Tempo API, and the comment is phrased as a request for verification rather than stating a clear issue, it should be deleted per the rules. The comment should be deleted because it asks the author to "verify" rather than stating a definitive issue, and without access to Tempo API documentation or other context, I cannot confirm this is definitely a bug versus intentional filtering logic.
2. pyproject.toml:2
  • Draft comment:
    Typo alert: The project name "openllmetry-mcp" on line 2 might be a typo. Should it be "opentelemetry-mcp"? Please verify the intended name.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to verify if "openllmetry" is a typo for "opentelemetry". However, the name appears consistently 5+ times throughout the file in different contexts (project name, description, script name, package references). The description explicitly mentions "OpenLLMetry support", which suggests this is a specific product/project name, not OpenTelemetry. This appears to be intentional branding - "OpenLLMetry" seems to be a specific tool/product related to OpenTelemetry but with its own name. The comment is speculative and asks the author to verify, which violates the rules. Could "OpenLLMetry" be a lesser-known variant or typo that was propagated throughout the file? Maybe the author made the same mistake in multiple places and this comment would catch it. While it's theoretically possible the typo was repeated, the fact that it appears in the description as "OpenLLMetry support" (capitalized as a proper noun) and is used consistently in 5+ different contexts strongly indicates this is intentional branding. Additionally, the comment asks the author to "verify", which is explicitly against the rules - comments should not ask for verification or confirmation. This comment should be deleted. It's a speculative comment that asks the author to verify their intention, which violates the rules. The consistent use of "openllmetry" throughout the file (including in the description as a proper noun) provides strong evidence this is intentional naming, not a typo.
3. pyproject.toml:4
  • Draft comment:
    The description on line 4 mentions "OpenTelemetry" traces and "OpenLLMetry support". This inconsistency may be a typographical error. Please check the intended naming.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment assumes "OpenLLMetry" is a typo of "OpenTelemetry", but this appears to be incorrect. OpenLLMetry seems to be the actual name of the project/tool (as evidenced by the package name "openllmetry-mcp" and module name "openllmetry_mcp"). The description is saying this MCP server queries OpenTelemetry traces from LLM applications that have OpenLLMetry support - meaning it works with the OpenTelemetry standard and specifically supports the OpenLLMetry implementation. This is not a typo or inconsistency, it's intentional naming. The comment is asking the author to "check the intended naming" which violates the rule about not asking authors to confirm their intention. I might be missing context about whether OpenLLMetry is actually a real project or if it's indeed a typo. However, given that the entire project is named "openllmetry-mcp" consistently throughout, it seems very unlikely this is a typo. Even if I'm uncertain about the external context, the consistent use of "openllmetry" throughout the entire file (project name, module name, coverage settings) provides strong evidence this is intentional. The comment is speculative ("may be a typographical error") and asks the author to verify, which violates the rules. This comment should be deleted. It incorrectly assumes "OpenLLMetry" is a typo when it's actually the intentional project name used consistently throughout the file. The comment is speculative and asks the author to verify their intention, which violates the review rules.
4. pyproject.toml:31
  • Draft comment:
    Typo alert: The script name "openllmetry-mcp" on line 31 seems inconsistent with the repository name "opentelemetry-mcp-server". Confirm if the spelling is intentional.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to "confirm if the spelling is intentional" - this is exactly the type of comment the rules say to avoid. The rules explicitly state "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended". Additionally, the naming is actually consistent within the file itself - "openllmetry" appears multiple times (project name, script name, module name, description). The comment references a "repository name" that isn't visible in the diff, which means it's making assumptions about external context. This is speculative and not based on the actual code changes. Perhaps the repository name really is "opentelemetry-mcp-server" and there genuinely is an inconsistency that would confuse users. The comment might be catching a legitimate naming issue that could cause problems with installation or discovery of the package. Even if there is an inconsistency with the repository name, the comment is asking the author to "confirm" rather than stating a clear problem. The naming within the file itself is completely consistent. Without seeing the repository name in the diff, I cannot verify this claim, and per the rules, I should not keep speculative comments that require external context. This comment should be deleted. It asks the author to confirm their intention rather than pointing out a clear issue, and it references information (repository name) that isn't visible in the diff. The naming within the file is internally consistent.
5. src/openllmetry_mcp/__init__.py:1
  • Draft comment:
    Typographical error: The docstring reads "OpenLLMetry MCP Server"; it seems you intended to write "OpenTelemetry MCP Server". Please update the spelling accordingly.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The module is literally named openllmetry_mcp (visible in the file path), so "OpenLLMetry" is clearly the intended project name, not a typo. The docstring correctly uses both "OpenLLMetry" (the project name) and "OpenTelemetry" (the technology it works with). The comment is fundamentally wrong - it's asking to change the project's name to something else. This violates the rule about not making comments unless there's clearly a code change required, and this is definitely not a required change. Could "OpenLLMetry" actually be a typo that was propagated to the module name as well? Maybe the entire project was misnamed from the start? While it's theoretically possible the entire project is misnamed, that would be a much larger architectural decision beyond the scope of a single line comment on a PR. The consistent use of "OpenLLMetry" in the module path strongly suggests this is intentional branding. The automated tool is almost certainly wrong here. This comment should be deleted. "OpenLLMetry" is clearly the intentional project name (as evidenced by the module path), not a typo for "OpenTelemetry". The comment is incorrect and would ask the author to change their project's name unnecessarily.
6. src/openllmetry_mcp/attributes.py:12
  • Draft comment:
    The docstring simultaneously references both "OpenTelemetry semantic conventions" and "OpenLLMetry standard" (e.g., in the support for gen_ai.* attributes). Please clarify and standardize the naming to avoid confusion.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
7. src/openllmetry_mcp/backends/__init__.py:1
  • Draft comment:
    Typo: The docstring refers to "OpenTelemetry" while the project and module names use "OpenLLMetry." For consistency and clarity, please update the docstring to "Backend implementations for different OpenLLMetry trace storage systems."
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This comment is about documentation/naming consistency. The project is clearly named "openllmetry" but the docstring refers to "OpenTelemetry". However, OpenTelemetry is an actual technology/standard that this project likely uses or wraps. OpenLLMetry might be a wrapper around OpenTelemetry for LLM observability. The docstring might be technically correct - these backends could be for different OpenTelemetry trace storage systems, even if the wrapper is called OpenLLMetry. Without more context about what OpenLLMetry is and its relationship to OpenTelemetry, I cannot be certain this is actually a typo or inconsistency. This seems like a minor documentation preference issue that lacks strong evidence of being incorrect. I might be missing that OpenLLMetry is indeed meant to be a distinct branding from OpenTelemetry, and using the wrong term could confuse users. The comment could be highlighting a legitimate branding/consistency issue that should be fixed. While there could be a branding concern, OpenTelemetry is likely the underlying technology being used, so the docstring might be technically accurate. This is a subjective documentation preference without strong evidence it's wrong. The rules state not to make comments that are "obvious or unimportant" and to only keep comments with "STRONG EVIDENCE" of correctness. This comment is about a minor documentation preference regarding terminology. Without strong evidence that "OpenTelemetry" is incorrect (it's likely the underlying technology), and given this is a subjective style issue rather than a clear bug, this comment should be deleted.
8. src/openllmetry_mcp/backends/base.py:1
  • Draft comment:
    The module and repository names use openllmetry (with double 'll'), yet the docstring on this line refers to "OpenTelemetry trace storage systems". Please confirm which naming convention is intended and standardize the spelling.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is asking the PR author to "confirm which naming convention is intended" - this is explicitly asking for confirmation/explanation, which violates the rule "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended". The comment doesn't definitively state there's an error - it's speculative about whether there's a naming inconsistency. "OpenLLMetry" appears to be the project name (a play on OpenTelemetry for LLM observability), while "OpenTelemetry" is the standard being used. This is likely intentional branding, not an error. The comment doesn't provide a clear, actionable code change. Could "openllmetry" actually be a typo for "OpenTelemetry"? Maybe the project name should consistently use one or the other. Without seeing the rest of the codebase or project documentation, I can't be 100% certain this is intentional branding versus an actual inconsistency that needs fixing. Even if there is some inconsistency, the comment is phrased as "Please confirm which naming convention is intended" which is explicitly asking for confirmation rather than stating a definite issue. This violates the rules. If the tool was confident there was an error, it should have said "Change X to Y" not "confirm which is intended". This comment should be deleted because it asks the PR author to confirm their intention rather than identifying a definite issue requiring a code change. The phrasing "Please confirm which naming convention is intended" violates the rule against asking authors to verify or explain their choices.
9. src/openllmetry_mcp/backends/jaeger.py:1
  • Draft comment:
    There's a potential typo in the package name: the module is under openllmetry_mcp (e.g. imported on line 7) but the repository is named opentelemetry-mcp-server and the commit title mentions "OpenLLMetry". Please verify if openllmetry is intentional or if it should be corrected to opentelemetry.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is asking the author to "verify" if the naming is intentional. This violates the rule about not asking the PR author to confirm their intention or to double-check things. The package name openllmetry_mcp appears to be used consistently throughout the imports in this file. Without seeing other files in the codebase or the actual repository structure, I cannot determine if this is actually a problem. The comment is speculative - it's suggesting there might be an issue based on the repository name, but the code itself is internally consistent. This is exactly the type of "verify that..." comment that should be deleted according to the rules. However, if there truly is a mismatch between the package name used in imports and the actual package structure, this could be a real issue that would cause import errors. The comment might be catching a legitimate problem where the package was incorrectly named. While a package naming issue could be real, this would be caught immediately by the build system when trying to import these modules. The rules explicitly state "Do NOT comment on anything that would be obviously caught by the build, such as variable renames, file renames, or changed imports." If the package name is wrong, the imports will fail and the build will catch it. Additionally, the comment is phrased as "please verify if..." which is explicitly discouraged. This comment should be deleted because it asks the author to verify/confirm something, and any actual import/package naming issue would be caught by the build system. The comment is speculative and doesn't provide strong evidence of a problem.
10. src/openllmetry_mcp/backends/traceloop.py:1
  • Draft comment:
    The module docstring on line 1 refers to "OpenLLMetry" traces, but later in the class docstring (line 19) the term "OpenTelemetry" is used. This inconsistency might confuse readers. Please standardize the naming to clarify the intended terminology.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a documentation consistency comment about terminology. The comment is technically correct - there is an inconsistency. However, looking more carefully at line 19-20, it says "OpenTelemetry traces with OpenLLMetry semantic conventions" which is actually more precise and technically accurate than just saying "OpenLLMetry traces". OpenTelemetry is the underlying tracing standard, and OpenLLMetry adds semantic conventions on top. This is a minor documentation wording issue that doesn't affect functionality. According to the rules, I should not keep comments that are "obvious or unimportant" and this seems to fall into that category - it's a very minor documentation wording preference that doesn't require a code change. The terminology difference might actually be intentional and more accurate - "OpenTelemetry traces with OpenLLMetry semantic conventions" is technically more precise than just "OpenLLMetry traces". This could be a case where the inconsistency is actually providing useful clarification rather than confusion. While the critique is valid that the longer description might be more technically accurate, the comment is still about a very minor documentation wording preference that doesn't require any functional code change. This falls under "obvious or unimportant" comments that should be removed. This comment should be deleted. It's about a minor documentation wording preference that doesn't affect functionality. The "inconsistency" may actually be intentional to provide more technical precision in the class docstring. This is too minor and unimportant to warrant a review comment.
11. src/openllmetry_mcp/config.py:14
  • Draft comment:
    Typographical inconsistency: The file-level docstring and overall naming use "OpenLLMetry" (as in the PR title), but the BackendConfig docstring refers to "OpenTelemetry". Please align the product name for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is about a very minor naming/documentation inconsistency. The distinction between "OpenTelemetry" and "OpenLLMetry" is actually meaningful - OpenTelemetry is the standard tracing protocol/technology, while OpenLLMetry is the product name. The docstring is technically correct as it's configuring an OpenTelemetry backend. This is a purely informative comment about a minor stylistic choice, not a code change that's clearly required. According to the rules, I should not comment unless there is clearly a code change required, and I should not make comments that are obvious or unimportant. This falls into the category of an unimportant stylistic suggestion. Could the inconsistency actually be intentional and technically accurate? OpenTelemetry is the actual technology standard being used, so referring to it as "OpenTelemetry trace backend" might be more precise than "OpenLLMetry trace backend". The critique is valid - this is likely intentional. OpenLLMetry is the product/server name, but it's configuring an OpenTelemetry backend. The current naming is technically accurate and the suggested change might actually make it less precise. This is definitely not a required code change. This comment should be deleted. It's a minor stylistic suggestion about documentation that isn't clearly required. The current naming is technically accurate (OpenTelemetry is the underlying technology), and this type of purely informative comment about minor inconsistencies violates the rule against comments that aren't clearly necessary.
12. src/openllmetry_mcp/tools/__init__.py:1
  • Draft comment:
    There appears to be an inconsistency in the naming: the commit title and directory use 'OpenLLMetry' (with double 'L') while the docstring mentions 'OpenTelemetry'. Please verify and adjust the docstring to maintain consistent naming across the project.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is tricky. OpenLLMetry and OpenTelemetry are actually different things - OpenTelemetry is a well-known observability framework, while OpenLLMetry appears to be a project built on top of it. The docstring might be intentionally saying "querying OpenTelemetry traces" because that's what the tools actually do - they query traces that are in OpenTelemetry format. The comment assumes there's an inconsistency, but this could be intentional and correct. Without seeing the broader codebase or understanding the project's purpose, I cannot determine if this is truly an error. The comment is speculative and asks the author to "verify" which violates the rules. However, it's possible that this really is a typo and the docstring should say "OpenLLMetry traces" to match the project name. The comment does provide a concrete suggestion for a code change, which makes it actionable rather than purely informative. While the comment is actionable, it's fundamentally speculative because OpenTelemetry and OpenLLMetry could both be correct in this context. The comment explicitly asks to "verify" which violates the rules. Without strong evidence that this is wrong, and given that OpenTelemetry is a real framework that this project likely integrates with, I should delete this comment. Delete this comment. It's speculative and asks the author to "verify" rather than pointing out a definite error. OpenTelemetry is likely the correct term here as it's a real observability framework, and the tools may genuinely be querying OpenTelemetry traces.
13. tests/__init__.py:1
  • Draft comment:
    Typographical error: It looks like "OpenLLMetry" might be a typo. Consider replacing it with "OpenTelemetry".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% "OpenLLMetry" appears to be a portmanteau of "OpenTelemetry" and "LLM" (Large Language Model), which would make sense for a project related to LLM observability or telemetry. This is likely an intentional name, not a typo. The comment is speculative and assumes the author made a mistake without strong evidence. According to the rules, I should not keep speculative comments and should only keep comments with strong evidence of being correct. I might be missing context about the project's actual name or documentation that would confirm this is indeed a typo. Perhaps there are other files in the codebase that use "OpenTelemetry" consistently and this is the only instance of "OpenLLMetry". Even if other files use "OpenTelemetry", the fact that this appears to be a deliberate naming choice (combining OpenTelemetry with LLM concepts) and the lack of strong evidence that it's a typo means I should delete this comment. The rules state I need STRONG EVIDENCE to keep a comment, and speculation is not sufficient. This comment should be deleted because it's speculative and lacks strong evidence that "OpenLLMetry" is a typo rather than an intentional project name. The comment assumes an error without confirmation.
14. tests/conftest.py:5
  • Draft comment:
    Typo: The module name 'openllmetry_mcp' appears in the import statement on this line. Double-check if this is the intended spelling (should it be 'opentelemetry_mcp'?)
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is speculative - it's asking the author to "double-check" if the spelling is correct. The tool is guessing that it might be a typo based on similarity to "opentelemetry", but there's no strong evidence that this is wrong. The name openllmetry_mcp could be intentional (perhaps "open LLM etry" for LLM telemetry). If this were actually a typo causing an import error, it would be caught by the build/tests immediately. The comment violates the rule about not making speculative comments and not asking the author to confirm their intention. It also would be caught by the build if it were wrong. Could the module name actually be wrong? Maybe the author made a typo that would cause import failures. However, if the imports are wrong, the tests would fail immediately when run, so this would be caught by the build. Even if this is a typo, it would be immediately caught when running the tests or any build process that imports this file. The rules explicitly state not to comment on things that would be obviously caught by the build, such as imports. This is a speculative comment asking for confirmation rather than pointing out a definite issue. This comment should be deleted. It's speculative, asks the author to double-check/confirm their intention, and any import error would be immediately caught by the build. There's no strong evidence that this is actually wrong.
15. tests/conftest.py:6
  • Draft comment:
    Typo: The module name 'openllmetry_mcp' appears in the import statement on this line. Please verify if the correct spelling is 'opentelemetry_mcp'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is asking the author to "verify" if the spelling is correct, which violates the rule about not asking the PR author to confirm their intention or double-check things. Additionally, this is speculative - there's no evidence in the diff that openllmetry_mcp is wrong. It could be intentional (perhaps "openllmetry" is a portmanteau of "open" + "LLM" + "telemetry"). If this were actually a typo, it would be caught by the build system when the imports fail. The rules explicitly state not to comment on things that would be obviously caught by the build, such as changed imports. However, if this is genuinely a typo and the module should be opentelemetry_mcp, catching it now could save time. The name does seem unusual compared to the standard OpenTelemetry naming convention. Even if it seems unusual, the rules are clear: don't ask the author to verify or confirm things, and don't comment on import issues that would be caught by the build. If the import is wrong, the tests won't run. The comment provides no strong evidence that this is actually incorrect. This comment should be deleted. It asks the author to "verify" something (which violates the rules), is speculative without evidence, and addresses an import issue that would be caught by the build if it were actually wrong.
16. tests/test_traceloop.py:7
  • Draft comment:
    Typo: The module name openllmetry_mcp may be misspelled. Did you mean opentelemetry_mcp?
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The automated tool is suggesting that openllmetry_mcp might be misspelled as opentelemetry_mcp. However, this appears to be speculation. The name openllmetry could be intentional - it might stand for "Open LLM Telemetry" or similar, which would make sense given the context of tracing LLM calls (note the references to "gen_ai.system", "gen_ai.request.model", etc. in the test). The module name is used consistently. Without seeing other files in the codebase or package configuration, I cannot determine if this is actually a typo. The comment is speculative and would require the PR author to confirm, which violates the rules. I might be missing context from other files in the codebase that would show whether this module name is correct or not. Perhaps there's a package.json, setup.py, or other configuration file that would clarify the intended name. Even if other files might provide context, the rule states I should only keep comments with STRONG EVIDENCE that they are correct. Since I cannot see that evidence in the diff, and the comment is purely speculative, I should delete it. The rule explicitly states to ignore cross-file issues and only think about the file being reviewed. This comment is speculative without strong evidence. The module name openllmetry_mcp could be intentional (possibly "Open LLM Telemetry MCP"). The comment asks the author to confirm/verify, which violates the rules. I should delete this comment.
17. uv.lock:2981
  • Draft comment:
    Typo: The package name "openllmetry-mcp" appears in the lock file. Did you mean "opentelemetry-mcp"?
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
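
Draft comment 1 above flags a checked-key/read-key mismatch in tempo.py. A reconstruction of the suspect pattern and one possible tightening, assuming the Tempo search response is a list of dicts carrying rootServiceName/rootTraceName keys; this is a sketch, not the PR's exact code:

```python
def extract_operations(search_results: list[dict]) -> set[str]:
    operations: set[str] = set()
    for trace_result in search_results:
        # Pattern flagged by the draft comment: the key that is checked
        # ("rootServiceName") differs from the key that is read ("rootTraceName").
        if "rootServiceName" in trace_result:
            operations.add(trace_result.get("rootTraceName", ""))
    return operations


def extract_operations_strict(search_results: list[dict]) -> set[str]:
    # If both were meant to be the same field, guard on the key that is
    # actually read, so empty strings are never added.
    return {
        name
        for trace_result in search_results
        if (name := trace_result.get("rootTraceName"))
    }
```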

Workflow ID: wflow_yxYR2psBQsR5AARE

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@doronkopit5 doronkopit5 marked this pull request as ready for review November 10, 2025 20:38

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 2911eba in 3 minutes and 12 seconds.
  • Reviewed 8026 lines of code in 35 files
  • Skipped 1 file when reviewing.
  • Skipped posting 15 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. .github/workflows/ci.yml:18
  • Draft comment:
    Using version "latest" for the setup-uv action may lead to non‐deterministic builds. Consider pinning a specific version for reproducibility.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This comment is about a best practice for CI/CD - pinning versions for reproducibility. However, I need to consider the rules carefully. The comment is technically correct that using "latest" can lead to non-deterministic builds. But looking at the rules: "Do NOT comment unless there is clearly a code change required" and "Do NOT make speculative comments". This feels like a best practice suggestion rather than a clear bug or issue. The code will work fine with "latest", it's just not the most reproducible approach. Also, the comment is somewhat speculative about "may lead to" issues. The rules say comments suggesting code quality refactors are good if actionable and clear, which this is. But the rules also emphasize only keeping comments with STRONG EVIDENCE of correctness and clear necessity. This is more of a "nice to have" best practice rather than a critical issue. This could be considered a legitimate code quality suggestion that is actionable and clear. Version pinning is a well-established best practice in CI/CD, and the comment provides a specific alternative. The phrase "may lead to" could be seen as speculative, but it's also a documented concern in CI/CD practices. While version pinning is indeed a best practice, the rules emphasize not making comments unless there's clearly a code change REQUIRED. The use of "may lead to" indicates this is not a definite issue but a potential one. The code works as-is, and this is more of an optimization/best practice suggestion rather than a bug fix. Given the instruction to err on the side of deleting comments unless there's STRONG EVIDENCE they're necessary, this falls into the category of "nice to have" rather than "must fix". This comment should be deleted. While it represents a valid best practice, it's not a required code change - the workflow will function correctly with "latest". The speculative language ("may lead to") and the fact that this is an optimization rather than a bug fix means it doesn't meet the threshold of STRONG EVIDENCE for keeping.
2. src/opentelemetry_mcp/config.py:94 (sketched at the end of this list)
  • Draft comment:
    In apply_cli_overrides, directly calling HttpUrl(backend_url) may not properly parse/validate the URL. Consider using a dedicated parsing method for URL conversion.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment raises a valid concern. In Pydantic v2, HttpUrl is not meant to be instantiated directly like HttpUrl(backend_url). This pattern works in some cases but is not the recommended approach. The proper way would be to use pydantic.TypeAdapter(HttpUrl).validate_python(backend_url) or to use model validation. However, I need to consider: 1) This is a new file, so the comment is about new code. 2) The comment is somewhat vague ("may not properly parse/validate") rather than definitively stating it's wrong. 3) Looking at the existing code at line 50, they use a different pattern (passing string directly to model constructor with type: ignore). 4) The comment does suggest an actionable change, though it's not specific about what "dedicated parsing method" means. 5) Whether HttpUrl(backend_url) actually works depends on the Pydantic version being used. The comment is speculative ("may not properly parse/validate") rather than definitively stating there's a bug. Without knowing the exact Pydantic version or testing the code, I cannot be certain this will fail. The comment also doesn't provide a specific solution, just vaguely suggests "a dedicated parsing method." This violates the rule about not making speculative comments. While the comment is somewhat speculative, it does point to a real inconsistency in the code - line 50 uses one pattern (passing string to model constructor) while line 95 uses another pattern (calling HttpUrl directly). However, the comment doesn't clearly explain what's wrong or provide a concrete fix. Given the rule to only keep comments with STRONG EVIDENCE of correctness, and this being speculative language ("may not"), I should delete it. The comment uses speculative language ("may not properly parse/validate") and doesn't provide concrete evidence that the code is broken. While there may be a better way to handle URL validation in Pydantic, without strong evidence that this approach is incorrect, the comment should be deleted per the rules about avoiding speculative comments.
3. src/opentelemetry_mcp/server.py:33 (sketched at the end of this list)
  • Draft comment:
    Global mutable state (_backend, _config) is used for backend initialization. Consider encapsulating backend creation within a class or context to avoid side‐effects in concurrent environments.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment raises a valid architectural concern about global mutable state, but I need to consider: 1) Is this a NEW file (yes, it is - the diff shows it's being added), so the comment is about new code. 2) Does the comment identify a DEFINITE issue or is it speculative? The phrase "Consider encapsulating" suggests it's a recommendation, not a bug report. 3) Is there STRONG EVIDENCE of a concurrency problem? The code uses lazy initialization with a None check, which could have race conditions if multiple coroutines call _get_backend() simultaneously before _backend is set. However, I don't see evidence that FastMCP would call multiple tools concurrently in a way that would trigger this. 4) The rules say "Do NOT make speculative comments" and "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This seems like a speculative architectural suggestion rather than a clear bug. The comment might be identifying a real race condition - if two async tool calls happen simultaneously before _backend is initialized, both could pass the "if _backend is None" check and create two backend instances. This is a legitimate concurrency concern in async Python code. The suggestion to encapsulate in a class or context is actionable. While there could theoretically be a race condition, there's no evidence that FastMCP actually calls tools concurrently in a way that would trigger this issue. The comment uses "Consider" language which makes it a suggestion rather than identifying a definite bug. Without seeing how FastMCP handles concurrent requests or evidence that this pattern causes problems, this is speculative. The rules explicitly state to avoid speculative comments like "If X, then Y is an issue" - only comment if it's definitely an issue. This comment should be deleted. It's a speculative architectural suggestion about potential concurrency issues without strong evidence that the pattern used here will actually cause problems. The code appears intentionally designed this way for event loop management, and there's no clear indication that FastMCP's usage pattern would trigger race conditions.
4. tests/test_traceloop.py:60
  • Draft comment:
    The test_build_filters_for_search manually constructs filters. To improve robustness, consider using the backend's built‐in filter conversion logic instead of manually building expected dictionaries.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%.
5. start_locally.sh:15
  • Draft comment:
    The startup script properly checks for 'uv'. Ensure that the chosen backend configuration (Traceloop here) is documented clearly for developers.
  • Reason this comment was not posted:
    Confidence changes required: 20% <= threshold 50%.
6. tests/test_models.py:35
  • Draft comment:
    Tests for data models look solid; the use of 'model_validate' to create SpanAttributes is a concise approach.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%.
7. src/opentelemetry_mcp/models.py:55 (sketched at the end of this list)
  • Draft comment:
    In the Filter model validator, ensure that error messages use consistent quoting and that the checks fully cover edge cases for operators (e.g. BETWEEN requires exactly 2 values).
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%.
8. README.md:1
  • Draft comment:
    Typographical Inconsistency: The header on line 1 reads "OpenTelemetry-MCP-Server" while subsequent mentions (e.g., in the feature list and commands) use "OpenLLMetry". Please ensure the project name is consistent throughout the README.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment suggests changing "OpenTelemetry-MCP-Server" to "OpenLLMetry-MCP-Server". However, looking at the context, this seems incorrect. OpenTelemetry is the standard tracing protocol, and OpenLLMetry is a specific library that instruments LLMs using OpenTelemetry. The project appears to be a general OpenTelemetry MCP server that happens to have special support for OpenLLMetry conventions. Line 2 says it's for "querying OpenTelemetry traces across multiple backends" - this is the correct scope. The comment seems to misunderstand that these are two different but related concepts, not a typo. The actual inconsistency might be between the title and the command/directory names (openllmetry-mcp vs OpenTelemetry-MCP-Server), but the comment is suggesting the wrong fix. I might be wrong about the author's intent. Perhaps the project really is meant to be called "OpenLLMetry-MCP-Server" since the directory is "openllmetry-mcp" and all commands use "openllmetry-mcp". Maybe the title on line 1 is actually the mistake, and it should match the directory/command naming. The comment might be correct but for the wrong reasons. Even if there is a naming inconsistency, the comment is speculative about what the "correct" name should be. The comment assumes the title should be changed to match one usage pattern, but it could equally be that the directory/commands should be renamed, or that both names are intentionally different (OpenTelemetry for the broader scope, openllmetry for the specific implementation). Without seeing other files or understanding the author's intent, this is not clearly actionable. The comment is asking the author to "ensure consistency" which is essentially asking them to verify/confirm their intention - this violates the rule about not asking authors to confirm or explain. This comment should be deleted. It's speculative about whether there's actually an inconsistency (OpenTelemetry and OpenLLMetry may be intentionally different), and it's essentially asking the author to verify their naming choices rather than pointing to a clear error. The suggested fix may not be correct.
9. pyproject.toml:4
  • Draft comment:
    Typographical inconsistency: In the description, 'OpenTelemetry' is correctly capitalized in one instance, while 'Opentelemetry' in 'with Opentelemetry support' does not match. Please use consistent capitalization (e.g., 'OpenTelemetry') throughout.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and focuses on a typographical inconsistency in the PR description. It does not provide a code suggestion or address a potential issue in the code itself. According to the rules, comments should not ask the PR author to update the PR description or focus on purely informative aspects.
10. src/opentelemetry_mcp/backends/tempo.py:218
  • Draft comment:
    Typo: In the 'get_service_operations' method, the code checks for the key 'rootServiceName' but then retrieves the value using 'rootTraceName'. Please verify the correct field name.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment flags a potential inconsistency between checking for "rootServiceName" and retrieving "rootTraceName". Looking at the context, this is in get_service_operations which should return operation names, not service names. The method filters traces by service_name in the TraceQL query (line 206), so all results should be for that service. The check on line 218 might be validating that the trace result has a service name before extracting the operation name. However, it's also possible this is a copy-paste error from list_services where the check makes more sense. Without seeing the actual Tempo API response format, I cannot definitively say if this is a bug. The comment asks to "verify" which violates the rule about not asking for verification. The comment uses "Please verify" language which is explicitly against the rules - we should not ask the PR author to verify or confirm things. Additionally, without documentation of the Tempo API response format, I cannot be certain this is actually a bug. It could be intentional defensive programming to check for service name existence before extracting operation name. While the "verify" language is problematic, there is a reasonable chance this is a legitimate bug - it seems odd to check for rootServiceName when we're trying to extract rootTraceName. However, it could also be intentional validation logic. The uncertainty and the "verify" language both suggest this comment should be deleted. Delete this comment because it asks the author to "verify" which violates the rules, and because without seeing the actual API response format or having stronger evidence, we cannot be certain this is actually a bug versus intentional validation logic.
11. src/opentelemetry_mcp/backends/traceloop.py:1
  • Draft comment:
    Typo notice: The docstring refers to "Opentelemetry traces". The standard spelling is "OpenTelemetry". Please verify if this should be updated for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a comment about a typo/capitalization issue. According to the rules, I should not comment on things that are "obvious or unimportant." A capitalization inconsistency in a docstring could be considered minor. However, the comment is pointing out a legitimate inconsistency - "OpenTelemetry" is the official project name and should be capitalized consistently. The comment is actionable (fix the capitalization) and clear. But is this too minor? The rules say "Do NOT make comments that are obvious or unimportant." Typos in documentation could be considered important for professionalism, but this is borderline. The comment also asks to "verify if this should be updated" which sounds like asking the author to confirm, which violates the rule about not asking authors to confirm things. The comment asks the author to "verify if this should be updated" which is asking for confirmation rather than stating a clear issue. This violates the rule about not asking authors to confirm their intention. Also, a simple capitalization fix in a docstring might be too minor to warrant a review comment. While the phrasing does ask for verification, the core issue is valid - "OpenTelemetry" is a proper noun and should be capitalized consistently. However, the "please verify" language makes this more of a suggestion than a clear actionable item. Given the rules emphasize not making obvious or unimportant comments, and not asking for confirmation, this comment should likely be removed. The comment identifies a real capitalization inconsistency, but it asks the author to "verify" rather than stating it as a clear issue. Given the rules against asking for confirmation and against obvious/unimportant comments, this should be deleted.
12. src/opentelemetry_mcp/config.py:1
  • Draft comment:
    Typo: The module docstring uses "Opentelemetry" while the BackendConfig docstring uses "OpenTelemetry". Consider standardizing to "OpenTelemetry" across the file for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% This is a comment about a typo/inconsistency in documentation strings. The comment is technically correct - there is an inconsistency. However, I need to consider: (1) Is this about a code change? Yes, this is a new file being added. (2) Is this actionable? Yes, it's a simple fix. (3) Is this important enough? This is debatable - it's just a docstring capitalization issue. The rules say "Do NOT make comments that are obvious or unimportant." A typo in a docstring could be considered unimportant, but consistency in naming (especially for proper nouns like "OpenTelemetry") is generally considered good practice. The comment is also very clear and actionable. This might be considered too minor or obvious - it's just a capitalization issue in a docstring. The rules explicitly say not to make comments that are "obvious or unimportant", and a docstring typo could fall into that category. It doesn't affect functionality at all. While it's minor, proper noun capitalization in documentation is a reasonable quality concern, especially for a project name. It's not just a random typo but an inconsistency within the same file. However, given the emphasis on only keeping comments with STRONG EVIDENCE of being important, and the rule against obvious/unimportant comments, this leans toward deletion. This comment is technically correct and actionable, but it's about a minor documentation inconsistency that doesn't affect functionality. Given the rules emphasizing not to comment on obvious or unimportant issues, and the instruction to err on the side of deletion unless there's strong evidence the comment is important, this should likely be deleted.
13. src/opentelemetry_mcp/models.py:1
  • Draft comment:
    Typographical consistency: The module docstring uses both 'OpenTelemetry' and 'Opentelemetry' (e.g., in "Data models for OpenTelemetry traces and Opentelemetry conventions"). Consider standardizing the capitalization for clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a comment about documentation style and consistency, not about functional code. The rules state "Do NOT comment unless there is clearly a code change required" and "Do NOT make comments that are obvious or unimportant." While the inconsistency is real, this is a very minor documentation style issue that doesn't affect functionality. The comment is essentially a nitpick about capitalization in a docstring. According to the rules, I should only keep comments that require clear code changes and are not obvious or unimportant. A typo/capitalization inconsistency in a docstring is arguably unimportant and doesn't require a code change in the functional sense. However, documentation quality matters, and consistent terminology is important for professional codebases. The official name is "OpenTelemetry" (with capital T), so this is technically a factual error, not just a style preference. Some teams do care about documentation quality and would want this fixed. While documentation quality does matter, the rules explicitly state not to make comments that are "obvious or unimportant." A capitalization inconsistency in a docstring falls into the category of minor/unimportant issues that don't warrant a review comment. If this were a variable name or API endpoint, it would be different, but for a docstring, this is too minor to keep. This comment should be deleted. It's about a minor typographical inconsistency in a docstring that doesn't affect functionality. While technically correct, it falls under "obvious or unimportant" comments that should not be made according to the review rules.
14. src/opentelemetry_mcp/server.py:1
  • Draft comment:
    The module docstring starts with "Opentelemetry MCP Server". Consider verifying if this spelling is intentional or if it should be "OpenTelemetry MCP Server" to match standard terminology (and possibly the PR title).
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This is a style/consistency comment about capitalization. Looking at the rules, I should not keep comments that are "obvious or unimportant." However, this could be considered a code quality issue if there's inconsistent usage throughout the file. The comment is about a change (this is a new file), so it passes the "is_about_changes" test. The issue is that this is a very minor stylistic concern - it's about capitalization in a docstring. The official OpenTelemetry project uses capital 'T', but the codebase itself uses lowercase in package names. This seems like a nitpick that doesn't require action. The rules say "Do NOT make comments that are obvious or unimportant" - this falls into that category as it's a minor capitalization issue that doesn't affect functionality. While there is inconsistency in the file (some places use "OpenTelemetry" with capital T, others use "Opentelemetry"), this is a very minor stylistic issue. The comment asks the author to "verify if this spelling is intentional" which is exactly the kind of "confirm your intention" comment that the rules say to avoid. The comment is phrased as asking for verification ("Consider verifying if this spelling is intentional"), which violates the rule against asking authors to confirm intentions. Even if there's a legitimate inconsistency, this is too minor and the phrasing makes it non-actionable. This comment should be deleted. It's asking the author to verify/confirm their intention about a minor capitalization issue, which violates the rules. The comment is not actionable and falls into the "obvious or unimportant" category.
15. tests/__init__.py:1
  • Draft comment:
    Typographical note: The docstring reads "Tests for Opentelemetry MCP Server." Consider whether "Opentelemetry" should be corrected to "OpenTelemetry" for consistency with the project name.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This comment is about a very minor typo/capitalization issue in a docstring. According to the rules, I should not comment on things that are "obvious or unimportant." A docstring capitalization issue is quite minor. However, the rules also say "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This is actionable and clear - it's a simple spelling/capitalization fix. But is this really a "code quality refactor"? It's more of a cosmetic/documentation fix. The comment uses language like "Consider whether" which is somewhat tentative and not strongly assertive. The rules say not to make comments unless there is "clearly a code change required" - this seems like a very minor documentation issue, not a clear code change requirement. Overall, this seems like an unimportant/obvious comment about a trivial documentation detail. The proper capitalization of a project name could be considered important for consistency and professionalism. If "OpenTelemetry" is the official project name, then using the correct capitalization throughout the codebase (including docstrings) could be a legitimate quality concern, not just a trivial cosmetic issue. While proper project name capitalization has some merit, this is still a very minor documentation issue in a test file's init.py docstring. The rules explicitly state not to comment on "obvious or unimportant" things, and a docstring capitalization issue falls into that category. The comment also uses tentative language ("Consider whether") rather than being definitive. This comment should be deleted. It's about a minor capitalization issue in a docstring, which falls under "obvious or unimportant" comments that should not be made according to the rules. While project name consistency has some value, this is too trivial to warrant a review comment.

Workflow ID: wflow_PjMZWKd0NXFSYlpy

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@@ -0,0 +1,3 @@
"""Opentelemetry MCP Server - Query OpenTelemetry traces from LLM applications."""

Typographical issue: The docstring at the top inconsistently uses Opentelemetry and OpenTelemetry. Use consistent capitalization (preferably OpenTelemetry) throughout.

Suggested change

```diff
-"""Opentelemetry MCP Server - Query OpenTelemetry traces from LLM applications."""
+"""OpenTelemetry MCP Server - Query OpenTelemetry traces from LLM applications."""
```
"""
Strongly-typed span attributes following OpenTelemetry semantic conventions.

Supports both gen_ai.* (Opentelemetry standard) and llm.* (legacy Traceloop) naming conventions

Typo: 'Opentelemetry' should be 'OpenTelemetry' in the docstring to maintain proper casing.

Suggested change

```diff
-Supports both gen_ai.* (Opentelemetry standard) and llm.* (legacy Traceloop) naming conventions
+Supports both gen_ai.* (OpenTelemetry standard) and llm.* (legacy Traceloop) naming conventions
```

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (29)
.vscode/settings.json (1)

1-22: Consider gitignoring editor-specific settings.

These are purely UI customizations for the VSCode Peacock extension. While harmless, some teams prefer to keep editor-specific settings out of version control by adding .vscode/settings.json to .gitignore. This is a team preference decision.

.github/workflows/ci.yml (1)

67-77: Add cleanup for token file and document its purpose.

The gh_token.txt file is created but never explicitly cleaned up. While GitHub Actions cleans the workspace after the job, it's better practice to remove sensitive files explicitly. Additionally, document why this token is needed for the Docker build.

Add a cleanup step after the Docker build:

```diff
       - name: Docker Build
         uses: docker/build-push-action@v6
         with:
           platforms: ${{ matrix.docker-platform }}
           context: .
           push: false
           secret-files: |
             "gh_token=gh_token.txt"
+
+      - name: Cleanup
+        if: always()
+        run: rm -f gh_token.txt
```

Also, please verify if GH_ACCESS_TOKEN is actually needed for the Docker build and document its purpose in a comment.

start_locally.sh (1)

6-6: Strengthen error handling.

Consider using set -euo pipefail instead of just set -e for more robust error handling. The additional flags help catch unset variables (-u) and errors in pipelines (pipefail).

Apply this diff:

```diff
-set -e  # Exit on error
+set -euo pipefail  # Exit on error, unset variables, and pipeline failures
```
README.md (3)

120-122: Clarify hardcoded project slug limitation.

The documentation states that "The backend uses a hardcoded project slug of 'default'". This appears to be a limitation that might affect users with multiple projects. Consider either making this configurable or documenting why it's hardcoded and how users can work around it.

Do you want me to help make the project slug configurable, or is there a technical reason for this hardcoded value?
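A minimal sketch of the configurable variant, assuming a hypothetical `TRACELOOP_PROJECT_SLUG` environment variable (the name is illustrative and not part of the current config):

```python
import os

# Hypothetical env var; falls back to the current hardcoded value
project_slug = os.environ.get("TRACELOOP_PROJECT_SLUG", "default")
```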


305-328: Add language identifiers to code blocks.

The example query code blocks (lines 306, 314, 320, 326) are missing language identifiers. Add "text" or "plaintext" to improve rendering and satisfy markdown linters.

Example:

````diff
-```
+```text
 Use search_traces to find traces from the last hour where:
 - gen_ai_system is "openai"
 - min_duration_ms is 5000
 ```
````

472-474: Format URLs as markdown links.

The bare URLs on lines 473-474 should be formatted as proper markdown links for better presentation and to satisfy markdown linters.

Apply this diff:

```diff
 For issues and questions:
-- GitHub Issues: https://github.com/yourusername/openllmetry-mcp/issues
-- Traceloop Community: https://traceloop.com/slack
+- [GitHub Issues](https://github.com/yourusername/openllmetry-mcp/issues)
+- [Traceloop Community](https://traceloop.com/slack)
```
src/opentelemetry_mcp/tools/errors.py (1)

34-44: Consider extracting timestamp parsing to a helper function.

The timestamp parsing logic is duplicated for start_time and end_time. Extract to a helper function to reduce duplication and improve maintainability.

Example refactor:

```python
def parse_iso_timestamp(timestamp_str: str, param_name: str) -> tuple[datetime | None, str | None]:
    """Parse ISO 8601 timestamp string.

    Returns:
        Tuple of (parsed datetime, error message or None)
    """
    if not timestamp_str:
        return None, None
    try:
        return datetime.fromisoformat(timestamp_str.replace("Z", "+00:00")), None
    except ValueError as e:
        return None, f"Invalid {param_name} format: {e}"
```

Then use:

```diff
-    if start_time:
-        try:
-            start_dt = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
-        except ValueError as e:
-            return json.dumps({"error": f"Invalid start_time format: {e}"})
-
-    if end_time:
-        try:
-            end_dt = datetime.fromisoformat(end_time.replace("Z", "+00:00"))
-        except ValueError as e:
-            return json.dumps({"error": f"Invalid end_time format: {e}"})
+    start_dt, start_error = parse_iso_timestamp(start_time, "start_time")
+    if start_error:
+        return json.dumps({"error": start_error})
+
+    end_dt, end_error = parse_iso_timestamp(end_time, "end_time")
+    if end_error:
+        return json.dumps({"error": end_error})
```
Dockerfile (1)

59-60: Improve health check to verify server availability.

The current health check only verifies that Python can execute, not that the MCP server is actually healthy or responding. This provides minimal operational value.

Replace with a health check that verifies the server process or endpoint:

```diff
-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import sys; sys.exit(0)"
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD pgrep -f opentelemetry-mcp || exit 1
```

Or if an HTTP health endpoint exists:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:8000/health', timeout=5)" || exit 1
```
src/opentelemetry_mcp/tools/model_stats.py (1)

34-39: Consider removing duplicate percentile values.

Both median and p50 return the same 0.5 percentile value. This duplication may be intentional for API compatibility, but consider whether both are needed.

If only one is needed:

```diff
     return {
         "mean": sum(values) / length,
-        "median": percentile(0.5),
         "p50": percentile(0.5),
         "p95": percentile(0.95),
         "p99": percentile(0.99),
     }
```
pyproject.toml (2)

48-54: Ruff configuration may have contradictory line-length settings.

Line 49 sets line-length = 100, but line 54 ignores E501 (line too long). This configuration is contradictory. If you want to enforce 100-character lines, remove E501 from the ignore list. If you want longer lines, adjust line-length instead.

Clarify the intent:

Option 1: Enforce 100-character limit:

```diff
 [tool.ruff.lint]
 select = ["E", "F", "I", "N", "W", "UP"]
-ignore = ["E501"]
+ignore = []
```

Option 2: Allow longer lines but format at 100:

```diff
 [tool.ruff]
-line-length = 100
+line-length = 120
 target-version = "py311"
```

59-65: Review ignore_missing_imports in strict mode.

While ignore_missing_imports = true (line 65) is convenient, it can mask typing issues in dependencies that don't have type stubs. With strict = true enabled, consider handling this more selectively using per-module overrides for specific untyped dependencies.

Consider replacing the global ignore with selective per-module configuration:

```diff
 [tool.mypy]
 python_version = "3.11"
 strict = true
 warn_return_any = true
 warn_unused_configs = true
 disallow_untyped_defs = true
-ignore_missing_imports = true
+
+[[tool.mypy.overrides]]
+module = ["httpx.*", "click.*"]  # List specific untyped modules
+ignore_missing_imports = true
```

This provides better type safety while accommodating untyped dependencies.

tests/conftest.py (2)

38-38: Move import to module level.

The datetime import should be at the top of the file with other imports rather than inside the fixture function.

Apply this diff:

```diff
 from typing import Any

 import pytest
 from pydantic import HttpUrl
+from datetime import datetime

 from opentelemetry_mcp.attributes import SpanAttributes
```

And remove the import from inside the function:

```diff
 @pytest.fixture
 def sample_trace_data() -> TraceData:
     """Sample TraceData for testing."""
-    from datetime import datetime
-
     span = SpanData(
```

46-46: Use timezone-aware datetime for test fixtures.

Using datetime.now() without timezone information creates non-deterministic test data and can lead to timezone-related issues in tests. Consider using datetime.now(timezone.utc) for consistent, timezone-aware timestamps.

Apply this diff:

```diff
+from datetime import datetime, timezone
 ...
-    start_time=datetime.now(),
+    start_time=datetime.now(timezone.utc),
```
tests/test_models.py (1)

17-17: Use timezone-aware datetime in test data.

Multiple test functions use datetime.now() without timezone information (lines 17, 29, 43, 72). This can lead to timezone-related inconsistencies. Consider using datetime.now(timezone.utc) for consistent, timezone-aware timestamps.

Apply this diff to the import:

```diff
-from datetime import datetime
+from datetime import datetime, timezone
```

Then update all occurrences:

```diff
-    start_time=datetime.now(),
+    start_time=datetime.now(timezone.utc),
```

Also applies to: 29-29, 43-43, 72-72

src/opentelemetry_mcp/tools/list_models.py (3)

36-46: Code duplication: timestamp parsing pattern.

This timestamp parsing logic is duplicated across multiple tool files (usage.py, slow_traces.py, expensive_traces.py). Consider extracting to a shared utility function to reduce duplication and ensure consistent error handling.

Example utility function location: src/opentelemetry_mcp/tools/utils.py:

```python
def parse_iso_timestamp(timestamp: str | None, param_name: str) -> tuple[datetime | None, str | None]:
    """Parse ISO 8601 timestamp with Z UTC handling.

    Returns:
        Tuple of (parsed_datetime, error_json). If error_json is not None, parsing failed.
    """
    if not timestamp:
        return None, None
    try:
        return datetime.fromisoformat(timestamp.replace("Z", "+00:00")), None
    except ValueError as e:
        return None, json.dumps({"error": f"Invalid {param_name} format: {e}"})
```

64-71: Potential N+1 query pattern and missing error handling.

For each trace summary, a separate get_trace() call is made (line 66), which could result in many sequential backend requests. Additionally, if any individual trace fetch fails, the entire operation will fail.

Consider:

  1. Batching trace fetches if the backend supports it
  2. Adding error handling to skip individual trace failures rather than failing completely

Example with error handling:

```diff
     for summary in trace_summaries:
-        # Get full trace to access LLM spans
-        trace = await backend.get_trace(summary.trace_id)
+        # Get full trace to access LLM spans
+        try:
+            trace = await backend.get_trace(summary.trace_id)
+        except Exception:
+            # Skip traces that fail to fetch
+            continue
         for span in trace.llm_spans:
```
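For the batching option, a hedged sketch using `asyncio.gather` (assuming `backend.get_trace` is the awaitable shown above; `return_exceptions=True` keeps one bad trace from aborting the batch):

```python
import asyncio

# Fetch all traces concurrently; failures come back as exception
# objects in `results` instead of raising.
results = await asyncio.gather(
    *(backend.get_trace(summary.trace_id) for summary in trace_summaries),
    return_exceptions=True,
)
traces = [r for r in results if not isinstance(r, BaseException)]
```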

92-96: String comparison of ISO timestamps is fragile.

Comparing ISO format timestamp strings (lines 93, 95) works but relies on the ISO format's lexicographic ordering. While this is correct for ISO 8601 timestamps, directly comparing datetime objects would be more explicit and robust.

Consider comparing datetime objects:

```diff
     # Update timestamps
-    span_time = span.start_time.isoformat()
-    if span_time < models_data[model]["first_seen"]:
-        models_data[model]["first_seen"] = span_time
-    if span_time > models_data[model]["last_seen"]:
-        models_data[model]["last_seen"] = span_time
+    if span.start_time < datetime.fromisoformat(models_data[model]["first_seen"]):
+        models_data[model]["first_seen"] = span.start_time.isoformat()
+    if span.start_time > datetime.fromisoformat(models_data[model]["last_seen"]):
+        models_data[model]["last_seen"] = span.start_time.isoformat()
```

Alternatively, store datetime objects and convert to ISO format only at serialization time.
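To make the fragility concrete: lexicographic order only matches chronological order when every timestamp carries the same UTC offset. A small self-contained illustration:

```python
from datetime import datetime

a = "2025-01-01T09:00:00+00:00"  # 09:00 UTC
b = "2025-01-01T10:00:00+02:00"  # 08:00 UTC, chronologically earlier

print(a < b)  # True (string comparison)
print(datetime.fromisoformat(a) < datetime.fromisoformat(b))  # False (datetime comparison)
```

If all spans report the same offset this never bites, but comparing datetime objects removes the assumption entirely.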

src/opentelemetry_mcp/tools/usage.py (2)

42-52: Code duplication: timestamp parsing pattern.

This timestamp parsing logic is duplicated across multiple tool files. Consider extracting to a shared utility function as noted in list_models.py review.


54-60: Backward compatibility logic may not match user intent.

The backward compatibility for deprecated gen_ai_model (line 58-60) maps it only to request_model. Users might expect it to match either requested or actual model. Consider documenting this behavior clearly or mapping to both fields.

Current behavior:

```python
if gen_ai_model and not gen_ai_request_model and not gen_ai_response_model:
    final_request_model = gen_ai_model
```

Alternative (match both):

```python
if gen_ai_model and not gen_ai_request_model and not gen_ai_response_model:
    final_request_model = gen_ai_model
    final_response_model = gen_ai_model
```
tests/test_traceloop.py (2)

43-137: Test reimplements logic instead of testing actual backend behavior.

This test manually rebuilds the filter construction logic rather than calling the backend's actual filter-building method. This creates a maintenance burden, since the test must be updated whenever the backend implementation changes, and it doesn't verify that the backend actually works correctly.

Consider testing the actual backend method (e.g., backend._build_search_filters(query) or similar) if it's exposed, or test the full search_traces flow with mocked HTTP responses.
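If the backend's HTTP layer is httpx (it appears in the project dependencies), one hedged sketch for the mocked-response approach uses `httpx.MockTransport`; the payload shape and the injection point are assumptions, since both depend on how `TraceloopBackend` builds its client:

```python
import httpx

def canned_search(request: httpx.Request) -> httpx.Response:
    # Payload shape is an assumption about the Traceloop API
    return httpx.Response(200, json={"spans": []})

transport = httpx.MockTransport(canned_search)
client = httpx.AsyncClient(transport=transport)
# Inject `client` into the backend under test, then call
# backend.search_traces(query) and assert on the parsed result.
```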


225-231: Trivial test adds no value.

This test merely asserts that float(3500) == 3500.0, which is testing Python's built-in type conversion rather than any application logic. Consider removing this test or expanding it to test actual duration conversion logic in the backend.

src/opentelemetry_mcp/tools/slow_traces.py (4)

39-49: Code duplication: timestamp parsing pattern.

This timestamp parsing logic is duplicated across multiple tool files. Consider extracting to a shared utility function as noted in previous reviews.


64-107: N+1 query pattern and missing error handling.

Similar to list_models.py, this code fetches each trace individually (line 71), which could result in many sequential backend requests. Additionally, if any individual trace fetch fails, the entire operation will fail.

Consider adding error handling to skip individual trace failures:

```diff
     for summary in trace_summaries:
-        # Get full trace to access LLM spans
-        trace = await backend.get_trace(summary.trace_id)
+        # Get full trace to access LLM spans
+        try:
+            trace = await backend.get_trace(summary.trace_id)
+        except Exception:
+            # Skip traces that fail to fetch
+            continue
```

110-110: Unnecessary type: ignore comment.

The type: ignore comment on line 110 suggests a mypy issue. The lambda should work correctly without type issues. Consider removing the comment or investigating the root cause.


10-121: Significant code duplication with expensive_traces.py.

The get_slow_traces function shares substantial logic with get_expensive_traces:

  • Same timestamp parsing (lines 39-49)
  • Same trace fetching loop structure (lines 64-107)
  • Same result building pattern (lines 94-107, 113-116)
  • Only differs in sorting key (duration vs tokens) and aggregation details

Consider extracting shared logic into a common helper function to reduce duplication and maintenance burden.

Example approach:

```python
async def _fetch_and_enrich_traces(
    backend: BaseBackend,
    query: TraceQuery,
    min_threshold: int | None = None,
    threshold_key: str = "tokens",
) -> list[dict[str, Any]]:
    """Shared logic for fetching and enriching traces with LLM data."""
    # Common fetching and enrichment logic
    ...
```
src/opentelemetry_mcp/tools/expensive_traces.py (3)

39-49: Code duplication: timestamp parsing pattern.

This timestamp parsing logic is duplicated across multiple tool files. Consider extracting to a shared utility function as noted in previous reviews.


68-119: N+1 query pattern and missing error handling.

Similar to other tool files, this code fetches each trace individually (line 70), creating a potential performance bottleneck. Additionally, any individual trace fetch failure will cause the entire operation to fail.

Consider adding error handling as suggested in previous reviews.


122-122: Unnecessary type: ignore comment.

Similar to slow_traces.py, this type: ignore comment may be unnecessary. Consider removing it or investigating the root cause of the type checking issue.

src/opentelemetry_mcp/tools/trace.py (1)

77-77: Catch-all JSON serialization may hide type issues.

Using default=str as a catch-all JSON serializer (line 77) will convert any non-serializable object to its string representation. While this prevents crashes, it might hide type issues or produce unexpected output. Consider being more explicit about which types need custom serialization (e.g., datetime, which is already handled by .isoformat()).

Alternative approach:

```python
def _json_serializer(obj: Any) -> str:
    """Custom JSON serializer for specific types."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj)} not serializable")

return json.dumps(result, indent=2, default=_json_serializer)
```

This makes serialization behavior more explicit and will raise errors for truly unsupported types.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4e9d22 and 2911eba.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
  • .dockerignore (1 hunks)
  • .env.example (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .python-version (1 hunks)
  • .vscode/settings.json (1 hunks)
  • CLAUDE.md (1 hunks)
  • Dockerfile (1 hunks)
  • README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • src/opentelemetry_mcp/__init__.py (1 hunks)
  • src/opentelemetry_mcp/attributes.py (1 hunks)
  • src/opentelemetry_mcp/backends/__init__.py (1 hunks)
  • src/opentelemetry_mcp/backends/base.py (1 hunks)
  • src/opentelemetry_mcp/backends/filter_engine.py (1 hunks)
  • src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
  • src/opentelemetry_mcp/backends/tempo.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/config.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/server.py (1 hunks)
  • src/opentelemetry_mcp/tools/__init__.py (1 hunks)
  • src/opentelemetry_mcp/tools/errors.py (1 hunks)
  • src/opentelemetry_mcp/tools/expensive_traces.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_models.py (1 hunks)
  • src/opentelemetry_mcp/tools/model_stats.py (1 hunks)
  • src/opentelemetry_mcp/tools/search.py (1 hunks)
  • src/opentelemetry_mcp/tools/services.py (1 hunks)
  • src/opentelemetry_mcp/tools/slow_traces.py (1 hunks)
  • src/opentelemetry_mcp/tools/trace.py (1 hunks)
  • src/opentelemetry_mcp/tools/usage.py (1 hunks)
  • start_locally.sh (1 hunks)
  • tests/__init__.py (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/test_models.py (1 hunks)
  • tests/test_traceloop.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
📚 Learning: 2025-07-19T08:17:05.314Z
Learnt from: CR Repo: traceloop/api-service PR: 0 File: .cursor/rules/go-microservice-best-practices.mdc:0-0 Timestamp: 2025-07-19T08:17:05.314Z Learning: Applies to **/Dockerfile : Use multi-stage builds in Docker 

Applied to files:

  • Dockerfile
📚 Learning: 2025-07-19T08:17:05.314Z
Learnt from: CR Repo: traceloop/api-service PR: 0 File: .cursor/rules/go-microservice-best-practices.mdc:0-0 Timestamp: 2025-07-19T08:17:05.314Z Learning: Applies to **/Dockerfile : Minimize image size 

Applied to files:

  • Dockerfile
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 

Applied to files:

  • README.md
  • pyproject.toml
  • src/opentelemetry_mcp/__init__.py
  • tests/__init__.py
  • src/opentelemetry_mcp/config.py
  • src/opentelemetry_mcp/server.py
🧬 Code graph analysis (14)
src/opentelemetry_mcp/tools/expensive_traces.py (2)
src/opentelemetry_mcp/backends/base.py (3)
  • BaseBackend (12-145)
  • search_traces (65-77)
  • get_trace (80-92)
src/opentelemetry_mcp/models.py (3)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-504)
  • from_span (145-244)
src/opentelemetry_mcp/tools/errors.py (2)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-145)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (6)
  • TraceQuery (319-504)
  • TraceSummary (507-535)
  • has_error (113-115)
  • from_trace (522-535)
  • is_llm_span (98-100)
  • gen_ai_system (103-105)
src/opentelemetry_mcp/backends/__init__.py (1)
src/opentelemetry_mcp/backends/base.py (1)
  • BaseBackend (12-145)
src/opentelemetry_mcp/tools/usage.py (3)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-145)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (8)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-504)
  • UsageMetrics (247-284)
  • gen_ai_system (103-105)
  • gen_ai_model (108-110)
  • llm_spans (299-301)
  • from_span (145-244)
  • add_span (261-284)
src/opentelemetry_mcp/server.py (2)
  • get_llm_usage (222-265)
  • search_traces (117-197)
src/opentelemetry_mcp/tools/search.py (2)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-145)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (6)
  • Filter (45-80)
  • TraceQuery (319-504)
  • TraceSummary (507-535)
  • gen_ai_system (103-105)
  • has_error (113-115)
  • from_trace (522-535)
src/opentelemetry_mcp/tools/list_models.py (4)
src/opentelemetry_mcp/backends/base.py (3)
  • BaseBackend (12-145)
  • search_traces (65-77)
  • get_trace (80-92)
src/opentelemetry_mcp/models.py (4)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-504)
  • gen_ai_system (103-105)
  • from_span (145-244)
src/opentelemetry_mcp/backends/jaeger.py (2)
  • search_traces (42-118)
  • get_trace (158-184)
src/opentelemetry_mcp/backends/tempo.py (2)
  • search_traces (57-138)
  • get_trace (140-148)
src/opentelemetry_mcp/tools/trace.py (1)
src/opentelemetry_mcp/models.py (6)
  • LLMSpanAttributes (118-244)
  • has_errors (304-306)
  • is_llm_span (98-100)
  • from_span (145-244)
  • llm_spans (299-301)
  • total_llm_tokens (309-316)
src/opentelemetry_mcp/backends/filter_engine.py (2)
src/opentelemetry_mcp/models.py (7)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • TraceData (287-316)
  • llm_spans (299-301)
  • total_llm_tokens (309-316)
  • has_errors (304-306)
src/opentelemetry_mcp/attributes.py (1)
  • get (84-115)
tests/conftest.py (3)
src/opentelemetry_mcp/attributes.py (1)
  • SpanAttributes (8-133)
src/opentelemetry_mcp/config.py (1)
  • BackendConfig (13-54)
src/opentelemetry_mcp/models.py (2)
  • SpanData (83-115)
  • TraceData (287-316)
tests/test_traceloop.py (2)
src/opentelemetry_mcp/backends/traceloop.py (3)
  • TraceloopBackend (24-586)
  • _convert_root_span_to_trace (446-494)
  • _convert_spans_to_trace (496-578)
src/opentelemetry_mcp/models.py (3)
  • TraceQuery (319-504)
  • gen_ai_system (103-105)
  • has_error (113-115)
src/opentelemetry_mcp/attributes.py (1)
src/opentelemetry_mcp/models.py (1)
  • gen_ai_system (103-105)
src/opentelemetry_mcp/tools/model_stats.py (1)
src/opentelemetry_mcp/models.py (4)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-504)
  • from_span (145-244)
  • has_error (113-115)
src/opentelemetry_mcp/server.py (9)
src/opentelemetry_mcp/backends/base.py (4)
  • BaseBackend (12-145)
  • search_traces (65-77)
  • get_trace (80-92)
  • list_services (95-104)
src/opentelemetry_mcp/backends/jaeger.py (3)
  • search_traces (42-118)
  • get_trace (158-184)
  • list_services (186-202)
src/opentelemetry_mcp/backends/tempo.py (4)
  • TempoBackend (22-576)
  • search_traces (57-138)
  • get_trace (140-148)
  • list_services (150-189)
src/opentelemetry_mcp/backends/traceloop.py (4)
  • TraceloopBackend (24-586)
  • search_traces (86-178)
  • get_trace (180-213)
  • list_services (215-260)
src/opentelemetry_mcp/config.py (4)
  • ServerConfig (57-103)
  • from_env (31-54)
  • from_env (65-76)
  • apply_cli_overrides (78-103)
src/opentelemetry_mcp/tools/list_models.py (1)
  • list_models (11-113)
src/opentelemetry_mcp/tools/trace.py (1)
  • get_trace (10-80)
src/opentelemetry_mcp/tools/services.py (1)
  • list_services (8-25)
src/opentelemetry_mcp/tools/slow_traces.py (1)
  • get_slow_traces (10-121)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • to_dict (60-82)
  • get (84-115)
🪛 GitHub Check: build-test (amd64)
Dockerfile

[warning] 51-51: Sensitive data should not be used in the ARG or ENV commands
SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ENV "BACKEND_API_KEY")
More info: https://docs.docker.com/go/dockerfile/rule/secrets-used-in-arg-or-env/

🪛 LanguageTool
README.md

[uncategorized] ~240-~240: Possible missing comma found.
Context: ...s ### 2. get_trace Get complete trace details including all spans and OpenLLMetry att...

(AI_HYDRA_LEO_MISSING_COMMA)

CLAUDE.md

[grammar] ~92-~92: A determiner may be missing.
Context: ...y_mcp/tools/expensive_traces.py) - Find highest token usage traces - [tools/slow_traces...

(THE_SUPERLATIVE)


[grammar] ~93-~93: A determiner may be missing.
Context: ...emetry_mcp/tools/slow_traces.py) - Find slowest LLM traces Critical: All tools MUS...

(THE_SUPERLATIVE)


[style] ~107-~107: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...y_mcp/server.py) - FastMCP application, CLI interface, tool handlers - [config.py](openteleme...

(ACRONYM_TAUTOLOGY)


[duplication] ~108-~108: Possible typo: you repeated a word.
Context: ...mcp/config.py) - Pydantic configuration models - models.py - Core...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~110-~110: The hyphen in Strongly-typed is redundant.
Context: ....py](opentelemetry_mcp/attributes.py) - Strongly-typed OpenTelemetry attribute models ## Conf...

(ADVERB_LY_HYPHEN_FIX)


[grammar] ~221-~221: A determiner may be missing.
Context: ...s` - Cost Optimization Find traces with highest token usage for cost optimization. **U...

(THE_SUPERLATIVE)


[grammar] ~239-~239: A determiner may be missing.
Context: ...traces` - Performance Optimization Find slowest LLM traces by duration to identify late...

(THE_SUPERLATIVE)


[typographical] ~291-~291: Do not use a colon (:) before a series that is introduced by a preposition (‘with’). Remove the colon or add a noun or a noun phrase after the preposition.
Context: ...ter Structure Each filter is an object with: - field: Field name in dotted notat...

(RP_COLON)


[grammar] ~292-~292: The singular proper name ‘Field’ must be used with a third-person or a past tense verb.
Context: ...r is an object with: - field: Field name in dotted notation (e.g., `"gen_ai.usag...

(HE_VERB_AGR)

🪛 markdownlint-cli2 (0.18.1)
README.md

306-306: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


314-314: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


320-320: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


326-326: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


385-385: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


473-473: Bare URL used

(MD034, no-bare-urls)


474-474: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (19)
.python-version (1)

1-1: LGTM!

The Python version specification is correct and consistent with the Dockerfile, CI workflow, and project dependencies.

src/opentelemetry_mcp/tools/__init__.py (1)

1-3: LGTM!

Standard package initializer with appropriate docstring. The empty __all__ list is fine since tools are accessed directly from their respective modules.

.dockerignore (1)

55-56: Verify exclusion of tests directory.

Excluding the tests/ directory reduces the Docker build context size, which is appropriate for production images. However, ensure this aligns with your Docker usage strategy—if you need to run tests inside containers during development or CI, you may want to reconsider this exclusion or use different .dockerignore files for different build contexts.

src/opentelemetry_mcp/backends/__init__.py (1)

1-5: LGTM!

Clean package initialization that exports the abstract BaseBackend interface. Concrete backend implementations (Jaeger, Tempo, Traceloop) are appropriately accessed directly from their respective modules.

.github/workflows/ci.yml (1)

10-47: LGTM!

The lint and test jobs follow standard CI practices with appropriate dependency installation and tool execution.

README.md (1)

1-474: Excellent comprehensive documentation!

The README provides thorough coverage of installation, configuration, usage, and troubleshooting. The documentation clearly explains the multi-backend architecture, tools, and OpenLLMetry semantic conventions. The suggested improvements above are minor formatting refinements.

start_locally.sh (1)

21-44: Clarify API key handling in start_locally.sh.

The review comment correctly identifies that the script doesn't explicitly pass --api-key to the server command. While the API key IS currently used through environment variable reading (the server calls ServerConfig.from_env() which reads BACKEND_API_KEY), making it explicit via the CLI flag would improve clarity and align with the documented configuration precedence pattern ("CLI args > environment variables").

The proposed diff is valid. However, clarify in a comment that this makes the API key handling explicit and optional—if users set BACKEND_API_KEY in the environment without uncommenting line 33, it will still work.

```diff
 # ============================================
 # Start Server
 # ============================================
 echo "Backend: $BACKEND_TYPE"
 echo "URL: $BACKEND_URL"
 echo ""

+# Build command with optional API key
+CMD_ARGS="--transport stdio --backend $BACKEND_TYPE --url $BACKEND_URL"
+if [ -n "${BACKEND_API_KEY:-}" ]; then
+    CMD_ARGS="$CMD_ARGS --api-key $BACKEND_API_KEY"
+fi
+
 # Start the MCP server in stdio mode (for Claude Desktop/MCP clients)
-uv run opentelemetry-mcp --transport stdio --backend "$BACKEND_TYPE" --url "$BACKEND_URL"
+uv run opentelemetry-mcp $CMD_ARGS
```
src/opentelemetry_mcp/tools/errors.py (2)

11-121: LGTM with verification needed for attribute access.

The error finding logic is well-structured with proper error handling, JSON serialization, and LLM-specific metadata extraction. The function correctly filters for error traces and provides detailed error information. Verify the attribute access issue flagged separately.


105-110: No issues found - the flagged code is correct.

The verification confirms that span.gen_ai_system and span.gen_ai_model are valid @property methods on the SpanData class (defined in src/opentelemetry_mcp/models.py lines 103-110). The code in lines 107-110 correctly accesses these properties and will not raise AttributeError.

Likely an incorrect or invalid review comment.

Dockerfile (2)

4-26: LGTM! Efficient multi-stage build pattern.

The builder stage follows best practices with proper dependency caching using bind mounts, frozen lockfile installation, and separate dependency/project sync steps. This aligns with Docker multi-stage build recommendations.

Based on learnings


29-45: LGTM! Minimal and secure runtime configuration.

The runtime stage uses a slim base image for minimal size and creates a non-root user for security. This aligns with Docker security and optimization best practices.

Based on learnings

src/opentelemetry_mcp/tools/services.py (1)

8-52: LGTM! Clean and consistent service listing implementation.

Both functions follow a consistent pattern with proper error handling, sorted results, and uniform JSON response format. The implementation is straightforward and correct.

src/opentelemetry_mcp/tools/model_stats.py (1)

107-168: LGTM with performance caveat.

The metric collection and statistical calculation logic is well-implemented with proper null handling, division-by-zero protection, and comprehensive result formatting. Ensure the performance concerns flagged earlier are addressed for production use.

tests/__init__.py (1)

1-1: LGTM!

Standard test package initialization with descriptive docstring.

src/opentelemetry_mcp/__init__.py (1)

1-3: LGTM!

Standard package initialization with version string. Ensure version consistency with pyproject.toml.

.env.example (1)

1-26: LGTM! Comprehensive environment configuration template.

The configuration template is well-documented with clear examples for each backend type, appropriate defaults, and helpful comments. This aligns well with the application's configuration requirements.

pyproject.toml (1)

22-29: The original review comment is based on incorrect version information.

Based on web search results from November 10, 2025, all three dependencies in pyproject.toml are aligned with current stable releases:

  • pytest-asyncio — 1.2.0 matches the specified version exactly
  • mypy — 1.18.2 confirms the specified ~1.18.0 is current (1.18.2 is a patch release)
  • pytest-cov — 7.0.0 matches the specified version exactly

The original claim that these versions were outdated or ahead of current releases was incorrect. The dependencies are properly pinned to current, stable versions.

Likely an incorrect or invalid review comment.

src/opentelemetry_mcp/tools/expensive_traces.py (1)

95-101: Good edge case handling.

The filtering logic properly handles both the min_tokens threshold and excludes traces with zero token usage, preventing meaningless results in the output.

src/opentelemetry_mcp/tools/trace.py (1)

20-78: Well-structured trace detail retrieval.

The function properly fetches a single trace, builds comprehensive output including span details and LLM-specific summaries, and handles errors gracefully. The structure is clear and logical.

Comment on lines 48 to 61
```yaml
  build-test:
    needs: test
    runs-on: ubuntu-latest
    strategy:
      matrix:
        platform: [amd64, arm64]
        include:
          - platform: amd64
            runs-on: ubuntu-latest
            docker-platform: linux/amd64
          - platform: arm64
            runs-on: Linux-ARM64
            docker-platform: linux/arm64
    steps:
```

⚠️ Potential issue | 🔴 Critical

❓ Verification inconclusive

Fix matrix runner configuration.

The runs-on field is set to ubuntu-latest at the job level (line 50), but the matrix includes different runs-on values (lines 56, 59) that won't be used. GitHub Actions doesn't automatically use matrix.runs-on unless referenced explicitly.

Apply this diff to fix the configuration:

```diff
   build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
       matrix:
         platform: [amd64, arm64]
```

Also verify that the Linux-ARM64 runner (line 59) exists in your GitHub environment. If it's a self-hosted runner, ensure it's properly configured with that label.


Fix matrix runner configuration.

Matrix values won't override the job-level runs-on setting unless explicitly referenced. The runs-on field is set to ubuntu-latest at the job level (line 50), but the matrix includes different runs-on values (lines 56, 59) that won't be used.

Apply this diff:

```diff
   build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
       matrix:
         platform: [amd64, arm64]
```

Verify the Linux-ARM64 self-hosted runner (line 59) exists and is properly configured in your GitHub environment.

📝 Committable suggestion


Suggested change

```diff
   build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
       matrix:
         platform: [amd64, arm64]
         include:
           - platform: amd64
             runs-on: ubuntu-latest
             docker-platform: linux/amd64
           - platform: arm64
             runs-on: Linux-ARM64
             docker-platform: linux/arm64
     steps:
```
🤖 Prompt for AI Agents
.github/workflows/ci.yml lines 48-61: the job-level runs-on is hardcoded to ubuntu-latest which prevents per-matrix runs-on values from being applied; change the job's runs-on to reference the matrix (runs-on: ${{ matrix.runs-on }}) and keep the matrix include entries for platform and docker-platform as-is, ensure any custom runner label like "Linux-ARM64" actually exists and is configured in the GitHub environment, and verify docker-platform is used where needed in steps that run containers. 

ellipsis-dev bot commented Nov 11, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
start_locally.sh (1)

26-37: Consider adding validation for backend configuration.

The backend configuration is clear and well-documented, but there's a minor usability gap: if a user uncomments the Traceloop backend (lines 31–33) without changing BACKEND_API_KEY from the placeholder "your-api-key-here", the server will start with invalid credentials, potentially failing with unclear errors downstream.

While the inline comment does instruct users to set the API key, adding a small validation step would catch this early and provide a helpful error message. Similarly, validating that BACKEND_TYPE is one of the supported backends would prevent typos.

This is optional and can be deferred, but would improve the developer experience for local setup.

If you'd like, I can generate a refactored version that adds lightweight validation for backend type and API key presence:

```bash
# Example: Add after backend configuration, before server startup
if [[ "$BACKEND_TYPE" == "traceloop" && -z "$BACKEND_API_KEY" ]]; then
    echo "Error: BACKEND_API_KEY must be set for Traceloop backend" >&2
    exit 1
fi
```

Would you like me to generate a full refactor proposal?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2911eba and 0b875bd.

📒 Files selected for processing (1)
  • start_locally.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (2)
start_locally.sh (2)

1-19: Well-structured script with good practices.

The shebang, strict error handling (set -e), directory detection, and uv availability check are all correctly implemented. Error messages properly route to stderr.


49-54: Server startup looks correct.

Environment variables (BACKEND_API_KEY, LOG_LEVEL, etc.) are exported and will be available to the subprocess. The command-line arguments (--backend and --url) are properly quoted.


ellipsis-dev bot commented Nov 11, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
src/opentelemetry_mcp/models.py (1)

408-416: Preserve UNSET spans when has_error is False.

Mapping has_error=False to status == "OK" drops spans whose status is "UNSET" (the default for most non-error spans), so queries silently miss valid results. Switch the false case to status != "ERROR" instead.

```diff
-        if self.has_error is not None:
-            all_filters.append(
-                Filter(
-                    field="status",
-                    operator=FilterOperator.EQUALS,
-                    value="ERROR" if self.has_error else "OK",
-                    value_type=FilterType.STRING,
-                )
-            )
+        if self.has_error is not None:
+            operator = (
+                FilterOperator.EQUALS if self.has_error else FilterOperator.NOT_EQUALS
+            )
+            all_filters.append(
+                Filter(
+                    field="status",
+                    operator=operator,
+                    value="ERROR",
+                    value_type=FilterType.STRING,
+                )
+            )
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b875bd and d12d1c0.

📒 Files selected for processing (10)
  • CLAUDE.md (1 hunks)
  • src/opentelemetry_mcp/backends/base.py (1 hunks)
  • src/opentelemetry_mcp/backends/filter_engine.py (1 hunks)
  • src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
  • src/opentelemetry_mcp/backends/tempo.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/server.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_llm_tools.py (1 hunks)
  • src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/opentelemetry_mcp/backends/tempo.py
  • src/opentelemetry_mcp/backends/jaeger.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 

Applied to files:

  • src/opentelemetry_mcp/server.py
🧬 Code graph analysis (6)
src/opentelemetry_mcp/tools/list_llm_tools.py (2)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_spans (80-92)
src/opentelemetry_mcp/models.py (4)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanQuery (538-681)
src/opentelemetry_mcp/backends/traceloop.py (4)
src/opentelemetry_mcp/attributes.py (4)
  • HealthCheckResponse (155-167)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • get (84-115)
src/opentelemetry_mcp/backends/base.py (10)
  • BaseBackend (12-160)
  • _create_headers (45-51)
  • get_supported_operators (54-62)
  • search_traces (65-77)
  • client (29-42)
  • search_spans (80-92)
  • get_trace (95-107)
  • list_services (110-119)
  • get_service_operations (122-134)
  • health_check (137-146)
src/opentelemetry_mcp/backends/filter_engine.py (4)
  • FilterEngine (13-274)
  • apply_filters (18-18)
  • apply_filters (22-22)
  • apply_filters (25-55)
src/opentelemetry_mcp/models.py (10)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (83-115)
  • SpanQuery (538-681)
  • TraceData (287-316)
  • TraceQuery (319-504)
  • get_all_filters (359-462)
  • get_all_filters (578-681)
  • has_error (113-115)
src/opentelemetry_mcp/backends/base.py (2)
src/opentelemetry_mcp/attributes.py (1)
  • HealthCheckResponse (155-167)
src/opentelemetry_mcp/models.py (2)
  • FilterOperator (12-34)
  • SpanData (83-115)
src/opentelemetry_mcp/backends/filter_engine.py (2)
src/opentelemetry_mcp/models.py (8)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (83-115)
  • TraceData (287-316)
  • llm_spans (299-301)
  • total_llm_tokens (309-316)
  • has_errors (304-306)
src/opentelemetry_mcp/attributes.py (1)
  • get (84-115)
src/opentelemetry_mcp/server.py (4)
src/opentelemetry_mcp/backends/base.py (5)
  • search_spans (80-92)
  • health_check (137-146)
  • search_traces (65-77)
  • get_trace (95-107)
  • list_services (110-119)
src/opentelemetry_mcp/backends/jaeger.py (5)
  • search_spans (123-200)
  • health_check (268-294)
  • search_traces (42-83)
  • get_trace (202-228)
  • list_services (230-246)
src/opentelemetry_mcp/backends/tempo.py (5)
  • search_spans (141-242)
  • health_check (327-352)
  • search_traces (58-139)
  • get_trace (244-252)
  • list_services (254-293)
src/opentelemetry_mcp/config.py (4)
  • ServerConfig (57-103)
  • from_env (31-54)
  • from_env (65-76)
  • apply_cli_overrides (78-103)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (3)
  • SpanAttributes (8-133)
  • to_dict (60-82)
  • get (84-115)
🪛 LanguageTool
CLAUDE.md

[style] ~111-~111: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...y_mcp/server.py) - FastMCP application, CLI interface, tool handlers - [config.py](openteleme...

(ACRONYM_TAUTOLOGY)


[style] ~452-~452: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ueries - Querying all services would be very expensive (especially for spans) - Users can easi...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
CLAUDE.md

466-466: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)

ellipsis-dev bot commented Nov 11, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
Dockerfile (1)

62-64: Improve health check to verify service responsiveness.

The current health check only verifies that Python can execute (sys.exit(0)), which doesn't confirm the MCP server is actually running or responding. Consider checking if the process is listening on port 8000 or if the server module can be imported.

Apply this diff to add a more meaningful health check:

```diff
-# Health check (optional - checks if the process is running)
+# Health check - verifies the server process responds
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import sys; sys.exit(0)"
+    CMD python -c "import httpx; httpx.get('http://localhost:8000/health', timeout=5.0)" || exit 1
```

Note: This assumes your server exposes a /health endpoint. Adjust the path if different.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d12d1c0 and e491e75.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • Dockerfile (1 hunks)
  • pyproject.toml (1 hunks)
  • src/opentelemetry_mcp/__init__.py (1 hunks)
  • src/opentelemetry_mcp/attributes.py (1 hunks)
  • src/opentelemetry_mcp/backends/filter_engine.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/config.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/tools/errors.py (1 hunks)
  • src/opentelemetry_mcp/tools/model_stats.py (1 hunks)
  • src/opentelemetry_mcp/tools/search.py (1 hunks)
  • src/opentelemetry_mcp/utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/opentelemetry_mcp/__init__.py
  • src/opentelemetry_mcp/models.py
  • src/opentelemetry_mcp/tools/errors.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
📚 Learning: 2025-07-19T08:17:05.314Z
Learnt from: CR Repo: traceloop/api-service PR: 0 File: .cursor/rules/go-microservice-best-practices.mdc:0-0 Timestamp: 2025-07-19T08:17:05.314Z Learning: Applies to **/Dockerfile : Use multi-stage builds in Docker 

Applied to files:

  • Dockerfile
📚 Learning: 2025-07-19T08:17:05.314Z
Learnt from: CR Repo: traceloop/api-service PR: 0 File: .cursor/rules/go-microservice-best-practices.mdc:0-0 Timestamp: 2025-07-19T08:17:05.314Z Learning: Applies to **/Dockerfile : Minimize image size 

Applied to files:

  • Dockerfile
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR Repo: traceloop/openllmetry PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-08-17T15:06:48.109Z Learning: Applies to **/*.py : Store API keys only in environment variables/secure vaults; never hardcode secrets in code 

Applied to files:

  • Dockerfile
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 

Applied to files:

  • pyproject.toml
  • src/opentelemetry_mcp/config.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR Repo: traceloop/openllmetry PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-08-17T15:06:48.109Z Learning: Semantic conventions must follow the OpenTelemetry GenAI specification 

Applied to files:

  • src/opentelemetry_mcp/attributes.py
🧬 Code graph analysis (4)
src/opentelemetry_mcp/tools/search.py (1)
src/opentelemetry_mcp/models.py (6)
  • Filter (45-80)
  • TraceQuery (319-514)
  • TraceSummary (517-545)
  • gen_ai_system (103-105)
  • has_error (113-115)
  • from_trace (532-545)
src/opentelemetry_mcp/backends/filter_engine.py (2)
src/opentelemetry_mcp/models.py (8)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (83-115)
  • TraceData (287-316)
  • llm_spans (299-301)
  • total_llm_tokens (309-316)
  • has_errors (304-306)
src/opentelemetry_mcp/attributes.py (1)
  • get (84-115)
src/opentelemetry_mcp/tools/model_stats.py (3)
src/opentelemetry_mcp/backends/base.py (3)
  • BaseBackend (12-160)
  • search_traces (65-77)
  • get_trace (95-107)
src/opentelemetry_mcp/models.py (5)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-514)
  • from_span (145-244)
  • from_span (721-740)
  • has_error (113-115)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/attributes.py (1)
src/opentelemetry_mcp/models.py (1)
  • gen_ai_system (103-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (22)
pyproject.toml (1)

29-36: No action required—versions are current.

The dev dependency versions are actually correct and up-to-date. The latest stable versions on PyPI as of November 11, 2025 are pytest-asyncio 1.2.0, pytest-cov 7.0.0, and mypy 1.18.2. The code specifies pytest-asyncio~=1.2.0 and pytest-cov~=7.0.0 which match exactly, and mypy~=1.18.0 which is compatible with 1.18.2 under the compatible release operator (~=).

Likely an incorrect or invalid review comment.

src/opentelemetry_mcp/utils.py (1)

6-23: LGTM! Clean utility extraction.

The function correctly handles ISO 8601 timestamp parsing with UTC handling and provides structured error messages. This successfully eliminates the duplication mentioned in past review comments.
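
For reference, a minimal sketch of what such a helper can look like; the PR's actual parse_iso_timestamp may differ in signature and error format:

```python
from datetime import datetime, timezone

def parse_iso_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string, treating naive timestamps as UTC (sketch)."""
    try:
        # Older Pythons reject a bare trailing "Z", so normalize it to an offset
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError as exc:
        raise ValueError(f"Invalid ISO 8601 timestamp: {value!r}") from exc
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```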

src/opentelemetry_mcp/tools/model_stats.py (3)

11-40: LGTM! Solid percentile implementation.

The percentile calculation correctly uses linear interpolation and handles empty inputs. Note that median and p50 compute the same value, but this redundancy is harmless and may be intentional for API clarity.
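
To make the described approach concrete, here is a small linear-interpolation percentile in the style the review refers to; names and exact behavior are illustrative, not the PR's code:

```python
def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile for p in [0, 100]; assumes non-empty input."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (p / 100) * (len(ordered) - 1)  # fractional index into the sorted data
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

# percentile(xs, 50) and a separate median(xs) compute the same value,
# which is the harmless redundancy noted above.
```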


69-88: Timestamp parsing and limit handling improved.

The function now uses the shared parse_iso_timestamp utility (eliminating duplication) and correctly passes through the limit parameter instead of hardcoding 1000.


103-139: N+1 query pattern is now documented.

The N+1 pattern (fetch summaries then individual traces) is documented in the docstring as requested. While this can cause performance issues with large result sets, the behavior is now clear to callers.

src/opentelemetry_mcp/backends/filter_engine.py (4)

24-55: LGTM! Clean filtering implementation.

The method correctly handles both TraceData and SpanData with type-safe overloads and guards against empty lists.


73-111: Correct ALL-values semantics for negative operators.

The implementation correctly requires all values to satisfy negative predicates (NOT_EQUALS, NOT_CONTAINS, NOT_IN) and any value to satisfy positive predicates, addressing the past review concern.
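
The distinction matters because a filter is evaluated against every value extracted from a trace's spans. A compressed illustration of the semantics, with hypothetical operator names:

```python
def value_matches(operator: str, actual_values: list, expected) -> bool:
    # Positive predicates: ANY extracted value may satisfy them
    if operator == "equals":
        return any(v == expected for v in actual_values)
    # Negative predicates: ALL extracted values must satisfy them; otherwise a
    # trace whose spans carry ["gpt-4", "gpt-3.5"] would pass not_equals("gpt-4")
    if operator == "not_equals":
        return all(v != expected for v in actual_values)
    if operator == "not_contains":
        return all(expected not in str(v) for v in actual_values)
    raise ValueError(f"unsupported operator: {operator}")
```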


113-197: LGTM! Comprehensive field extraction.

The field extraction correctly handles both trace-level and span-level fields, supporting dotted notation for attributes and collecting values from multiple spans when appropriate.


199-284: LGTM! Robust type-aware value comparison.

The comparison logic correctly handles type conversions with appropriate error handling and supports all necessary operators for each value type.

src/opentelemetry_mcp/tools/search.py (2)

99-117: ValidationError handling correctly implemented.

The TraceQuery construction is properly wrapped in try/except ValidationError as requested in past review comments, preventing crashes on invalid parameters.
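
The pattern being praised looks roughly like the following sketch; the wrapper name and parameters are illustrative:

```python
import json

from pydantic import ValidationError

from opentelemetry_mcp.models import TraceQuery

def build_query(**params) -> TraceQuery | str:
    """Return a TraceQuery, or a JSON error string suitable for an MCP client."""
    try:
        return TraceQuery(**params)
    except ValidationError as exc:
        return json.dumps({"error": f"Invalid query parameters: {exc}"})
```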


119-135: LGTM! Clean search execution and result formatting.

The search execution correctly converts traces to summaries and returns a well-structured JSON response with count and traces array.

src/opentelemetry_mcp/attributes.py (3)

8-134: LGTM! Well-designed attribute model.

The SpanAttributes class correctly implements dual naming convention support (gen_ai.* and llm.*) with proper aliasing and backward-compatible access patterns. The implementation follows Pydantic best practices and OpenTelemetry semantic conventions.

Based on learnings


136-152: LGTM! Clean event model.

The SpanEvent class correctly models OpenTelemetry span events with appropriate field types and descriptions.


155-167: LGTM! Type-safe health check model.

The HealthCheckResponse class correctly uses Literal types for status and backend, with appropriate optional fields for backend-specific information.

src/opentelemetry_mcp/backends/traceloop.py (8)

33-68: LGTM! Clean backend initialization.

The initialization correctly validates required parameters and sets up authentication headers. The hardcoded "default" project_id is appropriately documented as being resolved by Traceloop from the API key.


70-85: LGTM! Correct operator support declaration.

The supported operators set correctly reflects Traceloop API capabilities for native filtering.


87-179: LGTM! Solid hybrid filtering implementation.

The search_traces method correctly implements hybrid filtering by separating native and client-side filters, with appropriate fallbacks and logging. The default 24-hour time range is a reasonable choice.


181-281: LGTM! Reasonable workaround for API limitations.

The search_spans implementation correctly works around Traceloop's lack of dedicated span search by fetching traces and flattening to spans. The limit * 2 multiplication helps ensure enough spans are collected.


283-316: LGTM! Clean trace retrieval.

The get_trace method correctly fetches and validates trace data, with appropriate error handling for missing traces.


318-396: LGTM! Practical service listing implementation.

The service listing and operations methods work around API limitations by querying recent data and extracting unique values. The 7-day window is a reasonable default for service discovery.


430-553: LGTM! Comprehensive filter and attribute transformation.

The transformation methods correctly handle bidirectional mapping between gen_ai.* and llm.* conventions, with appropriate fallbacks and logging for unsupported cases.


555-687: Service name extraction now correctly implemented.

The span-to-trace conversion methods now properly extract service_name from span attributes (line 643: raw_attrs.get("service.name", "")) and root span data (line 578: root_span.get("service_name", "")), addressing the past review concern about hard-coded empty strings.

@ellipsis-dev

ellipsis-dev bot commented Nov 13, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (3)
src/opentelemetry_mcp/tools/usage.py (1)

47-53: Consider warning when deprecated parameter is ignored.

When gen_ai_model is provided alongside the newer gen_ai_request_model or gen_ai_response_model parameters, it's silently ignored (line 51 condition ensures this). This could confuse users who expect both filters to apply.

Consider one of these approaches:

  1. Log a warning when the deprecated parameter is ignored
  2. Document this precedence behavior in the docstring
  3. Raise an error when both deprecated and new parameters are provided together

Example for option 1:

```diff
+if gen_ai_model and (gen_ai_request_model or gen_ai_response_model):
+    # Warn that deprecated parameter is being ignored
+    import logging
+    logging.warning(
+        "gen_ai_model is deprecated and ignored when gen_ai_request_model "
+        "or gen_ai_response_model are provided"
+    )
+
 if gen_ai_model and not gen_ai_request_model and not gen_ai_response_model:
     # Legacy mode: use gen_ai_model for request_model
     final_request_model = gen_ai_model
```
src/opentelemetry_mcp/tools/slow_traces.py (1)

103-103: Consider removing the type: ignore comment.

The type: ignore[arg-type, return-value] comment on the sort operation may be unnecessary. The lambda function correctly accesses a dictionary key and returns a numeric value for sorting. Try removing this comment to verify whether a type checking issue actually exists.

src/opentelemetry_mcp/tools/list_models.py (1)

61-89: Prefer datetime comparison over ISO string comparison.

Lines 86-89 compare ISO timestamp strings to track first/last seen. While this works due to ISO 8601's lexicographic ordering, comparing datetime objects directly (like in list_llm_tools.py lines 110-113) is clearer and less prone to confusion.

Apply this diff to use datetime comparison:

```diff
 # Initialize model entry if not exists
 if model not in models_data:
     models_data[model] = {
         "model": model,
         "provider": llm_attrs.system,
         "request_count": 0,
-        "first_seen": span.start_time.isoformat(),
-        "last_seen": span.start_time.isoformat(),
+        "first_seen": span.start_time,
+        "last_seen": span.start_time,
     }

 # Update statistics
 models_data[model]["request_count"] += 1

 # Update timestamps
-span_time = span.start_time.isoformat()
-if span_time < models_data[model]["first_seen"]:
-    models_data[model]["first_seen"] = span_time
-if span_time > models_data[model]["last_seen"]:
-    models_data[model]["last_seen"] = span_time
+if span.start_time < models_data[model]["first_seen"]:
+    models_data[model]["first_seen"] = span.start_time
+if span.start_time > models_data[model]["last_seen"]:
+    models_data[model]["last_seen"] = span.start_time
```

Then add default=str to the json.dumps call on line 103:

```diff
-return json.dumps(result, indent=2)
+return json.dumps(result, indent=2, default=str)
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e491e75 and 9806faa.

📒 Files selected for processing (9)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/config.py (1 hunks)
  • src/opentelemetry_mcp/tools/expensive_traces.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_llm_tools.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_models.py (1 hunks)
  • src/opentelemetry_mcp/tools/search.py (1 hunks)
  • src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
  • src/opentelemetry_mcp/tools/slow_traces.py (1 hunks)
  • src/opentelemetry_mcp/tools/usage.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/opentelemetry_mcp/tools/expensive_traces.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 

Applied to files:

  • src/opentelemetry_mcp/config.py
🧬 Code graph analysis (7)
src/opentelemetry_mcp/tools/list_llm_tools.py (3)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_spans (80-92)
src/opentelemetry_mcp/models.py (5)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanQuery (548-701)
  • gen_ai_system (103-105)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/tools/search.py (2)
src/opentelemetry_mcp/models.py (6)
  • Filter (45-80)
  • TraceQuery (319-514)
  • TraceSummary (517-545)
  • gen_ai_system (103-105)
  • has_error (113-115)
  • from_trace (532-545)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/tools/slow_traces.py (2)
src/opentelemetry_mcp/models.py (4)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-514)
  • from_span (145-244)
  • from_span (721-740)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/backends/traceloop.py (2)
src/opentelemetry_mcp/backends/base.py (10)
  • BaseBackend (12-160)
  • _create_headers (45-51)
  • get_supported_operators (54-62)
  • search_traces (65-77)
  • client (29-42)
  • search_spans (80-92)
  • get_trace (95-107)
  • list_services (110-119)
  • get_service_operations (122-134)
  • health_check (137-146)
src/opentelemetry_mcp/models.py (10)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (83-115)
  • SpanQuery (548-701)
  • TraceData (287-316)
  • TraceQuery (319-514)
  • get_all_filters (359-472)
  • get_all_filters (588-701)
  • has_error (113-115)
src/opentelemetry_mcp/tools/search_spans.py (5)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_spans (80-92)
src/opentelemetry_mcp/models.py (7)
  • Filter (45-80)
  • SpanQuery (548-701)
  • SpanSummary (704-740)
  • gen_ai_system (103-105)
  • has_error (113-115)
  • from_span (145-244)
  • from_span (721-740)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/backends/jaeger.py (1)
  • search_spans (123-200)
src/opentelemetry_mcp/backends/tempo.py (1)
  • search_spans (141-242)
src/opentelemetry_mcp/tools/list_models.py (3)
src/opentelemetry_mcp/backends/base.py (3)
  • BaseBackend (12-160)
  • search_traces (65-77)
  • get_trace (95-107)
src/opentelemetry_mcp/models.py (5)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-514)
  • gen_ai_system (103-105)
  • from_span (145-244)
  • from_span (721-740)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/tools/usage.py (2)
src/opentelemetry_mcp/models.py (7)
  • LLMSpanAttributes (118-244)
  • TraceQuery (319-514)
  • UsageMetrics (247-284)
  • gen_ai_system (103-105)
  • from_span (145-244)
  • from_span (721-740)
  • add_span (261-284)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (29)
src/opentelemetry_mcp/tools/search.py (2)

1-128: LGTM! Previous feedback addressed.

The implementation is clean and well-structured. Both previous review comments have been successfully addressed:

  • ✅ TraceQuery validation errors are now caught (lines 92-110)
  • ✅ Using shared parse_iso_timestamp utility (lines 63, 67)

The error handling is comprehensive with structured JSON responses, backward compatibility is properly maintained for the deprecated gen_ai_model parameter, and the use of Pydantic models ensures robust validation throughout.


83-90: Clean backward compatibility implementation.

The logic correctly preserves legacy behavior by mapping gen_ai_model to gen_ai_request_model only when the newer parameters are absent. This ensures existing callers continue to work while allowing explicit use of the more specific parameters.

src/opentelemetry_mcp/tools/usage.py (7)

1-8: LGTM: Clean imports.

All imports are necessary and appropriately organized.


11-37: LGTM: Well-documented function signature.

The function signature is clean with modern type hints, and the docstring clearly documents the deprecation of gen_ai_model and expected formats.


38-45: LGTM: Clean timestamp validation.

Early returns with JSON error messages provide good user feedback when timestamps are invalid.


55-64: LGTM: Query construction is correct.

The query correctly uses the processed final_request_model and final_response_model parameters.


66-78: LGTM: Clean aggregation logic.

The iteration through traces and spans is well-structured, with proper null-checking before aggregating metrics.


100-118: LGTM: Breakdown aggregations are correct.

The model and service breakdowns correctly extract and format metrics from the aggregated data.


120-123: LGTM: Appropriate error handling for a user-facing tool.

The default=str parameter in json.dumps elegantly handles non-JSON-serializable types like datetime. The broad exception handler (lines 122-123) prevents crashes while still providing error feedback, which is appropriate for a tool exposed to end users.

src/opentelemetry_mcp/tools/slow_traces.py (6)

1-19: Well-structured function signature with comprehensive filtering options.

The imports and function signature are clean and well-typed. The parameter set provides good flexibility for filtering slow traces by time range, duration, service, and model.


20-34: Clear and complete documentation.

The docstring effectively describes the function's purpose, parameters, and return format.


35-42: Timestamp parsing uses shared utility with proper error handling.

Good use of the parse_iso_timestamp utility with appropriate error responses in JSON format.


44-58: Query building uses appropriate limit expansion strategy.

The 10x limit expansion (capped at 1000) is a reasonable heuristic to ensure enough slow traces are captured when the backend doesn't support duration-based sorting. The approach aligns with the client-side filtering pattern used throughout the project.


70-100: Metric collection logic is sound and comprehensive.

The code properly handles optional LLM attributes, accumulates token counts safely, and tracks models with appropriate fallbacks. The result dictionary provides a rich set of metrics for analyzing slow traces.


104-114: Result formatting and error handling are appropriate.

The result structure is clean and consistent with other tools in the codebase. The broad exception handling is acceptable for a tool function that needs to return JSON strings, and it includes useful error details for debugging.

src/opentelemetry_mcp/tools/list_models.py (3)

33-39: LGTM!

Timestamp parsing with early error returns follows the established pattern and provides clear error messages to users.


43-49: LGTM!

Query construction correctly passes through all filter parameters.


105-106: LGTM!

Error handling follows the established pattern of returning structured JSON error messages.

src/opentelemetry_mcp/tools/list_llm_tools.py (4)

14-21: LGTM!

The LLMToolInfo model provides a clear structure for tool usage statistics.


49-75: LGTM!

Timestamp parsing and query construction correctly combine legacy parameters with the new filter system. The filter for traceloop.span.kind == "tool" properly identifies tool spans.


77-113: LGTM!

The span aggregation logic correctly groups by tool name and tracks usage statistics. Using datetime comparison directly (lines 110-113) is clearer than ISO string comparison.


115-135: LGTM!

Result formatting correctly handles set-to-list conversion, datetime serialization with default=str, and provides both individual tool counts and total calls. Error handling follows the established pattern.

src/opentelemetry_mcp/config.py (2)

52-60: LGTM: Environment variable validation.

The error handling for BACKEND_TIMEOUT with try/except and logging is well-implemented, gracefully falling back to the default value when parsing fails.


118-119: LGTM: URL validation fixed.

The TypeAdapter(HttpUrl).validate_python() approach correctly handles Pydantic v2's type validation. This addresses the previous issue where HttpUrl() was called directly.
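
For context, a condensed sketch of the Pydantic v2 idiom in question; the helper name is illustrative:

```python
from pydantic import HttpUrl, TypeAdapter, ValidationError

def validate_backend_url(raw: str) -> str:
    """Validate a URL string via TypeAdapter, since HttpUrl is an annotation type."""
    try:
        return str(TypeAdapter(HttpUrl).validate_python(raw))
    except ValidationError as exc:
        raise ValueError(f"Invalid backend URL: {raw!r}") from exc
```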

src/opentelemetry_mcp/backends/traceloop.py (4)

643-643: LGTM: Service name population fixed.

The service name is now correctly extracted from raw_attrs.get("service.name", ""), ensuring TraceData.service_name is properly populated. This addresses the previous review concern.


87-179: Solid hybrid filtering implementation.

The search_traces method effectively splits filters into API-native and client-side categories, logs unconvertible filters for transparency, and applies client-side filtering as a fallback. The time window defaulting to the last 24 hours is sensible.


510-553: Well-structured attribute transformation.

The _transform_llm_attributes_to_gen_ai method cleanly maps Traceloop's llm.* conventions to OpenTelemetry's gen_ai.* format while preserving original keys for backward compatibility. The explicit mapping table makes the transformation transparent.


408-412: Traceloop's health check uses temporary client intentionally to access root endpoint.

The temporary AsyncClient creation is intentional: the /health endpoint is at the API root, not under the versioned API path. Since self.client is configured with base_url=self.url (which includes the API path prefix), using it would route to an incorrect endpoint. This design validates basic connectivity but—unlike Jaeger and Tempo—does not validate authentication headers or full API access. If comprehensive API validation is needed, consider calling an authenticated endpoint (e.g., a minimal API method) instead of the unauthenticated health endpoint.

src/opentelemetry_mcp/tools/search_spans.py (1)

13-116: LGTM: Clean tool implementation with proper error handling.

The search_spans function is well-structured with clear separation between timestamp parsing, filter validation, query construction, and execution. Error handling returns user-friendly JSON messages, and the delegation to parse_iso_timestamp and SpanSummary.from_span promotes code reuse across tools.

Comment on lines 52 to 59
trace_summaries = await backend.search_traces(query)

# Track models with their statistics
models_data: dict[str, dict[str, Any]] = {}

for summary in trace_summaries:
# Get full trace to access LLM spans
trace = await backend.get_trace(summary.trace_id)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential N+1 query performance issue.

The BaseBackend.search_traces() interface documents that it returns "List of matching traces with all spans". If this is accurate, calling get_trace() for each result creates an N+1 query pattern (1 search + N individual fetches).

Verify what search_traces actually returns:

If search_traces returns full TraceData, apply this diff to eliminate redundant calls:

```diff
 # Search traces
 trace_summaries = await backend.search_traces(query)

 # Track models with their statistics
 models_data: dict[str, dict[str, Any]] = {}

-for summary in trace_summaries:
-    # Get full trace to access LLM spans
-    trace = await backend.get_trace(summary.trace_id)
-
+for trace in trace_summaries:
     for span in trace.llm_spans:
```

🏁 Script executed:

```shell
#!/bin/bash
# Check if search_traces returns full traces or summaries
rg -n -C5 'async def search_traces' --type=py
```

Length of output: 4718


🏁 Script executed:

```shell
#!/bin/bash
# Extract BaseBackend methods with their docstrings
cat src/opentelemetry_mcp/backends/base.py | grep -A 20 "async def search_traces\|async def get_trace"
```

Length of output: 70


🏁 Script executed:

```shell
#!/bin/bash
# Get full context of list_models.py around lines 52-59
cat src/opentelemetry_mcp/tools/list_models.py | head -80 | tail -35
```

Length of output: 74


🏁 Script executed:

```shell
#!/bin/bash
# Check TraceData model definition
cat src/opentelemetry_mcp/models.py | grep -A 10 "class TraceData"
```

Length of output: 63


🏁 Script executed:

```shell
#!/bin/bash
# Extract BaseBackend methods with their docstrings
cat src/opentelemetry_mcp/backends/base.py | grep -A 20 "async def search_traces\|async def get_trace"
```

Length of output: 1039


🏁 Script executed:

```shell
#!/bin/bash
# Get full context of list_models.py around lines 52-59
cat src/opentelemetry_mcp/tools/list_models.py | head -80 | tail -35
```

Length of output: 1284


🏁 Script executed:

```shell
#!/bin/bash
# Check TraceData model definition
cat src/opentelemetry_mcp/models.py | grep -A 15 "class TraceData"
```

Length of output: 437


Fix redundant N+1 query pattern in trace fetching.

The BaseBackend.search_traces() method already returns "List of matching traces with all spans" (returns list[TraceData]), so calling get_trace() for each result is redundant. This creates an N+1 query pattern: 1 search + N individual fetches.

Apply this fix to eliminate the redundant calls:

```diff
 # Search traces
 trace_summaries = await backend.search_traces(query)

 # Track models with their statistics
 models_data: dict[str, dict[str, Any]] = {}

-for summary in trace_summaries:
-    # Get full trace to access LLM spans
-    trace = await backend.get_trace(summary.trace_id)
-
+for trace in trace_summaries:
     for span in trace.llm_spans:
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 trace_summaries = await backend.search_traces(query)

 # Track models with their statistics
 models_data: dict[str, dict[str, Any]] = {}

-for summary in trace_summaries:
-    # Get full trace to access LLM spans
-    trace = await backend.get_trace(summary.trace_id)
+for trace in trace_summaries:
+    for span in trace.llm_spans:
```
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/tools/list_models.py around lines 52 to 59, the code currently performs an N+1 query by calling backend.get_trace() for every trace returned by backend.search_traces(), but search_traces already returns full TraceData with spans; remove the per-summary get_trace() call and use the spans and other fields directly from each summary (e.g., summary.spans / summary.trace_id / summary.attributes) when collecting model statistics, updating variable names and any type hints accordingly so no extra backend calls are made. 
Comment on lines 62 to 68
for summary in trace_summaries:
# Get full trace to access LLM spans
trace = await backend.get_trace(summary.trace_id)

# Only include traces that have LLM spans
if not trace.llm_spans:
continue

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Performance concern: Sequential full-trace fetches could be slow.

The code sequentially fetches full trace data for every trace summary (potentially up to 1000 traces). If the backend is remote, this could result in hundreds of network round-trips. Consider:

  1. Parallelizing the trace fetches using asyncio.gather() for better performance
  2. Implementing early termination once enough slow traces are found
  3. Verifying whether backend summaries already include duration/LLM metadata to avoid fetching full traces

Example parallelization approach:

```python
# Fetch traces in parallel
trace_futures = [backend.get_trace(summary.trace_id) for summary in trace_summaries]
traces = await asyncio.gather(*trace_futures)

for trace in traces:
    if not trace.llm_spans:
        continue
    # ... rest of processing
```

---

🏁 Script executed:

```shell
#!/bin/bash
# Explore repository structure and find the slow_traces.py file
find . -name "slow_traces.py" -type f 2>/dev/null | head -20
```

Length of output: 44


🏁 Script executed:

```shell
#!/bin/bash
# Examine slow_traces.py file with context around lines 62-68
if [ -f "src/opentelemetry_mcp/tools/slow_traces.py" ]; then
  cat -n "src/opentelemetry_mcp/tools/slow_traces.py" | sed -n '40,90p'
fi
```

Length of output: 2071


🏁 Script executed:

```shell
#!/bin/bash
# Search for backend interface/class definitions
rg -n "class.*Backend|def get_trace" --type py -C 3
```

Length of output: 5561


🏁 Script executed:

```shell
#!/bin/bash
# Search for trace_summaries usage and where it comes from
rg -n "trace_summaries|get_traces" --type py -C 3
```

Length of output: 3748


🏁 Script executed:

```shell
#!/bin/bash
# Look for any pagination or limiting logic
rg -n "limit|offset|pagination|max_traces" --type py -C 2
```

Length of output: 33041


🏁 Script executed:

```shell
#!/bin/bash
# Check for asyncio.gather or similar parallelization patterns already in code
rg -n "asyncio\.(gather|create_task)|TaskGroup" --type py
```

Length of output: 0


🏁 Script executed:

```shell
#!/bin/bash
# Search for trace filtering or early termination patterns
rg -n "break|early.*return|found.*enough" --type py -C 2
```

Length of output: 2183


🏁 Script executed:

```shell
#!/bin/bash
# Look for LLM span filtering logic
rg -n "llm_span|LLMSpan|llm.*span" --type py -C 3
```

Length of output: 20264


Performance concern verified: Sequential full-trace fetches across multiple tools cause unnecessary network round-trips.

The code sequentially fetches full trace data for every trace summary—potentially up to 1000 traces (line 53: limit=min(limit * 10, 1000)). This pattern exists in slow_traces.py, model_stats.py, list_models.py, and expensive_traces.py. Each backend.get_trace() call is a remote HTTP call (confirmed in Jaeger, Tempo, Traceloop backends). The Tempo backend already caps at 50 traces with a warning about excessive API calls.

Additionally, trace summaries already include LLM span metadata (per TraceSummary model), but filtering happens after fetching full traces (line 67–68), making the initial fetch inefficient.

Recommended fixes:

  1. Parallelize trace fetches using asyncio.gather() to reduce total latency
  2. Verify whether TraceQuery supports filtering by LLM span presence to avoid full-trace fetches entirely
  3. Apply fixes consistently across all affected tools (slow_traces, expensive_traces, model_stats, list_models)
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/tools/slow_traces.py around lines 62 to 68, the code fetches full traces sequentially for every TraceSummary even when many have no LLM spans; change the logic to first filter trace_summaries by the TraceSummary.llm_spans metadata and only fetch those, and when fetching multiple traces perform the remote calls concurrently using asyncio.gather (or another batching strategy) to avoid sequential HTTP round-trips; additionally check whether TraceQuery/ backend supports a server-side filter for presence of LLM spans and prefer that to eliminate full-trace fetches entirely, and apply the same pattern to expensive_traces.py, model_stats.py, and list_models.py for consistent performance improvements. 
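
A bounded-concurrency version of the fetch loop could look like this sketch; it assumes the backend.get_trace and summary.trace_id shapes discussed above, and the concurrency cap is an arbitrary example value:

```python
import asyncio

async def fetch_traces(backend, summaries, max_concurrency: int = 10):
    """Fetch full traces concurrently instead of one HTTP round-trip at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(trace_id: str):
        async with semaphore:
            return await backend.get_trace(trace_id)

    return await asyncio.gather(*(fetch_one(s.trace_id) for s in summaries))
```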
@ellipsis-dev

ellipsis-dev bot commented Nov 13, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/opentelemetry_mcp/backends/traceloop.py (1)

529-534: Boolean filter values serialize incorrectly when sent to Traceloop API.

Line 532 uses str(value) for all value types, which produces capitalized "True"/"False" for Python booleans. However, JSON—and consequently the Traceloop API—requires lowercase "true"/"false". Since FilterType.BOOLEAN is actively used in the codebase (filter_engine.py:245), boolean filters will fail to match in the Traceloop API.

Apply this fix:

```diff
+# Serialize value based on type to match JSON format
+if isinstance(value, bool):
+    serialized_value = "true" if value else "false"
+else:
+    serialized_value = str(value)
+
 return {
     "field": traceloop_field,
     "operator": traceloop_operator,
-    "value": str(value),
+    "value": serialized_value,
     "value_type": traceloop_value_type,
 }
```
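
The underlying mismatch is easy to reproduce in isolation:

```python
import json

json.dumps(True)  # 'true'  - the form JSON (and a JSON-based API) expects
str(True)         # 'True'  - Python's repr, not a valid JSON boolean
```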
🧹 Nitpick comments (3)
src/opentelemetry_mcp/backends/traceloop.py (3)

57-57: Consider using a standard environment name.

The default environment "prd" is non-standard. Most observability systems use "production", "prod", or similar conventions. This could cause confusion for users or interoperability issues with other systems.

Apply this diff if you want to align with common conventions:

```diff
-self.environments = environments if environments else ["prd"]
+self.environments = environments if environments else ["production"]
```

391-422: Unused parameter in get_service_operations.

The service_name parameter is accepted but not used in the API request, which might be confusing for callers who expect service-specific filtering. While the docstring mentions this limitation, consider either removing the parameter or documenting the plan to use it once the API supports it.

If the API will never support this filtering, consider removing the parameter:

```diff
-async def get_service_operations(self, service_name: str) -> list[str]:
+async def get_service_operations(self) -> list[str]:
     """Get operations for a service from Traceloop.

     Returns workflow names which represent high-level operations.

-    Args:
-        service_name: Service name (currently not used for filtering)
-
     Returns:
         List of operation/workflow names
```

Alternatively, if it's required by the BaseBackend interface, add a more prominent note in the docstring about why it's unused.


433-438: Reuse self.client for health check—httpx respects absolute URLs even with base_url set.

When using a client with base_url set, passing an absolute URL to a request method causes that URL to take priority. The health_url is fully qualified (e.g., https://example.com/v2/health), so the current temp_client workaround is unnecessary. The suggested optimization—using self.client.get(health_url)—is safe and eliminates redundant client creation and connection overhead on each health check.

```diff
 try:
-    # Health endpoint is at root, not /v2
-    health_url = self.url.rstrip("/") + "/health"
-    # Make request without base_url override
-    async with httpx.AsyncClient(timeout=self.timeout) as temp_client:
-        response = await temp_client.get(health_url)
-        response.raise_for_status()
+    # Health endpoint is at root, not /v2 - use absolute URL
+    health_url = self.url.rstrip("/") + "/health"
+    response = await self.client.get(health_url)
+    response.raise_for_status()
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9806faa and e293174.

📒 Files selected for processing (1)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
@ellipsis-dev

ellipsis-dev bot commented Nov 13, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
CLAUDE.md (3)

111-111: Remove redundant phrase in component list.

"CLI interface" is tautological since CLI stands for Command-Line Interface. Simplify to "CLI" for clarity.

```diff
-[server.py](opentelemetry_mcp/server.py) - FastMCP application, CLI interface, tool handlers
+[server.py](opentelemetry_mcp/server.py) - FastMCP application, CLI, tool handlers
```

452-452: Replace vague intensifier with specific language.

"Very expensive" is imprecise. Consider using more descriptive terms like "resource-intensive" or quantify the concern if possible.

```diff
-Querying all services would be very expensive (especially for spans)
+Querying all services would be resource-intensive (especially for spans)
```

466-470: Add language identifier to fenced code block.

The code block at line 466 lacks a language specifier, which reduces syntax highlighting clarity in rendered Markdown. This is a style best practice.

````diff
-```
+```text
 ValueError: Jaeger backend requires 'service_name' parameter.
 Use list_services() to see available services, then specify one with service_name parameter.
 ```
````
src/opentelemetry_mcp/server.py (1)

83-116: Consider adding backend cleanup on server shutdown.

The lazy initialization correctly handles event loop concerns, and the soft health check is reasonable. However, the backend's HTTP client is never explicitly closed when the server shuts down.

Consider adding cleanup in the main() function:

```diff
 except KeyboardInterrupt:
     logger.info("Server stopped by user")
+    if _backend:
+        import asyncio
+        asyncio.run(_backend.close())
     sys.exit(0)
 except Exception as e:
     logger.error(f"Server error: {e}", exc_info=True)
+    if _backend:
+        import asyncio
+        asyncio.run(_backend.close())
     sys.exit(1)
```

Alternatively, if FastMCP supports lifecycle hooks, use those instead for cleaner integration.

src/opentelemetry_mcp/models.py (1)

505-507: Move json import to module level.

Importing json inside the method is inefficient. Move it to the top of the file with other imports.

 """Data models for OpenTelemetry traces and Opentelemetry conventions.""" from datetime import datetime from enum import Enum +import json from typing import Literal from pydantic import BaseModel, Field, model_validator

Then remove the local import:

```diff
 if all_tags:
     # Jaeger expects JSON-encoded tags
-    import json
     params["tags"] = json.dumps(all_tags)
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e293174 and b0aeb89.

📒 Files selected for processing (8)
  • CLAUDE.md (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/config.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/server.py (1 hunks)
  • src/opentelemetry_mcp/tools/errors.py (1 hunks)
  • src/opentelemetry_mcp/tools/search.py (1 hunks)
  • src/opentelemetry_mcp/tools/usage.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman Repo: traceloop/openllmetry PR: 3388 File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20 Timestamp: 2025-09-18T14:36:24.693Z Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently. 

Applied to files:

  • src/opentelemetry_mcp/config.py
  • src/opentelemetry_mcp/server.py
🧬 Code graph analysis (4)
src/opentelemetry_mcp/tools/usage.py (2)
src/opentelemetry_mcp/models.py (7)
  • LLMSpanAttributes (113-239)
  • TraceQuery (314-509)
  • UsageMetrics (242-279)
  • gen_ai_system (103-105)
  • from_span (140-239)
  • from_span (715-731)
  • add_span (256-279)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/tools/errors.py (3)
src/opentelemetry_mcp/backends/base.py (1)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (6)
  • TraceQuery (314-509)
  • TraceSummary (512-540)
  • has_error (108-110)
  • from_trace (527-540)
  • is_llm_span (98-100)
  • gen_ai_system (103-105)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • to_dict (60-82)
  • get (84-115)
src/opentelemetry_mcp/server.py (2)
src/opentelemetry_mcp/backends/base.py (6)
  • BaseBackend (12-160)
  • search_spans (80-92)
  • health_check (137-146)
  • search_traces (65-77)
  • get_trace (95-107)
  • list_services (110-119)
src/opentelemetry_mcp/config.py (4)
  • ServerConfig (68-125)
  • from_env (34-65)
  • from_env (76-98)
  • apply_cli_overrides (100-125)
🪛 LanguageTool
CLAUDE.md

[style] ~111-~111: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...y_mcp/server.py) - FastMCP application, CLI interface, tool handlers - [config.py](openteleme...

(ACRONYM_TAUTOLOGY)


[style] ~452-~452: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ueries - Querying all services would be very expensive (especially for spans) - Users can easi...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
CLAUDE.md

466-466: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (25)
CLAUDE.md (1)

1-564: Comprehensive and well-structured documentation.

This is a thoughtful guidance document that clearly explains the project architecture, tool patterns, filtering system, and development workflow. The inclusion of concrete examples (JSON filters, code snippets, expected responses) and architecture diagrams (via file references and section structure) will help Claude Code (and developers) navigate the codebase effectively.

Key strengths:

  • Clear backend abstraction pattern with concrete implementation references
  • Detailed tool documentation with use cases and parameter descriptions
  • Comprehensive filter system explanation with multiple examples and backend support matrix
  • Development patterns that establish type safety, async conventions, and error handling expectations
src/opentelemetry_mcp/tools/usage.py (3)

11-43: LGTM: Clean timestamp parsing with proper error handling.

The function signature is well-typed, and the use of the shared parse_iso_timestamp utility ensures consistent behavior across tools. Early returns on parse errors follow good practice.


45-68: LGTM: Solid aggregation logic.

The trace query construction and metric aggregation are well-structured. The iteration through traces and spans with proper null-checking (if llm_attrs) is defensive and correct.


69-114: LGTM: Comprehensive result structure with accurate filter reporting.

The result dictionary provides a clear breakdown of usage metrics by model and service. The filters section (lines 75-80) accurately reflects the query parameters used—there's no deprecated parameter handling in the current implementation, so the output correctly represents what was queried.

Note: The past review comment about misleading filter output is no longer applicable to this code.

src/opentelemetry_mcp/tools/errors.py (3)

11-46: LGTM: Consistent patterns and appropriate defaults.

The function follows the same solid patterns as other tools in the package. The default limit of 100 is appropriate for error traces, which are typically a smaller subset of all traces.


48-110: LGTM: Thorough error detail extraction with sensible safeguards.

The error extraction logic is comprehensive, handling multiple error attribute sources (error.message, exception.message) with appropriate fallbacks. The 500-character truncation of stack traces prevents payload bloat while preserving useful debugging context. The LLM-specific error enrichment is a nice touch for AI application debugging.
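
A condensed illustration of such a fallback chain, with attribute names taken from OTel conventions and the 500-character cap mentioned above; this is a sketch, not the PR's exact code:

```python
def extract_error_details(attrs: dict[str, str]) -> dict[str, str]:
    """Pick the first available error message source and truncate the stack."""
    message = (
        attrs.get("error.message")
        or attrs.get("exception.message")
        or "unknown error"
    )
    stack = attrs.get("exception.stacktrace", "")
    if len(stack) > 500:
        stack = stack[:500] + "... [truncated]"
    return {"message": message, "stacktrace": stack}
```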


111-118: LGTM: Clean result structure and error handling.

The result format is straightforward and the exception handling ensures the tool always returns valid JSON, even on unexpected failures.

src/opentelemetry_mcp/tools/search.py (2)

60-79: LGTM: Proper use of shared utilities and robust filter validation.

The timestamp parsing now correctly uses the shared parse_iso_timestamp utility (addressing past feedback), and the filter parsing includes proper error handling for both validation errors and unexpected exceptions.


81-117: LGTM: TraceQuery validation properly handled.

The TraceQuery construction is now wrapped in a try-except block that catches ValidationError (addressing past feedback), ensuring invalid query parameters return a structured JSON error instead of crashing. The search execution and result formatting are clean and consistent with other tools.

src/opentelemetry_mcp/config.py (3)

33-65: LGTM: Robust environment variable loading with proper validation.

The from_env method includes good validation for required fields and enum values. The timeout parsing now includes proper error handling (addressing past feedback), with a fallback to the default value and a warning log when invalid values are encountered.


75-98: LGTM: Consistent defaults and proper error handling.

The server configuration loading includes proper validation for all fields. The max_traces_per_query parsing now includes error handling (addressing past feedback), and the default values are consistent—both the field default (line 73: default=500) and the fallback (line 92) use 500, resolving the previous inconsistency concern.


100-125: LGTM: CLI overrides properly implemented with correct URL validation.

The CLI override logic includes appropriate validation for backend type and correctly uses TypeAdapter(HttpUrl).validate_python() (addressing past feedback about the non-callable HttpUrl type). The method provides a clean way to apply runtime configuration changes.

src/opentelemetry_mcp/backends/traceloop.py (5)

33-68: LGTM: Clean initialization with proper validation.

The constructor enforces the API key requirement early and documents the design decision to use "default" as the project ID since Traceloop resolves the actual project from the API key. The default environment of "prd" is sensible for production use.


87-179: LGTM: Well-designed hybrid filtering approach with resolved limit concern.

The search_traces implementation uses a thoughtful hybrid filtering strategy, sending supported filters to the backend and applying unsupported filters client-side. The limit is now used directly (line 140: "limit": query.limit), addressing the past concern about exceeding backend limits through multiplication. The logging provides good visibility into which filters are applied where.


181-307: LGTM: Consistent implementation with proper service name extraction.

The search_spans method follows the same solid patterns as search_traces. The service name is now properly extracted from span attributes (line 288: raw_attrs.get("service.name", "")), addressing past feedback. The attribute transformation from Traceloop's llm.* format to the canonical gen_ai.* format is handled cleanly.


456-539: LGTM: Comprehensive filter conversion with proper boolean handling.

The filter conversion logic handles backend-specific attribute naming (gen_ai.*llm.*) appropriately. Boolean serialization now correctly produces lowercase "true"/"false" strings (lines 529-532), addressing past feedback about API compatibility. The pattern of returning None for unsupported filters enables the hybrid filtering strategy.


586-718: LGTM: Proper service name extraction in trace conversions.

Both trace conversion methods now correctly populate the service name from the appropriate sources—root_span.get("service_name", "") at line 609 and raw_attrs.get("service.name", "") at line 674—addressing past feedback. The trace duration calculation properly handles multiple spans by finding the minimum start time and maximum end time.
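
The duration calculation being described reduces to a min/max over span boundaries; a sketch assuming datetime-valued start_time/end_time as on SpanData:

```python
def trace_duration_ms(spans) -> float:
    """Trace duration from the earliest span start to the latest span end."""
    start = min(span.start_time for span in spans)
    end = max(span.end_time for span in spans)
    return (end - start).total_seconds() * 1000
```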

src/opentelemetry_mcp/server.py (3)

43-81: LGTM!

The backend factory cleanly handles the three supported backend types with appropriate parameterization. The Traceloop-specific environments parameter is correctly passed only to that backend.


118-197: LGTM on tool design and documentation.

The tool signatures are well-designed with comprehensive documentation. The dual support for legacy parameters and the new generic filter system is clearly explained with helpful examples.


588-695: LGTM on CLI implementation.

The Click-based CLI is well-structured with appropriate configuration loading, override handling, and transport selection. The lazy backend initialization correctly defers setup to the FastMCP event loop.

src/opentelemetry_mcp/models.py (5)

45-80: LGTM on Filter validation.

The validate_filter_values method correctly enforces operator-specific value requirements with clear error messages.


83-111: LGTM on SpanData model.

The model correctly represents OpenTelemetry span structure with appropriate computed properties for LLM detection and error checking.


139-239: LGTM on LLM attribute extraction.

The from_span method robustly handles multiple token naming conventions and provides sophisticated total token calculation that goes beyond simple addition. The handling of finish reasons and content previews is well thought out.


403-421: LGTM - has_error filter correctly includes UNSET statuses.

The filter translation now correctly uses NOT_EQUALS for the false case, ensuring that both "OK" and "UNSET" statuses are included when filtering for non-error traces. This addresses the issue from previous reviews.
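
A hypothetical rendering of that translation; the Filter constructor arguments here are assumptions, not the model's actual signature:

```python
from opentelemetry_mcp.models import Filter, FilterOperator  # assumed import path

def has_error_filter(has_error: bool) -> Filter:
    if has_error:
        # Only spans whose status is ERROR
        return Filter(field="status", operator=FilterOperator.EQUALS,
                      values=["ERROR"])
    # NOT_EQUALS admits both "OK" and "UNSET", which EQUALS("OK") would not
    return Filter(field="status", operator=FilterOperator.NOT_EQUALS,
                  values=["ERROR"])
```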


242-280: LGTM on usage metrics and summary models.

The UsageMetrics aggregation correctly maintains hierarchical breakdowns, and the summary factory methods provide clean conversions from full data to compact representations.

Also applies to: 282-312, 512-541, 699-731

Comment on lines 194 to 196
except Exception as e:
logger.error(f"Error executing search_traces: {e}", exc_info=True)
return f'{{"error": "Tool execution failed: {str(e)}"}}'

⚠️ Potential issue | 🟠 Major

Fix JSON error formatting to prevent malformed responses.

The error response format doesn't escape special characters in exception messages, which will produce invalid JSON if the exception contains quotes, backslashes, or other special characters.

Apply this pattern to all tool error handlers:

```diff
+import json
+
 @mcp.tool()
 async def search_traces(...) -> str:
     try:
         ...
     except Exception as e:
         logger.error(f"Error executing search_traces: {e}", exc_info=True)
-        return f'{{"error": "Tool execution failed: {str(e)}"}}'
+        return json.dumps({"error": f"Tool execution failed: {str(e)}"})
```

This same issue affects all 11 tool functions (lines 196, 217, 261, 277, 312, 350, 386, 430, 474, 546, 585).
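
The failure mode is easy to demonstrate:

```python
import json

msg = 'timeout in "search_traces"'
broken = f'{{"error": "Tool execution failed: {msg}"}}'
# json.loads(broken) raises: the inner quotes are not escaped
safe = json.dumps({"error": f"Tool execution failed: {msg}"})
# json.loads(safe) round-trips, because json.dumps escapes the quotes
```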

Committable suggestion skipped: line range outside the PR's diff.

@ellipsis-dev ellipsis-dev bot left a comment

Caution

Changes requested ❌

Reviewed f90938b in 2 minutes and 47 seconds. Click for details.
  • Reviewed 408 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comment. View it below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/opentelemetry_mcp/models.py:503
  • Draft comment:
    Good refactoring: consolidating duplicate legacy filter logic into _convert_legacy_params_to_filters reduces code duplication in get_all_filters methods for TraceQuery and SpanQuery.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_2ZTCNwHPTDuwms8D

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/opentelemetry_mcp/models.py (2)

282-383: Consider including float values in usage token calculation.

The token extraction logic is comprehensive and handles multiple naming conventions well. However, line 318's isinstance(value, int) check might exclude usage attributes with float values, potentially underreporting token usage.

Consider broadening the type check:

```diff
 attrs_dict = attrs.to_dict()
 for key, value in attrs_dict.items():
-    if key.startswith("gen_ai.usage.") and isinstance(value, int):
+    if key.startswith("gen_ai.usage.") and isinstance(value, (int, float)):
         usage_sum += value
```

Additionally, lines 377-379 convert zero token counts to None. If this is intentional to distinguish "no data" from "zero tokens," consider adding a comment explaining the rationale.


516-557: Move inline import to module level.

The to_backend_params method correctly converts query parameters to Jaeger-compatible format. However, the inline import json at line 552 could be moved to the module's top-level imports for consistency and to avoid repeated import overhead.

Apply this diff:

At the top of the file (after line 7):

```diff
 from pydantic import BaseModel, Field, model_validator
+import json
 from .attributes import SpanAttributes, SpanEvent
```

Then remove the inline import:

```diff
 if all_tags:
     # Jaeger expects JSON-encoded tags
-    import json
     params["tags"] = json.dumps(all_tags)
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0aeb89 and f90938b.

📒 Files selected for processing (1)
  • src/opentelemetry_mcp/models.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
🧬 Code graph analysis (1)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • to_dict (60-82)
  • get (84-115)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (8)
src/opentelemetry_mcp/models.py (8)

12-81: LGTM! Comprehensive filter system with solid validation.

The FilterOperator, FilterType, and Filter classes provide a well-designed filtering abstraction. The model_validator correctly enforces operator-specific requirements: existence operators accept no values, multi-value operators require values, and BETWEEN requires exactly two values.
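For readers who haven't seen the pattern, a minimal sketch of such an operator-aware validator, assuming pydantic v2 and simplified Filter/FilterOperator shapes (names are illustrative, not the PR's exact code):

```python
from enum import Enum
from typing import Any, Optional

from pydantic import BaseModel, model_validator

class FilterOperator(str, Enum):
    EQUALS = "eq"
    EXISTS = "exists"
    NOT_EXISTS = "not_exists"
    IN = "in"
    BETWEEN = "between"

class Filter(BaseModel):
    field: str
    operator: FilterOperator
    values: Optional[list[Any]] = None

    @model_validator(mode="after")
    def validate_filter_values(self) -> "Filter":
        # Existence operators carry no values
        if self.operator in (FilterOperator.EXISTS, FilterOperator.NOT_EXISTS):
            if self.values:
                raise ValueError(f"{self.operator.value} takes no values")
        # Multi-value operators require values
        elif self.operator in (FilterOperator.IN, FilterOperator.BETWEEN):
            if not self.values:
                raise ValueError(f"{self.operator.value} requires values")
            # BETWEEN is a two-endpoint range
            if self.operator is FilterOperator.BETWEEN and len(self.values) != 2:
                raise ValueError("between requires exactly two values")
        return self
```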


83-224: Well-structured helper addressing code duplication.

This function successfully consolidates the legacy-to-filter conversion logic previously duplicated between TraceQuery and SpanQuery. The has_error=False case correctly uses NOT_EQUALS to include both UNSET and OK statuses, resolving the issue flagged in past reviews.

Based on past review comments addressing duplicate code.


226-254: LGTM! Clean span data model.

The SpanData model properly encapsulates OpenTelemetry span information with well-defined computed properties. The type: ignore on line 237 is appropriate for Pydantic's handling of default_factory with custom types.


385-423: LGTM! Clean aggregation logic.

The UsageMetrics model provides clear hierarchical breakdowns by model and service. The add_span method correctly aggregates token usage at all levels with appropriate fallbacks for missing data.
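A rough sketch of what hierarchical token aggregation of this kind can look like; the class and field names here are assumed for illustration, not the PR's exact model:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class UsageTotals:
    total_tokens: int = 0
    by_model: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    by_service: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def add_span(self, span) -> None:
        tokens = span.total_tokens or 0    # treat missing usage as 0
        model = span.model or "unknown"    # fallback for absent metadata
        service = span.service_name or "unknown"
        self.total_tokens += tokens        # aggregate at every level
        self.by_model[model] += tokens
        self.by_service[service] += tokens
```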


425-455: LGTM! Efficient trace aggregation.

The TraceData model provides well-designed computed properties for LLM spans, error detection, and token aggregation. The implementation efficiently filters and aggregates span data.


559-588: LGTM! Clean summary model.

The TraceSummary model provides an efficient compact representation of trace data. The from_trace factory method correctly extracts computed properties from the full TraceData model.


590-648: LGTM! Consistent span query model.

The SpanQuery model maintains consistency with TraceQuery and correctly reuses the _convert_legacy_params_to_filters helper. The absence of to_backend_params appears intentional, as span queries may use different backend APIs than trace queries.


650-682: LGTM! Well-structured span summary.

The SpanSummary model provides an appropriate compact view of span data. The from_span method correctly handles both LLM and non-LLM spans, extracting LLM-specific attributes only when applicable.


@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 76486fa in 1 minute and 6 seconds. Click for details.
  • Reviewed 13 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/opentelemetry_mcp/models.py:137
  • Draft comment:
    Good change: using 'is not None' prevents 0 ms from being skipped. Consider using similar explicit checks for max_duration_ms if 0 is a valid value.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_b7wcXDlzlnq7nF5v

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/opentelemetry_mcp/models.py (2)

377-379: Reconsider converting 0 tokens to None.

After fixing the token extraction logic, these lines will convert explicit 0 values to None with the pattern prompt_tokens if prompt_tokens else None. This treats 0 as falsy and converts it to None, potentially hiding legitimate zero-token usage (e.g., cached LLM responses).

Consider whether to distinguish "no token data available" (None) from "zero tokens used" (0).

If 0 should be preserved:

```diff
-    prompt_tokens=prompt_tokens if prompt_tokens else None,
-    completion_tokens=completion_tokens if completion_tokens else None,
-    total_tokens=total_tokens if total_tokens else None,
+    prompt_tokens=prompt_tokens,
+    completion_tokens=completion_tokens,
+    total_tokens=total_tokens,
```

Then update the field defaults in LLMSpanAttributes if needed, or use explicit None checks: prompt_tokens if prompt_tokens is not None else None.


550-555: Consider moving json import to module level.

Importing json inside to_backend_params() works but is unconventional. For consistency with other imports, consider moving it to the top of the file.

```diff
 from datetime import datetime
 from enum import Enum
+import json
 from typing import Literal
```

Then remove the import from line 552.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f90938b and 76486fa.

📒 Files selected for processing (1)
  • src/opentelemetry_mcp/models.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
🧬 Code graph analysis (1)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • to_dict (60-82)
  • get (84-115)
🔇 Additional comments (8)
src/opentelemetry_mcp/models.py (8)

45-81: LGTM: Robust filter validation.

The Filter class with its validate_filter_values post-validator properly enforces operator-specific constraints, ensuring EXISTS/NOT_EXISTS take no values, IN/NOT_IN/BETWEEN require values, and BETWEEN gets exactly 2 values.


226-254: LGTM: Clean span data model.

The SpanData model with computed properties (is_llm_span, gen_ai_system, has_error) provides a clear interface. The type: ignore on line 237 appears to be a Pydantic typing limitation with default_factory.


310-326: Verify: should explicit total_tokens=0 be recalculated?

Line 312 uses if not total_tokens:, which treats an explicit 0 from attrs.gen_ai_usage_total_tokens the same as None, triggering recalculation. If a span explicitly reports total_tokens=0 (e.g., a cached response or error case), should that be respected, or is 0 considered "not set"?

If explicit 0 should be preserved, change to an explicit None check:

- if not total_tokens: + if total_tokens is None: # Search for all gen_ai.usage.* attributes and sum numeric values

385-423: LGTM: Clean usage metrics aggregation.

The UsageMetrics.add_span method appropriately uses or 0 to treat None as 0 during accumulation, and maintains hierarchical breakdowns by model and service.


425-455: LGTM: Well-structured trace data model.

The TraceData properties (llm_spans, has_errors, total_llm_tokens) provide convenient computed views of the trace data.


559-588: LGTM: Clean trace summary model.

The TraceSummary.from_trace factory method properly extracts key metrics for list views.


590-648: LGTM: Consistent span query model.

SpanQuery appropriately mirrors TraceQuery and reuses the _convert_legacy_params_to_filters helper to avoid duplication.


650-682: LGTM: Clean span summary model.

The SpanSummary.from_span factory method cleanly handles both LLM and non-LLM spans, extracting relevant attributes as needed.

Comment on lines 147 to 155
```python
if max_duration_ms:
    all_filters.append(
        Filter(
            field="duration",
            operator=FilterOperator.LTE,
            value=max_duration_ms,
            value_type=FilterType.NUMBER,
        )
    )
```


⚠️ Potential issue | 🟡 Minor

Use explicit None check for max_duration_ms consistency.

Line 147 uses a truthy check (if max_duration_ms:), which will skip the filter when max_duration_ms=0. Since the Pydantic schema allows ge=0, a value of 0 is valid and should generate a filter. This is inconsistent with min_duration_ms at line 137, which correctly uses is not None.

Apply this diff:

```diff
-if max_duration_ms:
+if max_duration_ms is not None:
     all_filters.append(
         Filter(
             field="duration",
```
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/models.py around lines 147 to 155, the condition currently uses a truthy check (if max_duration_ms:) which incorrectly skips valid zero values; change the condition to an explicit None check (if max_duration_ms is not None:) to mirror min_duration_ms at line 137 so that max_duration_ms=0 produces the duration filter; keep the rest of the Filter construction unchanged. 
Comment on lines 291 to 304
```python
prompt_tokens = (
    attrs.gen_ai_usage_prompt_tokens
    or attrs.gen_ai_usage_input_tokens
    or attrs.llm_usage_prompt_tokens
    or attrs.llm_usage_input_tokens
    or 0
)
completion_tokens = (
    attrs.gen_ai_usage_completion_tokens
    or attrs.gen_ai_usage_output_tokens
    or attrs.llm_usage_completion_tokens
    or attrs.llm_usage_output_tokens
    or 0
)
```


⚠️ Potential issue | 🟠 Major

Fix token extraction to handle explicit 0 values correctly.

The or chains treat 0 as falsy, so an explicit gen_ai_usage_prompt_tokens=0 will be skipped in favor of checking gen_ai_usage_input_tokens, potentially returning an incorrect value. For example, if gen_ai_usage_prompt_tokens=0 and llm_usage_prompt_tokens=100, the code returns 100 instead of the explicit 0.

Use explicit None checks to distinguish absent values from 0:

```diff
-    prompt_tokens = (
-        attrs.gen_ai_usage_prompt_tokens
-        or attrs.gen_ai_usage_input_tokens
-        or attrs.llm_usage_prompt_tokens
-        or attrs.llm_usage_input_tokens
-        or 0
-    )
-    completion_tokens = (
-        attrs.gen_ai_usage_completion_tokens
-        or attrs.gen_ai_usage_output_tokens
-        or attrs.llm_usage_completion_tokens
-        or attrs.llm_usage_output_tokens
-        or 0
-    )
+    # Use explicit None checks to preserve explicit 0 values
+    prompt_tokens = next(
+        (
+            x for x in [
+                attrs.gen_ai_usage_prompt_tokens,
+                attrs.gen_ai_usage_input_tokens,
+                attrs.llm_usage_prompt_tokens,
+                attrs.llm_usage_input_tokens,
+            ]
+            if x is not None
+        ),
+        0,
+    )
+    completion_tokens = next(
+        (
+            x for x in [
+                attrs.gen_ai_usage_completion_tokens,
+                attrs.gen_ai_usage_output_tokens,
+                attrs.llm_usage_completion_tokens,
+                attrs.llm_usage_output_tokens,
+            ]
+            if x is not None
+        ),
+        0,
+    )
```
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/models.py around lines 291 to 304, the prompt_tokens and completion_tokens assignments use "or" chains which treat 0 as falsy and therefore skip explicit zero values; change each chain to pick the first value that is not None (preserving explicit 0) and default to 0 if all are None — e.g., replace the "or" chains with an expression that checks "is not None" for each candidate (or use a first-non-None helper/next(...) pattern) for both prompt_tokens and completion_tokens so an explicit 0 is returned rather than falling through to later fields. 
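If the next(...) pattern feels heavy inline, the same fix can be factored into a small hypothetical helper:

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def first_non_none(*candidates: Optional[T], default: T) -> T:
    """Return the first candidate that is not None, falling back to default."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return default

# An explicit 0 is preserved instead of falling through to later fields
assert first_non_none(0, 100, default=0) == 0
# Absent values still fall through as before
assert first_non_none(None, 100, default=0) == 100
```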
@doronkopit5 doronkopit5 changed the title feat: Initial commit of OpenLLMetry MCP Server feat: Initial commit of OpenTelemetry MCP Server Nov 13, 2025
@ellipsis-dev

ellipsis-dev bot commented Nov 13, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/opentelemetry_mcp/backends/jaeger.py (2)

358-358: Consider removing type: ignore by aligning status types.

The # type: ignore comments on lines 358 and 467 suggest a type mismatch between string literals and the expected status field type. Consider using proper enum types or Literal types to eliminate the need for type ignores.

Also applies to: 467-467


361-363: Enhance error messages with identifiers for debugging.

The error logging in both parsing methods would be more useful if it included the trace_id or span_id to help identify which specific trace/span failed to parse.

Apply these improvements:

In _parse_jaeger_trace:

-logger.error(f"Error parsing trace: {e}") +logger.error(f"Error parsing trace {trace_data.get('traceID', 'unknown')}: {e}")

In _parse_jaeger_span:

-logger.error(f"Error parsing span: {e}") +logger.error(f"Error parsing span {span_data.get('spanID', 'unknown')} in trace {span_data.get('traceID', 'unknown')}: {e}")

Also applies to: 472-474

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76486fa and 2728a54.

📒 Files selected for processing (1)
  • src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (8)
src/opentelemetry_mcp/backends/jaeger.py (8)

1-27: LGTM!

The imports are well-organized, and the header creation with optional Bearer token authentication is implemented correctly.


29-40: LGTM!

The method correctly documents Jaeger's limited native filtering capabilities, with most filtering handled client-side.


42-83: LGTM! Good hybrid filtering approach.

The method correctly validates required parameters, partitions filters between native and client-side execution, and provides helpful error messages. The hybrid filtering approach (legacy to_backend_params() + new filter system) is appropriate for Jaeger's limited native filtering.


85-121: LGTM!

The method correctly fetches traces from the Jaeger API, parses them, and applies client-side filtering. Error handling and conditional client-side filtering are implemented properly.


175-175: Verify that doubling the limit provides sufficient spans.

The heuristic of doubling the trace limit (line 175) may not retrieve enough matching spans if many traces have no matching spans. Consider whether this approach provides adequate coverage for typical use cases.


202-266: LGTM!

These methods correctly implement Jaeger API calls with appropriate error handling, validation, and type conversions.


268-294: LGTM!

The health check implementation correctly uses list_services() as a probe and provides comprehensive status information with proper exception handling.
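For context, a probe of this shape might look like the sketch below; the class is hypothetical and list_services is stubbed, standing in for the backend's real call:

```python
from typing import Any

class JaegerProbe:
    """Hypothetical wrapper; list_services stands in for the backend's real call."""

    async def list_services(self) -> list[str]:
        return ["checkout", "payments"]  # stubbed response for illustration

    async def health_check(self) -> dict[str, Any]:
        # Reuse a cheap read endpoint as a liveness probe
        try:
            services = await self.list_services()
            return {"status": "healthy", "service_count": len(services)}
        except Exception as e:
            # Report the failure instead of raising, so callers get a status dict
            return {"status": "unhealthy", "error": str(e)}
```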


365-471: LGTM! Comprehensive span parsing with good fallback handling.

The span parsing logic correctly handles Jaeger's format including extracting IDs, timestamps, service info, parent relationships, tags, status, and events. The fallback logic for service name (lines 401-404) is particularly well done.

Comment on lines 334 to 344
```python
# Calculate trace duration
start_times = [s.start_time for s in spans]
end_times = [
    datetime.fromtimestamp(
        s.start_time.timestamp() + (s.duration_ms / 1000), tz=s.start_time.tzinfo
    )
    for s in spans
]
trace_start = min(start_times)
trace_end = max(end_times)
trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
```


⚠️ Potential issue | 🟠 Major

Timezone handling issue: use UTC for timestamps.

The code uses datetime.fromtimestamp() without a timezone argument (line 394 in _parse_jaeger_span), which creates naive datetimes in local time. Jaeger timestamps are typically in UTC, so this will cause incorrect time displays and duration calculations if the server is not running in UTC.

Add from datetime import timezone to imports and update timestamp parsing:

In _parse_jaeger_span (around line 394):

```diff
-start_time = datetime.fromtimestamp(start_time_us / 1_000_000)
+start_time = datetime.fromtimestamp(start_time_us / 1_000_000, tz=timezone.utc)
```

In this method, update the end time calculation (line 337-340):

```diff
-datetime.fromtimestamp(
-    s.start_time.timestamp() + (s.duration_ms / 1000), tz=s.start_time.tzinfo
-)
+datetime.fromtimestamp(
+    s.start_time.timestamp() + (s.duration_ms / 1000), tz=timezone.utc
+)
```

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/jaeger.py around lines 334 to 344, timestamps are being created without explicit timezone which yields naive local-time datetimes; import timezone from datetime and ensure all timestamps are created/normalized to UTC: change datetime.fromtimestamp(...) calls to datetime.fromtimestamp(..., tz=timezone.utc) for end time creation and, if start_time may be naive, convert or replace its tzinfo to timezone.utc before using min/max so trace_start/trace_end and trace_duration_ms are calculated in UTC consistently. 
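A quick demonstration of the difference; the timestamp and host offset are arbitrary examples:

```python
from datetime import datetime, timezone

ts = 1_700_000_000  # Unix timestamps are defined relative to UTC

naive = datetime.fromtimestamp(ts)                   # interpreted in server-local time
aware = datetime.fromtimestamp(ts, tz=timezone.utc)  # unambiguous UTC

print(aware.isoformat())  # 2023-11-14T22:13:20+00:00 everywhere
print(naive.isoformat())  # e.g. 2023-11-14T14:13:20 on a UTC-8 host
```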


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
src/opentelemetry_mcp/backends/tempo.py (2)

615-617: Consider validating SpanAttributes construction.

The type: ignore[arg-type] comment suggests that attributes_dict may contain keys or value types not expected by SpanAttributes. While this is necessary for handling dynamic OTLP attributes, consider:

  • Adding runtime validation or filtering of known attribute keys
  • Using a more flexible attributes model if the current SpanAttributes is too strict
  • Documenting which attributes are expected vs. passed through

695-697: Array attribute handling is simplified.

Array values are converted to strings, which loses structure and makes them difficult to query or filter. This is acceptable for an initial implementation, but consider implementing proper array support if filtering or querying array attributes becomes necessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2728a54 and 402ac14.

📒 Files selected for processing (1)
  • src/opentelemetry_mcp/backends/tempo.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
🔇 Additional comments (3)
src/opentelemetry_mcp/backends/tempo.py (3)

26-56: LGTM!

The header construction and operator support declaration are clean and appropriate for Tempo's TraceQL capabilities.


173-186: Well-designed hybrid filtering approach.

The distinction between trace-level and span-level filters is well thought out and correctly accounts for TraceQL's behavior where span filters would match entire traces.
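Hypothetical TraceQL strings illustrating that distinction (attribute values are examples; syntax follows Grafana Tempo's TraceQL):

```python
# Resource/trace-scoped condition: constrains the trace as a whole
trace_level = '{ resource.service.name = "checkout" }'

# Span-scoped condition: selects any trace containing at least one matching span
span_level = '{ span.gen_ai.system = "openai" }'
```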


443-461: Verified: Duration value format assumptions are correct.

The code correctly assumes duration values are plain numbers. Filter creation in models.py (lines 137-154) establishes that duration filters are always created with value_type=FilterType.NUMBER and values set directly from min_duration_ms/max_duration_ms (integer types). No unit suffixes are applied during filter creation, only when appending "ms" in _filter_to_traceql. The assumption is sound.

Comment on lines 91 to 103
```python
if query.start_time:
    params["start"] = int(query.start_time.timestamp())
else:
    from datetime import timedelta

    params["start"] = int((datetime.now() - timedelta(days=1)).timestamp())

if query.end_time:
    params["end"] = int(query.end_time.timestamp())
else:
    from datetime import timedelta

    params["end"] = int((datetime.now() + timedelta(hours=1)).timestamp())
```


🛠️ Refactor suggestion | 🟠 Major

Move the timedelta import to the top of the file.

The timedelta import appears twice within this method (lines 94 and 101) and is duplicated in other methods as well. Move it to the module-level imports for better efficiency and readability.

Apply this diff to move the import:

```diff
 from datetime import datetime
+from datetime import timedelta
 from typing import Any, Literal
```

Then remove the local imports throughout the file.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/tempo.py around lines 91 to 103, the timedelta import is duplicated inside the function (lines ~94 and ~101) and elsewhere in the file; move "from datetime import timedelta" to the module-level imports at the top of the file, remove all local/inline "from datetime import timedelta" statements throughout the file, and update any existing import block to include timedelta so all references use the single top-level import. 
Comment on lines 463 to 465
```python
elif operator == FilterOperator.CONTAINS:
    # Use regex for contains
    return f'{traceql_field} =~ ".*{value}.*"'
```


⚠️ Potential issue | 🔴 Critical

Critical: Escape regex special characters to prevent injection and errors.

The CONTAINS operator builds a regex pattern without escaping special characters in the value. If value contains regex metacharacters (., *, +, ?, [, ], (, ), {, }, |, \, ^, $), this will cause query errors or unintended matching behavior.

Apply this diff to escape the value:

```diff
+import re
+
 """Grafana Tempo backend implementation with TraceQL support."""
```

```diff
 elif operator == FilterOperator.CONTAINS:
     # Use regex for contains
-    return f'{traceql_field} =~ ".*{value}.*"'
+    escaped_value = re.escape(str(value))
+    return f'{traceql_field} =~ ".*{escaped_value}.*"'
```

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/tempo.py around lines 463-465, the CONTAINS branch builds an unescaped regex from user-controlled `value`, which allows regex metacharacters to break queries or enable injection; fix by escaping `value` with Python's re.escape (import re at top) before interpolating it into the TraceQL pattern (e.g. escaped = re.escape(value)) and use that escaped string in the f-string, making sure the resulting string still respects TraceQL quoting/escaping rules; add a unit test for a value containing common metacharacters like . * + ? [ ] ( ) { } | ^ $ \ to verify correct behavior. 
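A small self-contained check of the escaping behavior; the filter value is a hypothetical model name:

```python
import re

value = "gpt-4."                  # user-supplied filter value with a metacharacter
raw = f".*{value}.*"              # unescaped: the trailing "." matches any character
assert re.fullmatch(raw, "gpt-4x") is not None    # unintended match

safe = f".*{re.escape(value)}.*"  # escaped: "." is matched literally
assert re.fullmatch(safe, "gpt-4x") is None
assert re.fullmatch(safe, "gpt-4.turbo") is not None
```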
Comment on lines 554 to 558
```python
    datetime.fromtimestamp(
        s.start_time.timestamp() + (s.duration_ms / 1000), tz=s.start_time.tzinfo
    )
    for s in all_spans
]
```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential timezone inconsistency in datetime calculations.

The code uses s.start_time.tzinfo assuming datetimes might have timezone info, but _parse_otlp_span (line 608) creates naive datetimes using datetime.fromtimestamp() without a timezone parameter. This inconsistency could cause issues if other parts of the system expect timezone-aware datetimes or when comparing with timestamps from different sources.

Consider applying UTC explicitly:

```diff
-start_time = datetime.fromtimestamp(start_time_ns / 1_000_000_000)
+from datetime import timezone
+start_time = datetime.fromtimestamp(start_time_ns / 1_000_000_000, tz=timezone.utc)
```

Verify how other backends handle timezone information:


🏁 Script executed:

```bash
#!/bin/bash
# Description: Check how other backends create datetime objects

# Search for datetime.fromtimestamp usage across backends
rg -n --type=py 'datetime\.fromtimestamp' src/opentelemetry_mcp/backends/

# Check if there's a utility function for datetime parsing
rg -n --type=py 'def.*parse.*time' src/opentelemetry_mcp/
```

Length of output: 908


 Wait, I need to access the repository. Let me correct this: 

🏁 Script executed:

```bash
#!/bin/bash
# Inspect parse_iso_timestamp utility and surrounding code context

# Check parse_iso_timestamp function
echo "=== parse_iso_timestamp in utils.py ==="
rg -n -A 10 'def parse_iso_timestamp' src/opentelemetry_mcp/

# Check tempo.py line 608 context
echo -e "\n=== tempo.py line 608 context (start_time creation) ==="
sed -n '600,620p' src/opentelemetry_mcp/backends/tempo.py

# Check tempo.py line 554 context
echo -e "\n=== tempo.py line 554 context (end_time calculation) ==="
sed -n '545,565p' src/opentelemetry_mcp/backends/tempo.py

# Check what _parse_otlp_span returns (around line 600)
echo -e "\n=== _parse_otlp_span method signature ==="
rg -n 'def _parse_otlp_span' src/opentelemetry_mcp/backends/tempo.py -A 3
```

Length of output: 2861




🏁 Script executed:

```bash
#!/bin/bash
# Verify if other backends have the same issue

echo "=== traceloop.py line 696 context ==="
sed -n '690,705p' src/opentelemetry_mcp/backends/traceloop.py

echo -e "\n=== jaeger.py line 337 context ==="
sed -n '330,345p' src/opentelemetry_mcp/backends/jaeger.py

echo -e "\n=== jaeger.py line 394 context ==="
sed -n '388,405p' src/opentelemetry_mcp/backends/jaeger.py

# Check how traceloop.py creates start_time (around line 289, 610, 675)
echo -e "\n=== traceloop.py line 610 context ==="
sed -n '605,620p' src/opentelemetry_mcp/backends/traceloop.py
```

Length of output: 2924


Timezone bug in datetime parsing: Unix timestamps misinterpreted as local time instead of UTC.

The code creates naive datetimes from Unix timestamps (which are always UTC) without specifying tz=timezone.utc. This causes datetime.fromtimestamp() to interpret UTC timestamps as local time. Additionally, line 554 compounds the error by passing tz=s.start_time.tzinfo where tzinfo is None, further cementing the naive/misaligned timezone interpretation.

This is also a systematic issue in other backends: jaeger.py (lines 337, 394) and traceloop.py (lines 610, 696) have the same pattern.

Fix for tempo.py line 608:

```diff
-start_time = datetime.fromtimestamp(start_time_ns / 1_000_000_000)
+from datetime import timezone
+start_time = datetime.fromtimestamp(start_time_ns / 1_000_000_000, tz=timezone.utc)
```

Apply similar fixes to:

  • jaeger.py line 394
  • traceloop.py line 610

Also update line 554 to remove the incorrect tz=s.start_time.tzinfo parameter after start_time is timezone-aware.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/tempo.py around lines 554-558, datetime.fromtimestamp is being called without specifying UTC and is passed tz=s.start_time.tzinfo (which is None), causing Unix timestamps (which are UTC) to be interpreted as local time; change the call to construct timezone-aware datetimes from the Unix timestamp by using tz=timezone.utc (or datetime.fromtimestamp(..., tz=timezone.utc)) and remove the incorrect tz=s.start_time.tzinfo parameter; apply the same fix pattern to jaeger.py at the referenced lines (around 337 and 394) and traceloop.py (around 610 and 696) so all backends parse Unix timestamps as UTC and yield aware datetimes. 
@doronkopit5 doronkopit5 deleted the dk/initial-server branch November 16, 2025 12:42