Skip to content

Conversation

@JetoPistola
Copy link
Contributor

@JetoPistola JetoPistola commented Nov 13, 2025

Details

This PR adds display and sorting capabilities for token usage and cost metrics in the experiment items comparison view, enabling users to analyze the resource consumption of their experiment runs.

Features Implemented

  • Display "Total tokens" and "Estimated cost" columns in experiment items table
  • Sorting by both metrics (ascending/descending)
  • Statistics AVG/SUM values displayed in column headers
  • Multi-experiment comparison support with vertical split cell rendering
  • Filtering intentionally excluded (see note below)

Frontend Changes

Files Modified:

  • ExperimentItemsTab.tsx - Added new column definitions with statistics and formatters
  • CostCell.tsx - Enhanced Compare component with generic accessor support for flexible data access
  • types/datasets.ts - Updated ExperimentItem interface to include usage and total_estimated_cost fields

Key Implementation Details:

  • Columns use accessor in customMeta for dynamic data extraction from experiment items
  • Proper number formatting for tokens (e.g., "1,234") and currency formatting for costs (e.g., "$0.01")
  • Handles null/missing data gracefully (displays "-")
  • Statistics toggle between AVG/SUM/Percentile in column headers

Backend Changes

Files Modified:

  • DatasetItemDAO.java - Extended SQL queries to aggregate usage and total_estimated_cost from spans
  • ExperimentsComparisonValidKnownField.java - Added USAGE_TOTAL_TOKENS and TOTAL_ESTIMATED_COST fields
  • SortingFactoryDatasets.java - Added new fields to sortable fields validation
  • FilterQueryBuilder.java - Updated field mappings (for future filtering support)
  • StatsMapper.java - Enhanced to calculate usage statistics (total, prompt, completion tokens) + refactored to eliminate code duplication (see Code Quality section)
  • DatasetsResourceTest.java - Added comprehensive integration tests

Key Implementation Details:

  • SQL queries aggregate span-level data: sumMap(usage) as usage and SUM(total_estimated_cost)
  • Sorting works via usage.total_tokens dot notation (leveraging existing wildcard support)
  • Statistics calculated as avgMap(usage) for tokens and avg(total_estimated_cost) for costs

Code Quality Improvements

Refactoring to Address PR Feedback (Revisions 2-3):

Based on @andrescrz's review feedback about code duplication in StatsMapper.java, we refactored the map processing logic:

Before (4 duplicated blocks across 2 methods):

Map<String, Object> feedbackScoresMap = row.get(FEEDBACK_SCORE, Map.class); if (feedbackScoresMap != null) { feedbackScoresMap.entrySet().stream() .sorted(Map.Entry.comparingByKey()) .forEach(entry -> { double value = entry.getValue() instanceof Number number ? number.doubleValue() : 0.0; stats.add(new AvgValueStat("%s.%s".formatted(FEEDBACK_SCORE, entry.getKey()), value)); }); } // ... duplicated 3 more times for usage, feedbackScores, usageMap

After (single reusable helper):

// Clean call sites addMapStats(row, FEEDBACK_SCORE, stats); addMapStats(row, USAGE, stats); // Reusable helper method private static void addMapStats(Row row, String fieldName, Stream.Builder<ProjectStats.ProjectStatItem<?>> statsBuilder) { Map<String, Object> map = row.get(fieldName, Map.class); if (map != null) { map.entrySet().stream() .sorted(Map.Entry.comparingByKey()) .forEach(entry -> { double value = entry.getValue() instanceof Number number ? number.doubleValue() : 0.0; statsBuilder.add(new AvgValueStat("%s.%s".formatted(fieldName, entry.getKey()), value)); }); } }

Impact:

  • ✅ Eliminated ~28 lines of duplicated code (44% reduction in affected areas)
  • ✅ Single source of truth for map processing logic
  • ✅ DRY principle applied
  • ✅ Easier to maintain and test

Testing

Backend Integration Tests:

  • sortByTotalEstimatedCost__whenDescendingOrder__thenReturnSorted
  • sortByUsageTotalTokens__whenAscendingOrder__thenReturnSorted

Manual Testing:

  • ✅ Verified columns display correctly in single experiment view
  • ✅ Verified multi-experiment comparison rendering with vertical split cells
  • ✅ Tested sorting functionality (ascending/descending)
  • ✅ Tested null/missing data handling
  • ✅ Verified statistics toggle (AVG/SUM/Percentile)
  • ✅ Tested column visibility toggle via column selector

Important Note: Filtering Capability

Filtering by "Total tokens" and "Estimated cost" is intentionally NOT included in this PR due to technical complexity. These metrics are aggregated from spans in ClickHouse, requiring filters to be applied AFTER span aggregation rather than in the main WHERE clause. This necessitates significant SQL query restructuring.

Future work: Filtering capability tracked in OPIK-3115

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves OPIK-2936
  • Related to OPIK-3115 (filtering capability - future work)

Documentation

N/A - No documentation changes needed for this UI enhancement. The feature follows existing patterns for optional columns and sorting in data tables.

Review Status

  • ✅ Backend: Approved by @andrescrz (all review comments addressed)
  • ⏳ Frontend: Awaiting review
Copilot AI review requested due to automatic review settings November 13, 2025 17:08
@JetoPistola JetoPistola requested a review from a team as a code owner November 13, 2025 17:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds optional "Total tokens" and "Estimated cost" columns to the Experiment Items comparison table, enabling users to view token usage and cost metrics when comparing experiment results.

Key Changes:

  • Added usage and total_estimated_cost fields to the ExperimentItem TypeScript interface
  • Created new TokensCell component with Compare variant for displaying token counts
  • Extended CostCell component with Compare variant for experiment comparison views

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
apps/opik-frontend/src/types/datasets.ts Added usage and total_estimated_cost fields to ExperimentItem interface to support token and cost data
apps/opik-frontend/src/components/shared/DataTableCells/TokensCell.tsx Created new cell component for displaying token counts with Compare variant for experiment comparison
apps/opik-frontend/src/components/shared/DataTableCells/CostCell.tsx Extended existing CostCell with Compare variant for experiment comparison views
apps/opik-frontend/src/components/pages/CompareExperimentsPage/ExperimentItemsTab/ExperimentItemsTab.tsx Added token and cost column definitions to the experiment items comparison table
@JetoPistola JetoPistola marked this pull request as draft November 13, 2025 17:19
@github-actions
Copy link
Contributor

github-actions bot commented Nov 16, 2025

SDK E2E Tests Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 3a01f8b.

♻️ This comment has been updated with latest results.

@JetoPistola JetoPistola force-pushed the danield/OPIK-2936-display-token-cost-columns-experiment-items branch from 649fc86 to 4f97fbb Compare November 16, 2025 08:57
@github-actions
Copy link
Contributor

github-actions bot commented Nov 16, 2025

Backend Tests Results

  308 files  + 1    308 suites  +1   51m 31s ⏱️ + 1m 15s
5 554 tests + 2  5 547 ✅ + 2  7 💤 ±0  0 ❌ ±0 
5 552 runs  +28  5 545 ✅ +28  7 💤 ±0  0 ❌ ±0 

Results for commit e807a34. ± Comparison against base commit 8119b07.

♻️ This comment has been updated with latest results.

@JetoPistola JetoPistola force-pushed the danield/OPIK-2936-display-token-cost-columns-experiment-items branch 7 times, most recently from d016fce to 2a245c7 Compare November 16, 2025 14:47
@JetoPistola JetoPistola changed the title [OPIK-2936] [FE] Display Token & Cost Columns in Experiment Items table [OPIK-2936] [BE] [FE] Display Token & Cost Columns in Experiment Items table Nov 16, 2025
@comet-ml comet-ml deleted a comment from Copilot AI Nov 16, 2025
…xperiment Items table Add display and sorting capabilities for token usage and cost metrics in the experiment items comparison view. Features: - Display Total tokens and Estimated cost columns in experiment items table - Sort by both metrics (ascending/descending) - AVG/SUM statistics in column headers - Support for multi-experiment comparison view with vertical split cells Frontend Changes: - Updated ExperimentItem interface to include usage and total_estimated_cost fields - Enhanced CostCell.Compare component with generic accessor support for flexible data access - Added columns to ExperimentItemsTab with proper formatters and statistics Backend Changes: - Extended SQL queries to aggregate usage and cost data from spans - Added USAGE_TOTAL_TOKENS and TOTAL_ESTIMATED_COST to sortable fields - Updated StatsMapper to calculate usage statistics (total, prompt, completion tokens) - Added validation for new sortable fields in SortingFactoryDatasets - Implemented comprehensive test coverage for sorting functionality Testing: - Added backend integration tests for sorting by total_estimated_cost and usage.total_tokens - Verified multi-experiment comparison rendering - Tested null/missing data handling Note: Filtering capability intentionally excluded due to complexity of filtering aggregated fields. Future work tracked in OPIK-3115.
@JetoPistola JetoPistola force-pushed the danield/OPIK-2936-display-token-cost-columns-experiment-items branch from 722f0d8 to 4b5f682 Compare November 18, 2025 11:58
@JetoPistola JetoPistola marked this pull request as ready for review November 18, 2025 12:47
@JetoPistola JetoPistola added the test-environment Deploy Opik adhoc environment label Nov 18, 2025
@github-actions
Copy link
Contributor

🔄 Test environment deployment started

Building images for PR #4061...

You can monitor the build progress here.

@CometActions
Copy link
Collaborator

Test environment is now available!

Access Information

The deployment has completed successfully and the version has been verified.

@JetoPistola JetoPistola removed the test-environment Deploy Opik adhoc environment label Nov 19, 2025
@JetoPistola JetoPistola added the test-environment Deploy Opik adhoc environment label Nov 19, 2025
@github-actions
Copy link
Contributor

🔄 Test environment deployment started

Building images for PR #4061...

You can monitor the build progress here.

@JetoPistola JetoPistola force-pushed the danield/OPIK-2936-display-token-cost-columns-experiment-items branch from 7852e6e to ec7b1fc Compare November 19, 2025 09:31
@CometActions
Copy link
Collaborator

Test environment is now available!

Access Information

The deployment has completed successfully and the version has been verified.

Copy link
Member

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BE LGTM. You can move forward once all FE reviews pass.

Extract duplicated map processing logic from mapExperimentItemsStats into a parameterized helper method (addMapStats) to follow DRY principle. Addresses PR review feedback from @andrescrz about code duplication between feedbackScoresMap and usageMap processing. Changes: - Add addMapStats() helper method with proper Javadoc - Simplify both feedbackScoresMap and usageMap calls to use helper - No behavioral changes, pure refactoring
@JetoPistola JetoPistola force-pushed the danield/OPIK-2936-display-token-cost-columns-experiment-items branch from 00e2135 to c6a5eb6 Compare November 19, 2025 12:06
Apply the same DRY principle to the mapProjectStats method, which also had duplicated map processing logic for usage and feedbackScores. Now both mapProjectStats and mapExperimentItemsStats use the same reusable addMapStats helper method, completely eliminating the copy-paste pattern that was mentioned in the PR review. The helper method now takes Row and fieldName directly, making call sites even cleaner by eliminating redundant local variables and field name repetition. Changes: - Refactor usage and feedbackScores processing in mapProjectStats - Refactor feedbackScoresMap and usageMap processing in mapExperimentItemsStats - Update addMapStats to accept Row and extract map internally - All call sites now use: addMapStats(row, FIELD_NAME, stats) - Single @SuppressWarnings at helper method level (not at call sites)
@JetoPistola JetoPistola force-pushed the danield/OPIK-2936-display-token-cost-columns-experiment-items branch from c6a5eb6 to e807a34 Compare November 19, 2025 12:10
Copy link
Member

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BE LGTM.

BigDecimal.valueOf(25.75),
BigDecimal.valueOf(5.25));

for (int i = 0; i < 3; i++) {
Copy link
Member

@andrescrz andrescrz Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use PodamFactoryUtils.manufacturePojoList and create spans in batch calls for these tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend Feature Request New feature or request Frontend test-environment Deploy Opik adhoc environment

4 participants