Conversation

JoeXic commented Nov 2, 2025

Describe this PR

Add real-time web monitoring dashboard for GAIA validation benchmark with progress tracking and visualization capabilities.

What changed?

  • Added run_gaia_with_monitor.py to run GAIA benchmark with integrated web monitoring
  • Added utils/progress_check/gaia_web_monitor.py - web dashboard for real-time progress tracking
  • Added utils/progress_check/generate_gaia_report.py - report generation utility
  • Updated main.py to support the new monitoring command
  • Web dashboard accessible at http://localhost:8080 during benchmark execution (see the sketch below)
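
A minimal sketch of what such a dashboard could look like, using only the Python standard library. The log location, helper name, and JSON response format below are illustrative assumptions, not the actual structure of gaia_web_monitor.py:

```python
# Illustrative sketch only: serve a tiny JSON progress endpoint on port 8080.
# LOG_DIR, count_completed, and the "*.json" log naming are hypothetical.
import http.server
import json
from pathlib import Path

LOG_DIR = Path("logs/gaia-validation")  # assumed log folder

def count_completed(log_dir: Path) -> int:
    """Count how many per-task log files have been written so far."""
    return len(list(log_dir.glob("*.json"))) if log_dir.exists() else 0

class ProgressHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"completed": count_completed(LOG_DIR)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    http.server.HTTPServer(("", 8080), ProgressHandler).serve_forever()
```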

Why?

Running long benchmarks like GAIA validation takes hours, and users need a way to:

  • Monitor real-time progress without constantly checking logs
  • Visualize task completion status
  • Track performance metrics during execution
  • Generate comprehensive reports after completion

  • Add run-gaia-with-monitor command for running benchmark with real-time monitoring
  • Add web dashboard for monitoring benchmark progress (gaia_web_monitor.py)
  • Add generate_gaia_report.py to utils/progress_check/ for generating task reports
JoeXic closed this Nov 2, 2025
JoeXic reopened this Nov 2, 2025
JoeXic changed the title from "feat(monitoring): add real-time web dashboard for GAIA benchmark progress" to "feat(monitoring): add real-time web dashboard for monitoring benchmark progress" Nov 10, 2025

JoeXic commented Nov 10, 2025

Describe this PR

Refactor monitoring system from GAIA-specific to generic benchmark monitoring, supporting GAIA, FutureX, xbench, and FinSearchComp benchmarks with real-time web dashboards.

What changed?

Core Changes

  • Replaced run_gaia_with_monitor.py → run_benchmark_with_monitor.py (generic benchmark runner)
  • Replaced utils/progress_check/gaia_web_monitor.py → utils/progress_check/benchmark_monitor.py (generic monitor)
  • Replaced utils/progress_check/generate_gaia_report.py → utils/progress_check/generate_benchmark_report.py (generic report generator)
  • Updated main.py to use the new generic monitoring system
  • Updated utils/progress_check/check_finsearchcomp_progress.py (fixed type annotation)

New Features

  • Auto-detect benchmark type from log folder path
  • Support benchmark-specific metrics:
    • GAIA/FinSearchComp: Correctness evaluation (accuracy)
    • FutureX/xbench: Prediction tracking (prediction rate)
    • FinSearchComp: Task type breakdown (T1/T2/T3) and regional analysis
  • Extract attempt number from log filename for accurate report generation
  • Suppress verbose HTTP logs in web dashboard
  • Automatic port conflict resolution (see the sketch after this list)
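
The sketch below shows, under stated assumptions, how the monitor could implement three of the features above: benchmark auto-detection from the log folder path, attempt-number extraction from a log filename, and falling back to a free port when 8080 is busy. The filename pattern and function names are hypothetical, not taken from benchmark_monitor.py:

```python
# Illustrative sketch only; names and the "attempt_<n>" filename pattern are assumptions.
import re
import socket
from pathlib import Path

BENCHMARKS = ("gaia", "futurex", "xbench", "finsearchcomp")

def detect_benchmark(log_dir: Path) -> str | None:
    """Return the first known benchmark name appearing in the log folder path."""
    path_str = str(log_dir).lower()
    return next((name for name in BENCHMARKS if name in path_str), None)

def extract_attempt(log_name: str) -> int:
    """Parse an attempt number from a filename like 'task_042_attempt_2.json'."""
    match = re.search(r"attempt[_-](\d+)", log_name)
    return int(match.group(1)) if match else 1

def find_free_port(preferred: int = 8080) -> int:
    """Bind the preferred port if it is free; otherwise let the OS pick one."""
    with socket.socket() as sock:
        try:
            sock.bind(("", preferred))
        except OSError:
            sock.bind(("", 0))
        return sock.getsockname()[1]
```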

Documentation

  • Added monitor_guide.md - Web monitoring dashboard guide

Why?

Running long benchmarks (GAIA, FutureX, xbench, FinSearchComp) takes hours, and users need a way to:

  • Monitor real-time progress without constantly checking logs
  • Visualize task completion status with benchmark-specific metrics
  • Track performance metrics during execution (accuracy for GAIA, prediction rate for FutureX/xbench)
  • Generate comprehensive reports after completion
  • Use a unified monitoring system across all benchmarks instead of benchmark-specific solutions