
@TanCodeX

/claim #760

What kind of change does this PR introduce?

Feature: New cursor replay strategy with visual feedback and self-correction

Summary

This PR addresses #760 by introducing a new cursor replay strategy that improves targeting accuracy using visual feedback and AI-powered self-correction.
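The self-correction idea can be illustrated with a minimal sketch. It assumes the feedback step returns a pixel offset from the painted dot to the intended target. In this PR that judgment comes from an OpenAI model looking at the red-dot screenshot. Here, a hypothetical `feedback` callback stands in for the model. `self_correct`, `tolerance`, and `max_steps` are illustrative names, not OpenAdapt APIs.

```python
from typing import Callable, Tuple

Point = Tuple[int, int]


def self_correct(
    start: Point,
    feedback: Callable[[Point], Point],
    tolerance: int = 3,
    max_steps: int = 10,
) -> Tuple[Point, int]:
    """Move a proposed click point until feedback says it is on target.

    `feedback(point)` is a hypothetical stand-in for the model reviewing a
    screenshot with a red dot at `point`; it returns the estimated (dx, dy)
    from the dot to the intended target. Returns the final point and the
    number of feedback calls made.
    """
    x, y = start
    for step in range(1, max_steps + 1):
        dx, dy = feedback((x, y))
        # Accept the point once the estimated offset is within tolerance
        if abs(dx) <= tolerance and abs(dy) <= tolerance:
            return (x, y), step
        x, y = x + dx, y + dy
    # Gave up after max_steps; the last point was never re-checked
    return (x, y), max_steps


# Usage with an exact oracle standing in for the model
target = (500, 300)
print(self_correct((400, 250), lambda p: (target[0] - p[0], target[1] - p[1])))
# → ((500, 300), 2)
```

A real model's offsets will be noisy. The `max_steps` cap bounds the number of API calls when the feedback never settles.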

Key Features:

  • Red dot visual feedback system for suggested target points
  • AI-powered accuracy analysis via OpenAI models
  • Self-correction mechanism based on visual feedback
  • Grid-based movement with recursive refinement for higher precision
  • Testing framework that measures accuracy (pixel distance error), actions per target, and time per target
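Grid-based movement with recursive refinement can be sketched as follows. The sketch assumes that each round splits the current region into a `rows x cols` grid and that a selector (the model, in this PR) picks the cell containing the target. The search then recurses into that cell until it is smaller than a minimum size. `refine_point`, `oracle_chooser`, and `min_size` are hypothetical names. The oracle knows the true target and is used only to exercise the geometry; it is not how `GridCursorStrategy` chooses cells.

```python
from typing import Callable, Tuple

# (left, top, width, height) in screen pixels
Region = Tuple[float, float, float, float]
CellChooser = Callable[[Region, int, int], Tuple[int, int]]


def refine_point(
    region: Region,
    grid_size: Tuple[int, int],
    choose_cell: CellChooser,
    min_size: float = 4.0,
    actions: int = 0,
) -> Tuple[Tuple[float, float], int]:
    """Recursively narrow `region` to the grid cell chosen at each level.

    `choose_cell(region, rows, cols)` returns the (row, col) believed to hold
    the target (a model call in the real strategy). Returns the center of the
    final cell and the number of cell selections made.
    """
    left, top, width, height = region
    # Base case: the cell is small enough to click its center
    if width <= min_size and height <= min_size:
        return (left + width / 2, top + height / 2), actions
    rows, cols = grid_size
    row, col = choose_cell(region, rows, cols)
    cell_w, cell_h = width / cols, height / rows
    sub = (left + col * cell_w, top + row * cell_h, cell_w, cell_h)
    return refine_point(sub, grid_size, choose_cell, min_size, actions + 1)


def oracle_chooser(target: Tuple[float, float]) -> CellChooser:
    """Hypothetical stand-in for the model: picks the cell containing `target`."""
    tx, ty = target

    def choose(region: Region, rows: int, cols: int) -> Tuple[int, int]:
        left, top, width, height = region
        col = min(int((tx - left) / (width / cols)), cols - 1)
        row = min(int((ty - top) / (height / rows)), rows - 1)
        return row, col

    return choose


# 1920x1080 screen, 4x4 grid: five selections shrink the cell below 4 px
point, actions = refine_point((0, 0, 1920, 1080), (4, 4), oracle_chooser((1000, 600)))
```

Each level multiplies precision by the grid dimension. A larger grid reaches a given cell size in fewer levels but asks the selector to discriminate among more cells per step.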

This strategy lays the groundwork for improving OpenAdapt’s cursor control system in complex screen environments.

Checklist

  • My code follows OpenAdapt's style guidelines
      ◦ Follows PEP 8
      ◦ Uses consistent naming conventions
      ◦ Maintains existing project structure
  • Self-reviewed my code
      ◦ Verified edge cases
      ◦ Validated parameter types
      ◦ Checked error handling
  • Added tests
      ◦ test_grid.py evaluates grid strategy
      ◦ Metrics for accuracy, actions, and time
      ◦ Test cases for various screen regions
  • Linted code
      ◦ Used flake8 for Python linting
      ◦ Fixed all issues
      ◦ Removed unused imports
  • Commented the code
      ◦ Explained AI logic
      ◦ Documented grid algorithm
      ◦ Clarified self-correction behavior
  • Updated documentation
      ◦ Added docstrings for all methods/classes
      ◦ Updated requirements.txt
      ◦ Included usage examples in comments
  • All new and existing tests pass locally
      ◦ Visual feedback tests
      ◦ Grid strategy accuracy checks
      ◦ OpenAI API integration tests
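The test checklist says `test_grid.py` reports accuracy, actions, and time. The following is a minimal sketch of how such averages could be aggregated overall and per grid size. It assumes each test case yields the grid size, the final cursor position, the ground-truth target, the number of actions, and elapsed seconds. `CaseResult` and `summarize` are hypothetical names. This is not the PR's test code.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class CaseResult:
    grid_size: Tuple[int, int]
    final: Tuple[float, float]   # where the cursor ended up
    target: Tuple[float, float]  # ground-truth target point
    actions: int
    seconds: float

    @property
    def error(self) -> float:
        # Euclidean distance error in pixels
        return math.dist(self.final, self.target)


def summarize(results: List[CaseResult]) -> Dict[str, Dict[str, float]]:
    """Average error, actions, and time overall ("all") and per grid size."""
    groups: Dict[str, List[CaseResult]] = defaultdict(list)
    for r in results:
        groups[f"{r.grid_size[0]}x{r.grid_size[1]}"].append(r)
    return {
        name: {
            "cases": len(rs),
            "avg_error": mean(r.error for r in rs),
            "avg_actions": mean(r.actions for r in rs),
            "avg_time": mean(r.seconds for r in rs),
        }
        for name, rs in {"all": results, **groups}.items()
    }
```

With an equal number of cases per grid size, the overall averages equal the mean of the per-grid averages.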

How can your code be run and tested?

  1. Install dependencies:

      pip install -r requirements.txt

  2. Run the grid evaluation:

      python -m experiments.cursor.test_grid

Example output:

      Grid Strategy Evaluation Results:
      ---------------------------------
      Total test cases: 45
      Average distance error: 5.2 pixels
      Average actions per target: 4.3
      Average time per target: 0.82 seconds

      Results by grid size:
      Grid size: 2x2
        Average error: 8.4 pixels
        Average actions: 3.0
        Average time: 0.65 seconds
      Grid size: 4x4
        Average error: 4.2 pixels
        Average actions: 4.5
        Average time: 0.85 seconds
      Grid size: 8x8
        Average error: 3.1 pixels
        Average actions: 5.5
        Average time: 0.96 seconds
  3. Test specific components:

      from openadapt.strategies.cursor import CursorReplayStrategy
      from experiments.cursor.grid import GridCursorStrategy

      # `recording`, `screenshot`, and `window_event` are assumed to be loaded already

      # Visual feedback: paint a red dot at the suggested target point
      strategy = CursorReplayStrategy(recording)
      img_with_dot = strategy.paint_dot(screenshot, x=100, y=100)

      # Grid approach with a 4x4 grid
      grid_strategy = GridCursorStrategy(recording, grid_size=(4, 4))
      action = grid_strategy.get_next_action_event(screenshot, window_event)
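The body of `paint_dot` is not shown in this PR description. The following is a minimal NumPy sketch of painting a red dot onto a screenshot. It assumes the screenshot is an `H x W x 3` uint8 array in RGB order. `paint_red_dot` and `radius` are hypothetical names, not the PR's implementation.

```python
import numpy as np


def paint_red_dot(image: np.ndarray, x: int, y: int, radius: int = 5) -> np.ndarray:
    """Return a copy of an RGB uint8 image with a filled red disc at (x, y)."""
    out = image.copy()  # leave the original screenshot untouched
    h, w = out.shape[:2]
    # Boolean mask of pixels within `radius` of (x, y); naturally clipped at edges
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    out[mask] = (255, 0, 0)  # red in RGB; OpenCV images are BGR, i.e. (0, 0, 255)
    return out
```

With opencv-python, which this PR lists as a dependency, `cv2.circle(img, (x, y), radius, (0, 0, 255), -1)` draws the same filled dot in place on a BGR image.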

Dependencies:

  • opencv-python for visual processing
  • numpy for grid calculations
  • openai for visual feedback evaluation
@abrichr (Member) commented Aug 18, 2025

@TanCodeX thank you for your contribution! Can you please show some example output (e.g. video, screenshot, console text)?

