- Notifications
You must be signed in to change notification settings - Fork 1k
feat: require user confirmation before executing agent-generated code #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThis PR establishes development infrastructure by introducing a GitHub Actions CI workflow, pre-commit configuration, development dependencies, and a comprehensive test suite covering code agent execution, utility functions, smoke testing, and worker behavior validation. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| ??????? ?????? ?????????:\n\n- ????????? ????????????? ????? ??????????? ????, ? GUI-???????? ? fallback ?? ?????????? ????.\n- ???????? ???? --allow_exec ??? ?????????? ????????????? (??????).\n- ??????? helper execute_code ??? ????????????/??????????.\n- ????????? unit-????? (11 ?????? ?????? ???????? ????????).\n\n????????????:\n- ?? ????????? ?????????? ???? ??????? ?????????????. ?????????? ????????????? ??????????? ????? dry-run ? sandbox ??? ?????????? ????.\n\n???????:\n- ??????????, ??????? ????????? ? ????? ?????, ???? ????????? ??????????/maintainer ??? ????? PR.\n- CI: workflow ???????? ? ?????????? ????????????? (???? ??? ??????????? check runs).\n\n???????! |
| @alckasoc @eric-simu @kylesimular ??????! ??????, ??????????, ??????????? ???? PR ?? ???????????? ?????????? ???? (????????? ????????????? ????? exec, ???? --allow_exec, unit-?????). ???????! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (6)
requirements-dev.txt (1)
1-6: Consider pinning dependency versions for reproducibility.Without version constraints, different developers and CI runs may install different versions of these tools, potentially causing inconsistent behavior or build failures.
🔎 Example with version pinning
-pytest -pillow -black -flake8 -isort -pre-commit +pytest>=8.0.0,<9.0.0 +pillow>=10.0.0,<11.0.0 +black==25.11.0 +flake8>=7.0.0,<8.0.0 +isort>=5.12.0,<6.0.0 +pre-commit>=3.0.0,<4.0.0Note: Adjust version ranges based on your compatibility requirements.
.github/workflows/ci.yml (2)
10-11: Consider adding a timeout to prevent hung tests.Without a timeout, a hung test could consume CI resources indefinitely.
🔎 Proposed enhancement
test: runs-on: ubuntu-latest + timeout-minutes: 30
16-17: Consider testing against multiple Python versions.Currently only Python 3.11 is tested. If the project supports other Python versions, consider adding a matrix strategy to test compatibility.
🔎 Example matrix strategy
test: runs-on: ubuntu-latest + strategy: + matrix: + python-version: ["3.9", "3.10", "3.11", "3.12"] steps: - uses: actions/checkout@v4 - uses: actions/setup-python@v5 with: - python-version: "3.11" + python-version: ${{ matrix.python-version }}tests/test_code_agent.py (1)
18-36: Consider adding edge case tests.The current tests validate happy paths. Consider adding tests for:
- Invalid code (syntax errors)
- Empty code blocks
- Malformed markdown code fences
- Error handling in execute_code
tests/test_worker.py (1)
39-43: Consider using pytest's monkeypatch fixture for more robust mocking.Direct module attribute assignment (lines 39-43) is fragile and sensitive to import order. Using pytest's
monkeypatchfixture would provide automatic cleanup and clearer test isolation.🔎 Alternative approach
import pytest @pytest.fixture(autouse=True) def mock_lmm_agent(monkeypatch): monkeypatch.setattr("gui_agents.s3.core.mllm.LMMAgent", FakeLMMAgent) monkeypatch.setattr("gui_agents.s3.agents.code_agent.LMMAgent", FakeLMMAgent) monkeypatch.setattr("gui_agents.s3.agents.grounding.LMMAgent", FakeLMMAgent) def test_worker_generate_next_action(): # test implementation...tests/test_utils_formatters.py (1)
5-15: Tests validate core functionality correctly.The tests appropriately verify code extraction and agent function parsing. Consider adding tests for edge cases like empty strings, malformed code blocks, or nested function calls if they're relevant to your use cases.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/ci.yml.gitignore.pre-commit-config.yamlrequirements-dev.txttests/test_code_agent.pytests/test_smoke.pytests/test_utils_formatters.pytests/test_worker.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_utils_formatters.py (1)
gui_agents/s3/utils/common_utils.py (1)
extract_agent_functions(169-179)
tests/test_worker.py (2)
gui_agents/s3/agents/agent_s.py (1)
AgentS3(48-94)tests/test_smoke.py (5)
FakeLMMAgent(51-77)reset(56-62)add_system_prompt(64-65)add_message(67-73)get_response(75-77)
tests/test_smoke.py (2)
tests/test_worker.py (5)
FakeLMMAgent(11-36)reset(16-22)add_system_prompt(24-25)add_message(27-33)get_response(35-36)gui_agents/s3/agents/agent_s.py (1)
AgentS3(48-94)
tests/test_code_agent.py (1)
gui_agents/s3/agents/code_agent.py (2)
extract_code_block(11-29)execute_code(32-49)
🪛 actionlint (1.7.9)
.github/workflows/ci.yml
15-15: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 Ruff (0.14.10)
tests/test_worker.py
12-12: Unused method argument: engine_params
(ARG002)
12-12: Unused method argument: engine
(ARG002)
27-27: Unused method argument: image_content
(ARG002)
27-27: Unused method argument: kwargs
(ARG002)
35-35: Unused method argument: args
(ARG002)
35-35: Unused method argument: kwargs
(ARG002)
73-73: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
tests/test_smoke.py
13-13: Unused static method argument: image
(ARG004)
13-13: Unused static method argument: output_type
(ARG004)
48-48: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
52-52: Unused method argument: engine_params
(ARG002)
52-52: Unused method argument: engine
(ARG002)
67-67: Unused method argument: image_content
(ARG002)
67-67: Unused method argument: kwargs
(ARG002)
75-75: Unused method argument: args
(ARG002)
75-75: Unused method argument: kwargs
(ARG002)
119-119: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
tests/test_code_agent.py
14-14: Unused method argument: timeout
(ARG002)
🔇 Additional comments (3)
tests/test_code_agent.py (1)
4-16: Test coverage looks good.The
DummyEnvControllerappropriately stubs the environment interface for testing code execution paths without actually running code..pre-commit-config.yaml (1)
1-20: Pre-commit configuration looks good.The hook configuration aligns well with the CI workflow and uses appropriate linting/formatting tools. The Black, isort, and Flake8 settings are consistent with common Python best practices.
tests/test_smoke.py (1)
8-46: Clever approach to stub heavy dependencies.Using dummy modules injected into
sys.modulesbefore imports is an effective way to run tests without requiring pytesseract and pyautogui installations. This keeps the test suite lightweight and fast.
| - uses: actions/setup-python@v4 | ||
| with: | ||
| python-version: "3.11" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update actions/setup-python to v5.
Version v4 of actions/setup-python is deprecated and may stop working on GitHub Actions runners.
🔎 Proposed fix
- - uses: actions/setup-python@v4 + - uses: actions/setup-python@v5 with: python-version: "3.11"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - uses: actions/setup-python@v4 | |
| with: | |
| python-version: "3.11" | |
| - uses: actions/setup-python@v5 | |
| with: | |
| python-version: "3.11" |
🧰 Tools
🪛 actionlint (1.7.9)
15-15: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
.github/workflows/ci.yml around lines 15 to 17: the workflow uses actions/setup-python@v4 which is deprecated; update the action reference to actions/setup-python@v5 in the workflow file and ensure any 'with:' keys remain compatible with v5 (e.g., keep python-version or switch to 'python-version' input as before), then run a local or CI syntax check to confirm the workflow parses correctly. | # Local env file for secrets | ||
| .env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove duplicate .env entry.
The .env pattern is already present at line 125, making this entry redundant.
🔎 Proposed fix
-# Local env file for secrets -.env📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Local env file for secrets | |
| .env |
🤖 Prompt for AI Agents
In .gitignore around lines 166-167 the pattern ".env" is duplicated (already present at line 125); remove the redundant ".env" entry at lines 166-167 so there is only a single .env line in the file, and quickly scan for any other duplicate identical patterns to avoid repeated entries. | sys.modules.setdefault("pyautogui", DummyPyAutoGUI()) | ||
| | ||
| # ---- Monkeypatch LMMAgent to avoid external LLM calls ---- | ||
| import gui_agents.s3.core.mllm as mllm # noqa: E402 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused noqa directive.
The # noqa: E402 comment is unnecessary as E402 is not an enabled check in your Flake8 configuration.
🔎 Proposed fix
-import gui_agents.s3.core.mllm as mllm # noqa: E402 +import gui_agents.s3.core.mllm as mllm📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import gui_agents.s3.core.mllm as mllm # noqa: E402 | |
| import gui_agents.s3.core.mllm as mllm |
🧰 Tools
🪛 Ruff (0.14.10)
48-48: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
🤖 Prompt for AI Agents
In tests/test_smoke.py around line 48, the import statement includes an unnecessary "# noqa: E402" directive; remove the trailing " # noqa: E402" from the import line so the import reads simply "import gui_agents.s3.core.mllm as mllm" and commit the change. | info, actions = agent.predict( | ||
| instruction="Wait a bit", observation={"screenshot": screenshot} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefix unused variable with underscore.
The info variable is unpacked but never used. Following Python conventions, prefix it with an underscore.
🔎 Proposed fix
- info, actions = agent.predict( + _info, actions = agent.predict( instruction="Wait a bit", observation={"screenshot": screenshot} )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| info, actions = agent.predict( | |
| instruction="Wait a bit", observation={"screenshot": screenshot} | |
| _info, actions = agent.predict( | |
| instruction="Wait a bit", observation={"screenshot": screenshot} | |
| ) |
🧰 Tools
🪛 Ruff (0.14.10)
119-119: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In tests/test_smoke.py around lines 119 to 120, the unpacked variable "info" is not used; rename it to a prefixed unused variable (e.g., "_info" or simply "_") in the unpacking assignment so it follows Python conventions for unused variables and avoid linter warnings; update the assignment to use the chosen prefixed name and ensure there are no remaining references to the old "info" identifier. | cli = importlib.import_module("gui_agents.s3.cli_app") | ||
| | ||
| # Running help should exit with code 0 | ||
| import sys as _sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove redundant import.
sys is already imported at line 3, so importing it again as _sys is unnecessary. Use the existing sys import instead.
🔎 Proposed fix
# Running help should exit with code 0 - import sys as _sys - - prev_argv = _sys.argv.copy() + prev_argv = sys.argv.copy() try: - _sys.argv = ["agent_s", "--help"] + sys.argv = ["agent_s", "--help"] try: cli.main() except SystemExit as e: assert e.code == 0 finally: - _sys.argv = prev_argv + sys.argv = prev_argvCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tests/test_smoke.py around line 132, there is a redundant import "import sys as _sys" (sys is already imported at line 3); remove this extra import and, if the code below references _sys, replace those references with the existing sys import so the single import at line 3 is used consistently. | platform="linux", | ||
| ) | ||
| | ||
| info, actions = agent.predict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefix unused variable with underscore.
The info variable is unpacked but never used. Following Python conventions, prefix it with an underscore to indicate it's intentionally unused.
🔎 Proposed fix
- info, actions = agent.predict( + _info, actions = agent.predict( instruction="Wait small", observation={"screenshot": screenshot} )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| info, actions = agent.predict( | |
| _info, actions = agent.predict( | |
| instruction="Wait small", observation={"screenshot": screenshot} | |
| ) |
🧰 Tools
🪛 Ruff (0.14.10)
73-73: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In tests/test_worker.py around line 73, the test unpacks two values into variables `info, actions` but never uses `info`; update the unpacking to prefix the unused variable with an underscore (e.g., `_info, actions`) to follow Python conventions for intentionally unused variables and avoid linter warnings.
????????? ????????????? ????? ??????????? ???? ??????, ???? --allow_exec ??? ?????????? ?????????????, helper execute_code ? ????? (unit tests). ???????????? ???????????? ??? ?????????? agent-generated ????.
Summary by CodeRabbit
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.