feat: require user confirmation before executing agent-generated code #161

Vasyl198 · 2025-12-24T13:37:24Z

????????? ????????????? ????? ??????????? ???? ??????, ???? --allow_exec ??? ?????????? ?????????????, helper execute_code ? ????? (unit tests). ???????????? ???????????? ??? ?????????? agent-generated ????.

Summary by CodeRabbit

Chores
- Added automated code quality checks and linting workflow.
- Configured pre-commit hooks for code formatting and consistency.
- Added development dependencies for testing and code analysis.
- Updated repository ignore patterns for local configuration files.
Tests
- Added comprehensive test coverage for core functionality and utilities.
- Introduced smoke tests to validate basic agent workflows.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-24T13:37:33Z

📝 Walkthrough

Walkthrough

This PR establishes development infrastructure by introducing a GitHub Actions CI workflow, pre-commit configuration, development dependencies, and a comprehensive test suite covering code agent execution, utility functions, smoke testing, and worker behavior validation.

Changes

Cohort / File(s)	Summary
CI/CD & Configuration Setup `.github/workflows/ci.yml`, `.gitignore`, `.pre-commit-config.yaml`, `requirements-dev.txt`	Adds GitHub Actions CI pipeline (linting and pytest on push/PR), secrets management (.env to gitignore), pre-commit hooks (Black 25.11.0, isort 5.12.0, Flake8 7.1.0, file fixers), and dev dependencies (pytest, Pillow, Black, Flake8, isort, pre-commit).
Test Suite `tests/test_code_agent.py`	Tests code extraction and execution (Python/Bash) via execute_code with DummyEnvController mock.
Test Suite `tests/test_smoke.py`	Smoke tests with mocked pytesseract/pyautogui and FakeLMMAgent; verifies AgentS3 prediction flow and CLI help runs without external dependencies.
Test Suite `tests/test_utils_formatters.py`	Tests parse_code_from_string and extract_agent_functions utility functions.
Test Suite `tests/test_worker.py`	Tests AgentS3.predict behavior with monkeypatched FakeLMMAgent across mllm, code_agent, and grounding modules.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through CI with glee, 🐰
Pre-commit hooks now help us see—
Black, isort, tests all aligned,
Infrastructure so refined!
DevTools configured, workflows blessed,
The codebase now will pass each test! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main feature: requiring user confirmation before executing agent-generated code. However, the changeset includes substantial infrastructure additions (CI workflow, pre-commit config, dev dependencies, and multiple test files) that are not reflected in the title.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Vasyl198 · 2025-12-24T13:39:34Z

??????? ?????? ?????????:\n\n- ????????? ????????????? ????? ??????????? ????, ? GUI-???????? ? fallback ?? ?????????? ????.\n- ???????? ???? --allow_exec ??? ?????????? ????????????? (??????).\n- ??????? helper execute_code ??? ????????????/??????????.\n- ????????? unit-????? (11 ?????? ?????? ???????? ????????).\n\n????????????:\n- ?? ????????? ?????????? ???? ??????? ?????????????. ?????????? ????????????? ??????????? ????? dry-run ? sandbox ??? ?????????? ????.\n\n???????:\n- ??????????, ??????? ????????? ? ????? ?????, ???? ????????? ??????????/maintainer ??? ????? PR.\n- CI: workflow ???????? ? ?????????? ????????????? (???? ??? ??????????? check runs).\n\n???????!

Vasyl198 · 2025-12-24T13:40:06Z

@alckasoc @eric-simu @kylesimular ??????! ??????, ??????????, ??????????? ???? PR ?? ???????????? ?????????? ???? (????????? ????????????? ????? exec, ???? --allow_exec, unit-?????). ???????!

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (6)

requirements-dev.txt (1)
1-6: Consider pinning dependency versions for reproducibility.

Without version constraints, different developers and CI runs may install different versions of these tools, potentially causing inconsistent behavior or build failures.
🔎 Example with version pinning
-pytest -pillow -black -flake8 -isort -pre-commit +pytest>=8.0.0,<9.0.0 +pillow>=10.0.0,<11.0.0 +black==25.11.0 +flake8>=7.0.0,<8.0.0 +isort>=5.12.0,<6.0.0 +pre-commit>=3.0.0,<4.0.0
Note: Adjust version ranges based on your compatibility requirements.
.github/workflows/ci.yml (2)
10-11: Consider adding a timeout to prevent hung tests.

Without a timeout, a hung test could consume CI resources indefinitely.
🔎 Proposed enhancement
 test: runs-on: ubuntu-latest + timeout-minutes: 30
16-17: Consider testing against multiple Python versions.

Currently only Python 3.11 is tested. If the project supports other Python versions, consider adding a matrix strategy to test compatibility.
🔎 Example matrix strategy
 test: runs-on: ubuntu-latest + strategy: + matrix: + python-version: ["3.9", "3.10", "3.11", "3.12"] steps: - uses: actions/checkout@v4 - uses: actions/setup-python@v5 with: - python-version: "3.11" + python-version: ${{ matrix.python-version }}
tests/test_code_agent.py (1)

18-36: Consider adding edge case tests.

The current tests validate happy paths. Consider adding tests for:

Invalid code (syntax errors)

Empty code blocks

Malformed markdown code fences

Error handling in execute_code
tests/test_worker.py (1)
39-43: Consider using pytest's monkeypatch fixture for more robust mocking.

Direct module attribute assignment (lines 39-43) is fragile and sensitive to import order. Using pytest's monkeypatch fixture would provide automatic cleanup and clearer test isolation.
🔎 Alternative approach
import pytest @pytest.fixture(autouse=True) def mock_lmm_agent(monkeypatch): monkeypatch.setattr("gui_agents.s3.core.mllm.LMMAgent", FakeLMMAgent) monkeypatch.setattr("gui_agents.s3.agents.code_agent.LMMAgent", FakeLMMAgent) monkeypatch.setattr("gui_agents.s3.agents.grounding.LMMAgent", FakeLMMAgent) def test_worker_generate_next_action(): # test implementation...
tests/test_utils_formatters.py (1)

5-15: Tests validate core functionality correctly.

The tests appropriately verify code extraction and agent function parsing. Consider adding tests for edge cases like empty strings, malformed code blocks, or nested function calls if they're relevant to your use cases.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb57fb and 9902f3d.

📒 Files selected for processing (8)

.github/workflows/ci.yml
.gitignore
.pre-commit-config.yaml
requirements-dev.txt
tests/test_code_agent.py
tests/test_smoke.py
tests/test_utils_formatters.py
tests/test_worker.py

🧰 Additional context used

🧬 Code graph analysis (4)

tests/test_utils_formatters.py (1)

gui_agents/s3/utils/common_utils.py (1)

extract_agent_functions (169-179)

tests/test_worker.py (2)

gui_agents/s3/agents/agent_s.py (1)

AgentS3 (48-94)

tests/test_smoke.py (5)

FakeLMMAgent (51-77)

reset (56-62)

add_system_prompt (64-65)

add_message (67-73)

get_response (75-77)

tests/test_smoke.py (2)

tests/test_worker.py (5)

FakeLMMAgent (11-36)

reset (16-22)

add_system_prompt (24-25)

add_message (27-33)

get_response (35-36)

gui_agents/s3/agents/agent_s.py (1)

AgentS3 (48-94)

tests/test_code_agent.py (1)

gui_agents/s3/agents/code_agent.py (2)

extract_code_block (11-29)

execute_code (32-49)

🪛 actionlint (1.7.9)

.github/workflows/ci.yml

15-15: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Ruff (0.14.10)

tests/test_worker.py

12-12: Unused method argument: engine_params

(ARG002)

12-12: Unused method argument: engine

(ARG002)

27-27: Unused method argument: image_content

(ARG002)

27-27: Unused method argument: kwargs

(ARG002)

35-35: Unused method argument: args

(ARG002)

35-35: Unused method argument: kwargs

(ARG002)

73-73: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

tests/test_smoke.py

13-13: Unused static method argument: image

(ARG004)

13-13: Unused static method argument: output_type

(ARG004)

48-48: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)

52-52: Unused method argument: engine_params

(ARG002)

52-52: Unused method argument: engine

(ARG002)

67-67: Unused method argument: image_content

(ARG002)

67-67: Unused method argument: kwargs

(ARG002)

75-75: Unused method argument: args

(ARG002)

75-75: Unused method argument: kwargs

(ARG002)

119-119: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

tests/test_code_agent.py

14-14: Unused method argument: timeout

(ARG002)

🔇 Additional comments (3)

tests/test_code_agent.py (1)

4-16: Test coverage looks good.

The DummyEnvController appropriately stubs the environment interface for testing code execution paths without actually running code.

.pre-commit-config.yaml (1)

1-20: Pre-commit configuration looks good.

The hook configuration aligns well with the CI workflow and uses appropriate linting/formatting tools. The Black, isort, and Flake8 settings are consistent with common Python best practices.

tests/test_smoke.py (1)

8-46: Clever approach to stub heavy dependencies.

Using dummy modules injected into sys.modules before imports is an effective way to run tests without requiring pytesseract and pyautogui installations. This keeps the test suite lightweight and fast.

coderabbitai · 2025-12-24T13:41:49Z

.github/workflows/ci.yml

+ - uses: actions/setup-python@v4
+ with:
+ python-version: "3.11"


⚠️ Potential issue | 🟠 Major

Update actions/setup-python to v5.

Version v4 of actions/setup-python is deprecated and may stop working on GitHub Actions runners.

🔎 Proposed fix

- - uses: actions/setup-python@v4 + - uses: actions/setup-python@v5 with: python-version: "3.11"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- uses: actions/setup-python@v4

with:

python-version: "3.11"

- uses: actions/setup-python@v5

with:

python-version: "3.11"

🧰 Tools

🪛 actionlint (1.7.9)

15-15: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents

.github/workflows/ci.yml around lines 15 to 17: the workflow uses actions/setup-python@v4 which is deprecated; update the action reference to actions/setup-python@v5 in the workflow file and ensure any 'with:' keys remain compatible with v5 (e.g., keep python-version or switch to 'python-version' input as before), then run a local or CI syntax check to confirm the workflow parses correctly.

coderabbitai · 2025-12-24T13:41:49Z

.gitignore

+# Local env file for secrets
+.env


⚠️ Potential issue | 🟡 Minor

Remove duplicate .env entry.

The .env pattern is already present at line 125, making this entry redundant.

🔎 Proposed fix

-# Local env file for secrets -.env

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Local env file for secrets

.env

🤖 Prompt for AI Agents

In .gitignore around lines 166-167 the pattern ".env" is duplicated (already present at line 125); remove the redundant ".env" entry at lines 166-167 so there is only a single .env line in the file, and quickly scan for any other duplicate identical patterns to avoid repeated entries.

coderabbitai · 2025-12-24T13:41:50Z

tests/test_smoke.py

+sys.modules.setdefault("pyautogui", DummyPyAutoGUI())
+
+# ---- Monkeypatch LMMAgent to avoid external LLM calls ----
+import gui_agents.s3.core.mllm as mllm # noqa: E402


⚠️ Potential issue | 🟡 Minor

Remove unused noqa directive.

The # noqa: E402 comment is unnecessary as E402 is not an enabled check in your Flake8 configuration.

🔎 Proposed fix

-import gui_agents.s3.core.mllm as mllm # noqa: E402 +import gui_agents.s3.core.mllm as mllm

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import gui_agents.s3.core.mllm as mllm # noqa: E402

import gui_agents.s3.core.mllm as mllm

🧰 Tools

🪛 Ruff (0.14.10)

48-48: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents

In tests/test_smoke.py around line 48, the import statement includes an unnecessary "# noqa: E402" directive; remove the trailing " # noqa: E402" from the import line so the import reads simply "import gui_agents.s3.core.mllm as mllm" and commit the change.

coderabbitai · 2025-12-24T13:41:50Z

tests/test_smoke.py

+ info, actions = agent.predict(
+ instruction="Wait a bit", observation={"screenshot": screenshot}


⚠️ Potential issue | 🟡 Minor

Prefix unused variable with underscore.

The info variable is unpacked but never used. Following Python conventions, prefix it with an underscore.

🔎 Proposed fix

- info, actions = agent.predict( + _info, actions = agent.predict( instruction="Wait a bit", observation={"screenshot": screenshot} )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

info, actions = agent.predict(

instruction="Wait a bit", observation={"screenshot": screenshot}

_info, actions = agent.predict(

instruction="Wait a bit", observation={"screenshot": screenshot}

)

🧰 Tools

🪛 Ruff (0.14.10)

119-119: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents

In tests/test_smoke.py around lines 119 to 120, the unpacked variable "info" is not used; rename it to a prefixed unused variable (e.g., "_info" or simply "_") in the unpacking assignment so it follows Python conventions for unused variables and avoid linter warnings; update the assignment to use the chosen prefixed name and ensure there are no remaining references to the old "info" identifier.

coderabbitai · 2025-12-24T13:41:50Z

tests/test_smoke.py

+ cli = importlib.import_module("gui_agents.s3.cli_app")
+
+ # Running help should exit with code 0
+ import sys as _sys


⚠️ Potential issue | 🟡 Minor

Remove redundant import.

sys is already imported at line 3, so importing it again as _sys is unnecessary. Use the existing sys import instead.

🔎 Proposed fix

# Running help should exit with code 0 - import sys as _sys - - prev_argv = _sys.argv.copy() + prev_argv = sys.argv.copy() try: - _sys.argv = ["agent_s", "--help"] + sys.argv = ["agent_s", "--help"] try: cli.main() except SystemExit as e: assert e.code == 0 finally: - _sys.argv = prev_argv + sys.argv = prev_argv

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In tests/test_smoke.py around line 132, there is a redundant import "import sys as _sys" (sys is already imported at line 3); remove this extra import and, if the code below references _sys, replace those references with the existing sys import so the single import at line 3 is used consistently.

coderabbitai · 2025-12-24T13:41:50Z

tests/test_worker.py

+ platform="linux",
+ )
+
+ info, actions = agent.predict(


⚠️ Potential issue | 🟡 Minor

Prefix unused variable with underscore.

The info variable is unpacked but never used. Following Python conventions, prefix it with an underscore to indicate it's intentionally unused.

🔎 Proposed fix

- info, actions = agent.predict( + _info, actions = agent.predict( instruction="Wait small", observation={"screenshot": screenshot} )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

info, actions = agent.predict(

_info, actions = agent.predict(

instruction="Wait small", observation={"screenshot": screenshot}

)

🧰 Tools

🪛 Ruff (0.14.10)

73-73: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents

In tests/test_worker.py around line 73, the test unpacks two values into variables `info, actions` but never uses `info`; update the unpacking to prefix the unused variable with an underscore (e.g., `_info, actions`) to follow Python conventions for intentionally unused variables and avoid linter warnings.

ci: add tests, CI, and pre-commit config (clean backport)

9902f3d

coderabbitai bot reviewed Dec 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: require user confirmation before executing agent-generated code #161

feat: require user confirmation before executing agent-generated code #161

Uh oh!

Vasyl198 commented Dec 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 24, 2025 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Vasyl198 commented Dec 24, 2025

Vasyl198 commented Dec 24, 2025

coderabbitai bot left a comment

coderabbitai bot Dec 24, 2025

coderabbitai bot Dec 24, 2025

coderabbitai bot Dec 24, 2025

coderabbitai bot Dec 24, 2025

coderabbitai bot Dec 24, 2025

coderabbitai bot Dec 24, 2025

Labels

1 participant

	import gui_agents.s3.core.mllm as mllm # noqa: E402
	import gui_agents.s3.core.mllm as mllm

		info, actions = agent.predict(
		instruction="Wait a bit", observation={"screenshot": screenshot}

- info, actions = agent.predict(
+ _info, actions = agent.predict(
+ instruction="Wait small", observation={"screenshot": screenshot}
+ )

feat: require user confirmation before executing agent-generated code #161

Are you sure you want to change the base?

feat: require user confirmation before executing agent-generated code #161

Uh oh!

Conversation

Vasyl198 commented Dec 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

coderabbitai bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Vasyl198 commented Dec 24, 2025

Vasyl198 commented Dec 24, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2025

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2025

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2025

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2025

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2025

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2025

Choose a reason for hiding this comment

Labels

1 participant

Vasyl198 commented Dec 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 24, 2025 •

edited

Loading