
Commit 272b140

Merge pull request OthersideAI#105 from michaelhhogue/evaluation-documentation

Add instructions for testing changes

2 parents: 40c176e + 72f3e2c


CONTRIBUTING.md

Lines changed: 14 additions & 1 deletion
@@ -11,7 +11,20 @@ We appreciate your contributions!
## Modifying and Running Code

1. Make changes in `operate/main.py`
2. Run `pip install .` again
3. Run `operate` to see your changes (the full loop is recapped below)
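For reference, the whole loop from the repository root looks like this (a minimal sketch of the steps above, using only the commands this guide already names):

```
# after editing operate/main.py ...
pip install .   # reinstall so the `operate` entry point picks up your edits
operate         # launch the tool and check your changes
```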
## Testing Changes

**After making significant changes, it's important to verify that SOC can still successfully perform a set of common test cases.**

In the root directory of the project, run:

```
python3 evaluate.py
```

This will automatically prompt `operate` to perform several simple objectives. Upon completion of each objective, GPT-4v evaluates the attempt and determines whether the objective was successfully reached.

`evaluate.py` prints whether each test case `[PASSED]` or `[FAILED]`, along with a justification for the verdict.

It is **strongly** recommended that a screenshot of the `evaluate.py` output be included in any PR that could impact the performance of SOC.
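For illustration only, the report has roughly the following shape; the objectives and justifications below are placeholders, not the project's actual test cases or output text:

```
[PASSED] <objective 1>
<GPT-4v's justification for the pass>

[FAILED] <objective 2>
<GPT-4v's justification for the failure>
```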
## Contribution Ideas

- **Improve performance by finding the optimal screenshot grid**: A primary element of the framework is that it overlays a percentage grid on each screenshot, which GPT-4v uses to estimate click locations. If someone finds a better grid, together with evaluation metrics confirming it improves on the current method, we will merge that PR. (A toy sketch of the grid-overlay idea follows below.)
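As a rough illustration of the grid-overlay idea only (this is not the project's implementation, which lives in `operate/main.py`; the function below is a hypothetical sketch assuming Pillow is installed):

```
# Hypothetical sketch: draw a labeled percentage grid over a screenshot.
from PIL import Image, ImageDraw

def overlay_percent_grid(path, step_percent=10):
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for p in range(0, 101, step_percent):
        x, y = int(w * p / 100), int(h * p / 100)
        draw.line([(x, 0), (x, h)], fill="red")     # vertical line at p% of width
        draw.line([(0, y), (w, y)], fill="red")     # horizontal line at p% of height
        draw.text((x + 2, 2), f"{p}%", fill="red")  # column label
        draw.text((2, y + 2), f"{p}%", fill="red")  # row label
    return img

# e.g. overlay_percent_grid("screenshot.png").save("screenshot_grid.png")
```

A sketch like this makes it easy to vary `step_percent` and compare how reliably GPT-4v maps its answers back to pixel coordinates.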
