Commit 32fe0c4

Merge branch 'main' into evaluator
2 parents 8cbd372 + 7b15121 commit 32fe0c4

4 files changed, 12 additions and 12 deletions

CONTRIBUTING.md

Lines changed: 2 additions & 4 deletions
@@ -9,14 +9,12 @@ We appreciate your contributions!
 5. Create new Pull Request

 ## Contribution Ideas
-- **Remove necessity for `pip install .`**: I think by uploading packages to PyPi we can reduce the installation code steps by consolidating `pip install -r requirements.txt` and `pip install .`. If that's possible that'd be great.
+- **Develop an Automated End-to-End Testing System**: Build an automated testing framework that can be run before merging PRs to `main` to confirm no test cases broke. An example of such a test case would be "go to google docs and write a poem". This testing system should be flexible to add new test cases in the future and reduce the time spent on manually testing each PR.
 - **Improve performance by finding optimal screenshot grid**: A primary element of the framework is that it overlays a percentage grid on the screenshot which GPT-4v uses to estimate click locations. If someone is able to find the optimal grid and some evaluation metrics to confirm it is an improvement on the current method then we will merge that PR.
 - **Improve the `SUMMARY_PROMPT`**
-- **Create an evaluation system**
 - **Improve Linux and Windows compatibility**: There are still some issues with Linux and Windows compatibility. PRs to fix the issues are encouraged.
-- **Enabling New Mouse Capabilities**: (drag, hover, etc.)
 - **Adding New Multimodal Models**: Integration of new multimodal models is welcomed. If you have a specific model in mind that you believe would be a valuable addition, please feel free to integrate it and submit a PR.
-- **Framework Architecture Improvements**: Think you can enhance the framework architecture described in the intro? We welcome suggestions and PRs.
+- **Iterate `--accurate` flag functionality**: Look at https://github.com/OthersideAI/self-operating-computer/pull/57 for previous iteration

 ## Guidelines
 This will primarily be a [Software 2.0](https://karpathy.medium.com/software-2-0-a64152b37c35) project. For this reason:
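
Reviewer note on the screenshot-grid idea in the CONTRIBUTING.md hunk above: the text says the model estimates click locations against a percentage grid. A minimal sketch of turning such a percentage estimate into pixel coordinates follows. The function name `percent_to_pixels` and the clamping behaviour are assumptions made for illustration; they are not taken from the repository.

```python
from typing import Tuple


def percent_to_pixels(
    x_percent: float, y_percent: float, width: int, height: int
) -> Tuple[int, int]:
    """Map a click location given as screen percentages to pixel coordinates.

    Hypothetical helper: the name and the clamping rule are illustrative
    assumptions, not the project's actual implementation.
    """
    if not (0.0 <= x_percent <= 100.0 and 0.0 <= y_percent <= 100.0):
        raise ValueError("percentages must be between 0 and 100")
    # Clamp to the last valid pixel so that 100% does not fall off-screen.
    x = min(int(width * x_percent / 100.0), width - 1)
    y = min(int(height * y_percent / 100.0), height - 1)
    return x, y


if __name__ == "__main__":
    # Centre of a 1920x1080 screen.
    print(percent_to_pixels(50, 50, 1920, 1080))
```

A helper of this shape would also give an evaluation metric something concrete to compare, for example the pixel distance between the predicted and the intended click.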

README.md

Lines changed: 7 additions & 7 deletions
@@ -63,23 +63,23 @@ python3 -m venv venv
 ```
 source venv/bin/activate
 ```
-6. **Install Project Requirements and Command-Line Interface**:
+5. **Install Project Requirements and Command-Line Interface: Instead of using `pip install .`, you can now install the project directly from PyPI with:**
 ```
-pip install .
+pip install self-operating-computer
 ```
-7. **Then rename the `.example.env` file to `.env` so that you can save your OpenAI key in it.**
+6. **Then rename the `.example.env` file to `.env` so that you can save your OpenAI key in it.**
 ```
 mv .example.env .env
 ```
-8. **Add your Open AI key to your new `.env` file. If you don't have one, you can obtain an OpenAI key [here](https://platform.openai.com/account/api-keys)**:
+7. **Add your Open AI key to your new `.env` file. If you don't have one, you can obtain an OpenAI key [here](https://platform.openai.com/account/api-keys)**:
 ```
 OPENAI_API_KEY='your-key-here'
 ```
-9. **Run it**!
+8. **Run it**!
 ```
 operate
 ```
-10. **Final Step**: As a last step, the Terminal app will ask for permission for "Screen Recording" and "Accessibility" in the "Security & Privacy" page of Mac's "System Preferences".
+9. **Final Step**: As a last step, the Terminal app will ask for permission for "Screen Recording" and "Accessibility" in the "Security & Privacy" page of Mac's "System Preferences".

 <div align="center">
 <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/terminal-access-1.png" width="300" style="margin: 10px;"/>
@@ -132,4 +132,4 @@ Stay updated with the latest developments:

 ## OpenAI Rate Limiting Note
 The ```gpt-4-vision-preview``` model is required. To unlock access to this model, your account needs to spend at least \$5 in API credits. Pre-paying for these credits will unlock access if you haven't already spent the minimum \$5.
-Learn more **[here](https://platform.openai.com/docs/guides/rate-limits?context=tier-one)**
+Learn more **[here](https://platform.openai.com/docs/guides/rate-limits?context=tier-one)**
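
Reviewer note on the `.env` step in the README hunk above: the file holds a single `KEY='value'` line. The sketch below shows how such a line could be read with the standard library only. `parse_env_line` is a hypothetical helper written for this note; how the project actually loads the key is not shown in this diff.

```python
from typing import Optional, Tuple


def parse_env_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse one KEY=value line, stripping a matching pair of quotes.

    Hypothetical helper for illustration; it handles only the simple
    form shown in the README, not the full dotenv syntax.
    """
    line = line.strip()
    # Skip blanks, comments, and lines without an assignment.
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    value = value.strip()
    # Remove surrounding single or double quotes if they match.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]
    return key.strip(), value


if __name__ == "__main__":
    print(parse_env_line("OPENAI_API_KEY='your-key-here'"))
```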

operate/main.py

Lines changed: 2 additions & 0 deletions
@@ -700,6 +700,8 @@ def search(text):
     pyautogui.press("space")
     pyautogui.keyUp("command")

+    time.sleep(1)
+
     # Now type the text
     for char in text:
         pyautogui.write(char)
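
Reviewer note on the operate/main.py hunk above: the added `time.sleep(1)` pauses between the Command+Space shortcut and the typing loop, presumably so the search box has time to appear before characters are sent. The sketch below reproduces that ordering with an injectable keyboard so it can be checked without a display. `RecordingKeyboard`, the `keyboard`, `sleep`, and `delay` parameters, and the initial `keyDown("command")` call (which lies outside the visible hunk) are all assumptions made for illustration.

```python
import time
from typing import Callable, List


class RecordingKeyboard:
    """Hypothetical stand-in for pyautogui that records calls instead of sending keys."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def keyDown(self, key: str) -> None:
        self.events.append(f"down:{key}")

    def press(self, key: str) -> None:
        self.events.append(f"press:{key}")

    def keyUp(self, key: str) -> None:
        self.events.append(f"up:{key}")

    def write(self, text: str) -> None:
        self.events.append(f"write:{text}")


def search(
    text: str,
    keyboard,
    sleep: Callable[[float], None] = time.sleep,
    delay: float = 1.0,
) -> None:
    """Open the search box, wait, then type the text one character at a time."""
    keyboard.keyDown("command")  # assumption: this call is not visible in the hunk
    keyboard.press("space")
    keyboard.keyUp("command")

    sleep(delay)  # the pause this commit adds before typing

    # Now type the text
    for char in text:
        keyboard.write(char)


if __name__ == "__main__":
    kb = RecordingKeyboard()
    search("hi", kb, sleep=lambda seconds: None)
    print(kb.events)
```

Because `pyautogui` exposes `keyDown`, `press`, `keyUp`, and `write` under the same names, passing the module itself as `keyboard` should drive the real keyboard. A fixed delay is simple but may still be too short on a slow machine, so making it a parameter leaves room to tune it.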

setup.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@

 setup(
     name="self-operating-computer",
-    version="1.0.4",
+    version="1.0.5",
     packages=find_packages(),
     install_requires=required, # Add dependencies here
     entry_points={
