Releases · malah-code/self-ai-operating-computer

07 Jul 16:19

malah-code

v2.0.15

02d44bc

v2.0.15 Latest


Version v2.0.15 (Latest) Release Summary

New Features:

Centralized Model Management: All model configurations are now managed in a single file (operate/models/model_configs.py), making it easier to add, remove, and manage models.
Expanded Ollama Model Support: Added support for qwen2.5vl:3b and gemma3:4b.
Enhanced Debugging: Added a -d flag (alias for --verbose) that provides detailed debugging information, including the full prompt sent to the AI and the raw response received.
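
As a rough illustration of the two features above, a single-file model registry and a -d alias for --verbose could look like the sketch below. It is not the project's actual code: ModelConfig, MODEL_CONFIGS, get_model_config, build_parser, the provider labels, and the environment-variable names are all assumptions made for the example, and the real contents of operate/models/model_configs.py may be organized differently.

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical record describing one selectable model."""
    name: str
    provider: str
    api_key_env: Optional[str] = None  # None for local (Ollama) models


# Hypothetical registry: every model is declared once, in one place.
MODEL_CONFIGS = {
    cfg.name: cfg
    for cfg in (
        ModelConfig("gpt-4o", "openai", "OPENAI_API_KEY"),
        ModelConfig("gemini-2.5-pro", "google", "GOOGLE_API_KEY"),
        ModelConfig("qwen2.5vl:3b", "ollama"),
        ModelConfig("gemma3:4b", "ollama"),
    )
}


def get_model_config(name: str) -> ModelConfig:
    """Look up a model, failing with a message that lists the known names."""
    try:
        return MODEL_CONFIGS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_CONFIGS))
        raise ValueError(f"Unknown model {name!r}. Known models: {known}") from None


def build_parser() -> argparse.ArgumentParser:
    """Assumed CLI shape: -d and --verbose set the same boolean."""
    parser = argparse.ArgumentParser(prog="operate")
    parser.add_argument("-m", "--model", default=None)
    parser.add_argument("-d", "--verbose", action="store_true")
    return parser
```

With a registry like this, adding or removing a model means editing one entry, and both the selection menu and the API-key logic can read from the same table.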

Improvements:

Improved System Prompt: The system prompt has been enhanced with a more structured format, explicit JSON schema definitions, and clear examples to improve model accuracy and reliability.

Bug Fixes:

Fixed an issue where the model selection screen was not correctly displaying all available models.
Resolved an IndentationError in the model configuration file.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released in November 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.

Key Features

Compatibility: Designed for various multimodal models.
Expanded Model Support: Now integrated with OpenAI o3, o4-mini, GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemma 3n (including the e2b and e4b variants), and Gemma 3:12b, alongside existing support for GPT-4o, Claude 3, Qwen-VL, and LLaVa.
Enhanced Ollama Integration: Improved handling for Ollama models, including default host configuration and more informative error messages.
Future Plans: Support for additional models.


07 Jul 14:29

malah-code

v2.0.14

da00a8b

v2.0.14

Version v2.0.14 (Latest) Release Summary

New Features:

Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.
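
The key-prompting and custom-prompt behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: resolve_api_key and load_custom_system_prompt are hypothetical helper names, .env loading is omitted, and giving the CUSTOM_SYSTEM_PROMPT environment variable precedence over a file is an assumed reading of the notes.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env_var: Optional[str],
    environ: Mapping[str, str] = os.environ,
    ask: Callable[[str], str] = input,
) -> Optional[str]:
    """Return the key for the selected model, prompting only if it is missing.

    env_var is None for models that need no key (for example local ones).
    """
    if env_var is None:
        return None
    key = environ.get(env_var, "").strip()
    if key:
        return key  # already configured: no prompt
    return ask(f"Enter a value for {env_var}: ").strip()


def load_custom_system_prompt(
    environ: Mapping[str, str] = os.environ,
    path: Optional[str] = None,
) -> Optional[str]:
    """Assumed precedence: environment variable first, then an optional file."""
    from_env = environ.get("CUSTOM_SYSTEM_PROMPT", "").strip()
    if from_env:
        return from_env
    if path is not None:
        return Path(path).read_text(encoding="utf-8").strip()
    return None  # caller falls back to the built-in system prompt
```

Passing the prompt function in as `ask` keeps the helper testable; a real CLI might prefer `getpass.getpass` so the key is not echoed to the terminal.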

Improvements:

Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
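
One way to picture the Ollama changes above is the sketch below: fall back to http://localhost:11434 when no host is configured, and build an error message that names the missing model. The helper names are hypothetical, and reading the host from an OLLAMA_HOST environment variable is an assumption about how the override is supplied. The model listing uses Ollama's GET /api/tags endpoint.

```python
import json
import os
import urllib.request
from typing import Iterable, List, Mapping

DEFAULT_OLLAMA_HOST = "http://localhost:11434"


def resolve_ollama_host(environ: Mapping[str, str] = os.environ) -> str:
    """Use the configured host if present, otherwise the local default."""
    host = environ.get("OLLAMA_HOST", "").strip()  # assumed override variable
    return (host or DEFAULT_OLLAMA_HOST).rstrip("/")


def list_installed_models(host: str, timeout: float = 5.0) -> List[str]:
    """Ask the Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def model_not_found_message(model: str, installed: Iterable[str]) -> str:
    """Build an error that says what is missing and how to fix it."""
    available = ", ".join(sorted(installed)) or "none"
    return (
        f"Ollama model {model!r} was not found. "
        f"Installed models: {available}. "
        f"Try: ollama pull {model}"
    )
```

A caller would typically compare the requested model against `list_installed_models(resolve_ollama_host())` and show `model_not_found_message(...)` instead of silently falling back to another model.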

Bug Fixes:

Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
Fixed an issue where the application attempted to use gemini-2.5-flash-lite, which is not a valid model name.
Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.
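
For context on the last fix: Python's json.loads raises a JSONDecodeError whose message starts with "Extra data" when a valid JSON value is followed by more text, which can happen when a model appends commentary after its JSON. The sketch below shows one common remedy, decoding only the first JSON value with json.JSONDecoder.raw_decode. The release notes do not say how the project fixed it, so treat parse_first_json as a hypothetical helper rather than the actual change.

```python
import json
from typing import Any


def parse_first_json(text: str) -> Any:
    """Decode the first JSON value in text and ignore whatever follows it."""
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first.
    value, _end = decoder.raw_decode(text.lstrip())
    return value


raw = '[{"operation": "click", "x": "0.5", "y": "0.5"}]\nDone, clicking now.'

try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print(f"json.loads failed: {exc.msg}")

print(parse_first_json(raw))
```

The trade-off is that trailing text is silently dropped, so a stricter caller may want to log it when debugging output is enabled.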


07 Jul 13:55

malah-code

v2.0.11

76c90ae

2.0.11


07 Jul 13:48

malah-code

v2.0.10

14c7730

2.0.10


07 Jul 13:38

malah-code

v2.0.9

b68717b

2.0.9


06 Jul 22:02

malah-code

v2.0.7

89a3890

2.0.7

v2.0.7 Release version 2.0.7


06 Jul 21:57

malah-code

v2.0.6

5028b38

2.0.6

Version v2.0.5 (Latest) Release Summary

New Features:

Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
Fixed an issue where the application attempted to use gemini-2.5-flash-lite, which is not a valid model name.
Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.


06 Jul 21:50

malah-code

v2.0.5

3b5fb9d

2.0.5

Version v2.0.5 (Latest) Release Summary

New Features:

Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
Fixed an issue where the application attempted to use gemini-2.5-flash-lite, which is not a valid model name.
Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.


04 Jul 21:10

malah-code

v2.0.2

4fc4439

2.0.2

New Features:

Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
Fixed an issue where the application attempted to use gemini-2.5-flash-lite, which is not a valid model name.
Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.


04 Jul 15:27

malah-code

v2.0.0

544e84a

2.0.0

New Features:

Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
Fixed an issue where the application attempted to use gemini-2.5-flash-lite, which is not a valid model name.
Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.

