Releases: malah-code/self-ai-operating-computer
v2.0.15
Version v2.0.15 (Latest) Release Summary
New Features:
- Centralized Model Management: All model configurations are now managed in a single file (`operate/models/model_configs.py`), making it easier to add, remove, and manage models (see the sketch after this list).
- Expanded Ollama Model Support: Added support for `qwen2.5vl:3b` and `gemma3:4b`.
- Enhanced Debugging: Added a `-d` flag (alias for `--verbose`) that provides detailed debugging information, including the full prompt sent to the AI and the raw response received.
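The release notes don't reproduce the new configuration file itself; the sketch below is only an illustration of how a centralized registry of this kind might look (the field names and entries are assumptions, not the actual contents of `operate/models/model_configs.py`):

```python
# Illustrative sketch only: the real operate/models/model_configs.py may differ.
# It shows how a single registry can describe every supported model in one place.

MODEL_CONFIGS = {
    "gpt-4.1": {
        "provider": "openai",             # which API client to use
        "api_key_env": "OPENAI_API_KEY",  # env var expected to hold the key
        "multimodal": True,
    },
    "gemini-2.5-pro": {
        "provider": "google",
        "api_key_env": "GOOGLE_API_KEY",
        "multimodal": True,
    },
    "qwen2.5vl:3b": {
        "provider": "ollama",             # served locally via Ollama
        "host": "http://localhost:11434",
        "multimodal": True,
    },
}

def get_model_config(name: str) -> dict:
    """Look up a model by name, failing with a clear message if it is unknown."""
    try:
        return MODEL_CONFIGS[name]
    except KeyError:
        raise ValueError(f"Unknown model '{name}'. Available: {', '.join(MODEL_CONFIGS)}")
```

With everything in one table, adding or removing a model becomes a one-entry change. The `-d` flag can then be combined with any run, for example `operate -m qwen2.5vl:3b -d` (assuming the existing `-m` option for choosing a model), to print the full prompt and raw response for that session.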
Improvements:
- Improved System Prompt: The system prompt has been enhanced with a more structured format, explicit JSON schema definitions, and clear examples to improve model accuracy and reliability.
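The exact prompt text and schema are not quoted in the release notes; the following is a hypothetical example of the kind of structured JSON action list such a prompt asks the model to return (the operation names and fields are assumptions, not the framework's actual schema):

```python
import json

# Hypothetical example of a structured action response that a stricter system
# prompt with an explicit JSON schema is meant to elicit from the model.
raw_response = """
[
  {"thought": "The address bar is visible", "operation": "click", "x": "0.50", "y": "0.06"},
  {"thought": "Type the search query", "operation": "write", "content": "weather today"},
  {"thought": "Submit the query", "operation": "press", "keys": ["enter"]}
]
"""

actions = json.loads(raw_response)
for action in actions:
    assert "operation" in action, "every action must name an operation"
print(f"Parsed {len(actions)} actions; first operation: {actions[0]['operation']}")
```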
Bug Fixes:
- Fixed an issue where the model selection screen was not correctly displaying all available models.
- Resolved an `IndentationError` in the model configuration file.
Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released in November 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.
Key Features
- Compatibility: Designed for various multimodal models.
- Expanded Model Support: Now integrated with the latest OpenAI o3, o4-mini, GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemma 3n models (including `e2b` and `e4b` variants), and Gemma 3:12b, alongside existing support for GPT-4o, Claude 3, Qwen-VL, and LLaVA.
- Enhanced Ollama Integration: Improved handling for Ollama models, including default host configuration and more informative error messages.
- Future Plans: Support for additional models.
v2.0.14
Version v2.0.14 Release Summary
New Features:
- Interactive Model Selection: When running `operate` without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
- Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or `.env` file (see the sketch after this list).
- Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (`CUSTOM_SYSTEM_PROMPT`). If the environment variable is set, the option to load from it will be hidden.
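The notes don't show the implementation; as a rough illustration, key prompting and custom-prompt loading of this kind might look like the sketch below (the function names, provider table, and `custom_system_prompt.txt` filename are hypothetical; only `CUSTOM_SYSTEM_PROMPT` comes from the release notes):

```python
import os
from pathlib import Path
from dotenv import load_dotenv  # python-dotenv: pulls values from a .env file into the environment

load_dotenv()  # a key stored in .env is picked up before any prompting happens

# Illustrative mapping from provider to the environment variable that holds its key.
PROVIDER_KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def ensure_api_key(provider: str) -> str | None:
    """Prompt for a key only when the selected provider needs one and it is missing."""
    env_var = PROVIDER_KEY_ENV.get(provider)
    if env_var is None:
        return None  # e.g. local Ollama models need no key
    key = os.getenv(env_var)
    if not key:
        key = input(f"Enter a value for {env_var}: ").strip()
        os.environ[env_var] = key  # keep it for the rest of the session
    return key

def load_custom_system_prompt() -> str | None:
    """The environment variable takes priority over loading from a file."""
    prompt = os.getenv("CUSTOM_SYSTEM_PROMPT")
    if prompt:
        return prompt
    path = Path("custom_system_prompt.txt")  # hypothetical filename for the file option
    return path.read_text() if path.exists() else None
```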
Improvements:
- Expanded Google Gemini Support: Added full support for `gemini-2.5-pro` and `gemini-2.5-flash` models.
- Enhanced Ollama Integration: Improved handling for Ollama models, including setting `http://localhost:11434` as the default host and providing more informative error messages when Ollama models are not found (see the sketch after this list).
- Gemma 3n Model Support: Integrated support for `gemma3n`, `gemma3n:e2b`, and `gemma3n:e4b` models via Ollama.
- Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
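The release itself doesn't show how the host fallback and error messages are implemented; a minimal sketch of that behaviour, using Ollama's public `GET /api/tags` endpoint to list locally pulled models, might look like this (the function name is hypothetical, not the project's actual code):

```python
import os
import requests

# Default host from the release notes; OLLAMA_HOST is Ollama's own override convention.
DEFAULT_OLLAMA_HOST = "http://localhost:11434"

def check_ollama_model(model_name: str) -> None:
    """Fail early with a helpful message if the Ollama server or model is unavailable."""
    host = os.getenv("OLLAMA_HOST", DEFAULT_OLLAMA_HOST)
    try:
        resp = requests.get(f"{host}/api/tags", timeout=5)  # lists locally available models
        resp.raise_for_status()
    except requests.RequestException as exc:
        raise RuntimeError(
            f"Could not reach Ollama at {host}. Is the Ollama server running?"
        ) from exc

    available = [m["name"] for m in resp.json().get("models", [])]
    if model_name not in available:
        raise RuntimeError(
            f"Model '{model_name}' not found on the Ollama server at {host}. "
            f"Try: ollama pull {model_name}"
        )

# Example: check_ollama_model("gemma3n:e2b")
```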
Bug Fixes:
- Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
- Fixed an issue where the application would attempt to use an incorrect model name for `gemini-2.5-flash-lite` (which is not a valid model name).
- Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models (see the sketch after this list).
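The fix itself isn't shown in the notes. A common way to avoid `json.loads` raising "Extra data" when a model appends commentary after its JSON, or wraps it in a markdown fence, is to decode only the first JSON value, roughly as sketched below (an illustrative approach, not necessarily the one used in this release):

```python
import json

def parse_first_json(text: str) -> dict:
    """Decode only the first JSON value in `text`, ignoring any trailing data
    that would otherwise make json.loads raise "Extra data"."""
    cleaned = text.strip()
    # Strip a markdown code fence if the model wrapped its JSON in one.
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`").removeprefix("json").strip()
    obj, _end = json.JSONDecoder().raw_decode(cleaned)
    return obj

# Example: trailing commentary after the JSON no longer breaks parsing.
print(parse_first_json('{"operation": "click", "x": "0.5", "y": "0.1"} Done!'))
```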
2.0.11
2.0.10
2.0.9
2.0.7
Release version 2.0.7
2.0.6
Version v2.0.6 Release Summary
New Features:
- Interactive Model Selection: When running `operate` without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
- Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or `.env` file.
- Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (`CUSTOM_SYSTEM_PROMPT`). If the environment variable is set, the option to load from it will be hidden.
Improvements:
- Expanded Google Gemini Support: Added full support for `gemini-2.5-pro` and `gemini-2.5-flash` models.
- Enhanced Ollama Integration: Improved handling for Ollama models, including setting `http://localhost:11434` as the default host and providing more informative error messages when Ollama models are not found.
- Gemma 3n Model Support: Integrated support for `gemma3n`, `gemma3n:e2b`, and `gemma3n:e4b` models via Ollama.
- Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
Bug Fixes:
- Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
- Fixed an issue where the application would attempt to use an incorrect model name for `gemini-2.5-flash-lite` (which is not a valid model name).
- Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.
2.0.5
Version v2.0.5 Release Summary
New Features:
- Interactive Model Selection: When running `operate` without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
- Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or `.env` file.
- Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (`CUSTOM_SYSTEM_PROMPT`). If the environment variable is set, the option to load from it will be hidden.
Improvements:
- Expanded Google Gemini Support: Added full support for `gemini-2.5-pro` and `gemini-2.5-flash` models.
- Enhanced Ollama Integration: Improved handling for Ollama models, including setting `http://localhost:11434` as the default host and providing more informative error messages when Ollama models are not found.
- Gemma 3n Model Support: Integrated support for `gemma3n`, `gemma3n:e2b`, and `gemma3n:e4b` models via Ollama.
- Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
Bug Fixes:
- Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
- Fixed an issue where the application would attempt to use an incorrect model name for `gemini-2.5-flash-lite` (which is not a valid model name).
- Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.
2.0.2
New Features:
- Interactive Model Selection: When running `operate` without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
- Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or `.env` file.
- Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (`CUSTOM_SYSTEM_PROMPT`). If the environment variable is set, the option to load from it will be hidden.
Improvements:
- Expanded Google Gemini Support: Added full support for `gemini-2.5-pro` and `gemini-2.5-flash` models.
- Enhanced Ollama Integration: Improved handling for Ollama models, including setting `http://localhost:11434` as the default host and providing more informative error messages when Ollama models are not found.
- Gemma 3n Model Support: Integrated support for `gemma3n`, `gemma3n:e2b`, and `gemma3n:e4b` models via Ollama.
- Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
Bug Fixes:
- Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
- Fixed an issue where the application would attempt to use an incorrect model name for `gemini-2.5-flash-lite` (which is not a valid model name).
- Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.
2.0.0
New Features:
- Interactive Model Selection: When running `operate` without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
- Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or `.env` file.
- Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (`CUSTOM_SYSTEM_PROMPT`). If the environment variable is set, the option to load from it will be hidden.
Improvements:
- Expanded Google Gemini Support: Added full support for `gemini-2.5-pro` and `gemini-2.5-flash` models.
- Enhanced Ollama Integration: Improved handling for Ollama models, including setting `http://localhost:11434` as the default host and providing more informative error messages when Ollama models are not found.
- Gemma 3n Model Support: Integrated support for `gemma3n`, `gemma3n:e2b`, and `gemma3n:e4b` models via Ollama.
- Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
Bug Fixes:
- Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
- Fixed an issue where the application would attempt to use an incorrect model name for `gemini-2.5-flash-lite` (which is not a valid model name).
- Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.



