Releases: malah-code/self-ai-operating-computer

v2.0.15

07 Jul 16:19

Version v2.0.15 (Latest) Release Summary

New Features:

  • Centralized Model Management: All model configurations are now managed in a single file (operate/models/model_configs.py), making it easier to add, remove, and manage models.
  • Expanded Ollama Model Support: Added support for qwen2.5vl:3b and gemma3:4b.
  • Enhanced Debugging: Added a -d flag (alias for --verbose) that provides detailed debugging information, including the full prompt sent to the AI and the raw response received.
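
As a rough illustration of what a centralized model registry can look like, the sketch below keeps every model in one list and derives lookups from it. It is not the contents of operate/models/model_configs.py: the `ModelConfig` class, the helper functions, and the environment-variable names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """One entry in a hypothetical central model registry."""
    name: str                          # identifier the user selects
    provider: str                      # e.g. "openai", "google", "ollama"
    api_key_env: Optional[str] = None  # assumed env var name; None if no key is needed


# Adding or removing a model means editing only this list.
MODEL_CONFIGS = [
    ModelConfig("gpt-4o", "openai", "OPENAI_API_KEY"),
    ModelConfig("gemini-2.5-pro", "google", "GOOGLE_API_KEY"),
    ModelConfig("qwen2.5vl:3b", "ollama"),
    ModelConfig("gemma3:4b", "ollama"),
]

# Index built once from the list so lookups stay in sync with it.
_BY_NAME = {cfg.name: cfg for cfg in MODEL_CONFIGS}


def get_model_config(name: str) -> ModelConfig:
    """Return the config for a model, failing loudly for unknown names."""
    try:
        return _BY_NAME[name]
    except KeyError:
        raise ValueError(f"Unknown model: {name!r}") from None


def models_for_provider(provider: str) -> list[str]:
    """List model names for one provider, in registry order."""
    return [cfg.name for cfg in MODEL_CONFIGS if cfg.provider == provider]
```

Because the selection menu and the key lookup would both read from the same list, a model added there shows up everywhere without further edits.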

Improvements:

  • Improved System Prompt: The system prompt has been enhanced with a more structured format, explicit JSON schema definitions, and clear examples to improve model accuracy and reliability.

Bug Fixes:

  • Fixed an issue where the model selection screen was not correctly displaying all available models.
  • Resolved an IndentationError in the model configuration file.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released Nov 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.

Key Features

  • Compatibility: Designed for various multimodal models.
  • Expanded Model Support: Now integrates OpenAI o3, o4-mini, GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemma 3n (including the e2b and e4b variants), and Gemma 3:12b, alongside existing support for GPT-4o, Claude 3, Qwen-VL, and LLaVa.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including default host configuration and more informative error messages.
  • Future Plans: Support for additional models.

v2.0.14

07 Jul 14:29

Version v2.0.14 Release Summary

New Features:

  • Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
  • Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
  • Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.
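
The dynamic key prompting described above can be sketched as follows. This is a minimal illustration, not the project's code: the provider-to-variable mapping and the `ensure_api_key` helper are hypothetical, and loading values from a .env file is left out.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping, Optional

# Assumed mapping from provider to the env var that holds its key.
# Local Ollama models need no key, so their entry is None.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,
}


def ensure_api_key(
    provider: str,
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> Optional[str]:
    """Return the key for `provider`, prompting only if it is missing."""
    key_name = REQUIRED_KEY.get(provider)
    if key_name is None:
        return None  # this provider needs no key, so never prompt

    value = env.get(key_name)
    if value:
        return value  # already configured; stay silent

    value = prompt(f"Enter {key_name}: ").strip()
    if not value:
        raise ValueError(f"{key_name} is required for {provider} models")
    env[key_name] = value  # remember it for the rest of the session
    return value
```

Keying the lookup on the selected model's provider is what avoids asking for an unrelated key, for example an OpenAI key when a Gemini model is chosen.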

Improvements:

  • Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
  • Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
  • Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.
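
A default-host fallback of the kind mentioned for Ollama might look like the sketch below. It assumes the host override comes from the `OLLAMA_HOST` environment variable; the helper names and the wording of the error message are hypothetical.

```python
import os
from typing import Mapping, Optional, Sequence

DEFAULT_OLLAMA_HOST = "http://localhost:11434"


def resolve_ollama_host(env: Optional[Mapping[str, str]] = None) -> str:
    """Use OLLAMA_HOST when set (an assumption), else the default host."""
    env = os.environ if env is None else env
    host = (env.get("OLLAMA_HOST") or "").strip()
    return host.rstrip("/") if host else DEFAULT_OLLAMA_HOST


def model_not_found_message(model: str, host: str, available: Sequence[str]) -> str:
    """Build an error message that tells the user how to recover."""
    lines = [f"Ollama model {model!r} was not found at {host}."]
    if available:
        lines.append("Installed models: " + ", ".join(available))
    lines.append(f"Try: ollama pull {model}")
    return "\n".join(lines)
```

Naming the host and the installed models in the message lets the user tell a wrong host apart from a model that simply has not been pulled yet.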

Bug Fixes:

  • Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
  • Fixed an issue where the application would attempt to use an incorrect model name for gemini-2.5-flash-lite (which is not a valid model name).
  • Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.
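
Python's `json.loads` raises a `JSONDecodeError` beginning with "Extra data" when valid JSON is followed by more text, which can happen when a model appends commentary to its answer. One common way to tolerate this, shown here as a sketch rather than the fix the project applied, is to decode only the first JSON value with `JSONDecoder.raw_decode`. The `parse_first_json` helper is hypothetical.

```python
import json
from typing import Any


def parse_first_json(text: str) -> Any:
    """Decode the first JSON object or array in `text`, ignoring the rest."""
    # Find where the first object or array starts; this skips leading
    # prose or a Markdown code fence. A stray "{" or "[" in that leading
    # text would still break the parse.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found in response")

    # raw_decode stops at the end of the first complete value instead of
    # failing on whatever follows it.
    value, _end = json.JSONDecoder().raw_decode(text, min(starts))
    return value
```

For example, `parse_first_json('[{"operation": "click"}]\nDone.')` yields the list, where `json.loads` on the same string would fail.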

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released Nov 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.

2.0.11

07 Jul 13:55

2.0.11

2.0.10

07 Jul 13:48

2.0.10

2.0.9

07 Jul 13:38

2.0.9

2.0.7

06 Jul 22:02

v2.0.7 Release version 2.0.7

2.0.6

06 Jul 21:57

Version v2.0.6 Release Summary

New Features:

  • Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
  • Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
  • Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.
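
Resolving a custom system prompt from the `CUSTOM_SYSTEM_PROMPT` environment variable or from a file could be sketched like this. The release notes do not say which source wins when both are present, so the precedence below (environment variable first) is an assumption, and the `load_custom_system_prompt` helper is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

ENV_VAR = "CUSTOM_SYSTEM_PROMPT"


def load_custom_system_prompt(
    prompt_file: Optional[Union[str, Path]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return a custom prompt, or None to fall back to the built-in one."""
    env = os.environ if env is None else env

    # Assumed precedence: a non-empty environment variable wins.
    from_env = (env.get(ENV_VAR) or "").strip()
    if from_env:
        return from_env

    # Otherwise read the file the user pointed at, if any.
    if prompt_file is not None:
        text = Path(prompt_file).read_text(encoding="utf-8").strip()
        return text or None

    return None
```

Returning `None` rather than an empty string makes it easy for the caller to tell "no custom prompt" apart from a prompt that happens to be short.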

Improvements:

  • Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
  • Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
  • Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

  • Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
  • Fixed an issue where the application would attempt to use an incorrect model name for gemini-2.5-flash-lite (which is not a valid model name).
  • Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released Nov 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.

2.0.5

06 Jul 21:50

Version v2.0.5 Release Summary

New Features:

  • Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
  • Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
  • Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

  • Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
  • Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
  • Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

  • Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
  • Fixed an issue where the application would attempt to use an incorrect model name for gemini-2.5-flash-lite (which is not a valid model name).
  • Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released Nov 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.

2.0.2

04 Jul 21:10

New Features:

  • Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
  • Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
  • Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

  • Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
  • Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
  • Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

  • Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
  • Fixed an issue where the application would attempt to use an incorrect model name for gemini-2.5-flash-lite (which is not a valid model name).
  • Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.

2.0.0

04 Jul 15:27

New Features:

  • Interactive Model Selection: When running operate without specifying a model, a welcome screen is displayed, followed by an interactive menu to select your desired model.
  • Dynamic API Key Prompting: The application now intelligently prompts for required API keys (e.g., OpenAI, Google, Anthropic) only when a model requiring that key is selected and the key is not found in your environment variables or .env file.
  • Custom System Prompt: Users can now provide a custom system prompt from a file or an environment variable (CUSTOM_SYSTEM_PROMPT). If the environment variable is set, the option to load from it will be hidden.

Improvements:

  • Expanded Google Gemini Support: Added full support for gemini-2.5-pro and gemini-2.5-flash models.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including setting http://localhost:11434 as the default host and providing more informative error messages when Ollama models are not found.
  • Gemma 3n Model Support: Integrated support for gemma3n, gemma3n:e2b, and gemma3n:e4b models via Ollama.
  • Robust Error Handling: Improved error handling for API calls to prevent unexpected fallbacks and provide clearer error messages.

Bug Fixes:

  • Resolved an issue where the application would incorrectly prompt for an OpenAI API key when a Google Gemini model was selected.
  • Fixed an issue where the application would attempt to use an incorrect model name for gemini-2.5-flash-lite (which is not a valid model name).
  • Addressed the "Extra data" JSON parsing error when receiving responses from Gemini models.