Overcooked Environment #355

mmbajo · 2025-09-13T15:20:18Z

This PR introduces an Overcooked cooking game environment for PufferLib. Sprites are from OvercookedAI.

- Introduced core files for the Overcooked multiagent environment, including `binding.c`, `overcooked.c`, `overcooked.h`, and `overcooked.py`.
- Implemented initialization, logging, and step functions for agent interactions.
- Added rendering and cleanup functionalities for the environment.
- Provided a template for users to create their own multiagent environments based on Overcooked.

- Updated `binding.c`, `overcooked.c`, and `overcooked.h` to support a single-agent Overcooked environment.
- Replaced multi-agent parameters with single-agent equivalents, including max steps and grid size.
- Enhanced observation and action spaces to reflect the new gameplay structure.
- Implemented item handling and grid management for the cooking environment.
- Adjusted rendering logic to visualize the agent and items correctly.

- Added entry to ignore all dsym files to prevent them from being tracked in the repository.

- Changed default dimensions of the Overcooked environment from 10x10 to 5x5 in both C and Python implementations.
- Updated grid size from 50 to 100 for better visualization.
- Introduced a new cramped room layout for testing.
- Enhanced grid parsing logic to accommodate the new layout and item types.
- Adjusted agent starting position and rendering logic for improved gameplay experience.

- Updated the `is_valid_position` function to check for EMPTY instead of WALL, ensuring correct position validation within the grid.
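
To illustrate why checking for EMPTY matters, here is a minimal Python sketch of the same idea. The tile codes and the grid are hypothetical, and the real check lives in the C code, so treat this as an approximation only. With the earlier "not WALL" test, an agent could have walked onto a counter; the EMPTY test rejects every non-floor tile.

```python
# Hypothetical tile codes; the values in overcooked.h may differ.
EMPTY, WALL, COUNTER = 0, 1, 2

def is_valid_position(grid, x, y):
    """Return True only if (x, y) is inside the grid and the tile is EMPTY."""
    if y < 0 or y >= len(grid) or x < 0 or x >= len(grid[0]):
        return False
    return grid[y][x] == EMPTY

grid = [
    [WALL, WALL, WALL],
    [WALL, EMPTY, COUNTER],
    [WALL, WALL, WALL],
]

print(is_valid_position(grid, 1, 1))  # floor tile
print(is_valid_position(grid, 2, 1))  # counter: not WALL, but still not walkable
```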

- Implemented logic for agent interactions, allowing the agent to pick up items, put down items on specific tiles, and retrieve ingredients from boxes.
- Enhanced the `handle_interaction` function to manage the agent's held item based on the current tile and agent's facing direction.
- Added boundary checks to ensure valid interactions within the grid.
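
The boundary check on the faced tile can be sketched roughly as follows. The direction names match the sprite names used in this PR, but the delta table and the `facing_tile` helper are hypothetical and not taken from `handle_interaction`.

```python
# Hypothetical mapping from facing direction to a grid offset (y grows downward).
FACING_DELTAS = {"NORTH": (0, -1), "SOUTH": (0, 1), "EAST": (1, 0), "WEST": (-1, 0)}

def facing_tile(x, y, facing, width, height):
    """Return the (x, y) of the tile the agent faces, or None if it is off-grid."""
    dx, dy = FACING_DELTAS[facing]
    tx, ty = x + dx, y + dy
    if 0 <= tx < width and 0 <= ty < height:
        return tx, ty
    return None

print(facing_tile(2, 2, "EAST", 5, 5))   # inside the 5x5 grid
print(facing_tile(0, 0, "NORTH", 5, 5))  # would leave the grid
```

Returning `None` lets the caller skip the interaction entirely instead of indexing outside the grid.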

- Introduced helper functions for item management: `get_item_at`, `add_item`, and `remove_item`.
- Implemented `get_agent_color` function to determine the agent's color based on the held item.
- Updated rendering logic to draw the agent with the appropriate color based on the item they are holding.
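
A rough Python analogue of these helpers is shown below. The function names come from the commit message; the dictionary storage, the item codes, and the color names are assumptions made for illustration, and the C implementation may store items quite differently.

```python
# Hypothetical item codes and colors; not taken from the PR's C code.
NO_ITEM, ONION, PLATE = 0, 10, 12
AGENT_COLORS = {NO_ITEM: "white", ONION: "yellow", PLATE: "gray"}

def get_item_at(items, x, y):
    """Return the item code at (x, y), or NO_ITEM if the tile holds nothing."""
    return items.get((x, y), NO_ITEM)

def add_item(items, x, y, item):
    """Place an item on a tile; return False if the tile is already occupied."""
    if (x, y) in items:
        return False
    items[(x, y)] = item
    return True

def remove_item(items, x, y):
    """Remove and return the item at (x, y), or NO_ITEM if none is there."""
    return items.pop((x, y), NO_ITEM)

def get_agent_color(held_item):
    """Pick a draw color from the held item, falling back to the empty-hand color."""
    return AGENT_COLORS.get(held_item, AGENT_COLORS[NO_ITEM])
```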

- Introduced various chef images (EAST, NORTH, SOUTH, WEST) with different hats and dishes.
- Added individual item images such as arrows and interaction icons.
- Included object images for dishes, onions, and pots.
- Added multiple soup images representing different cooking stages and states.
- Updated terrain images for counters, dishes, and ingredients.

- Added new terrain textures for floor, counter, pot, serve, and ingredient boxes.
- Introduced object textures for onions, tomatoes, dishes, and soups.
- Implemented chef sprite textures for all directions and held items.
- Updated rendering logic to utilize textures for grid tiles and items, improving visual fidelity.
- Added texture unloading in cleanup to manage resources effectively.

- Changed chef sprite texture file names to remove specific hat identifiers, streamlining asset management.
- Ensured consistency in texture naming for all chef directions (NORTH, SOUTH, EAST, WEST).

- Introduced cooking states and parameters for managing cooking pots.
- Implemented logic for adding ingredients to pots, starting cooking, and handling cooked or burnt states.
- Enhanced rendering to display cooking progress and states visually on stoves.
- Added functions for initializing and updating cooking pots, ensuring proper resource management during gameplay.
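
The pot lifecycle described here can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the state names, `COOK_TIME`, `BURN_TIME`, and the `max_ingredients` default are all hypothetical and are not the values used in `overcooked.h`.

```python
from dataclasses import dataclass
from enum import Enum

class PotState(Enum):
    IDLE = 0
    COOKING = 1
    COOKED = 2
    BURNT = 3

COOK_TIME = 20  # hypothetical number of steps until the soup is cooked
BURN_TIME = 40  # hypothetical number of steps until the soup burns

@dataclass
class Pot:
    state: PotState = PotState.IDLE
    ingredients: int = 0
    timer: int = 0

def add_ingredient(pot, max_ingredients=5):
    """Add one ingredient to an idle pot; return False if it cannot accept more."""
    if pot.state is PotState.IDLE and pot.ingredients < max_ingredients:
        pot.ingredients += 1
        return True
    return False

def start_cooking(pot):
    """Start cooking an idle, non-empty pot."""
    if pot.state is PotState.IDLE and pot.ingredients > 0:
        pot.state = PotState.COOKING
        pot.timer = 0
        return True
    return False

def update_pot(pot):
    """Advance the timer by one step and move COOKING -> COOKED -> BURNT."""
    if pot.state in (PotState.COOKING, PotState.COOKED):
        pot.timer += 1
        if pot.timer >= BURN_TIME:
            pot.state = PotState.BURNT
        elif pot.timer >= COOK_TIME:
            pot.state = PotState.COOKED
```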

- Introduced a new item type for plated soup, allowing players to hold and place soups with ingredient information.
- Updated agent and item structures to track soup ingredients and states.
- Modified interaction logic to require a plate for picking up cooked soup and to preserve ingredient data when placing plated soup.
- Enhanced rendering logic to display appropriate textures for plated soups and their respective ingredients.
- Added new chef sprite textures for holding soups, improving visual representation during gameplay.

- Introduced a new configuration file `overcooked.ini` for the Overcooked environment.
- Defined parameters for the base, environment, and training settings, including package name, environment name, number of environments, agents, goals, and training hyperparameters.
- This file will facilitate easier adjustments to gameplay settings and training configurations.
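
For readers unfamiliar with this kind of file, the sketch below parses an illustrative INI string with Python's standard `configparser`. The section names follow the base, environment, and training split described above, but every key and value here is a placeholder and is not copied from the actual `overcooked.ini`.

```python
import configparser

# Illustrative contents only; keys and values are assumptions, not the PR's file.
EXAMPLE_INI = """
[base]
package = ocean
env_name = puffer_overcooked

[env]
num_envs = 8
num_agents = 8

[train]
learning_rate = 0.015
ent_coef = 0.02
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE_INI)

num_agents = cfg.getint("env", "num_agents")
learning_rate = cfg.getfloat("train", "learning_rate")
print(cfg.get("base", "env_name"), num_agents, learning_rate)
```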

- Reduced the number of agents from 8 to 2 in the `overcooked.ini` configuration file.
- This change aims to streamline gameplay dynamics and enhance performance by limiting the number of active agents in the environment.

- Updated the Overcooked environment to support multiple agents, allowing cooperative play.
- Introduced new parameters for agent management, including `num_agents` in the environment configuration.
- Modified initialization, action handling, and rendering logic to accommodate multiple agents.
- Enhanced interaction mechanics to track actions and states for each agent, improving gameplay dynamics.
- Updated documentation and comments to reflect the transition from a single-agent to a multi-agent environment.

- Updated the observation size calculation to reflect a one-hot encoded grid with detailed channel breakdown.
- Enhanced the `compute_observations` function to include terrain types, item types, agent positions, and cooking states.
- Added logic for encoding agent states and cooking progress, improving the observation data structure for multi-agent gameplay.
- Included debug information for initial observations to assist in development and testing.
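
As a rough picture of how a one-hot grid observation is sized, the sketch below multiplies the grid area by the total channel count. The channel names follow the commit message, but the per-group channel counts are made up for illustration and do not reflect the actual breakdown in `compute_observations`.

```python
def one_hot(index, size):
    """Return a list of `size` floats with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def one_hot_obs_size(width, height, channel_counts):
    """Observation length for a grid where each cell has sum(channels) one-hot slots."""
    return width * height * sum(channel_counts.values())

# Hypothetical channel counts per feature group.
channels = {"terrain": 6, "item": 4, "agent": 2, "cooking": 3}
print(one_hot_obs_size(5, 5, channels))
```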

- Updated the observation size calculation to a flat array format, simplifying the structure for multi-agent gameplay.
- Enhanced the `compute_observations` function to include detailed agent states, item positions, and cooking pot information.
- Improved clarity in comments regarding the observation components, aiding future development and understanding of the observation space.
- Removed outdated one-hot encoding logic, streamlining the observation process for better performance.

- Implemented a new function to evaluate served dishes and assign rewards based on ingredient correctness.
- Added logic to handle the completion of serving a plated soup, including clearing the agent's held items.
- Enhanced interaction mechanics to incorporate dish evaluation, improving gameplay dynamics and agent cooperation.
- Updated comments for clarity on the new reward structure and potential future enhancements.

- Increased the maximum steps from 200 to 400 to allow for longer gameplay sessions.
- Adjusted reward values for served dishes from 10.0 to 1.0 and step penalty from -0.1 to 0.0 to refine the reward system and enhance player experience.
- These changes aim to improve the overall dynamics and strategy within the Overcooked environment.

…onment
- Introduced a new function to evaluate dishes served by agents, improving gameplay dynamics.
- Updated rendering logic to accommodate an additional status display area, enhancing visual feedback during gameplay.
- Adjusted drawing positions for various elements to account for the new status area, ensuring a clear and organized interface.
- Improved comments for clarity on the new rendering adjustments and dish evaluation mechanics.

- Included 'overcooked' as a new game type in the MAKE_FUNCTIONS dictionary.
- This addition expands the available game modes, enhancing the versatility of the environment.

- Updated the Overcooked demo to include neural network functionality, allowing agents to make decisions based on learned weights.
- Replaced random action selection with a neural network for agent actions, enhancing gameplay dynamics.
- Added a performance testing function to evaluate agent actions over a specified time period.
- Included necessary weight files for neural network operations, improving the overall complexity and strategy of the game.

mmbajo · 2025-09-13T15:33:04Z

Currently training a model for the demo. The model is not producing meaningful actions yet. Still debugging.

- Introduced new metrics to the Log structure, including correct and wrong dishes, ingredients picked, pots started, items dropped, agent collisions, and cooking time efficiency.
- Updated relevant functions to increment these statistics during gameplay, enhancing performance tracking and gameplay analysis.
- Initialized user stats upon environment reset to ensure accurate tracking from the start of each episode.

- Removed penalty for serving wrong dishes, now only tracking the count of wrong dishes.
- This change simplifies the reward system and focuses on performance metrics without penalizing agents for incorrect actions.

- Simplified the dish evaluation function by removing unnecessary comments and streamlining the tracking of correct and wrong dishes.
- Maintained the focus on performance metrics while ensuring clarity in the code structure.

- Updated the observation size to a fixed 96-dimensional vector per agent, improving clarity and consistency in agent state representation.
- Introduced helper functions to find the nearest objects and items, streamlining the observation computation process.
- Detailed the structure of the observation vector, including player features, teammate features, and absolute position, ensuring comprehensive agent information for gameplay.
- Removed outdated comments and improved documentation for better understanding of the observation components.

- Updated the observation vector size from 96 to 76 dimensions, reflecting a more accurate representation of agent states.
- Adjusted distance calculations to ensure proper type handling and consistency across functions.
- Enhanced teammate feature extraction, including proximity to various objects and pot states, improving the overall observation detail.
- Updated comments for clarity on the structure and purpose of the observation components, ensuring better understanding for future development.

…ed environment. Fix for displaying UserStats in CLI and logs.
- Added initialization of the episode counter in the reset function to track the number of episodes accurately.
- Updated performance metrics at the end of each episode, including normalization of dishes served and scoring based on episode returns.
- These changes enhance the logging capabilities for better performance analysis and gameplay tracking.

… environment
- Updated distance calculations to ensure proper type casting for agent coordinates, enhancing accuracy in proximity feature computations.
- Removed unused boolean variable in nearest empty counter search, streamlining the code.
- Adjusted wall detection logic to include type casting for agent coordinates, improving consistency in wall checks.
- These changes enhance the overall functionality and performance of the Overcooked environment.

- Changed item position coordinates from float to int for improved grid positioning accuracy.
- Adjusted distance calculations to ensure proper type casting for agent coordinates.
- Reduced frame rate from 60 to 4 FPS to optimize rendering performance.
- These changes enhance the overall functionality and performance of the Overcooked environment.

…ment
- Introduced a new wall texture to enhance the visual representation of the game environment.
- Updated the grid configuration to include wall elements, improving the layout and gameplay dynamics.
- Adjusted rendering logic to utilize the new wall texture, ensuring proper display during gameplay.
- These changes enhance the overall functionality and aesthetics of the Overcooked environment.

- Replaced multiple calls to find_nearest_object and find_nearest_item with a unified compute_proximity_feature function for consistency and clarity.
- Updated comments to better describe the purpose of each proximity feature calculation, enhancing code readability.
- These changes improve the maintainability and functionality of the observation computation in the Overcooked environment.

…ooked environment
- Updated item type definitions to start from 10 to avoid collision with grid tiles, improving clarity and organization.
- Removed deprecated helper functions for finding nearest objects and items, streamlining the codebase.
- Enhanced the observation panel by adding detailed agent and teammate features, improving the visibility of game state information.
- These changes improve code maintainability and enhance the overall functionality of the Overcooked environment.
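
The point of starting item codes at 10 is that a tile code and an item code can never be confused. A minimal sketch of that idea follows; the enum member names and the specific values are hypothetical, and only the base of 10 comes from the commit message.

```python
from enum import IntEnum

ITEM_BASE = 10  # from the commit message: item types start at 10

class Tile(IntEnum):
    # Hypothetical tile codes in the low range.
    EMPTY = 0
    WALL = 1
    COUNTER = 2
    STOVE = 3
    SERVE = 4
    INGREDIENT_BOX = 5

class Item(IntEnum):
    # Hypothetical item codes, all at or above ITEM_BASE.
    ONION = ITEM_BASE
    TOMATO = ITEM_BASE + 1
    PLATE = ITEM_BASE + 2
    PLATED_SOUP = ITEM_BASE + 3

def is_item(code):
    """True if an integer code falls in the item range rather than the tile range."""
    return code >= ITEM_BASE
```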

…d environment
- Increased the observation vector size from 77 to 83 dimensions to accommodate additional features.
- Refactored absolute position calculations for agents, improving clarity and added debug output for initial steps.
- Updated relevant comments to reflect changes, ensuring consistency across the codebase.
- These modifications enhance the agent's state representation and improve overall functionality in the Overcooked environment.

- Simplified the absolute position calculation for agents by directly assigning the computed values to the observation array.
- Removed debug output for absolute position, streamlining the code while maintaining functionality.
- These changes enhance code clarity and maintainability in the Overcooked environment.

- Removed deprecated comments and unnecessary lines to enhance code clarity and maintainability.
- Consolidated item type definitions and cooking parameters for better organization.
- Streamlined the structure of the Client and Overcooked structs by removing redundant comments.
- These changes improve the overall readability and maintainability of the Overcooked environment codebase.

- Increased the reward for serving a dish from 1.0 to 20.0 to enhance gameplay dynamics.
- Modified the evaluate_dish_served function to include agent index for targeted reward assignment.
- Added incremental rewards for agents interacting with cooking pots, improving feedback for actions.
- These changes enhance the reward structure and overall agent interaction within the Overcooked environment.

- Increased learning rate from 0.015 to 0.04 to accelerate training.
- Adjusted entropy coefficient from 0.02 to 0.05 to encourage exploration.
- Added GAE lambda and clip coefficient for improved training stability.
- These changes aim to enhance the training efficiency and performance of agents in the Overcooked environment.

mmbajo · 2025-09-22T12:39:42Z

Added intermediate reward shaping to the Overcooked environment to encourage cooperative cooking behavior and provide more frequent learning signals.

Changes

- Onion to pot: +0.1 reward when an agent adds an onion to a pot
- Correct recipe start: +0.1 reward when starting to cook a pot with exactly 3 onions (the target recipe)
- Soup plating: +0.1 reward when transferring a cooked soup from pot to plate
- Dish serving:
  - Correct recipe (3 onions): +5.0 to serving agent, +20.0 to all agents
  - Incorrect recipe: +0.1 to all agents (small consolation reward)
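
For reviewers who want to sanity-check the numbers, the serving rule above can be sketched in Python as follows. The reward values are the ones listed in this comment; the function name, the dictionary keys, and the assumption that the serving agent's +5.0 stacks on top of the shared +20.0 are mine and may not match the C code.

```python
TARGET_ONIONS = 3  # the target recipe from the list above

# Per-event shaping rewards listed above; the key names are hypothetical.
SHAPING = {"onion_to_pot": 0.1, "start_correct_recipe": 0.1, "plate_soup": 0.1}

def serve_rewards(num_agents, server_idx, onions_in_soup):
    """Return one reward per agent for a served dish.

    Assumption: the serving agent receives the shared reward plus its bonus.
    """
    if onions_in_soup == TARGET_ONIONS:
        rewards = [20.0] * num_agents
        rewards[server_idx] += 5.0
    else:
        rewards = [0.1] * num_agents
    return rewards

print(serve_rewards(2, 0, 3))  # correct recipe served by agent 0
print(serve_rewards(2, 1, 2))  # incorrect recipe
```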

I am now getting decent performance trajectories when training, but I do need some help. Still not sure if this will work out. 🙇

Explained variance is in the positive region! I assume this is a good sign?

Any advice on what to try or change?

- Changed the environment to support a single agent instead of two, updating relevant variables and comments.
- Adjusted neural network weights loading to reflect the new configuration.
- Updated action control comments to clarify single agent controls.
- Ensured observation size remains consistent with the new setup, maintaining functionality and clarity in the codebase.

mmbajo · 2025-09-26T05:29:23Z

Hmm... I tried training with 1 agent, but the net can't fully learn how to cook.

Hadrien-Cr · 2025-09-27T09:52:14Z

Do you mind describing your reward structure and rules? Maybe I can help.

…o one neuralnet not separate ones - therefore mirroring is redundant.

- Introduced a TODO comment regarding the handling of Tomatoes in the ingredient box.
- Added a note to implement logging for each ingredient type to improve tracking of items picked by agents.
- These changes aim to enhance the clarity and future extensibility of the ingredient management system.

…d function
- Introduced a TODO comment suggesting the creation of a struct for easier modification of reward distribution among agents.
- This change aims to improve the flexibility and maintainability of the reward system in the Overcooked environment.

…ed environment
- Eliminated the DEBUG_OBSERVATIONS flag and associated print statements to clean up the code.
- Added a TODO comment to indicate future implementation of tomatoes in the ingredient handling.
- Updated comments for clarity regarding the current rules for dish evaluation.
- These changes aim to enhance code clarity and prepare for future ingredient management improvements.

…observations at time of writing
- Introduced a comprehensive README file detailing the Overcooked environment, including observation and action spaces, reward system, and recipe instructions.
- This addition aims to provide clear documentation for users and developers, facilitating understanding and usage of the multi-agent cooking coordination environment.

mmbajo · 2025-10-09T04:13:23Z

@Hadrien-Cr Hello! I have written a concise README describing the rewards and observations. https://github.com/mmbajo/PufferLib/tree/roze-overcooked-dev/pufferlib/ocean/overcooked

…environment
- Reduced the maximum number of ingredients from 5 to 3 to refine gameplay dynamics.
- Adjusted the observation logic to check for exactly MAX_INGREDIENTS instead of 3 or more, enhancing the accuracy of pot state observations.
- These changes aim to improve the clarity and functionality of the Overcooked environment.

- Replaced individual reward parameters with a structured RewardConfig to streamline reward management.
- Updated initialization and handling of rewards for various actions, enhancing clarity and maintainability.
- Adjusted relevant code sections to utilize the new reward structure, ensuring consistent reward distribution across agents.
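
The benefit of grouping rewards into one structure can be shown with a small Python dataclass. Only the name `RewardConfig` comes from the commit message; the field names are hypothetical, and the default values are borrowed from the September 22 comment in this thread, so they may not match the struct in `overcooked.h`.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class RewardConfig:
    # Hypothetical field names; defaults taken from the reward list earlier in this PR.
    onion_in_pot: float = 0.1
    start_correct_recipe: float = 0.1
    plate_soup: float = 0.1
    serve_correct_server: float = 5.0
    serve_correct_team: float = 20.0
    serve_incorrect_team: float = 0.1

default_cfg = RewardConfig()

# Tuning one value yields a new config without touching the other fields.
tuned_cfg = replace(default_cfg, serve_correct_team=10.0)
print(default_cfg.serve_correct_team, tuned_cfg.serve_correct_team)
```

Keeping every reward in one place means an experiment changes a single object rather than several scattered parameters.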

- Removed BURNT state from cooking states to simplify cooking logic.
- Adjusted related code sections to reflect the removal of BURNT, including cooking progress updates and rendering logic.
- Updated README to reflect changes in cooking time and ingredient limits, ensuring consistency with the codebase.

- Moved the increment of dishes_served to the correct location in the evaluate_dish_served function to ensure accurate tracking of served dishes.
- Updated performance calculation in c_step to normalize based on correct_dishes instead of dishes_served, improving reward accuracy for agents.

- Updated the observation logic to check for any non-EMPTY tile (including walls, stoves, counters, serving areas, ingredient boxes, and cutting boards) instead of just walls, stoves, and counters. This change improves the accuracy of wall detection for agents navigating the environment.
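
The broader check can be sketched as four "blocked" flags around the agent. The tile codes, the flag ordering, and the choice to treat out-of-bounds neighbors as blocked are assumptions for illustration and are not read from the C observation code.

```python
# Hypothetical tile codes; only EMPTY is walkable.
EMPTY, WALL, COUNTER, SERVE = 0, 1, 2, 4

def blocked_flags(grid, x, y):
    """Return (north, south, east, west) flags; 1.0 means the agent cannot step there."""
    flags = []
    for dx, dy in ((0, -1), (0, 1), (1, 0), (-1, 0)):
        nx, ny = x + dx, y + dy
        in_bounds = 0 <= ny < len(grid) and 0 <= nx < len(grid[0])
        # Assumption: anything off-grid counts as blocked, like any non-EMPTY tile.
        blocked = (not in_bounds) or grid[ny][nx] != EMPTY
        flags.append(1.0 if blocked else 0.0)
    return tuple(flags)

grid = [
    [WALL, SERVE, WALL],
    [COUNTER, EMPTY, EMPTY],
    [WALL, WALL, WALL],
]
print(blocked_flags(grid, 1, 1))
```

In this layout the serving tile to the north is flagged as blocked, which a check limited to walls, stoves, and counters would have missed.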

mmbajo added 23 commits August 11, 2025 12:26

Update .gitignore to include dsym files

55d0987


Fix position validation in Overcooked environment

319e514


Update chef sprite textures in Overcooked environment

c5a93b8


Edit the rendering logic for ingredient box. We only use Onions for now.

9fa3cfd

Update Overcooked configuration to adjust agent settings

190fd9d


Add Overcooked game type to environment configuration

7e9ca00


mmbajo added 6 commits September 14, 2025 00:50

Refactor dish evaluation logic in Overcooked environment

18176f6


Refactor dish evaluation logic in Overcooked environment

d2f7ff7


mmbajo and others added 11 commits September 20, 2025 14:04

Merge branch '3.0' into roze-overcooked-dev

4df28fd

mmbajo added 3 commits September 22, 2025 23:08

This config gets over 0.5 explained variance!

60d47cb

Test 1 agent config to verify learning - still cant learn fully

30f44fb

mmbajo added 5 commits October 7, 2025 12:35

Remove teammate mirroring since its redundant - we put everything int…

7cb4dbc

…o one neuralnet not separate ones - therefore mirroring is redundant.

mmbajo added 7 commits October 9, 2025 13:23

Update training parameters

3435db1

Update readme

6add1dc
