If you encounter any difficulties in using or reproducing the code, please contact me at zhaoyangyu713@gmail.com.
ReCode introduces recursive code generation for LLM agents, unifying plan and action into a single representation. By treating high-level plans as placeholder functions that recursively decompose into executable primitives, it achieves universal granularity control and dynamically adapts from strategic thinking to concrete actions. This repository hosts the reference implementation used in the paper, along with environment wrappers and experiment tooling.
ReCode adopts a divide-and-conquer strategy, decomposing complex tasks into executable code fragments:
- Tree-structured code: Organizes partial programs in a tree where each node captures one sub-task and records its execution trace.
- Recursive expansion: Placeholder functions are expanded by the LLM into more specific calls or smaller subroutines using environment-specific prompts and few-shots.
- Dynamic execution loop: Each node is executed immediately; fresh observations decide whether to expand further, retry, or finish.
- Shared executor state: A constrained Python executor maintains environment variables, validates code blocks, and exposes the toolset available to the agent.
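The expand-and-execute loop described by the bullets above can be sketched roughly as follows. All names here (`Node`, `solve`, `is_primitive`, and the `expand`/`execute` callables) are illustrative placeholders, not the repository's actual API:

```python
# Illustrative sketch of recursive code generation: placeholders expand
# into sub-nodes, primitives execute immediately and record their trace.
from dataclasses import dataclass, field

@dataclass
class Node:
    code: str                                  # one sub-task: a primitive call or a placeholder function
    children: list = field(default_factory=list)
    trace: str = ""                            # execution trace recorded at this node

def is_primitive(code: str) -> bool:
    # Toy convention: concrete `run("...")` calls are primitives;
    # anything else is a placeholder the LLM still has to expand.
    return code.startswith('run(')

def solve(node: Node, expand, execute, depth: int = 0, max_depth: int = 10):
    """Depth-first: execute primitives, recursively expand placeholders."""
    if is_primitive(node.code) or depth >= max_depth:
        node.trace = execute(node.code)        # run immediately, record the observation
        return node
    for sub in expand(node.code):              # the LLM turns the placeholder into sub-steps
        child = Node(code=sub)
        node.children.append(child)
        solve(child, expand, execute, depth + 1, max_depth)
    return node
```

In the real agent, `expand` is an LLM call conditioned on environment prompts and few-shots, and `execute` is the constrained Python executor; here both are stand-ins to show the control flow.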
- `run.py` – CLI entry point that instantiates agents/envs, manages concurrency, and writes run summaries.
- `agents/recode/` – ReCode agent implementation, prompt templates, and utility helpers.
- `envs/` – Environment wrappers and assets for `alfworld`, `webshop`, and `sciworld`.
- `configs/` – LLM profile templates and (expected) pricing metadata used by the async client.
- `utils/` – Shared components: async OpenAI wrapper, constrained executor, logging helpers, error types.
- `figures/` – Paper figures used throughout this README.
To evaluate the effectiveness of ReCode, we divide our experiments into an inference part and a training part.
- Inference results: we compare against mainstream paradigms (ReAct, CodeAct) and recent work on improving LLM-based agent planning (AdaPlanner and ADaPT). ReCode achieves significant performance improvements across all three environments, with an average score of 60.8, surpassing the best baseline by 10.5 points (a relative 20.9%). In our tests, ReCode reaches a perfect score of 100 in ALFWorld with claude-4-sonnet.
- Training results: we conduct supervised fine-tuning (SFT) on ReCode, ReAct, and CodeAct with Qwen2.5-7B-Instruct. ReCode+SFT achieves a strong average performance of 70.4% across all environments, surpassing both ReAct+SFT (67.6%) and CodeAct+SFT (55.8%).
We are refreshing this section and will publish the full walkthrough before 31 Oct (UTC).
- `configs/profiles.yaml` contains named profiles. The `run.py --profile` flag selects which profile to forward to `AsyncLLM`. Example:

  ```yaml
  models:
    default:
      api_key: "sk-your_api_key"
      base_url: "https://api.openai.com/v1"
      model: "gpt-4o-mini"
      temperature: 0.0
      track_costs: true
    gpt-4o:
      api_key: "sk-your_other_key"
      base_url: "https://api.openai.com/v1"
      model: "gpt-4o"
      temperature: 0.7
      max_tokens: 512
  ```

- Cost tracking loads `configs/prices.json`. If you do not want to record costs, set `track_costs: false` for the profile.
- As a fallback, you can omit the file and set `OPENAI_API_KEY` in the environment; the default profile will then use it.
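The fallback order can be illustrated with a small sketch. This helper is hypothetical — the actual resolution is handled by the repository's client code — but it shows the intended precedence:

```python
import os

def resolve_api_key(profile: dict) -> str:
    # Hypothetical helper illustrating the fallback order:
    # the profile's api_key first, then the OPENAI_API_KEY environment variable.
    key = profile.get("api_key") or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set one in configs/profiles.yaml "
            "or export OPENAI_API_KEY"
        )
    return key
```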
- Install `alfworld` (already part of the Quick Start) and download the official dataset following the ALFWorld instructions.
- Set `ALFWORLD_DATA` to the dataset root, or edit `envs/alfworld/base_config.yaml` to point to your local paths:

  ```bash
  export ALFWORLD_DATA=/path/to/alfworld
  ```

- Optional filters such as `task_types` and `max_steps` can be supplied via YAML/CLI and are forwarded to `AlfworldEnv`.
- Install `scienceworld` from the ScienceWorld repository.
- Ensure `gdown` is installed.
- Run the provided helper to fetch the goal set and pre-built search index:

  ```bash
  bash envs/webshop/setup.sh
  ```

  The script downloads Google Drive archives, extracts them into `envs/webshop/data` and `envs/webshop/search_index`, and keeps the simulator under `envs/webshop/src`.
- `WebShopEnv` exposes knobs such as `max_steps` and `success_threshold` that can be overridden via config.
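As an illustration, such an override might look like the following YAML fragment. The key names `max_steps` and `success_threshold` come from the description above; the surrounding shape is an assumption based on the run-config format used elsewhere in this README:

```yaml
# Hypothetical WebShop override; values are examples only.
env: webshop
max_steps: 50
success_threshold: 0.7
```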
run.py is the canonical entry point. It resolves agent/environment aliases, manages concurrency, streams logs, and emits a structured summary.
```bash
# ALFWorld, single instance
python run.py -a recode -e alfworld -n 1 --split test --profile default

# WebShop, 3 test goals, allow deeper recursion
python run.py -a recode -e webshop -n 3 --split test --profile default --max-depth 12

# ScienceWorld, run 5 instances with 2-way concurrency
python run.py -a recode -e sciworld -n 5 -c 2 --profile gpt-4o
```

Key CLI flags:

- `-a / --agent` – class path or alias (`recode` resolves to `agents.recode.agent.ReCodeAgent`).
- `-e / --env` – environment class or alias (`alfworld`, `webshop`, `sciworld`).
- `-n / --instances` – number of evaluation episodes.
- `-c / --concurrent` – max concurrent episodes (the rich progress UI adapts automatically).
- `--split`, `--seed`, `--max-depth`, `--profile` – forwarded to both agent and environment.
- `-C / --config` – YAML file whose keys override CLI flags; useful for complex sweeps.
Example YAML (`configs/example.yaml`):

```yaml
agent: recode
env: alfworld
instances: 10
concurrent: 2
profile: gpt-4o
split: test
task_types: ["put", "clean"]
max_depth: 12
max_retry: 4
```

Run it with:

```bash
python run.py -C configs/example.yaml
```

- Each run creates `logs/<run_id>/` with:
  - `running_logs/run.log` – aggregated stream of agent + environment logs.
  - `running_logs/instance_<id>.log` – per-instance traces (when multiple instances are launched).
  - `results.json` – structured summary written by `write_summary`, containing per-instance metrics and aggregated statistics (overall + per task type).
- The console prints a condensed summary (success rate, standard metrics, by-task breakdown) after completion.
- Implement the `Env` interface under `envs/<your_env>/env.py`. Use `base.environment.Env` as the contract: implement `reset`, `_run`, `is_done`, `is_success`, and `report`. Return `{"observations": [...], "env_name": <name>, "env": self}` from `reset`.
- Expose prompts and guidance in `agents/recode/resources/`:
  - `prompts/<env_name>/actions.txt` – concise description of valid `run("...")` calls/tools.
  - `fewshots/<env_name>/` – one or more `.txt` examples showing thought→execute patterns.
- If your environment has task types, update `agents/recode/agent.py::_load_resources` and `agents/recode/utils.parse_raw_observation` to parse initial observations correctly.
- Register aliases by adding your class to `ENV_ALIASES` in `run.py` (optional but convenient) and, if needed, add plan-specific logic in the agent utilities.
- Optionally add setup scripts (similar to `envs/webshop/setup.sh`) to document dataset fetching.
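A minimal skeleton following that contract might look like the sketch below. The method names and the `reset` return shape come from the description above; the class body and the `"my_env"` example are placeholders (in the repo, the class would subclass `base.environment.Env`):

```python
# Hypothetical skeleton for envs/my_env/env.py, following the contract above.
class MyEnv:  # in the repository this would subclass base.environment.Env
    def __init__(self, logger=None):
        self.logger = logger
        self.steps = 0
        self.done = False
        self.success = False

    def reset(self, config):
        self.steps = 0
        self.done = False
        self.success = False
        # Return the dict shape the agent expects.
        return {"observations": ["You are in an empty room."],
                "env_name": "my_env", "env": self}

    def _run(self, action):
        # Execute one primitive action and return the new observation.
        self.steps += 1
        if action == 'finish()':
            self.done = True
            self.success = True
        return f"observed result of {action}"

    def is_done(self):
        return self.done

    def is_success(self):
        return self.success

    def report(self):
        return {"success": self.is_success(), "steps": self.steps}
```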
You can embed the agent directly inside your own loop by reusing the provided utilities:
```python
import asyncio

from agents.recode.agent import ReCodeAgent
from envs.alfworld.env import AlfworldEnv


async def solve_once():
    config = {"split": "test", "task_types": ["put"], "max_depth": 10}
    env = AlfworldEnv(logger=None)
    agent = ReCodeAgent()

    init_info = env.reset(config)
    agent.reset(config, init_info)
    observations = init_info["observations"]

    while not env.is_done():
        actions = await agent.act(observations)
        observations = await env.run(actions)

    print(env.report())
    await env.close()


asyncio.run(solve_once())
```

The same pattern works for any `Env` implementation; be sure to pass a logger if you need file-backed traces.
```bibtex
@misc{yu2025recodeunifyplanaction,
  title={ReCode: Unify Plan and Action for Universal Granularity Control},
  author={Zhaoyang Yu and Jiayi Zhang and Huixue Su and Yufan Zhao and Yifan Wu and Mingyi Deng and Jinyu Xiang and Yizhang Lin and Lingxiao Tang and Yingchao Li and Yuyu Luo and Bang Liu and Chenglin Wu},
  year={2025},
  eprint={2510.23564},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.23564},
}
```


