OpenMLRL/CoMLRL


Cooperative Multi-LLM Reinforcement Learning (CoMLRL) is an open-source library for training multiple LLMs to collaborate using Multi-Agent Reinforcement Learning (MARL). It provides implementations of various MARL algorithms for LLM collaboration and support for different environments and benchmarks.

Installation

Install from PyPI

pip install comlrl  # separately, install a PyTorch build compatible with your device

Install from conda-forge

conda install -c conda-forge comlrl  # separately, install a PyTorch build compatible with your device

Install from source

To access the latest features, you can install CoMLRL from source:

git clone https://github.com/OpenMLRL/CoMLRL.git
cd CoMLRL
pip install -e .  # separately, install a PyTorch build compatible with your device
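
The comments above indicate that PyTorch must be installed separately for your device. A couple of illustrative commands, not a CoMLRL requirement; check pytorch.org for the one matching your platform and CUDA version:

pip install torch --index-url https://download.pytorch.org/whl/cpu    # example: CPU-only build
pip install torch --index-url https://download.pytorch.org/whl/cu121  # example: CUDA 12.1 build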

Features

  • MARL trainers to optimize LLM collaboration:

    • Multi-Agent REINFORCE: Critic-free policy gradient methods, including MAREINFORCE, MAGRPO, MARLOO, and MAREMAX. Two joint modes are available (see the sketch after this list):
      • Aligned joint of individual responses with joint_mode='align'.
      • Memory-efficient cross joint with joint_mode='cross'.
    • Multi-Agent Actor-Critic: Critic-based policy gradient methods, including IAC and MAAC.
      • Independent actor-critic (a separate critic or a value head on the LLM backbone).
      • Centralized critic over joint prompts with separate actors.
  • Environments that simulate real-world tasks for training and evaluating LLM collaboration:

    • Writing Collaboration: Multiple LLM agents collaborate on processing articles.
      • TLDR - Summarizing Reddit posts.
      • ArXiv - Expanding abstracts into introductions.
    • Code Generation: Generate code solutions for programming problems.
      • MBPP - Mostly Basic Python Problems.
      • HumanEval - Handwritten evaluation problems.
      • CoopHumanEval - A variant of HumanEval with a cooperative nature.
    • Code Completion: Complete code snippets based on given contexts.
      • ClassEval - Complete class-level code based on method stubs and docstrings.
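The two REINFORCE joint modes above differ in how individual responses are combined into joint responses: joint_mode='align' forms joints from aligned individual responses, while joint_mode='cross' forms memory-efficient cross joints over the agents' responses. A minimal sketch of selecting the mode, assuming joint_mode is passed through MAGRPOConfig (check the documentation for where this option actually lives):

from comlrl.trainers.magrpo import MAGRPOConfig

# Sketch only: placing joint_mode in MAGRPOConfig is an assumption.
config = MAGRPOConfig(
    per_device_train_batch_size=1,
    joint_mode="align",  # aligned individual responses; "cross" for cross joints
)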

Usage

Quick start: train two Qwen2.5 agents to summarize Reddit posts with MAGRPO:

from datasets import load_dataset
from transformers import AutoTokenizer
from comlrl.trainers.magrpo import MAGRPOConfig, MAGRPOTrainer

# Load dataset and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("trl-lib/tldr", split="train").select(range(128))

# Initialize trainer and start training
trainer = MAGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    num_agents=2,
    tokenizer=tokenizer,
    train_dataset=dataset,
    reward_func=lambda a, b: [abs(max(len(b[0]), 1) / max(len(a[0]), 1) - 3.0)],
    formatters=[lambda example: example["prompt"]] * 2,
    args=MAGRPOConfig(
        per_device_train_batch_size=1,
    ),
)
trainer.train()
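
The reward_func here is only a toy based on the agents' relative response lengths; real tasks would supply task-specific rewards. A hypothetical replacement that penalizes deviation from a 3:1 length ratio, with the call shape inferred from the quick start (one list of responses per agent in, a list of scalar rewards out; not an authoritative description of the API):

def length_ratio_reward(agent1_responses, agent2_responses):
    # Hypothetical reward, mirroring the inferred signature of the lambda above:
    # one list of response strings per agent in, a list of scalar rewards out.
    rewards = []
    for first, second in zip(agent1_responses, agent2_responses):
        ratio = max(len(second), 1) / max(len(first), 1)
        rewards.append(-abs(ratio - 3.0))  # smaller deviation from 3:1 is better
    return rewards

Pass it to the trainer as reward_func=length_ratio_reward in place of the lambda.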

Contributing

We welcome contributions from the community! Please see the contributing guidelines for setting up a development environment and submitting changes.

Thanks to the generous help of our contributors:


Shuo Liu: 🤔 🚧 💻 📖
Tianle Chen: 🚧 💻 🐛
Ryan Amiri: 🚧 💻 🐛
Zeyu Liang: 📖 🐛

🤔: Foundational Ideas; 🚧: Maintenance; 💻: Code; 📖: Documentation; 🐛: Bug Report.

Citation

Please cite our paper if you find this library useful in your research:

@misc{liu2025comlrl,
  title={LLM Collaboration With Multi-Agent Reinforcement Learning},
  author={Shuo Liu and Tianle Chen and Zeyu Liang and Xueguang Lyu and Christopher Amato},
  year={2025},
  eprint={2508.04652},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2508.04652},
}