MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

MEAL is the first Continual Multi‑Agent Reinforcement Learning (CMARL) benchmark built around cooperative Overcooked‑style tasks, implemented in JAX for high‑performance training and evaluation. It focuses on learning over long sequences of procedurally generated tasks without catastrophic forgetting, across different team sizes, difficulty levels, and reward settings.

Key Features

JAX/Flax implementation for scalable, accelerated training
Procedurally generated cooperative tasks with adjustable difficulty
Built‑in continual learning regularizers and memory methods
Multi‑agent baselines: IPPO and MAPPO
Results tooling: W&B integration, download utilities, and plotting scripts
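
To make the "continual learning regularizers" item concrete, here is a minimal, framework-free sketch of the quadratic penalty that EWC-style methods add to the training loss. It is an illustration of the general technique, not MEAL's implementation: the function names, the flat-list parameter representation, and the default `lam` are all assumptions made for this example.

```python
# Illustrative sketch only: names and the flat-list parameter layout are
# assumptions, not MEAL's API.

def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values
    anchor_params -- parameter values stored after the previous task
    fisher        -- per-parameter importance weights (diagonal Fisher estimate)
    """
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("params, anchor_params and fisher must have equal length")
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def l2_penalty(params, anchor_params, lam=1.0):
    """Plain L2 anchoring: the same penalty with every importance weight set to 1."""
    return ewc_penalty(params, anchor_params, [1.0] * len(params), lam)


if __name__ == "__main__":
    # Only the first parameter moved, and it has importance 2.0.
    print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=3.0))
```

In a training loop this term would be added to the policy loss for every task after the first, so that parameters with high importance for earlier tasks are pulled back toward their stored values.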

Installation

Requires Python 3.10.

# Create and activate an environment (Conda example)
conda create -n meal python=3.10 -y
conda activate meal

# Install MEAL in editable mode and optional extras
pip install -e .
pip install -e ".[viz]"
pip install -e ".[utils]"

# Optional: GPU acceleration for JAX (pick your CUDA version)
pip install -U "jax[cuda12]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# or
pip install -U "jax[cuda11]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
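
After installing, it can be useful to confirm which backend JAX actually picked up. The short check below is a sketch, not part of MEAL: the helper name `describe_jax_backend` is made up for this example, and it relies only on the public `jax.devices()` and `jax.default_backend()` calls.

```python
def describe_jax_backend():
    """Return a one-line summary of the JAX backend, or a note if JAX is missing."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    devices = [str(d) for d in jax.devices()]
    return f"backend={jax.default_backend()}, devices={devices}"


if __name__ == "__main__":
    print(describe_jax_backend())
```

If the reported backend is `cpu` on a machine with a GPU, the CUDA-enabled JAX wheel is most likely not installed in the active environment.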

Quick Start

The main entry points are:

IPPO: experiments/ippo.py
MAPPO: experiments/mappo.py

Example: IPPO + EWC on generated medium tasks

python -m experiments.ippo \
  --cl-method ewc \
  --seq-length 10 \
  --strategy generate \
  --difficulty medium \
  --num-agents 2 \
  --num-envs 2048 \
  --num-steps 400 \
  --update-epochs 8 \
  --use-wandb true \
  --project MEAL \
  --seed 1

Example: MAPPO + MAS with CNN encoder and 4 agents

python -m experiments.mappo \
  --cl-method mas \
  --use-cnn true \
  --num-agents 4 \
  --seq-length 8 \
  --strategy generate \
  --difficulty hard \
  --use-wandb true \
  --project MEAL \
  --seed 2
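
Continual learning results are usually reported over several seeds. The sketch below builds one command line per seed using only the flags that appear in the examples above. The helpers `build_command` and `seed_sweep` are hypothetical and not shipped with MEAL. The script prints the commands rather than running them, so they can be inspected or handed to a job scheduler.

```python
import shlex


def build_command(module, seed, **flags):
    """Build an argv list for `python -m <module>`; keyword names map to --kebab-case flags."""
    cmd = ["python", "-m", module]
    for name, value in flags.items():
        cmd += [f"--{name.replace('_', '-')}", str(value)]
    cmd += ["--seed", str(seed)]
    return cmd


def seed_sweep(module, seeds, **flags):
    """Return one command per seed, all sharing the same flags."""
    return [build_command(module, seed, **flags) for seed in seeds]


# Flags copied from the IPPO + EWC example above.
IPPO_EWC_FLAGS = dict(
    cl_method="ewc",
    seq_length=10,
    strategy="generate",
    difficulty="medium",
    num_agents=2,
    use_wandb="true",
    project="MEAL",
)

if __name__ == "__main__":
    for cmd in seed_sweep("experiments.ippo", seeds=[1, 2, 3], **IPPO_EWC_FLAGS):
        print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.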

Running Experiments

For details on running experiments, see experiments/README.MD.

Environments

MEAL composes continual learning sequences from generated task layouts. Layouts can be generated at several difficulty levels; the level controls the grid size, the obstacle density, and the severity of non-stationary components. Example layouts:

[Figure: example layouts at Easy, Medium, and Hard difficulty]

More details about MEAL environments can be found in meal/README.MD.
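
As a rough illustration of how difficulty can drive procedural generation, the toy sketch below maps each level to a grid size and an obstacle density and then samples a sequence of layouts. Everything here is an assumption made for the example: the preset values, the function names, and the ASCII representation are not MEAL's, and the sketch neither places Overcooked items, nor checks that a layout is solvable, nor models non-stationary components.

```python
import random

# Hypothetical presets: the numbers are invented for illustration only.
DIFFICULTY = {
    "easy": {"size": 6, "obstacle_density": 0.10},
    "medium": {"size": 8, "obstacle_density": 0.15},
    "hard": {"size": 10, "obstacle_density": 0.25},
}


def generate_layout(difficulty, seed):
    """Return a square ASCII grid: '#' is a wall or obstacle, ' ' is free floor."""
    cfg = DIFFICULTY[difficulty]
    rng = random.Random(seed)  # seeded so the same seed always gives the same layout
    size = cfg["size"]
    # Start with a walled border and an empty interior.
    grid = [
        ["#" if r in (0, size - 1) or c in (0, size - 1) else " " for c in range(size)]
        for r in range(size)
    ]
    interior = [(r, c) for r in range(1, size - 1) for c in range(1, size - 1)]
    n_obstacles = int(len(interior) * cfg["obstacle_density"])
    # Scatter obstacles over distinct interior cells.
    for r, c in rng.sample(interior, n_obstacles):
        grid[r][c] = "#"
    return ["".join(row) for row in grid]


def generate_sequence(difficulty, seq_length, base_seed=0):
    """Build a task sequence by generating one layout per consecutive seed."""
    return [generate_layout(difficulty, base_seed + i) for i in range(seq_length)]


if __name__ == "__main__":
    print("\n".join(generate_layout("medium", seed=0)))
```

A real generator would additionally place the task-relevant stations and verify that every agent can reach them before accepting a layout.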

Project Structure

experiments/
- ippo.py, mappo.py: training entry points
- continual/: implementations of EWC, MAS, L2, FT, AGEM
- results/: W&B downloaders and plotting scripts
meal/
- env/: layouts and utilities
- wrappers/: logging and environment wrappers
- visualization/: rendering utilities
tests/: smoke tests and image comparisons

Acknowledgments

The Overcooked environment is based on JaxMARL.
Our experiments were tracked with Weights & Biases (W&B).

Citation

If you use our work in your research, please cite it as follows:

@article{tomilin2025meal,
  title={MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning},
  author={Tomilin, Tristan and van den Boogaard, Luka and Garcin, Samuel and Ruhdorfer, Constantin and Grooten, Bram and Bulling, Andreas and Pechenizkiy, Mykola and Fang, Meng},
  journal={arXiv preprint arXiv:2406.01234},
  year={2025}
}
