19  Week 5 Deliverable – MARL-QMIX-Warehouse-Robots

Author

Dre Simmons

20 Week 5 Summary and Deliverable Context

Week 5 required the preparation of code documentation for the MARL-QMIX-Warehouse-Robots project. This deliverable involves clearly documenting the structure, dependencies, installation process, training pipeline, and functional behavior of the codebase so that other developers and researchers can understand, reproduce, and extend the project.

In support of that objective, this document provides:

  • A technical overview of the project architecture
  • Documentation of all required installations (Unity, ML-Agents 4.0, Sacred, Python dependencies)
  • Troubleshooting guidance for installation and runtime failures
  • Explanation of how the training code interacts with Unity
  • Descriptions of major configuration files, network components, and environment wrappers
  • A breakdown of experimental results and diagnostics
  • A reflection section highlighting issues, limitations, and next steps

This QMD therefore serves as both the Week 5 code documentation deliverable and a detailed technical report summarizing the current state of the project.


21 Project Overview

The MARL-QMIX-Warehouse-Robots project focuses on multi-agent reinforcement learning (MARL) within a cooperative warehouse environment built in Unity. The system trains multiple autonomous robots using the QMIX value-factorization algorithm. The project integrates Unity ML-Agents for simulation and EPyMARL for multi-agent learning.

21.0.1 Key Features

  • Centralized training with decentralized execution
  • QMIX mixing network for cooperative MARL
  • Unity ML-Agents 4.0 for simulation and environment communication
  • Procedurally generated warehouse layouts
  • Discrete grid-based navigation and agent action space
  • Sacred logging for reproducible experiments

22 Repository Structure

MARL-QMIX-Warehouse-Robots/
├── epymarl/
│   ├── src/
│   │   ├── config/algs/
│   │   │   └── qmix_warehouse_improved.yaml
│   │   ├── envs/
│   │   │   ├── unity_wrapper.py
│   │   │   └── warehouse_env.py
│   │   ├── learners/
│   │   ├── modules/
│   │   └── main.py
│   ├── requirements.txt
│   └── env_requirements.txt
├── WarehouseProjectURP/
│   ├── Assets/Scenes/Warehouse.unity
│   ├── Assets/Scripts/
│   └── Packages/
├── com.unity.robotics.warehouse.base/
└── com.unity.robotics.warehouse.urp/

23 Installing Unity, ML-Agents 4.0, and Sacred

23.1 1. Installing Unity

Unity is required to execute the warehouse simulation and connect it to the EPyMARL training pipeline.

23.1.1 1.1 Install Unity Hub

  1. Navigate to: https://unity.com/download
  2. Download and install Unity Hub
  3. Launch Unity Hub

23.1.2 1.2 Install a Compatible Unity Editor Version

The project supports Unity 2021.1+, with best results on Unity 6 (formerly Unity 2023 LTS).

Install via:

  1. Unity Hub → Installs → Add
  2. Select Unity 2021.1 / 2021.3 LTS / Unity 6
  3. Add modules:
     • Windows/macOS Build Support
     • Linux Build Support (optional)
     • IL2CPP
     • Documentation (optional)

23.1.3 1.3 Add the Warehouse Project

  1. Unity Hub → Projects → Add
  2. Select: WarehouseProjectURP/
  3. Open the project and allow assets to import and compile


23.2 2. Installing Unity ML-Agents 4.0

23.2.1 2.1 Unity-side Installation

Inside Unity:

  1. Window → Package Manager
  2. Verify installation of:
     • ML Agents
     • Barracuda
     • Input System (optional)

23.2.2 2.2 Python-side Installation

ML-Agents 4.0 corresponds to Python package version 0.30.0:

pip install mlagents==0.30.0
pip install mlagents-envs==0.30.0

23.2.3 2.3 Communication Port Verification

Unity must show:

[INFO] Listening on port 5004

Python must show:

UnityEnvironment initialized

23.3 3. Installing Sacred

Sacred provides experiment tracking and reproducible logging.

23.3.1 3.1 Install Sacred

pip install sacred==0.8.7

23.3.2 3.2 Sacred Logging Directory

Logs are stored in:

epymarl/results/sacred/

23.3.3 3.3 Test Integration

Run:

python src/main.py --config=qmix_warehouse_improved --env-config=unity_warehouse with t_max=1000

Expected:

INFO - qmix - Started run with ID "1"

24 Python Environment Setup

24.0.1 Create Virtual Environment

python3 -m venv epymarl_env
source epymarl_env/bin/activate

24.0.2 Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt
pip install -r env_requirements.txt

Key dependencies include:

  • torch
  • sacred
  • numpy
  • pyyaml
  • mlagents
  • mlagents-envs


25 Troubleshooting Installation and Runtime Issues

The installation of Unity ML-Agents 4.0 and Sacred can occasionally produce environment conflicts, import errors, or version mismatches. The following guidance summarizes common failure cases and corrective actions.

25.1 1. ML-Agents Installation Issues

25.1.1 1.1 Version Mismatch Errors

Symptoms:

  • ImportError: No module named 'mlagents'
  • Unity editor communication failures
  • Training script hanging while waiting on the environment

Corrective Action:

pip uninstall mlagents mlagents-envs -y
pip install mlagents==0.30.0
pip install mlagents-envs==0.30.0

25.1.2 1.2 Port Not Found / Environment Not Connecting

Symptoms:

  • Python timeout errors
  • Unity does not display “Listening on port 5004”

Corrective Steps:

  1. Close all Unity instances
  2. Reopen the Warehouse project
  3. Press Play and confirm port 5004
  4. Start the Python script only after Unity is listening

To free a blocked port, identify and kill the process holding it:

sudo lsof -i :5004
kill -9 <PID>
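
Before launching training, the port check above can also be done from Python with the standard library, which avoids platform differences in lsof. This is a small sketch, not part of the project codebase; the function name `port_is_listening` is illustrative:

```python
import socket

def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0

# Check the default ML-Agents editor port before launching training
unity_ready = port_is_listening(5004)
```

If this returns False while Unity is in Play mode, the editor is likely bound to a different port or blocked by a firewall.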

25.1.3 1.3 Barracuda Inference Errors

Symptoms:
Unity error messages referencing the inference engine or model loading

Corrective Action:
Reinstall Barracuda in Unity’s Package Manager with a matching version.


25.2 2. Sacred Installation Issues

25.2.1 2.1 Sacred Fails to Import

Corrective Action:

pip uninstall sacred -y
pip install sacred==0.8.7

25.2.2 2.2 Sacred Logging Errors

Symptoms:

  • KeyError: 'config'
  • Missing or incomplete run directories

Corrective Steps:

python src/main.py --config=qmix_warehouse_improved --env-config=unity_warehouse

Ensure directory exists:

epymarl/results/sacred/

Fix permissions if needed:

sudo chmod -R 777 epymarl/results

25.3 3. Virtual Environment Conflicts

Symptoms:

  • Packages installed but not detected
  • pip points to the system Python

Check paths:

which python
which pip

Expected:

.../epymarl_env/bin/python

If incorrect:

source epymarl_env/bin/activate

25.4 4. Unity or Python Training Freezing

Corrective Measures:

  • Use a Unity standalone build for long runs
  • Restart Unity after long sessions
  • Enable checkpointing (save_model_interval: 100000)


25.5 5. Missing Dependencies

Corrective Action:

pip install -r requirements.txt
pip install -r env_requirements.txt

26 Training Instructions

26.0.1 Start Python Training

python src/main.py --config=qmix_warehouse_improved --env-config=unity_warehouse with t_max=500000

26.0.2 Start Unity Simulation

  • Open the Warehouse scene
  • Press Play

26.0.3 Monitor Logs

Example:

return_mean: 25.42
epsilon: 0.95

27 Configuration Documentation

Configuration file:

epymarl/src/config/algs/qmix_warehouse_improved.yaml

27.0.1 Hyperparameters

lr: 0.001
batch_size: 16
buffer_size: 5000
target_update_interval: 200
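
To illustrate what buffer_size and batch_size control, here is a minimal replay-buffer sketch. It is not EPyMARL's actual buffer (which stores whole padded episodes); it stores single transitions purely to show the FIFO eviction and sampling semantics:

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO transition buffer illustrating buffer_size / batch_size.

    EPyMARL's real buffer stores whole padded episodes; this sketch
    stores single transitions only to show the sampling semantics.
    """

    def __init__(self, capacity: int = 5000):
        self.storage = deque(maxlen=capacity)  # oldest entries evicted first

    def add(self, transition) -> None:
        self.storage.append(transition)

    def can_sample(self, batch_size: int) -> bool:
        return len(self.storage) >= batch_size

    def sample(self, batch_size: int = 16):
        return random.sample(list(self.storage), batch_size)

buffer = ReplayBuffer(capacity=5000)
for step in range(6000):          # more transitions than capacity
    buffer.add((step, "obs", "action", 0.0))
batch = buffer.sample(16)
```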

27.0.2 Exploration

epsilon_start: 1.0
epsilon_finish: 0.1
epsilon_anneal_time: 200000
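
These three values define a linear annealing schedule, which can be sketched as a single function (the name `epsilon_at` is illustrative, not EPyMARL's API). Note the resulting values match the epsilon column in the learning curve below:

```python
def epsilon_at(t: int,
               epsilon_start: float = 1.0,
               epsilon_finish: float = 0.1,
               epsilon_anneal_time: int = 200_000) -> float:
    """Linear schedule from epsilon_start to epsilon_finish over anneal_time steps."""
    frac = min(t / epsilon_anneal_time, 1.0)
    return epsilon_start + frac * (epsilon_finish - epsilon_start)

# Epsilon stays at epsilon_finish once annealing completes
schedule = [round(epsilon_at(t), 3) for t in (0, 100_000, 200_000, 350_000)]
# → [1.0, 0.55, 0.1, 0.1]
```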

27.0.3 Neural Architecture

agent: "rnn"
rnn_hidden_dim: 64
mixer: "qmix"
mixing_embed_dim: 32
hypernet_layers: 2
hypernet_embed: 64
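
To make the mixer settings concrete, the following toy sketch shows QMIX's monotonic mixing step in pure Python. In the real mixer the weights W1, b1, w2, b2 are generated from the global state by the hypernetworks; here they are fixed toy values, and the embedding dimension is 3 instead of the configured 32:

```python
import math

def elu(x: float) -> float:
    """Exponential linear unit, the activation used in the QMIX mixer."""
    return x if x > 0 else math.exp(x) - 1.0

def mix(agent_qs, W1, b1, w2, b2):
    """Combine per-agent Q-values into Q_tot.

    abs() on the weights enforces the monotonicity constraint
    dQ_tot/dQ_i >= 0, so each agent's greedy action also maximizes Q_tot.
    """
    hidden = [
        elu(sum(abs(W1[j][i]) * q for i, q in enumerate(agent_qs)) + b1[j])
        for j in range(len(b1))
    ]
    return sum(abs(w2[j]) * h for j, h in enumerate(hidden)) + b2

# Toy weights for 2 agents and embedding dim 3; the real hypernetworks
# produce these from the global state.
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b1 = [0.0, 0.1, -0.1]
w2 = [1.0, -0.5, 0.2]
b2 = 0.05
q_tot = mix([1.2, 0.7], W1, b1, w2, b2)
```

Because every weight passes through abs(), raising any single agent's Q-value can never lower Q_tot, which is the monotonicity property QMIX relies on for decentralized greedy execution.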

28 Environment Documentation

28.0.1 Action Space

  • 0 — Turn Left
  • 1 — Turn Right
  • 2 — Move Forward
  • 3 — Load/Unload Shelf
  • 4 — No-op
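
The integer coding above can be mirrored as a small enum for readability in analysis scripts; the class and member names here are illustrative, not identifiers from the codebase:

```python
from enum import IntEnum

class WarehouseAction(IntEnum):
    """The five discrete actions listed above (names are illustrative)."""
    TURN_LEFT = 0
    TURN_RIGHT = 1
    MOVE_FORWARD = 2
    LOAD_UNLOAD = 3
    NO_OP = 4

# A policy emits one such integer per agent per step
action = WarehouseAction(2)
```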

28.0.2 Reward Structure

  • Positive reward for shelf delivery
  • Collision penalty
  • Small step penalty
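
A sketch of how these three terms compose per step is below. The magnitudes are illustrative placeholders only; the actual reward values are defined in the Unity environment scripts and are not documented here:

```python
def step_reward(delivered: bool, collided: bool,
                delivery_reward: float = 1.0,
                collision_penalty: float = -0.5,
                step_penalty: float = -0.01) -> float:
    """Compose the three reward terms described above.

    The magnitudes are illustrative placeholders; the real values are
    set in the Unity environment scripts.
    """
    reward = step_penalty            # every step costs a little
    if delivered:
        reward += delivery_reward    # shelf delivered to its goal
    if collided:
        reward += collision_penalty  # bumped another robot or an obstacle
    return reward
```

The step penalty makes the reward signal dense in time but uninformative about task progress, which is relevant to the sparse-reward hypothesis discussed later.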

29 Training Results (Week 5)

29.0.1 Training Run #93

Metric                  Value
---------------------   -----------------
Final Training Return   207.96
Final Test Return       49.29
Steps Completed         350,199 / 500,000
Final Q-value           2.398
Final Epsilon           0.10

29.0.2 Learning Curve

Steps   Return   Test Return   Epsilon
-----   ------   -----------   -------
10k     13.6     0.03          0.95
100k    50.6     0.05          0.55
200k    156.8    0.03          0.10
300k    228.4    0.08          0.10
350k    207.96   49.29         0.10

30 Critical Observations

  • Agents perform well under exploration (ε = 0.1–1.0).
  • Behavior collapses under pure greedy policy execution (ε = 0).
  • The large exploration–evaluation gap indicates reliance on random behavior, not policy learning.

31 Hypothesized Causes

  • Sparse rewards
  • Insufficient observations
  • QMIX monotonic mixing limitation
  • Early epsilon annealing
  • High environmental complexity

32 Future Directions

  • Reward shaping
  • Curriculum learning
  • Expanded observation space
  • Curiosity-driven exploration
  • Alternative MARL algorithms (VDN, Weighted QMIX)

33 Reflection (Week 5)

Week 5 focused on documenting the MARL-QMIX-Warehouse-Robots codebase and analyzing system behavior during extended training. The documented divergence between training and evaluation performance highlights the difficulty of value-based coordination in sparse-reward, multi-agent environments. The code documentation presented here clarifies the structure of the repository, configuration files, installation procedures, dependency management, and operational workflow required to run the MARL system. It also identifies key limitations and areas that require modification—such as reward design, observation space, and exploration strategy—to enable successful policy learning in future iterations.


34 References

  1. Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
  2. Unity Technologies. (2023). Unity ML-Agents Toolkit Documentation.
  3. Samvelyan, M. et al. (2019). The StarCraft Multi-Agent Challenge. AAMAS.
  4. Vinyals, O. et al. (2019). Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning. Nature.
  5. EPyMARL Documentation (2025). https://github.com/uoe-agents/epymarl