19 Week 5 Deliverable – MARL-QMIX-Warehouse-Robots
20 Week 5 Summary and Deliverable Context
Week 5 required the preparation of code documentation for the MARL-QMIX-Warehouse-Robots project. This deliverable involves clearly documenting the structure, dependencies, installation process, training pipeline, and functional behavior of the codebase so that other developers and researchers can understand, reproduce, and extend the project.
In support of that objective, this document provides:
- A technical overview of the project architecture
- Documentation of all required installations (Unity, ML-Agents 4.0, Sacred, Python dependencies)
- Troubleshooting guidance for installation and runtime failures
- Explanation of how the training code interacts with Unity
- Descriptions of major configuration files, network components, and environment wrappers
- A breakdown of experimental results and diagnostics
- A reflection section highlighting issues, limitations, and next steps
This QMD therefore serves as both the Week 5 code documentation deliverable and a detailed technical report summarizing the current state of the project.
21 Project Overview
The MARL-QMIX-Warehouse-Robots project focuses on multi-agent reinforcement learning (MARL) within a cooperative warehouse environment built in Unity. The system trains multiple autonomous robots using the QMIX value-factorization algorithm. The project integrates Unity ML-Agents for simulation and EPyMARL for multi-agent learning.
21.0.1 Key Features
- Centralized training with decentralized execution
- QMIX mixing network for cooperative MARL (sketched below)
- Unity ML-Agents 4.0 for simulation and environment communication
- Procedurally generated warehouse layouts
- Discrete grid-based navigation and agent action space
- Sacred logging for reproducible experiments
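
To ground the first two features: QMIX combines per-agent Q-values through a state-conditioned mixing network whose weights are constrained to be non-negative, so the joint value is monotonic in each agent's contribution. The following is a minimal PyTorch sketch of the standard QMIX mixer (Rashid et al., 2018); the project's actual module lives under epymarl/src/modules/ and may differ in detail.

```python
# Minimal sketch of a QMIX-style monotonic mixing network (illustrative;
# the project's actual mixer lives under epymarl/src/modules/).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks generate the mixing weights from the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        qs = agent_qs.view(bs, 1, self.n_agents)
        # abs() keeps the weights non-negative, enforcing monotonicity:
        # dQ_tot / dQ_i >= 0 for every agent i.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(qs, w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)
```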
22 Repository Structure
```
MARL-QMIX-Warehouse-Robots/
├── epymarl/
│   ├── src/
│   │   ├── config/algs/
│   │   │   └── qmix_warehouse_improved.yaml
│   │   ├── envs/
│   │   │   ├── unity_wrapper.py
│   │   │   └── warehouse_env.py
│   │   ├── learners/
│   │   ├── modules/
│   │   └── main.py
│   ├── requirements.txt
│   └── env_requirements.txt
├── WarehouseProjectURP/
│   ├── Assets/Scenes/Warehouse.unity
│   ├── Assets/Scripts/
│   └── Packages/
├── com.unity.robotics.warehouse.base/
└── com.unity.robotics.warehouse.urp/
```
23 Installing Unity, ML-Agents 4.0, and Sacred
23.1 1. Installing Unity
Unity is required to execute the warehouse simulation and connect it to the EPyMARL training pipeline.
23.1.1 1.1 Install Unity Hub
- Navigate to: https://unity.com/download
- Download and install Unity Hub
- Launch Unity Hub
23.1.2 1.2 Install a Compatible Unity Editor Version
The project supports Unity 2021.1+, with best results on Unity 6 (2023 LTS).
Install via:
1. Unity Hub → Installs → Add
2. Select Unity 2021.1 / 2021.3 LTS / Unity 6
3. Add modules:
   - Windows/macOS Build Support
   - Linux Build Support (optional)
   - IL2CPP
   - Documentation (optional)
23.1.3 1.3 Add the Warehouse Project
Unity Hub → Projects → Add

Select WarehouseProjectURP/, then open the project and allow time for asset compilation.
23.2 2. Installing Unity ML-Agents 4.0
23.2.1 2.1 Unity-side Installation
Inside Unity:
1. Window → Package Manager
2. Verify installation of:
   - ML Agents
   - Barracuda
   - Input System (optional)
23.2.2 2.2 Python-side Installation
ML-Agents 4.0 corresponds to Python package version 0.30.0:
```bash
pip install mlagents==0.30.0
pip install mlagents-envs==0.30.0
```

23.2.3 2.3 Communication Port Verification
Unity must show:

```
[INFO] Listening on port 5004
```

Python must show:

```
UnityEnvironment initialized
```
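
A quick Python-side connectivity check can confirm both messages (a sketch assuming mlagents-envs 0.30.0, with the Unity editor already in Play mode):

```python
# Quick connectivity smoke test (assumes mlagents-envs 0.30.0 and the Unity
# editor already in Play mode; file_name=None attaches to the editor on 5004).
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)
env.reset()
print("Behavior names:", list(env.behavior_specs.keys()))
env.close()
```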
23.3 3. Installing Sacred
Sacred provides experiment tracking and reproducible logging.
23.3.1 3.1 Install Sacred
```bash
pip install sacred==0.8.7
```

23.3.2 3.2 Sacred Logging Directory
Logs are stored in:
```
epymarl/results/sacred/
```
23.3.3 3.3 Test Integration
Run:

```bash
python src/main.py --config=qmix_warehouse_improved --env-config=unity_warehouse with t_max=1000
```

Expected:

```
INFO - qmix - Started run with ID "1"
```
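
If this integration test fails, Sacred itself can be checked in isolation with a minimal standalone experiment (a sketch; the file name sacred_check.py and the experiment name "demo" are hypothetical, not part of the repo):

```python
# Minimal standalone Sacred experiment for isolating Sacred from the MARL
# stack (hypothetical file sacred_check.py; "demo" is an invented name).
from sacred import Experiment
from sacred.observers import FileStorageObserver

ex = Experiment("demo")
ex.observers.append(FileStorageObserver("results/sacred"))

@ex.config
def cfg():
    t_max = 1000  # override from the CLI: python sacred_check.py with t_max=500

@ex.automain
def main(t_max):
    print(f"running for {t_max} steps")
```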
24 Python Environment Setup
24.0.1 Create Virtual Environment
```bash
python3 -m venv epymarl_env
source epymarl_env/bin/activate
```

24.0.2 Install Dependencies
```bash
pip install --upgrade pip
pip install -r requirements.txt
pip install -r env_requirements.txt
```

Key dependencies include:
- torch
- sacred
- numpy
- pyyaml
- mlagents
- mlagents-envs
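
A quick import check (a sketch; the exact versions are whatever the requirements files pin) confirms the key packages resolve inside epymarl_env:

```python
# Sanity check that the key dependencies import from epymarl_env and report
# their versions (exact pins come from requirements.txt / env_requirements.txt).
import numpy
import sacred
import torch
import yaml
import mlagents_envs

for mod in (torch, sacred, numpy, yaml, mlagents_envs):
    print(mod.__name__, getattr(mod, "__version__", "unknown"))
```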
25 Troubleshooting Installation and Runtime Issues
The installation of Unity ML-Agents 4.0 and Sacred can occasionally produce environment conflicts, import errors, or version mismatches. The following guidance summarizes common failure cases and corrective actions.
25.1 1. ML-Agents Installation Issues
25.1.1 1.1 Version Mismatch Errors
Symptoms:
- ImportError: No module named 'mlagents'
- Unity editor communication failures
- Training script hanging while waiting on the environment
Corrective Action:
```bash
pip uninstall mlagents mlagents-envs -y
pip install mlagents==0.30.0
pip install mlagents-envs==0.30.0
```

25.1.2 1.2 Port Not Found / Environment Not Connecting
Symptoms:
- Python timeout errors
- Unity does not display “Listening on port 5004”

Corrective Steps:
1. Close all Unity instances
2. Reopen the Warehouse project
3. Press Play and confirm port 5004 is reported
4. Start the Python script only after Unity is listening
To kill a blocked port:
```bash
sudo lsof -i :5004
kill -9 <PID>
```
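
Before relaunching, a small helper (hypothetical, pure standard library, not part of the repo) can confirm whether the port has actually been freed:

```python
# Hypothetical stdlib helper (not part of the repo): report whether anything
# is still bound to the ML-Agents port before relaunching Unity or training.
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

print("port 5004 busy:", port_in_use(5004))
```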
25.1.3 1.3 Barracuda Inference Errors

Symptoms:
- Unity error messages referencing the inference engine or model loading
Corrective Action:
Reinstall Barracuda in Unity’s Package Manager with a matching version.
25.2 2. Sacred Installation Issues
25.2.1 2.1 Sacred Fails to Import
Corrective Action:
```bash
pip uninstall sacred -y
pip install sacred==0.8.7
```

25.2.2 2.2 Sacred Logging Errors
Symptoms:
- KeyError: 'config'
- Missing or incomplete run directories

Corrective Steps:

```bash
python src/main.py --config=qmix_warehouse_improved --env-config=unity_warehouse
```

Ensure the logging directory exists:

```
epymarl/results/sacred/
```

Fix permissions if needed:

```bash
sudo chmod -R 777 epymarl/results
```

25.3 3. Virtual Environment Conflicts
Symptoms:
- Packages installed but not detected
- pip points to the system Python

Check paths:

```bash
which python
which pip
```

Expected:

```
.../epymarl_env/bin/python
```

If incorrect:

```bash
source epymarl_env/bin/activate
```

25.4 4. Unity or Python Training Freezing
Corrective Measures:
- Use a Unity standalone build for long runs
- Restart Unity after long sessions
- Enable checkpointing (save_model_interval: 100000)
25.5 5. Missing Dependencies
Corrective Action:
```bash
pip install -r requirements.txt
pip install -r env_requirements.txt
```

26 Training Instructions
26.0.1 Start Python Training
```bash
python src/main.py --config=qmix_warehouse_improved --env-config=unity_warehouse with t_max=500000
```

26.0.2 Start Unity Simulation
- Open the Warehouse scene
- Press Play
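
Behind these two steps, the EPyMARL side drives Unity through the mlagents-envs low-level API. The loop below is a hand-rolled illustration of that interaction (a sketch only; the project's unity_wrapper.py encapsulates this logic):

```python
# Minimal hand-rolled Unity interaction loop using the mlagents-envs low-level
# API (illustrative; the project's unity_wrapper.py encapsulates this logic).
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

env = UnityEnvironment(file_name=None)  # attach to the editor in Play mode
env.reset()
behavior = list(env.behavior_specs.keys())[0]

for _ in range(10):  # a few random steps, just to exercise the channel
    decision_steps, terminal_steps = env.get_steps(behavior)
    n = len(decision_steps)
    # One discrete action branch; 5 actions, as documented in section 28.
    acts = ActionTuple(discrete=np.random.randint(0, 5, size=(n, 1)).astype(np.int32))
    env.set_actions(behavior, acts)
    env.step()

env.close()
```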
26.0.3 Monitor Logs
Example:

```
return_mean: 25.42
epsilon: 0.95
```
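
These metrics are also persisted by Sacred's file observer; the sketch below reads the return curve back from a run directory (assuming the default metrics.json layout and run ID 1):

```python
# Sketch: read the return_mean curve from a Sacred run directory (assumes the
# default FileStorageObserver metrics.json layout and run ID 1).
import json

with open("epymarl/results/sacred/1/metrics.json") as f:
    metrics = json.load(f)

rm = metrics.get("return_mean", {})
for step, value in zip(rm.get("steps", []), rm.get("values", [])):
    print(step, value)
```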
27 Configuration Documentation
Configuration file:
```
epymarl/src/config/algs/qmix_warehouse_improved.yaml
```
27.0.1 Hyperparameters
```yaml
lr: 0.001
batch_size: 16
buffer_size: 5000
target_update_interval: 200
```

27.0.2 Exploration
```yaml
epsilon_start: 1.0
epsilon_finish: 0.1
epsilon_anneal_time: 200000
```
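
These three values define a linear schedule. The sketch below reproduces it and matches the epsilon column of the learning-curve table in section 29:

```python
# Linear epsilon schedule implied by the config above (epsilon_start=1.0,
# epsilon_finish=0.1, epsilon_anneal_time=200000).
def epsilon_at(t: int, start: float = 1.0, finish: float = 0.1,
               anneal_time: int = 200_000) -> float:
    frac = min(t / anneal_time, 1.0)
    return start + frac * (finish - start)

print(epsilon_at(10_000))   # 0.955, matching the ~0.95 at 10k steps
print(epsilon_at(100_000))  # 0.55 at 100k steps
print(epsilon_at(200_000))  # 0.10 floor from 200k onward
```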
27.0.3 Neural Architecture

```yaml
agent: "rnn"
rnn_hidden_dim: 64
mixer: "qmix"
mixing_embed_dim: 32
hypernet_layers: 2
hypernet_embed: 64
```
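
Because pyyaml is already a dependency, the configuration can also be inspected programmatically (a sketch using the path from the repository structure above):

```python
# Sketch: load and inspect the algorithm config with PyYAML (path as listed
# in the repository structure above).
import yaml

with open("epymarl/src/config/algs/qmix_warehouse_improved.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["lr"], cfg["batch_size"], cfg["mixer"], cfg["mixing_embed_dim"])
```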
28 Environment Documentation

28.0.1 Action Space
- 0 — Turn Left
- 1 — Turn Right
- 2 — Move Forward
- 3 — Load/Unload Shelf
- 4 — No-op
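
For readability in analysis scripts, these integer codes can be wrapped in an enum (illustrative; the environment itself consumes the raw integers):

```python
# Illustrative IntEnum mirroring the discrete action codes listed above
# (the environment itself consumes the raw integers).
from enum import IntEnum

class WarehouseAction(IntEnum):
    TURN_LEFT = 0
    TURN_RIGHT = 1
    MOVE_FORWARD = 2
    LOAD_UNLOAD_SHELF = 3
    NO_OP = 4
```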
28.0.2 Reward Structure
- Positive reward for shelf delivery
- Collision penalty
- Small step penalty
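
To make this structure concrete, a hypothetical encoding is shown below; the actual magnitudes are defined in the Unity-side scripts and are not documented here.

```python
# Hypothetical reward constants (magnitudes invented for illustration; the
# real values are defined in the Unity-side scripts under Assets/Scripts/).
DELIVERY_REWARD = 1.0     # positive reward for shelf delivery
COLLISION_PENALTY = -0.5  # collision penalty
STEP_PENALTY = -0.01      # small per-step penalty
```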
29 Training Results (Week 5)
29.0.1 Training Run #93
| Metric | Value |
|---|---|
| Final Training Return | 207.96 |
| Final Test Return | 49.29 |
| Steps Completed | 350,199 / 500,000 |
| Final Q-value | 2.398 |
| Final Epsilon | 0.10 |
29.0.2 Learning Curve
| Steps | Return | Test Return | Epsilon |
|---|---|---|---|
| 10k | 13.6 | 0.03 | 0.95 |
| 100k | 50.6 | 0.05 | 0.55 |
| 200k | 156.8 | 0.03 | 0.10 |
| 300k | 228.4 | 0.08 | 0.10 |
| 350k | 207.96 | 49.29 | 0.10 |
30 Critical Observations
- Agents perform well under exploration (ε = 0.1–1.0).
- Behavior collapses under pure greedy policy execution (ε = 0).
- The large exploration–evaluation gap indicates reliance on random behavior, not policy learning.
31 Hypothesized Causes
- Sparse rewards
- Insufficient observations
- QMIX monotonic mixing limitation
- Early epsilon annealing
- High environmental complexity
32 Future Directions
- Reward shaping
- Curriculum learning
- Expanded observation space
- Curiosity-driven exploration
- Alternative MARL algorithms (VDN, Weighted QMIX)
33 Reflection (Week 5)
Week 5 focused on documenting the MARL-QMIX-Warehouse-Robots codebase and analyzing system behavior during extended training. The documented divergence between training and evaluation performance highlights the difficulty of value-based coordination in sparse-reward, multi-agent environments. The code documentation presented here clarifies the structure of the repository, configuration files, installation procedures, dependency management, and operational workflow required to run the MARL system. It also identifies key limitations and areas that require modification—such as reward design, observation space, and exploration strategy—to enable successful policy learning in future iterations.
34 References
- Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
- Unity Technologies. (2023). Unity ML-Agents Toolkit Documentation.
- Samvelyan, M. et al. (2019). The StarCraft Multi-Agent Challenge. AAMAS.
- Vinyals, O. et al. (2019). Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning. Nature.
- EPyMARL Documentation (2025). https://github.com/uoe-agents/epymarl