Week 3 Deliverable – Scaling Analysis & Strategic QMIX Focus
1. Weekly Objectives
This week’s activities focused on:
- Compiling scaling analysis results from team members
- Documenting the strategic decision to focus on QMIX
- Updating research paper methodology sections
- Analyzing algorithm performance across different configurations
2. Strategic Decision: QMIX Focus
2.1 Decision Rationale
After two weeks of parallel algorithm development, the team made a strategic decision to focus primarily on QMIX for the remainder of the project. This decision was based on:
- Strongest Empirical Performance: QMIX achieved the highest test returns on RWARE
- Value Decomposition Advantages: Mixing network effectively handles credit assignment
- Scalability Potential: Architecture designed for varying agent counts
- Industry Relevance: Value-based methods common in warehouse robotics
2.2 Algorithm Comparison Summary
| Algorithm | RWARE Performance (mean test return) | Tuning Difficulty | Scalability |
|---|---|---|---|
| QMIX | Best (3.25) | High | Excellent |
| IPPO-LSTM | Good (2.8) | Moderate | Good |
| MASAC | Moderate | Moderate | Good |
2.3 Supporting Research on Algorithm Selection
QMIX’s suitability for warehouse robotics aligns with literature findings:
Value decomposition methods like QMIX excel in fully cooperative settings where credit assignment is challenging. The monotonicity constraint ensures that individual agent improvements translate to team-level gains—a critical property for warehouse coordination where all robots must work toward shared objectives.
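To make the monotonicity constraint concrete, here is a minimal PyTorch sketch of a QMIX-style mixing network (dimensions and names are illustrative, not the project's training code, and PyTorch itself is an assumption here). Taking the absolute value of the hypernetwork outputs is what enforces dQ_tot/dQ_i >= 0:

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Minimal QMIX-style mixer: combines per-agent Q-values into Q_tot.

    Hypernetworks conditioned on the global state produce the mixing
    weights; taking their absolute value keeps them non-negative, which
    is what guarantees the monotonicity property described above.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # Q_tot
```

Because the weights (but not the biases) are constrained, the mixer can still represent rich state-dependent combinations while preserving the property that improving any individual agent's Q-value never decreases Q_tot.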
3. Scaling Analysis Results
3.1 Agent Count Scaling
Compiled results from scaling experiments:
| Configuration | Agents | Test Return | Training Steps |
|---|---|---|---|
| tiny-2ag-v2 | 2 | 3.25 | 20M |
| small-4ag-v1 | 4 | 2.10 | 30M |
| medium-6ag-v1 | 6 | 1.45 | 40M |
Observations:
- Performance degrades with increased agent count (expected)
- Training time increases super-linearly with agent count
- Coordination complexity grows combinatorially
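The three configurations in the table above can be instantiated directly from the rware package. A minimal sketch, assuming a gymnasium-era rware release (the exact env IDs and version suffixes depend on the installed version):

```python
# Sketch: instantiating the scaling configurations with rware (pip install rware).
# Env IDs mirror the table above; version suffixes may differ across releases.
import gymnasium as gym
import rware  # noqa: F401  (importing registers the rware-* environments)

for env_id in ["rware-tiny-2ag-v2", "rware-small-4ag-v1", "rware-medium-6ag-v1"]:
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)  # obs is a tuple with one entry per agent
    print(f"{env_id}: {len(obs)} agents, action space {env.action_space}")
    env.close()
```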
3.2 Environment Size Scaling
| Environment | Grid Size | Agents | Difficulty |
|---|---|---|---|
| tiny | 5×10 | 2 | Low |
| small | 7×15 | 4 | Medium |
| medium | 10×20 | 6 | High |
| hard | 15×30 | 8+ | Very High |
3.3 Scaling Insights for Research Paper
Added to Discussion section:
Our scaling analysis reveals significant challenges in applying MARL algorithms to larger warehouse configurations. While QMIX demonstrates robust performance on small-scale problems, the curse of dimensionality becomes apparent as agent counts increase. The joint action space grows exponentially, and the mixing network must learn increasingly complex relationships between individual Q-values and global performance.
4. Unity Integration Planning
4.1 Integration Goals
Documented requirements for Unity ML-Agents integration:
- Visual Fidelity: Realistic warehouse environment rendering
- Agent Control: Python-Unity communication via ML-Agents
- Observation Compatibility: Match RWARE observation structure
- Action Mapping: Discrete actions to Unity robot controllers
4.2 Technical Requirements
Unity Integration Stack:
├── Unity 2021.3+ (LTS)
├── ML-Agents 0.30.0
├── Python mlagents-envs
├── Custom Warehouse Scene
└── Robot Prefabs with Controllers
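A minimal sketch of the Python side of this stack, using the low-level mlagents-envs API; the build name is a placeholder and the random-action loop stands in for a trained QMIX policy:

```python
# Sketch: driving a Unity warehouse build from Python with mlagents-envs.
# "warehouse_build" is a placeholder, not the project's actual build path.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="warehouse_build")  # file_name=None attaches to the Editor
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    # Agents waiting for actions vs. agents whose episodes just ended.
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Random discrete actions as a stand-in for the trained QMIX policy.
    actions = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior_name, actions)
    env.step()

env.close()
```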
4.3 Price’s Unity Environment Development
Documented Price’s work on Unity warehouse environment:
- Custom warehouse scene with shelf prefabs
- Robot agent prefabs with physics-based movement
- Reward system mirroring RWARE logic
- Observation sensors matching grid-based perception
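For reference, the RWARE reward rule being mirrored is sparse: an agent is rewarded only when it delivers a currently requested shelf to a goal cell. A schematic Python sketch (attribute names are illustrative; the actual Unity implementation is in C#):

```python
# Schematic of the RWARE reward logic mirrored in the Unity scene.
# Attribute names (carrying_shelf, position, ...) are illustrative only.
def step_reward(agent, requested_shelves, goal_cells) -> float:
    """Sparse RWARE-style reward: 1.0 per successful delivery, else 0.0."""
    if (
        agent.carrying_shelf is not None
        and agent.position in goal_cells
        and agent.carrying_shelf in requested_shelves
    ):
        return 1.0  # shelf delivered; a new shelf request is then sampled
    return 0.0
```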
5. Research Paper Methodology Updates
5.1 Algorithm Focus Justification
Added methodology section explaining QMIX focus:
Algorithm Selection Rationale: Following preliminary experiments across IPPO-LSTM, MASAC, and QMIX, we narrow our focus to QMIX for in-depth analysis. This decision reflects QMIX’s superior performance on our target environment and its theoretical advantages for fully cooperative multi-agent settings. The value decomposition paradigm directly addresses credit assignment—a fundamental challenge in multi-robot coordination.
5.2 Experimental Design Updates
Revised experimental protocol to emphasize:
- Extended QMIX training configurations
- Ablation studies on mixing network components
- Scalability experiments across warehouse sizes
- Comparative analysis with baseline approaches
6. Hard RWARE Environment Analysis
6.1 Environment Characteristics
Documented the “hard” RWARE variant:
| Parameter | Standard (tiny) | Hard |
|---|---|---|
| Grid Size | 5×10 | 15×30 |
| Shelf Density | Low | High |
| Navigation Complexity | Simple | Complex |
| Coordination Requirement | Moderate | Intensive |
6.2 Hard Environment Challenges
Hard RWARE Challenges:
├── Narrow Corridors: Agents must coordinate passing
├── Distant Goals: Longer paths increase failure risk
├── Dense Shelves: Limited maneuvering space
├── Blocking: Agents can obstruct each other
└── Sparse Rewards: Success requires many coordinated steps
7. Literature Review Expansion
7.1 Value Decomposition Methods
Expanded literature review on QMIX-related work:
- VDN (Sunehag et al., 2018): Additive value decomposition
- QMIX (Rashid et al., 2018): Monotonic mixing with hypernetworks
- QPLEX (Wang et al., 2021): Duplex dueling decomposition
- Weighted QMIX (Rashid et al., 2020): Importance weighting extension
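The structural difference between the first two methods can be stated compactly (τ denotes action-observation histories, u joint actions, s the global state):

```latex
\text{VDN:}\quad
Q_{tot}(\boldsymbol{\tau},\mathbf{u}) = \sum_{i=1}^{n} Q_i(\tau_i, u_i)
\qquad
\text{QMIX:}\quad
Q_{tot}(\boldsymbol{\tau},\mathbf{u}) = f_s\big(Q_1(\tau_1,u_1),\dots,Q_n(\tau_n,u_n)\big),
\quad \frac{\partial Q_{tot}}{\partial Q_i} \ge 0 \;\; \forall i
```

QPLEX and Weighted QMIX then relax or reweight this monotonic factorisation to widen the class of representable joint value functions.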
7.2 Warehouse Robotics Applications
Added application-focused literature:
- Amazon Robotics: Kiva Systems and coordination challenges
- Multi-Robot Path Planning: Classical vs. learning-based approaches
- Decentralized Warehouse Control: Real-world deployment constraints
8. Updated Performance Metrics
8.1 Comprehensive QMIX Results
| Run | Configuration | Timesteps | Test Return | Status |
|---|---|---|---|---|
| 1 | Default | 2M | 0.00 | Failed |
| 2 | Modified | 10M | 0.00 | Failed |
| 3 | Optimized | 20M | 3.25 | Success |
| 4 | Hard env | 30M | 1.82 | Partial |
8.2 Training Efficiency Analysis
| Configuration | Steps to First Reward | Steps to Convergence |
|---|---|---|
| Default | Never | Never |
| Optimized | ~5M | ~15M |
| Hard | ~10M | ~25M |
9. Week 4 Preview
Upcoming activities:
- Receive extended training results from team
- Compile comprehensive performance comparisons
- Update research paper results sections
- Prepare for second class presentation
10. Deliverables Summary
| Deliverable | Status |
|---|---|
| Scaling analysis compilation | Complete |
| QMIX focus documentation | Complete |
| Research paper methodology update | Complete |
| Unity integration planning | Complete |
| Literature review expansion | Complete |
11. Research Paper Excerpt: Scaling Discussion
Added to paper:
Scaling Analysis: Our experiments reveal that QMIX’s performance on RWARE is sensitive to warehouse configuration. On the tiny-2ag-v2 environment (2 agents, 5×10 grid), optimized hyperparameters achieve consistent task completion. However, scaling to 4+ agents introduces coordination challenges that extend training requirements by 50-100%. The mixing network’s capacity to represent complex agent interactions becomes a limiting factor, suggesting that architectural innovations may be necessary for warehouse-scale deployments.
These findings align with theoretical expectations—the joint action space grows as |A|^n where |A| is the action space size and n is the agent count. For 5 discrete actions and 6 agents, this yields 15,625 joint actions, making exhaustive exploration infeasible. QMIX’s implicit coordination through the mixing network provides a tractable approximation, but convergence guarantees weaken as complexity increases.
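The joint-action arithmetic above is easy to verify in a few lines (5 discrete RWARE actions per agent):

```python
# Joint action space size |A|**n for the agent counts discussed above.
n_actions = 5  # RWARE's discrete per-agent actions
for n_agents in (2, 4, 6, 8):
    print(f"{n_agents} agents -> {n_actions ** n_agents:,} joint actions")
# 6 agents -> 15,625 joint actions, matching the figure quoted in the text.
```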
12. References
Rashid, T., et al. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
Rashid, T., et al. (2020). Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. NeurIPS.
Sunehag, P., et al. (2018). Value-Decomposition Networks for Cooperative Multi-Agent Learning. AAMAS.
Wang, J., et al. (2021). QPLEX: Duplex Dueling Multi-Agent Q-Learning. ICLR.
Unity ML-Agents: https://github.com/Unity-Technologies/ml-agents
RWARE: https://github.com/Farama-Foundation/RWARE