Week 3 Deliverable – Scaling Analysis & Strategic QMIX Focus
1. Weekly Objectives
This week’s activities focused on:
- Compiling scaling analysis results from team members
- Documenting the strategic decision to focus on QMIX
- Updating research paper methodology sections
- Analyzing algorithm performance across different configurations
2. Strategic Decision: QMIX Focus
2.1 Decision Rationale
After two weeks of parallel algorithm development, the team made a strategic decision to focus primarily on QMIX for the remainder of the project. This decision was based on:
- Strongest Empirical Performance: QMIX achieved the highest test returns on RWARE
- Value Decomposition Advantages: Mixing network effectively handles credit assignment
- Scalability Potential: Architecture designed for varying agent counts
- Industry Relevance: Value-based methods common in warehouse robotics
2.2 Algorithm Comparison Summary
| Algorithm | RWARE Performance (mean test return) | Tuning Difficulty | Scalability |
|---|---|---|---|
| QMIX | Best (3.25) | High | Excellent |
| IPPO-LSTM | Good (2.8) | Moderate | Good |
| MASAC | Moderate | Moderate | Good |
2.3 Supporting Research on Algorithm Selection
QMIX’s suitability for warehouse robotics aligns with literature findings:
Value decomposition methods like QMIX excel in fully cooperative settings where credit assignment is challenging. The monotonicity constraint ensures that individual agent improvements translate to team-level gains—a critical property for warehouse coordination where all robots must work toward shared objectives.
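To make the monotonicity constraint concrete, here is a minimal PyTorch sketch of a QMIX-style mixing network (dimensions and names are illustrative, not the project's training code, and PyTorch itself is an assumption here). Taking the absolute value of the hypernetwork outputs is what enforces dQ_tot/dQ_i >= 0:

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Minimal QMIX-style mixer: combines per-agent Q-values into Q_tot.

    Hypernetworks conditioned on the global state produce the mixing
    weights; taking their absolute value keeps them non-negative, which
    is what guarantees the monotonicity property described above.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # Q_tot
```

Because the weights (but not the biases) are constrained, the mixer can still represent rich state-dependent combinations while preserving the property that improving any individual agent's Q-value never decreases Q_tot.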
3. Scaling Analysis Results
3.1 Agent Count Scaling
Compiled results from scaling experiments:
| Configuration | Agents | Test Return | Training Steps |
|---|---|---|---|
| tiny-2ag-v2 | 2 | 3.25 | 20M |
| small-4ag-v1 | 4 | 2.10 | 30M |
| medium-6ag-v1 | 6 | 1.45 | 40M |
Observations:
- Performance degrades with increased agent count (expected)
- Training time increases super-linearly with agent count
- Coordination complexity grows combinatorially
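The three configurations in the table above can be instantiated directly from the rware package. A minimal sketch, assuming a gymnasium-era rware release (the exact env IDs and version suffixes depend on the installed version):

```python
# Sketch: instantiating the scaling configurations with rware (pip install rware).
# Env IDs mirror the table above; version suffixes may differ across releases.
import gymnasium as gym
import rware  # noqa: F401  (importing registers the rware-* environments)

for env_id in ["rware-tiny-2ag-v2", "rware-small-4ag-v1", "rware-medium-6ag-v1"]:
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)  # obs is a tuple with one entry per agent
    print(f"{env_id}: {len(obs)} agents, action space {env.action_space}")
    env.close()
```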
3.2 Environment Size Scaling
| Environment | Grid Size | Agents | Difficulty |
|---|---|---|---|
| tiny | 5×10 | 2 | Low |
| small | 7×15 | 4 | Medium |
| medium | 10×20 | 6 | High |
| hard | 15×30 | 8+ | Very High |
3.3 Scaling Insights for Research Paper
Added to Discussion section:
Our scaling analysis reveals significant challenges in applying MARL algorithms to larger warehouse configurations. While QMIX demonstrates robust performance on small-scale problems, the curse of dimensionality becomes apparent as agent counts increase. The joint action space grows exponentially, and the mixing network must learn increasingly complex relationships between individual Q-values and global performance.
4. Unity Integration Planning
4.1 Integration Goals
Documented requirements for Unity ML-Agents integration:
- Visual Fidelity: Realistic warehouse environment rendering
- Agent Control: Python-Unity communication via ML-Agents
- Observation Compatibility: Match RWARE observation structure
- Action Mapping: Discrete actions to Unity robot controllers
4.2 Technical Requirements
Unity Integration Stack:
├── Unity 2021.3+ (LTS)
├── ML-Agents 0.30.0
├── Python mlagents-envs
├── Custom Warehouse Scene
└── Robot Prefabs with Controllers
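A minimal sketch of the Python side of this stack, using the low-level mlagents-envs API; the build name is a placeholder and the random-action loop stands in for a trained QMIX policy:

```python
# Sketch: driving a Unity warehouse build from Python with mlagents-envs.
# "warehouse_build" is a placeholder, not the project's actual build path.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="warehouse_build")  # file_name=None attaches to the Editor
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    # Agents waiting for actions vs. agents whose episodes just ended.
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Random discrete actions as a stand-in for the trained QMIX policy.
    actions = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior_name, actions)
    env.step()

env.close()
```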
4.3 Price’s Unity Environment Development
Documented Price’s work on Unity warehouse environment:
- Custom warehouse scene with shelf prefabs
- Robot agent prefabs with physics-based movement
- Reward system mirroring RWARE logic
- Observation sensors matching grid-based perception
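For reference, the RWARE reward rule being mirrored is sparse: an agent is rewarded only when it delivers a currently requested shelf to a goal cell. A schematic Python sketch (attribute names are illustrative; the actual Unity implementation is in C#):

```python
# Schematic of the RWARE reward logic mirrored in the Unity scene.
# Attribute names (carrying_shelf, position, ...) are illustrative only.
def step_reward(agent, requested_shelves, goal_cells) -> float:
    """Sparse RWARE-style reward: 1.0 per successful delivery, else 0.0."""
    if (
        agent.carrying_shelf is not None
        and agent.position in goal_cells
        and agent.carrying_shelf in requested_shelves
    ):
        return 1.0  # shelf delivered; a new shelf request is then sampled
    return 0.0
```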
5. Research Paper Methodology Updates
5.1 Algorithm Focus Justification
Added methodology section explaining QMIX focus:
Algorithm Selection Rationale: Following preliminary experiments across IPPO-LSTM, MASAC, and QMIX, we narrow our focus to QMIX for in-depth analysis. This decision reflects QMIX’s superior performance on our target environment and its theoretical advantages for fully cooperative multi-agent settings. The value decomposition paradigm directly addresses credit assignment—a fundamental challenge in multi-robot coordination.
5.2 Experimental Design Updates
Revised experimental protocol to emphasize:
- Extended QMIX training configurations
- Ablation studies on mixing network components
- Scalability experiments across warehouse sizes
- Comparative analysis with baseline approaches
6. Hard RWARE Environment Analysis
6.1 Environment Characteristics
Documented the “hard” RWARE variant:
| Parameter | Standard (tiny) | Hard |
|---|---|---|
| Grid Size | 5×10 | 15×30 |
| Shelf Density | Low | High |
| Navigation Complexity | Simple | Complex |
| Coordination Requirement | Moderate | Intensive |
6.2 Hard Environment Challenges
Hard RWARE Challenges:
├── Narrow Corridors: Agents must coordinate passing
├── Distant Goals: Longer paths increase failure risk
├── Dense Shelves: Limited maneuvering space
├── Blocking: Agents can obstruct each other
└── Sparse Rewards: Success requires many coordinated steps
7. Literature Review Expansion
7.1 Value Decomposition Methods
Expanded literature review on QMIX-related work:
- VDN (Sunehag et al., 2018): Additive value decomposition
- QMIX (Rashid et al., 2018): Monotonic mixing with hypernetworks
- QPLEX (Wang et al., 2021): Duplex dueling decomposition
- Weighted QMIX (Rashid et al., 2020): Importance weighting extension
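The structural difference between the first two methods can be stated compactly (τ denotes action-observation histories, u joint actions, s the global state):

```latex
\text{VDN:}\quad
Q_{tot}(\boldsymbol{\tau},\mathbf{u}) = \sum_{i=1}^{n} Q_i(\tau_i, u_i)
\qquad
\text{QMIX:}\quad
Q_{tot}(\boldsymbol{\tau},\mathbf{u}) = f_s\big(Q_1(\tau_1,u_1),\dots,Q_n(\tau_n,u_n)\big),
\quad \frac{\partial Q_{tot}}{\partial Q_i} \ge 0 \;\; \forall i
```

QPLEX and Weighted QMIX then relax or reweight this monotonic factorisation to widen the class of representable joint value functions.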
7.2 Warehouse Robotics Applications
Added application-focused literature:
- Amazon Robotics: Kiva Systems and coordination challenges
- Multi-Robot Path Planning: Classical vs. learning-based approaches
- Decentralized Warehouse Control: Real-world deployment constraints
8. Updated Performance Metrics
8.1 Comprehensive QMIX Results
| Run | Configuration | Timesteps | Test Return | Status |
|---|---|---|---|---|
| 1 | Default | 2M | 0.00 | Failed |
| 2 | Modified | 10M | 0.00 | Failed |
| 3 | Optimized | 20M | 3.25 | Success |
| 4 | Hard env | 30M | 1.82 | Partial |
8.2 Training Efficiency Analysis
| Configuration | Steps to First Reward | Steps to Convergence |
|---|---|---|
| Default | Never | Never |
| Optimized | ~5M | ~15M |
| Hard | ~10M | ~25M |
9. Week 4 Preview
Upcoming activities:
- Receive extended training results from team
- Compile comprehensive performance comparisons
- Update research paper results sections
- Prepare for second class presentation
10. Deliverables Summary
| Deliverable | Status |
|---|---|
| Scaling analysis compilation | Complete |
| QMIX focus documentation | Complete |
| Research paper methodology update | Complete |
| Unity integration planning | Complete |
| Literature review expansion | Complete |
11. Research Paper Excerpt: Scaling Discussion
Added to paper:
Scaling Analysis: Our experiments reveal that QMIX’s performance on RWARE is sensitive to warehouse configuration. On the tiny-2ag-v2 environment (2 agents, 5×10 grid), optimized hyperparameters achieve consistent task completion. However, scaling to 4+ agents introduces coordination challenges that extend training requirements by 50-100%. The mixing network’s capacity to represent complex agent interactions becomes a limiting factor, suggesting that architectural innovations may be necessary for warehouse-scale deployments.
These findings align with theoretical expectations—the joint action space grows as |A|^n where |A| is the action space size and n is the agent count. For 5 discrete actions and 6 agents, this yields 15,625 joint actions, making exhaustive exploration infeasible. QMIX’s implicit coordination through the mixing network provides a tractable approximation, but convergence guarantees weaken as complexity increases.
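The joint-action arithmetic above is easy to verify in a few lines (5 discrete RWARE actions per agent):

```python
# Joint action space size |A|**n for the agent counts discussed above.
n_actions = 5  # RWARE's discrete per-agent actions
for n_agents in (2, 4, 6, 8):
    print(f"{n_agents} agents -> {n_actions ** n_agents:,} joint actions")
# 6 agents -> 15,625 joint actions, matching the figure quoted in the text.
```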
12. References
Rashid, T., et al. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
Rashid, T., et al. (2020). Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. NeurIPS.
Sunehag, P., et al. (2018). Value-Decomposition Networks for Cooperative Multi-Agent Learning. AAMAS.
Wang, J., et al. (2021). QPLEX: Duplex Dueling Multi-Agent Q-Learning. ICLR.
Unity ML-Agents: https://github.com/Unity-Technologies/ml-agents
RWARE: https://github.com/Farama-Foundation/RWARE