4 Week 1 Deliverable – Research Paper Foundation
5 Week 1 – Compiling Research Foundations & Comparative Analysis
5.1 1. Role Overview: Documentation Expert
As the Documentation Expert for this MARL Warehouse Robotics project, my primary responsibilities include:
- Compiling and synthesizing findings from all team members
- Conducting comparative analysis across implemented algorithms
- Developing and maintaining the research paper throughout the project
- Creating presentation materials for class deliverables
- Coordinating final documentation and industry expert interviews
5.2 2. Team Structure & Algorithm Assignments
Our team consists of four members, each focusing on different MARL algorithms:
| Team Member | Role | Algorithm Focus |
|---|---|---|
| Price Allman | Team Lead | IPPO-LSTM |
| Lian Thang | Environment Specialist | MASAC |
| Dre Simmons | Training Specialist | QMIX |
| Salmon Riaz | Documentation Expert | Research Synthesis |
5.3 3. Week 1 Objectives Completed
5.3.1 3.1 Compiled Initial Findings from Team
Gathered setup documentation and initial training results from each team member:
- Price (IPPO-LSTM): Environment setup, PPO implementation details, LSTM integration approach
- Lian (MASAC): Multi-agent SAC architecture, entropy optimization considerations
- Dre (QMIX): Value decomposition framework, mixing network architecture
5.3.2 3.2 Comparative Analysis Framework
Established a framework for comparing the three algorithms across key dimensions (a code sketch of this structure follows the diagram):
Comparison Dimensions:
├── Training Paradigm (on-policy vs off-policy)
├── Coordination Mechanism (independent vs centralized mixing)
├── Observation Handling (feedforward vs recurrent)
├── Sample Efficiency
└── Scalability to larger agent counts
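To keep the comparison consistent across write-ups, the framework can be held as a simple data structure. The sketch below is illustrative only: the qualitative entries restate the properties summarized in Section 4, and the quantitative dimensions are left as placeholders until experiments produce numbers.

```python
# Illustrative Python sketch of the comparison framework.
# Qualitative entries restate Section 4; quantitative dimensions (sample
# efficiency, scalability) are placeholders to be filled from experiments.
COMPARISON_FRAMEWORK = {
    "IPPO-LSTM": {
        "training_paradigm": "on-policy",
        "coordination": "independent learning",
        "observation_handling": "recurrent (LSTM)",
        "sample_efficiency": None,
        "scalability": None,
    },
    "MASAC": {
        "training_paradigm": "off-policy",
        "coordination": "centralized critic, decentralized actors",
        "observation_handling": "feedforward (assumed)",
        "sample_efficiency": None,
        "scalability": None,
    },
    "QMIX": {
        "training_paradigm": "off-policy",
        "coordination": "centralized value mixing",
        "observation_handling": "recurrent (per-agent, as in the original paper)",
        "sample_efficiency": None,
        "scalability": None,
    },
}
```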
5.3.3 3.3 Research Paper Foundation
Initiated the research paper structure with the following sections:
- Introduction – Problem motivation and contributions
- Related Work – MARL algorithms and cooperative environments
- Preliminaries – Mathematical formulation and CTDE paradigm
- Methods – IPPO-LSTM, MASAC, and QMIX descriptions
- Experiments and Results – Setup and performance metrics
- Discussion – Key findings and implications
- Conclusion – Summary and future directions
5.4 4. Algorithm Comparison Overview
5.4.1 4.1 IPPO-LSTM (Independent PPO with LSTM)
- Type: On-policy, actor-critic
- Coordination: Independent learning (no explicit coordination)
- Memory: LSTM handles partial observability
- Training: Proximal Policy Optimization with the clipped surrogate objective (shown below)
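For reference, each agent independently maximizes the standard PPO clipped surrogate objective; conditioning the policy on the LSTM hidden state (written here as the history $h_t$) is how partial observability is handled. This is the textbook formulation, not necessarily the exact variant in the team's implementation:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid h_t)}{\pi_{\theta_{\text{old}}}(a_t \mid h_t)}
$$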
5.4.2 4.2 MASAC (Multi-Agent Soft Actor-Critic)
- Type: Off-policy, actor-critic
- Coordination: Centralized critic with decentralized actors
- Entropy: Maximum entropy RL for exploration
- Training: Entropy-regularized (soft) actor-critic updates with experience replay (target shown below)
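The critic update follows the soft Bellman backup from SAC. In the CTDE formulation assumed here, twin centralized critics score joint observations and actions while each actor $\pi_{\theta_i}$ conditions only on its own observation; exact entropy terms vary across MASAC variants, so treat this as a representative target rather than the team's precise implementation:

$$
y = r + \gamma\left(\min_{j\in\{1,2\}} Q_{\bar{\phi}_j}(\mathbf{o}', \mathbf{a}') - \alpha \log \pi_{\theta_i}(a_i' \mid o_i')\right),
\qquad a_i' \sim \pi_{\theta_i}(\cdot \mid o_i')
$$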
5.4.3 4.3 QMIX (Q-Mixing Network)
- Type: Off-policy, value-based
- Coordination: Centralized mixing of Q-values
- Constraint: Monotonic mixing enforces the Individual-Global-Max (IGM) condition (formalized below)
- Training: Value decomposition with mixing network
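Following Rashid et al. (2018), the monotonicity constraint on the mixing network (right) is a sufficient condition for the Individual-Global-Max property (left), which lets each agent act greedily on its own $Q_a$ while remaining consistent with the joint greedy action:

$$
\operatorname*{arg\,max}_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) =
\begin{pmatrix}
\operatorname*{arg\,max}_{u_1} Q_1(\tau_1, u_1)\\
\vdots\\
\operatorname*{arg\,max}_{u_n} Q_n(\tau_n, u_n)
\end{pmatrix},
\qquad
\frac{\partial Q_{tot}}{\partial Q_a} \ge 0 \quad \forall a
$$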
5.5 5. Environment Selection
5.5.1 5.1 MPE Simple Spread (Week 1-2)
Selected as the initial benchmark for the following reasons (a minimal setup sketch follows the list):
- Well-established MARL benchmark
- Clear cooperative objective (agents cover landmarks)
- Dense reward signal facilitating learning
- Moderate complexity for algorithm validation
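For concreteness, a minimal instantiation through PettingZoo's parallel API is sketched below; the version suffix and keyword arguments reflect recent PettingZoo releases and may differ from the team's exact configuration.

```python
# Minimal Simple Spread setup via PettingZoo's parallel API (sketch; version
# suffix and defaults may differ from the team's actual configuration).
from pettingzoo.mpe import simple_spread_v3

env = simple_spread_v3.parallel_env(
    N=3,                       # number of agents and landmarks
    local_ratio=0.5,           # blend of local vs. global reward
    max_cycles=25,             # episode length
    continuous_actions=False,
)
observations, infos = env.reset(seed=42)

while env.agents:
    # Random actions stand in for the trained IPPO-LSTM / MASAC / QMIX policies.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```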
5.5.2 5.2 RWARE (Week 2+)
Planned transition to the warehouse-specific environment (an instantiation sketch follows the list):
- Grid-based warehouse simulation
- Sparse, delayed rewards
- Realistic multi-robot coordination challenges
- Scalable to different warehouse sizes
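A hypothetical instantiation sketch is below. The environment ID string and the gym vs. gymnasium API both depend on the installed rware release, so treat every identifier here as an assumption to verify against the repository README.

```python
# Hypothetical RWARE setup sketch; the ID string and API returns depend on the
# installed rware release (older versions register against gym, newer against
# gymnasium), so verify against the repository README before use.
import gymnasium as gym
import rware  # noqa: F401 -- importing registers the "rware-*" environment IDs

env = gym.make("rware-tiny-2ag-v2")       # tiny warehouse layout, 2 agents (assumed ID)
obs, info = env.reset(seed=0)
actions = env.action_space.sample()       # one discrete action per agent
obs, reward, terminated, truncated, info = env.step(actions)
env.close()
```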
5.6 6. Research Paper Abstract Draft
This paper presents a progressive benchmarking study of Multi-Agent Reinforcement Learning (MARL) algorithms for cooperative warehouse robotics applications. We evaluate three prominent MARL algorithms—IPPO-LSTM, MASAC, and QMIX—on increasingly complex coordination tasks, beginning with the Multi-Agent Particle Environment (MPE) Simple Spread scenario and progressing to the more challenging Robotic Warehouse Environment (RWARE). Our experiments examine algorithm performance across key dimensions including sample efficiency, coordination quality, and scalability. Through systematic comparison, we identify strengths and limitations of each approach for multi-robot warehouse automation, providing practical insights for algorithm selection in real-world deployment scenarios.
5.7 7. Key Literature Identified
Initial literature review compiled the following foundational works:
- QMIX: Rashid et al. (2018) – Monotonic value function factorisation
- IPPO: de Witt et al. (2020) – Independent PPO effectiveness
- MASAC: Haarnoja et al. (2018) – Soft Actor-Critic, the single-agent foundation for MASAC
- RWARE: Christianos et al. (2021) – Warehouse benchmark
- EPyMARL: Papoudakis et al. (2021) – Multi-agent deep RL benchmarking
5.8 8. Week 2 Preview
Next week’s focus areas:
- Compile RWARE training results from team
- Update research paper with experimental findings
- Prepare class presentation materials
- Begin performance comparison tables
5.9 9. Deliverables Summary
| Deliverable | Status |
|---|---|
| Team findings compilation | Complete |
| Comparative analysis framework | Complete |
| Research paper outline | Complete |
| Abstract draft | Complete |
| Literature review initiation | Complete |
5.10 10. References
Rashid, T., Samvelyan, M., De Witt, C.S., Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML.
Papoudakis, G., Christianos, F., Schäfer, L., & Albrecht, S.V. (2021). Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks. NeurIPS.
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML.
PettingZoo MPE: https://pettingzoo.farama.org/environments/mpe/
RWARE: https://github.com/Farama-Foundation/RWARE