13 Week 4 Deliverable

Author

Price Allman

13.1 Project Overview

Training multi-agent warehouse robots with reinforcement learning to navigate a warehouse environment and retrieve packages for shipment.

13.2 Week 4 Accomplishments

13.2.1 Environment Development

  • Enhanced Unity Warehouse Environment: Upgraded the simulation into a more realistic platform for multi-agent coordination.

  • Package-Readiness Queue System: Implemented a visual queue system where blue-highlighted packages indicate items ready for pickup and delivery.
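The readiness queue can be thought of as a first-in, first-out list of the packages that are currently highlighted. The Python sketch below illustrates only that idea. It is not the Unity implementation, and the names `Package`, `ReadinessQueue`, `mark_ready` and `claim_next` are hypothetical. It assumes a package stays highlighted while it waits in the queue and loses its highlight once an agent claims it.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Package:
    package_id: int
    ready: bool = False  # hypothetical flag standing in for the blue highlight


class ReadinessQueue:
    """Hypothetical FIFO of packages that are ready for pickup."""

    def __init__(self) -> None:
        self._queue: deque[Package] = deque()

    def mark_ready(self, package: Package) -> None:
        # Highlight the package and enqueue it, but never enqueue it twice.
        if not package.ready:
            package.ready = True
            self._queue.append(package)

    def claim_next(self) -> Package | None:
        # An agent takes the oldest ready package, which is then un-highlighted.
        if not self._queue:
            return None
        package = self._queue.popleft()
        package.ready = False
        return package

    def __len__(self) -> int:
        return len(self._queue)


queue = ReadinessQueue()
a, b = Package(1), Package(2)
queue.mark_ready(a)
queue.mark_ready(b)
queue.mark_ready(a)  # already ready, so it is not enqueued again
first = queue.claim_next()
print(first.package_id, len(queue))  # → 1 1
```

A FIFO keeps the sketch simple; a real environment might instead let agents pick whichever highlighted package is closest.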

13.2.2 Algorithm Integration

  • QMIX Implementation: Adapted the QMIX algorithm to the Unity environment, enabling end-to-end learning for multi-agent package retrieval and delivery.

  • Hyperparameter Transfer: Reused hyperparameters that had performed well in earlier RWARE experiments, with the aim of speeding up convergence and improving training performance.
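For reference, the core of QMIX is a mixing network that combines per-agent Q-values into a joint value Q_tot. Hypernetworks generate the mixing weights from the global state, and taking their absolute value keeps the weights non-negative, so Q_tot is monotonic in every agent's Q-value. The pure-Python sketch below shows only that forward pass, following the structure of the original QMIX mixer. It is not the code used in this project: the randomly initialised layers stand in for learned parameters, the names (`QMixer`, `embed_dim`, the example state) are illustrative assumptions, and the agent networks, replay buffer and TD training loop are omitted.

```python
import math
import random


def linear(x, weights, bias):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def elu(v):
    return v if v > 0 else math.exp(v) - 1.0


def random_layer(rng, n_out, n_in):
    """Random layer standing in for learned parameters (illustration only)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return w, b


class QMixer:
    """Sketch of a QMIX mixer: Q_tot = f(Q_1..Q_n; s), monotonic in each Q_a."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = random.Random(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks map the global state to the mixer's weights and biases.
        self.hyper_w1 = random_layer(rng, n_agents * embed_dim, state_dim)
        self.hyper_b1 = random_layer(rng, embed_dim, state_dim)
        self.hyper_w2 = random_layer(rng, embed_dim, state_dim)
        self.hyper_b2_hidden = random_layer(rng, embed_dim, state_dim)
        self.hyper_b2_out = random_layer(rng, 1, embed_dim)

    def __call__(self, agent_qs, state):
        # abs() keeps first-layer mixing weights non-negative.
        w1_flat = [abs(v) for v in linear(state, *self.hyper_w1)]
        w1 = [w1_flat[a * self.embed_dim:(a + 1) * self.embed_dim]
              for a in range(self.n_agents)]
        b1 = linear(state, *self.hyper_b1)
        hidden = [
            elu(sum(agent_qs[a] * w1[a][j] for a in range(self.n_agents)) + b1[j])
            for j in range(self.embed_dim)
        ]
        # Second-layer weights are also non-negative; the final bias V(s) is unconstrained.
        w2 = [abs(v) for v in linear(state, *self.hyper_w2)]
        b2_hidden = [max(0.0, v) for v in linear(state, *self.hyper_b2_hidden)]
        b2 = linear(b2_hidden, *self.hyper_b2_out)[0]
        return sum(h * w for h, w in zip(hidden, w2)) + b2


mixer = QMixer(n_agents=2, state_dim=4)
state = [0.1, -0.3, 0.7, 0.0]
low = mixer([0.2, 0.5], state)
high = mixer([0.9, 0.5], state)  # raise agent 0's Q-value only
print(high >= low)  # → True
```

Because ELU is increasing and every mixing weight is non-negative, raising one agent's Q-value can never lower Q_tot. This is what allows each agent to act greedily on its own Q-value during decentralised execution while training uses the joint value.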

13.3 Training Results

Launched a 1,000,000-timestep QMIX training run:

  • Training Duration: Interrupted at approximately 530,000 timesteps (~8.5 hours of runtime) due to a system crash
  • Performance Improvements:
    • Test return improved from 0 to 0.19
    • Training loss decreased by approximately 97.5%
  • Behavioral Progress: Agents demonstrated emerging competence in:
    • Identifying ready packages
    • Picking up packages
    • Successfully delivering packages to designated locations
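The timing figures above also give a rough estimate of how long the complete run should take. The arithmetic below assumes throughput stays roughly constant at the observed rate (about 530,000 timesteps in 8.5 hours), which may not hold exactly; it also restates the 97.5% loss reduction as a ratio.

```python
completed_steps = 530_000  # timesteps reached before the crash (approximate)
runtime_hours = 8.5        # wall-clock runtime so far (approximate)
target_steps = 1_000_000   # planned length of the run

# Assumes constant throughput for the rest of the run.
steps_per_hour = completed_steps / runtime_hours
remaining_hours = (target_steps - completed_steps) / steps_per_hour
print(f"throughput ≈ {steps_per_hour:,.0f} timesteps/hour")  # → throughput ≈ 62,353 timesteps/hour
print(f"remaining ≈ {remaining_hours:.1f} h")                # → remaining ≈ 7.5 h
print(f"full run ≈ {target_steps / steps_per_hour:.1f} h")   # → full run ≈ 16.0 h

# A 97.5% drop means the final loss is 2.5% of the initial loss.
loss_ratio = 1 - 0.975
print(f"final/initial loss ≈ {loss_ratio:.3f} (≈{1 / loss_ratio:.0f}x smaller)")  # → final/initial loss ≈ 0.025 (≈40x smaller)
```

At that rate, resuming from a saved checkpoint (if one is available) would need roughly 7.5 more hours, whereas restarting from scratch would take about 16 hours in total.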

13.4 Next Steps

  • Complete the full 1,000,000-timestep training run
  • Record and present a video demonstration of the learned multi-agent coordination behaviors
  • Finalize the research paper documenting methodology, experiments, and contributions