10 RL: Multi-Agent Warehouse Robots
Deliverable 3 – Multi-Agent Soft Actor-Critic (MASAC) on the Robotic Warehouse (RWARE) environment
11 Intro
Deliverable 3 explores sequential and parallel run times on both CPU and GPU. On Titan, the default partition used was sb (#SBATCH --partition=sb), which offers 16 CPUs and 128 GB of memory per node across 280 nodes, but no GPUs.
While researching, I concluded that switching to a GPU partition would be worthwhile. Using
sinfo -o "%P %G %c %m %D %N"
I listed the partitions available on Titan:
| PARTITION | GRES | CPUS | MEMORY (MB) | NODES | NODELIST |
|---|---|---|---|---|---|
| bw* | (null) | 40+ | 128000 | 38 | c[001-034,315-318] |
| il | (null) | 48 | 256000 | 1 | c319 |
| sb | (null) | 16 | 128000 | 280 | c[035-314] |
| t4 | gpu:t4:4 | 64 | 256000 | 1 | g000 |
| gpu | gpu:a30:4 | 64 | 512000 | 8 | g[001-008] |
| osg | (null) | 16+ | 128000 | 318 | c[001-318] |
Because GPUs were available and greatly accelerate batch forward/backward passes for neural networks, I switched to a GPU partition for the final experiments.
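As a rough sketch of what the GPU buys, assuming the MASAC implementation is in PyTorch (the layer and batch sizes below are arbitrary placeholders, not the real network):

```python
import torch

# Use the GPU when the job lands on a GPU partition (t4 or gpu); otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical actor network, standing in for the real MASAC networks, just to show
# where the batched forward pass runs.
actor = torch.nn.Linear(64, 5).to(device)
obs_batch = torch.randn(8, 64, device=device)  # one observation per parallel env
logits = actor(obs_batch)                      # a single batched forward pass on `device`
```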
12 Parallelization
I opted for synchronous parallelism using the SyncVectorEnv vectorization wrapper. All environment steps are grouped into a single call, and the main loop proceeds only once every environment has finished its step, so observations come back batched. In theory this dramatically increases training-data throughput, filling the replay buffer much faster. A minimal sketch of the setup follows.
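This sketch assumes a Gymnasium-compatible RWARE build; the env ID rware-tiny-2ag-v2 is a placeholder for the one actually used in the runs, and multi-agent observation handling is elided:

```python
import gymnasium as gym
from gymnasium.vector import SyncVectorEnv

NUM_ENVS = 8  # matches the parallel test cases in Section 13

def make_env():
    def thunk():
        import rware  # registers RWARE env IDs with Gymnasium (assumed side effect)
        return gym.make("rware-tiny-2ag-v2")  # placeholder env ID
    return thunk

# One step() call advances every sub-environment; control returns only after all
# environments have finished, and observations come back batched along axis 0.
envs = SyncVectorEnv([make_env() for _ in range(NUM_ENVS)])
obs, info = envs.reset(seed=0)
obs, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())
```

Swapping SyncVectorEnv for AsyncVectorEnv would run each environment in its own process, trading inter-process communication overhead for true concurrency.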
13 Tests
| Test Case | Episodes | num_envs | Focus |
|---|---|---|---|
| Sequential CPU | 50 | 1 | CPU overhead (single-instance processing) |
| Parallel CPU | 50 | 8 | Parallelization gain (efficient use of CPU cores) |
| Sequential GPU | 50 | 1 | GPU acceleration (single forward-pass speed) |
| Parallel GPU | 50 | 8 | Final optimized speed (best-case scenario) |
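The four cases vary only in device and num_envs, so a single run matrix can drive them; train() below is a hypothetical entry point standing in for the actual MASAC training loop:

```python
import time

def train(episodes: int, device: str, num_envs: int) -> None:
    """Placeholder for the MASAC training loop (assumed interface)."""

# One row per test case in the table above.
RUNS = [
    ("Sequential CPU", "cpu", 1),
    ("Parallel CPU", "cpu", 8),
    ("Sequential GPU", "cuda", 1),
    ("Parallel GPU", "cuda", 8),
]

for name, device, num_envs in RUNS:
    start = time.perf_counter()
    train(episodes=50, device=device, num_envs=num_envs)
    print(f"{name}: {time.perf_counter() - start:.1f} s")
```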
13.1 Measured times (from runs)
| Test Case | Episodes | num_envs | Wall-clock time (h:mm:ss) |
|---|---|---|---|
| Sequential CPU | 50 | 1 | 1:02:39 |
| Parallel CPU | 50 | 8 | 0:09:49 |
| Sequential GPU | 50 | 1 | 0:06:43 |
| Parallel GPU | 50 | 8 | 0:01:06 |
13.2 Summary
GPU acceleration combined with parallel environments yields dramatic reductions in wall-clock time for MASAC on RWARE. For the measured runs, sequential CPU (50 episodes) took 1:02:39 (3759 s), while parallel GPU (50 episodes, 8 envs) took 0:01:06 (66 s), a ~57× speedup.
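For reference, a small helper that converts the measured times to seconds and computes the speedup of each configuration over the sequential-CPU baseline (values taken from the table above):

```python
def to_seconds(t: str) -> int:
    """Convert 'h:mm:ss' or 'mm:ss' to seconds."""
    secs = 0
    for part in t.split(":"):
        secs = secs * 60 + int(part)
    return secs

baseline = to_seconds("1:02:39")  # sequential CPU, 3759 s
for name, t in [("Parallel CPU", "09:49"),
                ("Sequential GPU", "06:43"),
                ("Parallel GPU", "01:06")]:
    s = to_seconds(t)
    print(f"{name}: {s} s, {baseline / s:.1f}x faster than sequential CPU")
# Parallel CPU: 589 s, 6.4x; Sequential GPU: 403 s, 9.3x; Parallel GPU: 66 s, 57.0x
```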
13.2.1 References
https://pmc.ncbi.nlm.nih.gov/articles/PMC11059992/
https://github.com/JohannesAck/tf2multiagentrl
https://github.com/semitable/robotic-warehouse
OpenAI. (2024). ChatGPT (Oct 23 version) [Large language model]. https://chat.openai.com/chat