CudaRL
While working through RL, implementing every algorithm from Q-learning to PPO, I felt a gap: a PyTorch-like library for RL should exist, where I just type out PPO() and all the background complexity is handled automatically. On top of that, it needs CUDA acceleration, so the compute-intensive algorithms can run faster on the GPU. That is why I started working on CudaRL.
The first question I had to answer: which parts of the computation can actually exploit GPU parallelism? So the first thing I wrote was a CUDA kernel for GAE (Generalized Advantage Estimation).
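For context, this is the standard GAE recursion the kernel has to compute (the textbook definition, nothing CudaRL-specific): the TD residual, then a backward scan over it, with the usual $(1 - \text{done}_t)$ masking at episode boundaries:

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
\hat{A}_t = \delta_t + \gamma \lambda \, \hat{A}_{t+1}, \qquad \hat{A}_T = 0.
$$

Note the recursion is sequential in $t$, so the parallelism has to come from somewhere else.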
This kernel makes even single-environment PPO faster because advantage calculation scales linearly with rollout length T: the CUDA-GAE kernel replaces T iterations of a Python loop with a single kernel launch. For multi-epoch PPO, which reuses each rollout several times for better sample efficiency, the savings compound, and with vectorized environments the gain is larger still, since the advantage computation runs in parallel across all environments.
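Here is a minimal sketch of what such a kernel can look like. The names, signature, and memory layout are my illustration, not the actual CudaRL source; the point is the structure: one GPU thread owns one environment's rollout and runs the backward scan, so the sequential-in-t recursion still happens, but for every environment at once.

```cuda
// Hypothetical GAE kernel sketch -- illustrative, not the actual CudaRL code.
// Layout assumption: row-major [num_envs, T] buffers; values has one extra
// column per environment holding the bootstrap value V(s_T).
__global__ void gae_kernel(const float* rewards,    // [num_envs, T]
                           const float* values,     // [num_envs, T + 1]
                           const float* dones,      // [num_envs, T], 1.0f if episode ended at t
                           float* advantages,       // [num_envs, T] (output)
                           int num_envs, int T,
                           float gamma, float lam) {
    int env = blockIdx.x * blockDim.x + threadIdx.x;
    if (env >= num_envs) return;

    const float* r = rewards    + env * T;
    const float* v = values     + env * (T + 1);
    const float* d = dones      + env * T;
    float* adv     = advantages + env * T;

    // Backward scan: sequential in t within a thread,
    // but all environments run concurrently.
    float last_adv = 0.0f;
    for (int t = T - 1; t >= 0; --t) {
        float not_done = 1.0f - d[t];
        float delta = r[t] + gamma * v[t + 1] * not_done - v[t];
        last_adv = delta + gamma * lam * not_done * last_adv;
        adv[t] = last_adv;
    }
}
```

A launch would then use one thread per environment, e.g. `gae_kernel<<<(num_envs + 255) / 256, 256>>>(...)`. Even with num_envs = 1 this wins over a Python loop, because the whole scan is one kernel launch instead of T interpreter iterations over GPU tensors.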
CudaRL includes the core policy-gradient family: REINFORCE, REINFORCE with baseline, A2C, PPO, and GRPO.
I will keep iterating on the CudaRL library as I find interesting things to add.
Faster RL training for all!