Don’t start with moon shots. by Thomas H. Davenport and Rajeev Ronanki In 2013, the MD Anderson Cancer Center launched a “moon shot” project: diagnose and recommend treatment plans for certain forms ...
In on-policy RL training (RLHF/GRPO/DAPO), the rollout phase dominates runtime, typically accounting for over 90% of total training time. Due to the highly variable response lengths across samples, ...