Basic Block Neutering is a technique for exploiting nested parallelism on the GPU. It introduces additional control flow that causes unused GPU threads to bypass the basic blocks of a serial region. However, even after optimization, there is non-negligible communication overhead in informing the unused threads of the control-flow decisions taken by the serial (master) GPU thread.
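
The idea can be illustrated with a small hand-written CUDA sketch. It is not the code produced by the compiler described in the reference; it only assumes that one thread per block acts as the master, that each serial basic block is guarded so the other threads skip it, and that the master's branch decision is passed to the other threads through shared memory. The names `neutered_kernel`, `run_neutered`, `take_branch`, and `scale` are hypothetical and chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative kernel (hypothetical): a serial region with a branch,
// followed by a parallel region, all executed by a single thread block.
__global__ void neutered_kernel(const int *in, int *out, int n, int threshold) {
    __shared__ int take_branch;  // branch decision published by the master
    __shared__ int scale;        // value computed in the serial region

    const bool is_master = (threadIdx.x == 0);

    // Serial basic block 1: only the master executes the body.
    // The guard "neuters" the block for every other thread.
    if (is_master) {
        take_branch = (in[0] > threshold);
    }
    // Communication step: the other threads learn the master's decision.
    // This barrier is the kind of overhead the text refers to.
    __syncthreads();

    // Every thread follows the same control-flow path as the master,
    // but the bodies of the serial basic blocks remain guarded.
    if (take_branch) {
        if (is_master) scale = 2;  // serial basic block 2a
    } else {
        if (is_master) scale = 3;  // serial basic block 2b
    }
    __syncthreads();  // make `scale` visible before the parallel region

    // Parallel region: all threads of the block take part.
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        out[i] = in[i] * scale;
    }
}

// Host helper (hypothetical). Assumes `in` is non-empty, because the
// kernel reads in[0]. CUDA error checking is omitted for brevity.
std::vector<int> run_neutered(const std::vector<int> &in, int threshold) {
    const int n = static_cast<int>(in.size());
    const size_t bytes = in.size() * sizeof(int);
    std::vector<int> out(in.size(), 0);

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    neutered_kernel<<<1, 64>>>(d_in, d_out, n, threshold);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

In this sketch the unused threads do no useful work in the serial region, yet they still pay for two barriers and a shared-memory read per control-flow decision. A longer serial region with more branches would need more such synchronization points, which is where the overhead mentioned above comes from.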

[0] Jacob, Arpith Chacko, et al. “Efficient Fork-Join on GPUs Through Warp Specialization.” High Performance Computing (HiPC), 2017 IEEE 24th International Conference on. IEEE, 2017.