Basic Block Neutering is a technique for exploiting nested parallelism on the GPU. It introduces additional control flow that causes unused GPU threads to bypass the basic blocks of a serial region. However, even after optimization, the communication needed to inform the unused threads of the control-flow decisions taken by the serial (master) GPU thread incurs non-negligible overhead.
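The control-flow pattern described above can be illustrated with a minimal CPU-side sketch. This is not the GPU implementation itself: Python threads stand in for GPU threads, a `threading.Barrier` stands in for a team-wide synchronization, and a dictionary stands in for shared memory. The function name `run_team`, the choice of thread 0 as the master, and the doubling workload are all assumptions made for illustration.

```python
import threading


def run_team(num_threads, data):
    """Simulate one thread team running a serial block, then a parallel region."""
    barrier = threading.Barrier(num_threads)  # stand-in for a team-wide sync
    shared = {"take_branch": None}            # stand-in for GPU shared memory
    out = [0] * len(data)
    executed_serial = []                      # ids of threads that ran the serial block

    def thread_body(tid):
        is_master = (tid == 0)  # assumption: thread 0 acts as the master

        # Serial basic block: the added guard makes every other thread bypass it.
        if is_master:
            executed_serial.append(tid)
            shared["take_branch"] = sum(data) > 0  # control-flow decision

        # Communication step: the unused threads wait to learn the decision.
        # This is the overhead that remains even when the block body is skipped.
        barrier.wait()

        # All threads follow the branch chosen by the master.
        if shared["take_branch"]:
            # Nested parallel region: iterations are split across the team.
            for i in range(tid, len(data), num_threads):
                out[i] = 2 * data[i]

    threads = [
        threading.Thread(target=thread_body, args=(tid,))
        for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, executed_serial
```

In this sketch only the master executes the body of the serial block, yet every thread still has to reach the barrier and read the shared flag before it can continue. That wait-and-read step is the communication cost that the paragraph above refers to.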

