A group of 32 GPU threads that are scheduled and execute together as a unit.
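The sketch below shows how threads in a block map onto warps. It assumes NVIDIA's CUDA convention: thread indices are linearized as x + y*Dx + z*Dx*Dy, consecutive linear IDs are packed into warps starting from thread 0, and a block of T threads occupies ceil(T / 32) warps. The function names `warp_and_lane` and `warps_per_block` are hypothetical helpers for illustration only.

```python
WARP_SIZE = 32  # threads per warp (NVIDIA GPUs)


def warp_and_lane(tx: int, ty: int = 0, tz: int = 0,
                  dim_x: int = 1, dim_y: int = 1) -> tuple[int, int]:
    """Hypothetical helper: return (warp_id, lane_id) of a thread in its block.

    Assumes the CUDA linearization x + y*Dx + z*Dx*Dy, with consecutive
    linear thread IDs grouped into warps of WARP_SIZE threads.
    """
    linear_id = tx + ty * dim_x + tz * dim_x * dim_y
    return linear_id // WARP_SIZE, linear_id % WARP_SIZE


def warps_per_block(block_threads: int) -> int:
    """Hypothetical helper: number of warps a block occupies (ceiling division).

    A block whose size is not a multiple of 32 leaves its last warp partly empty.
    """
    return (block_threads + WARP_SIZE - 1) // WARP_SIZE


# 1D block: thread 37 is lane 5 of warp 1.
print(warp_and_lane(37))               # (1, 5)
# 16x16 block: thread (x=3, y=2) has linear ID 35, so warp 1, lane 3.
print(warp_and_lane(3, 2, dim_x=16))   # (1, 3)
# A 100-thread block still needs 4 warps; the last one has only 4 active threads.
print(warps_per_block(100))            # 4
```

Choosing block sizes that are multiples of 32 avoids partially filled warps.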

A warp executes on a Single Instruction, Multiple Thread (SIMT) unit, which issues one instruction from the warp every cycle. The threads of the warp whose program counter matches the issued instruction (anywhere from one to all 32) execute it in parallel. When threads in a warp diverge, for example by taking different sides of a branch, the warp executes each path in turn, so divergent threads serialize execution.
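The divergence behavior described above can be illustrated with a toy model of one warp executing `if (pred) { then } else { else }`. This is a simplified sketch, not how any particular GPU schedules instructions: it assumes each instruction costs one issue cycle, that a path is issued only if at least one lane takes it (with the other lanes masked off), and that the warp reconverges after the branch. It ignores details such as hardware reconvergence mechanisms and Volta-era independent thread scheduling. The name `run_branch` is hypothetical.

```python
from typing import Callable, List, Tuple

WARP_SIZE = 32
FULL_MASK = (1 << WARP_SIZE) - 1  # all 32 lanes active


def run_branch(pred: Callable[[int], bool],
               then_len: int,
               else_len: int) -> Tuple[int, List[Tuple[str, int]]]:
    """Toy SIMT model of one warp executing an if/else.

    Returns (issue_cycles, trace), where trace lists each issued path
    together with its active-lane bitmask.
    """
    # Build the active mask for the "then" path: bit i set if lane i takes it.
    taken = 0
    for lane in range(WARP_SIZE):
        if pred(lane):
            taken |= 1 << lane
    not_taken = ~taken & FULL_MASK  # remaining lanes take the "else" path

    cycles = 0
    trace: List[Tuple[str, int]] = []
    for name, mask, length in (("then", taken, then_len),
                               ("else", not_taken, else_len)):
        if mask:  # a path no lane takes is skipped entirely
            cycles += length
            trace.append((name, mask))
    return cycles, trace


# Uniform branch: every lane takes "then", so only 4 cycles are issued.
print(run_branch(lambda lane: True, 4, 6)[0])           # 4
# Divergent branch: even and odd lanes split, so both paths run serially.
print(run_branch(lambda lane: lane % 2 == 0, 4, 6)[0])  # 10
```

In the divergent case the warp spends 10 cycles instead of 4 or 6, and during each path half of its lanes sit idle, which is why branching on a condition that is uniform across a warp is generally cheaper than branching per thread.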

Jacob, Arpith Chacko, et al. “Efficient fork-join on GPUs through warp specialization.” 2017 IEEE 24th International Conference on High Performance Computing (HiPC). IEEE, 2017.