Is the warp size always 32 in CUDA?

Last week, I began to read the awesome Professional CUDA C Programming, and bumped into the following words in GPU Architecture Overview section:

CUDA employs a Single Instruction Multiple Thread (SIMT) architecture to manage and execute threads in groups of 32 called warps.

Since this book is published in 2014, I just wonder whether the warp size is still 32 in CUDA no matter the different Compute Capability is. To figure out it, I turn to the official CUDA C Programming Guide, and get the answer from Compute Capability table:

capture

Yep, for all Compute Capabilities, the warp size is always 32.

BTW, you can also use following program to determine the warp size value:

#include <stdio.h>

int main(void) {
        cudaDeviceProp deviceProp;
        if (cudaSuccess != cudaGetDeviceProperties(&deviceProp, 0)) {
                printf("Get device properties failed.\n");
                return 1;
        } else {
                printf("The warp size is %d.\n", deviceProp.warpSize);
                return 0;
        }
}

The running result in my CUDA box is here:

The warp size is 32.