CUDA编程笔记(8)——CUDA kernel

这篇笔记摘自Professional CUDA C Programming

A CUDA kernel call is a direct extension to the C function syntax that adds a kernel’s execution confguration inside triple-angle-brackets:
kernel_name <<<grid, block>>>(argument list);
As explained in the previous section, the CUDA programming model exposes the thread hierarchy. With the execution configuration, you can specify how the threads will be scheduled to run on the GPU. The first value in the execution configuration is the grid dimension, the number of blocks to launch. The second value is the block dimension, the number of threads within each block. By specifying the grid and block dimensions, you configure:
➤ The total number of threads for a kernel
➤ The layout of the threads you want to employ for a kernel

kernel_name <<<grid, block>>>(argument list);中,grid参数指定block数量,而block参数指定每个blockthread数量,二者之积就是grid一共拥有的thread数量。

Unlike a C function call, all CUDA kernel launches are asynchronous. Control returns to the CPU immediately after the CUDA kernel is invoked.

A kernel function is the code to be executed on the device side. In a kernel function, you define the computation for a single thread, and the data access for that thread. When the kernel is called, many different CUDA threads perform the same computation in parallel.

The following restrictions apply for all kernels:
➤ Access to device memory only
➤ Must have void return type
➤ No support for a variable number of arguments
➤ No support for static variables
➤ No support for function pointers
➤ Exhibit an asynchronous behavior


电子邮件地址不会被公开。 必填项已用*标注