这篇笔记摘自Professional CUDA C Programming

There are two fundamental types of parallelism in applications:
➤ Task parallelism
➤ Data parallelism
Task parallelism arises when there are many tasks or functions that can be operated independently and largely in parallel. Task parallelism focuses on distributing functions across multiple cores.

Data parallelism arises when there are many data items that can be operated on at the same time. Data parallelism focuses on distributing the data across multiple cores.

CUDA programming is especially well-suited to address problems that can be expressed as data parallel computations. Many applications that process large data sets can use a data-parallel model to speed up the computations. Data-parallel processing maps data elements to parallel threads.

There are two basic approaches to partitioning data:
➤ Block: Each thread takes one portion of the data, usually an equal portion of the data.
➤ Cyclic: Each thread takes more than one portion of the data.

简而言之,block就是按线程数等分数据,10个线程就把数据分成10份,一个线程处理一份;而cyclic则是数据的份数大于线程数,举个例子,10个线程把数据分成20份,第一个线程处理第111份,第二个线程处理第212份。。。。。。,循环处理多次。