CUDA Programming Notes (1): Parallelism

These notes are excerpted from Professional CUDA C Programming.

There are two fundamental types of parallelism in applications:
➤ Task parallelism
➤ Data parallelism
Task parallelism arises when there are many tasks or functions that can be operated independently and largely in parallel. Task parallelism focuses on distributing functions across multiple cores.

Data parallelism arises when there are many data items that can be operated on at the same time. Data parallelism focuses on distributing the data across multiple cores.
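
To make the distinction concrete, here is a minimal sketch in Python (not CUDA). The functions word_count, char_count and square are hypothetical names made up for this illustration, and the thread pool only mirrors the structure of the two models: in a real CUDA program the data items would be mapped to GPU threads instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper functions, invented for this illustration only.
def word_count(text):
    return len(text.split())

def char_count(text):
    return len(text)

def square(x):
    return x * x

text = "data parallelism maps data elements to threads"
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Task parallelism: two *different* functions run as independent tasks.
    f_words = pool.submit(word_count, text)
    f_chars = pool.submit(char_count, text)
    task_results = (f_words.result(), f_chars.result())

    # Data parallelism: the *same* function is applied to many data items.
    data_results = list(pool.map(square, numbers))

print(task_results)
print(data_results)
```

The task-parallel part distributes functions across workers, while the data-parallel part distributes the elements of numbers across workers.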

CUDA programming is especially well-suited to address problems that can be expressed as data parallel computations. Many applications that process large data sets can use a data-parallel model to speed up the computations. Data-parallel processing maps data elements to parallel threads.

There are two basic approaches to partitioning data:
➤ Block: Each thread takes one portion of the data, usually an equal portion of the data.
➤ Cyclic: Each thread takes more than one portion of the data.

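In short, block partitioning divides the data evenly by the number of threads: with 10 threads the data is split into 10 portions, and each thread processes one portion. With cyclic partitioning, the number of portions is larger than the number of threads. For example, if 10 threads split the data into 20 portions, the first thread processes portions 1 and 11, the second thread processes portions 2 and 12, and so on, so each thread makes several passes over the data.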
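
The two schemes come down to different index arithmetic. The sketch below is plain Python rather than CUDA, and block_partition and cyclic_partition are names invented here, not taken from the book. It treats each of the 20 portions as a single item and uses zero-based indices, so "portions 1 and 11" in the text become indices 0 and 10.

```python
def block_partition(n_items, n_threads, tid):
    """Indices owned by thread `tid` when each thread gets one contiguous chunk."""
    chunk = (n_items + n_threads - 1) // n_threads  # ceiling division
    start = tid * chunk
    return list(range(min(start, n_items), min(start + chunk, n_items)))

def cyclic_partition(n_items, n_threads, tid):
    """Indices owned by thread `tid` when items are dealt out round-robin."""
    return list(range(tid, n_items, n_threads))

n_items, n_threads = 20, 10
for tid in range(2):
    print(tid,
          block_partition(n_items, n_threads, tid),
          cyclic_partition(n_items, n_threads, tid))
```

With 20 items and 10 threads, thread 0 owns indices [0, 1] under block partitioning and [0, 10] under cyclic partitioning. In both schemes every index is owned by exactly one thread.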

What is CUDA?

This article explains what CUDA is:

CUDA® is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

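CUDA is a parallel computing platform and programming model provided by NVIDIA that lets programs make better use of the GPU. The following passage gives a good explanation of what "GPU computing" is: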

Using high-level languages, GPU-accelerated applications run the sequential part of their workload on the CPU – which is optimized for single-threaded performance – while accelerating parallel processing on the GPU. This is called “GPU computing.”

CUDA Zone showcases the products that CUDA provides; NVML is among them.

Introduction to GNU Parallel

GNU parallel runs shell commands in parallel. Here is a simple example:

# ls
a  b  c
# cat a
aaaa

# cat b
bbbb

# cat c
cccc

# parallel cat ::: *
aaaa

bbbb

cccc

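The example above prints the contents of the three files a, b and c. ::: tells GNU parallel to read its arguments from the command line instead of stdin, and the shell expands * into the names of the files in the current directory.

Here is another example: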

# parallel sleep {}\; echo {} ::: 2 1 4 3
1
2
3
4
# parallel -k sleep {}\; echo {} ::: 2 1 4 3
2
1
4
3

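Normally, parallel prints a job's output as soon as that job finishes, so the output appears in completion order. The -k option keeps the output in the same order as the input. {} is replaced with the input line.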
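
The difference between completion order and input order can be reproduced in Python as an analogy. This is only a sketch of the idea, not how GNU parallel is implemented. The job function is hypothetical, and the sleep times are scaled down by a factor of ten so the example finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def job(seconds):
    """Hypothetical job: sleep, then return the input (like `sleep {}; echo {}`)."""
    time.sleep(seconds * 0.1)  # scaled down from whole seconds
    return seconds

inputs = [2, 1, 4, 3]

# Like plain `parallel`: collect results as each job finishes.
with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
    futures = [pool.submit(job, s) for s in inputs]
    completion_order = [f.result() for f in as_completed(futures)]

# Like `parallel -k`: results come back in the order of the inputs.
with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
    input_order = list(pool.map(job, inputs))

print(completion_order)
print(input_order)
```

The first list depends on timing and should normally come out sorted by sleep time, while the second list always matches the order of inputs.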

GNU parallel does not mix the output of different jobs together. Compare the output of the following two commands:

# traceroute foss.org.my & traceroute debian.org & traceroute freenetproject.org & wait
[1] 4920
[2] 4921
[3] 4922
traceroute to debian.org (149.20.20.20), 30 hops max, 60 byte packets
traceroute to freenetproject.org (80.68.94.117), 30 hops max, 60 byte packets
foss.org.my: Name or service not known
Cannot handle "host" cmdline arg `foss.org.my' on position 1 (argc 1)
[1]   Exit 2                  traceroute foss.org.my
 1  16.187.248.2 (16.187.248.2)  1.705 ms  1.709 ms  2.067 ms
 1  16.187.248.2 (16.187.248.2)  1.315 ms  1.314 ms  1.618 ms
......

# parallel traceroute ::: foss.org.my debian.org freenetproject.org
foss.org.my: Name or service not known
Cannot handle "host" cmdline arg `foss.org.my' on position 1 (argc 1)
traceroute to debian.org (140.211.15.34), 30 hops max, 60 byte packets
 1  16.187.248.2 (16.187.248.2)  1.871 ms  1.857 ms  2.151 ms
......
traceroute to freenetproject.org (80.68.94.117), 30 hops max, 60 byte packets
 1  16.187.248.2 (16.187.248.2)  2.175 ms  2.471 ms  2.471 ms
 2  16.160.221.81 (16.160.221.81)  0.456 ms  0.463 ms  0.463 ms
......
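
The grouping behaviour can be pictured as giving every job a private buffer that is emitted in one piece after the job ends. The Python sketch below is an analogy under that assumption, not GNU parallel's actual implementation, and job and run_grouped are names invented for the illustration.

```python
import io
import threading
import time

def job(name, out):
    """Write three lines to `out`, pausing between them like a slow command."""
    for i in range(3):
        out.write(f"{name} line {i}\n")
        time.sleep(0.01)

def run_grouped(names):
    """Run one thread per name; each job's output is kept together as one chunk."""
    chunks = []
    lock = threading.Lock()

    def worker(name):
        buf = io.StringIO()          # private buffer for this job
        job(name, buf)
        with lock:                   # emit the whole buffer at once
            chunks.append(buf.getvalue())

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return chunks

chunks = run_grouped(["a", "b", "c"])
print("".join(chunks), end="")
```

The jobs overlap in time, yet each chunk contains lines from a single job only. The order of the chunks can vary between runs, just as the order of jobs does without -k.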

References:
GNU Parallel: The Command-Line Power Tool
GNU Parallel manual