Porting CUDA program from ArchLinux to Ubuntu 16.04

Today I ported a CUDA project from Arch Linux to Ubuntu 16.04, and this post records the pitfalls.

(0) Update cmake to newest version (follow this);

(1) Specify nvcc path in CMakeLists.txt:

SET(CMAKE_CUDA_COMPILER /usr/local/cuda-9.0/bin/nvcc)

otherwise, following error may generate:

No CMAKE_CUDA_COMPILER could be found.

(2) Since Ubuntu 16.04‘s default compiler is still gcc-5, install gcc-6 first, then pass gcc-6 as default compiler for nvcc:

SET(CMAKE_CUDA_FLAGS "-std=c++11 -ccbin gcc-6")

(3) Execute cmake command:


Not Locating CUDA Compiler;
CMake: How to pass mode dependent compile flags to nvcc in visual studio environment;
Tensorflow crashes on build on Ubuntu 16.04 when building for skylake (avx512).


CUDA P2P is not guaranteed to be faster than staged through the host

Today, I write a simple test to verify whether CUDA Peer-to-Peer Memory Copy is always faster than using CPU to transfer. At least from my platform, it is not:

(1) Disable P2P, you can see CPU utilization ratio is very high: 86.7%, and the bandwidth is nearly 10.67GB/s:

(2) Enable P2P, CPU utilization drops down to 1.3% only, and the bandwidth is about 1.6GB/s fall behind: 9.00GB/s:

The test file is here.


Import existing CUDA project into Nsight

The steps to import an existing CUDA project (who uses CMake) into Nsight are as following:

(1) Select File -> New -> CUDA C/C++ Project:


Untick “Use default location“, and select the root directory of your project.

(2) Change Build location in Properties to points to the Makefile position.


(3) After building successfully, right click project: Run As -> Local C/C++ Application, then select which binary you want to execute.

Setting Nsight to run with existing Makefile project;
How to create Eclipse project from CMake project;
How to change make location in Eclipse.