Invoke profile function in Nsight

When using Nsight to develop a CUDA program, you can use its Profile function to profile the program.


You can also toggle between the C/C++ and Profile perspectives in the right corner of the window.


BTW, if you only want to profile part of the program rather than the whole thing, surround that code with cudaProfilerStart() and cudaProfilerStop() (declared in cuda_profiler_api.h), then untick “Start execution with profiling enabled” in “Profile Configuration”.
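A minimal sketch of the partial-profiling approach. The kernel scale() and the helper run_profiled_region() are hypothetical and not taken from the original program: a warm-up launch runs outside the profiled range, and only the second launch sits between cudaProfilerStart() and cudaProfilerStop(). With profiling disabled at startup, only that second launch should be captured.

```cuda
// Hypothetical example: only the second launch of scale() is meant to be profiled.
// Build (file name assumed): nvcc -arch=sm_37 profile_region.cu -o profile_region
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>   // declares cudaProfilerStart() / cudaProfilerStop()

// Illustrative kernel: multiply every element by a factor.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Runs one warm-up launch outside the profiled range, then one launch inside it.
// Returns cudaSuccess if the allocation and both launches succeeded.
cudaError_t run_profiled_region(int n) {
    float* d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Not collected when the profiler was started with profiling disabled.
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaProfilerStart();                        // begin profiled region
    scale<<<blocks, threads>>>(d_data, 0.5f, n);
    cudaDeviceSynchronize();
    cudaProfilerStop();                         // end profiled region

    err = cudaGetLastError();
    cudaFree(d_data);
    return err;
}
```

When profiling from the command line instead of Nsight, nvprof's --profile-from-start off option plays the same role as unticking the checkbox, e.g. nvprof --profile-from-start off ./profile_region (binary name assumed).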

Import existing CUDA project into Nsight

The steps to import an existing CUDA project that uses CMake into Nsight are as follows:

(1) Select File -> New -> CUDA C/C++ Project.


Untick “Use default location” and select the root directory of your project.

(2) In the project Properties, change the Build location so that it points to the directory containing the Makefile.


(3) After the project builds successfully, right-click it and choose Run As -> Local C/C++ Application, then select the binary you want to execute.

References:
Setting Nsight to run with existing Makefile project;
How to create Eclipse project from CMake project;
How to change make location in Eclipse.

 

Don’t use “-G” compile option for profiling CUDA programs

I use Nsight as the IDE for developing CUDA programs, and I build the program from within it.


I use nvprof to measure the load and store efficiency of global memory accesses:

$ nvprof --devices 2 --metrics gld_efficiency,gst_efficiency ./cuHE_opt

................... CRT polynomial Terminated ...................

==1443== Profiling application: ./cuHE_opt
==1443== Profiling result:
==1443== Metric result:
Invocations  Metric Name     Metric Description                  Min      Max      Avg
Device "Tesla K80 (2)"
Kernel: gpu_cuHE_crt(unsigned int*, unsigned int*, int, int, int, int)
          1  gld_efficiency  Global Memory Load Efficiency    62.50%   62.50%   62.50%
          1  gst_efficiency  Global Memory Store Efficiency  100.00%  100.00%  100.00%
Kernel: gpu_crt(unsigned int*, unsigned int*, int, int, int, int)
          1  gld_efficiency  Global Memory Load Efficiency    39.77%   39.77%   39.77%
          1  gst_efficiency  Global Memory Store Efficiency  100.00%  100.00%  100.00%

But if I use nvcc to compile the program directly:

$ nvcc -arch=sm_37 cuHE_opt.cu -o cuHE_opt

nvprof then reports different results:

$ nvprof --devices 2 --metrics gld_efficiency,gst_efficiency ./cuHE_opt
......
................... CRT polynomial Terminated ...................

==1801== Profiling application: ./cuHE_opt
==1801== Profiling result:
==1801== Metric result:
Invocations  Metric Name     Metric Description                  Min      Max      Avg
Device "Tesla K80 (2)"
Kernel: gpu_cuHE_crt(unsigned int*, unsigned int*, int, int, int, int)
          1  gld_efficiency  Global Memory Load Efficiency   100.00%  100.00%  100.00%
          1  gst_efficiency  Global Memory Store Efficiency  100.00%  100.00%  100.00%
Kernel: gpu_crt(unsigned int*, unsigned int*, int, int, int, int)
          1  gld_efficiency  Global Memory Load Efficiency    50.00%   50.00%   50.00%
          1  gst_efficiency  Global Memory Store Efficiency  100.00%  100.00%  100.00%

After some investigation, I found the cause: the first build used the -G compile option. As the nvcc documentation says:

--device-debug (-G)
    Generate debug information for device code. Turns off all optimizations.
    Don't use for profiling; use -lineinfo instead.

So don’t use the -G compile option when profiling CUDA programs; if you need source-line information in the profiler, compile with -lineinfo instead.
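As a safety net against profiling a debug build by accident, a source file can detect -G itself. This is a minimal sketch assuming that nvcc defines the __CUDACC_DEBUG__ macro when compiling device code in device-debug mode (the nvcc manual lists it among its predefined macros; verify this for your CUDA version). The kernel report_device_debug() and the helper device_code_is_debug_build() are hypothetical names. The check runs inside a kernel so that it reflects how the device code was compiled, which is what affects metrics such as gld_efficiency.

```cuda
// Debug build (not for profiling): nvcc -arch=sm_37 -G        app.cu -o app
// Profiling build:                 nvcc -arch=sm_37 -lineinfo app.cu -o app
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: __CUDACC_DEBUG__ is defined by nvcc under -G.
#ifdef __CUDACC_DEBUG__
#warning "Built with -G: device optimizations are off, profiler metrics will be misleading"
#endif

// Writes 1 if this device code was compiled in device-debug (-G) mode, else 0.
__global__ void report_device_debug(int* flag) {
#ifdef __CUDACC_DEBUG__
    *flag = 1;
#else
    *flag = 0;
#endif
}

// Hypothetical helper: asks the GPU how its code was compiled.
// Returns 1 for a -G build, 0 otherwise, or -1 if a CUDA call failed.
int device_code_is_debug_build() {
    int* d_flag = nullptr;
    int h_flag = -1;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;

    report_device_debug<<<1, 1>>>(d_flag);
    if (cudaGetLastError() != cudaSuccess) {       // launch failed
        cudaFree(d_flag);
        return -1;
    }
    // cudaMemcpy waits for the kernel on the default stream before copying.
    if (cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_flag = -1;
    cudaFree(d_flag);

    if (h_flag == 1)
        std::printf("Warning: device code built with -G; rebuild with -lineinfo before profiling.\n");
    return h_flag;
}
```

-lineinfo only adds line-number information for device code and leaves optimizations on, which is why the nvcc help text recommends it over -G for profiling.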

Enable C++11 support for NVCC compiler in Nsight

When using Nsight as the IDE for developing CUDA programs, you may find that the program requires C++11 support; without it, errors like this occur:

/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/include/c++/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
make: *** [src/subdir.mk:20: src/cuHE_opt.o] Error 1

To enable C++11 support, configure the project as follows:
(1) Right-click the project and select the last item, Properties.


(2) Check Settings->Tool Settings->Code Generation->Enable C++11 support (-std=c++11).

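For reference, here is a minimal hypothetical source file that hits exactly this kind of error without C++11: it includes <chrono> and uses auto, both C++11 features. With GCC 5.x as the host compiler (its default dialect is gnu++98), it should only compile when -std=c++11 is passed, which is the flag the Nsight checkbox adds. The file name, the kernel fill(), and the helper time_fill_ms() are illustrative assumptions.

```cuda
// Build (file name assumed): nvcc -arch=sm_37 -std=c++11 timing.cu -o timing
// Without -std=c++11, including <chrono> under GCC 5 stops with the
// "#error This file requires compiler and library support for the ISO C++ 2011 standard".
#include <chrono>
#include <cuda_runtime.h>

// Illustrative kernel: write the same value to every element.
__global__ void fill(int* out, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Times one kernel launch on the host with std::chrono (a C++11 facility).
// Returns elapsed milliseconds, or -1.0 if a CUDA call failed.
double time_fill_ms(int n) {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1.0;

    auto start = std::chrono::steady_clock::now();   // 'auto' type deduction is C++11 too
    fill<<<(n + 255) / 256, 256>>>(d_out, 42, n);
    cudaError_t err = cudaDeviceSynchronize();       // wait for the kernel to finish
    if (err == cudaSuccess) err = cudaGetLastError(); // also catch launch errors
    auto stop = std::chrono::steady_clock::now();

    cudaFree(d_out);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Host-side timing like this includes launch overhead; it is only here to pull in a C++11 header, not as a recommended way to time kernels.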