In general, the host cannot directly access device variables, and the device cannot directly access host variables. There is one exception to this rule: zero-copy memory. Both the host and device can access zero-copy memory.
GPU threads can directly access zero-copy memory. There are several advantages to using zero-copy memory in CUDA kernels, such as:
➤ Leveraging host memory when there is insufficient device memory
➤ Avoiding explicit data transfer between the host and device
➤ Improving PCIe transfer rates
When using zero-copy memory to share data between the host and device, you must synchronize memory accesses across the host and device. Modifying data in zero-copy memory from both the host and device at the same time will result in undefned behavior.
There are two common categories of heterogeneous computing system architectures: Integrated and discrete.
In integrated architectures, CPUs and GPUs are fused onto a single die and physically share main memory. In this architecture, zero-copy memory is more likely to benefit both performance and programmability because no copies over the PCIe bus are necessary.
For discrete systems with devices connected to the host via PCIe bus, zero-copy memory is advantageous only in special cases.
Because the mapped pinned memory is shared between the host and device, you must synchronize memory accesses to avoid any potential data hazards caused by multiple threads accessing the same memory location without synchronization.
Be careful to not overuse zero-copy memory. Device kernels that read from zero-copy memory can be very slow due to its high-latency.