CUDA Programming Notes (16): Shared Memory

Posted by nanxiao on January 3, 2017 in CUDA Programming Notes

These notes are excerpted from Professional CUDA C Programming.

Global memory is large, on-board memory and is characterized by relatively high latencies. Shared memory is smaller, low-latency on-chip memory that offers much higher bandwidth than global memory. You can think of it as a program-managed cache. Shared memory is generally useful as:
➤ An intra-block thread communication channel
➤ A program-managed cache for global memory data
➤ Scratch pad memory for transforming data to improve global memory access patterns
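
To make the first two uses concrete, here is a minimal sketch (not taken from the book) in which each thread stages one element in shared memory and then reads an element written by a different thread of the same block. The kernel name `reverseInBlock`, the host helper `runReverse`, and the block size `N = 64` are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 64;  // assumed block size for this sketch

// Each thread copies one element into shared memory, then reads the
// element staged by its mirror thread in the same block.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[N];      // statically sized, one copy per block
    int t = threadIdx.x;
    tile[t] = data[t];           // global -> shared
    __syncthreads();             // every write must land before any read
    data[t] = tile[N - 1 - t];   // shared -> global, in reversed order
}

// Hypothetical host helper: runs the kernel on 0..N-1 and reports
// whether the result came back reversed.
bool runReverse()
{
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, N>>>(d);

    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (h[i] != N - 1 - i) return false;
    return true;
}

int main()
{
    printf("%s\n", runReverse() ? "reversed" : "failed");
    return 0;
}
```

The `__syncthreads()` barrier is what makes the exchange safe: without it, a thread could read `tile[N - 1 - t]` before the owning thread has written it.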

Shared memory is partitioned among all resident thread blocks on an SM; therefore, shared memory is a critical resource that limits device parallelism. The more shared memory used by a kernel, the fewer possible concurrently active thread blocks.
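
One way to observe this trade-off is to ask the runtime how many blocks of a kernel can be resident on one SM for different per-block shared memory requests. The sketch below uses `cudaOccupancyMaxActiveBlocksPerMultiprocessor` from the CUDA runtime; the kernel `scaleViaShared`, the helper `activeBlocksPerSM`, the block size of 128, and the sizes tried are assumptions for illustration, and the numbers printed depend on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel that uses dynamically sized shared memory.
__global__ void scaleViaShared(float *data)
{
    extern __shared__ float buf[];   // size set by the third launch parameter
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = data[i];
    __syncthreads();
    data[i] = 2.0f * buf[threadIdx.x];
}

// Hypothetical helper: resident blocks per SM for a given block size and
// dynamic shared memory request, or -1 if the query fails.
int activeBlocksPerSM(int blockSize, size_t dynSmemBytes)
{
    int numBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, scaleViaShared, blockSize, dynSmemBytes);
    return err == cudaSuccess ? numBlocks : -1;
}

int main()
{
    const int blockSize = 128;  // assumed block size
    const size_t sizes[] = {0, 8 * 1024, 16 * 1024, 32 * 1024};

    for (size_t bytes : sizes) {
        printf("%6zu bytes of shared memory per block -> %d blocks per SM\n",
               bytes, activeBlocksPerSM(blockSize, bytes));
    }
    return 0;
}
```

On a typical device the reported block count stays flat or falls as the request grows, since every resident block needs its own slice of the SM's shared memory.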



