December, 2018 | Nan Xiao's Blog

Update keyring first if your Arch Linux is old enough

My Arch Linux is not updated for nearly 3 months. When running pacman -Syu, it prompts following errors:

$ sudo pacman -Syu
......
error: python-dnspython: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/python-dnspython-1.16.0-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python-distlib: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/python-distlib-0.2.8-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python-pytoml: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/python-pytoml-0.1.20-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python2-distlib: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/python2-distlib-0.2.8-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python2-dnspython: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/python2-dnspython-1.16.0-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python2-pytoml: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/python2-pytoml-0.1.20-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: qbittorrent: signature from "Eli Schwartz <eschwartz@archlinux.org>" is unknown trust
:: File /var/cache/pacman/pkg/qbittorrent-4.1.5-1-x86_64.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: failed to commit transaction (invalid or corrupted package)
Errors occurred, no packages were upgraded.
......

The solution is updating archlinux-keyring first:

$ sudo pacman -S archlinux-keyring

Then all goes well!

Modifying memory pool helps me find a hidden bug

My project has a CUDA memory pool which uses C++‘s std::queue. Allocating from the head of queue:

ptr = q.front();
q.pop();

While freeing memory insert it into the tail of queue:

q.push(ptr);

I changed the implementation from std::queue to std::deque. Both allocating and freeing all occur in the front of queue:

ptr = q.front();
q.pop_front();
......
q.push_front(ptr);

This modification helps me find a hidden bug which is releasing memory early. In origin code, the memory is inserted at the end of queue. So there is a interval between it is reused by other threads and current thread, and the work can still be done correctly as long as it is not reused by others. But after using std::deque, the memory is immediately used by other threads, which disclose the bug.

Beware of synchronizing steam when using “default-stream per-thread” in CUDA

Yesterday, I refactored a project through adding”--default-stream per-thread” option to improve its performance. Unfortunately, program will crash in cudaMemcpy:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f570d3eb7f0 in ?? () from /usr/lib/libcuda.so.1
[Current thread is 1 (Thread 0x7f5620fa1700 (LWP 31206))]
(gdb) bt
#0  0x00007f570d3eb7f0 in ?? () from /usr/lib/libcuda.so.1
#1  0x00007f570d45ffef in ?? () from /usr/lib/libcuda.so.1
#2  0x00007f570d3bff90 in ?? () from /usr/lib/libcuda.so.1
#3  0x00007f570d3198d5 in ?? () from /usr/lib/libcuda.so.1
#4  0x00007f570d319da7 in ?? () from /usr/lib/libcuda.so.1
#5  0x00007f570d21d665 in ?? () from /usr/lib/libcuda.so.1
#6  0x00007f570d21de08 in ?? () from /usr/lib/libcuda.so.1
#7  0x00007f570d352455 in cuMemcpy_ptds () from /usr/lib/libcuda.so.1
#8  0x00007f570ee1b0f9 in cudart::driverHelper::memcpyDispatch(void*, void const*, unsigned long, cudaMemcpyKind, bool) ()
   from /home/xiaonan/DSI_cuRlib_v2.0/build/src/libtest.so
#9  0x00007f570ede70f9 in cudart::cudaApiMemcpy_ptds(void*, void const*, unsigned long, cudaMemcpyKind) () from /home/xiaonan/DSI_cuRlib_v2.0/build/src/libtest.so
#10 0x00007f570ee2772b in cudaMemcpy_ptds ()
   from /home/xiaonan/DSI_cuRlib_v2.0/build/src/libtest.so  
......

After reading GPU Pro Tip: CUDA 7 Streams Simplify Concurrency and How to Overlap Data Transfers in CUDA C/C++ carefully, I found the root cause. Because in my program, the CUDA memory is allocated through cudaMalloc (not unified memory), I also need synchronizing stream, like this:

cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyDefault);  
cudaStreamSynchronize(0);

Small tips of optimizing CUDA programs

In this post, I will introduce some tips which can improve CUDA programs’ performance:

(1) Use “pool” to cache resources.

“Pool” is a common data structure which can be used in boosting performance (You can refer my another post which introduces “pool” specially). From my experience, using memory pool to avoid allocating/freeing CUDA memory frequently is a very effective trick. The other resource I want to cache is CUDAstream. Yes, since CUDA 7, you can use --default-stream per-thread compile option to enable a “regular” stream for every host thread, but if you want to use multiple streams in one thread, a “stream” pool may be a choice.

P.S., this is my implementation of memory pool.

(2) Batch processing in stream.

The effect of tip is to reduce synchronizing stream. I.e.:

kernel_1< , , , st>();

kernel_2< , , , st>();

cudaStreamSynchronize(st);

instead of:

kernel_1< , , , st>();
cudaStreamSynchronize(st);

kernel_2< , , , st>();
cudaStreamSynchronize(st);

(3) Use Peer-to-Peer communication.

In most cases, the Peer-to-Peer communication among devices should be faster than using host as an agent, but it is not “absolute truth” (You can refer this post).

These tips are just my own ideas, and you should test and pick appropriate ones for your own application environment.

Two practical software engineering rules

There are so many huge books which introduce software engineering, and in this article, I want to share two practical rules which are based on my own experience.

(1) No fear of refactoring

As time goes on, refactoring code is inevitable: the original design can’t handle current situation seamlessly; we can use the new characteristics of programming language to polish existed code, etc. Since refactoring code is time-consuming, risking, and costly, many companies are reluctant to do it for some reasons. Whereas the refactoring is beneficial to both company and engineers literally.

For companies: After refactorig, the code should become more reasonable and easier-maintainable, and the consequence is that it will save you much time and cost to add new features. For engineers: refactoring code can let you be more familiar with the the code logic, try using new characteristics of programming language and practice module design skills, and it is a precious opportunity to enrich yourself. So in the long run, refactoring code is a win-win situation actually. ( If the software quality becomes worse, oh boy! Don’t refactor it!)

(2) “Real” peer-to-peer code review

I haven’t experienced pair-programming, but took part in many “fake” peer-to-peer code review: before reviewing, the reviewer didn’t read code before. During the reviewing, the code author needed to spell out what was the intention of this code, then the reviewer would analyze the code on the spot. It seemed the reviewer and code author were very busy in the review meeting, but in fact it was a totally time-wasting and inefficient!

From my viewpoint, there should be two maintainers for any software module, and the two maintainers have the same familiarity of code. No matter adding a new big feature or just fixing a small bug, the two maintainers should co-work the whole design flow in advance, then if the task is small, one maintainer can take over the whole work, otherwise they can share it. Since everybody has took part in the discussion before, he/she can review partner’s code alone. This method can avoid misleading by code author, saving time, and finding bug efficiently. The potential benefit for company is if one guy resigns, there is no loss because there is always another engineer who is an genuine backup.

These two rules seem feasible? Why not give them a shot?