Technology | Nan Xiao's Blog

Fix a weird “undefined symbol” issue

Today I ran into a weird “undefined symbol” issue: I built a program successfully on one machine, but after I transferred it to another machine, it reported the following error at run time:

    ......
    ... symbol lookup error: xxxxxx: undefined symbol: LZ4F_compressFrameBound

According to the output of ldd, all the libraries were present. After some debugging, I found that the build machine and the running machine had different versions of the librdkafka library. After upgrading librdkafka on the build machine to the same version as on the running machine and rebuilding, the issue was fixed.

The shortcut keys for perf-report command

I am not sure whether it is only me, but I could not find any documentation describing the shortcut keys of the perf-report command. After executing perf report, pressing h shows the list of shortcut keys.

To filter symbols, press /.

To remove the filter, press / followed by ENTER rather than q or ESC.

Otherwise you will exit the perf-report program, because the filtered-symbols screen is actually the main screen of perf report.

Leaked socket causes zmq_ctx_term() block forever

I ran into an issue where zmq_ctx_term() blocked forever:

#0  0x00007ffff33bdddd in poll () from /usr/lib64/libc.so.6
#1  0x00007ffff1519d1a in zmq::signaler_t::wait(int) () from /opt/lib/libzmq.so.5
#2  0x00007ffff1500915 in zmq::mailbox_t::recv(zmq::command_t*, int) () from /opt/lib/libzmq.so.5
#3  0x00007ffff14ef42d in zmq::ctx_t::terminate() () from /opt/lib/libzmq.so.5
......

After debugging, I found that the cause was a socket leak: an error path returned without closing its socket, and zmq_ctx_term() blocks until every socket opened in the context has been closed. So remember to call zmq_close() on every possible path.
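The cleanup discipline can be sketched without linking against libzmq. In the sketch below every name (fake_ctx, fake_socket_open, fake_socket_close, fake_ctx_can_terminate, send_leaky, send_fixed) is a hypothetical stand-in, not part of the ZeroMQ API: the "context" only counts open sockets, to model the rule that termination cannot complete while a socket is still open.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a ZeroMQ context: it only counts open sockets. */
struct fake_ctx {
    int open_sockets;
};

/* Stand-ins for zmq_socket() and zmq_close(). */
void fake_socket_open(struct fake_ctx *ctx)  { ctx->open_sockets++; }
void fake_socket_close(struct fake_ctx *ctx) { ctx->open_sockets--; }

/* Models the rule that context termination only completes
 * once every socket has been closed. */
bool fake_ctx_can_terminate(const struct fake_ctx *ctx)
{
    return ctx->open_sockets == 0;
}

/* Leaky version: the early return on error skips the close. */
int send_leaky(struct fake_ctx *ctx, bool fail)
{
    fake_socket_open(ctx);
    if (fail)
        return -1;              /* BUG: the socket is never closed */
    fake_socket_close(ctx);
    return 0;
}

/* Fixed version: every path leaves through the same cleanup label. */
int send_fixed(struct fake_ctx *ctx, bool fail)
{
    int rc = 0;

    fake_socket_open(ctx);
    if (fail) {
        rc = -1;
        goto out;
    }
    /* ... normal work would go here ... */
out:
    fake_socket_close(ctx);     /* runs on success and on error */
    return rc;
}
```

With real ZeroMQ code the same shape would apply: one cleanup label that calls zmq_close(), reached from both the success path and every error path.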

AddressSanitizer’s ChunkHeader

Recently, I came across the following core dump from libasan:

#0  0x00007fffe76c7387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fffe76c8a78 in __GI_abort () at abort.c:90
#2  0x00007ffff74c4582 in __sanitizer::Abort() () from /usr/lib64/libasan.so.6
#3  0x00007ffff74d012c in __sanitizer::Die() () from /usr/lib64/libasan.so.6
#4  0x00007ffff74af63c in __asan::ScopedInErrorReport::~ScopedInErrorReport() () from /usr/lib64/libasan.so.6
#5  0x00007ffff74ad989 in __asan::ReportMallocUsableSizeNotOwned(unsigned long, __sanitizer::BufferedStackTrace*) () from /usr/lib64/libasan.so.6
#6  0x00007ffff7418b82 in __asan::asan_malloc_usable_size(void const*, unsigned long, unsigned long) () from /usr/lib64/libasan.so.6
......

To debug this issue, I checked the libasan source code and found that a 16-byte ChunkHeader sits in front of the user memory and records information about the allocation:

class ChunkHeader {
 public:
  atomic_uint8_t chunk_state;
  u8 alloc_type : 2;
  u8 lsan_tag : 2;

  // align < 8 -> 0
  // else      -> log2(min(align, 512)) - 2
  u8 user_requested_alignment_log : 3;

 private:
  u16 user_requested_size_hi;
  u32 user_requested_size_lo;
  atomic_uint64_t alloc_context_id;
  ......
}

From user_requested_size_hi and user_requested_size_lo, libasan can calculate how much memory the user requested. If the resulting usable size is 0 and the check_malloc_usable_size flag is set, the error above is reported:

uptr asan_malloc_usable_size(const void *ptr, uptr pc, uptr bp) {
  if (!ptr) return 0;
  uptr usable_size = instance.AllocationSize(reinterpret_cast<uptr>(ptr));
  if (flags()->check_malloc_usable_size && (usable_size == 0)) {
    GET_STACK_TRACE_FATAL(pc, bp);
    ReportMallocUsableSizeNotOwned((uptr)ptr, &stack);
  }
  return usable_size;
}
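How the two size fields combine can be sketched in C. This is a simplified illustration that assumes a 64-bit target, with the 16-bit field holding the upper bits of the size and the 32-bit field the lower bits. The struct and function names (size_fields, used_size, split_size) are hypothetical and are not taken from libasan.

```c
#include <stdint.h>

/* Hypothetical mirror of the two size fields shown in ChunkHeader. */
struct size_fields {
    uint16_t user_requested_size_hi;  /* assumed: upper 16 bits of the size */
    uint32_t user_requested_size_lo;  /* assumed: lower 32 bits of the size */
};

/* Combine the two fields into one 48-bit size. */
uint64_t used_size(const struct size_fields *f)
{
    return ((uint64_t)f->user_requested_size_hi << 32) |
           f->user_requested_size_lo;
}

/* Split a size (below 2^48) back into the two fields. */
struct size_fields split_size(uint64_t size)
{
    struct size_fields f;

    f.user_requested_size_hi = (uint16_t)(size >> 32);
    f.user_requested_size_lo = (uint32_t)(size & 0xFFFFFFFFu);
    return f;
}
```

Under these assumptions, a header whose two fields are both zero yields a size of 0, which is one way to end up in the reporting branch shown above.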

The caveat of thread name length in glibc

Recently, our team hit an interesting bug: the process was configured to spawn 16 threads, but only 10 threads were actually running. The thread code looked like this:

static void *
stat_consumer_thread_run(void *data)
{
    stat_consumer_thread_t *thread = data;
    char thread_name[64];
    snprintf(thread_name, sizeof(thread_name), "stat.consumer.%d",
        thread->id);
    int rc = pthread_setname_np(pthread_self(), thread_name);
    if (rc != 0) {
        return NULL;
    }

    ......
    return NULL;
}

After checking the pthread_setname_np manual, we found:

The thread name is a meaningful C language string, whose length is restricted to 16 characters, including the terminating null byte ('\0').

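So the thread name is restricted to 16 characters, including the terminating null byte. “stat.consumer.0” through “stat.consumer.9” (15 characters each) are set successfully, but “stat.consumer.10” through “stat.consumer.15” (16 characters each) are not: pthread_setname_np returns an error (ERANGE, according to the manual), the thread function returns early, and those six threads never do any work.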
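One way to avoid this trap is to make sure the name fits before it is handed to pthread_setname_np. The following is a minimal sketch, not the fix our team necessarily applied; THREAD_NAME_MAX, thread_name_fits and format_thread_name are hypothetical names introduced here. The helper shortens the prefix rather than the numeric suffix, so that thread 10 does not end up with the same name as thread 1.

```c
#include <stdio.h>
#include <string.h>

/* Limit from the pthread_setname_np manual: 16 bytes including '\0'. */
#define THREAD_NAME_MAX 16

/* Returns 1 if the name can be set as is, 0 if it is too long. */
int thread_name_fits(const char *name)
{
    return strlen(name) < THREAD_NAME_MAX;
}

/* Builds "<prefix><id>" into out, which must hold THREAD_NAME_MAX bytes.
 * If the result would be too long, the prefix is shortened and the id
 * is kept intact. Returns 1 if nothing was cut, 0 otherwise. */
int format_thread_name(char *out, const char *prefix, int id)
{
    char suffix[12];  /* enough for any 32-bit int plus '\0' */
    int suffix_len = snprintf(suffix, sizeof(suffix), "%d", id);
    int room = THREAD_NAME_MAX - 1 - suffix_len;  /* bytes left for prefix */

    /* "%.*s" prints at most `room` characters of the prefix. */
    snprintf(out, THREAD_NAME_MAX, "%.*s%s", room, prefix, suffix);
    return (int)strlen(prefix) <= room;
}
```

The resulting buffer can then be passed to pthread_setname_np(pthread_self(), out). Since a thread name is mostly a debugging aid, it is probably also safer to log a failure of that call than to return from the thread function because of it.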