Beware of out-of-boundary access of array

Today my colleague fixed one bug related to out-of-boundary access of array: a hash function returns the selected index of the array, but the hash function’s return value is int, so in corner case, when the hash value is overflow, it can become negative, and this will cause access an invalid element of the array. The lessons I learnt from this bug:
(1) Review the return value of hash function;
(2) Pay attention to the index when accessing array, is it possible to cause out-of-boundary access?

The caveat of thread name length in glibc

Recently, our team met an interesting bug: the process is configured to spawn 16 threads, but only spawns 10 threads in reality. The thread code is like this:

static void *
stat_consumer_thread_run(void *data)
{
    stat_consumer_thread_t *thread = data;
    char thread_name[64];
    snprintf(thread_name, sizeof(thread_name), "stat.consumer.%d",
        thread->id);
    int rc = pthread_setname_np(pthread_self(), thread_name);
    if (rc != 0) {
        return NULL;
    }

    ......
    return NULL;
}

After checking pthread_setname_np manual, we found:

The thread name is a meaningful C language string, whose length is restricted to 16 characters, including the terminating null byte (’\0’).

So thread name is restricted to 16 characters, “stat.consumer.0” ~ “stat.consumer.9” are set successfully, but “stat.consumer.10” ~ “stat.consumer.15” are not, and the corresponding threads are failed to run.

Bisection assert is a good debug methodology

Recently, I fixed an issue which is related to uninitialised bit-field in C programming language. Because the bit-filed can be either 0 or 1, so the bug will occur randomly. But the good news is the reproduced rate is very high, nearly 50%. Though I am not familiar with the code, I used bisection assert to help:

 {
  ......
  assert(bit-field == 0);
  ......
  assert(bit-field == 0);
  ......
 }

If the first assert is not triggered, but the second one is, I can know which code block has the bug, then bisect code and add assert again, until the root cause is found.

An issue related to uninitialised memory

Today I met an interesting bug: A C program behaved differently between debug (gcc -O0) and release (gcc -O3) modes.

First of all, I compared the logs between two modes, and pinned down in which function, the logs began to diverge.

Secondly, I used gdb to debug two programs simultaneously, and checked the variables’ values, then found a variable which had disparate values that would cause two programs enter different branches in a if-else statement. Hmm, this was the root cause.

My gut feeling was the release mode program may fetch the staled value, but after reviewing code carefully, I found the reason is one block memory (the variable belonged to) allocated from heap was not initialised, so this will introduce notorious “undefined behaviour”.

As far as I know, the reasons for uninitialising variables:
(1) The programmer forgets;
(2) The programmer reckons the variable will be assigned correct value before use, and there may be performance penalty for initialising a block of memory.
Anyway, the lesson I learnt today is unless you are 100% sure it will be OK to uninitialise the specified variable, otherwise please initialise it, and this can save you several hours in the future.

Beware of using GNU libc basename() function

From the manual page, we know there are two versions of basename() implementation. One for POSIX-compliant:

#include <libgen.h>

char *basename(char *path);

Another for GNU version:

#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <string.h>

But the manual doesn’t mention that the prototype type of GNU version is different from POSIX-compliant one (The parameter type is const char*, not char*):

char *basename (const char *__filename)

And the implementation is also simple, just invokes strrchr():

char *
__basename (const char *filename)
{
  char *p = strrchr (filename, '/');
  return p ? p + 1 : (char *) filename;
}