Resolve “Runtime Error” problem when hackinging in hackerrank

A few days ago, I was trying to resolve the classic maximum subarray conundrum in hackerrankand bumped into an interesting issue. When I submitted the solution, one testcase would fail. But if I ran the testcase individually, the result is right. After discussing in the IRC and Support, it seems the code didn’t pass some memory/time limit constraint in the environment, so I began to optimize my code.

My initial Go code is like this:

package main
import "fmt"
import "math"
import "os"

func MaxNonConArray(s []int) int {
    var max int

    if len(s) < 1 {
        return 0
    }

    for _, v := range s {
        if v > 0 {
            max += v
        }
    }

    if max == 0 {
        max = s[0]
        for _, v := range s[1:] {
            if v > max {
                max = v
            }
        } 
    }
    return max
}

func MaxConArray(s []int) int {
    var max int

    if len(s) > 0 {
        max = s[0]
        currMax := s[0]
        for _, v := range s[1:] {
            currMax = int(math.Max(float64(currMax+v), float64(v)))
            max = int(math.Max(float64(currMax), float64(max)))
        }
    }
    return max
}


func main() {
 //Enter your code here. Read input from STDIN. Print output to STDOUT
    num := 0
    s := [][]int(nil)

    _, err := fmt.Scanf("%d", &num)
    if err != nil {
        os.Exit(1)
    }

    s = make([][]int, num)
    for i := 0; i < len(s); i++ {
        n := 0
        _, err := fmt.Scanf("%d", &n)
        if err != nil {
            os.Exit(1)
        }

        s[i] = make([]int, n)
        for j := 0; j < n; j++ {
            _, err := fmt.Scanf("%d", &s[i][j])
            if err != nil {
                os.Exit(1)
            }
        }
    }

    for i := 0; i < len(s); i++ {
        fmt.Println(MaxConArray(s[i]), MaxNonConArray(s[i]))
    }
}

The main function would allocate a two-dimension slice to accommodate all the input elements. Suddenly I realized a one-dimension slice should be enough, and it could be reused after the wanted value was calculated. So the main code was changed like this:

func main() {
    //Enter your code here. Read input from STDIN. Print output to STDOUT
    var num int

    _, err := fmt.Scanf("%d", &num)
    if err != nil {
        os.Exit(1)
    }

    for i := 0; i < num; i++ {
        var n int
        _, err := fmt.Scanf("%d", &n)
        if err != nil {
            os.Exit(1)
        }

        s := make([]int, n)
        for j := 0; j < n; j++ {
            _, err := fmt.Scanf("%d", &s[j])
            if err != nil {
                os.Exit(1)
            }
        }
        fmt.Println(MaxConArray(s), MaxNonConArray(s))
    }
}

Unfortunately, this time the testcase still can’t pass. So I resorted to C programming language:

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>

int maxNonConSubArray(int *array, int n) {
    int max = 0;

    if (n > 0) {
        for (int i = 0; i < n; i++) {
            if (array[i] > 0) {
                max += array[i];
            }
        }

        if (max == 0) {
            max = array[0];
            for (int i = 1; i < n; i++) {
                if (array[i] > max) {
                    max = array[i];
                }
            }
        }
    }

    return max;
}

int MaxConSubArray(int *array, int n) {
    int max = 0, currMax = 0;

    if (n > 0) {
        max = currMax = array[0];
        for (int i = 1; i < n; i++) {
            currMax = (currMax + array[i]) > array[i] ? (currMax + array[i]) : array[i];
            max = max > currMax ? max : currMax;
        }
    }
    return max;
}

int main() {
    int num = 0;
    /* Enter your code here. Read input from STDIN. Print output to STDOUT */
    if (scanf("%d", &num) != 1) {
        return 1;
    }

    for (int i = 0; i < num; i++) {
        int n = 0, *array = NULL;

        if (scanf("%d", &n) != 1) {
            return 1;
        }

        array = calloc(n, sizeof(int));
        if (array == NULL) {
            return 1;
        }
        for (int j = 0; j < n; j++) {
            if (scanf("%d", array + j) != 1) {
                return 1;
            }
        }
        printf("%d %d\n",MaxConSubArray(array, n), maxNonConSubArray(array, n));
        free(array);
    }
    return 0;
}

In the C code, I need to manage memory manually, and this time, the testcase didn’t complain, and the submission was success.

Fix compile error: “fatal error: ‘libelf.h’ file not found”

Recently, when I try to build an open source project, I meet the following compile error:

fatal error: 'libelf.h' file not found
#include <libelf.h>
     ^
1 error generated.

The solution is installing elfutils-libelf-devel package:

sudo yum install elfutils-libelf-devel

Or:

sudo dnf install elfutils-libelf-devel

(You can also read this post on itechlounge.net)

A trick of building multithreaded application on Solaris

Firstly, Let’s see a simple multithreaded application:

#include <stdio.h>
#include <pthread.h>
#include <errno.h>

void *thread1_func(void *p_arg)
{
           errno = 0;
           sleep(3);
           errno = 1;
           printf("%s exit, errno is %d\n", (char*)p_arg, errno);
}

void *thread2_func(void *p_arg)
{
           errno = 0;
           sleep(5);
           printf("%s exit, errno is %d\n", (char*)p_arg, errno);
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread1_func, "Thread 1");
        pthread_create(&t2, NULL, thread2_func, "Thread 2");

        sleep(10);
        return;
}

What output do you expect from this program? Per my understanding, the errnoshould be a thread-safe variable. Though The thread1_func function changes theerrno, it should not affect errno in thread2_func function.

Let’s check it on Solaris 10:

bash-3.2# gcc -g -o a a.c -lpthread
bash-3.2# ./a
Thread 1 exit, errno is 1
Thread 2 exit, errno is 1

Oh! The errno in thread2_func function is also changed to 1. Why does it happen? Let’s find the root cause from the errno.h file:

/*
 * Error codes
 */

#include <sys/errno.h>

#ifdef  __cplusplus
extern "C" {
#endif

#if defined(_LP64)
/*
 * The symbols _sys_errlist and _sys_nerr are not visible in the
 * LP64 libc.  Use strerror(3C) instead.
 */
#endif /* _LP64 */

#if defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L
extern int *___errno();
#define errno (*(___errno()))
#else
extern int errno;
/* ANSI C++ requires that errno be a macro */
#if __cplusplus >= 199711L
#define errno errno
#endif
#endif  /* defined(_REENTRANT) || defined(_TS_ERRNO) */

#ifdef  __cplusplus
}
#endif

#endif  /* _ERRNO_H */

We can find the errno can be a thread-safe variable(#define errno (*(___errno()))) only when the following macros defined:

defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L

Let’s try it:

bash-3.2# gcc -D_POSIX_C_SOURCE=199506L -g -o a a.c -lpthread
bash-3.2# ./a
Thread 1 exit, errno is 1
Thread 2 exit, errno is 0

Yes, the output is right!

From Compiling a Multithreaded Application, we can see:

For POSIX behavior, compile applications with the -D_POSIX_C_SOURCE flag set >= 199506L. For Solaris behavior, compile multithreaded programs with the -D_REENTRANT flag.

So we should pay more attentions when building multithreaded application on Solaris.

P.S., the full code is here.

Reference:
(1) Compiling a Multithreaded Application;
(2) What is the correct way to build a thread-safe, multiplatform C library?

C programming tips in SPARC architecture

If you are a newbie of C programmers in SPARC architecture (For example, working on Solaris), you should pay attention to the following tips:

(1) By default, SPARC is big-endian (For Endianness, you can refer http://en.wikipedia.org/wiki/Endianness). It means for an integer (short, int, long, etc), the MSB will be stored in the lower address, while the LSB will be stored in the higher address.

(2) SPARC requires byte-alignment. It means for a short (2 bytes long) variable, the start address of the variable must be the multiples of 2, while a int (4 bytes long) variable, the start address of the variable must be the multiples of 4. If the address can’t satisfy this condition, the application will core dump, and a “Bus Error” will be reported. For this tip, you can refer Expert C Programming: Deep C Secrets (Bus Error section, page 163 ~ page 164).

For more SPARC information, you can refer:
http://en.wikipedia.org/wiki/SPARC;
SPARC Processor Issues.

An experience of fixing a memory-corruption bug

During the last 4 months, I was disturbed by a memory-corruption bug, and this bug will cause program crash. Until last Monday, I found the root cause and fixed it. This debug process is a difficult but memorable experience, so I will share it in this article.

My program works as a SMS Hub. When it receives a SMS, it will allocate a structure in heap memory like this:

typedef struct
{
......
int *a[8];
......
} info;

After processing the SMS, the program will free the memory, and send the SMS to the next Hub or Operator.

Since last November, the program will crash sometimes, and the cause is the second element in array a (a[1])will be changed from a valid value to NULL.

(1) Checking the logs and reproduced the bug
Firstly, I checked the commercial logs, but there were no clues can be found. And the SMS which killed program also seemed no different than others. I also tried to use this SMS to reproduce the bug in testbed, but also failed.

(2) Using libumem
Because our program runs in Solaris, I linked the libumem and hoped it can help me. After a few days, the program crashed again. But the tags before and after the corrupted memory are all OK, so it is not a memory off-bound bug. I also checked the memory before and after the corrupted memory, but nothing valuable can be found.

(3) Adding more logs
Until then, the only thing I can think is adding more logs. After adding enough logs, I found the variable is modified between functions in the same thread: when leaving the last function, the variable is OK, but entering the next function, the variable is changed. So I can make sure the variable is destroyed by another thread. But how can I find the murderer?

(4) Asking for help from other geeks
I began to ask for help from other geeks. I put this question on the stackoverflow: How to debug the memory is changed randomly issue, and received a very good and comprehensive answer. I recommended every one should read this post. I also sent emails to other geeks, most of them replied and gave some suggestions, such as exchange some variables definition sequences, etc. I also wanted to get the root cause from the core dump file directly: Can I know which thread change the global variable’s value from core dump file?, but at the end, I am failed. Until one day, I found an article (Aha, written in Chinese!)by accident, this article describes the bug is very similar to mine except his programming language is C++ and mine is C. I began to follow the steps he provided to find the root cause.

(5) Porting the program on Linux and use valgrind
I wanted to use valgrind to help find the bug, but my program runs on Solaris, and valgrind can’t be used on Solaris, so another colleague helped to ported it on Linux. After running valgrind, I did find some memory-related bugs, but these bugs can’t explain the cause of this issue. I still can’t find the root cause.

(6) Using electric-fence
I tried to use electric-fence, but the home page of electric-fence was closed. I found some packages and source codes from the internet, and tried to integrated them into my program. But this time, I also failed.

(7) Using mprotect function
Because the second element of the array is always changed to NULL, I used mprotect instead of malloc to allocate the memory of the array, and set read-only attribute of the memory. So if the memory is changed, the program will crash. But the mprotect will allocate a page size of the memory, and this may cause the program change the behavior, I am not sure whether the bug can occur again.

(8) Finding the cause
About 2 weeks later, when a colleague stopped the application, the application crashed again. The cause was a global variable was changed. I checked all the old core dumps immediately, and found the global variable was always changed, and pointed to the address of the array! It seemed I was very clear to the truth.

After about 2 days analysis, the cause was found: When a thread calls pthread_create to create another thread, it will changes a global variable. But in few cases, the child thread will execute firstly, and it will also changes the global variable. The code assumes the parent thread always execute firstly, and this will cause the program crash.

When looking back the 4-month experience, I have studied a lot of things for debugging this bug: libumem, valgrid, libefence, etc. It is a really memorable and cool experience!