What I learn from Practical Binary Analysis as a non-reverse-engineering engineer?

I spent the past two weeks in reading Practical Binary Analysis. Since I am not a professional reverse engineer, I glossed over the “Part III: Advanced Binary Analysis”, so I only read half the book. Even though, I still get a big gain:

(1) Know better of ELF file. On *nix Operating system, ELF file is everywhere: executable file, object file, shared library and coredump. “Chapter 2: The ELF format” gives me a clear explanation of the composition of ELF. E.g., I know why some functions have “@plt” suffix when using gdb to debug it.

(2) Master many tricks about GNU Binutils. GNU Binutils is a toolbox which provides versatile command line programs to analyze ELF files. Literally it relies heavily on BFD library. I also get some sense about how to use BFD library to tweak ELF files.

(3) “Appendix A: A crash course on X86 assembly” is a good tutorial for refreshing X86 architecture and assembly language.

(4) Others: E.g., I understand how to use LD_PRELOAD environmental variable and dynamic linking functions to manipulate shared library.

All in all, if you are working on *nix (although this book is based on Linux, I think most knowledge are also applicable to other *nix), you should try to read this book. I promise it is not a waste of time and you can always learn something, believe me!

The anatomy of ldd program on OpenBSD

In the past week, I read the ldd source code on OpenBSD to get a better understanding of how it works. And this post should also be a reference for other*NIX OSs.

The ELF file is divided into 4 categories: relocatable, executable, shared, and core. Only the executable and shared object files may have dynamic object dependencies, so the ldd only check these 2 kinds of ELF file:

(1) Executable.

ldd leverages the LD_TRACE_LOADED_OBJECTS environment variable in fact, and the code is as following:

if (setenv("LD_TRACE_LOADED_OBJECTS", "true", 1) < 0)
    err(1, "setenv(LD_TRACE_LOADED_OBJECTS)");

When LD_TRACE_LOADED_OBJECTS is set to 1 or true, running executable file will show shared objects needed instead of running it, so you even not needldd to check executable file. See the following outputs:

# /usr/bin/ldd
usage: ldd program ...
# LD_TRACE_LOADED_OBJECTS=1 /usr/bin/ldd
        Start            End              Type Open Ref GrpRef Name
        00000b6ac6e00000 00000b6ac7003000 exe  1    0   0      /usr/bin/ldd
        00000b6dbc96c000 00000b6dbcc38000 rlib 0    1   0      /usr/lib/libc.so.89.3
        00000b6d6ad00000 00000b6d6ad00000 rtld 0    1   0      /usr/libexec/ld.so  

(2) Shared object.

The code to print dependencies of shared object is as following:

if (ehdr.e_type == ET_DYN && !interp) {
    if (realpath(name, buf) == NULL) {
        printf("realpath(%s): %s", name,
            strerror(errno));
        fflush(stdout);
        _exit(1);
    }
    dlhandle = dlopen(buf, RTLD_TRACE);
    if (dlhandle == NULL) {
        printf("%s\n", dlerror());
        fflush(stdout);
        _exit(1);
    }
    _exit(0);
}

Why the condition of checking a ELF file is shared object or not is like this:

if (ehdr.e_type == ET_DYN && !interp) {
    ......
}

That’s because the file type of position-independent executable (PIE) is the same as shared object, but normally PIE contains a interpreter program header since it needs dynamic linker to load it while shared object lacks (refer this article). So the above condition will filter PIE file.

The dlopen(buf, RTLD_TRACE) is used to print dynamic object information. And the actual code is like this:

if (_dl_traceld) {
    _dl_show_objects();
    _dl_unload_shlib(object);
    _dl_exit(0);
}

In fact, you can also implement a simple application which outputs dynamic object information for shared object yourself:

#include <dlfcn.h>

int main(int argc, char **argv)
{
    dlopen(argv[1], RTLD_TRACE);
    return 0;
}

Compile and use it to analyze /usr/lib/libssl.so.43.2:

# cc lddshared.c
# ./a.out /usr/lib/libssl.so.43.2
    Start            End              Type Open Ref GrpRef Name
    000010e2df1c5000 000010e2df41a000 dlib 1    0   0      /usr/lib/libssl.so.43.2
    000010e311e3f000 000010e312209000 rlib 0    1   0      /usr/lib/libcrypto.so.41.1

The same as using ldd directly:

# ldd /usr/lib/libssl.so.43.2
/usr/lib/libssl.so.43.2:
    Start            End              Type Open Ref GrpRef Name
    00001d9ffef08000 00001d9fff15d000 dlib 1    0   0      /usr/lib/libssl.so.43.2
    00001d9ff1431000 00001d9ff17fb000 rlib 0    1   0      /usr/lib/libcrypto.so.41.1

Through the studying of ldd source code, I also get many by-products: such as knowledge of ELF file, linking and loading, etc. So diving into code is a really good method to learn *NIX deeper!