Notice the linking library position on Ubuntu

This week, I ported tcpbench from OpenBSD to Linux. The idiomatic method of OpenBSD is putting the linking library in front of generating final target:

cc -g -O2 -Wall -levent -o tcpbench tcpbench.c

However this doesn’t work in Ubuntu since its the linker uses --as-needed option. So I change the Makefile to put the library at the end:

cc -g -O2 -Wall -o tcpbench tcpbench.c -levent

Please refer this discussion if you are interested.

Use epoll in multiple-thread programming

epoll provides a simple but high-efficient polling mechanism:

(1) epoll_create1 creates a epoll instance;
(2) epoll_ctl modifies the file descriptors in epoll instance;
(3) epoll_wait is used to wait I/O events.

Moustique shows a method of using epoll in multiple-thread program:

auto event_loop_fn = [listen_fd, conn_handler] {

    int epoll_fd = epoll_create1(0);

    ......
    epoll_ctl(listen_fd, EPOLLIN | EPOLLET);

    const int MAXEVENTS = 64;
    ......

    // Event loop.
    epoll_event events[MAXEVENTS];
    while (true)
    {
      int n_events = epoll_wait (epoll_fd, events, MAXEVENTS, -1);
      ......
    }
}

Every thread has its own epoll instance, and monitors the listen_fd. When a new connection is established, a dedicated thread will serve it. Since every thread has its own epoll instance and events, this will eliminate synchronization among threads.

If you want multiple threads using the same epoll instance, I think this topic can be a reference.

Use “cat” to concatenate files

cat is a neat command to concatenate files on Unix (please see this post). Let’s see some examples:

# cat 1.txt
1
# cat 2.txt
2
# cat 1.txt 2.txt > new.txt
# cat new.txt
1
2
# cat 1.txt 2.txt >> new.txt
# cat new.txt
1
2
1
2

Please notice if the output file is also the input file, the input file content will be truncated first:

# cat 1.txt
1
# cat 2.txt
2
# cat 1.txt 2.txt > 1.txt
# cat 1.txt
2
# cat 2.txt
2

So the correct appending file method is using >>:

# cat 1.txt
1
# cat 2.txt
2
# cat 2.txt >> 1.txt
# cat 1.txt
1
2
# cat 2.txt
2

Check the implementation of cat in OpenBSD, and the core parts are iterating files and reading content from them:

(1) Iterate every file (raw_args):

void
raw_args(char **argv)
{
    int fd;

    fd = fileno(stdin);
    filename = "stdin";
    do {
        if (*argv) {
            if (!strcmp(*argv, "-"))
                fd = fileno(stdin);
            else if ((fd = open(*argv, O_RDONLY, 0)) < 0) {
                warn("%s", *argv);
                rval = 1;
                ++argv;
                continue;
            }
            filename = *argv++;
        }
        raw_cat(fd);
        if (fd != fileno(stdin))
            (void)close(fd);
    } while (*argv);
}

You need to pay attention that cat use - to identify stdin.

(2) Read content from every file (raw_cat):

void
raw_cat(int rfd)
{
    int wfd;
    ssize_t nr, nw, off;
    static size_t bsize;
    static char *buf = NULL;
    struct stat sbuf;

    wfd = fileno(stdout);
    if (buf == NULL) {
        if (fstat(wfd, &sbuf))
            err(1, "stdout");
        bsize = MAXIMUM(sbuf.st_blksize, BUFSIZ);
        if ((buf = malloc(bsize)) == NULL)
            err(1, "malloc");
    }
    while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
        for (off = 0; nr; nr -= nw, off += nw)
            if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 ||
                 nw == -1)
                err(1, "stdout");
    if (nr < 0) {
        warn("%s", filename);
        rval = 1;
    }
}

a) When reading the first file, the cat command uses fstat to get the st_blksize attribute of stdout which is “optimal blocksize for I/O”, then allocates the memory:

    ......
    if (buf == NULL) {
        if (fstat(wfd, &sbuf))
            err(1, "stdout");
        bsize = MAXIMUM(sbuf.st_blksize, BUFSIZ);
        if ((buf = malloc(bsize)) == NULL)
            err(1, "malloc");
    }
    ......

b) Read the content of file and write it into stdout:

    ......
    while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
        for (off = 0; nr; nr -= nw, off += nw)
            if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 ||
                 nw == -1)
                err(1, "stdout");
    ......

When read returns 0, it means reaching the end of file. If write doesn’t return the number you want to write, it is also considered as an error.

Benchmark C++ ifstream and mmap

After reading Which is fastest: read, fread, ifstream or mmap?, I try to benchmark C++ ifstream and mmap myself.

The test file is number.txt, and the size is 4GiB:

# ls -alt number.txt
-rw-r--r-- 1 root root 4294967296 Apr  2 13:51 number.txt

The test_ifstream.cpp is like this:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

char chunk[1048576];

int main(void)
{
    std::ifstream ifs(FILE_NAME, std::ios_base::binary);
    if (!ifs) {
        std::cerr << "Error opeing " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opeing " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        unsigned long long res = 0;
        ifs.seekg(0);
        auto begin = std::chrono::system_clock::now();

        for (size_t j = 0; j < 4096; j++) {
            ifs.read(chunk, sizeof(chunk));
            for (size_t k = 0; k < sizeof(chunk); k++) {
                res += chunk[k];
            }
        }
        ofs << res;

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;
    return 0;
}

The program reads 1MiB(1024 * 1024 = 1048576) every time (the total count is 4096). Use -O2 optimization:

# clang++ -O2 test_ifstream.cpp -o test_ifstream
# ./test_ifstream
57208
57085
57061
57105
57069
Average exec time: 57105

The average execution time is 57105 ms. From the htop output:

1

We can see test_ifstream occupies very little memory.

The following is test_mmap file:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>


const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

int main(void)
{
    int fd = ::open(FILE_NAME.c_str(), O_RDONLY);
    if (fd < 0) {
        std::cerr << "Error opeing " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opeing " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    auto file_size = lseek(fd, 0, SEEK_END);

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        lseek(fd, 0, SEEK_SET);
        unsigned long long res = 0;
        auto begin = std::chrono::system_clock::now();

        char *chunk = reinterpret_cast<char*>(mmap(NULL, file_size, PROT_READ, MAP_FILE | MAP_SHARED, fd, 0));
        char *addr = chunk;

        for (size_t j = 0; j < file_size; j++) {
            res += *chunk++;
        }
        ofs << res;

        munmap(addr, file_size);

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;

    ::close(fd);
    return 0;
}

Still use -O2 optimization:

# clang++ -O2 test_mmap.cpp -o test_mmap
# ./test_mmap
57241
57095
57038
57008
57175
Average read time: 57111

We can see the execution time of test_mmap is similar as test_ifstream, whereas test_mmap uses more memory:

2

P.S., the full code isĀ here.

Set up Haskell development environment on Arch Linux

Follow this post, install stack first:

# pacman -S stack
resolving dependencies...
looking for conflicting packages...
......

At the end of installation, it prompts following words:

You need to either 1) install latest stable ghc package from [extra] or 2) install ncurses5compat-libs from AUR for the prebuilt binaries installed by stack to work.

So install ghc further:

# pacman -S ghc
resolving dependencies...
looking for conflicting packages...

Now that the environment is ready, you should modify your own ~/.stack/config.yaml file:

# This file contains default non-project-specific settings for 'stack', used
# in all projects.  For more information about stack's configuration, see
# http://docs.haskellstack.org/en/stable/yaml_configuration/

# The following parameters are used by "stack new" to automatically fill fields
# in the cabal config. We recommend uncommenting them and filling them out if
# you intend to use 'stack new'.
# See https://docs.haskellstack.org/en/stable/yaml_configuration/#templates
templates:
  params:
#    author-name:
#    author-email:
#    copyright:
#    github-username:

For example, add name, email etc.

Then follow this link to create a simple program:

# stack new hello
# cd hello
# stack setup
# stack build
# stack exec hello-exe

You will see “someFunc” is outputted.

BTW, you can also use ghc compiler directly. E.g., write a “Hello world” program (Reference is here):

# cat hello.hs
main :: IO ()
main = putStrLn "Hello World!"
# ghc -dynamic hello.hs
# ./hello
Hello World!

That’s all!