Performance comparison between string::at and string::operator[] in C++

Consider the following testPalindrome_index.cpp program, which uses string::operator[]:

#include <string>

bool isPalindrome(
    std::string& s,
    std::string::size_type start,
    std::string::size_type end) {

        auto count = (end - start + 1) / 2;
        for (std::string::size_type i = 0; i < count; i++) {
            if (s[start] != s[end]) {
                return false;
            }
            start++;
            end--;
        }

        return true;
}

int main() {
        std::string s(1'000'000'000, 'a');

        isPalindrome(s, 0, s.size() - 1);
        return 0;
}

My compiler is clang++. Measure the execution time without and with optimization:

# c++ -std=c++14 testPalindrome_index.cpp -o index
# time ./index
    0m13.84s real     0m12.77s user     0m01.06s system
# c++ -std=c++14 -O2 testPalindrome_index.cpp -o index
# time ./index
    0m01.44s real     0m00.42s user     0m01.01s system

We can see that the time difference is huge (13.84 vs 1.44 seconds)!
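The time command measures the whole process, including the construction of the 1 GB string, which plausibly accounts for much of the roughly one second of system time (an assumption; it was not measured separately here). A minimal sketch for timing only the comparison loop with std::chrono follows. The names TimedResult and timePalindromeIndex are hypothetical and not part of the original program:

```cpp
#include <chrono>
#include <string>

// Hypothetical result type: the palindrome verdict plus the loop's wall time.
struct TimedResult {
    bool palindrome;
    double seconds;
};

// Times only the operator[] comparison loop, not the string construction.
TimedResult timePalindromeIndex(const std::string& s) {
    const auto begin = std::chrono::steady_clock::now();

    bool ok = true;
    std::string::size_type lo = 0;
    std::string::size_type hi = s.size();  // one past the right-hand index
    while (lo + 1 < hi) {
        if (s[lo] != s[hi - 1]) {
            ok = false;
            break;
        }
        ++lo;
        --hi;
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - begin;
    return TimedResult{ok, elapsed.count()};
}
```

Calling it on the string built in main and printing the seconds field would give a figure that excludes allocation and page-fault cost.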

Then change the code to use string::at:

#include <string>

bool isPalindrome(
    std::string& s,
    std::string::size_type start,
    std::string::size_type end) {

        auto count = (end - start + 1) / 2;
        for (std::string::size_type i = 0; i < count; i++) {
            if (s.at(start) != s.at(end)) {
                return false;
            }
            start++;
            end--;
        }

        return true;
}

int main() {
        std::string s(1'000'000'000, 'a');

        isPalindrome(s, 0, s.size() - 1);
        return 0;
}

Compile and test again:

# c++ -std=c++14 testPalindrome_at.cpp -o at
# time ./at
    0m07.31s real     0m06.36s user     0m00.96s system
# c++ -std=c++14 -O2 testPalindrome_at.cpp -o at
# time ./at
    0m06.42s real     0m05.45s user     0m00.97s system

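Here the gap between the unoptimized and optimized builds is under one second (7.31 vs 6.42), far less dramatic than in the first case. More importantly, the time with “-O2” optimization is 6.42 seconds, far longer than the 1.44 seconds of the version that uses string::operator[].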
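The semantic difference between the two functions is that string::at checks the index on every call and throws std::out_of_range when it is invalid, while string::operator[] performs no such check. The sketch below illustrates that difference. checked_at and at_throws are hypothetical helpers written for this note, and whether the extra branch fully explains the gap measured above is an assumption, not something verified here:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: roughly the work string::at has to do on each call.
char checked_at(const std::string& s, std::string::size_type pos) {
    if (pos >= s.size()) {  // the per-access check that operator[] skips
        throw std::out_of_range("checked_at: pos out of range");
    }
    return s[pos];
}

// Returns true if s.at(pos) throws std::out_of_range.
bool at_throws(const std::string& s, std::string::size_type pos) {
    try {
        (void)s.at(pos);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

A check with a throwing path inside the loop body can make it harder for the optimizer to simplify or vectorize the loop, which would be consistent with the “-O2” numbers, but comparing the generated assembly would be needed to confirm it.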

The conclusion is that for a long enough string, the performance difference between string::operator[] and string::at is significant, so it should be taken into account when deciding which function to use.
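One possible middle ground, sketched here as an assumption rather than a recommendation from the measurements above, is to validate the range once before the loop and then use string::operator[] inside it. The function name isPalindromeRange is hypothetical:

```cpp
#include <stdexcept>
#include <string>

// Validates [start, end] once, then compares with unchecked operator[].
bool isPalindromeRange(const std::string& s,
                       std::string::size_type start,
                       std::string::size_type end) {
    if (start > end || end >= s.size()) {
        throw std::out_of_range("isPalindromeRange: invalid range");
    }
    while (start < end) {
        if (s[start] != s[end]) {
            return false;
        }
        ++start;
        --end;
    }
    return true;
}
```

This keeps the safety of a bounds check at the function boundary while the hot loop pays no per-access cost.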

P.S., the full code is here.

Use epoll in multi-threaded programming

epoll provides a simple but highly efficient polling mechanism:

(1) epoll_create1 creates an epoll instance;
(2) epoll_ctl adds, modifies, or removes file descriptors in the epoll instance;
(3) epoll_wait waits for I/O events.
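The three calls can be tried out on a pipe, as in the following Linux-only sketch. The function name pipe_becomes_readable, the one-second timeout, and the use of a pipe instead of a socket are choices made for this illustration:

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Registers the read end of a pipe with epoll, writes one byte to the
// write end, and reports whether epoll_wait signals the read end as readable.
bool pipe_becomes_readable() {
    int fds[2];
    if (pipe(fds) == -1) {
        return false;
    }

    bool ok = false;
    int epoll_fd = epoll_create1(0);                       // (1) create
    if (epoll_fd != -1) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fds[0];
        if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&  // (2) register
            write(fds[1], "x", 1) == 1) {
            epoll_event ready[1];
            int n = epoll_wait(epoll_fd, ready, 1, 1000);  // (3) wait up to 1 s
            ok = (n == 1 && ready[0].data.fd == fds[0] &&
                  (ready[0].events & EPOLLIN) != 0);
        }
        close(epoll_fd);
    }
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```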

Moustique shows one way to use epoll in a multi-threaded program:

auto event_loop_fn = [listen_fd, conn_handler] {

    int epoll_fd = epoll_create1(0);

    ......
    epoll_ctl(listen_fd, EPOLLIN | EPOLLET);

    const int MAXEVENTS = 64;
    ......

    // Event loop.
    epoll_event events[MAXEVENTS];
    while (true)
    {
      int n_events = epoll_wait (epoll_fd, events, MAXEVENTS, -1);
      ......
    }
};

Every thread has its own epoll instance and monitors the same listen_fd. When a new connection is established, one thread will serve it. Since every thread has its own epoll instance and its own events array, the threads do not need to synchronize with each other on epoll state.
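The per-thread pattern can be sketched as follows. This is not Moustique's code: the function name, the loopback socket with a kernel-chosen port, level-triggered mode instead of EPOLLET, the 50 ms wait timeout, and the stop flag are all assumptions made to keep the example small and self-terminating (Linux only, build with -pthread):

```cpp
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Starts n_threads workers, each with a private epoll instance watching one
// shared non-blocking listening socket, connects n_clients times, and
// returns how many connections the workers accepted (-1 on setup failure).
int accept_with_per_thread_epoll(int n_threads, int n_clients) {
    int listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (listen_fd == -1) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    if (bind(listen_fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1 ||
        listen(listen_fd, SOMAXCONN) == -1 ||
        getsockname(listen_fd, reinterpret_cast<sockaddr*>(&addr), &len) == -1) {
        close(listen_fd);
        return -1;
    }

    std::atomic<int> accepted{0};
    std::atomic<bool> stop{false};

    auto event_loop_fn = [&] {
        int epoll_fd = epoll_create1(0);  // private to this thread
        if (epoll_fd == -1) return;
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = listen_fd;
        if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev) == 0) {
            epoll_event events[8];  // also private, so no locking is needed
            while (!stop.load()) {
                int n = epoll_wait(epoll_fd, events, 8, 50);
                for (int i = 0; i < n; ++i) {
                    for (;;) {  // drain the accept queue
                        int conn = accept(listen_fd, nullptr, nullptr);
                        if (conn == -1) break;  // e.g. another thread took it
                        ++accepted;
                        close(conn);  // a real server would serve conn here
                    }
                }
            }
        }
        close(epoll_fd);
    };

    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) workers.emplace_back(event_loop_fn);

    for (int i = 0; i < n_clients; ++i) {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c == -1) continue;
        connect(c, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        close(c);
    }

    // Give the workers up to about two seconds to accept everything.
    for (int spins = 0; accepted.load() < n_clients && spins < 200; ++spins) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    stop.store(true);
    for (auto& t : workers) t.join();
    close(listen_fd);
    return accepted.load();
}
```

Several threads may be woken for the same incoming connection. Because the listening socket is non-blocking, the threads that lose the race simply see accept fail and go back to waiting.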

If you want multiple threads to share the same epoll instance, I think this topic can serve as a reference.

Use “cat” to concatenate files

cat is a neat command to concatenate files on Unix (please see this post). Let’s see some examples:

# cat 1.txt
1
# cat 2.txt
2
# cat 1.txt 2.txt > new.txt
# cat new.txt
1
2
# cat 1.txt 2.txt >> new.txt
# cat new.txt
1
2
1
2

Please note that if the output file is also one of the input files, the shell truncates it before cat reads it, so its original content is lost:

# cat 1.txt
1
# cat 2.txt
2
# cat 1.txt 2.txt > 1.txt
# cat 1.txt
2
# cat 2.txt
2
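The same effect can be reproduced in a program by opening the output file for writing, which truncates it, before reading the inputs. The sketch below assumes small scratch files in the current directory. write_file, read_all and concat_into_first are hypothetical helper names:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Creates or overwrites path with the given content.
void write_file(const std::string& path, const std::string& content) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << content;
}

// Reads the whole file into a string (empty if it cannot be opened).
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Mimics `cat a b > a`: the output is opened (and truncated) first,
// so by the time `a` is read as input it is already empty.
std::string concat_into_first(const std::string& a, const std::string& b) {
    std::ofstream out(a, std::ios::binary | std::ios::trunc);  // like the shell's ">"
    out << read_all(a);  // reads nothing: a was just truncated
    out << read_all(b);
    out.close();
    return read_all(a);
}
```

With a containing "1" and b containing "2", the result holds only the content of b, matching the transcript above.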

So the correct way to append to a file is to use >>:

# cat 1.txt
1
# cat 2.txt
2
# cat 2.txt >> 1.txt
# cat 1.txt
1
2
# cat 2.txt
2
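The programmatic counterpart of >> is opening the destination in append mode, which keeps the existing content. A minimal sketch with hypothetical helper names (read_all, append_file) and an arbitrary 4096-byte buffer:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file into a string (empty if it cannot be opened).
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Mimics `cat src >> dst`: dst is opened with std::ios::app, so it is not
// truncated and every write goes to its end.
void append_file(const std::string& dst, const std::string& src) {
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::app);
    char buf[4096];
    // read() fails on the final short chunk, so gcount() is checked as well.
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
}
```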

Let's look at the implementation of cat in OpenBSD. The core parts iterate over the files and read content from them:

(1) Iterate every file (raw_args):

void
raw_args(char **argv)
{
    int fd;

    fd = fileno(stdin);
    filename = "stdin";
    do {
        if (*argv) {
            if (!strcmp(*argv, "-"))
                fd = fileno(stdin);
            else if ((fd = open(*argv, O_RDONLY, 0)) < 0) {
                warn("%s", *argv);
                rval = 1;
                ++argv;
                continue;
            }
            filename = *argv++;
        }
        raw_cat(fd);
        if (fd != fileno(stdin))
            (void)close(fd);
    } while (*argv);
}

Note that cat uses - to denote stdin.

(2) Read content from every file (raw_cat):

void
raw_cat(int rfd)
{
    int wfd;
    ssize_t nr, nw, off;
    static size_t bsize;
    static char *buf = NULL;
    struct stat sbuf;

    wfd = fileno(stdout);
    if (buf == NULL) {
        if (fstat(wfd, &sbuf))
            err(1, "stdout");
        bsize = MAXIMUM(sbuf.st_blksize, BUFSIZ);
        if ((buf = malloc(bsize)) == NULL)
            err(1, "malloc");
    }
    while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
        for (off = 0; nr; nr -= nw, off += nw)
            if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 ||
                 nw == -1)
                err(1, "stdout");
    if (nr < 0) {
        warn("%s", filename);
        rval = 1;
    }
}

a) When reading the first file, cat uses fstat to get the st_blksize attribute of stdout, which is the “optimal blocksize for I/O”, then allocates a buffer of at least that size:

    ......
    if (buf == NULL) {
        if (fstat(wfd, &sbuf))
            err(1, "stdout");
        bsize = MAXIMUM(sbuf.st_blksize, BUFSIZ);
        if ((buf = malloc(bsize)) == NULL)
            err(1, "malloc");
    }
    ......
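The buffer-size decision can be isolated into a small function. This is a sketch with a hypothetical name, io_buffer_size. Unlike cat, which exits through err when fstat fails, it falls back to BUFSIZ, an assumption made so the function always returns a usable size:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>      // BUFSIZ
#include <sys/stat.h>

// Picks an I/O buffer size for fd: the larger of st_blksize and BUFSIZ.
std::size_t io_buffer_size(int fd) {
    struct stat sb;
    if (fstat(fd, &sb) != 0) {
        return BUFSIZ;  // assumption: fall back instead of exiting like cat
    }
    return std::max<std::size_t>(static_cast<std::size_t>(sb.st_blksize), BUFSIZ);
}
```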

b) Read the content of file and write it into stdout:

    ......
    while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
        for (off = 0; nr; nr -= nw, off += nw)
            if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 ||
                 nw == -1)
                err(1, "stdout");
    ......

When read returns 0, the end of the file has been reached. A short write is not an error: the for loop advances off by the number of bytes written and retries with the remainder. Only a write that returns 0 or -1 is treated as an error.
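The inner for loop is a common idiom for coping with short writes. Pulled out on its own, it might look like the sketch below. write_all is a hypothetical name, and it reports failure through its return value instead of calling err:

```cpp
#include <cstddef>
#include <unistd.h>

// Writes all len bytes of buf to fd, retrying after short writes.
// Returns false if write returns 0 or -1, the same condition cat treats
// as an error.
bool write_all(int fd, const char* buf, std::size_t len) {
    std::size_t off = 0;
    while (off < len) {
        ssize_t nw = write(fd, buf + off, len - off);
        if (nw <= 0) {
            return false;
        }
        off += static_cast<std::size_t>(nw);  // skip what was already written
    }
    return true;
}
```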

A scenario in which clang generates a ud2 instruction while gcc does not

Consider the following main.cpp:

# cat main.cpp
#include <functional>
#include <thread>

int main()
{
    int a = 2;

    std::function<void ()> work = [&]()
    {
        if (a < 1) {
            return 1;
        }
        while (1)
        {
            if (a > 1)
            {
                break;
            }
        }
    };


    std::thread t(work);
    t.join();
    return 0;
}

work is declared as std::function<void ()>, yet the a < 1 branch of the lambda actually returns a value:

if (a < 1) {
    return 1;
}
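A lambda without a trailing return type has its return type deduced from its return statements, so a single return 1; makes it int rather than void. The following sketch checks this at compile time. The lambda names with_value and without_value are hypothetical, and with_value returns on every path so that the example itself is well defined:

```cpp
#include <type_traits>

// `return 1;` makes the deduced return type int.
const auto with_value = [](int a) {
    if (a < 1) {
        return 1;
    }
    return 0;  // every path returns, so nothing flows off the end
};
static_assert(std::is_same<decltype(with_value(0)), int>::value,
              "a lambda containing `return 1;` returns int");

// A plain `return;` keeps the deduced return type void.
const auto without_value = [](int a) {
    if (a < 1) {
        return;
    }
};
static_assert(std::is_same<decltype(without_value(0)), void>::value,
              "a lambda containing only `return;` returns void");
```

Storing an int-returning lambda in a std::function<void ()> is allowed because the wrapper discards the result, which is why the mismatch in main.cpp compiles.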

Compile the program with g++, and execute it:

# g++ -g -pthread main.cpp
# ./a.out
#

It runs smoothly. Switch to clang++:

# clang++ -g -pthread main.cpp
main.cpp:20:5: warning: control may reach end of non-void lambda [-Wreturn-type]
    };
    ^
1 warning generated.
# ./a.out
Illegal instruction (core dumped)

A warning is generated at build time, and “Illegal instruction” occurs at run time. Check the assembly code of work:

 disassemble
Dump of assembler code for function main::$_0::operator()() const:
   0x0000555555555100 <+0>:     push   %rbp
   0x0000555555555101 <+1>:     mov    %rsp,%rbp
   0x0000555555555104 <+4>:     mov    %rdi,-0x8(%rbp)
   0x0000555555555108 <+8>:     mov    -0x8(%rbp),%rdi
=> 0x000055555555510c <+12>:    mov    (%rdi),%rax
   0x000055555555510f <+15>:    cmpl   $0x1,(%rax)
   0x0000555555555112 <+18>:    mov    %rdi,-0x10(%rbp)
   0x0000555555555116 <+22>:    jge    0x555555555123 <main::$_0::operator()() const+35>
   0x000055555555511c <+28>:    mov    $0x1,%eax
   0x0000555555555121 <+33>:    pop    %rbp
   0x0000555555555122 <+34>:    retq
   0x0000555555555123 <+35>:    jmpq   0x555555555128 <main::$_0::operator()() const+40>
   0x0000555555555128 <+40>:    mov    -0x10(%rbp),%rax
   0x000055555555512c <+44>:    mov    (%rax),%rcx
   0x000055555555512f <+47>:    cmpl   $0x1,(%rcx)
   0x0000555555555132 <+50>:    jle    0x55555555513d <main::$_0::operator()() const+61>
   0x0000555555555138 <+56>:    jmpq   0x555555555142 <main::$_0::operator()() const+66>
   0x000055555555513d <+61>:    jmpq   0x555555555128 <main::$_0::operator()() const+40>
   0x0000555555555142 <+66>:    ud2
End of assembler dump.

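We can find a ud2 instruction at the end of the lambda function. Because return 1; makes the lambda's deduced return type int, leaving the while loop flows off the end of a non-void function, which is undefined behavior, and clang marks that path with ud2, which raises the illegal-instruction fault. Modify the a < 1 branch: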

if (a < 1) {
    return;
}

This time, the program runs without error.

Upgrade OpenBSD from 6.2 to 6.3

Since OpenBSD 6.3 has been released, it is time to upgrade from 6.2.

The upgrade manual is here. But for newbies like me, I think the most challenging step is booting from the ramdisk kernel, bsd.rd. Download it and move it into the root file system:

# mv bsd.rd /

Then reboot the machine and, at the boot> prompt, enter boot /bsd.rd:

boot> boot /bsd.rd

Then upgrade OpenBSD according to the instructions.

For me, there are 2 important aspects of OpenBSD 6.3:
(1) vim is upgraded to 8.0.1589, which fixes the strange display issue when using a terminal (please refer to this thread).
(2) My lscpu has been included in ports since OpenBSD 6.3, so you can install it on x86 architectures:

# pkg_add lscpu
# lscpu
Architecture:            amd64
Byte Order:              Little Endian
Active CPU(s):           2
Total CPU(s):            2
......