Linux | Nan Xiao's Blog

Beware of the “No such file or directory” error when using ksh

Consider the following simple script:

#!/usr/bbin/python
print("hello world!")

I misspelled the python path intentionally. Bash on Linux reported the following error:

$ ./hello.py
-bash: ./hello.py: /usr/bbin/python: bad interpreter: No such file or directory

It told me that “/usr/bbin/python” couldn’t be found. On OpenBSD ksh, however:

$ ./hello.py
ksh: ./hello.py: No such file or directory

It gave the impression that hello.py itself didn’t exist, although the missing file was actually the interpreter named in the shebang line. So be careful with this error message if you use ksh.
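When ksh prints the shorter message, one way to tell the two cases apart is to check the shebang line yourself. The following is a minimal sketch, not part of either shell: the helper name missing_interpreter is hypothetical, and it only checks that the interpreter path exists, not that it is executable.

```python
import os
import tempfile


def missing_interpreter(script_path):
    """Return the shebang interpreter path if it does not exist, else None."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None  # no shebang line at all
    parts = first_line[2:].split()
    if not parts:
        return None  # "#!" with nothing after it
    interpreter = parts[0]
    return None if os.path.exists(interpreter) else interpreter


if __name__ == "__main__":
    # Recreate the script from this post in a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write('#!/usr/bbin/python\nprint("hello world!")\n')
    try:
        # Prints the misspelled path when no such file exists on the system.
        print(missing_interpreter(tmp.name))
    finally:
        os.unlink(tmp.name)
```

If the function returns a path, the script exists and the "No such file or directory" message refers to that interpreter.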

Mind the position of the linked library on Ubuntu

This week, I ported tcpbench from OpenBSD to Linux. The idiomatic OpenBSD style is to put the library before the output target and the source file:

cc -g -O2 -Wall -levent -o tcpbench tcpbench.c

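However, this doesn’t work on Ubuntu, because its toolchain passes the --as-needed option to the linker by default: a library named before the object that references it is considered unneeded and is dropped. So I changed the Makefile to put the library at the end: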

cc -g -O2 -Wall -o tcpbench tcpbench.c -levent

Please refer to this discussion if you are interested.

Use epoll in multi-threaded programming

epoll provides a simple but highly efficient polling mechanism:

(1) epoll_create1 creates an epoll instance;
(2) epoll_ctl adds, modifies, or removes file descriptors in the epoll instance;
(3) epoll_wait waits for I/O events.
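The three steps can be tried out quickly with Python's select.epoll wrapper (Linux only). This is a minimal sketch, not code from any project mentioned here; the helper name wait_for_readable is hypothetical, and a pipe stands in for a socket.

```python
import os
import select


def wait_for_readable(fd, timeout=1.0):
    """Return the (fd, event mask) pairs epoll reports for fd within timeout."""
    ep = select.epoll()                  # (1) create the epoll instance
    try:
        ep.register(fd, select.EPOLLIN)  # (2) add fd to the instance
        return ep.poll(timeout)          # (3) wait for I/O events
    finally:
        ep.close()


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"ping")                 # make the read end readable
    for fd, mask in wait_for_readable(r):
        if mask & select.EPOLLIN:
            print(os.read(fd, 4))
    os.close(r)
    os.close(w)
```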

Moustique shows one way of using epoll in a multi-threaded program:

auto event_loop_fn = [listen_fd, conn_handler] {

    int epoll_fd = epoll_create1(0);

    ......
    epoll_ctl(listen_fd, EPOLLIN | EPOLLET);

    const int MAXEVENTS = 64;
    ......

    // Event loop.
    epoll_event events[MAXEVENTS];
    while (true)
    {
      int n_events = epoll_wait (epoll_fd, events, MAXEVENTS, -1);
      ......
    }
}

Every thread has its own epoll instance and monitors listen_fd. When a new connection is established, a single thread accepts it and serves it. Since every thread has its own epoll instance and events array, no synchronization among threads is needed.
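The same idea can be sketched in Python (Linux only). This is not Moustique's code: the names event_loop and serve are hypothetical, the sketch uses level-triggered events instead of EPOLLET to stay short, and each connection is closed right after accept instead of being served. Several threads may wake up for the same connection, but only one accept succeeds.

```python
import select
import socket
import threading
import time


def event_loop(listen_sock, accepted, stop):
    """Run one thread's loop with its own private epoll instance."""
    ep = select.epoll()
    ep.register(listen_sock.fileno(), select.EPOLLIN)
    try:
        while not stop.is_set():
            for _fd, _mask in ep.poll(0.05):
                try:
                    conn, _addr = listen_sock.accept()
                except BlockingIOError:
                    continue  # another thread accepted this connection first
                conn.close()
                accepted.append(threading.get_ident())
    finally:
        ep.close()


def serve(n_threads, n_clients, timeout=5.0):
    """Accept n_clients connections with n_threads loops; return the accept log."""
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_sock.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listen_sock.listen(64)
    listen_sock.setblocking(False)      # accept must not block a losing thread

    accepted, stop = [], threading.Event()
    threads = [
        threading.Thread(target=event_loop, args=(listen_sock, accepted, stop))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()

    clients = [socket.create_connection(listen_sock.getsockname())
               for _ in range(n_clients)]
    deadline = time.monotonic() + timeout
    while len(accepted) < n_clients and time.monotonic() < deadline:
        time.sleep(0.01)

    stop.set()
    for t in threads:
        t.join()
    for c in clients:
        c.close()
    listen_sock.close()
    return accepted


if __name__ == "__main__":
    log = serve(n_threads=4, n_clients=8)
    print(len(log), "connections accepted by", len(set(log)), "thread(s)")
```

The only shared state is the listening socket and the result list; the epoll instances themselves are never shared, so the loops need no lock around them.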

If you want multiple threads to share the same epoll instance, I think this topic can serve as a reference.

Use “cat” to concatenate files

cat is a neat command to concatenate files on Unix (please see this post). Let’s see some examples:

# cat 1.txt
1
# cat 2.txt
2
# cat 1.txt 2.txt > new.txt
# cat new.txt
1
2
# cat 1.txt 2.txt >> new.txt
# cat new.txt
1
2
1
2

Please note that if the output file is also an input file, the shell truncates it before cat starts reading, so its original content is lost:

# cat 1.txt
1
# cat 2.txt
2
# cat 1.txt 2.txt > 1.txt
# cat 1.txt
2
# cat 2.txt
2

So the correct way to append to a file is to use >>:

# cat 1.txt
1
# cat 2.txt
2
# cat 2.txt >> 1.txt
# cat 1.txt
1
2
# cat 2.txt
2
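The difference between > and >> comes down to how the output file is opened before any input is read. The following Python sketch mimics both redirections; the function name cat_redirect is hypothetical, and opening with mode "wb" or "ab" stands in for what the shell does before it starts cat.

```python
import os
import tempfile


def cat_redirect(inputs, output, append=False):
    """Mimic `cat inputs... > output`, or `>> output` when append is True."""
    # The redirection target is opened first: "wb" truncates it at once,
    # "ab" keeps the existing content and writes at the end.
    with open(output, "ab" if append else "wb") as out:
        for name in inputs:
            with open(name, "rb") as src:
                out.write(src.read())


def read_text(path):
    with open(path, "r") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        one, two = os.path.join(d, "1.txt"), os.path.join(d, "2.txt")
        for path, text in ((one, "1\n"), (two, "2\n")):
            with open(path, "w") as f:
                f.write(text)

        cat_redirect([one, two], one)          # cat 1.txt 2.txt > 1.txt
        print(repr(read_text(one)))            # the old content of 1.txt is gone

        with open(one, "w") as f:              # restore 1.txt
            f.write("1\n")
        cat_redirect([two], one, append=True)  # cat 2.txt >> 1.txt
        print(repr(read_text(one)))
```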

Looking at the implementation of cat in OpenBSD, the core parts are iterating over the files and reading content from them:

(1) Iterate every file (raw_args):

void
raw_args(char **argv)
{
    int fd;

    fd = fileno(stdin);
    filename = "stdin";
    do {
        if (*argv) {
            if (!strcmp(*argv, "-"))
                fd = fileno(stdin);
            else if ((fd = open(*argv, O_RDONLY, 0)) < 0) {
                warn("%s", *argv);
                rval = 1;
                ++argv;
                continue;
            }
            filename = *argv++;
        }
        raw_cat(fd);
        if (fd != fileno(stdin))
            (void)close(fd);
    } while (*argv);
}

Note that cat uses - to denote stdin.

(2) Read content from every file (raw_cat):

void
raw_cat(int rfd)
{
    int wfd;
    ssize_t nr, nw, off;
    static size_t bsize;
    static char *buf = NULL;
    struct stat sbuf;

    wfd = fileno(stdout);
    if (buf == NULL) {
        if (fstat(wfd, &sbuf))
            err(1, "stdout");
        bsize = MAXIMUM(sbuf.st_blksize, BUFSIZ);
        if ((buf = malloc(bsize)) == NULL)
            err(1, "malloc");
    }
    while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
        for (off = 0; nr; nr -= nw, off += nw)
            if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 ||
                 nw == -1)
                err(1, "stdout");
    if (nr < 0) {
        warn("%s", filename);
        rval = 1;
    }
}

a) When reading the first file, the cat command uses fstat to get the st_blksize attribute of stdout, which is the “optimal blocksize for I/O”, and then allocates the buffer:

    ......
    if (buf == NULL) {
        if (fstat(wfd, &sbuf))
            err(1, "stdout");
        bsize = MAXIMUM(sbuf.st_blksize, BUFSIZ);
        if ((buf = malloc(bsize)) == NULL)
            err(1, "malloc");
    }
    ......

b) Read the content of the file and write it to stdout:

    ......
    while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
        for (off = 0; nr; nr -= nw, off += nw)
            if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 ||
                 nw == -1)
                err(1, "stdout");
    ......

When read returns 0, the end of the file has been reached. A short write is not an error: the inner for loop advances off and retries until all nr bytes have been written. Only a return value of 0 or -1 from write is treated as an error.
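The buffer sizing and the read/write loop can be mirrored in a few lines of Python. This is a sketch of the same technique and not a translation of the OpenBSD source: io.DEFAULT_BUFFER_SIZE stands in for C's BUFSIZ, and os.read and os.write raise OSError on failure instead of returning -1.

```python
import io
import os


def buffer_size(wfd):
    """Mirror MAXIMUM(sbuf.st_blksize, BUFSIZ) for the output descriptor."""
    # io.DEFAULT_BUFFER_SIZE is an assumed stand-in for BUFSIZ.
    return max(os.fstat(wfd).st_blksize, io.DEFAULT_BUFFER_SIZE)


def raw_cat(rfd, wfd):
    """Copy rfd to wfd until end of file, retrying after short writes."""
    bsize = buffer_size(wfd)
    copied = 0
    while True:
        buf = os.read(rfd, bsize)
        if not buf:                   # empty result: end of file
            return copied
        off = 0
        while off < len(buf):         # os.write may accept fewer bytes than offered
            nw = os.write(wfd, buf[off:])
            if nw == 0:               # treated as an error, as in the excerpt
                raise OSError("write returned 0")
            off += nw
        copied += len(buf)


if __name__ == "__main__":
    src_r, src_w = os.pipe()
    dst_r, dst_w = os.pipe()
    os.write(src_w, b"hello\n")
    os.close(src_w)                   # closing the writer lets read hit end of file
    print(raw_cat(src_r, dst_w), os.read(dst_r, 16))
    for fd in (src_r, dst_r, dst_w):
        os.close(fd)
```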

Benchmark C++ ifstream and mmap

After reading “Which is fastest: read, fread, ifstream or mmap?”, I tried to benchmark C++ ifstream and mmap myself.

The test file is number.txt, and the size is 4GiB:

# ls -alt number.txt
-rw-r--r-- 1 root root 4294967296 Apr  2 13:51 number.txt

test_ifstream.cpp looks like this:

#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

char chunk[1048576];

int main(void)
{
    std::ifstream ifs(FILE_NAME, std::ios_base::binary);
    if (!ifs) {
        std::cerr << "Error opening " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opening " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        unsigned long long res = 0;
        ifs.seekg(0);
        auto begin = std::chrono::system_clock::now();

        for (size_t j = 0; j < 4096; j++) {
            ifs.read(chunk, sizeof(chunk));
            for (size_t k = 0; k < sizeof(chunk); k++) {
                res += chunk[k];
            }
        }
        ofs << res;

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;
    return 0;
}

The program reads 1 MiB (1024 * 1024 = 1048576 bytes) at a time, 4096 times in total, which covers the whole 4 GiB file. Compile with -O2 optimization:

# clang++ -O2 test_ifstream.cpp -o test_ifstream
# ./test_ifstream
57208
57085
57061
57105
57069
Average exec time: 57105

The average execution time is 57105 ms. The htop output shows that test_ifstream occupies very little memory.
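As a quick sanity check of the numbers in the ifstream run, the arithmetic can be written out as below. The constants are copied from the listing and its output; the two helper names are hypothetical, and the throughput figure includes the per-byte summation, not just the reads.

```python
CHUNK_SIZE = 1024 * 1024   # bytes per ifs.read call (size of the chunk array)
READ_COUNT = 4096          # iterations of the inner read loop
FILE_SIZE = 4294967296     # size of number.txt as reported by ls
AVERAGE_MS = 57105         # average run time printed by test_ifstream


def bytes_covered(chunk_size, read_count):
    """Total bytes consumed by read_count reads of chunk_size bytes each."""
    return chunk_size * read_count


def throughput_mib_per_s(total_bytes, elapsed_ms):
    """Average throughput in MiB/s for total_bytes processed in elapsed_ms."""
    return (total_bytes / (1024 * 1024)) / (elapsed_ms / 1000)


if __name__ == "__main__":
    # True when the 4096 reads cover the file exactly.
    print(bytes_covered(CHUNK_SIZE, READ_COUNT) == FILE_SIZE)
    print(round(throughput_mib_per_s(FILE_SIZE, AVERAGE_MS), 1), "MiB/s")
```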

The following is test_mmap.cpp:

#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>


const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

int main(void)
{
    int fd = ::open(FILE_NAME.c_str(), O_RDONLY);
    if (fd < 0) {
        std::cerr << "Error opening " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opening " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    auto file_size = lseek(fd, 0, SEEK_END);

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        lseek(fd, 0, SEEK_SET);
        unsigned long long res = 0;
        auto begin = std::chrono::system_clock::now();

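        void *map = mmap(NULL, file_size, PROT_READ, MAP_FILE | MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
            // mmap reports failure with MAP_FAILED, not with a null pointer.
            std::cerr << "Error mapping " << FILE_NAME << std::endl;
            exit(1);
        }
        char *chunk = static_cast<char*>(map);
        char *addr = chunk;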

        for (size_t j = 0; j < file_size; j++) {
            res += *chunk++;
        }
        ofs << res;

        munmap(addr, file_size);

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;

    ::close(fd);
    return 0;
}

Again, compile with -O2 optimization:

# clang++ -O2 test_mmap.cpp -o test_mmap
# ./test_mmap
57241
57095
57038
57008
57175
Average exec time: 57111

The execution time of test_mmap is similar to that of test_ifstream, whereas test_mmap uses more memory.
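To experiment with the two access patterns on a small file, the following Python sketch computes the same byte sum both ways and checks that the results agree. It is not a port of the benchmark: the function names are hypothetical, Python bytes are always unsigned while char in the C++ code may be signed, and the timings are not comparable to the C++ numbers.

```python
import mmap
import os
import tempfile


def sum_by_read(path, chunk_size=1 << 20):
    """Sum all bytes of the file with buffered chunked reads."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += sum(chunk)
    return total


def sum_by_mmap(path, step=1 << 20):
    """Sum all bytes of a non-empty file through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing the mapping yields bytes objects, which sum() can add up.
            return sum(sum(m[off:off + step]) for off in range(0, len(m), step))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 1000)
    try:
        print(sum_by_read(tmp.name) == sum_by_mmap(tmp.name))
    finally:
        os.unlink(tmp.name)
```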

P.S., the full code is here.