Use “LC_ALL=C” to improve performance

Using “LC_ALL=C” can improve some program’s performance. The following is the test without LC_ALL=C of join program:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
$ time join 1.sorted 2.sorted > 1-2.sorted.aggregated

real    0m49.903s
user    0m48.427s
sys 0m0.786s

And this one is using “LC_ALL=C“:

$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
$ time LC_ALL=C join 1.sorted 2.sorted > 1-2.sorted.aggregated

real    0m12.752s
user    0m5.628s
sys 0m0.971s

some good references about this topic are Speed up grep searches with LC_ALL=C and Everyone knows grep is faster in the C locale.

Clear file system cache before doing I/O-intensive benchmark on Linux

If you do any I/O-intensive benchmark, please run following command before it (Otherwise you may get wrong impression of the program performance):

$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"

sync means writing data from cache to file system (otherwise the dirty cache can’t be freed); “echo 3 > /proc/sys/vm/drop_caches” will drop clean caches, as well as reclaimable slab objects. Check following command:

$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
$ time ./benchmark

real    0m12.434s
user    0m5.633s
sys 0m0.761s
$ time ./benchmark

real    0m6.291s
user    0m5.645s
sys 0m0.631s

the first run time of benchmark program is ~12 seconds. Now that the files are cached, the second run time of benchmark program is halved: ~6 seconds.

References:
Why drop caches in Linux?;
/proc/sys/vm;
CLEAR_CACHES.

Use OpenSSL to simulate TLS 1.3 “Session Resumption”

Thanks the great help from OpenSSL community, I finally can simulate an TLS 1.3 “Session Resumption”. The Operation System I used is OmniOS, and OpenSSL version is 1.1.1k, but I think the methods here can also be applied to other platforms:

(1) Open one terminal to launch tcpdump to capture TLS packets:

$ pfexec /opt/ooce/sbin/tcpdump -w tls.pcap port 443

(2) Open another terminal to initiate the first TLS 1.3 session:

$ openssl s_client -connect cloudflare.com:443 -tls1_3 -sess_out sess.pem -keylogfile keys1.txt
......

Once the connection is established, input “GET /” to trigger TLS 1.3 Server to send “New Session Ticket” message, and this will be saved in sess.pem file.

(3) Initiate another TLS 1.3 session to reuse the saved “Session Ticket“:

$ echo | openssl s_client -connect cloudflare.com:443 -tls1_3 -sess_in sess.pem -keylogfile keys2.txt

(4) Stop the tcpdump process.

(5) Combine two keys file into one:

$ cat keys1.txt keys2.txt > keys.txt

Then the keys.txt can be used to decrypt the two TLS 1.3 sessions (refer Use Wireshark to decrypt TLS flows).

Build OpenSSL on macOS

The default installed OpenSSL by brew is actually LibreSSL:

$ openssl version
LibreSSL 2.8.3

The method of building real OpenSSL is like this:

$ git clone https://github.com/openssl/openssl.git
$ cd openssl
$ mkdir build
$ cd build
$ ../Configure darwin64-x86_64 --debug --prefix=/Users/nanxiao/install
$ make 
$ make install

Check the freshly built OpenSSL:

$ /Users/nanxiao/install/bin/openssl version
OpenSSL 3.0.0-beta2-dev  (Library: OpenSSL 3.0.0-beta2-dev )

How to process large file?

In Process large data in external memory, I mentioned:

Update: Split large file into smaller ones, and use multiple threads to handle them is a good idea.

I want to elaborate how to process large file here:

(1) Split the large file into small ones which are independent from each other. E.g., based on users. Then you can spawn multiple threads to process each small file.

(2) For the output: if all threads output to same file, the write operations must be atomic and it will become bottleneck of the program. So every thread should have its own output file.

(3) After all threads exit, main thread can use cat or other methods to consolidate all output files into one.