The tips of optimising OpenSSL applications

This post introduces some tips of optimising applications which use OpenSSL.

(1) Per-thread memory pool. Because OpenSSL heavily allocate/free memories in its internals, implementing bespoke memory management functions which use per-thread memory pool (register them with CRYPTO_set_mem_functions) can improve performance.

(2) Reuse contexts. E.g., for EVP_PKEY_CTX, since every time EVP_PKEY_derive_init() will initialise its content, every thread can pre-allocate one EVP_PKEY_CTX and avoid allocating & freeing EVP_PKEY_CTX frequently. Further more, from the document:

The function EVP_PKEY_derive() can be called more than once on the same context if several operations are performed using the same parameters.

Another example is about EVP_CIPHER_CTX; a typical program is like this:

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_gcm(), NULL, key, nonce);
    EVP_EncryptUpdate(ctx, NULL, &len, aad, sizeof(aad));
    EVP_EncryptUpdate(ctx, ct, &ct_len, pt, sizeof(pt));
    EVP_EncryptFinal_ex(ctx, ct + ct_len, &len);
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, ct + ct_len);
    EVP_CIPHER_CTX_free(ctx);

Actually, we can create a dedicated EVP_CIPHER_CTX for one EVP_CIPHER, i.e.:

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_gcm(), NULL, NULL, NULL);

Then in program, we can reuse this EVP_CIPHER_CTX, and just modify other parameters:

    EVP_EncryptInit_ex(ctx, NULL, NULL, key, nonce);
    EVP_EncryptUpdate(ctx, NULL, &len, aad, sizeof(aad));
    EVP_EncryptUpdate(ctx, ct, &ct_len, pt, sizeof(pt));
    EVP_EncryptFinal_ex(ctx, ct + ct_len, &len);
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, ct + ct_len);

This also applies to EVP_Decrypt* functions.

Configure static IP address on Arch Linux

I use NetworkManager to manage network on Arch Linux:

# systemctl --type=service
  UNIT                               LOAD   ACTIVE SUB     DESCRIPTION                               >
  dbus.service                       loaded active running D-Bus System Message Bus                  >
  getty@tty1.service                 loaded active running Getty on tty1                             >
  kmod-static-nodes.service          loaded active exited  Create list of static device nodes for the>
  NetworkManager.service             loaded active running Network Manager
......

So I will leverage nmcli to configure static IP address:

(1) Check current connection:

# nmcli con show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  f1014ad6-4291-3c29-9b0d-b552a0c5eb02  ethernet  enp0s3

(2) Configure IP address, gateway and DNS:

# nmcli con modify f1014ad6-4291-3c29-9b0d-b552a0c5eb02 ipv4.addresses 192.168.1.124/24
# nmcli con modify f1014ad6-4291-3c29-9b0d-b552a0c5eb02 ipv4.gateway 192.168.1.1
# nmcli con modify f1014ad6-4291-3c29-9b0d-b552a0c5eb02 ipv4.dns "8.8.8.8"
# nmcli con modify f1014ad6-4291-3c29-9b0d-b552a0c5eb02 ipv4.method manual

(3) Active connection:

# nmcli con up f1014ad6-4291-3c29-9b0d-b552a0c5eb02

It is done! BTW, the connection file can be found:

# cat /etc/NetworkManager/system-connections/Wired\ connection\ 1.nmconnection
[connection]
id=Wired connection 1
uuid=f1014ad6-4291-3c29-9b0d-b552a0c5eb02
type=ethernet
autoconnect-priority=-999
interface-name=enp0s3
......

Reference:
How to Configure Network Connection Using ‘nmcli’ Tool.

Compile code using x86 intrinsics

Check following simple program:

# cat foo.c
#include <inttypes.h>
#include <stdio.h>
#include <x86intrin.h>

int main(void) {
    printf("%" PRIu32 "\n", _mm_crc32_u32(42, 2534474250));
    return 0;
}

Use gcc to compile with no options:

# gcc foo.c
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:37,
                 from /usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/x86intrin.h:32,
                 from foo.c:3:
foo.c: In function ‘main’:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/smmintrin.h:839:1: error: inlining failed in call to ‘always_inline’ ‘_mm_crc32_u32’: target specific option mismatch
  839 | _mm_crc32_u32 (unsigned int __C, unsigned int __V)
      | ^~~~~~~~~~~~~
foo.c:6:2: note: called from here
    6 |  printf("%" PRIu32 "\n", _mm_crc32_u32(42, 2534474250));
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Should use -march=native option here:

# gcc -march=native foo.c
#

Reference:
stackoverflow.

The registers’ values in core dump file on x86_64

On x86_64 platforms, some registers are “caller-saved” whilst others are “callee-saved” (refer AMD64 Calling Conventions for Linux / Mac OSX), or from Optimizing subroutines in assembly language, section 4.1, Register usage, “Registers that
can be used freely” (“caller-saved”) and “Registers that must be saved and restored” (“callee-saved”). When using gdb to display registers values, the values are relative to the selected stack frame (Refer Registers):

Normally, register values are relative to the selected stack frame (see Selecting a Frame). This means that you get the value that the register would contain if all stack frames farther in were exited and their saved registers restored. In order to see the true contents of hardware registers, you must select the innermost frame (with ‘frame 0’).

……

Also, the more “outer” the frame is you’re looking at, the more likely a call-clobbered register’s value is to be wrong, in the sense that it doesn’t actually represent the value the register had just before the call.

So it means when using gdb to analyse core dump file, you must pay attention to the registers values since it may not reflect correct values of current stack frame. Check following diagram:

You can see only RSPRIP and “callee-saved” registers are different among frame 07 and 8.