Tips for learning the Linux kernel

The Linux kernel has become one of the largest and most complex software projects in the world, and that complexity scares many novices away. In this post I will share some personal experience on how to learn the Linux kernel, and I hope these tips help newcomers.

(1) Download a vanilla kernel and install it.

Find a physical machine or, if you really don’t have one at hand, use a virtual machine. Download the newest vanilla kernel from kernel.org, then build and install it. The process isn’t too hard, and it helps you conquer the fear of the Linux kernel. After your first successful installation, boot into the new kernel and read the release version number from the uname -r output:

# uname -r
4.6.0

I think this will give you more confidence.

(2) Learn the basics of Linux kernel programming.

Think back to when you began user-space C programming on a *nix platform: you needed to know how to allocate memory with malloc, open files with fopen/open, build concurrent programs with the pthread library, and so on. The Linux kernel is nothing more than another platform, and you need to learn its rules too. For example, you should be familiar with manipulating linked lists (list.h) and with allocating memory through kmalloc. Many classic books and tutorials cover this material. Although some of them look outdated (they still target 2.6.x kernels), most of what they teach still applies to current kernels.

(3) Dive into one module.

Once you have the basic skills of Linux kernel programming, you should focus on one area of the kernel. If you are a full-time kernel programmer, congratulations! You should concentrate on your work area and try to become an expert in that domain. If the kernel is just your hobby, select one module that really interests you. For example, if you are curious about debugging, kdump may be to your taste; if you care about dynamic tracing, BPF is what you are looking for. After picking the part you want to contribute to, dig into the code and try to master every detail of it. You should also subscribe to the related mailing list to keep up with the latest progress. The final goal is to get meaningful patches merged into the kernel, from a trivial typo fix to a new feature. Just think: your code will run on millions of devices. That is really amazing!

(4) Others

When you run into an issue, you can ask for help on mailing lists or forums. You can also take part in a local community to get to know like-minded people. In any case, make use of every resource you can find.

Happy hacking!

 

Install docker on Ubuntu 14.04

If you want to play with docker on Ubuntu 14.04, pay attention to the installation command: it is “apt-get install docker.io”, not “apt-get install docker”. You can see the difference between the two packages with the following command:

# apt-cache search docker
......
docker - System tray for KDE3/GNOME2 docklet applications
......
docker.io - Linux container runtime
......

Once docker is installed, you can check its process:

# ps -ef | grep docker
root       4715      1  0 13:22 ?        00:00:00 /usr/bin/docker -d
root       4857   4691  0 13:50 pts/0    00:00:00 grep --color=auto docker
# pstree -ps 4715
init(1)───docker(4715)─┬─{docker}(4717)
                       ├─{docker}(4722)
                       ├─{docker}(4723)
                       ├─{docker}(4724)
                       ├─{docker}(4734)
                       ├─{docker}(4754)
                       ├─{docker}(4762)
                       ├─{docker}(4769)
                       └─{docker}(4793)

You can use “service docker start” and “service docker stop” to start and stop the docker daemon.

If your host runs behind a proxy, you may run into problems when pulling an image:

# docker run hell-world
Unable to find image 'hell-world:latest' locally
Pulling repository hell-world
FATA[0005] Get https://index.docker.io/v1/repositories/library/hell-world/images: x509: certificate is valid for FG3K6C3A15800021, not index.docker.io

The solution is to add the proxy settings to /etc/default/docker:

......
# If you need Docker to use an HTTP proxy, it can also be specified here.
export http_proxy="http://web-proxy.corp.xxxxxx.com:8080/"
export https_proxy="https://web-proxy.corp.xxxxxx.com:8080/"
......

After restarting the docker daemon so that it picks up the new settings, you can download images successfully:

# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from hello-world
d59cd4c39e50: Pull complete
f1d956dc5945: Pull complete
Digest: sha256:4f32210e234b4ad5cac92efacc0a3d602b02476c754f13d517e1ada048e5a8ba
Status: Downloaded newer image for hello-world:latest

Hello from Docker.
This message shows that your installation appears to be working correctly.
......

Now that everything is ready, enjoy playing with docker!

 

A brief intro to delve

delve is a debugger written in Golang and dedicated to troubleshooting Golang programs (the project is hosted at github.com/derekparker/delve). Though it is still pre-1.0, I think it is stable enough for daily use. If you find bugs, you can report them to the developers and help make delve stronger! P.S. Although fmt.Print and friends are useful in most cases, I strongly recommend that you use delve to inspect the inner workings of your code, because it helps you understand Golang more deeply rather than superficially.

Installing delve is very simple. Taking Linux as an example, it is no different from setting up any other Golang project; a single “go get” is enough:

go get github.com/derekparker/delve/cmd/dlv 

Now there is an extra dlv executable in $GOPATH/bin (note that the project is named delve, while the executable is called dlv; I made this foolish mistake myself when I began to use it). Run the dlv command, and it will show you delve’s usage information:

# dlv
Delve is a source level debugger for Go programs.

Delve enables you to interact with your program by controlling the execution of the process,
evaluating variables, and providing information of thread / goroutine state, CPU register state and more.

The goal of this tool is to provide a simple yet powerful interface for debugging Go programs.

Usage:
  dlv [command]

Available Commands:
  version     Prints version.
  run         Deprecated command. Use 'debug' instead.
  debug       Compile and begin debugging program.
......

Let’s look at this contrived Hello.go program:

package main

import "fmt"

func main() {
        var s []byte
        s = append(s, []byte("Hello, Debugging!")...)
        fmt.Println(string(s))
}

Use delve to debug it:

# dlv debug Hello.go
Type 'help' for list of commands.
(dlv) help
The following commands are available:
    help (alias: h) ------------- Prints the help message.
    break (alias: b) ------------ Sets a breakpoint.
    trace (alias: t) ------------ Set tracepoint.
    restart (alias: r) ---------- Restart process.
    continue (alias: c) --------- Run until breakpoint or program termination.
    step (alias: s) ------------- Single step through program.
    step-instruction (alias: si)  Single step a single cpu instruction.
    next (alias: n) ------------- Step over to next source line.
    threads --------------------- Print out info for every traced thread.
    thread (alias: tr) ---------- Switch to the specified thread.
......

If you are familiar with gdb, you will find the commands very similar, and I am sure you can master delve quickly.

Interestingly, delve doesn’t provide the start command that gdb offers, so you should set breakpoints first and then run the continue command:

(dlv) b Hello.go:8
Breakpoint 1 set at 0x4011ea for main.main() ./Hello.go:8
(dlv) c
> main.main() ./Hello.go:8 (hits goroutine(1):1 total:1) (PC: 0x4011ea)
     3: import "fmt"
     4:
     5: func main() {
     6:         var s []byte
     7:         s = append(s, []byte("Hello, Debugging!")...)
=>   8:         fmt.Println(string(s))
     9: }
(dlv) p s
[]uint8 len: 17, cap: 32, [72,101,108,108,111,44,32,68,101,98,117,103,103,105,110,103,33]
(dlv) goroutines
[4 goroutines]
* Goroutine 1 - User: ./Hello.go:8 main.main (0x4011ea)
  Goroutine 2 - User: /usr/local/go/src/runtime/proc.go:263 runtime.gopark (0x42a153)
  Goroutine 3 - User: /usr/local/go/src/runtime/proc.go:263 runtime.gopark (0x42a153)
  Goroutine 4 - User: /usr/local/go/src/runtime/mfinal.go:144 runtime.runfinq (0x413f80)

Cool, isn’t it? Now you can observe almost everything you want to know about your program.
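To connect the “p s” output above with the source, here is a small sketch that builds the same slice and prints it both as numbers and as text. The helper name buildGreeting is only an illustration and is not part of Hello.go. The capacity is chosen by the Go runtime and may differ from the 32 shown in the session, so the sketch only checks that it is at least the length.

```go
package main

import "fmt"

// buildGreeting mirrors the slice construction in Hello.go.
// (Hypothetical helper, introduced only for this illustration.)
func buildGreeting() []byte {
	var s []byte
	s = append(s, []byte("Hello, Debugging!")...)
	return s
}

func main() {
	s := buildGreeting()

	// The length is fixed by the string; the capacity is up to the runtime.
	fmt.Println(len(s), cap(s) >= len(s))

	// A []byte is a []uint8, so the elements print as numbers,
	// which is what delve shows for "p s".
	fmt.Println(s[:5])

	// The same bytes read as text.
	fmt.Println(string(s[:5]))
}
```

The first five values, 72 101 108 108 111, are the ASCII codes of “Hello”, matching the beginning of the array that delve printed.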

Happy Debugging! Happy delving!

 

Watch out for the permissions of the “tmp” directory when installing SAP HANA

My colleague created a SLES11SP3 docker image and wanted to install the SAP HANA database in a SLES11SP3 container, but an error occurred during installation:

# ./hdbinst
......
Creating instance...
  hdbparam: Working configuration directory:  "/hana/shared/H00/global/hdb/custom/config"
  hdbnsutil: creating persistence ...
  hdbnsutil: writing initial topology...
  hdbnsutil: writing initial license: status check = 2
Installation failed
  error installing
    Cannot create Instance
      Cannot start sapstartsrv
        Waiting for sapstartsrv failed: timeout reached (120)
        Waiting for sapstartsrv failed: timeout reached (120)

Log file written to '/var/tmp/hdb_H00_install_2016-05-04_19.18.27/hdbinst.log' on host 'fe769d9f6bae'.

Check the /var/tmp/hdb_H00_install_2016-05-04_19.18.27/hdbinst.log:

19:22:42.406 - INFO:     Starting service
19:22:42.406 - INFO:       Starting external program /usr/sap/H00/HDB00/exe/sapstartsrv
19:22:42.406 - INFO:         Command line is: /usr/sap/H00/HDB00/exe/sapstartsrv pf=/hana/shared/H00/profile/H00_HDB00_fe769d9f6bae -D -u h00adm
19:22:42.438 - INFO:         Output line 1: Impromptu CCC initialization by 'rscpCInit'.
19:22:42.438 - INFO:         Output line 2:   See SAP note 1266393.
19:22:42.726 - INFO:         Output line 3: Impromptu CCC initialization by 'rscpCInit'.
19:22:42.726 - INFO:         Output line 4:   See SAP note 1266393.
19:22:42.857 - INFO:         Program terminated with exit code 0
19:22:42.857 - INFO:     Waiting for sapstartsv...
19:22:42.862 - INFO:       sapstartsrv is not running: Net::HTTPS: connect: Connection refused
19:22:43.864 - INFO:       sapstartsrv is not running: Net::HTTPS: connect: Connection refused
......
19:24:43.091 - ERR :       Waiting for sapstartsrv failed: timeout reached (120)
19:24:43.092 - INFO:     Checking unix domain socket
19:24:43.093 - ERR :       Cannot establish http connection to unix domain socket '/tmp/.sapstream50013' (No such file or directory)
19:24:43.093 - INFO:       sapstartsrv is not running: connect: No such file or directory
19:24:44.094 - ERR :       Cannot establish http connection to unix domain socket '/tmp/.sapstream50013' (No such file or directory)
19:24:44.094 - INFO:       sapstartsrv is not running: connect: No such file or directory
......

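After some tough debugging, the cause turned out to be that the /tmp directory didn’t grant write permission to any user other than root, so sapstartsrv, which the log shows being started for user h00adm, presumably could not create its socket /tmp/.sapstream50013: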

drwxr-xr-x    2 root root  4096 May  4 19:17 tmp

Change the permissions of /tmp:

# chmod a+w /tmp

Then the installation succeeds!

 

Fix “ORA-03114: not connected to ORACLE” error

I use docker-oracle12c to run Oracle in docker, and pin each container to specific CPUs and NUMA memory nodes:

docker run -d -it --cpuset-cpus=xx-xx,xx-xxx  --cpuset-mems=x,x ... 

All containers ran OK, but the creation of one Oracle database always failed, and the error log was:

ORA-03114: not connected to ORACLE

After some tough debugging, the cause turned out to be that there was not enough free memory on the specified NUMA nodes:

# numactl -H
......
node 2 size: 786432 MB
node 2 free: xxxxx MB

node 3 size: 786432 MB
node 3 free: xxxxx MB
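When many nodes are involved, the “free” lines can be pulled out of this output programmatically. The following is only a sketch: the function name parseNodeFree, the sample numbers and the 4096 MB requirement are all hypothetical (the real values are elided above), and it assumes the “node N free: M MB” line shape shown in the listing.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNodeFree scans text shaped like "numactl -H" output and returns
// the free memory in MB for every "node N free: M MB" line it finds.
// (Hypothetical helper, introduced only for this illustration.)
func parseNodeFree(output string) map[int]int {
	free := make(map[int]int)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		var node, mb int
		// Lines that do not match (for example the "size" lines) fill
		// fewer than two values and are skipped.
		if n, _ := fmt.Sscanf(sc.Text(), "node %d free: %d MB", &node, &mb); n == 2 {
			free[node] = mb
		}
	}
	return free
}

func main() {
	// Made-up sample; only the line format follows the listing above.
	sample := "node 2 size: 786432 MB\nnode 2 free: 1024 MB\n\n" +
		"node 3 size: 786432 MB\nnode 3 free: 2048 MB\n"

	const neededMB = 4096 // hypothetical amount the database needs per node

	free := parseNodeFree(sample)
	for _, node := range []int{2, 3} {
		fmt.Printf("node %d: free=%d MB, enough=%v\n", node, free[node], free[node] >= neededMB)
	}
}
```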

The solution is to disable HugePages temporarily:

# cat /etc/sysctl.conf
......
vm.nr_hugepages=0
......
# sysctl -p

After creating the database, enable HugePages again:

# cat /etc/sysctl.conf
......
vm.nr_hugepages=xxxxxx
......
# sysctl -p
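One plausible reading of why this helps is that pages reserved through vm.nr_hugepages are set aside in advance and are not available to ordinary allocations, so a large pool can leave little free memory on a node. The sketch below only does the arithmetic. The 2048 kB page size is an assumption (a common default on x86-64; check Hugepagesize in /proc/meminfo on your host), and the page counts are hypothetical because the real value is elided above.

```go
package main

import "fmt"

// reservedMiB returns how much memory a hugepage pool sets aside, in MiB.
// (Hypothetical helper, introduced only for this illustration.)
func reservedMiB(nrHugepages, hugepageSizeKiB uint64) uint64 {
	return nrHugepages * hugepageSizeKiB / 1024
}

func main() {
	// Assumption: 2 MiB huge pages; read Hugepagesize from /proc/meminfo.
	const hugepageSizeKiB = 2048

	// Hypothetical vm.nr_hugepages values.
	for _, n := range []uint64{0, 1024, 262144} {
		fmt.Printf("vm.nr_hugepages=%d reserves %d MiB\n", n, reservedMiB(n, hugepageSizeKiB))
	}
}
```

With these assumed numbers, 262144 pages of 2 MiB reserve 524288 MiB (512 GiB), while setting vm.nr_hugepages=0 reserves nothing.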