My Site

Random scribbles of a systems software engineer


How to See Which CPU a Process (Thread) Is Running on in Linux

This post shows how to find out which CPU a process (or thread) is running on in Linux. But first, we need to get two basic concepts straight:

(1) On Linux there is no fundamental difference between a process and a thread: the kernel treats both as a task. Threads belonging to the same process share certain resources, and every thread has its own thread ID. The thread ID of the “main thread” is the same as the process ID, i.e., the PID.

(2) The lscpu command reports how many CPUs the current system has:

$ lscpu
......
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
......

The system has 2 physical sockets (Socket(s): 2), each socket has 6 cores (Core(s) per socket: 6), and each core has 2 hardware threads (Thread(s) per core: 2). So the system has 2 x 6 x 2 = 24 logical CPUs (CPU(s): 24), and these logical CPUs are what actually run programs. Both of these basics are easy to check from a program; see the sketch below.
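
A minimal sketch that verifies both points (gettid is called via syscall(2) because older glibc versions provide no wrapper for it):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        /* (1) For the main thread, the thread ID equals the PID. */
        printf("pid=%ld tid=%ld\n",
               (long)getpid(), (long)syscall(SYS_gettid));

        /* (2) Logical CPUs currently online -- the "CPU(s)" line
           of lscpu. */
        printf("online logical CPUs: %ld\n",
               sysconf(_SC_NPROCESSORS_ONLN));
        return 0;
}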

The htop command can show which CPU a process (thread) is running on, but it does not display this information by default:

[screenshot: htop's default display]
To enable it:
(1) Start htop, then press F2 (Setup):

[screenshot: the htop Setup screen]
(2) In Setup choose Columns, then in Available Columns select PROCESSOR - ID of the CPU the process last executed, and press F5 (Add) followed by F10 (Done):

[screenshot: adding the PROCESSOR column]

Now htop displays the CPU information. Note that what htop really shows is the CPU a process (thread) last ran on, not the CPU it is running on right now: by the time htop displays the value, the OS may already have scheduled the process (thread) onto another CPU.
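
A program can ask the same question about itself with sched_getcpu(3), a glibc extension, subject to exactly the same caveat: the thread may be migrated right after the call returns. A minimal sketch:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        /* The CPU this thread was on when the call was made;
           it may already have been moved somewhere else. */
        printf("last ran on CPU %d\n", sched_getcpu());
        return 0;
}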

Here is a program that runs 4 threads:

#include <omp.h>

int main(void){

        /* four threads, each spinning forever so they stay runnable */
        #pragma omp parallel num_threads(4)
        for(;;)
        {
        }

        return 0;
}

Compile and run the code:

$ gcc -fopenmp thread.c
$ ./a.out &
[1] 17235

htop now shows each thread's ID and the CPU it is running on:

[screenshot: htop showing the four threads and their CPUs]

References:
How to find out which CPU core a process is running on
闲侃CPU(一)

Profiling CPU Usage

The content of this post is taken from Systems Performance: Enterprise and the Cloud.

CPU profiling works by sampling CPU state periodically and then analyzing the samples. It consists of 5 steps (a minimal Linux sketch of these steps follows the list):

1. Select the type of profile data to capture, and the rate.
2. Begin sampling at a timed interval.
3. Wait while the activity of interest occurs.
4. End sampling and collect sample data.
5. Process the data.
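
Here is a minimal sketch of steps 1-4 on Linux, assuming x86-64 (REG_RIP and the ucontext layout are architecture-specific): setitimer(2) delivers SIGPROF at a fixed CPU-time interval, and the handler grabs the interrupted program counter. A real profiler would save the samples into a buffer and process them offline (step 5); printf() is not async-signal-safe and is used here only to keep the sketch short.

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <ucontext.h>

static void on_sample(int sig, siginfo_t *si, void *ctx)
{
        (void)sig; (void)si;
        /* Step 4: collect one sample -- the interrupted PC. */
        greg_t pc = ((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP];
        printf("sample: pc=0x%llx\n", (unsigned long long)pc);
}

int main(void)
{
        /* Step 1: program-counter-based samples, user level. */
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_sample;
        sa.sa_flags = SA_SIGINFO | SA_RESTART;
        sigaction(SIGPROF, &sa, NULL);

        /* Step 2: begin sampling at a timed interval (100 Hz). */
        struct itimerval it = { { 0, 10000 }, { 0, 10000 } };
        setitimer(ITIMER_PROF, &it, NULL);

        /* Step 3: wait while the activity of interest occurs. */
        for (volatile long i = 0; i < 1000000000L; i++)
                ;
        return 0;
}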

What gets sampled is determined by the following two factors:

a. User level, kernel level, or both
b. Function and offset (program-counter-based), function only, partial stack trace, or full stack trace

Capturing all user-level and kernel-level stack traces certainly gives a complete CPU profile, but it produces far too much data. It is usually enough to sample only partial user-level and kernel-level stacks, and sometimes keeping just the function names will do.

Here is an example of CPU sampling with DTrace:

 # dtrace -qn 'profile-997 /arg1/ {@[execname, ufunc(arg1)] = count();} tick-10s{exit(0)}'

 top                                                 libc.so.7`0x801154fec                                             1
 top                                                 libc.so.7`0x8011e5f28                                             1
 top                                                 libc.so.7`0x8011f18a9                                             1

Linux Kernel Notes (60): Scheduling Domains

On a NUMA system, the time it takes a CPU to access local memory versus remote memory differs greatly, so a scheduling algorithm that understands the topology becomes very important. For this the Linux kernel introduced the concept of the scheduling domain. Consider the following example:

[root@localhost ~]# cd /proc/sys/kernel/sched_domain/
[root@localhost sched_domain]# ls
cpu0  cpu1  cpu2  cpu3  cpu4  cpu5  cpu6  cpu7
[root@localhost sched_domain]# ls -alt *
cpu0:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu1:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu2:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu3:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu4:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu5:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu6:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu7:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

Every CPU has its own directory under /proc/sys/kernel/sched_domain/, and each CPU's directory contains the domain information relevant to that CPU.

A system with multiple topology levels also has multiple levels of scheduling domains (struct sched_domain in the kernel). Each scheduling domain contains a set of CPUs that share properties and scheduling policies, and each scheduling domain contains one or more CPU groups (struct sched_group in the kernel), each of which the scheduling domain treats as a single unit.

The core scheduling-domain code lives in kernel/sched/core.c, and the meaning of each file under /proc/sys/kernel/sched_domain/cpu$/domain$/ can be found there.
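
For example, every domain directory contains a flags file describing the domain's properties. A trivial sketch that prints it, assuming the path exists on your kernel (the tree shown above requires CONFIG_SCHED_DEBUG, and some kernel versions print the flags as a plain number, others as symbolic names):

#include <stdio.h>

int main(void)
{
        /* The lowest-level scheduling domain of cpu0. */
        const char *path =
                "/proc/sys/kernel/sched_domain/cpu0/domain0/flags";
        char buf[256];
        FILE *f = fopen(path, "r");

        if (!f) {
                perror("fopen");
                return 1;
        }
        while (fgets(buf, sizeof buf, f))
                printf("cpu0/domain0 flags: %s", buf);
        fclose(f);
        return 0;
}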

On a NUMA system, if one node's utilization is very high (say above 90%) while another node's is only 60%-70%, you can try disabling wakeup affinity.

References:
Scheduling domains
sched-domains.txt
Does domain0 in /proc/sys/kernel/sched_domain/cpu$ refer to the top-level domain in the system?
How to understand /proc/sys/kernel/sched_domain/cpu$/domain$/flags?

UMA vs NUMA

This post is excerpted from Non-Uniform Memory Access (NUMA).

Popular parallel computer architectures fall into the following two models:

Shared Memory Architecture

[figure: Shared Memory Architecture]

All processors share the same memory address space. The main problem this architecture must solve is cache coherence.

Distributed Memory Architecture

[figure: Distributed Memory Architecture]

Each processor has its own local memory, and there is no mapping of memory addresses across processors, so there is no cache-coherence problem either.

Shared Memory Architecture is further divided into the following two kinds:

UMA(Uniform Memory Access)

[figure: UMA]

All processors are identical and have the same access time to every memory region.

NUMA(Non-Uniform Memory Access)

[figure: NUMA]

All processors are identical, but each has its own local memory. Unlike the Distributed Memory Architecture, memory addresses are mapped across processors, and the time to access local memory differs from the time to access another processor's memory.
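
On Linux this model is visible to programs through libnuma: memory can be placed on a specific node, and it is "local" only for the CPUs of that node. A minimal sketch, assuming libnuma is installed (link with -lnuma):

#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        if (numa_available() < 0) {
                fprintf(stderr, "no NUMA support on this system\n");
                return 1;
        }
        printf("highest node: %d\n", numa_max_node());

        /* Allocate 1 MB on node 0: local for node-0 CPUs,
           remote (slower to access) for CPUs of other nodes. */
        size_t sz = 1 << 20;
        void *p = numa_alloc_onnode(sz, 0);
        if (p) {
                memset(p, 0, sz);  /* touch it so pages get placed */
                numa_free(p, sz);
        }
        return 0;
}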


The Difference Between CPU, GPU and GPGPU

The following is quoted from Why are we still using CPUs instead of GPUs?:

GPUs have far more processor cores than CPUs, but because each GPU core runs significantly slower than a CPU core and do not have the features needed for modern operating systems, they are not appropriate for performing most of the processing in everyday computing. They are most suited to compute-intensive operations such as video processing and physics simulations.

A GPU (Graphics Processing Unit) has far more cores than a CPU; it is, in effect, the processor of the video card. Its instruction set is not as powerful as a CPU's, but thanks to the large number of cores it is well suited to relatively simple, compute-intensive work such as image processing. A GPGPU (General-Purpose Graphics Processing Unit) is not limited to graphics-related computation and also performs general-purpose computation.

Update: how does the image on a computer screen actually get displayed? This post gives a good explanation:

The GPU has a series of registers that the BIOS maps. These permit the CPU to access the GPU’s memory and instruct the GPU to perform operations. The CPU plugs values into those registers to map some of the GPU’s memory so that the CPU can access it. Then it loads instructions into that memory. It then writes a value to a register that tells the GPU to execute the instructions the CPU loaded into its memory.

The information consists of the software that the GPU needs to run. This software is bundled with the driver and then the driver handles the responsibility split between the CPU and GPU (by running portions of its code on both devices).

The driver then manages a series of “windows” into GPU memory that the CPU can read from and write to. Generally, the access pattern involves the CPU writing instructions or information into mapped GPU memory and then instructing the GPU, through a register, to execute those instruction or process that information. The information includes shader logic, textures, and so on.

Simply put, the CPU loads the image data and instructions into the GPU's memory (mapped through the video card's registers), then writes a register to tell the GPU (the processor on the video card) to execute the drawing commands. The figure below, from Wikipedia, nicely depicts the whole flow:

[figure: CUDA processing flow (Wikipedia)]


Why Is There No “Memory Reorder” Problem on a Single CPU?

This article points out that the “memory reorder” problem does not occur on a single-core system:

Two threads being timesliced on a single CPU core won’t run into a reordering problem. A single core always knows about its own reordering and will properly resolve all its own memory accesses. Multiple cores however operate independently in this regard and thus won’t really know about each other’s reordering.

Again, take the figures from Memory Reordering Caught in the Act as an example:

[figure: marked-example2 (the two threads' code)]

[figure: reordered (an execution where the operations were reordered)]

You can think of it this way: on a single-core system, multiple threads actually execute interleaved, one at a time; they can never truly run in parallel. However the code of two or more threads is executed out of order, the CPU knows the order in which it was originally supposed to execute, and whenever the reordering would change the program's result, the CPU takes corrective measures, such as discarding results and re-executing, to guarantee that the code behaves as if it ran in the intended order. That is why the “memory reorder” problem does not show up on single-core systems.
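
Here is a sketch of the experiment behind those figures, assuming x86-64, pthreads, and GCC (build with gcc -O2 -pthread; volatile keeps the compiler itself from reordering the accesses, so only CPU reordering remains). On a multi-core machine it occasionally prints a hit, meaning both loads saw 0, the StoreLoad reordering from the figures; run it pinned to one CPU with taskset -c 0 ./a.out and, in line with this post's argument, no hit should ever appear:

#include <pthread.h>
#include <stdio.h>

static volatile int X, Y, r1, r2;
static pthread_barrier_t start, stop;

static void *thread1(void *arg)
{
        for (;;) {
                pthread_barrier_wait(&start);
                X = 1;          /* plain store ...                  */
                r1 = Y;         /* ... then plain load; the CPU may */
                                /* let the load pass the store      */
                pthread_barrier_wait(&stop);
        }
        return arg;
}

static void *thread2(void *arg)
{
        for (;;) {
                pthread_barrier_wait(&start);
                Y = 1;
                r2 = X;
                pthread_barrier_wait(&stop);
        }
        return arg;
}

int main(void)
{
        pthread_t a, b;
        long i, hits = 0;

        pthread_barrier_init(&start, NULL, 3);
        pthread_barrier_init(&stop, NULL, 3);
        pthread_create(&a, NULL, thread1, NULL);
        pthread_create(&b, NULL, thread2, NULL);

        for (i = 0; i < 1000000; i++) {
                X = Y = 0;
                pthread_barrier_wait(&start);  /* release both threads */
                pthread_barrier_wait(&stop);   /* wait for their loads */
                if (r1 == 0 && r2 == 0)
                        printf("reorder hit %ld at iteration %ld\n",
                               ++hits, i);
        }
        return 0;   /* exiting main also ends the spinning threads */
}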

References:
Why doesn’t the instruction reorder issue occur on a single CPU core?
preempt_disable的问题

Notes on “Memory Order”

The following figures, taken from Memory Reordering Caught in the Act, illustrate the memory reordering problem:
The code:

[figure: marked-example2 (the two threads' code)]

An actual execution:

[figure: reordered (an execution where the operations were reordered)]

Why does memory reordering happen at all? In a word: performance.

On a system that supports memory reordering, there are 3 orders to consider:

Program order: the order in which the memory operations are specified in the code running on a given CPU.

Execution order: the order in which the individual memory-reference instructions are executed on a given CPU. The execution order can differ from program order due to both compiler and CPU-implementation optimizations.

Perceived order: the order in which a given CPU perceives its and other CPUs’ memory operations. The perceived order can differ from the execution order due to caching, interconnect and memory-system optimizations. Different CPUs might well perceive the same memory operations as occurring in different orders.

Program order is the order in which the code accesses memory. Execution order is the order in which the instructions actually execute on a given CPU; because of compiler optimizations and the CPU implementation, it may differ from program order. Perceived order is the order in which a CPU “perceives” its own and other CPUs' memory operations; because of caching, the interconnect, and other memory-system optimizations, it may differ from the execution order.
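
A small illustration of program order versus execution order, assuming GCC or Clang: in the first function nothing stops the compiler or the CPU from making the store to ready visible before the store to data; the second function uses the __atomic builtins (real GCC/Clang intrinsics) to forbid that.

int data;
int ready;

void publish(int v)
{
        data = v;       /* program order: data is written first    */
        ready = 1;      /* ...but may become visible first anyway  */
}

void publish_ordered(int v)
{
        data = v;
        /* Release store: stores above it are ordered before it,
           for any reader that loads `ready` with acquire. */
        __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}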


A summary of memory ordering:

A given CPU always perceives its own memory operations as occurring in program order. That is, memory-reordering issues arise only when a CPU is observing other CPUs’ memory operations.

An operation is reordered with a store only if the operation accesses a different location than does the store.

Aligned simple loads and stores are atomic.

Linux-kernel synchronization primitives contain any needed memory barriers, which is a good reason to use these primitives.

References:
Memory Reordering Caught in the Act
Memory Ordering in Modern Microprocessors, Part I

*NIX & Hacking: Issue 10

A magazine of things that interest me. It's that simple!

Git

Some of git internals

Kernel

A History of Linux Kernel Module Signing
Linux Kernel Hacking Talk

KVM

Using the KVM API

Tracing

Linux Networking, Tracing and IO Visor, a New Systems Performance Tool for a Distributed World

Unix

Syscall table reference tool for several arch(Linux x86/64,ARM,IA64,Winx86/64,OSX BSD & more)

Vim

A vim Tutorial and Primer

X86

A Primer on Disassembling Function Calls and Understanding Stack Frames in x86
Advanced x86: Introduction to BIOS & SMM

Easter Egg

Awesome Open Source Documents
Closing a door

The Difference Between “/dev/tty”, “/dev/console” and “/dev/tty0”

This note comes from a post on Stack Overflow; the answer is as follows:

From the documentation(http://www.kernel.org/doc/Documentation/devices.txt):

    /dev/tty        Current TTY device
    /dev/console    System console
    /dev/tty0       Current virtual console

In the good old days /dev/console was System Administrator console. And TTYs were users' serial devices attached to a server.
Now /dev/console and /dev/tty0 represent current display and usually are the same. You can override it for example by adding console=ttyS0 to grub.conf. After that your /dev/tty0 is a monitor and /dev/console is /dev/ttyS0.

An exercise to show the difference between /dev/tty and /dev/tty0:

Switch to the 2nd console by pressing Ctrl+Alt+F2. Login as root. Type "sleep 5; echo tty0 > /dev/tty0". Press Enter and switch to the 3rd console by pressing Alt+F3.
Now switch back to the 2nd console by pressing Alt+F2. Type "sleep 5; echo tty > /dev/tty", press Enter and switch to the 3rd console.

You can see that "tty" is the console where process starts, and "tty0" is a always current console.

In earlier times, /dev/console was the system administrator's console, and TTYs were the users' serial devices attached to a server. Nowadays /dev/console and /dev/tty0 both refer to the current display device and are usually the same. You can change which device /dev/console is bound to; for example, add console=ttyS0 to grub.conf, and after that /dev/tty0 refers to the monitor while /dev/console refers to /dev/ttyS0.

/dev/tty is the controlling terminal of the current process, while /dev/tty0 is always the currently active console. After you run “sleep 5; echo tty0 > /dev/tty0” in one terminal and switch to another, the string tty0 shows up on the terminal you switched to. After running “sleep 5; echo tty > /dev/tty”, the string tty always shows up on the terminal where the command was started, no matter which terminal you switch to.
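
The same point in C, as a minimal sketch: opening /dev/tty always yields the process's controlling terminal, so the message comes back to the terminal the program was started from, no matter which console is in the foreground when the sleep ends.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/tty", O_WRONLY); /* controlling terminal */

        if (fd < 0) {
                perror("open /dev/tty");
                return 1;
        }
        sleep(5);                            /* switch consoles now  */
        dprintf(fd, "hello from the starting terminal\n");
        close(fd);
        return 0;
}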


*NIX & Hacking: Issue 8

A magazine of things that interest me. It's that simple!

C

C Programming Substance Guidelines
The International Obfuscated C Code Contest

Docker

A Beginners Guide to Docker and Containers

Gcc

About GCC printf optimization

Git

Git Cheat Sheet
Git Tips

Hardware

Interfacing the Serial / RS232 Port

Kernel

Porting Linux to a new processor architecture, part 1: The basics
The newbie’s guide to hacking the Linux kernel
Writing a Linux Kernel Module — Part 1: Introduction

Lua

Embedding LuaJIT in 30 minutes (or so)

Network

Mobile TCP optimization – lessons learned in production

Unix

The First Port Of Unix
