PCI寻址笔记

Linux现在支持多个PCI domain,每个PCI domain管理256bus2^8),每个bus上可以挂载322^5)个设备,每个设备最多支持82^3)个function。同一PCI bus上的设备共享memory locationI/O port的地址空间,而configuration register则不是。每个PCI设备的configuration register256个字节。

执行“lspci -D”命令显示当前系统的PCI设备信息。格式为:domain:bus:dev:func

[root@localhost ~]# lspci -D
0000:00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
0000:00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
0000:00:01.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
0000:00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
0000:00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
0000:00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
0000:00:06.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
0000:00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
0000:00:0b.0 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller
0000:00:0d.0 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)

“Understanding Caching”笔记

本文是“Understanding Caching”的笔记:

(1)什么是cache line

A cache line is the smallest unit of memory that can be transferred to or from a cache. The essential elements that quantify a cache are called the read and write line widths. These signify the minimum amount of data the cache must read or write from the memory or cache below it. Frequently, these quantities are the same, so caches often are quantified simply by the line width. Even if they differ, the longest width usually is called the line width.

(2)inclusiveexclusive

A multilevel cache can be either inclusive or exclusive. Exclusive means a particular cache line may be present in exactly one of the cache levels and no more than one. Inclusive means the line may be present simultaneously in more than one level of cache. Nothing prevents the line widths from being different in differing cache levels.

(3)Write throughwrite back

Write through means the cache may store a copy of the data, but the write must be completed at the next level down before it can be signaled as complete to the layer above. Write back means a write may be considered complete as soon as the data is stored in the cache. For a write back cache, as long as the written data is not transmitted, the cache line is considered dirty, because it ultimately must be written out to the level below.

(4)Coherency

A cache line is termed coherent when the data in the line is identical to the data stored in the main memory being cached. If this is not true, the cache line is termed incoherent.

没有coherency带来的两个问题:
a)所有种类cache都有可能发生(stale data):主存数据更新了,但是cache数据不是最新的。因此会导致读错误。如下图所示:

7105f1.inline

这是一个暂时的错误,因为正确的数据在主存中,只需cache重新读一下即可。

b)这种错误只发生在write back cache中:主存和cache里分别更新了数据,现在要使二者数据一致。由于每次cache要更新一个cache line,所以必然要导致更新的数据出现不一致。如下图所示:

7105f2.inline

DMA Remapping —— Domain

Domain是平台上一个抽象的隔离环境,并且被分配了一块主机物理内存。I/O设备作为domain的指定设备(assigned device),可以访问分配给domain的内存。在虚拟化环境下,每个虚拟机都会被当做一个独立的domain

I/O设备分配到指定的domain,并只能访问指定domain所拥有的物理资源。依赖于具体的软件模型,DMA请求的地址可以是虚拟机,也就是domainGuest-Physical AddressGPA),或是由PASID指定进程定义的application Virtual AddressVA),或是由软件定义的抽象的I/O virtual addressIOVA)。不管哪种情况,DMA Remapping硬件都是把相应的地址翻译成Host-Physical AddressHPA)。

DMA Remapping —— DMA请求类型

Remapping硬件把来自设备的DMA内存访问请求分成两种类型:

(1)Requests without address-space-identifier:这是来自endpoint device的正常的内存访问请求,包括访问类型(读,写,原子访问),DMA地址和大小,发起请求的设备标示;

(2)Requests with address-space-identifier:这种内存访问请求会包含额外的信息:表明支持virtual memoryendpoint devicetargeted process address space。除了常规请求信息外,还有process address space identifier (PASID),扩展属性:Execute-Requested (ER) flag
(to indicate reads that are instruction fetches)
Privileged-mode-Requested (PR) flag (to distinguish user versus supervisor access))等等。

参考资料:
Intel ® Virtualization Technology for Directed I/O

*NIX & Hacking —— 第6期

做一本我感兴趣的杂志,就这么简单!

CPU

Is there a way to dump a CPU’s CPUID information?

DTrace

Reducing RAM usage in pkgin

Git

365GIT
Deal with git am failures
Ry’s Git Tutorial

Golang

cgasm
Embedding Lua in Go
GopherCon 2015
GoWork
go-torch

Kernel

Getting into Linux Kernel Development
KernelDebuggingTricks
vmlinuz Definition

Python

Functional Programming in Python
Fun with BPF, or, shutting down a TCP listening socket the hard way

Unix

On Hurd, Linux and the (mis)adventures of cross-compiling a GNU Hurd toolchain
Unix as IDE: Introduction

Easter egg

What are some good computer tricks that are not commonly known?
Why aren’t there a lot of old programmers at software companies?
x86 Exploitation 101: “House of Lore” – People and traditions

PCI总线相关术语

本文列举PCI总线的相关术语:

Agent:可以操作总线的设备(device)或实体(entity)。
Master:可以发起一次总线事务(transaction)的agent
Transaction:在PCI上下文中,一次transaction包含一次address phase和一次或多次data phase。也被称为burst transfer
Initiator:获得总线控制权的master,也就是发起transactionagent
Target:在address phase认识到自己addressagentTarget会响应transaction
Central Resource:主机系统上提供总线支持(如产生CLK信号等等),总线仲裁等等功能的元素。
Latency:在一次transaction中,两次状态转换之间消耗的时钟周期。Latency用来度量一个agent响应另外一个agent请求所花的时间,因此是性能的一个度量指标。

 

UP VS SMP

UP(Uni-Processor):系统只有一个处理器单元,即单核CPU系统。

SMP(Symmetric Multi-Processors):系统有多个处理器单元。各个处理器之间共享总线,内存等等。在操作系统看来,各个处理器之间没有区别。

要注意,这里提到的“处理器单元”是指“logic CPU”,而不是“physical CPU”。举个例子,如果一个“physical CPU”包含2core,并且一个core包含2hardware thread。则一个“处理器单元”就是一个hardware thread

*NIX & Hacking —— 第5期

做一本我感兴趣的杂志,就这么简单!

C

Integer Overflow

Golang

breaking out of a select statement when all channels are closed

Kernel

Feature: High Memory In The Linux Kernel
What’s Next for Containers? User Namespaces

Network

Recent advancements in Linux TCP congestion control
SO_REUSEPORT – Scaling Techniques for Servers with High Connection Rates

Rust

Rust in Detail: Writing Scalable Chat Service from Scratch

Shell

fish shell

Easter egg

Announcing the first Art of Computer Programming eBooks
Goodbye Moto
Intel DDIO, LLC cache, buffer alignment, prefetching, shared locks and packet rates
Jokes
List of freely available Programming Books
Mandelbrot Set with SIMD Intrinsics

DMA Remapping简介

DMA(Direct Nemory Access) Remapping是一种用来限制硬件设备只能使用DMA访问预先分配的内存区域(domain or physical memory regions)的技术。DMA Remapping会把DMA请求里的地址转化成正确的物理内存地址,同时还会检查设备是否允许访问指定的内存。请看下图:

dma_remapping

虚拟机的操作系统(Guest OS)所提供的物理地址称为Guest Physical Address (GPA) ,它不一定与实际的物理地址一致,也就是Host Physical Address (HPA),而DMA技术则要求访问真实的物理地址。DMA Remapping技术可以把Guest OS提供的GPA转化成HPA,然后数据就可以直接发送到Guest OS的缓冲区(buffer)了。

主机平台(host platform)可以支持一个或多个DMA remapping硬件单元(hardware unit),每个硬件单元remapping从它控制的作用域内发出的DMA remapping请求。主机固件(BIOS)需要把每个DMA remapping硬件单元报给系统软件(比如操作系统)。

DMA remapping硬件单元使用source-id来标示发出DMA请求的设备。对一个PCI Express设备,source-id就是resource identifier

 ________________________________________________________
|____Bus(8 bits)_________|__Device(5 bits)|_func(3 bits)_|

Root-entry作为最顶层的数据结构,会把某特定PCI总线上的设备映射到对应的domain。一个context-entry会把一个地址总线上的某个具体设备映射到对应的domain。参考下图:

root-entry

每个root-entry会有一个指向一个context-entry的表的指针,而每个context-entry则会包含如何用来进行地址转化的结构。

Linux kernel 笔记 (1) ——CPU在做什么?

In fact, in Linux, we can generalize that each processor is doing exactly one of three things at any given moment:
a) In user-space, executing user code in a process
b) In kernel-space, in process context, executing on behalf of a specific process
c) In kernel-space, in interrupt context, not associated with a process, handling an
interrupt

Linux中,任何时候,CPU都在做下面三件事中的一件:

a)运行进程的用户空间代码;
b)运行进程的内核空间代码;
c)处理中断(也是工作在内核空间,但不与任何进程关联)。