Process large pcap file

To process a large pcap file, it is usually better to split it into smaller chunks first and then process the chunks in parallel. I wrote a simple shell script to do this:

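#!/bin/sh

input_pcap=input.pcap
output_pcap=./pcap/frag.pcap
split_size=1000     # tcpdump -C takes units of 1,000,000 bytes, so about 1 GB
output_index=1
loop_count=10       # number of chunks processed in parallel per batch
exit_flag=0

# Placeholder worker: replace the body with the real processing.
# $1 is the chunk prefix, $2 the numeric suffix (empty for the first chunk).
process_chunk() {
    echo "$1" "$2" > log"$2"
}

# The output directory must exist before tcpdump writes into it.
mkdir -p "$(dirname "${output_pcap}")"

tcpdump -r "${input_pcap}" -w "${output_pcap}" -C "${split_size}"

# tcpdump names the first chunk exactly ${output_pcap}; later chunks get suffixes 1, 2, ...
process_chunk "${output_pcap}"

while :
do
    loop_index=0
    while test "${loop_index}" -lt "${loop_count}"
    do
        if test -e "${output_pcap}${output_index}"
        then
            process_chunk "${output_pcap}" "${output_index}" &
            output_index=$((output_index + 1))
            loop_index=$((loop_index + 1))
        else
            exit_flag=1
            break
        fi
    done
    wait    # let the current batch finish before starting the next one

    if test "${exit_flag}" -eq 1
    then
        exit 0
    fi
done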

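The script first splits the input pcap file into chunks of about 1 GB each (the -C option of tcpdump takes its size in units of 1,000,000 bytes). It then processes the chunks in batches of 10 parallel processes; in the example above each process merely writes a log line, and you can replace that with your own processing.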
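
One limitation of the batch approach is that every batch waits for its slowest chunk before the next batch starts. As a possible alternative, here is a minimal sketch that keeps a fixed number of workers busy with xargs. It assumes an xargs that supports -0 and -P (the GNU and BSD versions do, but neither option is in POSIX); run_on_chunks is a hypothetical helper name, and the worker is whatever command you pass in.

```shell
#!/bin/sh
# run_on_chunks PREFIX JOBS WORKER [ARGS...]
# Hypothetical helper: runs WORKER once per chunk file, at most JOBS at a time.
# The chunk path is appended as the last argument of WORKER.
run_on_chunks() {
    prefix=$1
    jobs=$2
    shift 2
    # tcpdump -C names the first chunk exactly PREFIX, so no PREFIX means no chunks.
    [ -e "$prefix" ] || return 1
    # Later chunks are PREFIX1, PREFIX2, ...; the glob also matches any other
    # file whose name starts with PREFIX.
    for chunk in "$prefix"*
    do
        printf '%s\0' "$chunk"
    done | xargs -0 -n 1 -P "$jobs" "$@"
}
```

For example, run_on_chunks ./pcap/frag.pcap 10 ./process.sh would run a hypothetical ./process.sh once per chunk, with at most 10 instances running at the same time.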

BTW, the code can be downloaded here.

Search IP fragmentation pcap files

The following shell script finds the pcap files in a folder that contain IP fragments:

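#!/bin/sh

for file in ./*.pcap
do
    # -Y applies a display filter: More Fragments flag set, or a non-zero fragment offset.
    frag_packets=$(tshark -r "$file" -Y "ip.flags.mf==1 || ip.frag_offset>0")
    # Use the POSIX test; [[ ... ]] is a bash extension and may fail under /bin/sh.
    if [ -n "${frag_packets}" ]
    then
        echo "$file"
    fi
done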

Note that the -Y option takes a display filter; if what you want is a capture filter, -f is the right option.
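
To make the filter expression concrete, here is a small sketch of the same test applied to the raw IPv4 header. In IPv4 the flags and the fragment offset share one 16-bit word (header bytes 6 and 7): bit 0x2000 is More Fragments and the low 13 bits are the offset. is_fragment is a hypothetical helper name, and the input is assumed to be that 16-bit word written as a number such as 0x2000.

```shell
#!/bin/sh
# is_fragment FIELD
# FIELD is the 16-bit "flags + fragment offset" word of an IPv4 header.
# Succeeds when the packet is a fragment, mirroring the display filter
# "ip.flags.mf==1 || ip.frag_offset>0".
is_fragment() {
    field=$(( $1 ))
    mf=$(( (field & 0x2000) != 0 ))    # More Fragments bit
    offset=$(( field & 0x1FFF ))       # fragment offset, in 8-byte units
    [ "$mf" -eq 1 ] || [ "$offset" -gt 0 ]
}
```

With this, is_fragment 0x2000 (first fragment) and is_fragment 0x00b9 (a later fragment) succeed, while is_fragment 0x4000 (only Don't Fragment set) fails.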

P.S., the code can be downloaded here.

Handle IP fragmentation pcap file

Wireshark has a handy feature for following a TCP stream, but sometimes it does not work as you expect. Consider the following diagram:

The IP packet carries a GTP payload, but it is fragmented and only the first fragment was captured, so Wireshark won’t dissect it. If you try to follow the TCP stream of this session, this packet is ignored.

stripe is a cool tool that peels away encapsulating headers. In my testing, you need to add the -f option, otherwise the fragmented IP packet mentioned above is skipped; but even with this option, stripe does not remove its headers. So I wrote a simple program that just removes the headers of a specified packet (the code is here for reference).

Reassemble packets for pcap file

In TCP, because of the MSS limit, an endpoint sometimes has to split one chunk of data into multiple packets and send them separately. Today I met a case that required reassembling them into one packet.

First, I used Wireshark to “Hex Dump” the first packet that needed reassembling:

0000   18 cf 24 4c 71 4b 54 89 98 76 b8 30 08 00 45 00
......

I modified the length in the IP header, appended the remaining TCP payload, and then used colrm to remove the offset column:

# colrm 1 4 < data > data.txt

Then I used awk to prepend 0x and append , to every value:

awk '{ for(i = 1; i <= NF; i++) {$i="0x"$i","} print}' data.txt
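
The colrm and awk steps can also be folded into one pass. The following is only a sketch: hexdump_to_c is a hypothetical helper name, and it assumes every input line consists of an offset column followed by hex bytes, with no trailing ASCII column.

```shell
#!/bin/sh
# hexdump_to_c: read hex-dump lines ("OFFSET  b0 b1 b2 ...") on stdin and
# print the bytes as C initializer values ("0xb0, 0xb1, ...").
hexdump_to_c() {
    awk '{
        # Field 1 is the offset column; every later field is one byte.
        for (i = 2; i <= NF; i++)
            printf "0x%s,%s", $i, (i < NF ? " " : "\n")
    }'
}

printf '0000   18 cf 24 4c 71 4b 54 89\n' | hexdump_to_c
# prints: 0x18, 0xcf, 0x24, 0x4c, 0x71, 0x4b, 0x54, 0x89,
```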

I added the variable definition for the array:

const u_char new_packet_4[] = {
    0x18, 0xcf, ......
    .......
};

Lastly, I wrote a small program that inserts the new packet 4 and removes the original packets 4 and 5; the code is here (don’t forget to modify the header of packet 4).
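
For the step that changes the length in the IP header, the arithmetic can be sketched as follows. The IPv4 Total Length field is a 16-bit big-endian value in bytes 2 and 3 of the IP header, which is offset 0x10 and 0x11 of the frame when a plain 14-byte Ethernet header precedes it. ip_total_length_bytes is a hypothetical helper name and the numbers are made up for illustration. The IPv4 header checksum also covers this field, so it no longer matches unless it is recomputed.

```shell
#!/bin/sh
# ip_total_length_bytes OLD_TOTAL_LENGTH APPENDED_BYTES
# Prints the two bytes of the new IPv4 Total Length field in network byte order.
ip_total_length_bytes() {
    new_len=$(( $1 + $2 ))
    if [ "$new_len" -gt 65535 ]
    then
        echo "total length $new_len does not fit in 16 bits" >&2
        return 1
    fi
    printf '%02x %02x\n' $(( new_len >> 8 )) $(( new_len & 0xFF ))
}

# Hypothetical numbers: a 1500-byte IP packet that gains 1000 bytes of payload.
ip_total_length_bytes 1500 1000
# prints: 09 c4
```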

The difference between loopback packets on Linux and OpenBSD

Capture packets on the loopback interface on Linux:

# tcpdump -i lo -w lo.pcap port 33333
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
......

Download it to Windows and use Wireshark to analyze it:

[Image 1]

We can see that every packet conforms to the standard Ethernet format.

Capture loopback packets on OpenBSD:

# tcpdump -i lo0 -w lo.pcap port 33333
tcpdump: listening on lo0, link-type LOOP
......

Also download it to Windows and open it with Wireshark:

[Image 2]

Wireshark recognizes the packets only as “Raw IP” and can’t show the details.

After reading a discussion on the Wireshark mailing list, I learned that this is related to the link-layer header type stored in the pcap file header: 0x0C stands for “Raw IP”:

[Image 3]

I changed 0x0C to 0x6C, which means “OpenBSD loopback”:

[Image 4]

Now the packets can be decoded successfully:

[Image 5]

P.S., I also started a discussion about this issue on the mailing list.

Update: I wrote a script to do this conversion.
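
As an illustration of what such a conversion involves (this is a minimal sketch, not the script mentioned above), the link-layer type sits in bytes 20 to 23 of the classic pcap file header. The sketch assumes a little-endian pcap file with the magic bytes d4 c3 b2 a1, so that byte 20 holds the low byte of the type; fix_openbsd_loop is a hypothetical helper name. It modifies the file in place, so work on a copy.

```shell
#!/bin/sh
# fix_openbsd_loop FILE
# Rewrites the link-layer type in a classic pcap file header from 0x0C to 0x6C.
fix_openbsd_loop() {
    file=$1
    # Bytes 0-3: magic number; d4 c3 b2 a1 marks a little-endian pcap file.
    magic=$(od -An -tx1 -N4 "$file" | tr -d ' \n')
    [ "$magic" = "d4c3b2a1" ] || { echo "unsupported pcap format: $file" >&2; return 1; }
    # Bytes 20-23: link-layer type, least significant byte first.
    linktype=$(od -An -tx1 -j20 -N4 "$file" | tr -d ' \n')
    [ "$linktype" = "0c000000" ] || { echo "link type is not 0x0C: $file" >&2; return 1; }
    # Overwrite byte 20 with 0x6C (octal 154) without truncating the file.
    printf '\154' | dd of="$file" bs=1 seek=20 count=1 conv=notrunc 2>/dev/null
}
```

For example, after cp lo.pcap lo-fixed.pcap, running fix_openbsd_loop lo-fixed.pcap patches the copy and leaves the original capture untouched.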