The Overview
At the request of one of my faithful readers in my original article on packet capture with libpcap, I decided to post a guide to offline packet capture processing. Why is this useful? Because popular packet capture programs like Wireshark or tcpdump can save captures to files that can be processed later. You can then apply your specialized code to these previously captured packets.
The Code
NOTE: This program makes use of the http.cap Wireshark packet capture sample.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 |
|
The Breakdown
1 2 3 4 5 6 7 8 9 10 11 |
|
These are the includes and declarations necessary for reading the packet captures. The first 2 are self explanatory, the following 5 includes might be less so. These are used for parsing and transforming data found in packets. The functions and structures included in these headers are integral to packet processing and are available natively on Linux systems (Ubuntu in this case).
1 2 3 4 5 6 7 8 9 10 |
|
After entering the main execution, we go straight to opening our target packet capture file, http.cap. To do this we use pcap_open_offline() and give it the capture filename and an error buffer as parameters. If all goes well, we get a pcap_t descriptor returned. If not, check the error buffer for details.
1 2 3 4 5 6 7 8 9 10 |
|
Just like in a live packet capture, we use pcap_loop() to set up a handler callback for each packet to be processed. We give it the following:
- descr - the descriptor we just created with pcap_open_offline()
- count - 0 (zero), to indicate there is no limit to the number of packets we want to process
- callback - The name of our packet handler function
- userdata - NULL, to indicate that we will be passing no user defined data to the callack
When the entire file has been processed, we will print the “capture complete” message and then exit.
1 2 3 4 5 6 7 8 9 10 |
|
Here we define the packet handler callback, as per the libpcap specifications. For more details, check out my original post on packet capture. The following declarations define variables that will help us parse meaningful data out of the packets. These include packet header data, IP addresses, source/destination ports, and payload data.
There’s LOTS more useful information to be analyzed from the average packet. Check out the structure defined in the network includes at the beginning of the code for more details. Actually, it would probably be a hell of a lot easier to just download and fire up Wireshark. It will give you a greater appreciation for what can be learned from a packet.
1 2 3 4 5 |
|
I’m not going to delve to deeply into the specifics of network protocols, as that could be a post… check that… that could be a book of its own. Basically here we are parsing the ethernet header from the packet and using its type to determine if it is an IP packet or not. We use the ntohs() to convert the type from network byte order to host byte order.
If it is an IP packet, we parse out the IP header and use the inet_ntop() function to convert the IP addresses found in the IP header into a human readable format (i.e., xxx.xxx.xxx.xxx). In a lot of older examples you’ll see the use of inet_ntoa(), but this is not thread-safe and is deprecated.
1 2 3 4 5 6 |
|
Similar to above, I use the IP header to determine if this is a TCP packet (they all should be since its a HTTP capture) and then parse out the TCP header. With the TCP header we can then determine the source and destination ports, with ntohs() again, and then determine the contents of the packet payload.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
In the final step of the packet handler we display the results of our rudimentary analysis. First we iterate through the bytes of the payload and save it in a format that is human friendly. If you try to print it out with the non-printable characters in there you will get some very messy results in your console. After this cleanup we simply output the packet data we have extracted and display it in the console.
The Summary
So now that you can process packets offline, what do you want to do with them? I don’t know about you, but aside from obvious applications to network analysis, I’d like to use this data for trending, visualization, or even generative art and sound. But then again I’m weird. What are you gonna do?