Using tshark to Watch and Inspect Network Traffic
Most of you probably have heard of Wireshark, a very popular and capable network protocol analyzer. What you may not know is that there exists a console version of Wireshark called tshark. The two main advantages of tshark are that it can be used in scripts and on a remote computer through an SSH connection. Its main disadvantage is that it does not have a GUI, which can be really handy when you have to search lots of network data.
You can get tshark either from its Web site and compile it yourself or from your Linux distribution as a precompiled package. The second way is quicker and simpler. To install tshark on a Debian 7 system, you just have to run the following command as root:
# apt-get install tshark
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libc-ares2 libcap2-bin libpam-cap libsmi2ldbl
libwireshark-data libwireshark2
libwiretap2 libwsutil2 wireshark-common
Suggested packages:
libcap-dev snmp-mibs-downloader wireshark-doc
The following NEW packages will be installed:
libc-ares2 libcap2-bin libpam-cap libsmi2ldbl
libwireshark-data libwireshark2
libwiretap2 libwsutil2 tshark wireshark-common
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
Need to get 15.6 MB of archives.
After this operation, 65.7 MB of additional disk space will be used.
Do you want to continue [Y/n]? Y
...
To find out whether tshark is installed properly, as well as its version, execute this command:
$ tshark -v
TShark 1.8.2
...
Note: this article assumes that you already are familiar with network data, TCP/IP, packet capturing and maybe Wireshark, and that you want to know more about tshark.
About tshark
tshark can do anything Wireshark can do, provided that it does not require a GUI. It also can be used as a replacement for tcpdump, which used to be the industry standard for network data capturing. Apart from the capturing part, where both tools are equivalent, tshark is more powerful than tcpdump; therefore, if you want to learn just one tool, tshark should be your choice.
As you can imagine, tshark has many command-line options. Refer to its man page for the full list.
Capturing Network Traffic Using tshark
The first command you should run is sudo tshark -D, which lists the available network interfaces:
$ sudo tshark -D
1. eth0
2. nflog (Linux netfilter log (NFLOG) interface)
3. any (Pseudo-device that captures on all interfaces)
4. lo
If you run tshark as a normal user, you most likely will get the following output, because normal users do not have direct access to network interface devices:
$ tshark -D
tshark: There are no interfaces on which a capture can be done
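If you want to use the interface list in a script, the output shown above is easy to parse. The following is a minimal sketch, assuming the listing has the same "number, name, optional description in parentheses" layout as the output above; the function name parse_interfaces and the sample variable are made up for illustration:

```python
import re


def parse_interfaces(listing):
    """Parse the text printed by `tshark -D` into (index, name, description) tuples."""
    interfaces = []
    for line in listing.splitlines():
        # A line looks like "2. nflog (Linux netfilter log (NFLOG) interface)"
        match = re.match(r"^(\d+)\.\s+(\S+)(?:\s+\((.*)\))?\s*$", line)
        if match:
            index, name, description = match.groups()
            interfaces.append((int(index), name, description or ""))
    return interfaces


# Sample text copied from the listing above
sample = """1. eth0
2. nflog (Linux netfilter log (NFLOG) interface)
3. any (Pseudo-device that captures on all interfaces)
4. lo"""

for index, name, description in parse_interfaces(sample):
    print(index, name, description)
```

Lines that do not match the layout, such as the error message a normal user gets, are skipped, so an empty list means that no capture interface is available.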
The simplest way of capturing data is by running tshark without any parameters, which will display all data on screen. You can stop data capturing by pressing Ctrl-C.
The output will scroll very fast on a busy network, so it won't be helpful at all. Older computers could not keep up with a busy network, so programs like tshark and tcpdump used to drop network packets. As modern computers are pretty powerful, this is rarely an issue now, although tshark still reports any dropped packets when a capture ends.
Saving and Reading Network Data Using Files
The single-most useful command-line parameter is -w, followed by a filename. This parameter allows you to save network data to a file in order to process it later. The following tshark command captures 500 network packets (-c 500) and saves them into a file called LJ.pcap (-w LJ.pcap):
$ tshark -c 500 -w LJ.pcap
The second-most useful parameter is -r. When followed by a valid filename, it allows you to read and process a previously captured file with network data.
Capture filters are filters that are applied during data capturing; therefore, they make tshark discard network traffic that does not match the filter criteria and avoid the creation of huge capture files. This can be done using the -f command-line parameter, followed by a filter in double quotes.
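Capture filters use the libpcap filter syntax (for example, tcp port 80), not the field names used by display filters. The most important TCP-related field names, which are used in display filters, are tcp.port (which is for filtering the source or the destination TCP port), tcp.srcport (which is for checking the TCP source port) and tcp.dstport (which is for checking the destination port).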
Generally speaking, applying a filter after data capturing is considered more practical and versatile than filtering during the capture stage, because most of the time, you do not know in advance what you want to inspect. Nevertheless, if you really know what you're doing, using capture filters can save you time and disk space, and that is the main reason for using them.
Remember that filter strings always should be written in lowercase.
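When tshark is started from a script, the -c, -f and -w parameters can be assembled programmatically. The following is a minimal sketch, assuming tshark is in the PATH and that the capture filter is written in libpcap syntax; the function names capture_command and run_capture are hypothetical helpers, not part of tshark:

```python
import subprocess


def capture_command(outfile, count=None, capture_filter=None, interface=None):
    """Build the argument list for a tshark capture that is saved to outfile."""
    cmd = ["tshark"]
    if interface:
        cmd += ["-i", interface]        # capture on this interface
    if count is not None:
        cmd += ["-c", str(count)]       # stop after this many packets
    if capture_filter:
        cmd += ["-f", capture_filter]   # libpcap capture filter, lowercase
    cmd += ["-w", outfile]              # save the packets to this file
    return cmd


def run_capture(outfile, **options):
    """Run the capture and return the exit code of tshark (needs capture privileges)."""
    return subprocess.call(capture_command(outfile, **options))


# Same capture as the earlier example: 500 packets saved into LJ.pcap
print(" ".join(capture_command("LJ.pcap", count=500)))
```

Passing the arguments as a list avoids shell quoting problems when the filter contains spaces.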
Display Filters
Display filters are filters that are applied after packet capturing; therefore, they just "hide" network traffic without deleting it. You always can remove the effects of a display filter and get all your data back.
Display filters support comparison and logical operators. The http.response.code == 404 && ip.addr == 192.168.10.1 display filter shows the traffic that either comes from or goes to the 192.168.10.1 IP address and that also has the 404 (Not Found) HTTP response code in it. The !bootp && !ip filter excludes BOOTP and IP traffic from the output. The eth.addr == 01:23:45:67:89:ab && tcp.port == 25 filter displays the traffic to or from the network device with the 01:23:45:67:89:ab MAC address that uses TCP port 25 for its incoming or outgoing connections.
When defining rules, remember that the ip.addr != 192.168.1.5 expression does not mean that none of the ip.addr fields can contain the 192.168.1.5 IP address. It means that one of the ip.addr fields should not contain the 192.168.1.5 IP address! Therefore, the other ip.addr field value can be equal to 192.168.1.5! You can think of it as "there exists one ip.addr field that is not 192.168.1.5". The correct way of expressing it is by typing !(ip.addr == 192.168.1.5). This is a common misconception with display filters.
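The difference between the two expressions can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not of how tshark evaluates filters internally; a packet is represented as the list of its ip.addr values (source and destination), and both function names are made up:

```python
def any_field_differs(addresses, target):
    """Model of `ip.addr != target`: at least one ip.addr value is not target."""
    return any(address != target for address in addresses)


def no_field_matches(addresses, target):
    """Model of `!(ip.addr == target)`: no ip.addr value equals target."""
    return not any(address == target for address in addresses)


# A packet sent from 192.168.1.5 to 10.0.0.1 (illustrative addresses)
packet = ["192.168.1.5", "10.0.0.1"]

# The destination differs from the target, so the packet is still displayed
print(any_field_differs(packet, "192.168.1.5"))

# One of the fields equals the target, so the packet is hidden
print(no_field_matches(packet, "192.168.1.5"))
```

The first call prints True and the second prints False, which is why only the second form really excludes the traffic of 192.168.1.5.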
Also remember that MAC addresses are truly useful when you want to track a given machine on your LAN, because the IP of a machine can change if it uses DHCP, but its MAC address is more difficult to change.
Display filters are extremely useful tools when used correctly, but you still have to interpret the results, find the problem and think about the possible solutions yourself. It is advisable that you visit the display filters reference site for TCP-related traffic at http://www.wireshark.org/docs/dfref/t/tcp.html. For the list of all the available field names related to UDP traffic, see http://www.wireshark.org/docs/dfref/u/udp.html.
Exporting Data
Imagine you want to extract the frame number, the relative time of the frame, the source IP address, the destination IP address, the protocol of the packet and the length of the network packet from previously captured network traffic. The following tshark command will do the trick for you:
$ tshark -r login.tcpdump -T fields -e frame.number -e
↪frame.time_relative -e ip.src -e ip.dst -e
↪frame.protocols -e frame.len -E header=y -E
↪quote=n -E occurrence=f
The -E header=y option tells tshark first to print a header line. The -E quote=n dictates that tshark not include the data in quotes, and the -E occurrence=f tells tshark to use only the first occurrence for fields that have multiple occurrences.
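Output produced with -T fields and -E header=y can be loaded into Python with the csv module. The following sketch assumes that the fields are separated by tabs (the separator is not set explicitly in the command above) and that the first line is the header; the sample rows are invented for illustration and do not come from a real capture:

```python
import csv
import io


def parse_fields(text):
    """Turn `tshark -T fields -E header=y` output into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


# Invented sample output: a header line followed by two packets
sample = (
    "frame.number\tframe.time_relative\tip.src\tip.dst\tframe.protocols\tframe.len\n"
    "1\t0.000000000\t192.0.2.1\t192.0.2.2\teth:ip:tcp\t66\n"
    "2\t0.000150000\t192.0.2.2\t192.0.2.1\teth:ip:tcp\t54\n"
)

for row in parse_fields(sample):
    print(row["frame.number"], row["ip.src"], "->", row["ip.dst"], row["frame.len"])
```

Each row is keyed by the field names given with -e, so the script does not depend on the order of the columns.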
Having plain text as output means that you easily can process it the UNIX way. The following command shows the ten most popular IPs using input from the ip.src field:
$ tshark -r ~/netData.pcap -T fields -e ip.src | sort
↪| sed '/^\s*$/d' | uniq -c | sort -rn
↪| awk {'print $2 " " $1'} | head
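The same counting can be done in Python when the result is needed inside a script. This is a minimal sketch that assumes one ip.src value per input line, as printed by the tshark part of the pipeline above; the function name top_sources is hypothetical:

```python
from collections import Counter


def top_sources(lines, limit=10):
    """Return the most frequent non-empty values as (value, count) pairs."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(limit)


# Illustrative input; in practice the lines would come from sys.stdin
lines = ["192.0.2.1\n", "192.0.2.2\n", "192.0.2.1\n", "\n",
         "192.0.2.1\n", "192.0.2.2\n", "192.0.2.3\n"]

for address, count in top_sources(lines):
    print(address, count)
```

Blank lines, which tshark prints for packets without an ip.src field, are ignored, just as the sed command does in the shell version.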
Two Python Scripts That Use tshark
Now, let's look at two Python scripts that read tshark's text output and process it. I can't imagine doing the same thing with a GUI application, such as Wireshark!
Listing 1 shows the full Python code of the first script that checks the validity of an IP address.
Listing 1. checkIP.py
# Programmer: Mihalis Tsoukalos
# Date: Tuesday 28 October 2014
import socket
import sys


def valid_ip(address):
    # inet_aton() raises an error for strings it cannot parse
    try:
        socket.inet_aton(address)
        return True
    except (socket.error, ValueError):
        return False


# Counters for the IPs
total = 0
valid = 0
invalid = 0

# Read the file from stdin, line by line
for line in sys.stdin:
    line = line.rstrip('\n')
    if valid_ip(line):
        valid = valid + 1
        # print("The IP is valid!")
    else:
        # print("The IP is not valid!")
        invalid = invalid + 1
    total = total + 1

# Present the total number of IPs checked
print("Total number of IPs checked: %d" % total)
print("Valid IPs found: %d" % valid)
print("Invalid IPs found: %d" % invalid)
The purpose of the checkIP.py Python script is just to find invalid IP addresses, and it assumes that the network data already has been captured with tshark. You can use it as follows:
$ tshark -r ~/networkData.pcap -T fields -e ip.src
↪| python checkIP.py
Total number of IPs checked: 1000
Valid IPs found: 896
Invalid IPs found: 104
Listing 2 shows the full code of the second Python script (storeMongo.py).
Listing 2. storeMongo.py
# Programmer: Mihalis Tsoukalos
# Date: Tuesday 28 October 2014
#
# Description: This Python script reads input from
# tshark, parses it and stores it in a MongoDB database
import sys
import pymongo

# The number of BSON documents written
total = 0

# Open the MongoDB connection
connMongo = pymongo.MongoClient('mongodb://localhost:27017')
# Connect to database named LJ (Linux Journal)
db = connMongo.LJ
# Select the collection to save the network packet
traffic = db.netdata

# Read the file from stdin, line by line
for line in sys.stdin:
    line = line.rstrip('\n')
    parsed = line.split("\t")
    total = total + 1
    # Construct the record to be inserted
    netpacket = {
        'framenumber': parsed[0],
        'sourceIP': parsed[1],
        'destIP': parsed[2],
        'framelength': parsed[3],
        'IPlength': parsed[4]
    }
    # Store it!
    net_id = traffic.insert_one(netpacket).inserted_id

connMongo.close()

# Present the total number of BSON documents written
print("Total number of documents stored: %d" % total)
The Python script shown in Listing 2 inserts network data into a MongoDB database for further processing and querying. You can use any database you want. The main reason I used MongoDB is that I like the flexibility it offers when storing structured data that may have some irregular records (records with missing fields).
The name of the Python script is storeMongo.py, and it assumes that the network data already is captured using either tshark or tcpdump. The next shell command runs the Python script with input from tshark:
$ tshark -r ~/var/test.pcap -T fields -e frame.number
↪-e ip.src -e ip.dst -e frame.len -e
↪ip.len -E header=n -E quote=n -E occurrence=f
↪| python storeMongo.py
Total number of documents stored: 500
The text output of the tshark command is similar to the following:
5 yy.xx.zz.189 yyy.74.xxx.253 66 52
6 197.224.xxx.145 yyy.74.xxx.253 86 72
7 109.xxx.yyy.253 zzz.224.xxx.145 114 100
8 197.xxx.zzz.145 zzz.xxx.xxx.253 86 72
9 109.zzz.193.yyy 197.224.zzz.145 114 100
Currently, all numerical values are stored as strings, but you easily can convert them to numbers if you want. The following command converts all string values from the IPlength column to their respective integer values:
> db.netdata.find({IPlength : {$exists : true}}).forEach(
↪function(obj) { obj.IPlength = new NumberInt(
↪obj.IPlength ); db.netdata.save(obj); } );
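Alternatively, the conversion can be done in Python before the documents are inserted. The following sketch is one possible variation on the record construction of Listing 2, not the original script: it keeps the same key names, converts the three numeric columns to integers and leaves out fields that tshark printed as empty. The FIELDS table and the to_document name are made up for this example:

```python
# Key name and converter for each tab-separated column, in the order
# used by the tshark command shown earlier (assumed, see Listing 2)
FIELDS = [
    ("framenumber", int),
    ("sourceIP", str),
    ("destIP", str),
    ("framelength", int),
    ("IPlength", int),
]


def to_document(line):
    """Build a dictionary from one line of tshark output, skipping empty fields."""
    values = line.rstrip("\n").split("\t")
    document = {}
    for (name, convert), value in zip(FIELDS, values):
        if value != "":
            document[name] = convert(value)
    return document


# Illustrative lines: an IP packet and a packet without IP fields
print(to_document("5\t192.0.2.189\t198.51.100.253\t66\t52\n"))
print(to_document("6\t\t\t42\t\n"))
```

Documents built this way can be queried with numeric comparisons directly, and records with missing fields simply lack the corresponding keys.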
Now you can start querying the MongoDB database. The following commands find all "records" (called documents in NoSQL terminology) that contain a given destination IP address:
> use LJ
switched to db LJ
> db.netdata.find({ "destIP": "192.168.1.12" })
...
>
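The next command finds all entries with a frame.len value that is less than 70. As the comparison is numeric, it assumes that the framelength values have been converted to integers in the same way as the IPlength values:
> use LJ
switched to db LJ
> db.netdata.find({ "framelength": {"$lt" : 70 }})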
...
>
The next command finds all entries with an IPlength value greater than 100 and less than 200 (the bounds are written as numbers, because the IPlength values now are integers):
> use LJ
switched to db LJ
> db.netdata.find({ "IPlength": {"$lt" : 200, "$gt": 100 }})
...
>
What you should remember is not the actual commands but the fact that you can query the database of your choice, using the query language you want and find useful information without the need to re-run tshark and parse the network data again.
After you test your queries, you can run them as cron jobs. La vie est belle!
Examining an Nmap ping Scan Using tshark
Next, let's examine the network traffic that is produced by Nmap when it performs a ping scan. The purpose of the ping scan is simply to find out whether an IP address is up. What is important for Nmap in a ping scan is not the actual data of the received packets but, put simply, the actual existence of a reply packet. Nmap ping scans inside a LAN use the ARP protocol, whereas hosts outside a LAN are scanned using the ICMP protocol. The scan performed here pings IP addresses outside the LAN.
The following Nmap command scans 64 IP addresses, from 2.x.yy.1 to 2.x.yy.64:
# nmap -sP 2.x.yy.1-64
Starting Nmap 6.00 ( http://nmap.org ) at 2014-10-29 11:55 EET
Nmap scan report for ppp-4.home.SOMEisp.gr (2.x.yy.4)
Host is up (0.067s latency).
Nmap scan report for ppp-6.home.SOMEisp.gr (2.x.yy.6)
Host is up (0.084s latency).
...
Nmap scan report for ppp-64.home.SOMEisp.gr (2.x.yy.64)
Host is up (0.059s latency).
Nmap done: 64 IP addresses (35 hosts up) scanned in 3.10 seconds
The results show that at execution time only 35 hosts were up, or to be 100% precise, only 35 hosts answered the Nmap scan. Nmap also calculates the round-trip time delay (or latency). This gives a pretty accurate estimate of the time needed for the initial packet (sent by Nmap) to go to the target device plus the time that the response packet took to return to Nmap.
The following tshark command is used for the capturing and is terminated with Ctrl-C:
# tshark -w nmap.pcap
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
2587 ^C
18 packets dropped
# ls -l nmap.pcap
-rw------- 1 root root 349036 Oct 29 11:55 nmap.pcap
Now, let's analyze the generated traffic using tshark. The following command searches for traffic to or from the 2.x.yy.6 IP address:
$ tshark -r nmap.pcap -R "ip.src == 2.x.yy.6 || ip.dst == 2.x.yy.6"
712 3.237125000 109.zz.yyy.253 -> 2.x.yy.6
↪ICMP 42 Echo (ping) request id=0xa690, seq=0/0, ttl=54
1420 5.239804000 109.zz.yyy.253 -> 2.x.yy.6
↪ICMP 42 Echo (ping) request id=0x699a, seq=0/0, ttl=49
1432 5.240111000 109.zz.yyy.253 -> 2.x.yy.6
↪TCP 58 41242 > https [SYN] Seq=0 Win=1024 Len=0 MSS=1460
1441 5.296861000 2.x.yy.6 -> 109.zz.yyy.253 ICMP 60
↪Timestamp reply id=0x0549, seq=0/0, ttl=57
As you can see, the existence of a response packet (1441) from 2.x.yy.6 is enough for the host to be considered up by Nmap; therefore, no additional tests are tried on this IP.
Now, let's look at the traffic for an IP that is considered down:
$ tshark -r nmap.pcap -R "ip.src == 2.x.yy.2 || ip.dst == 2.x.yy.2"
708 3.236922000 109.zz.yyy.253 -> 2.x.yy.2
↪ICMP 42 Echo (ping) request id=0xb194, seq=0/0, ttl=59
1407 5.237255000 109.zz.yyy.253 -> 2.x.yy.2
↪ICMP 42 Echo (ping) request id=0x24ed, seq=0/0, ttl=47
1410 5.237358000 109.zz.yyy.253 -> 2.x.yy.2
↪TCP 58 41242 > https [SYN] Seq=0 Win=1024 Len=0 MSS=1460
1413 5.237448000 109.zz.yyy.253 -> 2.x.yy.2
↪TCP 54 41242 > http [ACK] Seq=1 Ack=1 Win=1024 Len=0
1416 5.237533000 109.zz.yyy.253 -> 2.x.yy.2
↪ICMP 54 Timestamp request id=0xf7af, seq=0/0, ttl=51
1463 5.348871000 109.zz.yyy.253 -> 2.x.yy.2
↪ICMP 54 Timestamp request id=0x9d7e, seq=0/0, ttl=39
1465 5.349006000 109.zz.yyy.253 -> 2.x.yy.2
↪TCP 54 41243 > http [ACK] Seq=1 Ack=1 Win=1024 Len=0
1467 5.349106000 109.zz.yyy.253 -> 2.x.yy.2
↪TCP 58 41243 > https [SYN] Seq=0 Win=1024 Len=0 MSS=1460
As the ICMP echo request did not get a response, Nmap makes more tries on the 2.x.yy.2 IP by sending a TCP SYN packet to the HTTPS port, a TCP ACK packet to the HTTP port and an ICMP timestamp request, still without any success. This happens because Nmap adds intelligence to the standard ping (ICMP protocol) by trying some common TCP ports in case the ICMP request is blocked for some reason.
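The "a reply exists, so the host is up" reasoning can be reproduced from tshark's one-line-per-packet output. The following sketch assumes the column layout shown in the listings above (frame number, time, source, arrow, destination, protocol) with each packet on a single line; the function names are hypothetical, and the addresses are the anonymized ones used in this article:

```python
def parse_summary_line(line):
    """Return (source, destination, protocol) from one summary line, or None."""
    parts = line.split()
    # Expected layout: number, time, source, arrow, destination, protocol, ...
    if len(parts) < 6 or parts[3] not in ("->", "\u2192"):
        return None
    return parts[2], parts[4], parts[5]


def hosts_that_replied(lines, scanner):
    """Collect every address that sent at least one packet back to the scanner."""
    replied = set()
    for line in lines:
        parsed = parse_summary_line(line)
        if parsed and parsed[1] == scanner:
            replied.add(parsed[0])
    return replied


sample = [
    "712 3.237125000 109.zz.yyy.253 -> 2.x.yy.6 ICMP 42 Echo (ping) request id=0xa690, seq=0/0, ttl=54",
    "1441 5.296861000 2.x.yy.6 -> 109.zz.yyy.253 ICMP 60 Timestamp reply id=0x0549, seq=0/0, ttl=57",
    "708 3.236922000 109.zz.yyy.253 -> 2.x.yy.2 ICMP 42 Echo (ping) request id=0xb194, seq=0/0, ttl=59",
]

print(sorted(hosts_that_replied(sample, "109.zz.yyy.253")))
```

Only 2.x.yy.6 appears as the source of a packet addressed to the scanning machine, which matches the hosts that Nmap reported as up and down in the two listings.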
The total number of ICMP packets sent can be found with the help of the following command:
$ tshark -r nmap.pcap -R "icmp" | grep "2.x" | wc -l
233
Displaying Statistics for a Specific Protocol
tshark allows you to display useful statistics about a specific protocol. The following command displays statistics about the HTTP protocol using an existing file with network data:
$ tshark -q -r http.pcap -R http -z http,tree
=====================================================
HTTP/Packet Counter value rate percent
-----------------------------------------------------
Total HTTP Packets 118 0.017749
HTTP Request Packets 66 0.009928 55.93%
GET 66 0.009928 100.00%
HTTP Response Packets 52 0.007822 44.07%
???: broken 0 0.000000 0.00%
1xx: Informational 0 0.000000 0.00%
2xx: Success 51 0.007671 98.08%
200 OK 51 0.007671 100.00%
3xx: Redirection 0 0.000000 0.00%
4xx: Client Error 1 0.000150 1.92%
404 Not Found 1 0.000150 100.00%
5xx: Server Error 0 0.000000 0.00%
Other HTTP Packets 0 0.000000 0.00%
=====================================================
All the work is done by the -z option, which is for calculating statistics, and the -q option, which is for disabling the printing of information per individual packet. The -R option discards all packets that do not match the specified filter before doing any other processing. (In tshark 1.10 and later, single-pass display filtering uses the -Y option instead, and -R is meant to be combined with -2.)
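The percent column in the output above appears to be relative to the parent row rather than to the total number of packets; this is an inference from the numbers shown, not something stated in the output itself. A short check of that reading, using a made-up helper name:

```python
def percent_of_parent(count, parent_count):
    """Share of a row within its parent row, rounded to two decimal places."""
    return round(100.0 * count / parent_count, 2)


# Counts copied from the http,tree output above
total_http = 118
requests = 66
responses = 52
success_2xx = 51
client_error_4xx = 1

print(percent_of_parent(requests, total_http))          # HTTP Request Packets
print(percent_of_parent(responses, total_http))         # HTTP Response Packets
print(percent_of_parent(success_2xx, responses))        # 2xx: Success
print(percent_of_parent(client_error_4xx, responses))   # 4xx: Client Error
```

The four values come out as 55.93, 44.07, 98.08 and 1.92, the same as in the percent column, so a row such as "2xx: Success" is measured against the response packets only.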
Here's another useful command that shows protocol hierarchy statistics:
$ tshark -nr ~/var/http.pcap -qz "io,phs"
Try it yourself to see the output!
Summary
If you have an in-depth understanding of display filters and a good knowledge of TCP/IP and networks, with the help of tshark or Wireshark, network-related issues will no longer be a problem.
It takes time to master tshark, but I think it will be time well spent.
Resources
tshark: http://www.wireshark.org/docs/man-pages/tshark.html
Wireshark: http://www.wireshark.org
Display Filters Reference: http://www.wireshark.org/docs/dfref
Internetworking with TCP/IP, Volume I, Douglas E. Comer, Prentice Hall