Friday, May 02, 2014

Wireshark, GeoIP and Checking Up on Mobile/Home Carriers

As enterprises move an ever-growing list of services into the mobile space, it becomes essential to understand the limitations of the mobile network infrastructure.  No longer can we perform true end-to-end capture or analysis of network data; what was the "last mile" is now an indeterminate path through any number of relatively impenetrable mobile networks.  In this respect, troubleshooting issues involving mobile devices can be quite the challenge. At the same time, we're dealing with an increasing number of telecommuters, those "work from home" people who are at the mercy of their ISP.  What, then, is the enterprise network analyst to do?

The answer (or, at least, a good start toward an answer) lies in geolocation - the association of IP address spaces with their geographic and/or corporate assignments. Geolocation can be been integrated with DNS (or, at least, BIND implementations of DNS), the Apache web server, and any number of other applications, including (as of version 1.1.2) our favorite network tool - Wireshark. The marraige of Wireshark's analysis and GeoIP's provider identification produces some powerful analysis capabilities.

You can download free GeoLite versions of current GeoIP databases from MaxMind.  MaxMind provides free GeoLite databases for IPv4 and IPv6 city, country and autonomous system numbers (ASNs); you'll want to download the binary versions, not the CSV editions.  The MaxMind databases are updated on a monthly basis; if you like the results of this exercise, you'll need to set up a process to handle monthly updates.

Now it's time to make Wireshark GeoIP-aware:

1) Once you've downloaded the GeoIP databases, unzip them to a permanent home. On my Linux systems, I created the /usr/local/geoip directory for this purpose; on Windows systems, I use a \geoip subdirectory under the Wireshark installation directory.  The databases can be (and should be) read-only; you won't be adding any data. Now, we're ready to pull them into Wireshark.

2) Open Edit->Preferences in Wireshark, select Name Resolution, and click the "Edit" button next to GeoIP database directories; click New in the resulting dialog and add the directory you created in step 1. Using my Linux example above, you should have something like this (click to enlarge):
3) Close Wireshark and reopen.  You're ready to go!

So, what exactly does this give you? Well, to start with, you'll find that Wireshark's Statistics->Endpoints includes sortable columns for City, Country and AS Number, like so (click to enlarge):
You'll also find GeoIP information in the Details pane of the packet view, under Internet Protocol:
Finally, you can now use GeoIP information in your Wireshark display filters. For instance, I'll take the ASN definition from the previous example (British Sky Broadcasting) and use it in a display filter to show me ALL traffic from that provider:
ip.geoip.src_asnum == "AS5607 British Sky Broadcasting Limited"
Other GeoIP display filters allow you to select/view traffic based on country (e.g. ip.geoip.country=="Egypt"), city (e.g. ip.geoip.src_city == "Birmingham, AL") or even longitude or latitude (e.g. ip.geoip.src_lat == 33.520699).

From here, you can isolate, analyze and/or export data for specific providers, whether they serve mobile or home users; you could even develop "country profiles" if you're serving an international clientele.  While the GeoIP data isn't perfect, it's more than adequate to help you create a profile of your mobile userbase.

Have fun!

Thursday, May 01, 2014

Network Troubleshooting - Sometimes It's What You DON'T See...

I spend a healthy chunk of my typical work day analyzing network packet captures.  My primary tool is Wireshark, which humbly presents itself as "The World's Most Popular Network Protocol Analyzer."  (Seriously - if you aren't using Wireshark, go download it NOW.)  Protocol analyzers are great for identifying typical "red flags" in packet data, but they're all limited to what the raw data might indicate; customer network environments are so broad (and so varied) that the network engineer--especially one "on the outside looking in" with only a small data set--relies heavily on experience and intuition.

One recent case was presented as "many failed connections," and a 6-minute packet capture soon landed in my lap.  Now, every Wireshark user has their own approach; I usually take advantage of Wireshark's display filters to get a general "feel" for the incidence of Layer 3/4 problems. With a typical capture file, I'll start with tcp.analysis.flags,which simply tells Wireshark, "hey, show me what YOU think are TCP problems." Now, as I said, none of these tools are perfect, so take these results with a grain of salt; they're only as good as are the underlying data, and it's very easy to collect inaccurate or incomplete data. After taking a look at the results of this display filter, I noticed what seemed an high number of TCP retransmissions, so I decided to see exactly which packets were being retransmitted with a different display filter, tcp.analysis.retransmission, which will show me only those packets Wireshark believes to be TCP retransmissions. The resulting numbers were somewhat high, but I've seen worse. Now, the complaint was very specific that new connections were failing; no mention was made of existing connections being interrupted/terminated; so, I went to Wireshark's Statistics->Conversations dialog and sorted on the "Packets" column to look for very short conversations and found HUNDREDS of conversations that only lasted for a few packets, like these:
Well, now, wait just a minute - the TCP handshake requires 3 packets (SYN, SYN/ACK, ACK) to establish a conversation, and I'm seeing hundreds of conversations that are only exchanging 3 to 6 packets. After checking a few suspect conversations, I found a pattern, namely this:

So, the remote endpoint starts a conversation with a SYN packet and the local endpoint responds immediately, but we see the remote endpoint retransmitting its SYN packet within 10ms. The local endpoint retransmits its SYN/ACK, but neither the original nor the retransmitted SYN/ACK seem to reach the remote endpoint, and the conversation attempt is ultimately terminated with a TCP reset (RST) packet. Back I go to Wireshark's display, this time to ask about a very specific type of TCP retransmission:
tcp.analysis.retransmission && tcp.flags.syn==1 && !tcp.flags.ack==1
With this display filter, I'm asking Wireshark to show me all retransmitted SYN packets; the "!tcp.flags.ack==1" eliminates SYN/ACK packets from the display. The results were startling; within a 6-minute period, more than 110 endpoints had retransmitted more than 170 SYN packets...and all of them had failed to complete the TCP handshake.

Well, if conditions are this bad to START conversations, then there must be thousands of cases in which existing connections die before completing successfully, right?  Let's go back to Wireshark's Statistics->Conversations dialog and sort on Duration to look at long-lived conversations:
Hmm...I have hundreds of conversations that last longer than 2 minutes...but I can't find one that suffers from retransmissions sufficient to terminate the conversation.

If I were looking at a general network congestion issue on the local network, I'd expect conversations to suffer equally--packets are packets, right?--but this is something different. That seeming conflict in the data prompted what proved to be the key question:
If I'm seeing HUNDREDS of new conversations fail the TCP handshake due to excessive retransmissions, why DON'T I see established conversations suffering excessive retransmissions as well?
Well, after few moments' thought, it occurred to me that the only network devices that usually make specific distinctions between new and existing connections are those involved in network security. A brief conversation with the customer revealed that an intrusion protection system (IPS) was in place and "inspecting" conversations. When we conducted a test that bypassed the IPS, the incidence of failed TCP handshakes decreased by roughly 98%; our troubleshooting attention is now properly directed.

So, the moral of this story: Pay attention to the data, but pay equal attention to what isn't there.