Wednesday, July 11, 2012

Understanding Email Headers, Part III - The Received Header, or How'd It Get Here?

(If you haven't read Part 1 - "What Are They, and Where Can I See Them?" and Part 2 - "The Basics" yet, you probably should.  We'll wait...)

Tonight, we're going to talk about the Received header.  RFC 5322, Internet Message Format, establishes an entire class of "trace" headers, but does not define them beyond their most basic format.  The reason for that is simple; we are not limited to any single delivery mechanism for Internet messages, nor does a single mail server always use the same delivery mechanism.  It's true that most of us receive Internet messages via Simple Mail Transfer Protocol (SMTP), but the RFC 5322 authors left the specific format of "trace" headers to the discretion of other protocol standards.  Since SMTP is what most of us use these days, we'll look at the definition of Received in RFC 5321, Simple Mail Transfer Protocol.

The Received header simply says which system got the message, which system (name & IP address) it got the message from, the date/time, and (optionally) the addressee for whom the message is intended.  You may also see indicators of what SMTP server software was in use at each step along the way.  Each SMTP server that touches a message prepends its Received header to those already present (if any).  So, a typical Received header might look like this:

Received: from mail02.connect.vmware.com ([209.167.231.113]) by SNT0-MC3-F18.Snt0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4900);
Wed, 11 Jul 2012 16:35:43 -0700

or this (note the presence of the "for whom" clause):

Received: from mercure3.sct.gouv.qc.ca ([142.213.66.19] helo=mercure.sct.gouv.qc.ca)
by magus.postgresql.org with esmtp (Exim 4.72)
(envelope-from <eric.fournier@cspq.gouv.qc.ca>)
id 1Rfv4l-0005Vc-Jx
for pgsql-admin@postgresql.org; Wed, 28 Dec 2011 15:05:57 +0000
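
If you'd rather let a script pick these headers apart, here's a minimal Python sketch.  The parse_received helper is my own invention for illustration (it is not part of any library), and it assumes the tidy "from ... by ... with ... id ... for ...; date" layout of the example above.

```python
import re
from email.utils import parsedate_to_datetime

# The second example above, unfolded onto a single line.
RECEIVED = (
    "from mercure3.sct.gouv.qc.ca ([142.213.66.19] helo=mercure.sct.gouv.qc.ca) "
    "by magus.postgresql.org with esmtp (Exim 4.72) "
    "(envelope-from <eric.fournier@cspq.gouv.qc.ca>) "
    "id 1Rfv4l-0005Vc-Jx "
    "for pgsql-admin@postgresql.org; Wed, 28 Dec 2011 15:05:57 +0000"
)

def parse_received(value):
    """Split one Received value into its clauses (hypothetical helper)."""
    # The timestamp is whatever follows the last semicolon.
    clauses, _, stamp = value.rpartition(";")
    # Drop parenthesized comments so they can't be mistaken for clauses.
    bare = re.sub(r"\([^()]*\)", " ", clauses)
    fields = {}
    for key in ("from", "by", "with", "id", "for"):
        match = re.search(r"(?:^|\s)" + key + r"\s+(\S+)", bare)
        if match:
            fields[key] = match.group(1).strip("<>")
    fields["date"] = parsedate_to_datetime(stamp.strip())
    return fields

info = parse_received(RECEIVED)
print(info["from"], "->", info["by"], "for", info["for"])
```

Don't expect this to cope with every header in the wild; servers that stuff free-form banner text into the "with" clause will confuse a simple pattern like this one.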

For many of today's Internet email services, connectivity is so plentiful that users of Hotmail, Gmail et al. may only see one or two Received headers in a typical email message.  In the corporate world, however, such is not the case; many enterprises only send/receive Internet email through a particular SMTP relay (or relays), so you may see a number of Received headers.  If you do, the topmost is the most recent (and should reflect ultimate delivery).  For instance, here are the Received headers from an email message sent from within ibm.com to hotmail.com:

Received: from e33.co.us.ibm.com ([32.97.110.151]) by COL0-MC1-F36.Col0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4900);
Tue, 10 Jul 2012 20:40:25 -0700
Received: from /spool/local
by e33.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
for <asdfqwerasdf@hotmail.com> from <asdfqwerwen@us.ibm.com>;
Tue, 10 Jul 2012 21:40:24 -0600
Received: from d03dlp02.boulder.ibm.com (9.17.202.178)
by e33.co.us.ibm.com (192.168.1.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;
Tue, 10 Jul 2012 21:40:22 -0600
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106])
by d03dlp02.boulder.ibm.com (Postfix) with ESMTP id A425E3E4004E
for <asdfqwerasdf@hotmail.com>; Wed, 11 Jul 2012 03:40:21 +0000 (WET)
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q6B3dt7X244944
for <asdfqwerasdf@hotmail.com>; Tue, 10 Jul 2012 21:40:06 -0600
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q6B3denJ030524
for <asdfqwerasdf@hotmail.com>; Tue, 10 Jul 2012 21:39:40 -0600
Received: from wtfbes02.edc.lotus.com (wtfbes02.lotus.com [9.32.140.208])
by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q6B3ddop030485
for <asdfqwerasdf@hotmail.com>; Tue, 10 Jul 2012 21:39:39 -0600

Wow, that message went through quite a path!  Careful scrutiny of these Received headers can yield important information, so let's take a look. 

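  • There's only one address in these headers that is actually reachable from the Internet (32.97.110.151); the 9.x.x.x addresses come from IBM's own 9.0.0.0/8 allocation and are evidently used on its internal network.  That one address must be an Internet SMTP relay for ibm.com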
  • Note that the outbound SMTP relay added "from <asdfqwerwen@us.ibm.com>" to its Received header.  That's perfectly legal, and is often seen when a single mail relay handles messages to/from several domains.
  • One more point about that Internet SMTP relay - it's either a dual-homed system or parked in a NATted DMZ. It has an internal address of 192.168.1.133 (as seen in the third Received header) and an external address of 32.97.110.151 (as seen in the first Received header).
  • Looking at the hostnames and descriptions, it looks like there's an anti-virus appliance in the SMTP path - d03av01.boulder.ibm.com
    • Apparently, that anti-virus appliance isolates inbound messages until they're scanned - note that it accepted the message with an "AVin" service, but sent it to itself (via loopback [127.0.0.1]) for further delivery by the "AVout" service.
  • The first SMTP server to handle the message (the last Received header) was "wtfbes02" - that strongly implies a Blackberry Enterprise Server (BES).  (Yes, this message was sent from a Blackberry.)
  • The admins of d03dlp02 need to fix their system timezone; the RFCs specifically discourage the use of +0000 by systems outside that timezone.
  • We can identify three different SMTP server software packages used along the way - Microsoft SMTPSVC, sendmail ("8.13.8" and "8.14.4" are sendmail version numbers), and Postfix.
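  • From the earliest Received header (at the bottom) to the most recent (at the top), less than a minute was required to deliver this message - 46 seconds, once the timestamps are converted to a common time zone.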

That last observation is probably the most frequent use of Received headers in troubleshooting - finding the point at which a "late message" was delayed.  Yes, the sender is SUPPOSED to receive a "message delivery delayed" message if the mail path breaks down, but that doesn't always happen.  If you're ever asked "Why did this take so long?  They sent it yesterday morning!", you can take a quick look at the Received headers and determine where the logjam occurred.
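
Here's a rough Python sketch of that kind of logjam hunt, using the timestamps from the IBM-to-Hotmail example above.  The hop_delays function is a hypothetical helper of my own, and the whole approach assumes that every server's clock is reasonably accurate, which is not guaranteed.

```python
import re
from email.utils import parsedate_to_datetime

# (receiving host, timestamp) pairs copied from the example above,
# in the order they appear in the message (most recent first).
HOPS = [
    ("COL0-MC1-F36.Col0.hotmail.com", "Tue, 10 Jul 2012 20:40:25 -0700"),
    ("e33.co.us.ibm.com", "Tue, 10 Jul 2012 21:40:24 -0600"),
    ("e33.co.us.ibm.com", "Tue, 10 Jul 2012 21:40:22 -0600"),
    ("d03dlp02.boulder.ibm.com", "Wed, 11 Jul 2012 03:40:21 +0000 (WET)"),
    ("d03relay04.boulder.ibm.com", "Tue, 10 Jul 2012 21:40:06 -0600"),
    ("d03av01.boulder.ibm.com", "Tue, 10 Jul 2012 21:39:40 -0600"),
    ("d03av01.boulder.ibm.com", "Tue, 10 Jul 2012 21:39:39 -0600"),
]

def hop_delays(hops):
    """Return (host, seconds since the previous hop), oldest hop first."""
    timed = []
    for host, stamp in reversed(hops):
        # Strip trailing comments such as "(WET)" before parsing.
        stamp = re.sub(r"\([^()]*\)", "", stamp).strip()
        timed.append((host, parsedate_to_datetime(stamp)))
    delays = []
    for (_, earlier), (host, later) in zip(timed, timed[1:]):
        delays.append((host, (later - earlier).total_seconds()))
    return delays

for host, seconds in hop_delays(HOPS):
    print("%-32s +%d s" % (host, seconds))

# The slowest hop here is the 26 seconds before d03relay04 accepted the message.
print("total:", sum(s for _, s in hop_delays(HOPS)), "seconds")
```

Because each timestamp carries its own UTC offset, the subtraction works even though the servers report three different time zones.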

Another important point to remember is that "from the same person" does NOT necessarily mean "traversed the same mail path."  For instance, mail sent from my Blackberry takes a different path than does mail sent from my Lotus Notes client, even though they carry the same From header when they hit your inbox.  Received headers tell the REAL behind-the-scenes story.

Next up - all those miscellaneous headers...

Monday, July 09, 2012

Understanding Email Headers, Part II - The Basics

(I'm assuming that you've read the first installment in this series - if not, go do so - and have examined the raw text of at least one email message.)

Now that you've seen that mess of semi-human-readable spew with which a typical email message is opened, let's go into a bit of detail.  We're going to start with the bare minimum - the headers required in every standards-compliant Internet email message:

From: Wes Morgan <wesmorgan1@nowayjose.com>
Date: Mon, 9 Jul 2012 19:12:28 -0400

Yep, that's it.  If you look at RFC 5322 (specifically, Section 3.6, the table on Page 20), you'll see that only these two headers MUST appear in the typical Internet message.  (As you read RFCs, you'll gain a new appreciation for the differences among MUST, SHOULD, SHOULD NOT, MUST NOT, et al.  There's a reason those terms appear in capital letters.)  So, it's quite possible that you may receive a message with naught but the From and Date headers; relax, it's "legal." 

Obviously, the From header includes the address of the sender of the message (and, optionally, their name); there isn't much more to say about that without getting into a lengthy discussion of acceptable address formats. That's definitely beyond the scope of this article; suffice it to say that I once had an email address of <ukecc!flamtap!wes%ukma@UKCC.uky.edu>...

The Date header has changed in recent years, most notably where time zone information is concerned.  Older standards allowed for abbreviations, such as EST for Eastern Standard Time; today, the standard calls for a 4-digit numeric offset from Coordinated Universal Time (UTC), with a + or - prefix as required.  So, the example above specifies an offset of -4 hours from UTC; that's Eastern Daylight Time in the US.  It should also be noted that the day of the week and seconds are optional; between that and the fact that most mail agents graciously accept the old-style headers as well, you may see some variety in Date: headers.  (IMPORTANT NOTE: This is NOT the date/time the message was delivered, but rather the date/time when the sender put the message in its final form - in other words, when they hit "Send.")
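
To see that variety in action, here's a short Python sketch using the standard library's email.utils module.  The three Date strings are my own variations on the example above (new-style offset, old-style abbreviation, and a version with no day of the week or seconds).

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# The Date header from the example above.
sent = parsedate_to_datetime("Mon, 9 Jul 2012 19:12:28 -0400")
print(sent.utcoffset() == timedelta(hours=-4))   # four hours behind UTC
print(sent.astimezone(timezone.utc))             # 2012-07-09 23:12:28+00:00

# Old-style zone abbreviation: EDT is treated as -0400.
legacy = parsedate_to_datetime("Mon, 9 Jul 2012 19:12:28 EDT")
print(legacy == sent)

# Day of the week and seconds omitted: still a usable timestamp.
short = parsedate_to_datetime("9 Jul 2012 19:12 -0400")
print(short.second)
```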

Now that we've covered the two required headers, let's talk about those which, if they appear, should only show up once in a given email message.  We'll start with the obvious:

To: Wes Morgan <wesmorgan1@nowayjose.com>
CC: <wesmorgan@hewentthataway.org>, <mybuddyfred@foobarbaz.com>
Subject: RE: testing funky name text in headers
Message-ID: <SNT124-W841AE084F66FD2255A6D287D20@phx.gbl>

These are all fairly self-explanatory, with the possible exception of Message-ID.  All "well-behaved" mail agents insert a Message-ID header, which is supposed to look like "messageidentifier@sitename"; however, you'll notice that the example above uses "phx.gbl", which isn't a meaningful sitename at all.  That's because this is from a Hotmail message; for some internal reason, Microsoft uses "phx.gbl" in its Message-IDs.  Moral of the story?  Once again, sometimes strange-looking stuff can be OK.

The sharp-eyed among you are probably thinking, "Wait a second - where's the Bcc header?"  Well, the answer is simple.  Bcc stands for "blind carbon copy", so while that header IS passed between mail transfer agents as needed, it is removed (if present) before the message is placed in your mailbox.  (If you're really interested, run a network analyzer (like Wireshark) against an unencrypted SMTP service; you'll probably catch a few Bcc headers in the data flow.)

So, what's left?  Well, if you send (or receive) a reply to an earlier email message, a few more headers make an appearance in the reply:

In-Reply-To: <SNT124-W2664003139C1E520CF4F6787D30@phx.gbl>
References: <SNT124-W2664003139C1E520CF4F6787D30@phx.gbl>

Did you notice?  The values of the In-Reply-To and References headers are taken from the Message-ID header of the original!  Ah, but what happens if I "reply to the reply"?   Well, my message gets its own unique Message-ID, of course...and the Message-ID of the message to which I'm replying goes in my In-Reply-To header (which usually has only one Message-ID)...but that In-Reply-To Message-ID is also APPENDED to the References header.  So, in a lengthy back-and-forth, you might see headers that look like this:

In-Reply-To: <4BE8776D.4080504@kheb.fr>
References: <AANLkTik0c9hCMm2Efyj7rB7Us7hL3ZdESYEhE2GBQCfM@mail.gmail.com>
<20100509165117.GD20976@ovh.net>
<AANLkTins_dUSqRbR371SNnOIPYlatKdTCIVM8oDbtVtX@mail.gmail.com>
<AANLkTimKu7l1AtEG-0CI7Q3Ely9PUL2yuyvYuhcMIuSn@mail.gmail.com>
<AANLkTikJaHXYM_DF8zqdaH0vVnJ-fCpqvJ3OCQweoeAb@mail.gmail.com>
<AANLkTinq7riyV4w3VnCoHyj9GsNf3H7jDzu1awU2PNRb@mail.gmail.com>
<AANLkTilrPTCZPj_Tb7bXO5SLym_QY3KUp4J1jkZd5-ZE@mail.gmail.com>
<4BE72FF3.3030501@kheb.fr> <4BE7B451.8060700@linuxant.fr>
<4BE8776D.4080504@kheb.fr>
From: XXX <xxx@xxx.xx>
Date: Mon, 10 May 2010 23:20:59 +0200
Message-ID: <AANLkTikC5oN2rO5VTj8HN7U03b2H3HUqt89KYdemGlcJ@mail.gmail.com>

(Notice that this Message-ID isn't in the References header - because no one has yet referenced this message with a reply!)
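
The bookkeeping described above can be sketched in a few lines of Python.  This is only an illustration: start_reply is a hypothetical helper, example.org is a placeholder domain, and a real mail client handles more corner cases (such as a parent that has In-Reply-To but no References).

```python
from email.message import Message
from email.utils import make_msgid

def start_reply(original):
    """Build the threading headers for a reply to `original` (hypothetical helper)."""
    reply = Message()
    parent_id = original["Message-ID"]
    # Carry over the parent's References (if any), then append the parent's own ID.
    earlier = original.get("References", "")
    reply["In-Reply-To"] = parent_id
    reply["References"] = (earlier + " " + parent_id).strip()
    # The reply gets a brand-new Message-ID of its own.
    reply["Message-ID"] = make_msgid(domain="example.org")
    return reply

# A made-up message that is itself a reply to <1@example.org>.
parent = Message()
parent["Message-ID"] = "<2@example.org>"
parent["References"] = "<1@example.org>"

reply = start_reply(parent)
print(reply["In-Reply-To"])   # <2@example.org>
print(reply["References"])    # <1@example.org> <2@example.org>
```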

Surprise - you've just learned how "threaded discussion" email clients work.  Basically, they can look at any message in the thread, grab its References header, and go find the other messages in your mailbox.
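
For the curious, here's a deliberately simplified Python sketch of that idea.  It assumes each message is just a dictionary of headers and treats the first Message-ID in References as the start of the conversation; real clients use more elaborate logic, and the function names are my own.

```python
from collections import defaultdict

def thread_root(headers):
    """Return the Message-ID that started this message's conversation."""
    refs = headers.get("References", "").split()
    # No References at all means this message starts its own thread.
    return refs[0] if refs else headers["Message-ID"]

def group_by_thread(mailbox):
    """Map each thread's root Message-ID to the messages that belong to it."""
    threads = defaultdict(list)
    for headers in mailbox:
        threads[thread_root(headers)].append(headers["Message-ID"])
    return dict(threads)

# A made-up mailbox: one three-message thread plus an unrelated message.
MAILBOX = [
    {"Message-ID": "<a@example.org>"},
    {"Message-ID": "<b@example.org>", "References": "<a@example.org>"},
    {"Message-ID": "<x@example.org>"},
    {"Message-ID": "<c@example.org>",
     "References": "<a@example.org> <b@example.org>"},
]

for root, members in group_by_thread(MAILBOX).items():
    print(root, members)
```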

On occasion, someone wants to direct replies to a different address than that specified in the From header.  While this can be used by individuals (if their mail client allows it), we most often see it in conjunction with mailing lists.  Thus, we have Reply-To headers like this one:

Reply-To: bighuge-list@listhostsite.org

Finally, there's a "once and only once" header that occasionally makes an appearance in your mail messages:

Sender: Wes Morgan <wesmorgan1@nowayjose.com>

This one should only show up if/when the party actually sending the message is NOT the address specified in the From header; in other words, someone/something is sending the message on behalf of the original author.  For instance, you'll often see this in mailing list messages, like so:

From: Wes Morgan <wesmorgan1@nowayjose.com> 
Sender: Big Huge Mailing list <bighuge-list@listhostsite.org>

You may also see the Sender header when a mail client allows delegation, as in "Wes can send email in Steve's name," so keep an eye out for that...

So, to recap:

  • Every message must include the Date and From headers.
  • To, Cc, Subject, and Message-ID are NOT required by the RFCs.
  • Bcc headers do exist, but only "in transit" - they're removed before the message lands in your mailbox.
  • The presence of In-Reply-To and References headers indicates that the message is a reply to a previous message.
    • The References header makes "threaded mail reading" and per-discussion archival possible.
    • The References header will grow in size with each reply in a series of messages.
  • The Sender header usually indicates that the message is being delivered by one person/party on behalf of another, as seen with a mailing list or delegated authority.
  • Reply-To directs replies to an address other than that specified in the From header.
  • If you see more than one instance of any of these headers in a single email message, something goofy is going on.
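
If you'd like to automate that recap, here's a small Python sketch built on the standard email package.  The check_basic_headers function and the sample message are made up for illustration.

```python
from email import message_from_string

REQUIRED = ["From", "Date"]
SINGLE_USE = ["From", "Date", "To", "Cc", "Bcc", "Subject", "Message-ID",
              "In-Reply-To", "References", "Reply-To", "Sender"]

def check_basic_headers(raw):
    """Report missing required headers and repeated once-only headers."""
    msg = message_from_string(raw)
    problems = []
    for name in REQUIRED:
        if msg[name] is None:
            problems.append("missing required header: " + name)
    for name in SINGLE_USE:
        count = len(msg.get_all(name, []))   # header lookup is case-insensitive
        if count > 1:
            problems.append("%s appears %d times" % (name, count))
    return problems

# A made-up message with no Date header and two Subject headers.
RAW = "From: a@example.com\nSubject: one\nSubject: two\n\nbody\n"
print(check_basic_headers(RAW))
```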

Those are the 11 basic headers of Internet email.  Next, we'll start talking about the common headers that can show up multiple times...and what we can learn from them.

Understanding Email Headers, Part I - What Are They, and Where Can I See Them?

Most users never stop to think about the mechanics of email delivery; it's enough that email is delivered/received in a timely fashion.  Administrators, however, are necessarily more interested in the entire process.  Thankfully, there is a standard means of documenting almost every bit of processing applied to an email message, from sender to destination and every step along the way.  We're talking about "email headers", and my next few posts will (hopefully) give you a basic understanding of this important information.

(If you want to read the way-down-deep nitty-gritty, you can take a look at RFC 5322 - Internet Message Format, which defines the current standard for email messages.  I'll be referring to RFC 5322 throughout this series of posts, but it isn't "required reading".)

Before we can dive into the headers of a typical email message, however, there's an obvious question to be answered - how does one actually get to them?  In the early days of email, most end-user mail applications displayed the headers as part of the message, but almost all of today's GUI email applications hide them.  For our purposes, we have to get past the GUI and look at the raw text of the message.  I'm happy to say, however, that email header information is only a click or two away in most current applications.  Here's how to get there in several commonly-used email applications.

Lotus Notes 8.5: Open the message, then select View -> Show -> Page Source.  The raw text will be displayed in a new tab.

Hotmail: Open the message.  Look in the top right, across from the sender's name; click the down-arrow button next to "Reply", and select "View Message Source".  The raw text will be displayed in a new tab or new window, depending upon your browser's configuration.

Thunderbird: Open the message.  You'll see an "Other Actions" button - select "View Source".  The raw text is displayed in a popup window.

Gmail: Open the message.  At the top right, directly across from the sender's name, click the down arrow and select "Show Original"; the raw text will be displayed in a new tab.

If you're using a different email application, look for terms like "show message text," "show source," et cetera; when you find it, post a comment here to help others using the same app!
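
Once you have the raw text, you don't have to read it by eye.  Here's a minimal Python sketch that lists the headers of a message pasted in as a string; the sample message is made up (example.net, example.org and 192.0.2.10 are placeholders), and list_headers is my own helper name.

```python
from email import message_from_string

RAW = """\
Received: from mail.example.net ([192.0.2.10]) by mx.example.org with ESMTP;
\tMon, 9 Jul 2012 19:12:30 -0400
From: Wes Morgan <wesmorgan1@nowayjose.com>
Date: Mon, 9 Jul 2012 19:12:28 -0400
Subject: testing

Body text starts after the first blank line.
"""

def list_headers(raw):
    """Return (name, value) pairs in the order they appear in the message."""
    msg = message_from_string(raw)
    # Collapse folded (continued) lines so each header reads as one line.
    return [(name, " ".join(value.split())) for name, value in msg.items()]

for name, value in list_headers(RAW):
    print(name + ":", value)
```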

IMPORTANT NOTE: If you're using an enterprise client, like Lotus Notes, be sure that you select a message that came from an Internet user.  Messages that are purely internal (e.g. from one user at your company to another) may not use standard Internet headers. In fact, Lotus Notes doesn't even provide "View Source" for such internal messages; since they don't have to traverse the Internet, Notes/Domino doesn't add Internet email headers to those messages.

Take a look at your email from this perspective, and you'll have a better understanding of "what it takes to get there."  Next, we'll take a look at the required/basic headers that should be seen in any Internet email message...

Sunday, July 01, 2012

Time, Time, Time: Handling Leap Seconds with Grace and Style

OK, so the dreaded leap second 23:59:60 (no kidding) has come and gone.  A few folks noticed discrepancies in their various computer systems' time and date, and there were some reports of "pegged CPU" as individual systems crossed the leap-second boundary.

Well, those of you running your own NTP server for local time services (what, you mean you AREN'T running a local NTP service for your enterprise?  SHAME ON YOU...but that's another article) can anticipate leap seconds and handle them gracefully in the future.  The US National Institute of Standards and Technology publishes a list (the list, actually) of leap seconds--both past and anticipated future--that your NTP servers will happily use to make a graceful navigation through the perils of 23:59:60.

Check out the link below for details...oh, and you'll need the leap-seconds file itself.  Hint: it's much easier to configure leap second support if you're running ntpd v4.2.6 or later, WHICH YOU SHOULD BE RUNNING ANYWAY.

ConfiguringNTP < Support < NTP
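
If you're curious what's inside that file, here's a small Python sketch that reads it.  I'm assuming the layout I've seen in the published list: comment lines start with "#", and each data line holds an NTP timestamp (seconds since 1 Jan 1900) followed by the TAI-UTC offset that takes effect at that moment.  The two sample lines are typed in by hand, and parse_leap_seconds is my own helper name; check your copy of the file before relying on this.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from the start of 1900.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

# Two hand-typed sample lines in the assumed layout (not the full file).
SAMPLE = """\
# NTP timestamp, TAI-UTC offset, effective date
3439756800\t34\t# 1 Jan 2009
3550089600\t35\t# 1 Jul 2012
"""

def parse_leap_seconds(text):
    """Return (effective datetime, TAI-UTC offset) pairs from the list text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        ntp_seconds, offset = line.split()[:2]
        effective = NTP_EPOCH + timedelta(seconds=int(ntp_seconds))
        entries.append((effective, int(offset)))
    return entries

for effective, offset in parse_leap_seconds(SAMPLE):
    print(effective.date(), "TAI-UTC =", offset)
```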

A word of advice to those running Ubuntu Linux - the default configuration for ntpd points you to servers in the ubuntu.pool.ntp.org DNS collection.  The servers in that pool appear to be a mix of North American and European time servers, several of which I found to be "too far away" (in network terms) from my site.  If you're in the US, you'll probably get more accurate time data if you point your NTP daemon to servers in the us.pool.ntp.org DNS collection; I used [0123].us.pool.ntp.org (no, ntp.conf doesn't read regexps).  Users in other countries can check DNS for [country-code].pool.ntp.org (uk.pool.ntp.org, au.pool.ntp.org, etc.) to look for a "closer" pool of public time servers.

Don't let the next leap second turn your data center into a Salvador Dali painting...