/**
\mainpage The KMime Library

\section introduction Introduction

KMime is a library for handling mail messages and newsgroup articles. Both mail messages and
newsgroup articles are based on the same standard called MIME, which stands for
<b>Multipurpose Internet Mail Extensions</b>. In this document, the term \c message is used to
refer to both mail messages and newsgroup articles.

KMime deals solely with the in-memory representation of messages. Topics such as transport or storage
of messages are handled by other libraries, for example by
<a href="http://api.kde.org/4.x-api/kdepimlibs-apidocs/mailtransport/html/index.html">the mailtransport library</a>
or by <a href="http://api.kde.org/4.x-api/kdepimlibs-apidocs/kimap/html/index.html">the KIMAP library</a>.
Similarly, this library does not deal with displaying messages or advanced composing. Those tasks are handled by
the <a href="http://api.kde.org/4.x-api/kdepim-apidocs/messageviewer/html/index.html">messageviewer</a>
and the <a href="http://websvn.kde.org/trunk/KDE/kdepim/messagecomposer/">messagecomposer</a>
components in the KDEPIM module.

KMime's main function is to parse, modify and assemble messages in memory. What <i>parsing</i> and
<i>assembling</i> actually mean is explained in a \ref string-broken-down "later section".
KMime provides high-level classes that make these tasks easy.

MIME is defined by various RFCs; see the \ref rfcs "RFC section" for a list of them.

\section structure Structure of this document

This document will first give an \ref mime-intro "introduction to the MIME specification". Understanding
the basic structure of MIME messages is essential for using this library.
The introduction is aimed at users of the library. It gives a broad overview with examples and
omits some details. Developers who wish to modify KMime should read the
\ref rfcs "corresponding RFCs" as well, but this is not necessary for library users.

After the introduction to the MIME format, the two ways of representing a message in memory are
discussed: the \ref string-broken-down "string representation and the broken-down representation".

This is followed by a section giving an
\ref classes-overview "overview of the most important KMime classes".

The last sections give a list of \ref rfcs "relevant RFCs" and provide links for
\ref links "further reading".

\section mime-intro Structure of MIME messages

\subsection history A brief history of the MIME standard

The MIME standard is relatively recent (1993). Email and Usenet existed long before MIME came into
existence, so MIME has to remain backwards compatible. Before MIME, the email
standard lacked many capabilities, such as attachments and charsets other than US-ASCII. MIME added
these and other features later. The basic message format that MIME builds on is defined today in
<a href="http://tools.ietf.org/html/rfc5322">RFC 5322</a>. <a href="http://tools.ietf.org/html/rfc2045">RFC 2045</a>
to <a href="http://tools.ietf.org/html/rfc2049">RFC 2049</a> define several backwards-compatible extensions
to this basic message format, adding support for attachments, different encodings and many
other things.

There is an even older standard, defined in <a href="http://tools.ietf.org/html/rfc733">RFC 733</a>
(<i>Standard for the format of ARPA network text messages</i>, introduced in 1977).
This standard is now obsoleted by RFC 5322. Backwards compatibility with it is still supported in some cases,
because some messages in this format are still around.

Since pre-MIME messages had no way to handle attachments, attachments were sometimes added to the message
text in <a href="http://en.wikipedia.org/wiki/Uuencoding">uuencoded</a> form. Although this practice is
obsolete, KMime still supports reading uuencoded attachments.

After MIME was introduced, people realized that it offered no way to encode the filename of an attachment
in any charset other than US-ASCII. Thus, <a href="http://tools.ietf.org/html/rfc2231">RFC 2231</a>
was introduced to allow arbitrary charsets for parameter values, such as the attachment filename.

\subsection examples MIME by examples

The following sections show, examine and explain MIME message examples. They start with
a simple message and proceed to more interesting ones.
You can get additional examples by viewing the source of your own messages in your mail client,
or by looking at the examples in the \ref rfcs "various RFCs".

\subsubsection simple-mail A simple message

\verbatim
Subject: First Mail
From: John Doe <john.doe@domain.com>
Date: Sun, 21 Feb 2010 19:16:11 +0100
MIME-Version: 1.0

Hello World!
\endverbatim
The above example features a very simple message. Its two main parts are the \b header
and the \b body, separated by an empty line. The body contains the actual message content,
and the header contains metadata about the message itself. The header consists of several <b>header fields</b>,
each starting on its own line. A long header field can be continued on the following lines, as long as
those lines start with whitespace. A header field is made up of the <b>header field name</b>, followed by a colon,
followed by the <b>header field body</b>.

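The split into header and body, and of each header field into name and body, takes only a few lines of
standard C++. This sketch is an illustration and does not reflect how KMime is implemented. \c RawMessage and
\c splitMessage are hypothetical names, and \c std::string serves as a plain byte container.
Continuation lines, which start with whitespace, are appended to the previous field body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type, not part of the KMime API.
struct RawMessage {
    std::vector<std::pair<std::string, std::string>> fields; // (name, body)
    std::string body;
};

RawMessage splitMessage(const std::string &raw)
{
    RawMessage msg;
    std::size_t pos = 0;
    while (pos < raw.size()) {
        std::size_t eol = raw.find('\n', pos);
        if (eol == std::string::npos) {
            eol = raw.size();
        }
        std::string line = raw.substr(pos, eol - pos);
        pos = eol + 1;
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // accept CRLF as well as LF line endings
        }
        if (line.empty()) {
            break; // the first empty line separates header and body
        }
        if ((line[0] == ' ' || line[0] == '\t') && !msg.fields.empty()) {
            msg.fields.back().second += line; // continuation of a folded field
            continue;
        }
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) {
            continue; // not a header field; a real parser would report this
        }
        std::string value = line.substr(colon + 1);
        const std::size_t start = value.find_first_not_of(" \t");
        value = (start == std::string::npos) ? std::string() : value.substr(start);
        msg.fields.emplace_back(line.substr(0, colon), value);
    }
    if (pos < raw.size()) {
        msg.body = raw.substr(pos);
    }
    return msg;
}
```

Real-world parsers have to cope with far more malformed input than this sketch does.
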
The \b MIME-Version header field is mandatory for MIME messages. \b Subject,
\b From and \b Date are important header fields, usually displayed in the message list of a
mail client. The \c Subject header field can contain anything and has no special structure, so it is a
so-called \b unstructured header field. In contrast, the \c From and \c Date header fields have
to follow a special structure, because machines must be able to parse them. They are \b structured
header fields. For example, a mail client needs to understand
the \c Date header field so that it can sort the messages by date in the message list.
The RFCs specify exactly how the bodies of structured header fields must be formed. For \c From and
\c Date, this is <a href="http://tools.ietf.org/html/rfc5322">RFC 5322</a>.

In this example, the \c From header field contains a single email address. More precisely, it contains a
\b mailbox. A mailbox is made up of the <b>display name</b> (John Doe) and the <b>address specification</b>
(\c addr-spec, here john.doe@domain.com), which is enclosed in angle brackets. The \c addr-spec consists of
the user name, called the <b>local part</b> (john.doe), and the \b domain name (domain.com), separated by an at sign.

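The following hypothetical C++ helper illustrates this structure by splitting a mailbox into display name,
local part and domain. It only covers the two simple forms used in this document: a display name followed by an
address in angle brackets, and a bare address. Quoted display names, comments and encoded words are all
permitted by the RFC but are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified mailbox representation (not the KMime API).
struct Mailbox {
    std::string displayName;
    std::string localPart;
    std::string domain;
};

static std::string trim(const std::string &s)
{
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) {
        return std::string();
    }
    const std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Returns false if the text does not look like a mailbox.
bool parseMailbox(const std::string &text, Mailbox &out)
{
    std::string addrSpec = trim(text);
    out.displayName.clear();
    const std::size_t lt = text.find('<');
    if (lt != std::string::npos) {
        const std::size_t gt = text.find('>', lt);
        if (gt == std::string::npos) {
            return false; // unbalanced angle brackets
        }
        out.displayName = trim(text.substr(0, lt));
        addrSpec = text.substr(lt + 1, gt - lt - 1);
    }
    // The addr-spec is split at the last at sign into local part and domain.
    const std::size_t at = addrSpec.rfind('@');
    if (at == std::string::npos || at == 0 || at + 1 == addrSpec.size()) {
        return false;
    }
    out.localPart = addrSpec.substr(0, at);
    out.domain = addrSpec.substr(at + 1);
    return true;
}
```
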
Many header fields can contain multiple email addresses. For example, the \c To header field of a message with
multiple recipients can contain a comma-separated list of mailboxes.
A list of mailboxes, together with a display name for the list, forms a \b group. Groups and mailboxes
together form an <b>address list</b>. Groups are rarely used, however, and you will most often see a simple
list of plain mailboxes.

A header can contain many more header fields than shown in this example. It can even contain
arbitrary header fields, which are usually prefixed with \c X-, like \c X-Face.

\subsubsection encodings Encodings and charsets

\verbatim
From: John Doe <john.doe@domain.com>
Date: Mon, 22 Feb 2010 00:42:45 +0100
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Gr=FCezi Welt!
\endverbatim

The above shows a message that uses a \b charset other than the standard \b US-ASCII charset. The
message body contains the string "Grüezi Welt!", which is \b encoded in a special way.

The \b content-type of this message is \b text/plain, which means that the message is simple text. Later
sections introduce other content types, such as \b text/html. If there is no \c Content-Type header
field, the content-type is assumed to be \c text/plain with the US-ASCII charset.

Before MIME was introduced, all messages were limited to the US-ASCII charset. Only the
lower 128 byte values (0 to 127) were allowed, the so-called \b 7-bit range. Writing a message in
another charset, or using the upper 128 byte values (128 to 255), was not allowed.

\par Charset Encoding

When talking about charsets, it is important to understand how strings of text are converted to
byte arrays, and the other way around. A message is nothing more than a big array of bytes.
The bytes that form the body of the message need to be interpreted as a text string. Interpreting
a byte array as a text string is called \b decoding the text. Converting a text string to a byte array is called
\b encoding the text. A \b codec (<b>co</b>der-<b>dec</b>oder) is a utility that can encode and decode text.
In Qt, the class for text strings is QString, and the class for byte arrays is QByteArray. The base class
of all codecs is QTextCodec.

With the US-ASCII charset, encoding and decoding text is easy. One just has to look at an
<a href="http://en.wikipedia.org/wiki/ASCII_table">ASCII table</a> to convert text strings to byte arrays and byte arrays to text strings. For
example, the letter 'A' is represented by a single byte with the value 65. A byte
with the value 84 can be looked up in the table, which shows that it represents the letter 'T'.
With the US-ASCII charset, each letter is represented by exactly one byte, which is very convenient.
Even better, all letters commonly used in English text have byte values below 128, so the 7-bit limit
of messages is no problem for text encoded with the US-ASCII charset.
As another example, the string "Hello World!" is represented by the following byte array:<br>
<code>48 65 6C 6C 6F 20 57 6F 72 6C 64 21</code><br>
Note that the byte values are written in hexadecimal form here, not in decimal as earlier.

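Byte listings like the ones in this section are easy to produce programmatically. The hypothetical \c hexDump
helper below prints each byte of a \c std::string, used purely as a byte container, as a two-digit
hexadecimal value. The values are separated by spaces, and the bytes are not interpreted as text.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats every byte as two uppercase hex digits, separated by spaces.
std::string hexDump(const std::string &bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char byte : bytes) {
        std::snprintf(buf, sizeof buf, "%02X", byte);
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```
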
What if we want to write a message that contains German umlauts or Chinese characters? Those
are not in the ASCII table, so a different charset has to be used. There is a wealth of charsets
to choose from, and not all of them can handle all letters. For example, the
<a href="http://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1">ISO-8859-1</a> charset can handle
German umlauts, but cannot handle Chinese or Arabic characters. The
<a href="http://en.wikipedia.org/wiki/Unicode">Unicode standard</a> is an attempt to introduce charsets that can handle all known characters in the
world, in all languages. Unicode actually defines several charsets, for example <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a>
and <a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>. In an ideal world, everyone would be using
Unicode charsets, but for historic and legacy reasons, other charsets are still widely used.

Charsets other than US-ASCII generally lack these convenient properties. A single letter can be represented
by multiple bytes, and the byte values are often outside the 7-bit range. Pay attention to the UTF-8
charset: at first glance it looks exactly like US-ASCII, because all US-ASCII characters, such as the
Latin letters A - Z, are encoded with the same byte values. However, every other character is
encoded with two or more bytes (up to four in UTF-8). In general, the number of bytes per letter
depends on the charset. One can \b not rely on the <code>1 letter == 1 byte</code> assumption.

What should be done when the text string "Grüezi Welt!" is to be sent in the body of a message?
The first step is to choose a charset that can represent all of its letters, which already excludes US-ASCII.
Once a charset is chosen, the text string is encoded into a byte array.
"Grüezi Welt!" encoded with the ISO-8859-1 charset produces the following byte array:<br>
<code>47 72 FC 65 7A 69 20 57 65 6C 74 21</code><br>
The letter 'ü' here is encoded using a single byte with the value <code>FC</code>.
The same string encoded with UTF-8 looks slightly different:<br>
<code>47 72 C3 BC 65 7A 69 20 57 65 6C 74 21</code><br>
Here, the letter 'ü' is encoded with two bytes, <code>C3 BC</code>. The other letters are encoded
the same way in both charsets.

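A small sketch that encodes a single Unicode code point makes the difference between the two charsets concrete.
\c encodeUtf8 and \c encodeLatin1 are hypothetical helpers written for this illustration; in Qt, a QTextCodec
does this job. The UTF-8 encoder follows the standard bit layout but does not reject surrogate code points.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// UTF-8: code points below 0x80 take one byte (identical to US-ASCII);
// larger code points take two to four bytes.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("not a Unicode code point");
    }
    return out;
}

// ISO-8859-1 maps the code points U+0000 to U+00FF to one byte each and
// cannot represent anything else.
std::string encodeLatin1(char32_t cp)
{
    if (cp > 0xFF) {
        throw std::invalid_argument("not representable in ISO-8859-1");
    }
    return std::string(1, static_cast<char>(cp));
}
```

Encoding each character of "Grüezi Welt!" this way and concatenating the results gives the two byte arrays
listed above.
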
You can try this out yourself. Open your favorite text editor and enter some text with non-ASCII
letters. Then save the file and view it in a hex editor to see how the text was converted to a
byte array. Try saving the same text with different charsets in your text editor.

At this point, the text string has been converted to a byte array, for example using the ISO-8859-1
charset. To indicate which charset was used, a \b Content-Type header field with the correct
\b charset parameter has to be added, as in the example above. If the charset parameter of the \c Content-Type header field,
or even the complete \c Content-Type header field, is left out, the receiver cannot know how to interpret
the byte array! In these cases, the byte array is usually decoded incorrectly, and the text strings contain
wrong letters or lots of question marks. There is even a special term for such wrongly decoded text:
<a href="http://en.wikipedia.org/wiki/Mojibake">Mojibake</a>. Always know which charset
your byte array is encoded with. Otherwise, any attempt to decode the byte array into a text string will fail and produce
Mojibake. <b>There is no such thing as plain text!</b> If there is no \c Content-Type header field in
a message, the message body should be interpreted as US-ASCII.

To learn more about charsets and encodings, read
<a href="http://www.joelonsoftware.com/articles/Unicode.html">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a>
and <a href="http://www.cs.tut.fi/~jkorpela/chars.html">A tutorial on character code issues</a>. The
first article in particular should be read, as its title indicates.

\par Content Transfer Encoding

The byte array that was just created cannot be used in a message as it is. The string encoded with ISO-8859-1
has the byte value <code>FC</code> for the letter 'ü', which is the decimal value 252. However, as said earlier,
messages are only valid when all bytes are in the 7-bit range, i.e. have a byte value below 128.
So how can byte values greater than 127 be added to messages? The solution
is to use a <b>content transfer encoding</b> (CTE). A content transfer encoding takes a byte
array as input and transforms it. The output is another byte array, but one that only uses byte values
in the 7-bit range. One such content transfer encoding is <b>quoted-printable</b> (QP), which is used in the
above example. Quoted-printable is easy to understand. When the encoder encounters a byte with a value greater
than 127, it replaces that byte with a '=' followed by the hexadecimal code of the byte value, written
as letters and digits encoded with ASCII. This means
that a byte with the value 252 is replaced with the ASCII string <code>=FC</code>, since <code>FC</code>
is the hexadecimal value of 252. The ASCII string <code>=FC</code> itself is three bytes long:
<code>3D 46 43</code>. Therefore, the quoted-printable encoding replaces each byte outside of the 7-bit
range with 3 new bytes. The '=' character itself has to be encoded the same way, as <code>=3D</code>, so
that it cannot be mistaken for the start of an encoded byte. Decoding quoted-printable is also easy. Each time
the decoder encounters a byte with the value <code>3D</code>, which is the character '=' in ASCII, it interprets
the next two bytes as the hex value of the resulting byte. Quoted-printable was designed so that the encoded
byte array stays easy for humans to read, as long as most of its bytes are in the 7-bit range.

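The encoding and decoding rules described above translate almost directly into code. The following is a
simplified sketch with hypothetical function names, not KMime's implementation. It escapes every byte outside
the 7-bit range, plus the '=' character. It ignores the 76-character line limit, soft line breaks and the
special treatment of trailing whitespace, all of which RFC 2045 also requires.

```cpp
#include <cassert>
#include <string>

// Escapes bytes above 127 and '=' as "=XX" (uppercase hex).
std::string qpEncode(const std::string &bytes)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : bytes) {
        if (byte > 127 || byte == '=') {
            out += '=';
            out += hex[byte >> 4];
            out += hex[byte & 0x0F];
        } else {
            out += static_cast<char>(byte);
        }
    }
    return out;
}

static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Replaces each "=XX" escape by the byte it encodes; anything that is not
// a valid escape is copied unchanged.
std::string qpDecode(const std::string &text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '=' && i + 2 < text.size()) {
            const int hi = hexValue(text[i + 1]);
            const int lo = hexValue(text[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += text[i];
    }
    return out;
}
```
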
The quoted-printable encoding is not a good choice when the input byte array contains many bytes
outside the 7-bit range. In the worst case, the resulting byte array is three times as big,
which wastes space. Therefore another content transfer encoding was introduced: \b base64.
Base64 encodes every group of three input bytes as four ASCII characters, so its output is always about
one third bigger than the input, regardless of the content.
The details of the base64 encoding are beyond the scope of this document; refer to the
<a href="http://en.wikipedia.org/wiki/Base64">Wikipedia article</a>
or the <a href="http://tools.ietf.org/html/rfc2045#section-6.8">RFC</a> for details.
As an example, the ISO-8859-1 encoded text string "Grüezi Welt!" is, after encoding it with base64,
represented by the following ASCII string: <code>R3L8ZXppIFdlbHQh</code>.
Expressed in byte arrays, the byte array <code>47 72 FC 65 7A 69 20 57 65 6C 74 21</code>
is, after encoding it with base64,
represented by the byte array <code>52 33 4C 38 5A 58 70 70 49 46 64 6C 62 48 51 68</code>.

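For the curious, the core of base64 fits in a short function. This hypothetical encoder maps every 24-bit
group of input bytes to four characters of the base64 alphabet and pads a final partial group with '='.
RFC 2045 also requires line breaks every 76 characters in message bodies, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string base64Encode(const std::string &bytes)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Full groups: 3 bytes (24 bits) become 4 characters of 6 bits each.
    while (i + 3 <= bytes.size()) {
        const unsigned long group =
            (static_cast<unsigned char>(bytes[i]) << 16) |
            (static_cast<unsigned char>(bytes[i + 1]) << 8) |
            static_cast<unsigned char>(bytes[i + 2]);
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += alphabet[(group >> 6) & 0x3F];
        out += alphabet[group & 0x3F];
        i += 3;
    }
    // A final group of 1 or 2 bytes is padded with '='.
    const std::size_t rest = bytes.size() - i;
    if (rest > 0) {
        unsigned long group = static_cast<unsigned char>(bytes[i]) << 16;
        if (rest == 2) {
            group |= static_cast<unsigned char>(bytes[i + 1]) << 8;
        }
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += (rest == 2) ? alphabet[(group >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```
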
There are two other content transfer encodings besides quoted-printable and base64: \b 7bit and
\b 8bit. 7bit is just a marker to indicate that no content transfer encoding is used. This is the
case when the byte array is already completely in the 7-bit range, for example when writing English
text using the US-ASCII charset. 8bit is also a marker to indicate that no content transfer encoding
was used. Here, however, encoding is skipped not because it is unnecessary, but because of a special exception
that allows byte values outside of the 7-bit range. For example, some SMTP servers support the
<a href="http://tools.ietf.org/html/rfc1652">8BITMIME</a> extension, which indicates that they accept
bytes outside of the 7-bit range. In this case, one can simply use the byte arrays as-is, without any
content transfer encoding. KMime currently does not support creating messages with the 8bit content
transfer encoding. The advantage of 8bit is that it adds no size overhead, unlike
base64 or even quoted-printable.

Whichever of the four content transfer encodings is used (quoted-printable, base64, 7bit or 8bit), it
has to be indicated in the \b Content-Transfer-Encoding header field. If the header field is left out,
the content transfer encoding is assumed to be 7bit. The example above uses quoted-printable.

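When a message is assembled, the choice of content transfer encoding can follow directly from the trade-offs
described above. The rule below is a hypothetical heuristic for illustration, not necessarily the one KMime uses.
It picks 7bit for data that is already in the 7-bit range. Otherwise, it compares the expected size of the
quoted-printable output with that of the base64 output. Line lengths and '=' characters, which also affect a
real decision, are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class TransferEncoding { SevenBit, QuotedPrintable, Base64 };

TransferEncoding chooseEncoding(const std::string &bytes)
{
    std::size_t eightBit = 0; // number of bytes outside the 7-bit range
    for (unsigned char byte : bytes) {
        if (byte > 127) {
            ++eightBit;
        }
    }
    if (eightBit == 0) {
        return TransferEncoding::SevenBit;
    }
    // Quoted-printable adds 2 bytes per escaped byte; base64 turns every
    // (started) group of 3 bytes into 4 characters.
    const std::size_t qpSize = bytes.size() + 2 * eightBit;
    const std::size_t base64Size = (bytes.size() + 2) / 3 * 4;
    return qpSize <= base64Size ? TransferEncoding::QuotedPrintable
                                : TransferEncoding::Base64;
}
```

Under this rule, mostly-English text with a few umlauts ends up as quoted-printable, while binary data such as
images ends up as base64.
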
\verbatim
From: John Doe <john.doe@domain.com>
Date: Mon, 22 Feb 2010 00:42:45 +0100
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: base64

R3L8ZXppIFdlbHQh
\endverbatim
The same example, this time encoded with the base64 content transfer encoding.

\verbatim
From: John Doe <john.doe@domain.com>
Date: Mon, 22 Feb 2010 00:42:45 +0100
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: base64

R3LDvGV6aSBXZWx0IQ==
\endverbatim
Again the same example, this time using UTF-8 as the charset.

\verbatim
From: John Doe <john.doe@domain.com>
Date: Mon, 22 Feb 2010 00:42:45 +0100
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Gr=C3=BCezi Welt!
\endverbatim
The example with a combination of UTF-8 and the quoted-printable CTE. As shown earlier, the
UTF-8 charset represents the letter 'ü' with the two bytes <code>C3 BC</code>.

\verbatim
From: John Doe <john.doe@domain.com>
Date: Mon, 22 Feb 2010 00:42:45 +0100
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit

Hello World
\endverbatim
A different example, showing the 7bit content transfer encoding. Although the UTF-8 charset represents many
letters with bytes outside of the 7-bit range, the string "Hello World" contains none of them, so
even with UTF-8 it stays entirely within the 7-bit range.

In the \ref links "further reading" section, you will find links to web applications that demonstrate
encodings and charsets.

\par Conclusion

When a text string is added to the body of a message, it needs to be encoded twice. First, the charset encoding
is applied, which transforms the text string into a byte array. Afterwards, the content transfer
encoding is applied, which transforms the byte array from the first step into a byte array that
only contains bytes in the 7-bit range.

Decoding works the same way, in reverse. First, the byte array is decoded with the content transfer encoding,
which yields a byte array that can contain all 256 possible byte values. Afterwards, the resulting byte array is
decoded with the correct charset, which transforms it into a text string. For these two decoding steps, one has to
look at the \c Content-Type and \c Content-Transfer-Encoding header fields to find the correct
charset and CTE.

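The two decoding steps can be sketched as follows. All names are hypothetical. The sketch only supports the
quoted-printable, 7bit and 8bit transfer encodings and the US-ASCII and ISO-8859-1 charsets. It represents the
decoded text string as a \c std::u32string of Unicode code points instead of a QString.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

static int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Step 1: undo the content transfer encoding. The result is a byte array
// that may contain all 256 byte values.
std::string decodeTransferEncoding(const std::string &body, const std::string &cte)
{
    if (cte == "7bit" || cte == "8bit") {
        return body; // no transfer encoding was applied
    }
    if (cte != "quoted-printable") {
        throw std::runtime_error("CTE not supported by this sketch: " + cte);
    }
    std::string bytes;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] == '=' && i + 2 < body.size() &&
            hexDigitValue(body[i + 1]) >= 0 && hexDigitValue(body[i + 2]) >= 0) {
            bytes += static_cast<char>(hexDigitValue(body[i + 1]) * 16 + hexDigitValue(body[i + 2]));
            i += 2;
        } else {
            bytes += body[i];
        }
    }
    return bytes;
}

// Step 2: decode the bytes with the charset. In ISO-8859-1, every byte
// value equals its Unicode code point; US-ASCII is the subset 0 to 127
// (bytes above 127 are not rejected here).
std::u32string decodeCharset(const std::string &bytes, const std::string &charset)
{
    if (charset != "iso-8859-1" && charset != "us-ascii") {
        throw std::runtime_error("charset not supported by this sketch: " + charset);
    }
    std::u32string text;
    for (unsigned char byte : bytes) {
        text += static_cast<char32_t>(byte);
    }
    return text;
}

// Both header values are expected in lower case.
std::u32string decodeTextBody(const std::string &body, const std::string &cte,
                              const std::string &charset)
{
    return decodeCharset(decodeTransferEncoding(body, cte), charset);
}
```
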
Always keep the charset and the content transfer encoding in mind. Byte arrays and
text strings must not be confused. Likewise, byte arrays that are encoded with a CTE must not be confused with
byte arrays that are \b not encoded with a CTE.

This section showed how to use different charsets in the <i>body</i> of a message. The next section will
show what to do when another charset is needed in one of the <i>header</i> field bodies.

\subsubsection header-encoding Encoding in Header Fields

The last section discussed how to use different charsets in the body of a message. But what if
text in a different charset needs to be added to one of the header fields? For example, one might want to write
a mail to a mailbox with the display name "András Manţia" and with the subject "Grüezi!".

The header fields are limited to characters in the 7-bit range and are interpreted as US-ASCII.
That means the header field names, such as "From: ", are all encoded in US-ASCII. The header field
bodies, such as the "1.0" of \c MIME-Version, are also encoded with US-ASCII. This is mandated by
<a href="http://tools.ietf.org/html/rfc5322#section-2">the RFC</a>.

The \c Content-Type and \c Content-Transfer-Encoding header fields only apply to the message body.
They have no meaning for other header fields.

This means that any text in a charset other than US-ASCII has to be encoded in some way to satisfy the RFC.
Such encoded text is only allowed in some of the header field bodies. The header field
names always have to be in US-ASCII.

\verbatim
From: Thomas McGuire <thomas@domain.com>
Subject: =?iso-8859-1?q?Gr=FCezi!?=
Date: Mon, 22 Feb 2010 14:34:01 +0100
MIME-Version: 1.0
To: =?utf-8?q?Andr=C3=A1s?= =?utf-8?q?_Man=C5=A3ia?= <andras@domain.com>
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

bla bla bla
\endverbatim

The above example shows how the message header handles text encoded with a charset other than US-ASCII.
This can be seen in the bodies of the \c Subject and \c To header fields.
The body of the message is unimportant in this example; it is just "bla bla bla" in US-ASCII.
The encoded header field bodies are sometimes called <b>RFC2047 strings</b> or <b>encoded words</b>, after
the <a href="http://tools.ietf.org/html/rfc2047">RFC</a> that defines this encoding scheme.
RFC2047 strings are only allowed in some places, such as the \c Subject header field and the display names
of mailboxes in header fields like \c From and \c To. They are not allowed in other header fields, such as \c Date and
\c MIME-Version. They would not make much sense there anyway, since those are
structured header fields with a clearly defined structure.

RFC2047 strings start with "=?" and end with "?=". Between those markers, they consist of three parts:
\li The charset, such as "iso-8859-1"
\li The encoding, which is "q" or "b"
\li The encoded text

These three parts are separated by a '?'. The third part, the text, is encoded much like
text strings in the message body. First, the text string is encoded to a byte array using
the charset encoding. Then the second encoding is applied to the result, to ensure that all resulting
bytes are within the 7-bit range.

The <i>second encoding</i> here is almost identical to the content transfer encoding. There are two
possible encodings, \b b and \b q. The \c b encoding is the same as the base64 content transfer
encoding. The \c q encoding is very similar to the quoted-printable content transfer
encoding, with some small differences that are described in
<a href="http://tools.ietf.org/html/rfc2047#section-4.2">the RFC</a>.

Let's examine the subject of the message, <code>=?iso-8859-1?q?Gr=FCezi!?=</code>, in detail:<br>
The first part of the RFC2047 string is the charset, ISO-8859-1 in this case. The second part
is the encoding, here the \c q encoding. The last part is the encoded text,
<code>Gr=FCezi!</code>. As with the quoted-printable encoding, "=FC" is the encoding for the byte with
the value <code>FC</code>, which in the ISO-8859-1 charset is the letter 'ü'. The complete decoded
text is therefore "Grüezi!".

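The following hypothetical helper illustrates how a single encoded word is taken apart. It splits a word of the
form =?charset?encoding?encoded-text?= into its three parts and undoes the \c q encoding. The result is still a
byte array in the named charset, and converting it to a text string is a separate step. The \c b encoding is not
handled in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct EncodedWord {
    std::string charset;
    char encoding = 0;
    std::string bytes; // decoded bytes, still in the given charset
};

static int hexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

bool decodeQWord(const std::string &word, EncodedWord &out)
{
    // Shortest possible word: "=?" + charset + "?q?" + "?=" with an empty text.
    if (word.size() < 8 || word.compare(0, 2, "=?") != 0 ||
        word.compare(word.size() - 2, 2, "?=") != 0) {
        return false;
    }
    const std::string inner = word.substr(2, word.size() - 4);
    const std::size_t q1 = inner.find('?');
    if (q1 == std::string::npos || q1 + 2 >= inner.size() || inner[q1 + 2] != '?') {
        return false;
    }
    out.charset = inner.substr(0, q1);
    out.encoding = inner[q1 + 1];
    if (out.encoding != 'q' && out.encoding != 'Q') {
        return false;
    }
    const std::string text = inner.substr(q1 + 3);
    out.bytes.clear();
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '_') {
            out.bytes += ' '; // in the q encoding, '_' stands for a space
        } else if (text[i] == '=' && i + 2 < text.size() &&
                   hexDigit(text[i + 1]) >= 0 && hexDigit(text[i + 2]) >= 0) {
            out.bytes += static_cast<char>(hexDigit(text[i + 1]) * 16 + hexDigit(text[i + 2]));
            i += 2;
        } else {
            out.bytes += text[i];
        }
    }
    return true;
}
```
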
Each RFC2047 string in the header can use a different charset. In this example, the \c Subject uses ISO-8859-1,
\c To uses UTF-8 and the message body uses US-ASCII.

The \c To header field uses two RFC2047 strings, although a single, longer RFC2047 string for the whole
display name would also have worked. The second RFC2047 string starts with an underscore,
which is decoded as a space in the \c q encoding. The space between the two RFC2047 strings is ignored;
it only separates the two encoded words.

There are some restrictions on RFC2047 strings. They must not be longer than 75 characters,
so long text strings have to be split across two or more encoded words. There are also
restrictions on where RFC2047 strings are allowed. Most importantly, the address specification must
not be encoded, for backwards compatibility. For further details, refer to the RFC.

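The 75-character limit has a consequence that is easy to overlook. When text is split into several encoded words,
a character that takes several bytes in its charset must not be split between two words, because RFC 2047
requires every encoded word to contain whole characters. The hypothetical encoder below illustrates this for
UTF-8 and the \c q encoding. To stay on the safe side, it leaves only letters and digits unescaped, which is
stricter than RFC 2047 requires, and it always uses the charset name utf-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> encodeQWords(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    const std::string prefix = "=?utf-8?q?";
    const std::string suffix = "?=";
    const std::size_t maxWordLength = 75; // limit from RFC 2047

    std::vector<std::string> words;
    std::string current; // encoded text of the word being built
    std::size_t i = 0;
    while (i < utf8.size()) {
        // Length of the UTF-8 sequence that starts with this lead byte.
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t length = 1;
        if (lead >= 0xF0) {
            length = 4;
        } else if (lead >= 0xE0) {
            length = 3;
        } else if (lead >= 0xC0) {
            length = 2;
        }
        // q-encode all bytes of this one character.
        std::string chunk;
        for (std::size_t k = i; k < i + length && k < utf8.size(); ++k) {
            const unsigned char byte = static_cast<unsigned char>(utf8[k]);
            const bool plain = (byte >= '0' && byte <= '9') ||
                               (byte >= 'A' && byte <= 'Z') ||
                               (byte >= 'a' && byte <= 'z');
            if (plain) {
                chunk += static_cast<char>(byte);
            } else if (byte == ' ') {
                chunk += '_'; // a space is written as '_' in the q encoding
            } else {
                chunk += '=';
                chunk += hex[byte >> 4];
                chunk += hex[byte & 0x0F];
            }
        }
        // Start a new encoded word if this character would not fit anymore.
        if (!current.empty() &&
            prefix.size() + current.size() + chunk.size() + suffix.size() > maxWordLength) {
            words.push_back(prefix + current + suffix);
            current.clear();
        }
        current += chunk;
        i += length;
    }
    if (!current.empty()) {
        words.push_back(prefix + current + suffix);
    }
    return words;
}
```

The resulting words would then be joined with a space, or with a folded line break, both of which decoders
ignore between adjacent encoded words.
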
\subsubsection multipart-mixed Messages with attachments

So far, all examples had a single text part as the message body. This section
examines messages with attachments.

\verbatim
From: frank@domain.com
To: greg@domain.com
Subject: Nice Photo
Date: Sun, 28 Feb 2010 19:57:00 +0100
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_8xriL5W6LSj00Ly"

--Boundary-00=_8xriL5W6LSj00Ly
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Greg,

attached you'll find a nice photo.

--Boundary-00=_8xriL5W6LSj00Ly
Content-Type: image/jpeg;
  name="test.jpeg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
  filename="test.jpeg"

/9j/4AAQSkZJRgABAQAAAQABAAD/4Q3XRXhpZgAASUkqAAgAAAAHAAsAAgAPAAAAYgAAAAABBAAB
[SNIP 800 lines]
ze5CdSH2Z8yTatHSV2veW0rKzeq30//Z

--Boundary-00=_8xriL5W6LSj00Ly--

\endverbatim
<i>Note: Since the image in this message would be really big, most of it is omitted here.</i>

The above example consists of two parts: a normal text part and an image attachment. Messages that
consist of multiple parts are called \b multipart messages, so the top-level content-type is
<b>multipart/mixed</b>. \c Mixed simply means that the parts are independent of each other;
they are just bundled together in a particular order. Later sections look at other types, such as \c multipart/alternative
or \c multipart/related. A \b part is sometimes also called a \b node, a \b content or a <b>MIME part</b>.

The MIME parts of the message are separated by a \b boundary, which
is specified as a parameter in the top-level content-type header field. In the message body, each boundary
is prefixed with \c "--". The last boundary is additionally suffixed with \c "--", so that the end of the multipart body can
be detected. When creating a message, care must be taken that the boundary appears nowhere else in the
message, for example in the text part, as this would confuse the parser.

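Finding the parts of a multipart body boils down to scanning for delimiter lines. The hypothetical helper below,
which is not KMime code, returns the raw child parts, each still consisting of MIME header and MIME body.
Following RFC 2046, the line break directly before a delimiter is treated as part of the delimiter. Any text
before the first delimiter (the preamble) or after the closing one (the epilogue) is dropped. Delimiter lines
are compared exactly, so the sketch does not handle trailing whitespace after a boundary, which the RFC tolerates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lines of the returned parts are re-joined with LF line endings.
std::vector<std::string> splitMultipart(const std::string &body, const std::string &boundary)
{
    const std::string delimiter = "--" + boundary;
    const std::string closeDelimiter = delimiter + "--";
    std::vector<std::string> parts;
    std::string current;
    bool inPart = false; // false while still in the preamble
    std::size_t pos = 0;
    while (pos < body.size()) {
        std::size_t eol = body.find('\n', pos);
        if (eol == std::string::npos) {
            eol = body.size();
        }
        std::string line = body.substr(pos, eol - pos);
        pos = eol + 1;
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (line == delimiter || line == closeDelimiter) {
            if (inPart) {
                // The line break before a delimiter belongs to the delimiter.
                if (!current.empty() && current.back() == '\n') {
                    current.pop_back();
                }
                parts.push_back(current);
            }
            current.clear();
            if (line == closeDelimiter) {
                break; // everything after this is the epilogue
            }
            inPart = true;
            continue;
        }
        if (inPart) {
            current += line;
            current += '\n';
        }
    }
    return parts;
}
```

Each returned part can then be split into MIME header and MIME body at its first empty line, just like a
complete message.
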
A MIME part begins right after the boundary. It consists of a <b>MIME header</b> and a <b>MIME body</b>, which
are separated by an empty line. The MIME header should not be confused with the message header. The
message header contains metadata about the whole message, like subject and date, while the MIME header only
contains metadata about the specific MIME part, like its content type. MIME header
field names always start with \c "Content-".
The example above shows the three most important MIME header fields, which are usually the only ones
used. The top-level header of a message actually mixes message metadata and MIME metadata in one header. In this
example, the header contains the \c Date header field, which is an ordinary header field, and
the \c Content-Type header field, which is a MIME header field.

MIME parts can be nested, and therefore form a tree. The above example has the following tree:
\verbatim
multipart/mixed
  |- text/plain
  \- image/jpeg
\endverbatim
The \c text/plain node is therefore a \b child of the \c multipart/mixed node. The \c multipart/mixed node
is the \b parent of the other two nodes. The \c image/jpeg node is a \b sibling of the \c text/plain node.
\c Multipart nodes are the only nodes that have children; all other nodes are \b leaf nodes.
The body of a multipart node consists of all its complete child nodes (MIME header and MIME body), separated
by the boundary.

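This tree terminology maps naturally onto a recursive data structure. The following hypothetical model is for
illustration only; KMime's own classes are described in the
\ref classes-overview "overview of the most important KMime classes". Each node owns its children and keeps a
pointer to its parent. \c printTree renders a tree in a style similar to the diagram above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type: multipart nodes have children, all other nodes
// are leaves.
struct MimeNode {
    std::string contentType;
    MimeNode *parent = nullptr;
    std::vector<std::unique_ptr<MimeNode>> children;

    MimeNode *addChild(const std::string &type)
    {
        children.push_back(std::make_unique<MimeNode>());
        MimeNode *child = children.back().get();
        child->contentType = type;
        child->parent = this;
        return child;
    }

    bool isLeaf() const { return children.empty(); }
};

// Appends a text rendering of the subtree rooted at node to out.
void printTree(const MimeNode &node, std::string &out, const std::string &indent = "")
{
    out += node.contentType + "\n";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const bool last = (i + 1 == node.children.size());
        out += indent + (last ? "  \\- " : "  |- ");
        printTree(*node.children[i], out, indent + (last ? "     " : "  |  "));
    }
}
```
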
Each MIME part can have a different content transfer encoding. In the above example, the text part has
a \c 7bit CTE, while the image part has a \c base64 CTE. The multipart/mixed node does not specify
a CTE, and it does not need one. RFC 2045 does not allow multipart nodes to be encoded with quoted-printable
or base64. Here, the body of the multipart node consists only of bytes in the 7-bit range: the
boundary is 7-bit, the MIME headers are 7-bit, and the MIME bodies are already encoded with the CTE of the
child MIME part, so they are 7-bit as well. The multipart node is therefore implicitly \c 7bit, and no CTE
header field is necessary for it.

The MIME part for the image does not specify a charset parameter in its content type header field. This
is because the body of that MIME part is not interpreted as a text string, so the byte array
does not need to be decoded to a string. Instead, an image renderer interprets the byte array as an image.
After undoing the base64 CTE, the message viewer application passes the MIME part body as a byte array to the image renderer.
The content type consists of a <b>media type</b> and a <b>subtype</b>. For example, the content type
\c "text/html" has the media type "text" and the subtype "html". Only nodes with the media type "text"
need to specify a charset, as they are the only nodes whose body is interpreted as a text string.

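Splitting a \c Content-Type header field body into media type, subtype and parameters can be sketched as below.
The names are hypothetical and the parser is simplified. It lower-cases the media type, subtype and parameter
names, which are case-insensitive (this is why the examples can write Text/Plain). It also strips quotes from
parameter values and applies the defaults of text/plain with the US-ASCII charset. Quoted values containing
semicolons and RFC 2231 parameter encoding are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

struct ContentType {
    // Defaults that apply when a message has no Content-Type header field.
    std::string mediaType = "text";
    std::string subType = "plain";
    std::map<std::string, std::string> params{{"charset", "us-ascii"}};
};

static std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

static std::string trimmed(const std::string &s)
{
    const std::size_t first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) {
        return std::string();
    }
    const std::size_t last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

ContentType parseContentType(const std::string &value)
{
    ContentType ct;
    ct.params.clear(); // an explicit header field replaces the defaults
    std::size_t start = 0;
    bool first = true;
    while (start <= value.size()) {
        std::size_t end = value.find(';', start);
        if (end == std::string::npos) {
            end = value.size();
        }
        const std::string item = trimmed(value.substr(start, end - start));
        if (first) {
            // "type/subtype" comes before the first ';'
            const std::size_t slash = item.find('/');
            ct.mediaType = toLower(trimmed(item.substr(0, slash)));
            ct.subType = (slash == std::string::npos)
                             ? std::string()
                             : toLower(trimmed(item.substr(slash + 1)));
            first = false;
        } else {
            // name=value, with optional double quotes around the value
            const std::size_t eq = item.find('=');
            if (eq != std::string::npos) {
                std::string v = trimmed(item.substr(eq + 1));
                if (v.size() >= 2 && v.front() == '"' && v.back() == '"') {
                    v = v.substr(1, v.size() - 2);
                }
                ct.params[toLower(trimmed(item.substr(0, eq)))] = v;
            }
        }
        start = end + 1;
    }
    if (ct.mediaType == "text" && ct.subType == "plain" && ct.params.count("charset") == 0) {
        ct.params["charset"] = "us-ascii"; // default charset for text/plain
    }
    return ct;
}
```
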
The only header field not yet encountered in previous sections is the \b Content-Disposition header field,
which is defined in a <a href="http://tools.ietf.org/html/rfc2183">separate RFC</a>. It describes how
the message viewer application should display the MIME part. In the case of the image part, the part should
be presented as an attachment. The \b filename parameter tells the message viewer application which filename
to use by default when the user saves the attachment to disk.

The content type header field for the image MIME part has a \b name parameter, which is similar to the
|
|
\c filename parameter of the \c Content-Disposition header field. The difference is that \c name refers
|
|
to the name of the complete MIME part, whereas \c filename refers to the name of the attachment. The
|
|
\c name paramter of the \c Content-Type header field in this case is superfluous and only exists for
|
|
backwards compatibility, and can be ignored;
|
|
the \c filename parameter of the \c Content-Disposition header field should be prefered when it is present.
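
The resulting rule for choosing the default file name can be sketched in standard C++ as follows. The \c Params type and \c attachmentFileName() are hypothetical. They assume the header parameters have already been parsed, which in practice also involves unquoting and the parameter continuations and charset encodings of RFC 2231.

```cpp
#include <map>
#include <string>

// Hypothetical: already parsed and unquoted header parameters, keyed by
// lowercase parameter name.
using Params = std::map<std::string, std::string>;

// Default file name for saving an attachment: the filename parameter of
// Content-Disposition if present, otherwise the legacy name parameter of
// Content-Type, otherwise an empty string.
std::string attachmentFileName(const Params &disposition, const Params &contentType)
{
    if (auto it = disposition.find("filename"); it != disposition.end() && !it->second.empty()) {
        return it->second;
    }
    if (auto it = contentType.find("name"); it != contentType.end()) {
        return it->second;
    }
    return std::string();
}
```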

\verbatim
From: Thomas McGuire <thomas@domain.com>
To: sebastian@domain.com
Subject: Help with SPARQL
Date: Sun, 28 Feb 2010 21:57:51 +0100
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_PjtiLU2PvHpvp/R"

--Boundary-00=_PjtiLU2PvHpvp/R
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Sebastian,

I have a problem with a SPARQL query, can you help me debug this? Attached is
the query and a screenshot showing the result.

--Boundary-00=_PjtiLU2PvHpvp/R
Content-Type: text/plain;
  charset="UTF-8";
  name="query.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
  filename="query.txt"

prefix nco:<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>

SELECT ?person
WHERE {
  ?person a nco:PersonContact .
  ?person nco:birthDate ?birthDate .
}
--Boundary-00=_PjtiLU2PvHpvp/R
Content-Type: image/png;
  name="screenshot.png"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
  filename="screenshot.png"

AAAAyAAAAAEBBAABAAAAyAAAAA0BAgATAAAAcQAAABIBAwABAAAAAQAAADEBAgAPAAAAhAAAAGmH
[SNIP]
YXJlLmpwZWcAZGlnaUthbS0w

--Boundary-00=_PjtiLU2PvHpvp/R--
\endverbatim
The above example message consists of three leaf MIME parts: the main text part and two attachments.
One attachment has the media type \c text, therefore a charset parameter is necessary to correctly
display it. The MIME tree looks like this:
\verbatim
multipart/mixed
|- text/plain
|- text/plain
\- image/png
\endverbatim

\subsubsection multipart-alternative HTML Messages

\verbatim
From: Thomas McGuire <thomas@domain.com>
Subject: HTML test
Date: Thu, 4 Mar 2010 13:59:18 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative;
  boundary="Boundary-01=_m66jLd2/vZrH5oe"
Content-Transfer-Encoding: 7bit

--Boundary-01=_m66jLd2/vZrH5oe
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello World

--Boundary-01=_m66jLd2/vZrH5oe
Content-Type: text/html;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head></head>
<body>
Hello <b>World</b>
</body>
</html>
--Boundary-01=_m66jLd2/vZrH5oe--
\endverbatim

The above example is a simple HTML message. It consists of a plain text part and an HTML part, which are
in a \b multipart/alternative container. The message has the following structure:
\verbatim
multipart/alternative
|- text/plain
\- text/html
\endverbatim
The HTML part and the plain text part have identical content, except that the HTML part contains
additional markup, in this case for displaying the word \c World in bold. Since those parts are in a
multipart/alternative container, the message viewer application can freely choose which part it displays.
Some users might prefer reading the message in HTML format, others might prefer reading the message
in plain text format.
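
How a viewer might make that choice is sketched below in standard C++. RFC 2046 orders the children of a multipart/alternative node from the least to the most faithful representation, which is why \c text/plain comes before \c text/html. Without an explicit user preference, the last alternative the viewer can display is therefore the natural pick. The function \c chooseAlternative() and its parameters are hypothetical and assume lowercased content types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

static bool contains(const std::vector<std::string> &list, const std::string &value)
{
    for (const std::string &entry : list) {
        if (entry == value) {
            return true;
        }
    }
    return false;
}

// Picks the index of the child of a multipart/alternative node to display.
// childTypes: lowercased content types of the children, in message order.
// displayable: content types the viewer can render.
// preferred: content type the user prefers, or an empty string.
// Returns -1 if no child can be displayed.
int chooseAlternative(const std::vector<std::string> &childTypes,
                      const std::vector<std::string> &displayable,
                      const std::string &preferred)
{
    // An explicit, displayable user preference wins.
    if (!preferred.empty() && contains(displayable, preferred)) {
        for (std::size_t i = 0; i < childTypes.size(); ++i) {
            if (childTypes[i] == preferred) {
                return static_cast<int>(i);
            }
        }
    }
    // Otherwise take the last (most faithful) alternative that can be displayed.
    for (std::size_t i = childTypes.size(); i-- > 0;) {
        if (contains(displayable, childTypes[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A user who prefers plain text passes \c text/plain as the preference. A viewer without an HTML engine simply leaves \c text/html out of its displayable types and automatically falls back to the plain text part.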

Of course, an HTML message could also consist only of a single \c text/html part, without the multipart/alternative
container and therefore without an alternative plain text part. However, people preferring the plain
text version would not like this, especially if their mail client has no HTML engine and they would only see
the HTML source including all tags. Therefore, HTML messages should always include an alternative plain text part.

HTML messages can of course also contain attachments. In this case, the message contains both a
multipart/alternative and a multipart/mixed node, for example with the following structure for an HTML
message that has an image attachment:
\verbatim
multipart/mixed
|- multipart/alternative
| |- text/plain
| \- text/html
\- image/png
\endverbatim

The message itself would look like this:
\verbatim
From: Thomas McGuire <thomas@domain.com>
Subject: HTML message with an attachment
Date: Thu, 4 Mar 2010 15:20:26 +0100
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_qG8jLwWCwkUfJV1"

--Boundary-00=_qG8jLwWCwkUfJV1
Content-Type: multipart/alternative;
  boundary="Boundary-01=_qG8jLfs1FRmlOhl"
Content-Transfer-Encoding: 7bit

--Boundary-01=_qG8jLfs1FRmlOhl
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello World

--Boundary-01=_qG8jLfs1FRmlOhl
Content-Type: text/html;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head></head>
<body>
Hello <b>World</b>
</body>
</html>
--Boundary-01=_qG8jLfs1FRmlOhl--

--Boundary-00=_qG8jLwWCwkUfJV1
Content-Type: image/png;
  name="test.png"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
  filename="test.png"

iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAACXBIWXMAAA8SAAAPEgEhm/IzAAAC
[SNIP]
eFkXsFgBMG4fJhYlx+iyB3cLpNZwYr/iP7teTwNYa7DZAAAAAElFTkSuQmCC

--Boundary-00=_qG8jLwWCwkUfJV1--

\endverbatim

\subsubsection multipart-related HTML Messages with Inline Images

HTML has support for showing images, with the \c img tag. Such an image is shown at the place where
the \c img tag occurs, which is called an <b>inline image</b>. Note that inline images are different
from images that are just normal attachments: normal attachments are usually shown at the beginning or
at the end of the message, while inline images are shown in place. In HTML, the \c img tag points to an
image file that is either a file on disk or a URL of an image on the Internet. To make inline images
work with MIME messages, a different mechanism is needed, since the image is not a file on disk or on
the Internet, but a MIME part somewhere in the same message. As specified in
<a href="http://tools.ietf.org/html/rfc2557">RFC 2557</a>, this is done by referring
to a \b Content-ID in the \c img tag, and marking the MIME part that contains the image with that content
ID as well.

An example will probably be clearer than this explanation:
\verbatim
From: Thomas McGuire <thomas@domain.com>
Subject: Inline Image Test
Date: Thu, 4 Mar 2010 16:54:53 +0100
MIME-Version: 1.0
Content-Type: multipart/related;
  boundary="Boundary-02=_Nf9jLpJ2aGp5RQK"
Content-Transfer-Encoding: 7bit

--Boundary-02=_Nf9jLpJ2aGp5RQK
Content-Type: multipart/alternative;
  boundary="Boundary-01=_Nf9jLZ6aPhm3WrN"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

--Boundary-01=_Nf9jLZ6aPhm3WrN
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Text before image

Text after image

--Boundary-01=_Nf9jLZ6aPhm3WrN
Content-Type: text/html;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head></head>
<body>
Text before image<br>
<img src="cid:547730348@KDE" /><br>
Text after image
</body>
</html>
--Boundary-01=_Nf9jLZ6aPhm3WrN--

--Boundary-02=_Nf9jLpJ2aGp5RQK
Content-Type: image/png;
  name="test.png"
Content-Transfer-Encoding: base64
Content-Id: <547730348@KDE>

iVBORw0KGgoAAAANSUhEUgAAAMgAAADICAIAAAAiOjnJAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAg
[SNIP]
AABJRU5ErkJggg==
--Boundary-02=_Nf9jLpJ2aGp5RQK--
\endverbatim

The first thing you will probably notice in this example is that it has a \b multipart/related node with
the following structure:
\verbatim
multipart/related
|- multipart/alternative
| |- text/plain
| \- text/html
\- image/png
\endverbatim

When the HTML part has inline images, the HTML part (here together with its plain text alternative) and
its image parts have to be children of a multipart/related container, like in this example.
In this case, the \c img tag has the source \c cid:547730348@KDE, which is a placeholder that refers
to the Content-Id header of another part. The image part contains exactly that value, enclosed in angle
brackets, in its \c Content-Id header, and therefore a message viewer application can connect both.
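
The lookup a viewer performs can be sketched in standard C++. The \c Part type and \c resolveCid() are hypothetical. The sketch strips the \c cid: prefix from the \c img source and compares the rest with each part's \c Content-Id value after removing its angle brackets. It assumes the URL contains no percent-escapes. RFC 2392, which defines \c cid: URLs, allows them, so a complete implementation would decode them first and would also compare the scheme case-insensitively.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a MIME part: its raw Content-Id value and its body.
struct Part {
    std::string contentId; // e.g. "<547730348@KDE>"
    std::string body;
};

// Strips surrounding whitespace and one pair of angle brackets from a Content-Id value.
static std::string normalizeContentId(std::string id)
{
    const auto first = id.find_first_not_of(" \t");
    if (first == std::string::npos) {
        return std::string();
    }
    const auto last = id.find_last_not_of(" \t");
    id = id.substr(first, last - first + 1);
    if (id.size() >= 2 && id.front() == '<' && id.back() == '>') {
        id = id.substr(1, id.size() - 2);
    }
    return id;
}

// Resolves an img source such as "cid:547730348@KDE" to the part with the
// matching Content-Id. Returns nullptr for other URLs or if no part matches.
const Part *resolveCid(const std::string &src, const std::vector<Part> &parts)
{
    const std::string prefix = "cid:";
    if (src.compare(0, prefix.size(), prefix) != 0) {
        return nullptr;
    }
    const std::string wanted = src.substr(prefix.size());
    for (const Part &part : parts) {
        if (normalizeContentId(part.contentId) == wanted) {
            return &part;
        }
    }
    return nullptr;
}
```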

The plain text part cannot have inline images, which is why its text might seem a bit confusing.

HTML messages with inline images can of course also have attachments, in which case the message structure
becomes a mix of multipart/related, multipart/alternative and multipart/mixed. The following example
shows the structure of a message with two inline images and one \c .tar.gz attachment:
\verbatim
multipart/mixed
|- multipart/related
| |- multipart/alternative
| | |- text/plain
| | \- text/html
| |- image/png
| \- image/png
\- application/x-compressed-tar
\endverbatim

The structure of MIME messages can get arbitrarily complex; the above is just one relatively simple example.
Multipart nodes can be nested much deeper; there is no restriction on the nesting level.

\subsubsection encapsulated Encapsulated messages

Encapsulated messages are messages which are attachments to another message. The most common example
is a forwarded mail, like in this example:
\verbatim
From: Frank <frank@domain.com>
To: Bob <bob@domain.com>
Subject: Fwd: Blub
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_sX+jLVPkV1bLFdZ"

--Boundary-00=_sX+jLVPkV1bLFdZ
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Bob,

hereby I forward you an interesting message from Greg.

--Boundary-00=_sX+jLVPkV1bLFdZ
Content-Type: message/rfc822;
  name="forwarded message"
Content-Transfer-Encoding: 7bit
Content-Description: Forwarded Message
Content-Disposition: inline

From: Greg <greg@domain.com>
To: Frank <frank@domain.com>
Subject: Blub
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Bla Bla Bla

--Boundary-00=_sX+jLVPkV1bLFdZ--
\endverbatim

\verbatim
multipart/mixed
|- text/plain
\- message/rfc822
   \- text/plain
\endverbatim

The attached message is treated like any other attachment, and therefore the top-level content type
is multipart/mixed.
The most interesting part is the \c message/rfc822 MIME part. As usual, it has some MIME headers, like
\c Content-Type or \c Content-Disposition, followed by the MIME body. The MIME body in this case is
the attached message. Since it is a message, it consists of a header and a body itself.
Therefore, the \c message/rfc822 MIME part appears to have two headers; in reality, one is the normal
MIME header and the other is the message header of the encapsulated message. The message header and the message body
are both in the MIME body of the \c message/rfc822 MIME part.
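
Splitting the MIME body of a \c message/rfc822 part into the header and body of the encapsulated message works exactly like splitting any MIME entity: the header ends at the first empty line. Below is a minimal standard C++ sketch, assuming LF line endings as in KMime's in-memory representation. \c splitHeadAndBody() is a hypothetical helper, not KMime API.

```cpp
#include <string>
#include <utility>

// Splits a message (or any MIME entity) with LF line endings into its header
// block and its body. The header ends at the first empty line; the returned
// header keeps the LF of its last line, the empty line itself is dropped.
std::pair<std::string, std::string> splitHeadAndBody(const std::string &entity)
{
    if (!entity.empty() && entity[0] == '\n') {
        return {std::string(), entity.substr(1)}; // empty header
    }
    const auto pos = entity.find("\n\n");
    if (pos == std::string::npos) {
        return {entity, std::string()}; // header only, no body
    }
    return {entity.substr(0, pos + 1), entity.substr(pos + 2)};
}
```

Applied to the \c message/rfc822 part of the example above, a first call separates the MIME header from the MIME body. A second call on that MIME body then separates the header of Greg's message from its body.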

\subsubsection crypto Signed and Encrypted Messages

MIME messages can be cryptographically signed and/or encrypted. The format for those messages is
defined in <a href="http://tools.ietf.org/html/rfc1847">RFC 1847</a>, which specifies two new
multipart subtypes, \b multipart/signed and \b multipart/encrypted. The crypto format of these new
security multiparts is defined in additional RFCs; the most common formats are
<a href="http://tools.ietf.org/html/rfc3156">OpenPGP</a> and
<a href="http://tools.ietf.org/html/rfc2633">S/MIME</a>. Both formats use the principle of
<a href="http://en.wikipedia.org/wiki/Public-key_cryptography">public-key cryptography</a>. OpenPGP
uses \b keys, and S/MIME uses \b certificates. For easier text flow, only the term \c key will be used
for both keys and certificates in the text below.

Security multiparts only sign or encrypt a specific MIME part. The consequence is that the message headers
cannot be signed or encrypted. This also means that it is possible to sign or encrypt only some of
the MIME parts of a message, while leaving other MIME parts unsigned or unencrypted. Furthermore, it
is possible to sign or encrypt different MIME parts with different crypto formats. As you can see,
security multiparts are very flexible.

Security multiparts are not supported by KMime. However, it is possible for applications to use KMime
when providing support for crypto messages. For example, the
<a href="http://api.kde.org/4.x-api/kdepim-apidocs/messageviewer/html/index.html">messageviewer</a>
component in KDEPIM supports signed and encrypted MIME parts, and the
<a href="http://websvn.kde.org/trunk/KDE/kdepim/messagecomposer/">messagecomposer</a> library can create
such messages.

Signed MIME parts are signed with the private key of the sender; everybody who has the
public key of the sender can verify the signature. Encrypted MIME parts are encrypted with the public
key of the receiver, and only the receiver, who is the sole person possessing the matching private key, can decrypt
them. Sending an encrypted message to multiple recipients does not require sending it multiple times, though:
the content is encrypted only once with a random session key, and that session key is then encrypted
separately with the public key of each recipient, so that every recipient can decrypt the same message.

\par Signed MIME parts

A multipart/signed MIME part has exactly two children: The first child is the content that is signed,
and the second child is the signature.

\verbatim
From: Thomas McGuire <thomas@domain.com>
Subject: My Subject
Date: Mon, 15 Mar 2010 12:20:16 +0100
MIME-Version: 1.0
Content-Type: multipart/signed;
  boundary="nextPart2567247.O5e8xBmMpa";
  protocol="application/pgp-signature";
  micalg=pgp-sha1
Content-Transfer-Encoding: 7bit

--nextPart2567247.O5e8xBmMpa
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit

Simple message

--nextPart2567247.O5e8xBmMpa
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEABECAAYFAkueF/UACgkQKglv3sO8a1MdTACgnBEP6ZUal931Vwu7PyiXT1bn
Zr0Anj4bAI9JhHEDiwA/iwrWGfSC+Nlz
=d2ol
-----END PGP SIGNATURE-----
--nextPart2567247.O5e8xBmMpa--
\endverbatim

\verbatim
multipart/signed
|- text/plain
\- application/pgp-signature
\endverbatim
The example here uses the OpenPGP format to sign a simple plain text message. Here, the text/plain
MIME part is signed, and the application/pgp-signature MIME part contains the signature data, which in
this case is ASCII-armored.

As said above, it is possible to sign only some MIME parts. A message which has an image/jpeg attachment
that is signed, while the main text part is not signed, has the following MIME structure:
\verbatim
multipart/mixed
|- text/plain
\- multipart/signed
   |- image/jpeg
   \- application/pgp-signature
\endverbatim

It is possible to sign multipart parts as well. Consider the above example that has a plain text part
and an image attachment. Those two parts can be signed together, with the following structure:
\verbatim
multipart/signed
|- multipart/mixed
| |- text/plain
| \- image/jpeg
\- application/pgp-signature
\endverbatim

Signed messages in the S/MIME format use a different content type for the signature data, like here:
\verbatim
multipart/signed
|- text/plain
\- application/x-pkcs7-signature
\endverbatim
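
The structural rules for \c multipart/signed can be checked without any cryptography, as in the following standard C++ sketch. RFC 1847 requires exactly two children, and it requires the \c protocol parameter to match the content type of the second child, the signature, for OpenPGP and S/MIME alike. \c SignedNode and \c isWellFormedSigned() are hypothetical and assume lowercased content types. Actually verifying the signature requires a crypto backend, which KMime does not provide.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a multipart/signed node: the value of its
// protocol parameter and the lowercased content types of its children.
struct SignedNode {
    std::string protocol;
    std::vector<std::string> childTypes;
};

// Checks the RFC 1847 structure only: exactly two children, and the content
// type of the second child (the signature) equals the protocol parameter.
// Whether the signature itself is valid cannot be decided here.
bool isWellFormedSigned(const SignedNode &node)
{
    return node.childTypes.size() == 2
        && !node.protocol.empty()
        && node.childTypes[1] == node.protocol;
}
```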

\par Encrypted MIME parts

Multipart/encrypted MIME parts also have exactly two children: The first child contains metadata about
the encrypted data, such as a version number. The second child then contains the actual encrypted data.

\verbatim
From: someone@domain.com
To: Thomas McGuire <thomas@domain.com>
Subject: Encrypted message
Date: Mon, 15 Mar 2010 12:50:16 +0100
MIME-Version: 1.0
Content-Type: multipart/encrypted;
  boundary="nextPart2726747.j47xUGTWKg";
  protocol="application/pgp-encrypted"
Content-Transfer-Encoding: 7bit

--nextPart2726747.j47xUGTWKg
Content-Type: application/pgp-encrypted
Content-Disposition: attachment

Version: 1
--nextPart2726747.j47xUGTWKg
Content-Type: application/octet-stream
Content-Disposition: inline; filename="msg.asc"

-----BEGIN PGP MESSAGE-----
Version: GnuPG v2.0.14 (GNU/Linux)

hQIOA8p5rdC5CBNfEAf+NZVzVq48C1r5opOOiWV96+FUzIWuMQ6u8fzFgI7YVyCn
[SNIP]
=reNr
-----END PGP MESSAGE-----
--nextPart2726747.j47xUGTWKg--
\endverbatim

\verbatim
multipart/encrypted
|- application/pgp-encrypted
\- application/octet-stream
\endverbatim

The encrypted data is contained in the \c application/octet-stream MIME part. Without decrypting
the data, it is unknown what the original content type of the encrypted MIME data is! The encrypted
data could be a simple text/plain MIME part, an image attachment, or a multipart part. The encrypted
data contains both the MIME header and the MIME body of the original MIME part, as the header is needed
to know the content type of the data. The data could just as well be of content type multipart/signed, in
which case the message would be both signed and encrypted.

\par Inline crypto formats

Although using the security multiparts \c multipart/signed and \c multipart/encrypted is the recommended
standard, there are other possibilities to sign or encrypt a message. The most common methods are
<b>Inline OpenPGP</b> and <b>S/MIME Opaque</b>.

For inline OpenPGP messages, the crypto data is contained inline in the actual MIME part. For example,
a message with a signed text/plain part might look like this:
\verbatim
From: someone@domain.com
To: someoneelse@domain.com
Subject: Inline OpenPGP test
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Inline OpenPGP signed example.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAkueJ2EACgkQKglv3sO8a1MS3QCfcsYnJG7uYQxzxz6J5cPF7lHz
WIoAn3PjVPlWibu02dfdFObwd2eJ1jAW
=p3uO
-----END PGP SIGNATURE-----

\endverbatim
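
Since inline OpenPGP data is not announced by the content type, a viewer has to look into the text itself. The following standard C++ sketch looks for the OpenPGP armor header lines that RFC 4880 defines for clear-signed and for encrypted messages, at the start of any line. \c detectInlinePgp() is a hypothetical heuristic; it neither validates the armor nor verifies or decrypts anything.

```cpp
#include <cstddef>
#include <string>

enum class InlinePgp { None, Signed, Encrypted };

// Heuristically classifies a text body by looking for the OpenPGP armor header
// lines of a clear-signed or an encrypted message at the start of any line.
// Nothing is validated or verified; quoted armor ("> -----BEGIN ...") is not detected.
InlinePgp detectInlinePgp(const std::string &body)
{
    std::size_t start = 0;
    while (start < body.size()) {
        std::size_t end = body.find('\n', start);
        if (end == std::string::npos) {
            end = body.size();
        }
        const std::string line = body.substr(start, end - start);
        if (line.rfind("-----BEGIN PGP SIGNED MESSAGE-----", 0) == 0) {
            return InlinePgp::Signed;
        }
        if (line.rfind("-----BEGIN PGP MESSAGE-----", 0) == 0) {
            return InlinePgp::Encrypted;
        }
        start = end + 1;
    }
    return InlinePgp::None;
}
```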

Encrypted inline OpenPGP works in a similar way. Opaque S/MIME messages are also similar: For signed
MIME parts, both the signature and the signed data are contained in a single MIME part with a content
type of \c application/pkcs7-mime.

As security multiparts are preferred over inline OpenPGP and over opaque S/MIME, I won't go into more
detail here.

\subsubsection misc Miscellaneous Points about Messages

\par Line Breaks

Each line in a MIME message has to end with a \b CRLF, which is a carriage return followed by a
line feed, i.e. the escape sequence <code>\\r\\n</code>. CR and LF must not appear anywhere else in
a MIME message. Special care needs to be taken with encoded line breaks in binary data, and with
distinguishing soft and hard line breaks when converting between different content transfer encodings.
For more details, have a look at the RFCs.

While the official format is to have a CRLF at the end of each line, KMime only expects a single LF
for its in-memory storage. Therefore, when loading a message from disk or from a server into KMime, the CRLFs need
to be converted to LFs first, for example with KMime::CRLFtoLF(). The opposite needs to be done when
storing a KMime message somewhere.
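
Both conversions are simple, as the following standard C++ sketch on \c std::string shows. \c crlfToLf() and \c lfToCrlf() are hypothetical stand-ins for illustration only; within KMime, use \c KMime::CRLFtoLF() on the \c QByteArray instead. Note that \c lfToCrlf() assumes its input uses LF line endings only.

```cpp
#include <cstddef>
#include <string>

// CRLF -> LF, e.g. after reading a message from disk or from a server.
// A CR that is not followed by LF is kept.
std::string crlfToLf(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            continue; // drop the CR of a CRLF pair, the LF is copied next
        }
        out += in[i];
    }
    return out;
}

// LF -> CRLF, e.g. before storing or sending a message.
// Assumes the input uses LF line endings only.
std::string lfToCrlf(const std::string &in)
{
    std::string out;
    out.reserve(in.size() + in.size() / 32);
    for (char c : in) {
        if (c == '\n') {
            out += '\r';
        }
        out += c;
    }
    return out;
}
```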

Lines must not be longer than 998 characters and should not be longer than 78 characters,
in both cases excluding the CRLF.

\par Header Folding and CFWS

Header fields can span multiple lines, which was already shown in some of the examples above where
the parameters of the header field value were in the next line. The header field is said to be
\b folded in this case. In general, a header field can be folded wherever whitespace (\b WS) occurs,
by inserting a line break before that whitespace, so that each continuation line starts with whitespace.
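
The reverse operation, called unfolding, is done before a header field is parsed: RFC 5322 simply removes every line break that is immediately followed by whitespace. Below is a standard C++ sketch for KMime's in-memory LF line endings. \c unfold() is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

// Unfolds a header field with LF line endings: every LF that is immediately
// followed by a space or tab is removed; the whitespace itself is kept.
std::string unfold(const std::string &field)
{
    std::string out;
    out.reserve(field.size());
    for (std::size_t i = 0; i < field.size(); ++i) {
        const bool foldingBreak = field[i] == '\n' && i + 1 < field.size()
            && (field[i + 1] == ' ' || field[i + 1] == '\t');
        if (!foldingBreak) {
            out += field[i];
        }
    }
    return out;
}
```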

Header field values can contain \b comments; these comments are semantically invisible and have no
meaning. Comments are surrounded by parentheses.
\verbatim
Date: Thu, 13
        Feb 1969 23:32 -0330 (Newfoundland Time)
\endverbatim

This example shows a folded header that also has a comment (<i>Newfoundland Time</i>). The date header is a structured header
field, and therefore it has to obey a defined syntax; however, adding comments and whitespace is
allowed almost anywhere, and they are ignored when parsing the message. Comments and whitespace where
folding is allowed are together referred to as \b CFWS. Any occurrence of CFWS is semantically regarded
as a single space.
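
Stripping comments and collapsing CFWS takes a small state machine, as the following standard C++ sketch shows. Comments can be nested and can contain backslash-escaped characters, and parentheses inside quoted strings do not start a comment. \c stripCfws() is a hypothetical helper that only illustrates the idea; it is not how KMime's header parsing is implemented.

```cpp
#include <cstddef>
#include <string>

// Removes RFC 5322 comments from a structured header field value and replaces
// each run of comments and whitespace (CFWS) with a single space. Leading and
// trailing CFWS are dropped. Quoted strings are copied unchanged.
std::string stripCfws(const std::string &value)
{
    std::string out;
    int depth = 0;             // comment nesting level
    bool inQuotes = false;     // inside a quoted-string
    bool pendingSpace = false; // CFWS seen since the last output character
    for (std::size_t i = 0; i < value.size(); ++i) {
        const char c = value[i];
        if (inQuotes) {
            out += c;
            if (c == '\\' && i + 1 < value.size()) {
                out += value[++i]; // keep a quoted-pair as it is
            } else if (c == '"') {
                inQuotes = false;
            }
        } else if (depth > 0) {
            if (c == '\\') {
                ++i; // skip the escaped character inside a comment
            } else if (c == '(') {
                ++depth;
            } else if (c == ')') {
                --depth;
            }
        } else if (c == '(') {
            depth = 1;
            pendingSpace = true;
        } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            pendingSpace = true;
        } else {
            if (pendingSpace && !out.empty()) {
                out += ' ';
            }
            pendingSpace = false;
            out += c;
            if (c == '"') {
                inQuotes = true;
            }
        }
    }
    return out;
}
```

Applied to the folded \c Date value above, it yields the same date with single spaces and without the comment, which also shows why a parsed and reassembled header no longer contains its comments.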

\section string-broken-down The two in-memory representations of messages

There are two representations of messages in memory. The first is called <b>string representation</b>
and the other one is called <b>broken-down representation</b>.

The name string representation is somewhat misleading;
a better term would be <c>byte array representation</c>. The string representation is just a big array of
bytes in memory, and those bytes make up the encoded mail. The string representation is what is stored
on disk or what is received from an IMAP server, for example.

With the broken-down representation, the mail is <i>broken down</i> into smaller structures. For example,
instead of having a single byte array for all headers, the broken-down structure has a list of individual headers,
and each header in that list is again broken down into a structure. While the string representation
is just an array of 7 bit characters that might be encoded, the broken-down representation contains the
decoded text strings.

As an example, consider the byte array
\verbatim
"Hugo Maier" <hugo.maier@mailer.domain>
\endverbatim

Although this is just a bunch of 7 bit characters, a human immediately recognizes the broken-down structure and
sees that the display name is "Hugo Maier" and that the local part of the email address is "hugo.maier".
To illustrate, the broken-down structure could be stored in a structure like this:
\verbatim
struct Mailbox
{
  QString displayName;
  QByteArray addressSpec;
};
\endverbatim
The address spec actually could be broken down further into a local part and a domain.
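
A deliberately naive sketch of that further breakdown in standard C++ follows. \c AddrSpec and \c splitAddrSpec() are hypothetical. Splitting at the last at-sign ignores quoted local parts, comments and other details of the RFC 5322 grammar, which is exactly why real code should rely on KMime's header parsing, as explained below.

```cpp
#include <string>

// Hypothetical further breakdown of the addressSpec member of Mailbox.
struct AddrSpec {
    std::string localPart; // e.g. "hugo.maier"
    std::string domain;    // e.g. "mailer.domain"
};

// Naively splits an addr-spec at its last '@'. Quoted local parts, comments
// and domain literals are not handled, so this is no substitute for a parser.
bool splitAddrSpec(const std::string &addrSpec, AddrSpec &result)
{
    const auto at = addrSpec.rfind('@');
    if (at == std::string::npos || at == 0 || at + 1 == addrSpec.size()) {
        return false; // missing '@', empty local part or empty domain
    }
    result.localPart = addrSpec.substr(0, at);
    result.domain = addrSpec.substr(at + 1);
    return true;
}
```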

The process of converting the string representation to a broken-down representation is called \b parsing, and
the reverse is called \b assembling.
Parsing a message is necessary when wanting to access or modify the broken-down structure. For example, when sending a mail,
the address spec of a mailbox needs to be passed to the SMTP server, which means that the recipient headers need to
be parsed in order to access that information. Another example is the message list in a mail application, where the
broken-down structure of a mail is needed
to display information like subject, sender and date in the list.
On the other hand, assembling a message is done, for example, in the composer of a mail application, where the mail information
is available in a broken-down form in the composer window, and is then assembled into a final MIME message that is then sent with SMTP.

Parsing is often quite tricky, so you should always use the methods from KMime instead of writing parsing
routines yourself. Even the simple mailbox example above is difficult to parse in practice, as many things like comments
and escaped characters need to be taken into consideration.
The same is true for assembling: In the above case, one could be tempted to assemble the mailbox by simply
writing code like this:
\verbatim
QByteArray stringRepresentation = '"' + displayName.toUtf8() + "\" <" + addressSpec + ">";
\endverbatim
However, just like with parsing, you shouldn't be doing assembling yourself. In the above case, for example,
the display name might contain double quotes or backslashes, which would need to be escaped, or non-ASCII
characters, for which RFC 2047 encoding would need to be applied. So use KMime for assembling in all cases.
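
To show what assembling involves even in this small case, here is a standard C++ sketch of the two rules just mentioned. An ASCII display name becomes a quoted string with double quotes and backslashes escaped. A display name with non-ASCII characters, assumed to be UTF-8, becomes an RFC 2047 encoded-word using the B (base64) encoding. \c assembleMailbox() and \c base64Encode() are hypothetical helpers, not KMime's implementation. Among other things, a real encoder also splits encoded-words longer than the 75 characters RFC 2047 allows and omits the quotes when they are not needed.

```cpp
#include <cstddef>
#include <string>

static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with '=' padding.
std::string base64Encode(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3) { // full 3-byte groups
        const unsigned n = (static_cast<unsigned char>(in[i]) << 16)
                         | (static_cast<unsigned char>(in[i + 1]) << 8)
                         | static_cast<unsigned char>(in[i + 2]);
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += kBase64Alphabet[(n >> 6) & 63];
        out += kBase64Alphabet[n & 63];
    }
    if (i < in.size()) { // 1 or 2 remaining bytes
        unsigned n = static_cast<unsigned char>(in[i]) << 16;
        if (i + 1 < in.size()) {
            n |= static_cast<unsigned char>(in[i + 1]) << 8;
        }
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += (i + 1 < in.size()) ? kBase64Alphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Assembles a mailbox from a display name and an addr-spec. An ASCII name
// becomes a quoted-string with '"' and '\' escaped; a name with non-ASCII
// bytes (assumed UTF-8) becomes one RFC 2047 "B" encoded-word.
std::string assembleMailbox(const std::string &displayName, const std::string &addrSpec)
{
    bool ascii = true;
    for (unsigned char c : displayName) {
        if (c > 127) {
            ascii = false;
        }
    }
    std::string name;
    if (ascii) {
        name += '"';
        for (char c : displayName) {
            if (c == '"' || c == '\\') {
                name += '\\'; // escape as a quoted-pair
            }
            name += c;
        }
        name += '"';
    } else {
        name = "=?UTF-8?B?" + base64Encode(displayName) + "?=";
    }
    return name + " <" + addrSpec + ">";
}
```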

When parsing a message and assembling it afterwards, the result might not be the same as the original byte
array. For example, comments in header fields are ignored during parsing and not stored in the broken-down
structure, therefore the assembled message will also not contain comments.

Messages in memory are usually stored in a broken-down structure so that it is easy to access and
manipulate the message. On disk and on servers, messages are stored in string representation.

\section classes-overview Overview of KMime classes

KMime has basically two sets of classes: Classes for headers and classes for MIME
parts. A MIME part is represented by \c KMime::Content. A Content can be parsed from a string representation
and also be assembled from the broken-down representation again. If parsed, it has a list of sub-contents (in case of multipart contents) and a
list of headers. If the Content is not parsed, it stores the headers and the body in a byte array, which can be accessed
with head() and body().
There is also a class \c KMime::Message, which basically is a thin wrapper around Content for the top-level
MIME part. Message also contains convenience methods to access the message headers.

For headers, there is a class hierarchy, with \c KMime::Headers::Base as the base class, and
\c KMime::Headers::Generics::Structured and \c KMime::Headers::Generics::Unstructured in the next levels. Unstructured is
for headers that don't have a defined structure, like Subject, whereas Structured headers have a
specific structure, like Date. The header classes have methods to parse headers, like from7BitString(),
and to assemble them, like as7BitString(). Once a header is parsed, the classes provide access to the
broken-down structures; for example, the Date header has a method dateTime().
The parsing in from7BitString() is usually handled by a protected parse() function, which in turn calls
parsing functions for different types, like parseAddressList() or parseAddrSpec() from the \c KMime::HeaderParsing
namespace.

When modifying messages, the message is first parsed into a broken-down representation. This broken-down
representation can then be accessed and modified with the appropriate functions. After changing the broken-down
structure, it needs to be assembled again to get the modified string representation.

KMime also comes with some codecs for handling base64 and quoted-printable encoding, with \c KMime::Codec
as the base class.
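
To illustrate what such a codec does, here is a standard C++ sketch of quoted-printable decoding, including the soft line breaks mentioned in the section on line breaks. An equals sign followed by two hexadecimal digits stands for that byte, and an equals sign at the end of a line joins that line with the next one. \c decodeQuotedPrintable() is hypothetical and not part of KMime, whose codec classes should be used in real code. It works on LF line endings and keeps malformed sequences as they are.

```cpp
#include <cstddef>
#include <string>

// Value of a hexadecimal digit, or -1. Lowercase is accepted leniently,
// although RFC 2045 requires uppercase digits in encoded output.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes quoted-printable data with LF line endings: "=XX" becomes the byte
// 0xXX, "=" directly before a line break is a soft line break and is removed
// together with the line break, and everything else is copied unchanged.
// Malformed "=" sequences are kept literally; whitespace between "=" and a
// soft line break is not handled.
std::string decodeQuotedPrintable(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '=') {
            out += in[i];
            continue;
        }
        if (i + 1 < in.size() && in[i + 1] == '\n') {
            ++i; // soft line break: drop "=" and the LF
            continue;
        }
        const int high = i + 2 < in.size() ? hexValue(in[i + 1]) : -1;
        const int low = i + 2 < in.size() ? hexValue(in[i + 2]) : -1;
        if (high >= 0 && low >= 0) {
            out += static_cast<char>(high * 16 + low);
            i += 2;
        } else {
            out += '='; // malformed sequence, keep it literally
        }
    }
    return out;
}
```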

\section rfcs RFCs

\li <a href="http://tools.ietf.org/html/rfc5322">RFC 5322</a>: Internet Message Format
\li <a href="http://tools.ietf.org/html/rfc5536">RFC 5536</a>: Netnews Article Format
\li <a href="http://tools.ietf.org/html/rfc2045">RFC 2045</a>: Multipurpose Internet Mail Extensions (MIME), Part 1: Format of Internet Message Bodies
\li <a href="http://tools.ietf.org/html/rfc2046">RFC 2046</a>: Multipurpose Internet Mail Extensions (MIME), Part 2: Media Types
\li <a href="http://tools.ietf.org/html/rfc2047">RFC 2047</a>: Multipurpose Internet Mail Extensions (MIME), Part 3: Message Header Extensions for Non-ASCII Text
\li <a href="http://tools.ietf.org/html/rfc2048">RFC 2048</a>: Multipurpose Internet Mail Extensions (MIME), Part 4: Registration Procedures
\li <a href="http://tools.ietf.org/html/rfc2049">RFC 2049</a>: Multipurpose Internet Mail Extensions (MIME), Part 5: Conformance Criteria and Examples
\li <a href="http://tools.ietf.org/html/rfc2231">RFC 2231</a>: MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
\li <a href="http://tools.ietf.org/html/rfc2183">RFC 2183</a>: Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field
\li <a href="http://tools.ietf.org/html/rfc2557">RFC 2557</a>: MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
\li <a href="http://tools.ietf.org/html/rfc1847">RFC 1847</a>: Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
\li <a href="http://tools.ietf.org/html/rfc3851">RFC 3851</a>: S/MIME Version 3 Message Specification
\li <a href="http://tools.ietf.org/html/rfc3156">RFC 3156</a>: MIME Security with OpenPGP
\li <a href="http://tools.ietf.org/html/rfc2298">RFC 2298</a>: An Extensible Message Format for Message Disposition Notifications
\li <a href="http://tools.ietf.org/html/rfc2646">RFC 2646</a>: The Text/Plain Format Parameter (not supported by KMime)

\section links Further Reading
\li <a href="http://en.wikipedia.org/wiki/MIME">Wikipedia article on MIME</a>\n
\li <a href="http://www.joelonsoftware.com/articles/Unicode.html">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a>
\li <a href="http://www.cs.tut.fi/~jkorpela/chars.html">A tutorial on character code issues</a>
\li <a href="http://www.motobit.com/util/base64-decoder-encoder.asp">Online Base64 encoder and decoder</a>
\li <a href="http://www.motobit.com/util/quoted-printable-encoder.asp">Online quoted-printable encoder</a>
\li <a href="http://www.motobit.com/util/quoted-printable-decoder.asp">Online quoted-printable decoder</a>
\li <a href="http://www.motobit.com/util/charset-codepage-conversion.asp">Online charset converter</a>
\li <a href="http://en.wikipedia.org/wiki/Public-key_cryptography">Wikipedia article on public-key cryptography</a>

\authors

The major authors of this library are:
\li Christian Gebauer
\li Volker Krause \<vkrause@kde.org\>
\li Marc Mutz \<mutz@kde.org\>
\li Christian Thurner \<cthurner@freepage.de\>
\li Tom Albers \<tomalbers@kde.nl\>
\li Thomas McGuire \<mcguire@kde.org\>

This document was written by:\n
\li Thomas McGuire \<mcguire@kde.org\>

\maintainers
\li Volker Krause \<vkrause@kde.org\>
\li Marc Mutz \<mutz@kde.org\>

\licenses
\lgpl

*/

// DOXYGEN_PROJECTNAME=KMIME Library