Web Developer's Journal Archive Section

This article is out of date, but may still be useful to some readers.

A Technical Overview of MIME (Multipurpose Internet Mail Extensions)
Introduction

The current specifications for the format of e-mail and its transfer on the Internet were defined almost 13 years ago. As e-mail has become more widely used, and the range of activities it is used for has grown, so has the need to extend the capabilities of the basic e-mail message. In June 1992 Borenstein and Freed published RFC-1341, which defines Multipurpose Internet Mail Extensions, or MIME. MIME defines a standard method for extending the capabilities of the RFC 822 e-mail message without breaking delivery systems that are based only on the RFC 821 and 822 specifications. It also makes use of a number of extensions to e-mail that were added in the interim, such as the standard for message encapsulation and the Content-Type header field.

This document was updated by RFC-1521, and a draft of a further revision has been published. In addition to the MIME specification itself, a number of papers, both RFCs and drafts, discuss additional ways MIME can be used. This paper attempts to give an overview of MIME, considering the ideas in these additional sources as well as in the standard itself.

Baseline e-mail: RFC-821 and 822

Since MIME is an extension to a pre-existing standard, we need to look at that standard to understand what MIME does and how it works.

RFC 822 defines an e-mail message as having two parts, the header and the body. The header contains control information: fields which user agents and gateways may examine and use to accomplish their tasks. The body is only of interest to the end user, and is seldom if ever examined by agents or gateways. The only limitation on what the header and body may contain is imposed by RFC 821, which describes the Simple Mail Transfer Protocol (SMTP) used to transport e-mail between hosts. RFC 821 requires that all data transferred be 7-bit US-ASCII, divided into lines of no more than 1000 characters, including the terminating CR LF.
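The two RFC 821 restrictions just described are easy to check mechanically. The following is a minimal sketch in Python; the function name `smtp_safe` is our own, and the sketch assumes lines are terminated by CR LF as they are on the SMTP wire, counting that CR LF toward the 1000-character limit.

```python
MAX_LINE = 1000  # RFC 821: maximum text line length, including the CR LF


def smtp_safe(data: bytes) -> bool:
    """Return True if data could be sent over RFC 821 SMTP unchanged.

    Assumes lines end in CR LF; a final line without one is measured as
    if the CR LF were added on transmission.
    """
    # Every octet must be 7-bit US-ASCII (values 0-127).
    if any(octet > 127 for octet in data):
        return False
    # Every line, plus its two-octet CR LF, must fit within MAX_LINE.
    return all(len(line) + 2 <= MAX_LINE for line in data.split(b"\r\n"))


print(smtp_safe(b"Hello, world\r\n"))   # True
print(smtp_safe(b"caf\xe9\r\n"))        # False: 0xE9 is an 8-bit octet
print(smtp_safe(b"x" * 1200))           # False: the line is too long
```

Data that fails a check like this is exactly what MIME's transfer encodings exist to handle.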
These early RFCs, as well as more recent additions, define certain header fields which software based on those specifications may examine, but they allow additional fields to be included. If a program encounters a field it doesn't understand, it must ignore it. These attributes of the base e-mail standard are what allow MIME to add to e-mail's capabilities without breaking older software.

The Goals of MIME

The basic idea of MIME is to allow an e-mail message's body to contain data other than plain, human-readable US-ASCII text. The first step is to permit text in character sets other than US-ASCII, perhaps even text which uses special formatting such as enriched text or HTML.

The next step is to allow message bodies to contain any kind of data, including unformatted bitstreams. In itself this is simple, but MIME provides a systematic way of identifying the nature of this data so that the receiving user's software, referred to as a user agent, can decide what to do with it automatically. An example of this is launching an external program to play an audio message. The designers of MIME also wanted to be able to include multiple components, possibly of different types, in a single e-mail message.

To accomplish these goals, MIME provides two main things. One is a method to identify the data type of the message body. The other is a way to transport data which doesn't obey the restrictions of SMTP.

MIME Header Fields

The first of these requirements is handled by adding new header fields to those provided by earlier standards. The first is the MIME-Version header field, which informs mail processing agents which version of the MIME standard the message was prepared to comply with. The current version is 1.0, and the current draft does not change this.

The Content-Type header field is used to identify what is in the body of the message.
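As a modern illustration (not part of the original discussion), Python's standard `email` package can build a message carrying these fields. In this sketch the addresses and the `X-Fancy-Feature` field are made up; the unknown field is there to show that a message may carry fields that other software is simply expected to ignore. `EmailMessage.set_content` adds the `MIME-Version: 1.0` and `Content-Type` fields itself.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"          # hypothetical addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "MIME test"
msg["X-Fancy-Feature"] = "on"              # made-up field; other agents ignore it
msg.set_content("Hello, Bob.\n")           # adds MIME-Version and Content-Type

# Header fields, a blank line, then the body, as RFC 822 describes.
print(msg.as_string())
print(msg["MIME-Version"], msg.get_content_type(), msg.get_content_charset())
```

The library chooses `utf-8` as the default charset parameter; any other character set could be named instead.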
The MIME designers borrowed the Content-Type field from RFC 1049. Although RFC 1521 states that the syntax of the MIME and RFC 1049 Content-Type fields is "largely compatible", I'm skeptical that implementations will be compatible unless the software designer specifically takes both versions into consideration.

The MIME Content-Type field is divided into two parts, the type and subtype, separated by a slash, optionally followed by parameters that depend on the type/subtype. The types are intended to be very general categories, so that knowing the type may allow the user agent software to decide what to do with even an unknown subtype. The types defined by RFC-1521 are text, image, audio, video, application, message and multipart. Subtypes are supposed to provide a more specific description of the data, enough to determine which external software agent can handle it properly. This may be assisted by parameters, for example one indicating which version of a data format standard the data complies with. Another example of a useful parameter is the charset parameter of the type text/plain, which indicates which character set the text is in and so allows for character sets other than US-ASCII, as in Content-Type: text/plain; charset=ISO-8859-1.

We are still left with the problem of transferring data which uses 8-bit characters and/or contains lines longer than 1000 characters. This can be handled by representing non-compliant data in a format that meets these criteria, and translating it back to its original form upon receipt. MIME adds the Content-Transfer-Encoding field to indicate which encoding the data uses. The value of this field may indicate unencoded 7-bit, 8-bit or binary data, or one of two standard encoding schemes defined by MIME. The MIME standard discourages the use of other encoding schemes. The MIME schemes are quoted-printable and base64.
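The choice among these Content-Transfer-Encoding values can be sketched as follows. This is an illustrative heuristic, not something the MIME standard prescribes: the function names, the assumption that lines end in CR LF, and the one-in-four threshold for switching from quoted-printable to base64 are all our own. The `transport_8bit` and `transport_binary` flags are hypothetical stand-ins for whatever the sender knows about the path the message will take; over plain RFC 821 SMTP both are false.

```python
import base64
import quopri

MAX_LINE = 998  # RFC 821 allows 1000 characters per line, including CR LF


def _needs_qp_escape(octet: int) -> bool:
    # Octets quoted-printable must write as =XX: 8-bit values, DEL, "=",
    # and control characters other than tab, CR and LF.
    return octet > 126 or octet == ord("=") or (octet < 32 and octet not in (9, 10, 13))


def choose_cte(data: bytes, transport_8bit: bool = False,
               transport_binary: bool = False) -> str:
    """Pick a Content-Transfer-Encoding value for a body (heuristic sketch)."""
    short_lines = all(len(line) <= MAX_LINE for line in data.splitlines())
    has_8bit = any(octet > 127 for octet in data)
    has_nul = 0 in data
    if short_lines and not has_8bit and not has_nul:
        return "7bit"                      # already safe for SMTP
    if short_lines and not has_nul and transport_8bit:
        return "8bit"                      # 8-bit octets, but still in lines
    if transport_binary:
        return "binary"                    # arbitrary octets, no line limits
    escapes = sum(1 for octet in data if _needs_qp_escape(octet))
    # Mostly-text data stays readable as quoted-printable; otherwise base64.
    return "quoted-printable" if escapes * 4 <= len(data) else "base64"


def encode_body(data: bytes, cte: str) -> bytes:
    """Apply the chosen encoding; 7bit, 8bit and binary are sent as-is."""
    if cte == "quoted-printable":
        return quopri.encodestring(data)
    if cte == "base64":
        return base64.encodebytes(data)    # output lines of 76 characters
    return data


for body in (b"Hello\r\n", b"caf\xe9 au lait\r\n", bytes(range(256))):
    cte = choose_cte(body)
    print(cte, encode_body(body, cte)[:40])
```

Because the decision depends on the transport as well as on the data, the same body can be labeled `binary` on a path that accepts raw octets and `base64` on plain SMTP.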
Quoted-printable encoding produces data that remains largely readable if the original was mostly text, while base64 is more compact for arbitrary binary data but is unintelligible to human readers.

Note that MIME prohibits encoding composite messages, that is, bodies of the multipart and message types. If components within such a message are not suitable for SMTP transfer, they must be encoded individually. The rationale is to prevent nested encodings, which would require a user agent to perform decoding operations multiple times. The MIME designers decided this would introduce too much complexity and hurt performance.

An alternative design would have been to fix the encoding according to the Content-Type field, making the Content-Transfer-Encoding field unnecessary. For example, all messages of type audio could be declared to use base64 encoding. However, this would have forced messages transported by systems less restrictive than SMTP to suffer the overhead of compensating for SMTP's restrictions. An audio message being transported over a system which allows bitstream data would be base64 encoded and decoded for no reason.

Two additional header fields specified by MIME are Content-ID and Content-Description. The Content-ID field is similar in purpose to the Message-ID field, but makes allowance for different messages to use the same body. The primary reason for this is to permit mail transfer agents to cache content, as we'll see later with the message/external-body type. The Content-Description field is intended for a human-readable description of the contents of the message, such as a caption for an image.

Content-Type Values

A variety of content types are described in MIME, and more are proposed in Internet drafts.
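A user agent's use of these values can be sketched as a lookup that tries the full type/subtype first, then the general type alone, and finally falls back to treating the body as application/octet-stream and saving it. The handler table and action strings below are hypothetical; the parsing relies on Python's standard `email.message.Message`, which lower-cases the type and subtype and strips quotes from parameter values.

```python
from typing import Optional, Tuple
from email.message import Message

# Hypothetical actions a user agent might take, keyed by "type/subtype"
# or by the general type alone.
HANDLERS = {
    "text/plain": "display inline",
    "text": "display as plain text",
    "image/gif": "open GIF viewer",
    "image": "try a generic image viewer",
    "audio": "launch audio player",
}


def choose_action(content_type: str) -> Tuple[str, Optional[str]]:
    """Return (action, charset) for a Content-Type header value."""
    header = Message()
    header["Content-Type"] = content_type
    full = header.get_content_type()          # e.g. "text/plain", lower-cased
    general = header.get_content_maintype()   # e.g. "text"
    charset = header.get_param("charset")     # None if the parameter is absent
    if full in HANDLERS:
        return HANDLERS[full], charset
    if general in HANDLERS:
        return HANDLERS[general], charset
    # Unknown type or subtype: treat as application/octet-stream.
    return "save to file", charset


print(choose_action("Text/Plain; charset=ISO-8859-1"))  # ('display inline', 'ISO-8859-1')
print(choose_action("image/x-unknown"))                   # ('try a generic image viewer', None)
print(choose_action("application/x-spreadsheet"))         # ('save to file', None)
```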
Here, we'll look at some of the more interesting ones.

Application

The application type is meant for discrete data which doesn't fit into any other category, especially data which requires processing by some external program before it can be presented to the user. It is used most often for file transfer, including documents in formats specific to an application, such as word processors and spreadsheets. Another use is for "active", computational e-mail, which allows messages to be written in a special language and automatically executed in the recipient's environment. Obviously this introduces some security concerns.

The octet-stream subtype indicates arbitrary binary data. Unrecognized subtypes of application are to be treated as application/octet-stream, and the recommended action is simply to save the data to a file after decoding.

Message

The message content type indicates that the body is itself an e-mail message, including a header and body. The most generic subtype is rfc822, which indicates a standard Internet e-mail message. A more interesting subtype is partial. Currently, SMTP agents encountering messages too large to forward usually reject them, often returning the entire message to the sender. The message/partial type allows messages which are too large to be fragmented and sent on, much like IP datagram fragmentation.

The external-body subtype is even more exotic. A message of this type doesn't actually include the body, but merely indicates where it is located. It may indicate, for example, that the receiving user agent can retrieve the body via FTP, from a mail server, or by some other mechanism. It's possible that an intermediate gateway would retrieve the body and cache it, changing the header information to point to the new, theoretically more accessible, location. An example of how this could be useful is a message sent to multiple recipients in a distant location, such as Australia.
The gateway into Australia could retrieve the body and make it available to the Australian recipients, so the data would be transferred from its origin to Australia only once.

Multipart

Messages of type multipart contain several components, called body parts. Each body part is formatted much like an e-mail message, with optional header fields defining its content. The mixed subtype specifies that the parts are independent and may be of different types, and the digest subtype indicates that the parts are a series of e-mail messages. Multipart/alternative indicates that the different parts contain the same information in different formats, such as different languages or image formats, so the user or their software can choose the appropriate one to display. Multipart messages may in turn contain other multipart messages.

MIME in Message Headers

In RFC-1522 Keith Moore provides a scheme to allow non-US-ASCII data in header fields. It is intended for text in non-US-ASCII character sets, not for binary data such as sound or video. Obviously some parts of certain fields, such as the address in the To field, must not be encoded. However, MIME header encoding can safely be used wherever the standards describe a field's content as text, such as the Subject field and comments within other fields, because data in these places is intended entirely for human consumption rather than for use by software.

The MIME header standard defines a format for non-US-ASCII header text, called an encoded-word, consisting of three parts delimited by special characters: =?charset?encoding?encoded-text?=. The charset part identifies the character set used; the encoding is either B, for base64, or Q, a variation of the quoted-printable encoding; and the encoded text itself comes last.
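The encoded-word format is simple enough to produce by hand. The sketch below builds a B (base64) encoded-word, using an ISO-8859-1 default that is our own choice, and decodes it with Python's standard `email.header` functions. It does not split long text across several encoded-words, which the standard requires once a single word would exceed 75 characters; the sketch raises an error instead.

```python
import base64
from email.header import decode_header, make_header


def encode_word(text: str, charset: str = "iso-8859-1") -> str:
    """Build a single RFC 1522 'B' encoded-word: =?charset?B?encoded-text?="""
    encoded = base64.b64encode(text.encode(charset)).decode("ascii")
    word = f"=?{charset}?B?{encoded}?="
    if len(word) > 75:
        raise ValueError("text too long for one encoded-word")  # splitting not handled
    return word


subject = "Subject: " + encode_word("Café")
print(subject)                                        # Subject: =?iso-8859-1?B?Q2Fm6Q==?=

# Receiving side: decode the field text back for display.
field_text = subject[len("Subject: "):]
print(str(make_header(decode_header(field_text))))    # Café
```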
Software for composing mail messages should translate the text entered by the user into encoded form, and receiving software should reverse this encoding, all without the intervention of the user.

MIME and Mail Transport

Although MIME is designed so that RFC-821 compliant SMTP transport mechanisms can pass e-mail using MIME extensions without a hitch, MIME provides a number of opportunities for expanding the utility of mail gateways. These are discussed in RFC-1344, and some of the concepts proposed in that document are included in the discussions above.

One use of MIME would be improved handling of returned messages. The common practice of including the returned message as text within a standard message isn't very helpful when the rejected message contains non-textual data. For instance, the typical SMTP gateway will include even a properly MIME-formatted message containing an audio clip in the rejection message as unintelligible encoded text. A MIME-aware gateway could send rejection messages as Content-Type: multipart/mixed, with one part being a plain text explanation of the reasons for rejection, and the other being the rejected message, which a MIME-compliant mail reader can easily translate back to its proper, human-recognizable form.

Another useful tool MIME provides to gateways is message fragmentation. As discussed above, messages which are currently rejected for being too large can be broken into fragments and reassembled at their destination.

An additional suggestion in RFC-1344 is automatic conversion between subtypes, such as GIF to JPEG, either for more efficient transport or because appropriate external programs to handle a subtype are unavailable. This seems ill-advised, as most data formats are specifically designed to favor one feature of data over another, and conversion often results in the loss of some of the original data.
For instance, JPEG is much better suited to photo-quality images than GIF; however, it uses lossy compression, which might degrade an image originally stored as a GIF. A user who sends data in a specific format is likely to expect it to reach its destination in the same format.

Also worthy of note is RFC-1652, which proposes an SMTP extension to allow the transport of 8-bit octets in lines of 1000 or fewer characters. This would make encoding unnecessary for some types of message data which currently require it. However, it doesn't permit the transfer of unencoded bitstreams, which are likely to make up an increasing portion of e-mail traffic. Extending SMTP to allow this is the subject of the Internet draft "SMTP Service Extensions for Transmission of Large and Binary MIME Messages."

Conclusion

Beyond giving e-mail users greatly expanded features, the MIME standard also provides elements which can be applied to other protocols. The application of MIME to Usenet article structure seems obvious, given that structure's close compatibility with RFC-822, although I'm not aware of any movement to do so. The HTTP protocol used for the World Wide Web (WWW) uses the MIME Content-Type specification to identify the nature of documents and files sent to HTTP client software (browsers).

Many of the capabilities offered by MIME have been available without it, either through extensions to RFC-822 such as the Content-Type field, or through non-standard mechanisms like UUEncode. What MIME does is define a structured, comprehensive system for identifying the nature and encoding of an e-mail message's body in a standard manner. This standard fits neatly within the previously defined standards, which are used by many thousands of users and sites around the world today, offering users new capabilities without breaking existing mail transport systems.

References

Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft, September 1993.

Crocker, D., "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, UDEL, August 1982.

Postel, J.B., "Simple Mail Transfer Protocol", STD 10, RFC 821, USC/Information Sciences Institute, August 1982.

Borenstein, N., and N. Freed, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", Network Working Group Internet Draft, November 1994.

Sirbu, M., "Content-Type Header Field for Internet Messages", STD 11, RFC 1049, CMU, March 1988.

Moore, K., "Representation of Non-ASCII Text in Internet Message Headers", RFC 1522, University of Tennessee, September 1993.

Borenstein, N., "Implications of MIME for Internet Mail Gateways", RFC 1344, Bellcore, June 1992.

Klensin, J. (WG Chair), Freed, N. (Editor), Rose, M., Stefferud, E., and Crocker, D., "SMTP Service Extension for 8bit-MIME transport", RFC 1652, United Nations University, Innosoft, Dover Beach Consulting, Inc., Network Management Associates, Inc., The Branch Office, February 1993.

Vaudreuil, G., "SMTP Service Extensions for Transmission of Large and Binary MIME Messages", Network Working Group Internet Draft, Octel Network Services, August 1994.