Technical features
This page is certainly the least accessible on this web site.
Following all the details requires some knowledge of the format of
the messages transmitted by electronic mail (in particular MIME
encoding).
General
In the beginning, Internet protocols were developed for the United
States of America, before being used in other countries.
This internationalization made it necessary to take into account
alphabets richer than the American alphabet.
Moreover, beyond plain text, the electronic mail protocols gained
the ability to send formatted text and files of any kind.
Unfortunately, for each problem encountered, several distinct
technical solutions were adopted, most of the time without such
diversity being of any practical interest.
In addition, the standards are so complex that mailers do not follow
them exactly, either because they are difficult to apply or because
of very approximate implementations.
In practice, the majority of mailers use a limited subset of the
possible message formats.
Two choices are thus possible:
- either try to support all messages as well as possible, at the
risk of ending up with a very complex application,
- or be limited to the most common cases (which can nevertheless
cover 99 % of the messages).
The second approach was adopted for Libremail.
However, version after version, Libremail supports more and more
special cases, without having become much more complex than it was
at the beginning.
Processing of the received mails according to their structure
- If the received mail consists of a single text part of the
text/plain type, Libremail displays this text.
- In the case of multipart/alternative messages (plain text first,
then the equivalent in HTML), Libremail displays the text section of
the mail.
- If the mail is in pure HTML (text/html), Libremail does not convert
it to make this kind of message easy to read.
However, the suphtm tool is able to detect and remove pure HTML
messages before they are downloaded.
- If a mail is of the multipart/report type, Libremail displays the
various text parts it contains one after the other.
Attached files are not searched for in mails of the multipart/report
type.
- In the case of a mail of the multipart/mixed type, Libremail
displays the text contained in the first section. Depending on the
case, this will be:
- the text contained in a section of the text/plain type,
- the text/plain subsection of a multipart/alternative section,
- the unconverted text of the text/html section if there is no
text/plain section.
After this display, Libremail adds the list of the attached files
that could be recovered. However, text/html sections that have a
filename are not taken into account (since version 1.2.1).
If the mail contains message/rfc822 sections, these sections are
treated like mails, and their text is appended to that of the main
mail.
On the other hand, if a mail of the multipart/mixed type contains
several successive text parts, only the first one is displayed.
In a multi-section e-mail, one can choose (since version 1.1.0) to
display the text/html section (without conversion of the tags) instead
of the text/plain section.
The multipart/related sections appearing inside other multipart
sections are taken into account (since version 1.2.1) when processing
section boundaries, although their presence does not change the
analysis of the mail.
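As an illustration only, the selection rules described in this section
can be sketched in Python with the standard email package. This is not
Libremail's code: the function names displayed_text and
attachment_names are hypothetical, the ISO-8859-15 fallback is an
assumption, and message/rfc822 sections are not handled here.

```python
from email.message import EmailMessage, Message


def _text(part: Message) -> str:
    """Decode a text part; ISO-8859-15 is assumed when no charset is given."""
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "iso-8859-15"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # charset name unknown to Python
        return raw.decode("iso-8859-15", errors="replace")


def displayed_text(msg: Message, prefer_html: bool = False) -> str:
    """Pick the text to display, following the rules described above."""
    ctype = msg.get_content_type()
    if ctype in ("text/plain", "text/html"):
        # Single text part: shown as it is (HTML is not converted).
        return _text(msg)
    parts = msg.get_payload() if msg.is_multipart() else []
    if ctype == "multipart/alternative":
        wanted = "text/html" if prefer_html else "text/plain"
        # Try the preferred type first, then fall back to the other one.
        for candidate in (wanted, "text/plain", "text/html"):
            for part in parts:
                if part.get_content_type() == candidate:
                    return _text(part)
        return ""
    if ctype == "multipart/report":
        # All text parts, one after the other.
        return "\n".join(_text(p) for p in parts
                         if p.get_content_maintype() == "text")
    if ctype == "multipart/mixed" and parts:
        # Only the first section is considered for display.
        return displayed_text(parts[0], prefer_html)
    return ""


def attachment_names(msg: Message) -> list:
    """List attached file names, skipping text/html sections."""
    return [part.get_filename() for part in msg.walk()
            if part.get_filename() and part.get_content_type() != "text/html"]


# Demo message: multipart/mixed holding a multipart/alternative and a file.
demo = EmailMessage()
demo.set_content("Hello in plain text\n")
demo.add_alternative("<p>Hello in HTML</p>\n", subtype="html")
demo.add_attachment(b"\x00\x01", maintype="application",
                    subtype="octet-stream", filename="data.bin")
print(displayed_text(demo))
print(attachment_names(demo))
```

Calling displayed_text(demo, prefer_html=True) returns the text/html
section instead, still without any conversion of the tags.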
Structure of the messages sent
For sending mails, Libremail is limited to two message structures:
- a mail made up of a single text section of the text/plain type,
- a mail of the multipart/mixed type composed of a text/plain section
followed by one or more attached files.
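The two structures can be reproduced with Python's standard email
package, as a rough sketch. The helper name build_mail, the addresses
and the application/octet-stream type used for every attachment are
assumptions made for the example, not a description of how Libremail
builds its messages.

```python
from email.message import EmailMessage


def build_mail(sender, recipient, subject, text, attachments=()):
    """Build a text/plain mail, or multipart/mixed when files are attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Structure 1: a single text/plain section.
    msg.set_content(text)
    # Structure 2: the first attachment turns the mail into multipart/mixed,
    # with the text/plain section first and the files after it.
    for filename, data in attachments:
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg


plain = build_mail("a@example.org", "b@example.org", "Test", "Hello\n")
mixed = build_mail("a@example.org", "b@example.org", "Test", "Hello\n",
                   attachments=[("notes.txt", b"abc")])
print(plain.get_content_type())
print(mixed.get_content_type())
```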
Encoding of the characters
As stated above, Internet protocols were initially American before
being internationalized. However, the American population has 2
characteristics which distinguish it from the majority of the
people of the planet:
- they have a disproportionate stock of weapons of mass destruction,
- their language does not include any accents.
For electronic mail, it is the 2nd point that matters most, in
particular because Internet protocols were originally designed to
transmit characters on 7 bits.
Under these conditions, characters with the 8th (most significant)
bit set to 1 had to be encoded.
Reminder: even today, when 8-bit transmission of characters has
become widespread, the transmission standard requires that
characters in message headers with the 8th bit set always be encoded.
Two encoding formats exist: "quoted printable" and base64.
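To make the two formats concrete, here is a small Python sketch that
encodes the same accented text both ways, for a message body and for a
header field. The sample text and the helper decode_field are
assumptions chosen for the example; only standard library modules are
used.

```python
import base64
import quopri
from email.header import Header, decode_header

text = "Réunion décalée"
raw = text.encode("iso-8859-15")  # each é becomes the single byte E9h

# Body encodings: quoted printable keeps ASCII readable, base64 does not.
qp_body = quopri.encodestring(raw)
b64_body = base64.b64encode(raw)

# Header encodings ("encoded words"): =?charset?q?...?= or =?charset?b?...?=
qp_header = Header(text, charset="iso-8859-15").encode()
b64_header = "=?iso-8859-15?b?" + b64_body.decode("ascii") + "?="


def decode_field(value: str) -> str:
    """Decode a header field whichever of the two encodings it uses."""
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )


print(qp_body, b64_body)
print(qp_header, b64_header)
print(decode_field(qp_header), decode_field(b64_header))
```

In the quoted printable form the unaccented letters remain readable,
which is why a mailer without MIME support still shows most of the
text.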
- For the display of message header fields, Libremail has supported
quoted printable encoding (almost universal) since the beginning, and
base64 encoding since version 1.0.4 (much rarer and of no practical
interest with a European alphabet, except perhaps to prevent a
readable display in old mailers, and to complicate filtering of mails
by the transport server based on the Subject: field).
- For the display of the contents of received messages, Libremail
has accepted since the beginning texts transmitted directly on 7 or 8
bits (with no encoding visible on reception), and messages encoded in
the quoted printable format.
Base64 encoded messages are now converted (since version 1.1.0),
but with more rudimentary line break processing than for the other
formats. In any case, the use of this encoding for the text of mails
is very rare, and completely unjustified with a European alphabet.
- For the recovery of attached files, the quoted printable and
base64 encodings have both been converted since the beginning (which
is the least one could expect).
- For sending mails, Libremail automatically applies quoted printable
encoding to the header fields that contain special characters, whereas
the body of the messages is transmitted on 8 bits, and thus without
encoding.
- To send attached files, Libremail chooses between quoted printable
and base64 encoding according to the contents of these files,
whichever is the least cumbersome.
These technical choices are perfectly appropriate in developed
countries (for example France), but may not be suited to other
areas of the world such as Africa (to be checked).
If it turned out that in these countries accented characters are
correctly transmitted in header fields (in particular in the subject
of the mail) and in attached files, but not in the text of the
message, it would be necessary to create and use a modified version
of "envmail" so that these messages are transmitted with quoted
printable encoding.
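The choice between quoted printable and base64 for attached files,
mentioned earlier in this section, can be approximated as follows. This
sketch assumes that "least cumbersome" means "shortest encoded
result"; the real criterion used by Libremail is not given on this
page, and the function name choose_attachment_encoding is hypothetical.

```python
import base64
import quopri


def choose_attachment_encoding(data: bytes):
    """Return (encoding name, encoded bytes), keeping the shorter result."""
    qp = quopri.encodestring(data)
    b64 = base64.encodebytes(data)
    # Assumption: the less cumbersome encoding is the one that takes
    # fewer bytes once encoded.
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64


# Mostly ASCII text: quoted printable only expands the accented letters.
text_file = "Bonjour, voici le fichier demandé.\n".encode("iso-8859-15") * 10
# Arbitrary binary data: most bytes would need a 3-character =XX escape.
binary_file = bytes(range(256)) * 4

print(choose_attachment_encoding(text_file)[0])
print(choose_attachment_encoding(binary_file)[0])
```

With such a rule, text files with a few accents stay almost readable
in the raw message, while images or archives go through base64.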
Recognized character sets
- Originally, Libremail was designed to work with the default
character set ISO-8859-15, or ISO-8859-1 when the € (euro) symbol is
not needed.
It can thus display without conversion mails coming from a PC
running Windows (up to version 98) as well as some distributions of
GNU/Linux and other UNIX systems.
- A subset of the characters in the range 80h to 9Fh (not displayable
as they stand, and used in particular on Mac) is converted to its
equivalent in the ISO-8859-15 set.
- Libremail also detects UTF-8 encoding and converts the corresponding
characters when they are equivalent to a character present in the
range A0h to FFh of the ISO-8859-15 set.
Those which correspond to a character in the range 80h to 9Fh of
this ISO set are not converted. On the one hand this would lead to
nondisplayable characters; on the other hand their UTF-8 encoding is
much more irregular than that of the characters starting from A0h.
Nevertheless, other UTF-8 characters, probably Mac typographical
characters, are also converted when possible.
- Since version 2.0 (and beta versions 1.9.2 and 1.9.3), Libremail
analyzes the environment variable $LANG to detect the character
set (ISO-8859-n or UTF-8) used by the operating system.
Mails written with the same character set as that of the operating
system are displayed without conversion; the others are converted
from ISO-8859-1 to UTF-8 or from UTF-8 to ISO-8859-15 to allow a
good display of accented characters.
In the same way, mails can be typed in with either the ISO-8859-15
or the UTF-8 character set.
- UTF-7 encoding is not supported by Libremail.
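The $LANG detection and the conversion between character sets can be
sketched in Python as follows. The function names charset_from_lang
and convert_for_display are hypothetical, and replacing
unrepresentable characters by "?" is a simplification made for this
example, not the behaviour described above.

```python
import codecs
import os


def charset_from_lang(lang: str, default: str = "iso-8859-15") -> str:
    """Extract the character set from a value such as 'fr_FR.UTF-8'."""
    # Usual layout: language[_territory][.codeset][@modifier]
    if "." in lang:
        codeset = lang.split(".", 1)[1].split("@", 1)[0]
        try:
            return codecs.lookup(codeset).name
        except LookupError:
            pass  # unknown codeset: fall back to the default
    return codecs.lookup(default).name


def convert_for_display(raw: bytes, mail_charset: str,
                        system_charset: str) -> bytes:
    """Convert mail text to the system charset only when they differ."""
    if codecs.lookup(mail_charset).name == codecs.lookup(system_charset).name:
        return raw  # same character set: displayed without conversion
    text = raw.decode(mail_charset, errors="replace")
    # Simplification: characters missing from the target set become "?".
    return text.encode(system_charset, errors="replace")


system = charset_from_lang(os.environ.get("LANG", ""))
print(system)
print(convert_for_display("café à 5 €".encode("utf-8"), "utf-8", "iso-8859-15"))
```

In this example the euro sign ends up as the byte A4h, which is where
ISO-8859-15 places it and one of the differences with ISO-8859-1.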
Time and time zone
For the display of the dates and times at which messages were sent,
Libremail did not take time zones into account until version 2.1.4.
Since version 2.2, the vmailsj command corrects the differences
(in whole hours) between the time zone of the sender and the time zone
of the recipient, and takes them into account to sort chronologically
e-mails coming from different time zones.
The tools used to view the contents of e-mails continue to display
the dates, times and time zone as they appear in the sender's e-mail.
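The effect of taking time zones into account when sorting can be shown
with a few lines of Python. This is only a sketch of the idea, not the
vmailsj implementation: the function name sort_by_send_time and the
sample Date: values are invented, and the full offset is used rather
than whole hours.

```python
from email.utils import parsedate_to_datetime


def sort_by_send_time(date_headers):
    """Sort Date: header values by the real instant they describe.

    Every value is expected to carry an explicit offset such as +0100;
    the strings themselves are returned unchanged, so each one still
    shows the sender's local time and time zone.
    """
    return sorted(date_headers, key=parsedate_to_datetime)


dates = [
    "Mon, 06 Jan 2020 10:00:00 +0100",  # 09:00 UTC
    "Mon, 06 Jan 2020 08:30:00 -0500",  # 13:30 UTC
    "Mon, 06 Jan 2020 09:30:00 +0000",  # 09:30 UTC
]
for value in sort_by_send_time(dates):
    print(value)
```

Sorting on the displayed local times alone would put the 08:30 mail
first, although it was actually sent last.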