Hungary SIG #Hungary * Accents and umlauts continued (a bit long) #hungary

Tom Venetianer <tom.vene@...>

Hello friends,

The handling of accents in computers is VERY tricky. The first reason is
that English has no accents, and computers were invented by the Americans,
who initially made no provision for such characters. Also, the first
computers ran very short of memory because memory cost was *very* high. To
save memory, the so-called 7-bit ASCII code was invented, which basically
contained the upper and lower case alphabet, the numerals and the most
common signs such as + - . ; ( etc. The advantage was that it saved 1 bit
of memory in each byte (a byte being equal to 8 bits).
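
For those who like to see it in practice, here is a small Python sketch of
the 7-bit limit. The function name and the sample words are mine, chosen
only for illustration: a character fits in 7 bits only if its code number
is below 128, and no accented letter does.

```python
# 7-bit ASCII covers code numbers 0..127 only.
# The function name and sample words are illustrative assumptions.
def fits_in_7_bits(text: str) -> bool:
    """Return True if every character has a code number below 128."""
    return all(ord(ch) < 2 ** 7 for ch in text)

plain = "Budapest"
accented = "Gy\u0151r"  # a town name containing o with double acute (U+0151)

print(fits_in_7_bits(plain))
print(fits_in_7_bits(accented))
```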

Later the computer geniuses discovered that there was a world outside the
US and decided to introduce the so-called 8-bit ASCII code. It did contain
some of the accented characters but NOT ALL of them (for instance, most of
the upper case accented characters are missing). Those who used DOS will
recall that to obtain an accented character one had to hold down the 'Alt'
key while typing a combination of 3 or 4 numerals.

The above scheme worked well for a while, but then the graphical interfaces
(Windows and the VGA graphics board) and laser printers created a new
problem with rendering characters. Now the characters had to be drawn with
small dots, no longer identified only by the ASCII code, which was good
only for the so-called alphanumeric devices (such as the old DOS machines).
This created the need for the so-called "digital fonts", which are files
describing the shape of each character when rendered in tiny dots.

The graphical interface was an improvement over the previous one, allowing
a larger number of accented characters to be represented. The Macintosh
can represent almost ALL accented characters, upper and lower case and all
that jazz. The PC platform was stuck with its obsolete DOS base, thus it
implemented a modified 8-bit ASCII code, called ANSI. ANSI is similar to
the Macintosh code but not the same, which results in incompatible
character sets. And as Unix machines became the standard for Internet
servers, which of course handle character rendering differently too, the
confusion was complete - no compatibility at all in a world which is
supposed to handle platforms transparently (meaning that the user should
not be concerned about what kind of operating systems the zillion computers
hooked to the Internet are using).
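
To make the incompatibility concrete, here is a rough Python sketch. It
assumes that "ANSI" means Windows code page 1252, that the DOS set is code
page 437 and that the Macintosh set is Mac Roman (these are Python's codec
names, not anything from the original discussion). The same letter, a with
acute accent, gets a different byte number in each set.

```python
# The same letter, a with acute accent (U+00E1), in three legacy 8-bit sets.
# Assumed codec names: cp437 (DOS), cp1252 (Windows "ANSI"), mac_roman.
LETTER = "\u00e1"

def byte_for(letter: str, codec: str) -> int:
    """Return the single byte value the codec assigns to the letter."""
    return letter.encode(codec)[0]

for codec in ("cp437", "cp1252", "mac_roman"):
    print(f"{codec:10s} 0x{byte_for(LETTER, codec):02X}")

# A byte written under the Windows set and read back as Mac Roman
# comes out as a different character.
misread = LETTER.encode("cp1252").decode("mac_roman")
print(misread != LETTER)
```

Each machine is internally consistent; the trouble starts only when bytes
travel from one platform to another without a label saying which set they
belong to.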

E-mail transmission, which is still basically alphanumeric, required a
special standard for transmitting messages. This is the raison d'etre of
the MIME protocol and encoding. Believe it or not, the guys who invented it
made the same mistake by inventing a 7-bit MIME code. The 8-bit MIME code
does exist, but many email servers do not implement it.

This is why one finds lines like these in email headers:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Contrary to common belief, it is NOT enough to configure one's email program
to send and receive with 8 bits; the email servers (the one which sent the
message and the one which received it) must TALK the same "language", i.e.
8-bit MIME. Notice that the header above says Content-TRANSFER-Encoding,
meaning that the server SENT a 7-bit message.
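
If you want to check what a message declares about itself, a minimal Python
sketch with the standard email module could look like this. The message
text is a made-up sample built around the two header lines quoted above.

```python
from email import message_from_string

# Made-up sample message carrying the two header lines discussed above.
raw = (
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "Hello friends,\n"
)

msg = message_from_string(raw)

# The character set the message claims for its text.
charset = msg.get_content_charset()
# How the body was packed for transport between servers.
transfer = msg["Content-Transfer-Encoding"]

print(charset, transfer)
```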

Finally, in answer to Margarita's question, there is NO way to represent
the long umlaut (the character she calls the 'double acute accent') UNLESS
one uses a special set of digital fonts which contain such beasts. This is
one reason why Netscape allows for the configuration of different character

my 3 centavos today ;)


It is funny to observe that the message sent by <snip> came through
garbled. The reason, of course, is that he may have done everything
correctly, but it is likely that either the JewishGen email server or his
provider's server is not configured for 8-bit transmission.

>| Subject: Accents and umlauts
>| From: <snip> <>
>| You may be getting all these formatting glitches when you paste text
>| from documents created off the 'net with word processing software. I
>| work off the Macintosh platform and to accent vowels, I hold down the
>| "option" and "e" keys then type the vowel: =E1, =F3 =E9tc.
>| For umlauts hold down the "option" and "u" keys then type the vowel: =E4,
>| =F6, =EBtc.
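
The =E1 and =F3 debris in the quoted text looks like the "quoted-printable"
way of squeezing 8-bit characters through a 7-bit channel, left undecoded
at the receiving end. A small Python sketch of undoing it, under the
assumption (mine, not confirmed anywhere in the message) that the
underlying bytes were ISO-8859-1:

```python
import quopri

# Text exactly as it arrived in the quoted message.
garbled = b"=E1, =F3 =E9tc."

# Undo quoted-printable: each "=XX" becomes the byte with hex value XX.
raw_bytes = quopri.decodestring(garbled)

# Assumption: the sender's bytes were ISO-8859-1 (Latin-1).
restored = raw_bytes.decode("latin-1")
print(restored)
```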

This is <snip> message:

>| Subject: RE: Accents and umlauts
>| From: <>
>| In my last e-mail, I made a mistake. I wrote:
>| > I still did not discover how to make the vowel with the two dots.
>| I meant:
>| I still did not discover how to make the double acute accent on top
>| of the vowels.
>| <snip>=F3
