Re: Strange QP
- Subject: Re: Strange QP
- From: email@hidden
- Date: Tue, 24 Sep 2002 13:18:25 -0400
On Mon, 23 Sep 2002 19:26:52 -0700, Paul Berkowitz <email@hidden>
asked:
> [...] OK, the
> character I'm asking about is the bullet, made on a US
> keyboard with option-8. That's the one whose QP is "=80".
> ASCII 128 on the Mac is "A" with an umlaut, on Windows
> is the Euro symbol, which is why I was wondering whether
> the 128 code might have been recently assigned to it.
> On Windows, the bullet's ASCII is 149, on Mac it's 165.
You're getting tripped up by the Windows extension to the
ISO-8859-1 (Latin-1) character set. The row of characters
from 80 to 9F hex, 128 to 159 decimal, is formally "not
defined" in ISO-8859-1. As I understand it, this was done
as a safety precaution, because these characters are the
high-bit-set partners of the ASCII control characters.
It's desirable that if a seven-bit connection strips the
high bit, you don't get control characters (like "clear
screen" or "stop transmitting") when text was intended.
But Microsoft has added some glyphs in that range. (The
superset is called Code Page 1252.)
Apple also had its own 8-bit character set, MacRoman, which
predated ISO-8859-1 and contains some glyphs Windows CP1252
doesn't have (like the fi and fl ligatures and the lozenge),
while Latin-1 has fractions like 1/2 and 1/4, and thorn
(gotta keep Iceland happy).
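
To make the "high-bit-set partners" point concrete, here is a minimal Python sketch. The helper name strip_high_bit is made up for illustration, and it assumes the seven-bit link simply masks off the top bit of every byte.

```python
# Hypothetical helper: what a seven-bit link does to each byte,
# assuming it simply drops the top bit.
def strip_high_bit(byte: int) -> int:
    return byte & 0x7F

# Map every byte in the 80..9F hex row to what arrives after stripping.
stripped = {b: strip_high_bit(b) for b in range(0x80, 0xA0)}

# Every one of them lands in the ASCII control range 00..1F hex.
all_controls = all(value < 0x20 for value in stripped.values())

# Two examples: form feed (often "clear screen") and XOFF ("stop transmitting").
form_feed_partner = stripped[0x8C]   # 0x0C
xoff_partner = stripped[0x93]        # 0x13

print(all_controls, hex(form_feed_partner), hex(xoff_partner))
```

So the CP1252 curly quote at 93 hex, sent over such a link, would arrive as XOFF rather than as a harmless printable character.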
Bullet is not defined in Latin-1, although decimal 183
(B7 hex) is the middle dot, which is a bit smaller. Windows
added the bullet at 149 (95 hex). MacRoman has the
bullet at 165 (A5 hex), which would give you a yen sign
in Latin-1.
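
The bullet positions can be checked with a short Python sketch. It assumes Python's built-in "cp1252", "mac_roman", and "latin-1" codecs are faithful to the tables described here. The variable names are only for illustration.

```python
BULLET = "\u2022"  # U+2022 BULLET, the option-8 character

# Byte value of the bullet in each 8-bit set that has one.
codes = {name: BULLET.encode(name)[0] for name in ("cp1252", "mac_roman")}

# Latin-1 has no bullet at all, so encoding fails.
try:
    BULLET.encode("latin-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# The MacRoman bullet byte, misread as Latin-1, comes out as the yen sign.
mac_byte_as_latin1 = bytes([codes["mac_roman"]]).decode("latin-1")

print(codes, in_latin1, mac_byte_as_latin1)
```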
The Euro symbol was recently added to CP1252 at 128
decimal (80 hex), but it was not added to Latin-1.
A modified version of Latin-1, ISO 8859-15 (confusingly
known as Latin-9 and sometimes Latin-0), adds the Euro,
plus some missing French and Finnish accented characters,
at the expense of marginally useful Latin-1 characters
like the broken vertical bar (hex A6), the fractions, and
naked accent characters. But in Latin-9, the Euro is at
A4 hex (in place of the "generic currency" symbol).
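
A small Python sketch can list exactly which positions Latin-9 changed. It assumes Python's "latin-1" and "iso8859-15" codecs as stand-ins for the two standards, and only compares the printable upper half (A0 to FF hex).

```python
# Compare Latin-1 and Latin-9 byte by byte over A0..FF hex and
# collect the positions where they disagree.
differences = {}
for b in range(0xA0, 0x100):
    latin1_char = bytes([b]).decode("latin-1")
    latin9_char = bytes([b]).decode("iso8859-15")
    if latin1_char != latin9_char:
        differences[b] = (latin1_char, latin9_char)

for b, (old, new) in sorted(differences.items()):
    print(f"{b:02X} hex: Latin-1 U+{ord(old):04X} -> Latin-9 U+{ord(new):04X}")

# Where the Euro lives in each set that has it.
euro_cp1252 = "\u20ac".encode("cp1252")[0]
euro_latin9 = "\u20ac".encode("iso8859-15")[0]
```

Only eight positions differ, which is why text in one set usually looks almost right when read as the other.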
The Quoted-Printable sequence =80 is the Euro symbol
in CP1252, A-umlaut in MacRoman, and an undefined control
character in Latin-1. It all depends on the charset
specified in the Content-Type header. (If you have a
Content-Transfer-Encoding: quoted-printable header, you
gotta have a Content-Type: text/plain; charset=xxxx as well.)
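
Here is a minimal Python sketch of the same point, using the standard-library quopri module. The three charset names are Python's codec names, chosen here to stand for the three character sets under discussion.

```python
import quopri

# Quoted-Printable only says which byte was sent; "=80" is the byte 80 hex.
raw = quopri.decodestring(b"=80")

# What that byte means depends entirely on the declared charset.
readings = {cs: raw.decode(cs) for cs in ("cp1252", "mac_roman", "latin-1")}

for charset, char in readings.items():
    print(f"{charset}: U+{ord(char):04X}")
```

The byte is identical in all three cases. Only the charset parameter tells the reader whether it is a Euro, an A-umlaut, or a control code.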
> Yucch.
Yup. There are glyphs in MacRoman that don't exist in
Latin-1, and glyphs in Latin-1 and in CP1252 that don't
exist in MacRoman. These are the difficult problems to
solve without Unicode. The code scrambling (such as
different codes for the bullet between MacRoman and CP1252)
is not that challenging, as long as the Content-Type:
MIME header is preserved and has the correct character set
specified.
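
A rough Python sketch of both cases follows. The transcode function is a hypothetical helper, and replacing unmappable glyphs with "?" is just one possible policy, not the only one.

```python
# Hypothetical helper: re-encode 8-bit text from one character set to
# another by going through Unicode. Glyphs missing from the target set
# are replaced with "?".
def transcode(data: bytes, src: str, dst: str) -> bytes:
    return data.decode(src).encode(dst, errors="replace")

# The easy case: the bullet exists in both sets, only its code differs.
bullet_result = transcode(b"\xa5", "mac_roman", "cp1252")

# The hard case: MacRoman's fi ligature (DE hex) has no CP1252 equivalent.
ligature_result = transcode(b"\xde", "mac_roman", "cp1252")

print(bullet_result, ligature_result)
```

The first call is the "code scrambling" problem, which a lookup solves. The second is the missing-glyph problem, where something has to be substituted or lost.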
But maybe I shouldn't say "not that challenging," since
the applescript-users list server can't handle the text
of an AppleScript. I guess it's rocket science after all.
--
Scott Norton Phone: +1-703-299-1656
DTI Associates, Inc. Fax: +1-703-706-0476
2920 South Glebe Road Internet: email@hidden
Arlington, VA 22206-2768 or email@hidden