Re: seeking webguru advice on html character encoding
- Subject: Re: seeking webguru advice on html character encoding
- From: Elliotte Rusty Harold <email@hidden>
- Date: Thu, 20 Dec 2001 12:28:59 -0500
At 3:36 PM +0000 12/20/01, has wrote:
> Hi all,
> I'm working on a spanking new library for encoding ascii characters as html
> character entities. I'm wondering if there's anyone who knows about
> character sets, fonts and html character entities for discussing how to
> handle some of the more awkward stuff. I'm avoiding some of the obvious
> pratfalls (none of this &#150;/&#151; for en/em dashes crap as in
> DreamWeaver, for example) but I'm lost elsewhere:
It sounds like you need to learn about Unicode. Tony Graham's
Unicode: A Primer might be a good place to start. Unicode is the
foundation for character sets on both the MacOS and Windows these
days. You really can't begin unless you have a grasp of the basics of
Unicode.
> Stuff I can't find codes for in html4:
> ASCII character 240 - apple
> ASCII character 215 - diamond
> ASCII character 197 - wavy thing (what is that anyway? approx equal?)
> ASCII character 222 - fi ligature
> ASCII character 223 - fl ligature
None of these are ASCII characters. No character with a code point
over 127 is an ASCII character.
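As an illustration that is not part of the original message, the point can be checked with a few lines of Python. This is only a minimal sketch: the helper name `is_ascii_code` is hypothetical, and the codes are the five decimal values quoted above.

```python
# ASCII is a 7-bit character set, so its code points run from 0 to 127.
def is_ascii_code(code: int) -> bool:
    """Return True if the integer is a valid ASCII code point."""
    return 0 <= code <= 127

# The five decimal codes from the quoted question.
codes = [240, 215, 197, 222, 223]

for code in codes:
    # Show each code in decimal and hex, and whether it is ASCII.
    print(code, hex(code), is_ascii_code(code))
```

Every one of these values falls in the upper half of an 8-bit encoding (MacRoman in this case), which is why none of them is ASCII.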
The trademarked apple with a bite out of it character is only
available on Macs. It should not be put on the wire or exchanged in
documents. The other characters you're looking for are:
MacRoman  Unicode
0xC5      0x2248   # ALMOST EQUAL TO
0xD7      0x25CA   # LOZENGE
0xDE      0xFB01   # LATIN SMALL LIGATURE FI
0xDF      0xFB02   # LATIN SMALL LIGATURE FL
0xF0      0xF8FF   # Apple logo but this is in the private use area
                   # and is not compatible.
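One way to apply the table above, sketched in Python rather than taken from the original message: decode the MacRoman bytes to Unicode, then write each non-ASCII character as an HTML numeric character reference. The function name `macroman_to_html` and the choice to substitute `?` for private use characters are assumptions made for this example. The sketch relies on Python's built-in `mac_roman` codec and on `unicodedata.category`, which reports `Co` for private use code points.

```python
import html
import unicodedata

def macroman_to_html(data: bytes) -> str:
    """Decode MacRoman bytes and escape non-ASCII characters as &#xHHHH;."""
    out = []
    for ch in data.decode("mac_roman"):
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII: only the markup characters &, < and > need escaping.
            out.append(html.escape(ch, quote=False))
        elif unicodedata.category(ch) == "Co":
            # Private use area (e.g. the Apple logo at U+F8FF): it has no
            # portable meaning, so this sketch substitutes a placeholder.
            out.append("?")
        else:
            # Everything else becomes a hexadecimal character reference.
            out.append(f"&#x{cp:04X};")
    return "".join(out)

print(macroman_to_html(b"\xc5\xd7\xde\xdf"))  # &#x2248;&#x25CA;&#xFB01;&#xFB02;
print(macroman_to_html(b"\xf0"))              # ?
```

Numeric references are used here because they work for any Unicode code point, whether or not HTML 4 defines a named entity for it.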
> Characters that appear in Mac fonts but not in Windows fonts - I bet there
> are some, but what are they? Will a Windows-based browser make mince of
> them, or will it manage to render them ok? (Ditto Unix.) e.g. I know
> IE5/Mac handles things like Þ and Ð which aren't on my machine -
> how does it manage to do that?
Aside from the Apple logo, they all appear on both sides of the platform
divide. Whether a given character is actually available depends heavily
on the particular font.
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | email@hidden           | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001)                    |
| http://www.ibiblio.org/xml/books/bible2/                           |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/     |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: | http://www.cafeaulait.org/      |
| Read Cafe con Leche for XML News:| http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+