<email@hidden> wrote:
>What I'm finding that I need is to model in HTML what
>swing components' use to encode strings,
That would be Unicode characters, which Java represents with the 'char'
type (strictly, a 16-bit UTF-16 code unit). The APIs frequently use Strings
or char[]s, both of which are sequences of 'char' values.
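To make the 'char' point concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes Java 5 or later for the code-point methods. It shows that a String is a sequence of 'char' values, and that a character outside the 16-bit range takes two of them:

```java
class CharDemo {
    // Number of 'char' values (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String accented = "caf\u00E9";          // 'e' with acute, one char
        String supplementary = "\uD83D\uDE00";  // U+1F600, a surrogate pair

        char[] chars = accented.toCharArray();  // same data as a char[]
        System.out.println(chars.length + " chars in the array");
        System.out.println(charCount(accented) + " chars, "
                + codePointCount(accented) + " code points");
        System.out.println(charCount(supplementary) + " chars, "
                + codePointCount(supplementary) + " code points");
    }
}
```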
>as well as
>what the modern operating systems use for their file
>system as well. i.e. filenames, directories, etc.
Modern OSes are extremely diverse in their filesystems. Filesystems
(modern or legacy) are equally diverse in the way they store characters as
filenames. Also, the stored form (normalized or canonical form) may differ
from the input form passed to the OS. It's entirely possible to have
Unicode args that refer to files on an EBCDIC-encoded filesystem format.
Can you explain more precisely what you're modeling?
HTML contains URLs. There are specific encoding rules for URLs, especially
for extended characters and a number of URL-special chars. Refer to any of
the RFCs that define the syntax of URLs.
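As a rough sketch of what those encoding rules look like in practice, the following lets java.net.URI do the quoting. The host, path, and class name are hypothetical, and this is only one way to do it, not necessarily the right one for your model:

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlPathEncoding {
    // Builds an http URL from a host and an unencoded path, then returns
    // the all-ASCII form with illegal and non-ASCII characters
    // percent-escaped as UTF-8 bytes.
    static String toAsciiUrl(String host, String path) {
        try {
            URI uri = new URI("http", host, path, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL", e);
        }
    }

    public static void main(String[] args) {
        // The space and the accented 'e' both need escaping.
        System.out.println(toAsciiUrl("example.com", "/docs/caf\u00E9 menu.html"));
    }
}
```

Note that URLEncoder is a different tool: it produces form encoding (a space becomes '+'), which is meant for query strings rather than paths.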
There are also things in Unicode that have multiple representations
possible, such as accented Latin-alphabet characters. In particular, many
vowels have a single Unicode char that represents a letter combined with an
accent (combined form, aka composed form). The equivalent combining-accent
form is an unadorned letter followed by a combining-accent character, i.e.
a 2-char pair. If you don't handle both forms in your URLs, the same visible
name can encode to two different byte sequences that won't match.
Combining accents are a particular issue because Mac OS X always returns
combining-accent forms, but it takes as input both composed and
combining-accent forms. Other platforms do different things.
Not every combination of letter and accent has a composed form, so
sometimes the combining-accent form is the only form possible.
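To illustrate the two forms, here is a small sketch using java.text.Normalizer. This is not the AccentComposer approach; it assumes a Java 6 or later runtime, and the class and method names are invented for the example. NFC is the composed form and NFD is the combining-accent form:

```java
import java.text.Normalizer;

class AccentForms {
    // Converts to composed form (NFC) wherever a composed char exists.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Converts to combining-accent form (NFD).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // single char: e with acute
        String combining = "e\u0301";  // 2-char pair: e + combining acute

        // The two look identical on screen but are not equal as Strings.
        System.out.println(composed.equals(combining));
        // After normalizing to a common form they compare equal.
        System.out.println(compose(combining).equals(composed));
        System.out.println(decompose(composed).equals(combining));

        // 'q' with acute has no composed form, so NFC leaves the pair alone.
        System.out.println(compose("q\u0301").length());
    }
}
```

Picking one form and normalizing everything to it before comparing or encoding is the usual way to keep filenames returned by the OS consistent with the ones you generate.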
For a small number of composed-accent chars common in MacRoman and 8859-1,
you may find the AccentComposer class useful. It's a part of my open
source MacBinary Toolkit:
<http://www.amug.org/~glguerin/sw/#macbinary>
>I believe swing Unicode utf-8 to do this. I know more
>modern web browsers can support this. Not sure about
>the mac file system. Probably the same..
Swing, Unicode, and UTF-8 are all completely different things:
1) Swing is the Java GUI toolkit found in javax.swing.
2) Unicode is a coded character set, albeit a large one: it assigns a
   number (a code point) to each character.
3) UTF-8 is a perfectly invertible byte-encoding of Unicode chars.
If you get these things mixed up, your code will not work.
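The difference between Unicode chars and their UTF-8 bytes can be sketched as a round trip. The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later (on older runtimes the charset name "UTF-8" would be passed as a String instead):

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encodes Unicode chars into UTF-8 bytes.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into Unicode chars.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9";  // 4 chars
        byte[] bytes = encode(text);

        // The accented char takes 2 bytes, so 4 chars become 5 bytes.
        System.out.println(text.length() + " chars, " + bytes.length + " bytes");

        // Decoding returns the original string for well-formed input
        // (a String with an unpaired surrogate would not survive).
        System.out.println(decode(bytes).equals(text));
    }
}
```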
In addition, HTML has character-entities, signalled by '&' and terminated
by ';'. You may find that to be useful, as well, or it may just confuse
things more.
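For what it's worth, here is a minimal sketch of the character-entity idea, using numeric references of the form '&#NNN;' for anything outside ASCII. The class and method names are made up, it covers only the four markup-special chars plus non-ASCII, and it is not a complete HTML escaper:

```java
class HtmlEntities {
    // Replaces markup-special chars with named entities and every
    // non-ASCII code point with a decimal numeric character reference.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);  // step over surrogate pairs whole
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);
                    } else {
                        out.append("&#").append(cp).append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00E9 & <b>"));
    }
}
```

The result is pure ASCII, so it survives whatever byte encoding the page is served in, at the cost of being harder to read in the source.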
I suggest investing a little more forethought here, rather than running
full speed into an avoidable disaster.
-- GG