Re: Unicode search
- Subject: Re: Unicode search
- From: John Delacour <email@hidden>
- Date: Fri, 21 Mar 2003 14:43:59 +0000
- Mac-eudora-version: 6.0a11
At 1:23 am +0000 21/3/03, has wrote:
>> Every character in Unicode proper
>> consists of two bytes (or 4 in the case of UTF-32) and the length is
>> not variable.
>
> Not quite: in UTF-16, characters may consist of either one or two
> two-byte blocks.
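
To make the "one or two two-byte blocks" point concrete, here is a minimal Python sketch that counts UTF-16 code units per character. The helper name utf16_units and the two sample characters are illustrative choices, not anything from the original thread.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one code point."""
    # "utf-16-be" is used so that no byte-order mark is added to the output.
    return len(ch.encode("utf-16-be")) // 2

omega = "\u03c9"        # GREEK SMALL LETTER OMEGA, inside the Basic Multilingual Plane
g_clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP (needs a surrogate pair)

for ch in (omega, g_clef):
    print(f"U+{ord(ch):04X} takes {utf16_units(ch)} UTF-16 code unit(s)")
```

Code points above U+FFFF are the ones that need the second two-byte block; everything in the Basic Multilingual Plane fits in one.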
Yes, if you consider a character to be what you see. The same
displayed character might be underlaid with two bytes or with up to 6
(even 8?) bytes. For example, the GREEK SMALL LETTER OMEGA WITH
DASIA AND PERISPOMENI AND YPOGEGRAMMENI can be the single
ᾧ
or a plain omega with three combining characters from the 03 table,
which I haven't time to identify at the moment.
ω + combining.. + combining.. + combining..
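
The decomposed sequence can be listed with any Unicode-aware tool. As a rough sketch (Python and its standard unicodedata module are an assumption here; the thread itself names no tool), the following decomposes the single character and prints each resulting code point with its name, then compares the UTF-16 byte counts of the two forms.

```python
import unicodedata

# The precomposed character discussed above.
composed = "\u1fa7"

# NFD (canonical decomposition) splits it into the base letter plus combining marks.
decomposed = unicodedata.normalize("NFD", composed)

for ch in decomposed:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Byte counts in UTF-16 (big-endian, so no byte-order mark is included).
print("composed bytes:  ", len(composed.encode("utf-16-be")))
print("decomposed bytes:", len(decomposed.encode("utf-16-be")))

# Recomposing with NFC should give back the single character.
print("round trip ok:", unicodedata.normalize("NFC", decomposed) == composed)
```

Because the two forms are canonically equivalent, normalizing both sides to the same form before comparing or searching is the usual way to make them match.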
Note that Safari does not seem to deal properly with the composed
version, though OmniWeb does.
> Out of interest, John, do Perl regexes understand Unicode, or are
> they strictly old-school one-byte-one-character? (If they do, what's
> their performance like?)
From 5.8.0 onwards, yes. I upgraded to 5.8.0 mainly for its better
implementation of Unicode, though I must admit I haven't done as much
work yet with it as I had planned. As to performance, I can think of
no reason why it should suffer.
JD