Re: Unicode search

  • Subject: Re: Unicode search
  • From: John Delacour <email@hidden>
  • Date: Fri, 21 Mar 2003 14:43:59 +0000
  • Mac-eudora-version: 6.0a11

At 1:23 am +0000 21/3/03, has wrote:

>> Every character in Unicode proper
>> consists of two bytes (or 4 in the case of UTF-32) and the length is
>> not variable.
>
> Not quite: in UTF-16, characters may consist of either one or two two-byte blocks.
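
The point about one or two two-byte blocks can be checked with a minimal Python sketch (Python rather than AppleScript or Perl, purely for illustration). The helper name `utf16_units` and the two sample characters are assumptions chosen for this example, not anything from the thread.

```python
def utf16_units(ch):
    """Return how many 16-bit code units UTF-16 needs for the string ch."""
    # "utf-16-be" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-16-be")) // 2

bmp_char = "\u03c9"         # GREEK SMALL LETTER OMEGA, inside the Basic Multilingual Plane
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

for ch in (bmp_char, astral_char):
    units = utf16_units(ch)
    print(f"U+{ord(ch):04X}: {units} code unit(s), {units * 2} bytes")
```

The omega needs one two-byte block; the G clef needs two (a surrogate pair), so it occupies four bytes.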

Yes, if you consider a character to be what you see. The same displayed character might be underlaid with two bytes in UTF-16 or, when it is built from a base letter plus combining characters, with 6 or even 8 bytes. For example, the GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI can be the single

&#x1fa7;

or a plain omega with three combining characters from the 03 table (the U+03xx range), which I haven't time to identify at the moment.


&#x3c9; + combining.. + combining.. + combining..

Note that Safari does not seem to deal properly with the composed version, though OmniWeb does.
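
The two forms described above can be compared with a short Python sketch using the standard `unicodedata` module. It asks the Unicode database for the canonical decomposition (NFD) of U+1FA7 rather than hard-coding the combining characters; the helper names `describe` and `utf16_bytes` are hypothetical, introduced only for this illustration.

```python
import unicodedata

# The single precomposed character:
# GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
composed = "\u1fa7"

# Canonical decomposition: a plain omega followed by combining characters.
decomposed = unicodedata.normalize("NFD", composed)

def describe(s):
    """List each code point in s with its official Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

def utf16_bytes(s):
    """Number of bytes s occupies in UTF-16 (no byte-order mark)."""
    return len(s.encode("utf-16-be"))

for label, s in (("composed", composed), ("decomposed", decomposed)):
    print(f"{label}: {utf16_bytes(s)} bytes in UTF-16")
    for line in describe(s):
        print("   ", line)

# Recomposing (NFC) gives back the single precomposed character.
assert unicodedata.normalize("NFC", decomposed) == composed
```

The decomposed form is U+03C9 followed by U+0314, U+0342 and U+0345, so in UTF-16 the same displayed character takes 2 bytes when precomposed and 8 bytes when decomposed.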


> Out of interest, John, do Perl regexes understand Unicode, or are they strictly old-school one-byte-one-character? (If they do, what's their performance like?)

From 5.8.0 onwards, yes. I upgraded to 5.8.0 mainly for its better implementation of Unicode, though I must admit I haven't done as much work yet with it as I had planned. As to performance, I can think of no reason why it should suffer.
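
What "understanding Unicode" means for a search can be illustrated with a rough sketch, written here in Python rather than Perl. It assumes one possible approach, normalizing both the pattern and the text to NFC before matching, so that the precomposed and the combining-character spellings of the same displayed character are both found. The helper names `nfc` and `find_all` are hypothetical, and the pattern is a plain literal, not a general regular expression.

```python
import re
import unicodedata

def nfc(s):
    """Normalize s to Unicode Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", s)

def find_all(literal, text):
    """Find a literal string in text, ignoring composed/decomposed differences."""
    # re.escape keeps the sketch safe if the literal contains regex metacharacters.
    return re.findall(re.escape(nfc(literal)), nfc(text))

composed = "\u1fa7"                      # single precomposed character
decomposed = "\u03c9\u0314\u0342\u0345"  # omega + three combining characters

text = "a" + decomposed + "b" + composed

naive_hits = re.findall(re.escape(composed), text)  # code-point-for-code-point match
normalized_hits = find_all(composed, text)          # match after normalization

print(len(naive_hits), len(normalized_hits))
```

The naive search finds only the precomposed occurrence, while the normalized search finds both.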

JD
_______________________________________________
applescript-users mailing list | email@hidden
