Re: 'sort' command-alternative?

  • Subject: Re: 'sort' command-alternative?
  • From: bill fancher <email@hidden>
  • Date: Sun, 6 Oct 2002 15:14:09 -0700

On Saturday, October 5, 2002, at 03:32 PM, John Delacour wrote:

At 12:11 pm -0700 5/10/02, bill fancher wrote:

Is it Unicode happy (as in AS's Unicode text)?

I've been hoping some Unicode expert would answer. AFAIK sort isn't Unicode happy, at least in the sense that it doesn't correctly recognize a string in two different encodings, or with two different "spellings", as being the same. But in this particular case, Finder screws things up before sort gets a chance to.

I'm not sure what either of you mean by sorting in Unicode, even if you're talking of the Latin-1 part of the table, let alone other languages. I doubt if there is yet any routine in existence to do an alphabetical or lexical sort of polytonic Greek, and the same probably goes even for a list of European words. I must say I've not investigated the possibility very deeply, but I'd imagine special routines would be needed for each language. Hanzi/Kanji are arranged according to stroke count. Whether anyone has even begun writing routines to sort according to pinyin or Japanese Roman, I doubt very much.

The basis for Unicode sorting is The Unicode Collation Algorithm:

"The Unicode Collation Algorithm (UCA) provides a specification for how to compare two Unicode strings while remaining conformant to the requirements of The Unicode Standard, Version 3.0 . The UCA also supplies the Default Unicode Collation Element Table as the data specifying the default collation order." <http://www.unicode.org/unicode/reports/tr10/>

There are built-in routines for this in Mac OS X. That's how Finder can sort file names correctly, regardless of the locale. Unfortunately, developer.apple.com is down at the moment. The basic system call is UCCompareText. If you've got developer tools installed, the docs are at <file://localhost/Developer/Documentation/Carbon/text/UnicodeUtilities/Unicode_Utilities_Ref/Functions/FunctionGroups.html>.

It's ugly, but not too bad, given the complexity of the problem at hand.

Since the Finder uses combining characters for accents and not pre-composed characters, the sorting of files in the Finder is theoretically that much simpler, but the whole business seems to me extremely complicated and probably quite impossible.
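For anyone wanting to see the combining vs. pre-composed distinction concretely, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration of the two "spellings" of one accented letter, not a description of what Finder does internally; same_text is a hypothetical helper name made up for this example.

```python
import unicodedata

precomposed = "\u00e9"   # e-acute as a single pre-composed code point
combining = "e\u0301"    # plain e followed by COMBINING ACUTE ACCENT

# The two spellings look identical on screen but differ code point by code point.
print(precomposed == combining)

# Normalizing both to the same form (NFD = decomposed, NFC = composed)
# makes them comparable.
nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", combining)


def same_text(a, b):
    """Hypothetical helper: compare two strings after decomposing both."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(same_text(precomposed, combining))
```

Normalizing before comparing settles whether two strings are the same text; it does not by itself decide which of two different strings should sort first.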

Sorting Unicode is pretty straightforward in Python. To sort a list of three Unicode strings:

l = [u'def', u'ghi', u'abc']
l.sort()   # sorts the list in place and returns None
print(l)   # --> ['abc', 'def', 'ghi'] (shown as [u'abc', ...] under Python 2)

That should work with pinyin (whatever THAT is).
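One caveat worth spelling out: the plain list sort compares strings code point by code point, which is not the same thing as the UCA ordering. The sketch below contrasts the two on a few sample words. toy_sort_key is a hypothetical helper invented for this illustration; it only strips accents and folds case as a rough first pass, and it is in no way an implementation of the UCA or of UCCompareText.

```python
import unicodedata


def toy_sort_key(s):
    """Hypothetical helper: rough accent- and case-insensitive key (NOT the UCA)."""
    decomposed = unicodedata.normalize("NFD", s)
    # Drop combining marks so accented letters sort with their base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The decomposed original is kept as a tie-breaker for equal base strings.
    return (base.casefold(), decomposed)


# Sample data made up for the example; \u00e9 is a pre-composed e-acute.
words = ["rope", "r\u00e9sum\u00e9", "Zebra", "apple", "resume"]

by_code_point = sorted(words)                 # what a plain sort does
by_toy_key = sorted(words, key=toy_sort_key)  # accents and case ignored first

print(by_code_point)
print(by_toy_key)
```

With the plain sort, "Zebra" lands before "apple" and the accented word drifts to the end, because capitals and accented letters have code points outside the lowercase a-z range. Real language-aware ordering needs a collation library or the system routines mentioned above.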

There are probably pages at the Unicode site where these questions are dealt with -- and for the sake of my mental health I don't plan to visit them.

Don't click the above link then (82k of text to explain comparing two strings).

--
bill
