Re: 'sort' command-alternative?
- Subject: Re: 'sort' command-alternative?
- From: bill fancher <email@hidden>
- Date: Sun, 6 Oct 2002 15:14:09 -0700
On Saturday, October 5, 2002, at 03:32 PM, John Delacour wrote:
At 12:11 pm -0700 5/10/02, bill fancher wrote:
Is it Unicode happy (as in AS's Unicode text)?
I've been hoping some Unicode expert would answer. AFAIK sort isn't
Unicode happy, at least in the sense that it doesn't correctly
recognize a string in two different encodings, or with two different
"spellings", as being the same. But in this particular case, Finder
screws things up before sort gets a chance to.
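To illustrate the two "spellings" problem: a minimal sketch in Python 3, using only the standard unicodedata module. The variable names are made up for the example. It shows one accented letter written two ways, and that the two only compare equal after normalization.

```python
import unicodedata

# The same visible character, "e with acute accent", spelled two ways.
precomposed = "\u00e9"    # single code point: LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# A plain comparison sees two different code point sequences.
print(precomposed == combining)   # False

# Normalizing both to the same form (NFC here) makes them compare equal.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", combining)
print(nfc_a == nfc_b)             # True
```

A tool that compares raw bytes or code points, without a normalization step like this, will treat the two spellings as different strings.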
I'm not sure what either of you mean by sorting in Unicode, even if
you're talking of the Latin-1 part of the table, let alone other
languages. I doubt if there is yet any routine in existence to do an
alphabetical or lexical sort of polytonic Greek and the same goes
probably even for a list of European words. I must say I've not
investigated the possibility very deeply, but I'd imagine special
routines would be needed for each language. Hanzi/Kanji are arranged
according to stroke count. Whether anyone has even begun writing
routines to sort according to pinyin or Japanese Roman, I doubt very
much.
The basis for Unicode sorting is The Unicode Collation Algorithm:
"The Unicode Collation Algorithm (UCA) provides a specification for how
to compare two Unicode strings while remaining conformant to the
requirements of The Unicode Standard, Version 3.0. The UCA also
supplies the Default Unicode Collation Element Table as the data
specifying the default collation order."
<http://www.unicode.org/unicode/reports/tr10/>
There are built-in routines for this in Mac OS X. That's how Finder can
sort file names correctly, regardless of the locale. Unfortunately,
developer.apple.com is down at the moment. The basic system call is
UCCompareText. If you've got developer tools installed, the docs are at
<file://localhost/Developer/Documentation/Carbon/text/UnicodeUtilities/Unicode_Utilities_Ref/Functions/FunctionGroups.html>.
It's ugly, but not too bad, given the complexity of the problem at hand.
Since the Finder uses combining characters for accents and not
pre-composed characters, the sorting of files in the Finder is
theoretically that much simpler, but the whole business seems to me
extremely complicated and probably quite impossible.
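Sorting Unicode strings is pretty straightforward in Python, as long as
plain code point order is good enough. To sort a list of three Unicode
strings:
l = [u'def', u'ghi', u'abc']
l.sort()
l
--> [u'abc', u'def', u'ghi']
That should work with pinyin (whatever THAT is), at least when it is
written in plain ASCII without tone marks.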
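Code point order and UCA order can differ quite a bit once capitals or accents are involved. A minimal sketch in Python 3, standard library only: rough_key is a hypothetical helper invented for this example, and it is only a crude approximation (strip combining marks, ignore case), not an implementation of the UCA. For real collation one would want a library that implements the UCA, such as ICU.

```python
import unicodedata

words = ["zebra", "Zulu", "apple", "\u00e9clair", "eclair"]

# Built-in sort: compares code points, so "Z" sorts before "a"
# and "\u00e9" (e-acute) sorts after "z".
by_code_point = sorted(words)
print(by_code_point)
# ['Zulu', 'apple', 'eclair', 'zebra', 'éclair']


def rough_key(s):
    """Hypothetical helper: crude, accent- and case-insensitive sort key.

    Decomposes the string (NFD), drops combining marks, and case-folds.
    The original string is kept as a tie-breaker so the order is stable
    and deterministic. This is NOT the Unicode Collation Algorithm.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


by_rough_key = sorted(words, key=rough_key)
print(by_rough_key)
# ['apple', 'eclair', 'éclair', 'zebra', 'Zulu']
```

The second ordering is closer to what a reader of a European-language word list would expect, but language-specific rules (which is what the UCA tailorings are for) are still not handled.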
There are probably pages at the Unicode site where these questions
are dealt with -- and for the sake of my mental health I don't plan to
visit them.
Don't click the above link then (82k of text to explain comparing two
strings).
--
bill