Re: 'sort' command-alternative?
- Subject: Re: 'sort' command-alternative?
- From: Paul Berkowitz <email@hidden>
- Date: Sun, 06 Oct 2002 02:07:24 -0700
On 10/5/02 3:32 PM, "John Delacour" <email@hidden> wrote:
> At 12:11 pm -0700 5/10/02, bill fancher wrote:
>
>>> Is it Unicode happy (as in AS's Unicode text)?
>>
>> I've been hoping some Unicode expert would answer. AFAIK sort isn't
>> Unicode happy, at least in the sense that it doesn't correctly
>> recognize a string in two different encodings, or with two different
>> "spellings", as being the same. But in this particular case, Finder
>> screws things up before sort gets a chance to.
>
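A small illustration of the "two different spellings" problem, written in Python purely as a neutral demonstration (it says nothing about how Unix sort or the Finder work internally): "é" can be stored as one precomposed code point or as "e" followed by a combining accent. The two compare unequal until both are normalized to the same form. The helper names same_text and normalized_sort are made up for this sketch.

```python
import unicodedata

def same_text(a, b):
    """True if a and b are equal once both are normalized to NFC (precomposed) form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def normalized_sort(words):
    """Sort strings by their NFC form, so both spellings of a word sort the same way."""
    return sorted(words, key=lambda w: unicodedata.normalize("NFC", w))

precomposed = "caf\u00e9"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, U+0301

print(precomposed == decomposed)           # False: different code-point sequences
print(same_text(precomposed, decomposed))  # True after normalization
```

Normalizing before comparing is what lets a sort treat both spellings as the same string; without it, the decomposed "café" can land in a different place from the precomposed one.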
> I'm not sure what either of you mean by sorting in Unicode, even if
> you're talking of the Latin-1 part of the table, let alone other
> languages. I doubt if there is yet any routine in existence to do an
> alphabetical or lexical sort of polytonic Greek and the same goes
> probably even for a list of European words. I must say I've not
> investigated the possibility very deeply, but I'd imagine special
> routines would be needed for each language. Hanzi/Kanji are arranged
> according to stroke count. Whether anyone has even begun writing
> routines to sort according to pinyin or Japanese Roman, I doubt very
> much.
>
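The idea that each language needs its own routine can be sketched with a per-language alphabet table. This is a deliberately simplified, hypothetical Python example: SPANISH_ALPHABET and spanish_key are made up for illustration, and real collation schemes (for instance the Unicode Collation Algorithm) handle accents, case and multi-letter units far more carefully. Modern Spanish treats ñ as a letter between n and o, while plain code-point order puts it after z.

```python
# Hypothetical per-language alphabet: modern Spanish places ñ between n and o.
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(SPANISH_ALPHABET)}

def spanish_key(word):
    """Map each letter to its position in the alphabet above.
    Characters not in the table sort after every known letter, by code point."""
    return [RANK.get(ch, len(SPANISH_ALPHABET) + ord(ch)) for ch in word.lower()]

words = ["zorro", "ñu", "oso", "nube"]
print(sorted(words))                   # ['nube', 'oso', 'zorro', 'ñu']
print(sorted(words, key=spanish_key))  # ['nube', 'ñu', 'oso', 'zorro']
```

Supporting another language means supplying another table (and, in practice, many more rules), which is exactly why a single routine does not fit every language.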
> Since the Finder uses combining characters for accents and not
> pre-composed characters, the sorting of files in the Finder is
> theoretically that much simpler, but the whole business seems to me
> extremely complicated and probably quite impossible. There are
> probably pages at the Unicode site where these questions are dealt
> with -- and for the sake of my mental health I don't plan to visit
> them.

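If names really do arrive in decomposed form, as described above for the Finder, one simple (and lossy) approach is to decompose, discard the combining marks, and sort on the base letters. A minimal Python sketch; base_letter_key is a hypothetical helper, and dropping accents entirely is a simplification, not any language's actual rule.

```python
import unicodedata

def base_letter_key(s):
    """Decompose to NFD, drop combining marks (accents), and lowercase,
    so 'Émile' and 'emile' share the same sort key."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()

names = ["Zoë", "émile", "Eve", "zebra"]
print(sorted(names))                       # ['Eve', 'Zoë', 'zebra', 'émile']
print(sorted(names, key=base_letter_key))  # ['émile', 'Eve', 'zebra', 'Zoë']
```

Because the accents are separate characters after decomposition, they can simply be filtered out; with precomposed characters a lookup table would be needed instead.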
You could use Unicode's own indexing (the Unicode number, or code point)
to impose a simple ordering. Within the few European languages I've looked
at, it seems to present the "ordinary" characters first, in their usual
alphabetic order where there is one, followed by special characters in
who-knows-what order. I'm sure there must be a few languages where there is
no recognized order and they've used some principles of their own (or maybe
there's even an element of arbitrariness). But for a consistent, universal
sorting, you couldn't ask for better than the Unicode number.

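Sorting by Unicode number is easy to demonstrate. In Python a string already compares character by character on code points, so the explicit key below only makes that visible; code_point_key is a name made up for this sketch. Note what it implies: capitals sort before lowercase letters, and accented letters land after z.

```python
def code_point_key(s):
    """Sort key: the Unicode number (code point) of each character, in order."""
    return [ord(ch) for ch in s]

words = ["zebra", "Äpfel", "apple", "Banana"]
by_code_point = sorted(words, key=code_point_key)
print(by_code_point)  # ['Banana', 'apple', 'zebra', 'Äpfel']
```

The ordering is consistent and language-neutral, but it is not alphabetical in any particular language's sense.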
There are large sections of the Unicode code space reserved for "private
use" characters, which will never be overwritten or claimed for universal
characters. So you'd always be skipping those, unless you were in one of
these private concerns (in which case you'd be in just one and could fit
your set in along with the universal set).

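The private-use ranges are fixed by the Unicode Standard: U+E000 to U+F8FF in the Basic Multilingual Plane, plus two supplementary areas in planes 15 and 16. A minimal Python sketch for recognizing them; PRIVATE_USE_RANGES and is_private_use are names made up for this illustration, and the result is cross-checked against the standard library's general category "Co" (private use).

```python
import unicodedata

# The three Private Use Areas defined by the Unicode Standard (inclusive ranges).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_private_use(ch):
    """True if the character's code point falls in a Private Use Area."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

# Cross-check against the standard library: private-use characters have category "Co".
for sample in ["\ue000", "\uf8ff", "A", "\u00e9"]:
    assert is_private_use(sample) == (unicodedata.category(sample) == "Co")

print(is_private_use("\ue000"))  # True
print(is_private_use("A"))       # False
```

Since no standard character will ever be assigned inside these ranges, a private character set placed there cannot collide with the universal set in a code-point sort.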
Is there any way in AppleScript to get the Unicode number of a given
character? 'AE Print' in Script Debugger's result window gives the
double-byte hex form: perhaps there's a way to do that in AppleScript
itself? ('as data' doesn't do it.) Those numbers are quite sortable after
converting to decimal.
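For comparison, the conversions asked about here are one-liners in Python (shown only as an illustration; unicode_number is a made-up wrapper around the built-in ord). Later AppleScript releases (AppleScript 2.0, Mac OS X 10.5) added `id of` and `character id` for the same purpose, so `id of "é"` returns 233.

```python
def unicode_number(ch):
    """Return the Unicode number (code point) of a single character."""
    return ord(ch)

ch = "\u00e9"                        # 'é'
cp = unicode_number(ch)
print(cp)                            # 233
print(f"{cp:04X}")                   # 00E9: the four-digit hex form
print(ch.encode("utf-16-be").hex())  # 00e9: the two UTF-16 bytes for this character
print(int("00E9", 16))               # 233: hex converted back to decimal
```

For characters outside the Basic Multilingual Plane the UTF-16 bytes form a surrogate pair rather than the code point itself, so the double-byte hex form only equals the Unicode number for characters up to U+FFFF.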
--
Paul Berkowitz
_______________________________________________
applescript-users mailing list | email@hidden