Re: characterAtIndex: method and composite characters (SOLVED)
- Subject: Re: characterAtIndex: method and composite characters (SOLVED)
- From: "Ewan Delanoy" <email@hidden>
- Date: Thu, 5 Apr 2007 09:01:43 +0200 (CEST)
- Importance: Normal
>If you are sorting user-visible strings, it's
>usually best to use the user's preferred sort ordering, which the
>locale-sensitive compare methods will do.
The original reason for my post was that someone asked me to
implement the special character ordering described below (really a
string ordering, for sorting purposes, but that is a detail). As far
as I can see, none of the -compare: methods of NSString is of any use
in this situation. I append my code here; its obvious flaw is that it
relies heavily on properties of the current unichar encoding, but I
don't see how to avoid this. Any suggestions are welcome...
Ewan
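On the encoding worry: Unicode defines its first 128 code points (U+0000 to U+007F) to be identical to ASCII, and a unichar holds a UTF-16 code unit, so 'A' to 'Z' are always 65 to 90 and 'a' to 'z' are always 97 to 122. The fragile part is writing those values as bare numbers; spelling the range with character literals states the assumption openly and also leaves digits and punctuation alone. A minimal sketch in plain C (which is also valid Objective-C); `fold_ascii_upper` is a hypothetical helper name, not part of Foundation:

```c
#include <stdint.h>

/* Hypothetical helper: fold an ASCII uppercase letter to lowercase.
   Every other code unit (digits, punctuation, lowercase letters,
   anything above U+007F) is returned unchanged. */
static uint16_t fold_ascii_upper(uint16_t cc)
{
    if (cc >= 'A' && cc <= 'Z') {
        cc = (uint16_t)(cc + ('a' - 'A'));   /* same as adding 32 */
    }
    return cc;
}
```

For instance, `fold_ascii_upper('Z')` returns 'z' and `fold_ascii_upper('5')` returns '5', whereas a bare test like `cc < 90` would skip 'Z' (90) and shift '5' (53) to 'U' (85).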
The task: implement a total ordering on Unicode characters
(as a C function of type
NSComparisonResult compareCharacters(unichar c1, unichar c2))
that meets the following specifications:
1) If the case-insensitive cores of c1 and c2 are different,
then c1 and c2 are in the same order as their cores (the case-insensitive
core of "A" with any number of accents and additional symbols is "a"). Also,
cores must be ordered in such a way that the (uppercase or lowercase)
English alphabet is an increasing sequence.
2) If the cores are identical but one of the characters is lowercase
and the other is uppercase, the lowercase one is smaller.
3) The bare character "a" (or "A") is smaller than any combination of it
with accents, diacritics or other symbols.
4) Behaviour unspecified in all the remaining cases.
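Read together, rules 1 to 3 amount to comparing a three-part key: the core letter folded to lowercase, then the case of the core (lowercase first), then the raw code unit. The last component puts the bare letter ahead of its accented forms because every precomposed Latin letter lies above U+007F. The sketch below is plain C (so also valid Objective-C); the `CharKey` struct and the `make_key`/`compare_keys` names are hypothetical, and the caller is assumed to supply the core character already:

```c
#include <stdint.h>

/* Hypothetical sort key for one UTF-16 code unit. The core is supplied by
   the caller (e.g. the first code unit of its canonical decomposition). */
typedef struct {
    uint16_t folded;  /* core folded to lowercase: rule 1 */
    int      upper;   /* 1 if the core is an uppercase A-Z letter: rule 2 */
    uint16_t code;    /* original code unit: rule 3 tie-break */
} CharKey;

static CharKey make_key(uint16_t c, uint16_t core)
{
    CharKey k;
    k.upper  = (core >= 'A' && core <= 'Z');
    k.folded = k.upper ? (uint16_t)(core + ('a' - 'A')) : core;
    k.code   = c;
    return k;
}

/* Returns -1, 0 or 1, the same values as NSOrderedAscending,
   NSOrderedSame and NSOrderedDescending. */
static int compare_keys(CharKey a, CharKey b)
{
    if (a.folded != b.folded) return a.folded < b.folded ? -1 : 1;
    if (a.upper  != b.upper)  return a.upper  < b.upper  ? -1 : 1;
    if (a.code   != b.code)   return a.code   < b.code   ? -1 : 1;
    return 0;
}
```

For example, `compare_keys(make_key(0x00E0, 'a'), make_key('A', 'A'))` is -1 ("à" before "A"), and `compare_keys(make_key('a', 'a'), make_key(0x00E0, 'a'))` is also -1 (bare "a" first). Comparing whole keys component by component is what makes the order total, even though rule 4 leaves the last step unspecified.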
How I did it:
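#import <Foundation/Foundation.h>

// Plain code-unit comparison, used as the tie-breaker below.
NSComparisonResult compareCharactersNaively(unichar c1, unichar c2)
{
    if (c1 < c2) {
        return NSOrderedAscending;
    }
    if (c2 < c1) {
        return NSOrderedDescending;
    }
    return NSOrderedSame;
}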
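// Returns the first code unit of the canonical decomposition (NFD) of c,
// e.g. 'e' for U+00E9 (é). Characters with no canonical decomposition
// (such as U+00C6, Æ) come back unchanged.
unichar coreCharacter(unichar c)
{
    NSString *s = [NSString stringWithCharacters:&c length:1];
    NSString *decomposed_s = [s decomposedStringWithCanonicalMapping];
    return [decomposed_s characterAtIndex:0];
}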
unichar lowercaseCoreCharacter(unichar c)
{
    unichar cc = coreCharacter(c);
    // We assume that c is a Latin character with any number of accents and
    // other symbols. Then cc will be one of the 52 characters a, b, c, d, e, ..., z, A, B, ..., Z.
    // In the current unichar encoding, cc will be in [97..122] or [65..90].
    if (cc >= 'A' && cc <= 'Z') { /* cc is an uppercase Latin character: between 65 and 90 */
        cc += 'a' - 'A'; /* make cc lowercase (adds 32) */
    }
    return cc;
}
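NSComparisonResult compareCharacters(unichar c1, unichar c2)
{
    // Rule 1: compare the case-insensitive cores.
    unichar lc1 = lowercaseCoreCharacter(c1);
    unichar lc2 = lowercaseCoreCharacter(c2);
    if (lc1 != lc2) {
        return compareCharactersNaively(lc1, lc2);
    }
    // Rule 2: same letter, different case. The arguments are swapped on
    // purpose: 'a' (97) is above 'A' (65), but lowercase must sort first.
    unichar cc1 = coreCharacter(c1);
    unichar cc2 = coreCharacter(c2);
    if (cc1 != cc2) {
        return compareCharactersNaively(cc2, cc1);
    }
    // Rule 3: the bare letter is ASCII, so it is below every precomposed
    // accented form; any other tie falls back to code-unit order (default).
    return compareCharactersNaively(c1, c2);
}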
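Since the real goal was sorting strings, here is a Foundation-free sketch of the same comparator, extended lexicographically to UTF-16 strings and written in plain C so it can be tried outside Cocoa. The assumptions are mine, not the poster's: a small hand-written table stands in for -decomposedStringWithCanonicalMapping and covers only U+00C0 to U+00FF; a string that is a prefix of another sorts first; and `unichar16`, `core_of`, `compare_chars`, `compare_strings` and `qsort_chars` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>   /* qsort, for use with qsort_chars */

typedef uint16_t unichar16;   /* same width as Foundation's unichar */

/* Base letter for each code point U+00C0..U+00FF, or '\0' when it has no
   canonical decomposition (e.g. U+00C6 Æ, U+00D7 ×, U+00DF ß). */
static const char kLatin1Core[] =
    "AAAAAA" "\0" "C" "EEEE" "IIII" "\0" "N" "OOOOO" "\0\0" "UUUU" "Y" "\0\0"
    "aaaaaa" "\0" "c" "eeee" "iiii" "\0" "n" "ooooo" "\0\0" "uuuu" "y" "\0" "y";

/* Stand-in for coreCharacter(); anything the table does not cover,
   including letters from other blocks, is returned unchanged. */
static unichar16 core_of(unichar16 c)
{
    if (c >= 0xC0 && c <= 0xFF && kLatin1Core[c - 0xC0] != '\0')
        return (unichar16)kLatin1Core[c - 0xC0];
    return c;
}

static int is_ascii_upper(unichar16 c) { return c >= 'A' && c <= 'Z'; }

/* Same three steps as compareCharacters: folded core, then case of the
   core (lowercase first), then the raw code unit. Returns -1, 0 or 1. */
static int compare_chars(unichar16 c1, unichar16 c2)
{
    unichar16 k1 = core_of(c1), k2 = core_of(c2);
    unichar16 f1 = is_ascii_upper(k1) ? (unichar16)(k1 + ('a' - 'A')) : k1;
    unichar16 f2 = is_ascii_upper(k2) ? (unichar16)(k2 + ('a' - 'A')) : k2;
    if (f1 != f2) return f1 < f2 ? -1 : 1;
    if (k1 != k2) return is_ascii_upper(k1) ? 1 : -1;
    if (c1 != c2) return c1 < c2 ? -1 : 1;
    return 0;
}

/* Lexicographic extension to strings of n1 and n2 code units. */
static int compare_strings(const unichar16 *s1, size_t n1,
                           const unichar16 *s2, size_t n2)
{
    size_t n = n1 < n2 ? n1 : n2;
    for (size_t i = 0; i < n; i++) {
        int r = compare_chars(s1[i], s2[i]);
        if (r != 0) return r;
    }
    return (n1 > n2) - (n1 < n2);   /* a proper prefix sorts first */
}

/* Adapter so an array of code units can be sorted with qsort. */
static int qsort_chars(const void *a, const void *b)
{
    return compare_chars(*(const unichar16 *)a, *(const unichar16 *)b);
}
```

For example, `qsort(buf, 7, sizeof buf[0], qsort_chars)` on {b, À, A, à, a, á, B} yields a, à, á, A, À, b, B. Because every step compares one component of the same key, the result is a total order, which qsort requires of its comparator.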
_______________________________________________
Cocoa-dev mailing list (email@hidden)