Re: Cocoa-dev Digest, Vol 2, Issue 21
- Subject: Re: Cocoa-dev Digest, Vol 2, Issue 21
- From: Ricky Sharp <email@hidden>
- Date: Wed, 5 Jan 2005 21:11:41 -0600
On Jan 5, 2005, at 8:50 PM, Simon alias Trax wrote:
Here's a real-life example:
NSString *mot1 = @"arc";
NSString *mot2 = @"a";
NSString *mot3 = @"à";
Unless things have changed recently, GCC isn't UTF-8 aware and may
be doing strange things to your text. Try reading your strings in
from a file or from the UI instead.
I think (mostly) everyone here is missing the point. The accented
"a", also known as U+00E0, does not compare before the word "arc" as
it should (especially since it compares equal to a non-accented
"a"). The original poster had a legitimate problem that had nothing
to do with file encodings, yet file encodings seem to be all anyone
is talking about. The question is whether it is a bug that the
accented "a" compares after the word "arc", or a misunderstanding.
Brendan Younger
That's quite possible. Either way, let me explain a little more.
My array of strings is created like this:
// Split the file's contents into words, one per "\r"-separated line
dict = [[NSArray alloc] initWithArray:
           [[NSString stringWithContentsOfFile:fich]
               componentsSeparatedByString:@"\r"]];
In the process, another array is created with strings taken from that
big array, and then sorted. I tried with a small array (3 elements;
see the code snippet at the top), and I get the wrong order. My case
is French, which I think differs from Swedish (with letters like å).
In French, accented letters are treated as equivalent to their
unaccented counterparts, but IF two words are identical except for
their accented letter(s), the accented one comes after. Here's another
example I tried within my app that still doesn't work:
mur
mûr
muse
This is the correct alphabetical order. BUT
sortUsingSelector:@selector(compare:) gives me this:
mur
muse
mûr
Since "û" is supposed to be treated like "u", and "r" comes before
"s", "mûr" should come before "muse", but it doesn't. The same thing
happens with sortUsingSelector:@selector(localizedCompare:). The
computer behaves as if "à" (or any other accented letter) were a
completely different letter that apparently comes after "z". But
that's not the case, at least in French (unlike Swedish, for example,
where "å" comes after "z").
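The French rule described above (compare letters while ignoring accents, and let accents decide only when the words are otherwise identical) is a two-level comparison. Here is a minimal sketch in C, which is also valid Objective-C, assuming lowercase UTF-8 input whose accented letters come from the Latin-1 range (U+00E0 to U+00FF). The helper names (next_cp, base_letter, french_compare, french_sort) are hypothetical, and this is not how NSString implements comparison. Some French dictionary conventions examine accents starting from the end of the word; this sketch scans left to right for simplicity.

```c
#include <stdlib.h>

/* Decode one code point from UTF-8, handling only 1- and 2-byte
   sequences (enough for ASCII and Latin-1). Advances *s. */
static unsigned next_cp(const unsigned char **s) {
    unsigned c = **s;
    if ((c & 0xE0u) == 0xC0u && ((*s)[1] & 0xC0u) == 0x80u) {
        unsigned cp = ((c & 0x1Fu) << 6) | ((*s)[1] & 0x3Fu);
        *s += 2;
        return cp;
    }
    (*s)++;   /* ASCII, or a byte we do not decode: take it as-is */
    return c;
}

/* Map lowercase Latin-1 accented letters to their base letter.
   Hypothetical, deliberately incomplete table (lowercase only). */
static unsigned base_letter(unsigned cp) {
    if (cp >= 0xE0 && cp <= 0xE5) return 'a';   /* à á â ã ä å */
    if (cp == 0xE7) return 'c';                  /* ç */
    if (cp >= 0xE8 && cp <= 0xEB) return 'e';   /* è é ê ë */
    if (cp >= 0xEC && cp <= 0xEF) return 'i';   /* ì í î ï */
    if (cp == 0xF1) return 'n';                  /* ñ */
    if (cp >= 0xF2 && cp <= 0xF6) return 'o';   /* ò ó ô õ ö */
    if (cp >= 0xF9 && cp <= 0xFC) return 'u';   /* ù ú û ü */
    if (cp == 0xFD || cp == 0xFF) return 'y';    /* ý ÿ */
    return cp;
}

/* Two-level comparison: letters first (accents ignored), then accents.
   Returns -1, 0, or 1. */
int french_compare(const char *a, const char *b) {
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Level 1: base letters only, so "mûr" vs "muse" acts like "mur" vs "muse". */
    while (*p && *q) {
        unsigned x = base_letter(next_cp(&p));
        unsigned y = base_letter(next_cp(&q));
        if (x != y) return x < y ? -1 : 1;
    }
    if (*p || *q) return *p ? 1 : -1;   /* a prefix sorts first: "à" < "arc" */

    /* Level 2: same letters, so the first accent difference decides.
       Unaccented letters have lower code points, so they sort first. */
    p = (const unsigned char *)a;
    q = (const unsigned char *)b;
    while (*p && *q) {
        unsigned x = next_cp(&p);
        unsigned y = next_cp(&q);
        if (x != y) return x < y ? -1 : 1;
    }
    return 0;
}

/* qsort adapter: the array holds const char * elements. */
static int french_cmp(const void *a, const void *b) {
    return french_compare(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of UTF-8 words with french_compare. */
void french_sort(const char **words, size_t n) {
    qsort(words, n, sizeof words[0], french_cmp);
}
```

Sorting "muse", "mûr", "mur" with french_sort yields mur, mûr, muse, and "à" sorts before "arc". A real application would be better served by a locale-aware collation service than by hand-written tables like base_letter.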
I believe this is because compare: uses NSLiteralSearch as its default
option. Here's what the documentation says that option does:
"Performs a byte-for-byte comparison. Differing literal sequences (such
as composed character sequences) that would otherwise be considered
equivalent are considered not to match. Using this option can speed
some operations dramatically."
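To see why a literal comparison puts "mûr" after "muse" and "à" after "z", here is a byte-wise analogy in C using strcmp on UTF-8 strings. NSString's literal comparison presumably works on Unicode characters rather than UTF-8 bytes, so this only approximates it; either way "à" (U+00E0) and "û" (U+00FB) have higher code values than "z" (U+007A), so for these examples the outcome is the same. The function names literal_compare and literal_sort are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Literal comparison: strcmp compares bytes as unsigned char, so the
   UTF-8 lead byte of an accented Latin-1 letter (0xC3 for U+00C0..U+00FF)
   sorts after every ASCII letter. Returns -1, 0, or 1. */
int literal_compare(const char *a, const char *b) {
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* qsort adapter: the array holds const char * elements. */
static int literal_cmp(const void *a, const void *b) {
    return literal_compare(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of UTF-8 strings in literal (byte) order. */
void literal_sort(const char **words, size_t n) {
    qsort(words, n, sizeof words[0], literal_cmp);
}
```

With literal_sort, the three words come out as mur, muse, mûr, matching the order reported above.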
Also from the docs:
"Search and comparison are currently performed as if the
NSLiteralSearch option were specified. As the Unicode encoding becomes
more widely used, and the need for more flexible comparison increases,
the default behavior will be changed accordingly."
I made a post (last week?) about how others dealt with sorting Unicode
strings. While one can go to some lengths to reproduce the sorting that
Finder does, the advice I got was to just use compare:. According to the
quote above, Apple will most likely change the default behavior to do a
better job.
Definitely file a bug. I also plan to file an enhancement request to
perhaps add a new constant, "NSFinderSearch", which would match the
sorting behavior of Finder.
Finally, my observations from using NSLiteralSearch:
Numbers are sorted according to their numeric values (just like
Finder), e.g.:
1
2
10
Accented Latin characters always appear after A..Z, a..z.
And while I haven't dealt with much Unicode data, I've gotten the same
sorting as Finder with a sample of Japanese, Korean, and Chinese
strings.
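The numeric ordering mentioned above (1, 2, 10) differs from plain byte order, which would give 1, 10, 2 because the character "1" sorts before "2". Here is a minimal sketch in C of a comparison that treats runs of ASCII digits as numbers. The function name natural_compare is hypothetical; it ignores signs, decimal points, and non-ASCII digits, skips leading zeros (so "007" and "7" compare equal), and is not Finder's or Apple's actual algorithm.

```c
#include <ctype.h>
#include <string.h>

/* Compare strings so that runs of ASCII digits compare by numeric value
   ("2" < "10"); all other bytes compare as unsigned char.
   Returns -1, 0, or 1. */
int natural_compare(const char *a, const char *b) {
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && *q) {
        if (isdigit(*p) && isdigit(*q)) {
            /* Skip leading zeros, then measure each digit run. */
            while (*p == '0') p++;
            while (*q == '0') q++;
            size_t lp = 0, lq = 0;
            while (isdigit(p[lp])) lp++;
            while (isdigit(q[lq])) lq++;
            /* A longer run (without leading zeros) is a larger number. */
            if (lp != lq) return lp < lq ? -1 : 1;
            /* Same length: digit-by-digit order equals numeric order. */
            int r = memcmp(p, q, lp);
            if (r != 0) return r < 0 ? -1 : 1;
            p += lp;
            q += lq;
        } else {
            if (*p != *q) return *p < *q ? -1 : 1;
            p++;
            q++;
        }
    }
    if (*p == *q) return 0;   /* both strings ended together */
    return *p ? 1 : -1;       /* the shorter string sorts first */
}
```

For example, natural_compare("2", "10") returns -1, whereas strcmp("2", "10") returns a positive value.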
___________________________________________________________
Ricky A. Sharp mailto:email@hidden
Instant Interactive(tm) http://www.instantinteractive.com
_______________________________________________
Cocoa-dev mailing list (email@hidden)