Re: Core Data dog-slow when using first time after boot
- Subject: Re: Core Data dog-slow when using first time after boot
- From: "Melissa J. Turner" <email@hidden>
- Date: Thu, 20 Aug 2009 13:38:22 -0700
On Aug 20, 2009, at 02:35, Ruotger Skupin wrote:
Complex locale-aware Unicode text queries can be slow. If you
find yourself spending time with such a query, you should consider
some of the techniques shown in the DerivedProperty example
available on ADC.
Isn't all text Unicode?
No. Not all apps are Unicode based, and many of the ones that aren't
will put things on the pasteboard quite happily. The web (and thus
anything copied out of a web browser) is definitely not all Unicode,
especially the older pages. And even within Unicode there are
multiple encoding formats (8, 16, and 32 bit). In addition to the
varying encoding sizes, Unicode also has multiple ways to represent
conceptual characters. Characters that have diacritics, for example,
can be represented as either one Unichar ('é') or two ('e' + combining '´').
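
To make the two representations concrete, here is a minimal sketch. It uses Python's standard unicodedata module purely for illustration (nothing here is Core Data specific), and the variable names are made up for the example. It shows that the composed and decomposed spellings of 'é' differ in length and compare unequal until both are normalized to the same form, and that the same character takes a different number of bytes in each encoding.

```python
import unicodedata

# Illustrative values: the same conceptual character spelled two ways.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, two code points

lengths = (len(composed), len(decomposed))
raw_equal = composed == decomposed  # a plain code-point comparison sees two different strings

# Normalizing both sides to one form (NFC = composed, NFD = decomposed)
# makes the comparison succeed.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# Byte sizes of the composed character in the three encoding widths.
byte_sizes = {
    enc: len(composed.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}
```
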
I don't understand. This shouldn't be a special case. But I will
have a look at the sample.
In my case I'd guess that at least half of the objects contain
Unicode strings (international names and addresses). What I want to
say: write anything in German or French and you end up with Unicode.
Due to the multiplicity of representations, text comparisons in
Unicode can be slow, since instead of just doing a byte by byte
comparison, you end up needing to calculate character sizes, check for
compositions/decompositions, check for analogues between different
symbol systems used to represent a single language (i.e. kana and
kanji), recognize and drop punctuation, etc. For apps that do
repeated comparisons against a set of strings, it can be worth it to
preprocess all strings into one canonical format to minimize the
amount of work that needs to be done during a comparison (make all
strings UTF-8/16/32, make all characters lowercase, strip all
diacritics or ensure characters that have them are always in either
their composed or decomposed forms, etc) and then use a less expensive
collation for the comparison.
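
The preprocessing described above could be sketched roughly as follows. This is only an illustration in Python, not the DerivedProperty sample's actual code; the function name comparison_key is hypothetical, and stripping diacritics is a lossy choice that may not suit every language (it makes "Müller" and "Muller" compare equal).

```python
import unicodedata

def comparison_key(text: str) -> str:
    """Hypothetical helper: reduce a string to one canonical form so that
    later comparisons can be cheap exact matches."""
    # Decompose so that every diacritic becomes a separate combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (this is the "strip all diacritics" option).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case, then recompose so every stored key uses the same form.
    return unicodedata.normalize("NFC", stripped.casefold())

# Two spellings of the same name: composed lowercase vs. decomposed uppercase.
key_a = comparison_key("Ren\u00e9")
key_b = comparison_key("RENE\u0301")
same_key = key_a == key_b
```

The idea would be to compute such a key once, when the string is saved, store it alongside the original, and run queries against the stored key with a plain comparison.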
As a side note, if you want to use regular expressions on Unicode
strings, you generally need to do the normalization anyway, since
regex languages operate at the Unichar level rather than at the
conceptual character level.
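
A small sketch of why this matters, again in Python for illustration only (other regex engines may behave differently, and the pattern is a made-up example): the "." in the pattern matches exactly one code point, so it matches the composed 'é' but not the decomposed 'e' plus combining accent, until the input is normalized.

```python
import re
import unicodedata

# Made-up pattern: "caf" followed by exactly one character.
pattern = re.compile(r"caf.")

composed = "caf\u00e9"      # 4 code points
decomposed = "cafe\u0301"   # 5 code points, same text to a human reader

matches_composed = pattern.fullmatch(composed) is not None
matches_decomposed = pattern.fullmatch(decomposed) is not None

# Normalizing to the composed form first makes both inputs behave the same.
normalized = unicodedata.normalize("NFC", decomposed)
matches_normalized = pattern.fullmatch(normalized) is not None
```
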
+Melissa