Re: CoreData async fetch request
Re: CoreData async fetch request
- Subject: Re: CoreData async fetch request
- From: David Melgar <email@hidden>
- Date: Thu, 8 Oct 2009 23:28:11 -0400
I read a little on ICU and now understand that sqlite by default does
not handle case insensitive unicode.
Is there an easy way to make sqlite use ICU on the Mac, or do I have
to build it myself with ICU enabled?
Based on the derived property example, it seems that I would need to
duplicate any text fields I commonly search on where I want to support
case insensitive queries.
How does spotlight manage to index so much text and respond quickly?
Are those queries can insensitive?
I've done a bunch more performance tests. I don't understand some of
the differences.
I hadn't realized that coredata in the data model has a flag to index
a field. When I did that for Coredata, it dramatically sped it up.
Although adding an index to my sqlite test slowed it down.
Coredata, without an index, predicate "name like foo", 20 databases
400k records each. 2m17s.
Coredata with an index, predicate "name like foo", 20 databases 1mil
records each, 25 seconds.
SQLite no index, sql "where name like foošr", 20 databases 1mil
records each 11 seconds.
SQLite index, sql "where name like foošr", 20 databases 1 mil
records each 30 seconds.
As you said, with an index, and testing for equality, both coredata
and sqlite queries responded in <1 sec. SQLite without an index was 9
seconds for same query.
I can only presume SQLite slowed down with an index because the
database file was physically much larger and took longer to read from
disk.
On Oct 8, 2009, at 2:25 AM, Ben Trumbull wrote:
On Oct 7, 2009, at 10:12 PM, David Melgar wrote:
Hello,
I didn't mean to state threads as a requirement when I said
"async", I just meant some way to get partial results, such as a
call to a delegate I referenced in the previous note. And I'm
certainly not seeking complexity of threads if I don't need it.
There isn't a simple way to do this. There is, however, a simple
way to make your queries (in both Core Data, and direct SQLite) 100x
faster.
In my case, == is not the query I need. I really want case
insensitive match as I presumed LIKE would do, or better yet, 'LIKE
foo%'.
Great. The Derived Property sample project on ADC shows you exactly
what you need to do for replacing LIKE "foo" or LIKE "foo%"
In the meantime, I've just completed a test using SQLite.
The query in Coredata using 'LIKE foošr' across roughly 8 million
records didn't return anything until the query was done, and the
query took 2min 14sec,
This runs through ICU, and cannot use an index. It can trivially be
made 100x faster, at least for equality, prefix, and suffix matching.
What's the actual SQL Core Data logs ?
Doing what I interpreted to be equivalent using SQLite across 20
million records returned in 11 seconds
What was the actual SQL you believed equivalent ?
and I was able to get my first result almost immediately without
needing threads. They provide a callback scheme, analogous to
calling a delegate in Cocoa. FYI, this is across 20 different
databases, otherwise it may have been faster.
For what I'm doing, SQLite seems much faster, smaller footprint and
easier to comprehend. I don't understand what I'm losing.
In your note you mention the cost of doing unicode aware regex and
sorting. As far as I know, I don't need any of those, at least not
yet.
All of your data is 100% 7 bit ASCII ? None of your customers will
be Asian, European, or Hispanic ? None of your English customers
will expect Unicode data that they copy and paste into your
application from Safari to be preserved correctly ?
If your answer is no, then you'll need to implement your own string
matching operations for SQLite using either CFString or ICU.
SQLite's built in operators use memcmp(). Once you've done that,
I'd love to see the performance results.
I took a brief look at the derived property example. Seems complex.
What about the example seems complex ? Everything you need is in 1
file, that's got 107 lines of code. You can just copy and paste
that code. Make a searchText column that has preprocessed the text
data into a simpler form that can be used with an index and a
simpler query.
You say you want to do a case insensitive search across 20 million
rows. You could absorb the cost of that, relative to a simple
binary bitwise compare, for each of the 20 million rows. And pay it
again every time you execute a query. Or, you could store a
preprocessed column, and absorb that cost for just 1 string, the
current search term "foo"
Perhaps for ASCII case insensitivity, that doesn't seem particularly
interesting. But once you start dealing with real world text
encodings, which every Mac OS X and iPhone OS customer expects from
every app, this becomes a big deal. Unicode is complex and
difficult to work with.
Something the framework should give me an option to handle rather
than me having to generate such complex code. Does it advocate
creating another field in the database as a normalized version of
the field I really want to search on?
Yes.
Why would this be better than using SQLite directly?
Doing this in SQLite directly will also be 100x faster than what
you're doing now. Regardless of whether or not you decide to use
Core Data, this would be better.
- Ben
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden