Re: CoreData async fetch request
Re: CoreData async fetch request
- Subject: Re: CoreData async fetch request
- From: Ben Trumbull <email@hidden>
- Date: Thu, 8 Oct 2009 22:25:36 -0700
On Oct 8, 2009, at 8:28 PM, David Melgar wrote:
I read a little on ICU and now understand that sqlite by default
does not handle case insensitive unicode.
Is there an easy way to make sqlite use ICU on the Mac, or do I have
to build it myself with ICU enabled?
Probably the easiest thing to do (besides just using Core Data) is
register custom functions with the SQLite API that go through
CFString. The SQLite APIs for this are fairly straight forward. The
ICU dylib on Mac OS X is not public API. There are a variety of
references to building ICU or otherwise addressing this issue on the
web. Should you do that, you can register custom functions leveraging
ICU with SQLite as one would using CFString. I'd strongly recommend
against building and statically linking SQLite yourself.
Based on the derived property example, it seems that I would need to
duplicate any text fields I commonly search on where I want to
support case insensitive queries.
Yes, for data sets of this size. For smaller data sets, in the 10^2
or 10^3 range, you might decide to skip this and accept "slower"
queries since they'll probably be around 5ms - 20ms.
How does spotlight manage to index so much text and respond quickly?
Are those queries can insensitive?
Spotlight supports insensitive queries. Spotlight also duplicates
much of the material for its index. The importing process is not
free. You're certainly welcome to put all your data into Spotlight
and test its query performance for your purposes. Spotlight is
fundamentally a full text search index. It's optimized heavily for
prefix and word searching across very large numbers of documents.
It's very good for its intended usage pattern, but that pattern is not
the same as the usage of relational databases. The technologies are
complementary.
I've done a bunch more performance tests. I don't understand some of
the differences.
I hadn't realized that coredata in the data model has a flag to
index a field. When I did that for Coredata, it dramatically sped it
up. Although adding an index to my sqlite test slowed it down.
Your test results are very odd, in that they are still 50x to 100x
slower than I would expect. These tests have not yet applied the
derived property optimization I suggested, correct ?
As literally noted here, none of those queries are eligible for an
index, so I assume your actual code is doing something slightly
different, or the timing difference are due to configuration issues.
Hot I/O and cold I/O will be very different. For example:
search, without an index (cold I/O)
add an index (page everything into the UBC cache)
search, with an index, but a query that doesn't use the index (hot I/O)
The difference between hot and cold I/O for query performance will be
huge. As much as 100x. You can use /usr/bin/purge to force
everything to be cold I/O. Not necessary representative of real world
performance, as the whole disk cache will be flushed, including
material relevant to the system that would normally exist when you run
your app.
Coredata, without an index, predicate "name like foo", 20 databases
400k records each. 2m17s.
Coredata with an index, predicate "name like foo", 20 databases 1mil
records each, 25 seconds.
SQLite no index, sql "where name like foošr", 20 databases 1mil
records each 11 seconds.
SQLite index, sql "where name like foošr", 20 databases 1 mil
records each 30 seconds.
Why the 20 database files ? These queries will be faster with a
single database file and its unified index.
In any event, the derived property optimization should make all the
prefix/equality queries subsecond. On my laptop, for 20 million
records, that should be in the 500ms ballpark (hot I/O)
As you said, with an index, and testing for equality, both coredata
and sqlite queries responded in <1 sec. SQLite without an index was
9 seconds for same query.
That sounds about right. It'll be faster if you use a single database
with 20 million rows and a unified index table.
I can only presume SQLite slowed down with an index because the
database file was physically much larger and took longer to read
from disk.
I suspect its a flaw in the testing methodology, actually. Do you run
the queries multiple times, throw out the best & worst and average the
rest ? Are you running other apps at the same time you do performance
tests ?
It is possible for the presence of an index to negatively impact a non-
indexed query, but that would be a few %, not 3x slower An indexed
query would read many fewer pages from disk than a non-indexed query,
regardless of the file size. O(lg(N)) instead of O(N) and at 20
million that's a big difference. There are other possible issues in
play, but you said the result set is small, so they seem unlikely.
- Ben
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden