Re: SearchKit vs. Lucene
- Subject: Re: SearchKit vs. Lucene
- From: Jesse Grosjean <email@hidden>
- Date: Sat, 22 Nov 2003 07:03:19 -0500
Now, anybody compared SearchKit vs. Lucene? I'll be picking one of the
two within the next few weeks and will happily share my findings with
anyone who's interested. I'll be working with modest data sets but many
data formats so my issues are more around flexible analysis and ease of
use than performance.
Stu,
I've been using the Java version of Lucene from Objective-C/Cocoa, and
I'm also interested in what people can say about SearchKit vs. Lucene.
I've had very limited experience with SearchKit, but here are my
thoughts so far.
SearchKit is very interesting because it comes with the OS (no download
required) and it doesn't require a JVM to run. I very much want both of
those features. The API is at a lower level than Lucene's, but I still
think it's easier to get going with on OS X if you are developing in
a C-based language. Of course, if you're developing in Java I would go
for Lucene.
Lucene seems more flexible and seems to have a much larger development
community... and of course you also have the source code.
My big question with SearchKit is performance. Take these thoughts with
a grain of salt since I haven't used the API much.
Indexing seemed a little slow to start with, but then I found that if I
flushed the index immediately after indexing each document it seemed
much faster than Lucene. This seems counterintuitive to me...
Search performance, on the other hand, IS important to me. In my
initial tests SearchKit was noticeably slower than Lucene. With Lucene
I can make searches almost instantaneous... it's plenty fast enough to
run a new query for every letter the user types as they search a very
large index. In particular, when you search in Lucene you get back a
Hits collection. Loading results from this Hits collection can take
some time (though if you don't store data in the index this too is very
fast), but getting the Hits collection itself is always (for my uses)
instantaneous. And from it you can quickly get the total number of hits
and the score for each hit.
With SearchKit, though, there seems to be some overhead somewhere, and
even with a single document in the index I'm finding it relatively slow
to run a new search for every key typed by the user. Here's the code
that I use to perform a search:
    CFMutableArrayRef indexes = CFArrayCreateMutable(NULL, 1, NULL);
    CFArraySetValueAtIndex(indexes, 0, index);
    SKSearchGroupRef searchGroup = SKSearchGroupCreate(indexes);
    SKSearchResultsRef results = SKSearchResultsCreateWithQuery(
        searchGroup,
        (CFStringRef)@"return",
        kSKSearchRanked,
        10,
        NULL,
        NULL);
    int searchResultsCount = SKSearchResultsGetCount(results);
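
One way to put a number on "relatively slow" is to time the search call
over many runs. The following is only a rough sketch in plain C, not part
of SearchKit: average_search_ms, search_fn and placeholder_search are
hypothetical names made up for illustration, and the placeholder body
would be replaced by the SearchKit calls above (or the equivalent Lucene
call) to compare the two on the same query.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#include <stddef.h>
#include <time.h>

/* Signature of the operation being timed; it returns a hit count. */
typedef long (*search_fn)(const char *query, void *context);

/* Returns the average wall-clock milliseconds per call over `runs` calls.
   If last_count is non-NULL it receives the hit count of the final call. */
static double average_search_ms(search_fn fn, const char *query,
                                void *context, int runs, long *last_count)
{
    struct timespec start, end;
    long count = 0;
    int i;

    if (runs <= 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < runs; i++)
        count = fn(query, context);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (last_count != NULL)
        *last_count = count;

    return ((end.tv_sec - start.tv_sec) * 1000.0
            + (end.tv_nsec - start.tv_nsec) / 1e6) / runs;
}

/* Hypothetical stand-in for the real search. It does no work and always
   reports zero hits; swap in the actual search code to measure it. */
static long placeholder_search(const char *query, void *context)
{
    (void)query;
    (void)context;
    return 0;
}
```

Calling average_search_ms(placeholder_search, "return", NULL, 100, &count)
gives a baseline for the harness itself. Averaging over many runs helps
smooth out one-off costs such as the first disk read, which can otherwise
dominate a single measurement.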
My hope is that I'm just not using the API correctly. If anyone has
example code on the standard way to use SearchKit that would be great.
In the end the performance of SearchKit is probably sufficient... it's
just a little disappointing compared to what I was getting with Lucene.
PLUG FOR DEMONSTRATION PURPOSES ONLY. To see the results that I'm
getting with Lucene you can download my app here:
www.hogbay.com/software/notebook.
Jesse
_______________________________________________
cocoa-dev mailing list | email@hidden