Re: SearchKit vs. Lucene

  • Subject: Re: SearchKit vs. Lucene
  • From: Jesse Grosjean <email@hidden>
  • Date: Sat, 22 Nov 2003 07:03:19 -0500

Now, anybody compared SearchKit vs. Lucene? I'll be picking one of the
two within the next few weeks and will happily share my findings with
anyone who's interested. I'll be working with modest data sets but many
data formats so my issues are more around flexible analysis and ease of
use than performance.

Stu,

I've been using the Java version of Lucene from Objective-C Cocoa, and I'm also interested in what people have to say about SearchKit vs. Lucene.

I've had very limited experience with SearchKit, but here are my thoughts so far.

SearchKit is very interesting because it ships with the OS (no download required) and doesn't need a JVM to run, both features that I very much want. The API is at a lower level than Lucene's, but I still think it's easier to get going with on OS X if you are developing in a C-based language. Of course, if you're developing in Java I would go for Lucene.

Lucene seems more flexible and seems to have a much larger development community... and of course you also have the source code.

My big question with SearchKit is performance. Take these thoughts with a grain of salt since I haven't used the API much.

Indexing seemed a little slow to start with, but then I found that if I flushed the index immediately after indexing each document it seemed much faster than Lucene. This seems counterintuitive to me...

Search performance, on the other hand, IS important to me, and in my initial tests SearchKit was noticeably slower than Lucene. With Lucene I can make searches almost instantaneous... it's plenty fast enough to run a new query for every letter the user types as they search a very large index. In particular, when you search in Lucene you get back a hit collection. Loading results from this hit collection can take some time (though if you don't store data in the index this too is very fast), but getting the hit collection itself is always (for my uses) instantaneous. And from the hit collection you can quickly get the total number of hits and the score for each hit.
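To make the "new query for every letter" pattern concrete, here is a minimal sketch in plain C. It does not use Lucene or SearchKit: the document array, count_hits, and search_as_you_type are all hypothetical stand-ins, and the substring scan is only a placeholder for a real index lookup. It shows the shape of the loop (one fresh query per keystroke, each returning a hit count) and nothing about either library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a real index: a fixed array of document texts. */
static const char *const kDocuments[] = {
    "return the index",
    "search results",
    "retain and release",
    "lucene hits",
};
static const size_t kDocumentCount = sizeof kDocuments / sizeof kDocuments[0];

/* Count the documents containing `query` as a substring. This linear scan is
   a placeholder; a real engine would consult its index instead. */
static size_t count_hits(const char *query)
{
    size_t hits = 0;
    for (size_t i = 0; i < kDocumentCount; i++) {
        if (strstr(kDocuments[i], query) != NULL) {
            hits++;
        }
    }
    return hits;
}

/* Simulate typing `typed` one character at a time. After each keystroke, run
   a fresh query on the prefix typed so far and store its hit count in
   `counts`, which must hold at least strlen(typed) entries.
   Returns the number of queries run. */
static size_t search_as_you_type(const char *typed, size_t *counts, size_t capacity)
{
    char prefix[64];
    size_t length = strlen(typed);
    assert(length < sizeof prefix);
    assert(length <= capacity);

    for (size_t n = 1; n <= length; n++) {
        memcpy(prefix, typed, n);
        prefix[n] = '\0';
        counts[n - 1] = count_hits(prefix);
    }
    return length;
}
```

Typing "ret" against the sample documents runs three queries ("r", "re", "ret"). In a real app each of those would be a call into the search engine, which is why the per-query latency matters so much here.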

With SearchKit, though, there seems to be some overhead somewhere: even with a single document in the index I'm finding it relatively slow to run a new search for every key the user types. Here's the code that I use to perform a search:

CFMutableArrayRef indexes = CFArrayCreateMutable(NULL, 1, NULL);
CFArraySetValueAtIndex(indexes, 0, index);

SKSearchGroupRef searchGroup = SKSearchGroupCreate(indexes);
SKSearchResultsRef results = SKSearchResultsCreateWithQuery(
    searchGroup,
    (CFStringRef)@"return",
    kSKSearchRanked,
    10,
    NULL,
    NULL);

int searchResultsCount = SKSearchResultsGetCount(results);

My hope is that I'm just not using the API correctly. If anyone has example code on the standard way to use SearchKit that would be great. In the end the performance of SearchKit is probably sufficient... it's just a little disappointing compared to what I was getting with Lucene.
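One way to narrow down where the overhead sits would be to time each step in isolation. The sketch below is a small, portable C timing helper and is only an assumption about how one might measure this. average_call_ms and run_search are hypothetical names, and run_search merely counts its invocations where a real test would wrap the search-group setup or the query call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature of the operation being measured. */
typedef void (*timed_fn)(void *context);

/* Call `fn` `iterations` times and return the average processor time per
   call in milliseconds. clock() reports CPU time, so time spent blocked on
   disk I/O is not included. */
static double average_call_ms(timed_fn fn, void *context, size_t iterations)
{
    assert(fn != NULL);
    assert(iterations > 0);

    clock_t start = clock();
    for (size_t i = 0; i < iterations; i++) {
        fn(context);
    }
    clock_t end = clock();

    double total_ms = 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
    return total_ms / (double)iterations;
}

/* Hypothetical placeholder for the search being measured. Here it only
   counts how many times it was invoked. */
static void run_search(void *context)
{
    size_t *calls = context;
    (*calls)++;
}
```

Timing the search-group creation and the query separately would show which of the two accounts for the delay. Because clock() ignores time spent waiting on the disk, a wall-clock timer may be the better choice if the index is being read from disk on every search.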

PLUG FOR DEMONSTRATION PURPOSES ONLY. To see the results that I'm getting with Lucene, you can download my app here: www.hogbay.com/software/notebook.

Jesse