Re: SearchKit vs. Lucene
- Subject: Re: SearchKit vs. Lucene
- From: Joseph Heck <email@hidden>
- Date: Thu, 20 Nov 2003 19:18:52 -0800
I've done some research of my own, but YMMV -
Ultimately Lucene is more flexible, simply because you've got it ALL
right there to mess with if you want. SearchKit does some things "out
of the box" that Lucene doesn't: forward indices and "find similar
documents". SearchKit doesn't have a concept of a custom analyzer that
you can create - at least not yet. It does include a whole pile of
really wonderful language options (Japanese text word boundary stuff,
for example) and a decent selection of "read in the document format and
index it" options (Word, HTML, PDF) that Lucene doesn't have "out of
the box". SearchKit includes the concept of a prefix search (which I'd
normally think of as a filter function), which is the functionality you
see when using iTunes. SearchKit supports boolean, ranked relevance,
similarity, inclusion/exclusion, and prefix searches.
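To make the "prefix search as a filter function" idea concrete, here is a minimal sketch in plain Python. It is not SearchKit's API or its implementation; the names `prefix_filter` and `library` are made up for illustration, and matching on word starts is my assumption about the iTunes-style behavior.

```python
def prefix_filter(titles, typed):
    """Keep the titles in which at least one word starts with the typed text.

    Hypothetical helper for illustration only; not a SearchKit call.
    """
    needle = typed.lower()
    if not needle:
        # Nothing typed yet: show everything, as a live filter field would.
        return list(titles)
    return [
        title
        for title in titles
        if any(word.startswith(needle) for word in title.lower().split())
    ]


library = ["Blue Train", "Kind of Blue", "Giant Steps", "Blues Walk"]
matches = prefix_filter(library, "bl")
```

Each keystroke would simply rerun the filter with the longer prefix, which is why it feels more like narrowing a list than running a ranked query.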
Multiple indices are easy (to me, at least) to search in either.
In the category of "just different", there's a substitutions mechanism
within SearchKit set up such that you define words that should be
treated as identical for the purposes of search.
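A rough sketch of how a substitutions table could work, in plain Python. This is an assumption about the general technique, not SearchKit's actual format or API; `build_substitutions` and `normalize` are hypothetical names.

```python
def build_substitutions(groups):
    """Map every word in a group to the group's first word.

    Hypothetical helper: each group lists words to treat as identical.
    """
    table = {}
    for group in groups:
        canonical = group[0].lower()
        for word in group:
            table[word.lower()] = canonical
    return table


def normalize(text, table):
    """Lowercase, split on whitespace, and apply the substitutions."""
    return [table.get(word, word) for word in text.lower().split()]


table = build_substitutions([["color", "colour"], ["gray", "grey"]])
indexed_terms = normalize("Grey colour swatches", table)
query_terms = normalize("gray color", table)
```

The point is that the same normalization runs at index time and at query time, so "grey" and "gray" end up as the same term on both sides.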
I haven't hammered on the results enough to know if there's stemming
(Lucene does Porter stemming) happening under the covers. SearchKit
also supports the concept of a document hierarchy, although it's not
clear to me what the benefit is or how it should be used. There are
only a few snippets of detail about it in the documentation.
I have not profiled "speed", either of searching or indexing. I've
simply used it and found it acceptable for the small project that was
my testbed.
(Worth noting that a forward index mechanism was implemented in MIT's
Haystack code, which would prove useful to someone wanting to create a
find-similar mechanism that requires one.)
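For anyone wondering why find-similar wants a forward index: a forward index maps each document to its terms, so you can compare documents directly. Below is a minimal sketch in plain Python using cosine similarity over raw term counts. That scoring choice is my assumption; I don't know what SearchKit or Haystack actually use, and all the names here are hypothetical.

```python
import math
from collections import Counter


def build_forward_index(docs):
    """Forward index: document id -> term counts for that document."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(forward_index, doc_id, limit=3):
    """Rank the other documents by similarity to doc_id, best first."""
    target = forward_index[doc_id]
    scored = [
        (cosine(target, terms), other)
        for other, terms in forward_index.items()
        if other != doc_id
    ]
    # Highest score first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:limit] if score > 0]


docs = {
    "a": "lucene index search",
    "b": "searchkit index search",
    "c": "cooking pasta recipes",
}
forward_index = build_forward_index(docs)
similar_to_a = find_similar(forward_index, "a")
```

With only an inverted index (term -> documents) you would have to reconstruct each document's term list before you could score it, which is the part the forward index saves you.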
-joe
On Thursday, November 20, 2003, at 05:50 PM, Stuart Halloway wrote:
First, thanks to the several people who pointed me to SearchKit.
Now, anybody compared SearchKit vs. Lucene? I'll be picking one of the
two within the next few weeks and will happily share my findings with
anyone who's interested. I'll be working with modest data sets but
many data formats so my issues are more around flexible analysis and
ease of use than performance.
Stu
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.