Re: SearchKit and Spotlight
- Subject: Re: SearchKit and Spotlight
- From: Anton Leuski <email@hidden>
- Date: Wed, 4 May 2005 11:39:46 -0700
On May 4, 2005, at 9:47 AM, Georg Tuparev wrote:
On May 3, 2005, at 7:19 PM, Anton Leuski wrote:
I'm not looking to do search; I'm looking to do clustering, that is,
arranging the documents into groups of similar documents based on
their content. For this I need access to the list of terms from
the documents so that I can compare those lists. The SearchKit API
would give me this information if I could use it with the Spotlight
index. An alternative is to create my own index, which is rather
wasteful.
These types of tasks have nothing to do with either Spotlight or
SearchKit. Document clustering is a linguistic problem. There are
a few categories of solutions, but in most cases it comes down to
writing a weighting function that is used by a clustering algorithm
or by some more sophisticated algebraic method (e.g. finding RMS
between the longest eigenvectors etc.).
The key point here is that one needs a document representation. It
has to be built before a _weighting_ function can be applied.
Generally, that would be a vector of document terms (or individual
words; sometimes more sophisticated representations are used). To get
that vector one needs to extract textual information from a document;
ergo, one needs to parse the document and tokenize it. Spotlight is
set up to extract text from a wide variety of formats. The SearchKit
indexing process does the tokenization. Once I have access to a
SearchKit index, I can _easily_ extract the necessary term
information. Implementing the actual clustering process is trivial by
comparison to all the groundwork one has to do to get the data.
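
To make the "vector of document terms" idea concrete, here is a rough
sketch in Python. It does not use SearchKit or Spotlight at all: it
assumes the text has already been extracted, and it stands in a naive
regex tokenizer and raw term counts for a real tokenizer and weighting
function. The names term_vector and cosine_similarity are made up for
illustration only.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from already-extracted plain text.

    Assumption: a naive tokenizer (lowercase runs of letters and digits)
    stands in for whatever tokenization a real index would perform.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = term_vector("Spotlight extracts text from many formats")
    doc2 = term_vector("SearchKit tokenizes the extracted text")
    print(cosine_similarity(doc1, doc2))
```

A clustering algorithm would then work from the pairwise similarities
between such vectors; in practice the raw counts would normally be
replaced by a proper weighting scheme before comparing them.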
-- Anton