Re: SearchKit and Spotlight

  • Subject: Re: SearchKit and Spotlight
  • From: Anton Leuski <email@hidden>
  • Date: Wed, 4 May 2005 11:39:46 -0700


On May 4, 2005, at 9:47 AM, Georg Tuparev wrote:


On May 3, 2005, at 7:19 PM, Anton Leuski wrote:


I'm not looking to do search; I'm looking to do clustering, that is, arranging the documents into groups of similar documents based on their content. For this I need access to the list of terms from each document so that I can compare those lists. The SearchKit API would give me this information if I could use it with the Spotlight index. The alternative is to create my own index, which is rather wasteful.


This type of task has nothing to do with either Spotlight or SearchKit. Document clustering is a linguistic problem. There are a few categories of solutions, but in most cases it comes down to writing a weighting function that is used by a clustering algorithm or by some more sophisticated algebraic method (e.g. finding the RMS between the longest eigenvectors, etc.).



The key point here is that one needs a document representation, and it has to be built before a _weighting_ function can be applied. Generally, that would be a vector of document terms (or individual words; sometimes more sophisticated representations are used). To get that vector one needs to extract the textual information from a document: one has to parse the document and tokenize it. Spotlight is set up to extract text from a wide variety of formats, and the SearchKit indexing process does the tokenization. Once I have access to a SearchKit index, I can _easily_ extract the necessary term information. Implementing the actual clustering process is trivial compared to all the groundwork one has to do to get the data.
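
To make the point concrete, here is a rough Python sketch of the steps once plain text is available: tokenize, build a term vector per document, apply a weighting function, and group by similarity. Everything in it is an assumption for illustration only. It does not touch Spotlight or SearchKit; the regular-expression tokenizer is a crude stand-in for what an index would provide, the weighting is a smoothed TF-IDF, and the grouping is a greedy single-pass threshold scheme rather than any particular clustering algorithm discussed in this thread. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    A crude stand-in for the tokenization an indexer would perform.
    """
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Represent a document as a mapping from term to raw frequency."""
    return Counter(tokenize(text))


def weight(vectors):
    """Apply a smoothed TF-IDF weighting function to raw term vectors."""
    n = len(vectors)
    doc_freq = Counter()
    for vec in vectors:
        doc_freq.update(vec.keys())  # count documents containing each term
    weighted = []
    for vec in vectors:
        weighted.append({
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, tf in vec.items()
        })
    return weighted


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(texts, threshold):
    """Greedy single-pass grouping.

    Each document joins the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = weight([term_vector(t) for t in texts])
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Calling `cluster(texts, 0.25)` on a list of strings returns groups of document indices. Note that nearly all of the code above assumes the text has already been extracted and tokenized, which is exactly the part an existing index would supply.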


-- Anton

Cocoa-dev mailing list (email@hidden)