Re: SearchKit and Spotlight
- Subject: Re: SearchKit and Spotlight
- From: Vince DeMarco <email@hidden>
- Date: Tue, 3 May 2005 09:13:07 -0700
On May 3, 2005, at 9:00 AM, Ondra Cada wrote:
Vince,
since you seem to be the one who understands Spotlight so well,
may I please ask whether there is any chance of integrating a
language-based stemmer into it?
I talked to the other people on my team about your issue.
There are several problems:
1) We only do a prefix search for text with Search Kit.
2) You can assign your own stemmer if you use Search Kit
directly, but none of this is exposed to the user.
We really need some sort of plugin mechanism to allow you to do all of this.
But the answer for now, unfortunately, is that you can't do what you want.
We have the same problem with German (which is a tier 1 language).
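
To make the prefix limitation in point 1 concrete, here is a toy sketch in Python. It is not Search Kit code and does not reflect how Search Kit is implemented; the function name prefix_search is invented for illustration, and the word forms are the ones from the question quoted below.

```python
# Toy illustration only: NOT Search Kit code, just plain string prefix matching.
def prefix_search(query, indexed_terms):
    """Return the indexed terms that start with the query string."""
    return sorted(t for t in indexed_terms if t.startswith(query))

# Inflected forms of the Czech word "matka" taken from the quoted question.
terms = {"matka", "matek", "matkou", "matky"}

hits = prefix_search("matk", terms)      # forms sharing the prefix "matk"
missed = sorted(terms - set(hits))       # forms a prefix search cannot reach

print("found: ", hits)
print("missed:", missed)
```

A prefix query reaches only the forms that share the typed prefix, so "matek" is missed here. That is the gap a stemmer is meant to close.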
Vince
Does anybody happen to know whether it is possible, and if so how,
to use a stemmer with the Spotlight full-text indexing
engine to extend search for inflected languages?
Some background, in case I am not using the proper terminology: there
are languages (like mine) which use inflection, where one and the
same word can take different forms in different circumstances.
English has its "-s" for plurals and verbs, but in, say, Czech, it
is *much* more common: nearly every word can occur in a
number of forms. For example, "matka", Czech for "mother",
can be written in different places as "matek", "matkou", "matky",
and more.
There is a language module called a stemmer, which for each word
finds its stem, its most basic form. It therefore comes in
*extremely* handy with a full-text search system if it is possible
to index stems instead of the forms which actually occur in the
text. For example, two Czech documents, the first containing
only the word "matek" and the second only the word "matkou",
should *both* be kept in the Spotlight index under their common
stem "matka", and *not* one under "matek" and the other under "matkou".
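
The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the STEMS lookup table is a hypothetical stand-in for a real Czech stemmer, the file names are made up, and nothing here uses or imitates the Spotlight or Search Kit APIs.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for a real Czech stemmer.
# It covers only the example forms of "matka" mentioned above.
STEMS = {"matek": "matka", "matkou": "matka", "matky": "matka"}

def stem(word):
    """Map a word form to its stem; unknown words are returned unchanged."""
    word = word.lower()
    return STEMS.get(word, word)

def build_index(documents):
    """Build an inverted index keyed by stem rather than by surface form."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.split():
            index[stem(word)].add(name)
    return index

def search(index, query):
    """Stem the query the same way, then look it up in the index."""
    return sorted(index.get(stem(query), set()))

# Two made-up documents, each containing a single inflected form.
docs = {"first.txt": "matek", "second.txt": "matkou"}
index = build_index(docs)

print(search(index, "matky"))
```

Because the documents and the query are stemmed the same way, a query for any listed form finds both documents, even though neither contains the form that was typed.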
Searching the Spotlight API, though, I haven't been able to find a
way to do this (short of the practically impossible task of
replacing all the standard indexing plugins for RTFs, HTMLs,
DOCs, .... with my own, which would replace words with stems).
Is there a way to do this? I'd be quite grateful for any insight,
Thank you *very* much,
---
Ondra Čada
OCSoftware: email@hidden http://www.ocs.cz
private email@hidden http://www.ocs.cz/oc