Spotlight and a stemmer (for flektive languages)?
- Subject: Spotlight and a stemmer (for flektive languages)?
- From: Ondra Cada <email@hidden>
- Date: Sun, 1 May 2005 00:22:19 +0200
Hello all,
does anybody happen to know whether--and if so, how--it is possible to
use a stemmer with the Spotlight full-text indexing engine to extend
the search to inflected languages?
Some background in case I am not using the proper terminology: there are
languages (like mine) which use inflection -- one and the same word can
take different forms in different circumstances. English has its "-s"
for plurals and verbs, but in, say, Czech, it is *much* more common:
actually, nearly every word can occur in a number of forms. For example,
"matka" -- Czech for "mother" -- can appear in different places
as "matek", "matkou", "matky", and more.
There is a language tool called a stemmer, which finds the stem of each
word -- its most basic form. It therefore comes in *extremely* handy with
a full-text search system if it is possible to index stems instead of
the forms which actually occur in the text. For example, two Czech
documents, the former of which contains only the word "matek" and the
latter only the word "matkou", should *both* be kept in the Spotlight
index under their common stem "matka", and *not* one under "matek" and
the other under "matkou".
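To make the idea concrete, here is a minimal sketch in Python of what
indexing by stem means. Everything in it is an assumption made for
illustration only: the TOY_STEMS table is a hypothetical stand-in for a
real Czech stemmer and covers just the forms mentioned above, and
build_index and search are made-up helper names, not part of the
Spotlight API.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for a real Czech stemmer;
# it covers only the word forms mentioned in this message.
TOY_STEMS = {
    "matka": "matka",
    "matek": "matka",
    "matkou": "matka",
    "matky": "matka",
}


def stem(word):
    """Return the stem of a word, or the lowercased word if it is unknown."""
    lowered = word.lower()
    return TOY_STEMS.get(lowered, lowered)


def build_index(documents):
    """Map each stem to the set of document names containing any of its forms."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.split():
            index[stem(word)].add(name)
    return index


def search(index, query):
    """Stem the query as well, so any form of a word finds every document."""
    return sorted(index.get(stem(query), set()))


docs = {"first.txt": "matek", "second.txt": "matkou"}
index = build_index(docs)
print(search(index, "matky"))  # prints ['first.txt', 'second.txt']
```

Both documents end up under the single key "matka", so a query for any
of the forms finds both of them. That is the behaviour I would like to
get from the Spotlight index.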
Searching through the Spotlight API, though, I haven't been able to find
a way to do this (short of the practically impossible task of replacing
all the standard indexing plugins for RTFs, HTMLs, DOCs, ... with my own
ones which would replace words by stems).
Is there a way to do this? I'd be quite grateful for any insight,
---
Ondra Čada
OCSoftware: email@hidden http://www.ocs.cz
private email@hidden http://www.ocs.cz/oc