Re: Spotlight, Content Indexing, and SearchKit integration questions

Subject: Re: Spotlight, Content Indexing, and SearchKit integration questions
From: Mark T <email@hidden>
Date: Thu, 02 Jun 2005 11:04:57 -0400

This is from memory, so I may be wrong on a few small points.

On Mon, 2005-05-30 at 14:42 -0500, Dr. Smoke wrote:
> 1. What is the underlying integration of metadata indexing and content
> indexing, e.g. which is done first, which is done second?
> Specifically, when is content indexing performed in the overall
> process of "indexing" documents with Spotlight.  Is it performed on a
> per-document basis as a document is processed by its
> associated .mdimporter?  In other words, does the larger Spotlight
> indexing process first call the .mdimporter to extract the metadata,
> then invoke SearchKit (ContentIndexing.app?) to index the content?

Spotlight performs both content and metadata indexing, relying on
the .mdimporter to perform both functions.

> 2. Is Spotlight or SearchKit using ContentIndexing.app to
> produce /.Spotlight-V100/ContentIndex.db ?  If not, how is the
> ContentIndex.db produced?

Initial indexing of a volume is automatically handled by the mdindex
tool. That creates the files in .Spotlight-V100. After that, the kernel
updates the index at the file I/O level. This means that there is no
indexing process needed after the initial index is created. The index is
updated by the kernel when files are added or changed.

> 3. Have the .mdimporter objects assumed the role of the
> *TextExtractor.bundle objects under Panther?  It appears not and that
> this is somehow handled by SearchKit.

They have.

> 4. Where are the "text extractor plug-ins" employed by SearchKit?
> SearchKit documentation -- "Adding Search to Your Application
> (Preliminary)"
> ( http://developer.apple.com/documentation/UserExperience/Conceptual/SearchKitConcepts/index.html ) -- notes the following:
>
>
> "In general, your application takes responsibility for providing
> document content to Search Kit in the appropriate form, namely a
> CFString object. For local, on-disk files supported by the built-in
> text extractor plug-ins, Search Kit knows how to get content for you.
> The text extractor plug-ins work with RTF, XML, plaintext, and PDF
> files, as well as with Microsoft Word documents."
>
>
> However, I've been unable to find these unless they are somehow
> embedded in the type-specific .mdimporter objects included with Tiger,
> i.e. in the  /System/Library/Spotlight/ and /Library/Spotlight/
> directories.  Examining the contents of these bundles has not given me
> any clues.

The .mdimporter files are, indeed, the "text extractor plug-ins"
mentioned in the documentation.

> 5. How would one create new text extractor plug-ins? For example,
> there is a FBC (Find By Content) Extractor SDK available here
>
>
> http://developer.apple.com/sdk/
>
>
> that is probably outdated and not applicable to SearchKit or
> Spotlight.

You can create text extractors with Xcode. It's fully documented.

> 6. In Panther and Jaguar, one could encounter failed index builds if
> ContentIndexing hit "corrupted" documents.  I've documented this here:
> http://www.thexlab.com/faqs/failedindex.html  In my experience,
> sometimes a document ContentIndexing considered corrupted would
> actually open AOK in its associated application, so the problem was
> probably in the *TextExtractor.bundle, or how ContentIndexing handled
> "bad" return codes from such.  For metadata extraction, it appears
> there are provisions that can be made for indicating extraction failed
> or did not produce results, hence the whole "Spotlight" (as seen by
> the user) indexing process should not come crashing down.  However,
> how is this handled with content indexing under Tiger?

The text extractor handles pretty much everything. Detecting corrupt
documents and skipping them is up to whoever writes the importer.
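
As a rough illustration of that division of labor (a conceptual sketch
only, in Python rather than the real importer plug-in interface; every
name below is hypothetical), the extractor reports failure for a document
it cannot read, and the indexing loop skips that document instead of
aborting the whole run:

```python
# Conceptual sketch only: these names are hypothetical, not Spotlight's API.

def extract_text(data: bytes):
    """Hypothetical importer: return the text, or None if the document is unreadable."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # report failure to the caller instead of raising


def build_index(documents: dict):
    """Index every readable document and collect the names of the ones skipped."""
    index, skipped = {}, []
    for name, data in documents.items():
        text = extract_text(data)
        if text is None:
            skipped.append(name)  # one bad file does not stop the run
            continue
        index[name] = text
    return index, skipped


docs = {"good.txt": b"hello spotlight", "bad.txt": b"\xff\xfe\xfa"}
index, skipped = build_index(docs)
```

The point of the design is that a "corrupt" document costs you one
missing index entry rather than a failed index build.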

> I'd appreciate answers to these questions from Apple Development.
> Thanks!

It seems like you're confused about what Spotlight and friends really
are. Spotlight handles all aspects of system-wide searching. It isn't
really one thing. Instead, it's a set of technologies for indexing
(again, at the kernel level), and searching for information in any form.

.mdimporter files are run by Spotlight (at initial indexing or
kernel-based index updating) to get information out of whatever format
the file is in and into the plain-text index.

Content Indexing is no longer used by Tiger. It used a user-level
indexing process for updates instead of the kernel-level method used
now.

SearchKit can be used by developers to add indexed searching to their
applications. I think it's what Spotlight uses to search the Spotlight
index, but I may be wrong on that point.
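
To make "indexed searching" concrete, here is a minimal sketch of the
general idea, an inverted index mapping each term to the documents that
contain it. This is plain Python written for illustration, not SearchKit
code; the function names and sample documents are made up, and a real
index does far more (tokenizing, ranking, persistence):

```python
from collections import defaultdict

# Illustrative only: a toy inverted index, not the SearchKit API.

def build_inverted_index(documents):
    """Map each lowercase term to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in text.lower().split():
            index[term].add(name)
    return index


def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "a.txt": "Spotlight indexes metadata",
    "b.txt": "SearchKit indexes text content",
}
idx = build_inverted_index(docs)
hits = search(idx, "indexes text")
```

Because the lookup goes through the term table, a query never has to
reopen or rescan the documents themselves.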

You really need to read more about the technology involved. The Apple
docs are a great place to start. Also, Ars Technica has an in-depth
review of Tiger with plenty of metadata and Spotlight talk. You can find
it here:
http://arstechnica.com/reviews/os/macosx-10.4.ars
