On 2005.06.02, at 08:14, Mark T wrote:
It looks to me from the docs like the metadata store (ContentIndex.db) is actually a SearchKit index.
I also suspect ContentIndex.db is created by SearchKit, but it is only one part of the "metadata store." Again, the entire "metadata store" -- as I interpret Spotlight -- is the .Spotlight-V100 directory, not just ContentIndex.db.
I just did an ASCII dump of /.Spotlight-V100/store.db, and it looks like that's correct. The store.db files do contain metadata.
Thought so. Did you also dump /.Spotlight/.store.db? I've been trying to figure out why there are two *store.db files of exactly the same size.
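For anyone who wants to repeat the dump and the same-size check, here is a rough Python sketch. It is only an illustration: the function names (ascii_runs, dump_file, compare_files) are made up for this example, the paths are whatever you pass in, and reading anything under /.Spotlight-V100 will most likely need root. It pulls printable ASCII runs out of a binary file (much like the UNIX strings tool) and reports whether two files match in size and in content.

```python
import hashlib
import os
import re


def ascii_runs(data, min_len=4):
    """Yield runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def dump_file(path, min_len=4):
    """Print every printable ASCII run found in the file at path."""
    with open(path, "rb") as fh:
        for run in ascii_runs(fh.read(), min_len):
            print(run)


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_files(path_a, path_b):
    """Return (same_size, same_content) for two files."""
    same_size = os.path.getsize(path_a) == os.path.getsize(path_b)
    same_content = same_size and file_digest(path_a) == file_digest(path_b)
    return same_size, same_content
```

Calling dump_file on one store file and compare_files on the pair would show whether the two files are byte-for-byte copies or merely the same size.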
I have no idea why ContentIndexing.app is still in your Tiger system, because there's no trace of it on mine. Did you upgrade or archive and install?
I did an Upgrade install. Sure it's not in /System/Library/Find? Note that Spotlight won't look there since it's a system folder. You have to open the folder or use UNIX find.
I asked another fellow who did an Erase and Install and he said he had /System/Library/Find/ContentIndexing.app.
Perhaps he was mistaken. Can you double-check that directory?
It's in there on my system.
How did you install Tiger? If you used either Archive & Install or Erase & Install, then the earlier info I was provided is correct.
I didn't know about Spotlight not searching /System. I guess that makes sense. No need to have all those odd-looking results confusing Average Joe.
That seems to be the design point. Annoying omission for technical users, but one can force it to index system folders in several ways, e.g. by using mdimport or by editing _rules.plist.
I can't figure out what it does, though. I was under the impression that it was used by the old content indexing system. I have no idea how it fits into Spotlight.
As to why it's still in Tiger, that's my question exactly. You are correct in that, under Jaguar and Panther, ContentIndexing.app was used to perform content indexing of folders and volumes for use by Find By Content in Finder's Find (Command-F) function.
I haven't looked much into the procedure used for how indexing is actually done, but it seems like the kernel notifies the mds daemon when a file has been changed. The daemon then spawns an mdimport process (and yes, I got the name wrong. I did say I was going from memory.)
That's how it gets kicked off. I'm more "obsessed" with the sub-atomic detail, e.g. where SearchKit is being called by the .mdimporter.
From the Spotlight Importer Programming Guide:
"When metadata is extracted for a file, the GetMetadataForFile function is called. The function is passed the plug-in interface, a mutable dictionary that you’ll add the metadata attribute keys and values to, the UTI type of the target file, and the full path to the target file...Your implementation of this function should extract the metadata from the file and insert it into the dictionary with the appropriate keys and values. If it successfully returns metadata, the function should return with a value of true. If no metadata was extracted, you should return false."
Looks like Spotlight adds whatever the importer returns for text content to the SearchKit index. The importer never has to know about the SearchKit part of the whole business.
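To make that guess concrete, here is a toy Python model of the contract described in the guide. It is not Apple's API: the real GetMetadataForFile is a C function in an importer plug-in, and how Spotlight hands text to SearchKit is not documented anywhere in this thread. The attribute names kMDItemTextContent and kMDItemTitle and the UTI public.plain-text are real; ToyHost, its inverted index, and the Python function get_metadata_for_file are hypothetical stand-ins.

```python
import os
from collections import defaultdict

TEXT_CONTENT_KEY = "kMDItemTextContent"  # real attribute name; the rest is illustrative


def get_metadata_for_file(attributes, content_type_uti, path_to_file):
    """Toy importer: fill the mutable dict, return True only if metadata was extracted."""
    if content_type_uti != "public.plain-text":
        return False
    try:
        with open(path_to_file, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    except OSError:
        return False
    attributes["kMDItemTitle"] = os.path.basename(path_to_file)
    attributes[TEXT_CONTENT_KEY] = text
    return True


class ToyHost:
    """Hypothetical host: a metadata dict plus a tiny inverted index (SearchKit placeholder)."""

    def __init__(self):
        self.metadata = {}
        self.index = defaultdict(set)

    def import_file(self, path, uti):
        attributes = {}
        if not get_metadata_for_file(attributes, uti, path):
            return False
        # Assumption: text content is routed to the content index, not the metadata store.
        text = attributes.pop(TEXT_CONTENT_KEY, "")
        self.metadata[path] = attributes
        for term in text.lower().split():
            self.index[term].add(path)
        return True

    def search(self, term):
        return sorted(self.index.get(term.lower(), ()))
```

The point of the model is only that the importer sees a dictionary, a UTI, and a path, and never touches the index itself. Whether the bundled importers work this way is exactly the open question.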
That's a very generic description. It does not address how the bundled mdimporter objects may be calling SearchKit for indexing.
We don't know what algorithm Apple uses for indexing, but it's either very dynamic or very poorly written. As I said, I haven't had any problems with Spotlight yet. As I also said, Apple fixed some Spotlight issues in 10.4.1. Have you repeated these experiments under that version?
Yes, my tests were performed under 10.4.1.
There's actually a very simple example in the Spotlight Importer Programming Guide. I haven't needed to write one yet, so I was just looking at it now. The example is for a property list with Author, Title, and Notes fields. Those all go under the metadata category; there's no example that I've found with any content indexing.
Exactly my point: how the content index, ContentIndex.db, is being created in .Spotlight-V100 is undocumented, and there are no examples of how kMDItemTextContent is handled using the bundled importers. I've reviewed the example in that doc. ;-)
I see a new "SearchKit Reference" was just published:
I'll have to take a detailed look at it. I find it interesting that it states
"You can use Search Kit or Spotlight to provide similar functionality and powerful information-access capabilities within your Mac OS X application."
whereas the quotes I cited before from "Mac OS X Technology Overview" said "don't use Spotlight for content search: use SearchKit." Go figure. ;-)
Again, good luck. And thanks for educating me about Spotlight. That's the funny thing about tech discussions: the person trying to answer questions can actually learn as much as (or more than) the person asking them.
Thanks. I keep hoping someone from Apple's Spotlight team will respond to this thread and clear up the remaining questions:
1. Exactly how is Spotlight using SearchKit, both for content indexing and in search?
2. How is kMDItemTextContent populated if one has files whose UTIs match the types supported by the existing mdimporter objects?