Mailing Lists: Apple Mailing Lists


Re: Context of meta data



Harry,

The issue is simple, and as someone working in the humanities, within an organization with first-class experience in massive metadata creation and standards projects, I believe I can summarize it for you:

The problems have NEVER been, and will NOT be anytime in the near future, technology-addressable.

Yes, once you have proper, usable, controlled metadata that has anticipated all the use-cases of all possible interactions, then metadata standards (such as the future MPEG standards) do have some relevance when it comes to tying in several systems under a single data model.

The real issue, and what David, Charles, and others have been reflecting on, is that the actual process of creating the metadata is about 1000 times more labor-intensive than merely implementing one technical metadata standard or another:


1) you must perform an exhaustive analysis of your use cases; this is HIGHLY content specific, has massive cultural implications (someone in the US searching for salsa dances may phrase their searches quite differently from someone in Canada, due to cultural and linguistic biases), and is simply not something to be underestimated. Even making a relatively small image catalog (thousands of images) effectively searchable requires enormous insight into the minds of the users. Making video segments effectively searchable requires at least exponentially more effort if you want to enable searching on motion attributes, such as dance steps, where you must bracket everything in time (another dimension of cataloging).

2) you must translate your findings into effective controlled vocabularies; in other words, develop your own semantic standards for describing all the unique aspects of your content you wish to index.

3) you must hire a lot of folks, or find a generous university with a good library science program full of willing potential interns to do your cataloging (indexing). These interns must be rigorously familiar with both your own semantic standards and the subject matter at hand. This means massive training and quality control, auditing all along the way, and testing, testing, testing (usually you run an initial pilot with a tiny subset of your material, test your use cases against that, and then scale and test repeatedly).

4) you must constantly iterate your efforts, recataloging obsoleted content (due to changes in your metadata models to accommodate new elements not anticipated early on), reassessing your data in light of new use cases, etc.
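
To make steps 1 and 2 a little more concrete, here is a minimal Python sketch of a controlled vocabulary applied to time-bracketed video segments. Everything in it is an assumption made up for illustration: the vocabulary terms, the Segment record, and the function names. It is not taken from any standard or from any real cataloging system, and the code is the easy part; deciding what belongs in the vocabulary is where the labor goes.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: preferred term -> accepted variants.
# In practice this table is the product of the use-case analysis in step 1.
VOCABULARY = {
    "salsa": {"salsa", "salsa dance", "salsa dancing"},
    "cross-body lead": {"cross-body lead", "cross body lead", "cbl"},
}


def normalize(term: str) -> str:
    """Map a free-text term to its preferred form, or raise if uncontrolled."""
    cleaned = term.strip().lower()
    for preferred, variants in VOCABULARY.items():
        if cleaned in variants:
            return preferred
    raise ValueError(f"term not in controlled vocabulary: {term!r}")


@dataclass(frozen=True)
class Segment:
    """One cataloged stretch of video, bracketed in time (seconds)."""
    start_s: float
    end_s: float
    term: str


def catalog(start_s: float, end_s: float, term: str) -> Segment:
    """Create a segment, rejecting bad time brackets and uncontrolled terms."""
    if not 0 <= start_s < end_s:
        raise ValueError("segment must satisfy 0 <= start < end")
    return Segment(start_s, end_s, normalize(term))


def search(segments: list[Segment], query: str) -> list[Segment]:
    """Return the segments whose term matches the normalized query."""
    wanted = normalize(query)
    return [s for s in segments if s.term == wanted]
```

A cataloger who enters "Salsa Dancing" and a user who searches for "salsa" land on the same preferred term, while a term nobody anticipated (say, "tango") is rejected instead of silently entering the catalog. That rejection is the trigger for the recataloging described in step 4.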


NONE of this, as you can see, is yet technologically limited, nor is it particularly ripe for technology solutions at this point. I am not a great dreamer of sci-fi scenarios, so let's just say that for the near future, there is no conceivable or even plausible solution to these generalized problems. Computers are simply not good at analyzing content for indexing, beyond simple (and unfortunately over-hyped) applications such as transcriptions of certain TV programming using speech recognition.

Therefore, the answer to your dreams is NOT MPEG-7 or MPEG-21 - it is brute force, I am sorry to say. The choice of a particular metadata model is trivial compared to the difficulties of even beginning to engineer the processes I describe above; and that was hardly an exhaustive description of a complete process - instead, I just touched on some of the big points and their relative magnitude of effort. The choice of media architectures (QuickTime, MPEG-4, Windows Media), database platforms, and even data encoding is completely superficial to the majority of your task; the generation of your content, and then the appropriate cataloging, is the hard part - and always underestimated and underplanned for.

In fact, if you dream and truly desire such capabilities in the future, the only way you'll get there within your lifetime will likely be to begin now; keep your metadata in a neutral format, and do not even think of selecting a final implementation platform for any of the technology components (metadata repository, search engine, media formats, etc.) until you've accomplished 1, 2, and 3. If you plan well, you can always migrate your data to a new standard as it becomes available - but waiting for that standard to be complete before you even begin will get you exactly nowhere.
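
As a rough illustration of what "neutral format" and "migrate later" can look like, the sketch below keeps records as plain dictionaries serialized to JSON and maps them onto a target schema through an explicit field map. The record fields, the target field names, and the function names are all hypothetical; they do not correspond to MPEG-7 or to any other real schema.

```python
import json

# Hypothetical neutral records: plain keys, no platform-specific structure.
neutral_records = [
    {
        "id": "clip-001",
        "title": "Salsa basics",
        "terms": ["salsa"],
        "start_s": 0.0,
        "end_s": 12.5,
    },
]

# Hypothetical mapping from neutral field names to some future target schema.
FIELD_MAP_V1 = {
    "id": "identifier",
    "title": "label",
    "terms": "keywords",
    "start_s": "begin",
    "end_s": "end",
}


def migrate(record: dict, field_map: dict) -> dict:
    """Rename a record's fields per field_map; fail loudly on unmapped fields."""
    unmapped = set(record) - set(field_map)
    if unmapped:
        raise KeyError(f"no mapping for fields: {sorted(unmapped)}")
    return {field_map[key]: value for key, value in record.items()}


def export(records: list, field_map: dict) -> str:
    """Serialize all records in the target schema as JSON text."""
    migrated = [migrate(r, field_map) for r in records]
    return json.dumps(migrated, indent=2, sort_keys=True)
```

Switching to a different target then means writing a new field map, not recataloging. The failure on unmapped fields is deliberate: a migration that silently drops metadata throws away exactly the work that was expensive to produce.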

If only it were as simple as MPEG-7, MPEG-21, BIFS, Intermedia, or anything else - the problem would just be an issue of bits and wires. But this is a human-level problem, as deep and ancient as it gets. If it were that easy, we'd all be tapped into a massive digital repository of the entire wealth of human knowledge, with the ability to instantly query in natural language for anything we desire to know or experience.

Unless you believe we'll all be talking to our AI-based wristwatches in 20 years, with complete contextual and semantic flexibility that even many humans seem incapable of, then what you're looking for is not a technology product; it is manual labor. Computers are great at processing data, but simply terrible at creating original and useful content.

Best,

Roger Howard
Digital Media Specialist
The J. Paul Getty Museum

email@hidden
310.440.6908
_______________________________________________
quicktime-talk mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/quicktime-talk
Do not post admin requests to the list. They will be ignored.




Copyright © 2007 Apple Inc. All rights reserved.