Regular readers of this blog will know that the team wants to make the hard parts of collection management easy. At this week's meeting we asked: what's hard about metadata?
The answer, of course, is creating ‘correct’ metadata. For example, how would you extract the subject of a website, PDF, Word doc, audio file and so on?
We could do it by modifying brute-force tools such as:
However, your challenge is to tell us how you think we can do it!
Here are a few clues to what we are currently thinking.
Have a look & play with:
- http://images.google.com/imagelabeler/
- http://www.mturk.com/mturk/help?helpPage=whatis
- http://recaptcha.net/
Leave a comment on how you think we can create ‘correct’ metadata.
2 Comments
Brute force might be a decent starting point for getting some ‘stuff’ into a system, but in the end it's never going to be very smart. Image Labeler (and reCAPTCHA, which is a very nice idea) show the way to go.
Basically, you need a Flickr version of everything. Anyone downloading or uploading a paper, say, would tag it with meta-data, and it would then be taggable by anyone else who reads it. Perhaps you could make tagging a condition of downloading it. It's the only way to make sense of that amount of data.
There are plenty of audio and video tagging sites out there already (I saw a great video one which I now can’t find).
That still doesn't rule out junk meta-data, but if enough people add their own meta-data, the junk will probably be outweighed by the ‘correct’ stuff.
This is all pretty obvious though, isn't it?
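As a rough illustration of the "junk gets outweighed" idea, here is a minimal sketch of majority-style tag aggregation. It assumes each reader's tags for one item arrive as a list of strings; the function name `consensus_tags` and the `min_votes` / `min_share` thresholds are hypothetical choices for illustration, not how Image Labeler, Flickr or any other site mentioned here actually works.

```python
from collections import Counter


def consensus_tags(tag_sets, min_votes=2, min_share=0.5):
    """Keep tags that enough independent taggers agree on.

    tag_sets: one list of tag strings per tagger (hypothetical input format).
    A tag survives if at least `min_votes` taggers used it AND it was used by
    at least `min_share` of all taggers. Most popular tags come first, with
    ties broken alphabetically so the output is deterministic.
    """
    if not tag_sets:
        return []
    counts = Counter()
    for tags in tag_sets:
        # Normalise case/whitespace and count each tag once per tagger,
        # so one person repeating a tag can't stuff the vote.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    n = len(tag_sets)
    kept = [tag for tag, c in counts.items() if c >= min_votes and c / n >= min_share]
    return sorted(kept, key=lambda tag: (-counts[tag], tag))


readers = [
    ["solar", "energy", "spam"],
    ["Solar", "photovoltaics"],
    ["solar", "energy"],
    ["energy", "solar cells"],
]
print(consensus_tags(readers))  # ['energy', 'solar']
```

Here the one-off "spam" tag is dropped because only one of the four readers used it. The thresholds are a trade-off: raising them filters more junk but also discards legitimate niche tags until enough people have read the item.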
Viddler was the video tagging/commenting site I couldn’t remember.