Mike’s messing with my mind again. Here I was, innocently prepping the POC SnT, and he waltzes in and drops another concept for the collection management proof of concept, like a smart bomb.
While we’re trying to find authoritative social bookmarking users, we’ll also turn up bookmarkers whose sheer quantity of bookmarks can create a false signal.
As Mike points out, what would be cool is if we could find the obscure bookmarkers with resources of extreme value that the masses have overlooked - and take advantage of the ‘Long Tail Effect’.
Some useful reading on the Long Tail effect:
- Wikipedia: The Long Tail Effect
- The original Wired article
- Chris Anderson, who first described this internet phenomenon
Both you and I have a specific educational focus (call this ‘signal’) and a few differing interests (call these ‘noise’).
So does a third person; however, they might have lots of other interests (call this ‘very noisy’).
Our signal-to-noise ratios would be very similar, unlike the third person’s. Yet the sheer noise they generate in the current web world would make them ‘popular’ and push them up the rankings.
Doesn’t it make sense that you and I should collaborate with each other more than with the third person?
We have more in common as evidenced by our bookmarks, flickr photos, online subscriptions etc.
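To make the signal/noise idea a bit more concrete, here is a rough sketch. It assumes each bookmark carries a single tag and that ‘signal’ means a tag inside a shared focus set. The users, tags and the FOCUS set are all made up for illustration; the post itself doesn’t say how the ratio would actually be measured.

```python
# Hypothetical focus: tags that count as 'signal' for our shared educational focus.
FOCUS = {"elearning", "vet", "assessment"}

# Hypothetical bookmark tags per user (one tag per bookmark, for simplicity).
bookmarks = {
    "you":   ["elearning", "vet", "assessment", "vet", "gardening"],
    "me":    ["elearning", "assessment", "vet", "elearning", "cooking"],
    "third": ["elearning", "vet", "music", "cars", "movies", "travel",
              "sport", "games", "fashion", "news", "assessment", "pets"],
}

def signal_to_noise(tags, focus):
    """Ratio of on-focus bookmarks to off-focus bookmarks (inf if there is no noise)."""
    signal = sum(1 for t in tags if t in focus)
    noise = len(tags) - signal
    return float("inf") if noise == 0 else signal / noise

ratios = {user: signal_to_noise(tags, FOCUS) for user, tags in bookmarks.items()}

# A quantity-based ranking: most bookmarks first (ties keep their original order).
by_volume = sorted(bookmarks, key=lambda u: len(bookmarks[u]), reverse=True)

for user in bookmarks:
    print(f"{user}: {len(bookmarks[user])} bookmarks, S/N = {ratios[user]:.2f}")
print("Ranked by volume:", by_volume)
```

Ranked by raw volume, the noisy third user comes out on top, even though your ratio and mine match each other (4.00) and theirs is far lower (0.33).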
I could use this principle to seek out those who live in the online Long Tail (obscure) but are highly selective, and therefore highly relevant to my focus.
What is signal to me is noise to someone else.
This alternative, qualitative approach could help capture the people we’d miss with our quantity-based POC metrics.
Discovery implies recognition of value. In the internet world, this is likely to create new ecosystems of interest in old or obscure ideas, feeding further creativity, culture and/or innovation. For example: who really invented the LED?
Another ripper idea for the POC car park. Thanks, Mike, for the great discussion.
6 Comments
Glad to be of service!
(great summary of our interesting conversation)
Fang
That is a good outline of the value of digging for jewels in the Long Tail.
“Discovery implies recognition of value”: what matters is keeping the context of that value.
In the Long Tail, quality is always subjective, and so is the signal.
Dave
Interesting! Pru and I have been talking about similar ideas. =)
http://eduspaces.net/janeth/weblog/
Peter Shanks from TALO uses this kind of metric in his project, which sifts resources by words that are both rare in the general collection and common in the current document: http://tpu.bluemountains.net
I think the ideas Peter used in the document-sorting project have strong parallels with what we’re discussing for sorting people.
I was pondering this more last night, and I think it needs refining:
Focus could be defined as whatever I consider ‘very high value’, more so than my other resources.
It will be easier to recognise others obscured in the Long Tail if I understand what ‘very high value’ means to me personally.
So, to bring in D’Lk’s point about context: it is much more than just saying ‘hey, we have things in common’; it’s saying ‘hey, we highly value the same thing’.
So how do you discover ‘very high value’? In tpu I was looking at individual words in a unit of competency and comparing them with a full-text index of the contents of every training package, so that the ‘rarest’ common words in a unit got the greatest emphasis (it would probably have been more relevant to try pairs of words, but that’s another project).
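The ‘rarest common words’ weighting described here is close to the classic TF-IDF scheme: a word scores highly when it is frequent in the unit but rare across the collection. The sketch below assumes that reading; it is not tpu’s actual code, the smoothed IDF formula is just one common variant, and the collection and unit text are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; deliberately simple (no stemming or stop words)."""
    return re.findall(r"[a-z]+", text.lower())

def rare_common_weights(unit_text, collection):
    """Score each word of unit_text by its frequency in the unit times a smoothed
    inverse document frequency over the collection, highest first."""
    docs = [set(tokenize(d)) for d in collection]
    n_docs = len(docs)
    weights = {}
    for word, count in Counter(tokenize(unit_text)).items():
        df = sum(1 for d in docs if word in d)        # documents containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # rarer in collection -> larger
        weights[word] = count * idf
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical stand-ins for training package text and one unit of competency.
collection = [
    "apply workplace safety procedures in the workplace",
    "communicate in the workplace",
    "assess competence using evidence",
    "design assessment tools for competence",
]
unit = "design assessment tools validate assessment tools"

ranked = rare_common_weights(unit, collection)
for word, weight in ranked:
    print(f"{word:12s} {weight:.2f}")
```

‘design’ and ‘validate’ each appear once in the unit, but ‘design’ also turns up elsewhere in the collection, so it ends up weighted lower. Trying pairs of words would mean changing `tokenize` to emit adjacent word pairs instead of single words.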
For resource discovery, you might try comparing the textual elements of the resource with a full-text index of the unit or training package you were targeting, giving that resource a relevancy ranking. Something like:
"SELECT unitID, MATCH(unitText) AGAINST ('" . $val . "') AS score FROM units WHERE MATCH(unitText) AGAINST ('" . $val . "') HAVING score > 0.2"
where $val is replaced by individual words from the resource (escaped first, e.g. with mysqli_real_escape_string, so a stray apostrophe can’t break the query).
If the number of words returned for the resource as a whole crossed a certain threshold for a unit, you could then add that resource’s text to the index to refine it further. Once this index grew large enough, you could apply it even to short postings/tags and get a metric of the author’s ‘value’ with regard to that unit or training package.
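Here is a toy sketch of that feedback loop, in Python rather than PHP/MySQL. Everything in it is a hypothetical stand-in: the ‘index’ is just a word count, a word’s score is its count relative to the unit’s most frequent word (MySQL’s MATCH ... AGAINST relevance works quite differently), and both cut-off values are illustrative.

```python
import re
from collections import Counter

WORD_SCORE_MIN = 0.2   # hypothetical per-word cut-off, echoing 'score > 0.2'
MATCH_THRESHOLD = 3    # hypothetical: matching words needed before a resource is absorbed

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class UnitIndex:
    """Toy stand-in for a unit's full-text index: plain word counts."""

    def __init__(self, text):
        self.counts = Counter(tokenize(text))

    def word_score(self, word):
        """Hypothetical relevance: frequency relative to the most frequent word."""
        top = max(self.counts.values(), default=0)
        return self.counts[word] / top if top else 0.0

    def matching_words(self, text):
        return {w for w in tokenize(text) if self.word_score(w) > WORD_SCORE_MIN}

    def absorb_if_relevant(self, resource_text):
        """Add the resource's words to the index if enough of them match."""
        if len(self.matching_words(resource_text)) >= MATCH_THRESHOLD:
            self.counts.update(tokenize(resource_text))
            return True
        return False

    def author_value(self, postings):
        """Share of an author's short postings/tags that hit the index."""
        if not postings:
            return 0.0
        hits = sum(1 for p in postings if self.matching_words(p))
        return hits / len(postings)

index = UnitIndex("assessment tools assessment evidence assessment tools "
                  "assessment criteria assessment validity evidence tools")
absorbed_a = index.absorb_if_relevant("Writing valid assessment tools with clear evidence requirements")
absorbed_b = index.absorb_if_relevant("Ten tips for growing tomatoes")
value = index.author_value(["assessment tools", "tomatoes", "evidence guide"])
print(absorbed_a, absorbed_b, round(value, 2))
```

The first resource shares three strong words with the unit, so it is folded in; the gardening one isn’t. Two of the author’s three short postings then hit the grown index, giving a ‘value’ of about 0.67 for this unit.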
Hmm, that’s a bugger to put into a text message - I hope it made some sense.
Hehe, I think I follow.
As D’Lk points out, “very high value” requires context. I need to know which piece of content is of very high value to me personally. That becomes my yardstick (“a recognition of value”).
Remember I am deliberately seeking out those obscured in the long tail.
If I understand your point about resource discovery correctly, you’re estimating relevancy (‘very high value’) statistically rather than making a cognitive match.
There is great power in that approach, but I am not sure it would return matches that are relevant to me. I guess I would have to supply the significant values of ‘unitText’ myself.
I’m looking for a match with someone else who:
* has at least 1 item I consider of very high value
* shares a common focus (or other content on topic)
* shares a similar signal / noise ratio
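A sketch of that three-part test, with loud assumptions: each profile is a set of bookmarked item IDs plus one tag per bookmark; ‘common focus’ means at least one tag in a focus set; and the signal/noise comparison uses the on-focus share of bookmarks (signal divided by total, which avoids dividing by zero when someone has no noise) with a hypothetical tolerance of 0.2. All names and data are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    name: str
    items: set[str]   # hypothetical IDs/URLs of bookmarked resources
    tags: list[str]   # one tag per bookmark

def focus_share(tags, focus):
    """On-focus share of bookmarks: signal / (signal + noise)."""
    return sum(1 for t in tags if t in focus) / len(tags) if tags else 0.0

def is_match(me, other, very_high_value, focus, tolerance=0.2):
    has_jewel = bool(other.items & very_high_value)      # at least 1 very-high-value item
    shares_focus = any(t in focus for t in other.tags)  # common focus
    similar_ratio = abs(focus_share(me.tags, focus)
                        - focus_share(other.tags, focus)) <= tolerance
    return has_jewel and shares_focus and similar_ratio

focus = {"vet", "elearning", "assessment"}
me = Profile("me", {"url1", "url2", "url3"}, ["vet", "vet", "elearning", "cooking"])
jewels = {"url2"}   # the item I personally rate 'very high value'

candidates = [
    Profile("alice", {"url2", "url9"}, ["vet", "assessment", "gardening"]),
    Profile("bob", {"url2", "url4"}, ["vet", "music", "cars", "games", "news", "sport"]),
    Profile("carol", {"url7"}, ["vet", "elearning"]),
]
matches = [p.name for p in candidates if is_match(me, p, jewels, focus)]
print(matches)
```

Bob holds the jewel but is far noisier than me, and Carol is on topic but doesn’t hold it, so only Alice passes all three tests.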