October 10, 2007 – 12:33 pm
Those following the Collection Improvement PoC might be interested in the edna Content Standards overview [also a PDF version 213Kb]. It describes the policy issues that the PoC team are trying to automate.
October 8, 2007 – 12:21 pm
To help people get away from ‘production think’, I’ve been struggling to define a Proof of Concept precisely. It’s almost easier to define what it’s not…
Back in March this year, I thought:
A high risk, high trust and low governance project that creates a conceptual solution for the client. Focus solely on the concept. Forget security. Forget firewalls. Forget QA etc. What is delivered is a bare-knuckled prototype which demonstrates how it could solve the issues your client deals with. The project has no limits other than Time (approx 3 months) and Resource (a small team working part time within a small financial budget). Concluding each POC month is a ‘Show and Tell’ to an audience of sponsors, users and internal teams - each one being more public than the last.
Not satisfied with this mouthful, I had another go in early October:
“A practical process designed to discover and focus on the core of what and how to innovate.
By placing people in a deliberate environment of trust without real-world constraints other than limited resource, timelines and lightweight project management, a PoC nurtures innovative creative solutions.”
Chatting with Mike Seyfang - he would also chip in:
- a vehicle for embracing risk
- an environment that creates space in work for play: “expect 6/10 to ‘fail’ but 3/10 might just be ‘hum dingers’”
Then again he had the urge to spurt out:
3 people - 3 months - $30k
…there’s a certain appeal in the above’s simplicity.
What are your thoughts about trying to define an innovation process?
October 8, 2007 – 12:05 pm
“We want to buy it” was one reaction from a guest at Show and Tell 2.
This is a common reaction to most Show and Tells; something which I take as a sign of success (we’re on the right track) yet cringe at the thought ‘no - it’s a PoC; a prototype rather than a product’.
Judging by the number of good questions, the audience were engaged in the topic and saw potential in what was being demonstrated.
- Question: “Are we looking to automate the entire resource discovery process?”
Answer: “No. We can’t. The IOs are irreplaceable. The skill of the IO to analyse and synthesise simply can’t be effectively replicated by algorithms. The algorithms developed can reduce the number of irrelevant sites, but some irrelevant sites will still get through (false positives) and we will lose some good sites (false negatives).”
- Question: “Are we going to tell the users we are harvesting their tags?”
Answer: “No. Revealing who we are tracking could distort the results. Del.icio.us is just one open, programmable, human-reviewed repository of websites. We are simply compiling a selective list of candidate websites - not users.”
- Question: “Are we going to use the suite of tools on the edna collection?”
Answer: “Yes - as an interesting exercise in checking consistency of edna data considering the number of IOs who have entered the data over the years.”
- Question: “Is there a User Interface for this?” [TC: interpreted as “programmable via the web”]
Answer: “No. This is for internal use. Nice idea though ~ we could expose this through a web programming interface to mash up things like websites and feeds.”
Reflecting on how Show and Tell 2 went, the team came to some conclusions:
- The run-through was essential: it identified areas requiring clarification, missing elements and structural changes, and it provided confidence when presenting.
- Remember to get to the shiny stuff quickly:
- break down each step
- show what you did and then explain the theory as you go.
- Use a variety of approaches:
- Show it and then Tell it.
- Statistics are interesting if you wrap meaning around them.
- Ground it in reality: create benchmarks for your innovations
- Don’t be afraid of being too technical: but do focus only on what’s core, using plain English.
- Be as visual as possible: avoid unnecessary ‘text on screen’ presentations
- Engage with the audience: ask them questions.
September 24, 2007 – 3:59 pm
Today’s blog post comes courtesy of Sarah
Show and Tell – although most of NCVER people can’t come we will go ahead on Fri 28th. Vaughan will invite the Flinders people. We will take 30 min, from 4 to 4.30, followed by drinks. Sarah will send general invitation to ed au and book Jo’s Café. We discussed what we will show – see 3 below.
Pru reported she has started looking at the data that Nick sent from Edith – approx 2000 records. She has put it into an Access database with fields to enter decisions and comments. She and Sarah have developed some criteria for rejecting records; the idea is that a record will either be rejected outright, based on edith fields only, or else sent to IOs for human perusal and evaluation. This is a research exercise to see e.g.
a) can a valid rejection decision be made like this?
b) What criteria are sensible to use?
c) What other rules occur to us as we look at the data?
d) Will different people make the same decision?
e) How would a machine do in comparison?
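Questions d) and e) can be put into numbers once the decisions are recorded. The following is only a rough sketch, not something the team has built: the function names, the accept/reject encoding (True means accept) and the idea of treating the human decision as the reference are all assumptions made for illustration.

```python
from itertools import combinations


def agreement_rate(a, b):
    """Fraction of records on which two reviewers made the same decision."""
    if len(a) != len(b) or not a:
        raise ValueError("decision lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement(decisions):
    """Agreement rate for every pair of reviewers.

    `decisions` maps a reviewer label to a list of accept (True) /
    reject (False) decisions, in the same record order for everyone.
    """
    return {
        (r1, r2): agreement_rate(decisions[r1], decisions[r2])
        for r1, r2 in combinations(sorted(decisions), 2)
    }


def compare_to_human(machine, human):
    """Count machine errors, treating the human decision as the reference.

    false positive: machine accepts a record the human rejected
    false negative: machine rejects a record the human accepted
    """
    if len(machine) != len(human):
        raise ValueError("decision lists must be the same length")
    false_pos = sum(m and not h for m, h in zip(machine, human))
    false_neg = sum(h and not m for m, h in zip(machine, human))
    return {"false_positives": false_pos, "false_negatives": false_neg}
```

With the 100 sample records, each person's decisions would form one list, and the machine's decisions another, all in the same record order.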
Pru, Sarah and Nelly will send Vaughan the suggested rejection criteria. We decided that Sarah, Pru, Nelly and Vaughan will look at 100 edith records that Nick will supply. If possible we would like 50 coming from taggers and 50 coming from tags, otherwise as random as possible. They will be sites that are not already in edna. We can also compare a machine that looks at the content of the web site to the human evaluation of the edith record only and the machine evaluation of the edith record only. We can also look at the delicious tags (Nick may be able to harvest these for the 100 records or else we will do it manually, taking a segment each) and see whether the delicious tags would have made a difference to the evaluation decision (or will they only be useful in the 3rd stage of metadata creation).
(Sarah queried what happens if someone has reset the quality vs quantity dial – are all the records streamed from edith changed until the dial is changed again? The answer is yes (there is no “universal” data stream that continues to be harvested). We agreed this might need to be different in a production/implementation of the idea but for the proof of concept at present it doesn’t matter.)
3. For the Show and tell we will:
- Outline the overall concept again as we did in SnT1 – we need to do this to get it back in people’s minds as well as allow for new people there, internal and external.
- Focus then on what we can do to analyse/evaluate/filter the data thus harvested from edith. We can mention again that some records have already been identified (manually) as useful and added to edna. We want in the second stage to analyse the data to evaluate it and see how we can take the best for edna and reject the others, using automation as far as possible.
- We can talk about the comparison and research we did manually.
- Another idea is the directing of records to the appropriate sector officer. We can mention these. We will hope in this part of the Show and Tell to get some ideas from the audience about ways we could analyse the edith data and questions they may have about it.
- Perhaps flag the 3rd stage: mention the harvesting of delicious tags for metadata, mention the ESP game (or do we want to keep that for a big surprise at the 3rd show and tell? I am inclined to think we should save it for the end).
4. Next meeting
We agreed to meet on Tuesday as Thursday won’t give enough time to prepare the presentation. Nick can’t be there but most of the technical work for this part is Vaughan’s anyway.
September 14, 2007 – 4:15 pm
Time to get the elves and the shoe maker busy again.
Required research: (a gem via Vaughan)
“A valuable way to gain understanding of a complex network is through the identification of its important, or prominent, nodes.”
To get to Show and Tell 2:
- Lock in 28th September as Show and Tell 2 date
- VH - development
- inclusion and exclusion criteria
- candidate ranking
- ALL - SnT preparation
- NI & SH - Mark Booker and IO Jade Story
- TC - SlideShow
- TC - to create EDITH conceptual map in consultation
- NL - provide PM with a CSV dump for evaluation criteria analysis [Done].
- TC - check edna contract for where EDITH may assist edna KPIs.
- ALL - Place ideas into Carpark
To get to Phase 3 ‘Describe’:
- Schedule developer time with Technical Project Manager (done)
- Metadata game game-play refinement.
September 14, 2007 – 3:23 pm
Yesterday’s meeting touched on the benefits of a strong vision. The vision provides a yard stick to measure all ideas against. For the evaluation stage, I started rambling on about ‘there should be an option for an IO to tick if the resource is an event or document and therefore present a different set of metadata fields’.
Wait - remember the vision says:
“Collection improvement through user engagement and better metadata tools”
EDITH is a metadata tool which principally:
- automates discovery of the wise in the crowds amongst the masses
- does limited automated metadata description
So….. reflecting on this, EDITH should not require IO intervention for a result to be produced. Therefore it does not require a user interface. It simply provides a list of resources with metadata that can be exploited using other collection management tools such as DSpace, RSS, etc.
Click! Ahhhh! EDITH is a resource metadata automation tool. Any sort of manual intervention in the flow of the process should be eliminated.
This gets us to something simpler, more elegant and closer to what’s really needed.
Thus the vision keeps us honest and guides us along the right POC path.
August 31, 2007 – 2:40 pm
Through EDITH, your candidate educational website will be in the company of a lot of others: currently EDITH is generating over 250 website candidates a day!
With so many websites, which ones should Information Officers add to the collection?
In last week’s POC meeting we realised that the existing EDITH selection path for candidate websites would not best reflect the edna collection policy.
Yes, it would filter out the rude sites and non-English resources - however it ignores other collection policy criteria such as focusing on Australian based materials.
We will seek to rank candidate websites based on the collection criteria, for example:
- domain: .edu.au and .gov.au provide copyright license information for Australian educators
- extensions: different media and document types can be promoted/demoted
Applications:
- when a candidate website achieves a specific ranking, an IO can be alerted
- candidate websites are ordered based on collection policy & are most likely to be catalogued
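To make the ranking idea concrete, here is a minimal sketch of scoring a candidate by its domain and file extension, then ordering the list and flagging high scorers for an IO. The weights, the threshold and the function names are all invented for illustration; they are not taken from the edna collection policy or from EDITH itself.

```python
import os
from urllib.parse import urlparse

# Hypothetical weights, invented for this sketch (not from the collection policy).
DOMAIN_WEIGHTS = {".edu.au": 3.0, ".gov.au": 3.0, ".au": 1.0}
EXTENSION_WEIGHTS = {".pdf": 0.5, ".doc": -0.5, ".exe": -5.0}
ALERT_THRESHOLD = 3.0  # assumed score at which an IO would be alerted


def score_candidate(url):
    """Score one candidate URL from its domain suffix and file extension."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    score = 0.0
    # Take only the best matching suffix, so .edu.au is not also counted as .au.
    matches = [w for suffix, w in DOMAIN_WEIGHTS.items() if host.endswith(suffix)]
    if matches:
        score += max(matches)
    # Promote or demote by document type; unknown extensions are neutral.
    extension = os.path.splitext(parsed.path)[1].lower()
    score += EXTENSION_WEIGHTS.get(extension, 0.0)
    return score


def rank_candidates(urls):
    """Return (score, url) pairs, highest score first."""
    scored = [(score_candidate(u), u) for u in urls]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def needs_alert(url):
    """True when a candidate scores high enough to alert an IO."""
    return score_candidate(url) >= ALERT_THRESHOLD
```

A real version would need more criteria than these two, but the shape stays the same: each collection policy rule adds to or subtracts from a score, and the score drives both the ordering and the alert.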
Extensions:
- given a theoretical ‘second tier collection’ of non-quality assured resources, users might review & rank resources which would boost their ranking, providing a path to be promoted into the principal edna quality assured collection.
In other thoughts:
- The principal issue of the ‘metadata game’ is creating game play that’s interesting. Facebook could provide a forum to host this.
- We need to prep for Show and Tell 2
August 31, 2007 – 1:07 pm
Last month I challenged you to think how we could describe websites and online resources accurately.
Here’s our thinking for the metadata ‘describe’ phase: to turn describing websites into a game that anyone can play online.
Here’s how we got there:
1. http://www.mturk.com/mturk/help?helpPage=whatis Mechanical Turk is probably one of the best known early web based crowdsourcing systems, used to get people to describe things (metadata) that are normally very difficult for a computer to describe well.
2. http://recaptcha.net/ cleverly combines two completely different tasks:
a) checking that a human is completing an online form (and not a computer)
b) improving the accuracy of digitising books through character recognition
What’s interesting here is how it capitalises on a moment of human skill (recognition) that is ordinarily being wasted on a common website verification task (CAPTCHA) that is occurring on the scale of the internet.
3. http://images.google.com/imagelabeler/ is a simple game that is strangely addictive, helping improve the accuracy of image description.
August 22, 2007 – 6:14 pm
I got very excited about how the structure of the web is changing under the momentum of web 2.0. The penny dropped when I realised Flickr, YouTube, Google, Del.icio.us and Yahoo Pipes aren’t websites - they’re fundamental web 2.0 service infrastructure.
Those organisations who are doing great things in the web 2.0 space are those who do one thing and do it very well. The service is often characterised by openness: empowering everyone to put stuff in, describe it and get stuff back out in order to mash, learn, share and collaborate.
It used to be considered poor form for websites to sponge media from other website servers. Flickr, YouTube, Google, and Del.icio.us (including RSS and website APIs) have inverted this. Now these sites positively encourage you to embed content and applications stored on their servers.
The websites of each of these are not really useful in themselves. Their true power is their capacity to:
- mash, repurpose and republish content.
- draw connections between people directly and serendipitously.
This is why (in my mind) they are the key building blocks for the new Web 2.0 world.
What could be some key Web 2.0 infrastructure services for education? Some initial thoughts:
What do you think are the key Web 2.0 infrastructure elements for education?
The regular reader of this blog would know the team wants to make what’s hard about collection management easy. This week’s meeting asked the question: what’s hard about metadata?
The answer, of course, is creating ‘correct’ metadata. For example, how would you extract the subject out of a website, PDF, Word doc, audio file etc?
We could do it by modifying brute force tools such as:
However! Your challenge is to tell us how you think we can do it.
We have a few clues about what we are currently thinking.
Have a look & play with:
Leave a comment on how you think we can create ‘correct’ metadata.