Today’s blog post comes courtesy of Sarah.
1. Show and Tell – although most of the NCVER people can’t come, we will go ahead on Fri 28th. Vaughan will invite the Flinders people. We will take 30 min, from 4 to 4.30, followed by drinks. Sarah will send a general invitation to ed au and book Jo’s Café. We discussed what we will show – see 3 below.
2. Pru reported she has started looking at the data that Nick sent from edith – approx 2000 records. She has put it into an Access database with fields to enter decisions and comments. She and Sarah have developed some criteria for rejecting records; the idea is that a record will either be rejected outright, based on edith fields only, or else sent to IOs for human perusal and evaluation. This is a research exercise to see, e.g.:
a) Can a valid rejection decision be made like this?
b) What criteria are sensible to use?
c) What other rules occur to us as we look at the data?
d) Will different people make the same decision?
e) How would a machine do in comparison?
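The two-way split described above (reject outright on edith fields alone, otherwise pass to the IOs) could be sketched roughly as follows. This is only an illustration: the record fields and the two rules are placeholders we made up, not the criteria Pru and Sarah have drafted, and the real edith export may look quite different.

```python
from dataclasses import dataclass


@dataclass
class EdithRecord:
    # Hypothetical fields; the real edith export may differ.
    url: str
    title: str
    description: str
    tag_count: int


# Each rule returns a reason string if the record should be rejected, else None.
# Both rules are placeholders, not the agreed rejection criteria.
def rule_missing_title(rec):
    return "no title" if not rec.title.strip() else None


def rule_no_tags(rec):
    return "no tags" if rec.tag_count == 0 else None


RULES = [rule_missing_title, rule_no_tags]


def triage(rec, rules=RULES):
    """Return ("reject", reasons) if any rule fires, else ("send_to_io", [])."""
    reasons = [reason for reason in (rule(rec) for rule in rules) if reason is not None]
    if reasons:
        return "reject", reasons
    return "send_to_io", []
```

Keeping each criterion as its own small function means new rules that occur to us while looking at the data (question c) can be added to the list without touching the rest, and the recorded reasons can go straight into the comments field of the database.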
Pru, Sarah and Nelly will send Vaughan the suggested rejection criteria. We decided that Sarah, Pru, Nelly and Vaughan will look at 100 edith records that Nick will supply. If possible we would like 50 coming from taggers and 50 coming from tags; otherwise the sample should be as random as possible. They will be sites that are not already in edna. We can also compare three evaluations: a machine that looks at the content of the web site, a human looking at the edith record only, and a machine looking at the edith record only. We can also look at the delicious tags (Nick may be able to harvest these for the 100 records, or else we will do it manually, taking a segment each) and see whether the delicious tags would have made a difference to the evaluation decision, or whether they will only be useful in the 3rd stage of metadata creation.
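As a rough sketch of how the sample could be drawn and the decisions compared, assuming each record carries a "source" key set to "tagger" or "tag" and each evaluator (human or machine) produces one label per record in the same order. The key name, the label values and the choice of Cohen's kappa as the agreement measure are our assumptions here, not something the group has agreed.

```python
import random
from itertools import combinations


def stratified_sample(records, per_source=50, seed=0):
    """Pick up to per_source records from each source ("tagger" or "tag")."""
    rng = random.Random(seed)
    sample = []
    for source in ("tagger", "tag"):
        pool = [r for r in records if r["source"] == source]
        sample.extend(rng.sample(pool, min(per_source, len(pool))))
    return sample


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Agreement expected by chance, from each evaluator's label frequencies.
    expected = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if expected == 1.0:
        return 1.0  # both evaluators used the same single label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(decisions):
    """decisions maps evaluator name -> list of labels, all in the same record order."""
    return {
        (p, q): cohen_kappa(decisions[p], decisions[q])
        for p, q in combinations(sorted(decisions), 2)
    }
```

A kappa near 1 means two evaluators almost always agree, and a value near 0 means they agree no more often than chance would predict. The same function works for human against human and human against machine, since the machine is just another list of labels.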
(Sarah queried what happens if someone has reset the quality vs quantity dial – are all the records streamed from edith changed until the dial is changed again? The answer is yes (there is no “universal” data stream that continues to be harvested). We agreed this might need to be different in a production/implementation of the idea but for the proof of concept at present it doesn’t matter.)
3. For the Show and Tell we will:
- Outline the overall concept again as we did in SnT1 – we need to do this to get it back in people’s minds as well as allow for new people there, internal and external.
- Focus then on what we can do to analyse/evaluate/filter the data thus harvested from edith. We can mention again that some records have already been identified (manually) as useful and added to edna. We want in the second stage to analyse the data to evaluate it and see how we can take the best for edna and reject the others, using automation as far as possible.
- We can talk about the comparison and research we did manually.
- Another possible analysis is directing records to the appropriate sector officer. We can mention these. In this part of the Show and Tell we hope to get some ideas from the audience about ways we could analyse the edith data, and to hear any questions they have about it.
- Perhaps flag the 3rd stage: mention the harvesting of delicious tags for metadata, and mention the ESP game (or do we want to keep that as a big surprise for the 3rd Show and Tell? I am inclined to think we should save it for the end).
4. Next meeting
We agreed to meet on Tuesday, as Thursday won’t give enough time to prepare the presentation. Nick can’t be there, but most of the technical work for this part is Vaughan’s anyway.