Crowdsourcing and Tumblr at the British Library!

Posted by Jonas Öberg on July 8, 2014

British Library, photo by Sifter on Flickr via photopin

The sun was shining brightly over the British Library last Friday, when they welcomed guests to their event Working with the British Library’s Digital Collections and Data. Every year, the British Library Labs put on a competition that challenges people to use their digital collections in new and innovative ways: to present them using new techniques, bring out new meanings in them, or play around with them in other ways that are engaging and worthwhile.

After a brief introduction and presentation by the British Library, the first presentation came via Skype from Desmond Schmidt (Queensland University of Technology) and Anna Gerber (University of Queensland) who presented one of the two winning entries in the 2014 competition: Text to Image Linking Tool (TILT). TILT is an interesting tool that brings development forward on technologies that allow you to scroll the text of a transcribed document page, and have the original page image follow along next to it. That way you can more easily move between original and transcription. You can read a full description here and watch the video presentation here.

Hilarity ensued when Dr Bob Nicholson presented the Victorian Meme Machine, which, aside from having – from Commons Machinery’s perspective – a very cool looking logotype, also shows promise for entertainment of the Victorian kind. According to Dr Nicholson, there’s a misconception in the world that living in Victorian times was a very grave affair. “We are not amused”, as Queen Victoria is rumoured to have said. In reality, it turns out that the newspapers of the day were, if not awash with, at least periodically featuring columns with the latest jokes and funnies. The Victorian Meme Machine aims to shine some light on those jokes and bring them into contemporary society as memes. More information is available here and a presentation video here.

The next two presentations I’ll quickly skip over, since they were about the winners of previous years, and you can read up on what they do here (for The Sample Generator), and here (Mixing the Library. Information Interaction and the DJ).

The latter presentation did include Dan Norton (University of Dundee) referencing the late information science researcher Don Swanson and his theory of undiscovered public knowledge, which essentially says that when two separate areas of study are combined, new discoveries can be made at their intersection, and that it’s the relation between separate objects that is interesting. Dan made this a central part of his presentation, and went on to say that the “meaning [of an object] is made by its context.”

A large part of Dan’s work has focused on making tools that people can use to annotate connections between objects so as to better understand their relations, and those connections – which may vary from person to person – help build new narratives. For example, an annotated relation between Queen Victoria and “India” supports the narrative of how Queen Victoria also used the title of Empress of India late in her reign.

In one of the breaks, I had a fascinating talk with Dr Richard A. Hawkins from the University of Wolverhampton, and we reflected on how easy access to research material has become over the years. Not specifically the research itself, but the added value of digitizing and making available the research material itself: newspapers, magazines, photographs and other material that researchers base their work on.

Sara Wingate Gray presented her work with Curatorial, a new way of visualising the British Library collections, using their metadata. It’s interesting (especially for us) to note that Sara started her presentation with a comment on how the metadata of any collection is not currently being digitally exploited. Her work built very much on making use of the metadata that is already available, something that we’re also trying to do with Commons Machinery (and also to develop some strategies for how to improve the quality of the available metadata).

After a brief talk on the British Library’s mechanical curator and work in relation to the library’s publishing of images on Flickr Commons, Ian Harvey came on stage to present Cardiff University’s work on Lost Visions, a project that seeks to solve some of the issues of working with big data and making it more accessible to people. What I found interesting about their work is the relation between crowdsourcing and machine learning.

We explored this further in the breaks as well, in relation to one of the projects I’m working on, which will seek to find ways to visually compare images and determine whether they are identical or not. What I’m aiming at seems very much in line with their practice: using crowdsourcing to feed a database with information, then using that information to figure out the parameters and algorithms to use for automatically making the same selection. There’ll undoubtedly be edge cases which can’t be processed automatically, and those will be brought into a crowdsourcing platform again, and so on to refine the process.
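To make that loop a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the similarity scores between 0 and 1, the helper names `fit_threshold` and `triage`, and the `margin` value are all hypothetical, and none of this is taken from Lost Visions or from our own code. The idea is simply that crowd verdicts are used to pick a cut-off, and that pairs scoring close to the cut-off go back to the crowd.

```python
from typing import List, Tuple

# One labelled pair of images: (similarity score in [0, 1],
# crowd verdict: True if people judged the two images identical).
Labelled = Tuple[float, bool]


def fit_threshold(labelled: List[Labelled]) -> float:
    """Pick the score cut-off that agrees with the most crowd verdicts."""
    candidates = sorted({score for score, _ in labelled})
    best_threshold, best_correct = 0.0, -1
    for t in candidates:
        # A pair is predicted "identical" when its score reaches the cut-off.
        correct = sum((score >= t) == same for score, same in labelled)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def triage(score: float, threshold: float, margin: float = 0.05) -> str:
    """Decide automatically, or send borderline pairs back to the crowd."""
    if abs(score - threshold) < margin:
        return "crowd"
    return "identical" if score >= threshold else "different"
```

Every answer that comes back from the crowd can be appended to the labelled list before `fit_threshold` is run again, which is the refinement step described above.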

However, one of the challenges with crowdsourcing seems to be that it often does require some expert moderation regardless. For instance, Leeds Castle isn’t actually in Leeds, but a crowd might easily misinterpret it as being in Leeds if looking to crowdsource geographic coordinates. It’s reasonable to assume that people in the UK with this geographic knowledge would, on a global scale, be in a minority and as such, the prevailing opinion in a crowdsourced collection would probably indicate that Leeds Castle rightly belongs in Leeds.
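A tiny Python sketch shows why a plain majority vote is not enough here. The function name `resolve`, the vote counts and the idea of a single expert override are assumptions made for illustration, not a description of any real crowdsourcing platform. Leeds Castle is in fact in Kent.

```python
from collections import Counter
from typing import List, Optional


def resolve(votes: List[str], expert: Optional[str] = None) -> str:
    """Return the expert's answer if one exists, else the most common vote."""
    if expert is not None:
        return expert
    if not votes:
        raise ValueError("no votes and no expert answer")
    return Counter(votes).most_common(1)[0][0]


# Hypothetical votes on where Leeds Castle is located.
votes = ["Leeds"] * 8 + ["Kent"] * 2

crowd_answer = resolve(votes)                     # the majority gets it wrong
moderated_answer = resolve(votes, expert="Kent")  # a moderator corrects it
```

A real system would more likely weight votes by how reliable each contributor has been, but the override is enough to show where moderation fits in.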

Peter Balman presented his work on Visibility. It was formerly known as Content Usage Dashboard, which is a bit more descriptive: his idea is to take images from the British Library’s collection, feed them through Google’s reverse image search, do content analysis on the web pages that are returned, and through that analysis display a dashboard showing the British Library where their images are being re-used and in what contexts. One of his challenges is that he’ll undoubtedly end up with some false hits from Google, and it’ll be interesting to see from our side how this works out in practice. We know from our own work how difficult this can be, but on an overall level, the accuracy of Google may be high enough that the analysis would still give meaningful and valuable insights into re-use.

Peter’s presentation was the last one to go before we summarised the day, both at the British Library and later, over a pint or two at the pub across the road. As I was leaving London this time, the city gave me a traditional farewell with rain trickling down outside, and so, all was well in the world again.

photo credit: Sifter via photopin cc