Trove: mapping Australia’s culture where Google fears to tread


What do you get when you combine an open-source search engine, five dedicated software engineers, and the combined artistic resources of over 1,000 libraries, museums and other cultural institutions? In this case, you get Trove – a National Library of Australia (NLA) initiative whose creators are calling it the Google of cultural heritage institutions.

Three years in the making, Trove is an offshoot of the Australian Newspapers Digitisation Program, a massive effort that, since it began in March 2007, has digitised and made available online 17 million historical articles from Australian newspapers published between 1831 and 1954. That program is set to catalogue 40 million articles by next year.

Trove, developed by many of the same staff, manages metadata on over 90 million historically significant items, including pictures, unpublished manuscripts, books, oral histories, music, videos, research papers, diaries, letters, maps, archived Web sites, and newspapers from 1803 to 1954.

“In the same way that Google would harvest Web sites, we’ve set up harvesting across cultural heritage institutions,” explains Trove project manager Rose Holley. “The technology behind the scenes is similar to Google, but we’ve achieved a single search and Google isn’t quite there yet.”

A search for Banjo Paterson, for example, turned up 337 books, journals, magazines and articles; 145 pictures and photos; 193 music, sound and video results; 4,474 mentions in Australian newspapers; 487 archived Web sites; 18 diaries, letters and archives results; and 49 biographical results.

Gallipoli was even better represented (see the search results in the image below), with over 38,000 newspaper entries and 8,000 images among the treasures indexed in the site.

Trove doesn’t archive the content itself; it manages metadata about the content it indexes. Thanks to a complex array of back-end connections with participating institutions – as well as with online databases like Amazon, Flickr, Google Books and the Australian National Bibliographic Database – new content added to those institutions’ collections is automatically referenced in Trove.
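The idea of indexing metadata rather than content can be illustrated with a minimal Python sketch. It is not Trove's actual code: the `Record` fields, the `harvest` function, the institution names and the example.org URLs are all hypothetical, chosen only to show how a repeated harvest can pick up new items while each record keeps pointing back to the holding institution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Metadata about one item; the item itself stays with its holder."""
    source: str   # hypothetical institution identifier
    item_id: str  # identifier assigned by that institution
    title: str
    url: str      # where a user is sent to view the actual item


def harvest(index, records):
    """Add new or updated records to the index, keyed by (source, item_id).

    Returns the number of records that were new to the index.
    """
    added = 0
    for rec in records:
        key = (rec.source, rec.item_id)
        if key not in index:
            added += 1
        index[key] = rec  # overwrite so updated metadata replaces the old copy
    return added


# Invented sample data: the second harvest sees one extra item.
index = {}
first = [Record("library-a", "1", "Letter, 1895", "https://example.org/a/1")]
second = first + [Record("museum-b", "7", "Map of Sydney", "https://example.org/b/7")]

print(harvest(index, first))   # 1
print(harvest(index, second))  # 1 (only the museum record is new)
```

Because only the link is stored, a search result can hand the user straight to the item's home institution.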

Users wanting to access the actual items will be taken directly to their location online, and the NLA has also struck agreements with a range of booksellers to help users source certain available materials. “There should be no dead-ends,” says Holley.

Trove is based on Apache Lucene, an open-source search engine library written in Java and maintained by the Apache Software Foundation. The NLA’s implementation organises content into eight conceptual categories to help users narrow down their searches.
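A single search that returns results grouped by category can be sketched in a few lines of Python. This is a toy inverted index, not Lucene and not the NLA's implementation; the category names and titles below are invented sample data, and the functions `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(items):
    """Map each lowercase title word to the positions of items containing it."""
    postings = defaultdict(set)
    for pos, (_category, title) in enumerate(items):
        for word in title.lower().split():
            postings[word].add(pos)
    return postings


def search(items, postings, query):
    """Return titles that contain every query word, grouped by category."""
    words = query.lower().split()
    if not words:
        return {}
    # Intersect the posting sets so only items matching all words remain.
    hits = set.intersection(*(postings.get(w, set()) for w in words))
    grouped = defaultdict(list)
    for pos in sorted(hits):
        category, title = items[pos]
        grouped[category].append(title)
    return dict(grouped)


# Invented sample data: (category, title) pairs.
items = [
    ("books", "Collected verse of Banjo Paterson"),
    ("pictures", "Portrait of Banjo Paterson"),
    ("newspapers", "Gallipoli landing reported"),
]
postings = build_index(items)
print(search(items, postings, "banjo paterson"))
```

One query runs against one index, and the category label stored with each item is what lets the results be split into separate lists afterwards.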

“This is aimed at the general public of Australia and anyone who wants to find information on, by or about Australians,” Holley explains. “It has been quite challenging getting it to cope with someone like a schoolchild doing simple keyword searching. But we’ve had really, really positive feedback from the public.”

The site is only officially launching now but has been in soft-launch status since December, and is attracting about 500,000 unique visitors per month – reflecting both demand for the information it’s providing and acceptance of the site as it has evolved so far.

But Trove is a moving feast, with new builds every two weeks “guided by the feedback we’re getting from the public,” says Holley. “Our big thrust with Trove is to get as much digital stuff as possible: we’re on a drive for more activity from museums, archives and art galleries in particular, and we’re talking with the ABC about making its fantastic resources available through Trove.”

“Libraries and archives have been digitising letters, maps, diaries, newspapers and so on for a long time, and we know that public information-seeking happens. Many people just want to see what they can get – and Trove provides access to unique Australian resources on the deep Web that you wouldn’t find elsewhere.”