Some of Europe’s oldest depositories are tackling a new challenge: everything on the web.
OXFORD, UK — The Duke Humfrey reading room at Oxford University’s Bodleian Library doesn’t look much different than it did at its founding in 1612.
Weak light trickles in through windows adorned with 17th-century Dutch stained glass. Shelves of leather-bound and gilt-edged manuscripts rise to a wood-paneled ceiling inscribed with the benefactor’s crest and university motto “Dominus Illuminatio Mea”: The Lord is my light.
Scholars still use this room and its 500-year-old volumes. Elsewhere in the library’s stone complex, however, work is underway that could change institutions such as the Bodleian more than anything since the invention of the printing press.
In April, parliament granted the Bodleian and five other libraries in Britain and Ireland the authority to archive not only all print publications in those countries, but digital ones as well.
That means that every piece of digital content produced and publicly available in the UK — every tweet, every Tumblr, every e-book and online magazine — will be swept from the web and archived for future generations.
The legislation, for which the libraries lobbied for more than a decade, allows them to update their collections to include the digital ephemera that nowadays document great turns in history as well as the minutiae of everyday life.
It also presents a new challenge for archivists who must corral a torrent of data that’s growing by the second, and organize it in a way that will be useful to future researchers. It will transform institutions whose architecture was designed to preserve ink and paper, not servers and terabytes.
“It’s an awful lot of data and we don’t know how people will use it yet, and that’s an exciting thing,” said Susan Thomas, digital archivist at the Bodleian Library.
The new digital collection has roots in a 400-year-old agreement. In 1610, Thomas Bodley, a wealthy diplomat looking to boost Oxford’s collections, struck a deal with the Stationers’ Company: give us a copy of everything you publish, and we’ll promise to preserve it for the public.
Bodley’s handshake has grown into a system known as legal deposit. Six libraries in the UK and Ireland — the British Library, National Library of Scotland, National Library of Wales, the Bodleian Libraries at Oxford, Cambridge University Library and Trinity College Library Dublin — have the right to a copy of every book, magazine and newspaper produced in those countries.
The British Library takes a copy of pretty much everything. The other five choose what they want. In Oxford, a single week’s intake of books covers a long table in a human-high stack.
They will now be adding to that haul a digital copy of every website ending in .uk and other digital material produced here in Britain. That’s an estimated 4.8 million websites with more than a billion pages at the moment.
Before the April legislation, libraries had no rights to archive digital material — even publicly available stuff — without obtaining a license or express permission of the copyright holder.
“Clearly, when you’re talking about 4.8 million websites, that’s not feasible on a large scale,” says Richard Gibby, legal deposit project officer at the British Library.
The new legislation closes a gap through which much documentation of modern life was falling — the posts and photos that evaporate from the public web almost as quickly as they’re created.
“It lets us capture the kind of material we haven’t had before,” Gibby said. “Evidence of the way life was like in 2013 — what people cared about, what made us laugh.”
The British Library will gather the data in an annual “crawl” of the web and store it in four server farms around the country. The first batch of material is set to go live in early 2014.
Just because it’s digital doesn’t mean it will be available anywhere and to anyone, however. Most of the new electronic material will be accessible only to users inside the library buildings.
As some libraries have already discovered, turning the vast haul of data into something useful will be no small task.
The US Library of Congress signed an agreement with Twitter in April 2010 that gave it every public tweet posted since the company’s inception.
The library has since acknowledged struggles to find a searchable, comprehensive way to organize a collection that’s now 170 billion tweets strong.
“It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data,” the library wrote in a January report.
Beyond the technological challenges, librarians must also predict how future researchers will want to engage with the information.
Thirty years ago, a historian interested in public reactions to an election or a linguist researching a word’s evolution couldn’t have imagined a resource like Twitter. Factor in such things as metadata, and the possibilities for how future scholars might use the collections begin to seem endless.
For people in the information-preservation business, these are heady times. When prime ministers or other notables used to donate their “papers” to the British Library or the Bodleian, the library received boxes of, well, paper: letters, books and other print materials.
Today, digital archivist Thomas says, such bequests include five-inch floppy disks, three-inch floppy disks, USB sticks and hard drives along with handwritten and print materials — the records of life during a media revolution.
Whether in a letter or a tweet, a Polaroid tucked between pages or an Instagram photo, there’s always the possibility that people are leaving permanent records of indiscreet or embarrassing moments they may not have intended to preserve for posterity.
That’s where archivists’ jobs haven’t changed, Thomas says.
“This is one of the moral roles of the archivist,” she says. “It’s not so different than with paper — things slip in.”