The internet is like the weather. It changes constantly – and unless it’s properly archived, each old day of web history is lost forever. But archiving such an ever-changing, ever-growing, ever-renewing and ever-disappearing universe is a huge and difficult task. Our speakers today helped guide us over some of this rough terrain, as well as raising vital questions about what should be preserved, who should be preserving it, and why it’s even important to create web history archives in the first place.
Chairing the session, Neil Grindley asked who stands to benefit from web archives, who should be making the archives, and what good-practice techniques individuals and institutions can employ to ensure their web pages are properly archived. Most tellingly, he also tried to give an impression of the extent of the issue – a particularly difficult job because, as he noted, “it’s fathomless.” It’s impossible to even get a grasp of the quantity of material being put onto the internet, because so much of the “deep” web is hidden behind passwords. One small measure of scale: a Google search for *.ac.uk returns some 71,600,000 pages.
Following on from this idea, Paul Cunnea from the National Library of Scotland described the web as the “final frontier”. He gave a historical overview of the James T. Kirks who have been out in this ever-expanding galaxy trying to keep tabs on it, from web-archive projects in California in the early 1990s to more recent efforts such as UKWAC and the 40+ organisations worldwide now gathering archive material.
He then touched on the thorny legal issues surrounding web archives, especially copyright and legal deposit, and the question of whether institutions are even entitled to harvest material from the web at all – which at the moment seems to be a murky grey area in UK law.
He finished with a clarion call to join the coalition and make sure that you are preserving your material properly – and raised the interesting question of who else should be working towards creating archives. Most museums, it seems, aren’t doing so at the moment – which has the potential to leave huge gaps in their collections if those collections are meant to be complete overviews of their subject areas.
Kevin Ashley from the University of London also didn’t claim to have any definite answers about archives, but again he had plenty of interesting questions, mainly about why institutions should care about web archiving – and how useful it can be, given the poor quality of what actually ends up in these archives. He employed the telling metaphor of a library book that has been taken off the right shelf, ripped up, smeared with jam, had its cover thrown away and then jumbled back onto a new shelf. Thanks to restrictions of size and technology and the dynamic nature of the “live” web, archived pages are not the same as the pages we see when browsing the web. Future web historians, much like today’s medievalists, will be left with fragments and broken narratives.
Joking about “a word from our sponsors”, Kevin did point out that the POWR handbook at least contains some useful information on protocol, how to archive, and what it might be useful to look at.
The final speaker was Niels Brugger, an associate professor in media studies at Aarhus University, who started by calling himself a “strange Dane” and apologising for the mutilation he was about to inflict on the English language. In fact, he spoke with eloquence and fluency, even when hacking his way through the thorny challenges faced by anyone hoping to use archived web pages for their research.
He pointed out that the process of deciding which areas of the web should be archived is subjective, saying “history has to be anticipated already at the point of the archive.” The archived pages, too, are reconstructions, subjectively created, depending on the archiving strategies applied to, say, graphics, sound and images (most of which would generally be missing). He also made the fascinating point that websites can change even in the time it takes to archive them – he gave the example of a page previewing a sports event, which had the final result of the contest on it by the time the harvest was completed. So you can never be sure that everything is in your archive – and there’s always the danger of ending up with a version of a site that never actually existed on the live web.
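Brugger’s sports-results example hints at a simple check a harvesting workflow might run: fingerprint a page at the start of a crawl and again at the end, and flag anything that drifted in between. The sketch below is purely illustrative – the URL, delay and function names are hypothetical and not part of any tool mentioned in the session – but it shows the idea in a few lines of Python.

```python
import hashlib
import time
import urllib.request


def content_fingerprint(url: str) -> str:
    """Fetch a page and return a hash of its body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return hashlib.sha256(response.read()).hexdigest()


def harvest_with_drift_check(url: str, crawl_delay_seconds: int = 600) -> None:
    """Snapshot a page, wait roughly as long as the rest of the crawl takes,
    then re-fetch it and report whether it changed mid-harvest."""
    before = content_fingerprint(url)
    time.sleep(crawl_delay_seconds)  # stand-in for the time the harvest actually takes
    after = content_fingerprint(url)
    if before != after:
        print(f"WARNING: {url} changed while the harvest was running")
    else:
        print(f"{url} appears stable across the harvest window")


if __name__ == "__main__":
    # Hypothetical example URL; a real harvester would loop over its whole seed list.
    harvest_with_drift_check("https://example.org/sports-preview", crawl_delay_seconds=60)
```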
True to the nature of the talk, when it was opened up to the floor there were more intriguing questions – but ones which we are as yet unable to answer. Is preserving old websites effort well spent compared with, say, building new ones? What happens to archives when organisations disappear and institutions close down? What happens when individuals who have been maintaining their own records shuffle off this mortal coil? Who should be responsible for the archives? (At the moment, it just isn’t clear.) What happens to all the material, like Google Docs, that exists in ‘the cloud’?
Big questions that will have to be answered soon if the single biggest record of our era isn’t to be lost to the historians of the future.