Archive What I See Now
By Rashad McDowell
Most people don’t think about what happens to websites that they no longer visit. It’s taken for granted that the internet is eternal, a place where information thrives and lives without end. That’s not really the case. Old information on the net runs the risk of disappearing if the demand for it drops. This adds a level of mortality to the internet that most people would never consider.
Luckily, there are people like Michele Weigle, Ph.D., and Michael Nelson, Ph.D. Together, Weigle and Nelson are working on a project that will add another layer of longevity to information stored on the web: the Archive What I See Now project.
This project, spearheaded by Weigle, operates on the same principle as the Internet Archive, the largest database of archived websites in the world. The goal is to let the everyday person make a record of the websites they view in real time. Right now, the target audience is humanities researchers, but the future applications could reach much further. For their efforts, Weigle and Nelson have received a grant from the National Endowment for the Humanities.
Currently, anyone who wants to record how the web changes from day to day, or even from year to year, does so using screenshots and/or the browser’s “Save Page As” feature. This creates several problems in the long run. For one, once a website is captured as a screenshot, the ability to interact with its hyperlinks and embedded media is lost. The second issue is even more practical: screenshots take up a lot of space. Over the course of a year, a researcher can amass a folder of thousands of images that must be carefully organized to make any sense.
Weigle’s solution to this problem is threefold. The first piece of the puzzle is WARCreate. This program, still under development, lets researchers click a button on any webpage and create a .warc file, the same format used by the Internet Archive. All a user needs to view these files is a copy of Wayback. WARCreate can operate in three different modes: record mode, countdown mode and event mode. Record mode captures each page a user visits while it is active. Countdown mode refreshes a page on a set interval and adds a new capture each time. Event mode focuses on dynamic changes to a page, capturing a new copy only when the page changes.
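To make the .warc format a little more concrete, here is a minimal sketch of what a single WARC record looks like, assuming the WARC/1.0 layout from the ISO 28500 standard. The helper name make_response_record and the example URL are hypothetical, and this is not WARCreate’s own code; real archiving tools also write request records, a “warcinfo” record and payload digests.

```python
import uuid
from datetime import datetime, timezone


def make_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Wrap a raw HTTP response in one WARC/1.0 'response' record.

    Hypothetical helper for illustration only; it omits digests and the
    other record types that production tools write.
    """
    # WARC-Date is a UTC timestamp in ISO 8601 form, e.g. 2016-03-01T12:00:00Z
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts only the block (the captured HTTP response)
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end in CRLF, a blank line separates the headers from
    # the content block, and every record is followed by two CRLFs.
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    body = b"<html><body>Hello, archive</body></html>"
    http_response = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        + f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
        + body
    )
    record = make_response_record("http://example.com/", http_response)
    print(record.decode("utf-8").split("\r\n")[0])  # prints "WARC/1.0"
```

Because the captured HTTP response is stored whole, including its headers and HTML, replay software such as Wayback can later serve the page back with its links intact, which is exactly what a screenshot cannot do. Writing the returned bytes to a file ending in .warc produces a (very small) archive file.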
The second piece, WAIL, handles the Wayback requirement. It gives the user a very simple way to install Wayback, which on its own isn’t very user-friendly to set up.
Mink, the third piece, ties everything together. This program tells the user whether the page they want to capture with WARCreate has been archived before, and how many times. The point is less to prevent redundant captures than to compare any damage these archived copies, or “mementos,” may have suffered. The web is mortal, and archived pages can be damaged when some of their content isn’t captured or when material from the live web seeps into older pages.
Mink implements the Memento protocol, which was co-developed by Nelson.
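Under the Memento protocol (RFC 7089), an archive or aggregator can return a “TimeMap,” a list of links to every known memento of a page, each tagged with its capture date. The sketch below shows one way a tool like Mink could count mementos from such a list. It is an illustration under stated assumptions, not Mink’s actual code: it assumes the TimeMap arrives in the link format and that every parameter value is double-quoted, and the archive host archive.example.org and its URIs are made up.

```python
import re
from email.utils import parsedate_to_datetime

# Each entry looks like: <URI>; rel="..."; datetime="..."
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text: str):
    """Return (uri, params) pairs from a link-format TimeMap.

    Simplified parser: assumes every parameter value is double-quoted.
    Entries are not split on commas because dates contain commas.
    """
    links = []
    for uri, param_blob in LINK_RE.findall(text):
        links.append((uri, dict(PARAM_RE.findall(param_blob))))
    return links


def list_mementos(text: str):
    """Return (capture datetime, memento URI) pairs, oldest first."""
    mementos = []
    for uri, params in parse_timemap(text):
        # rel holds space-separated tokens such as "first memento"
        if "memento" in params.get("rel", "").split():
            mementos.append((parsedate_to_datetime(params["datetime"]), uri))
    return sorted(mementos)


# Hypothetical TimeMap for http://example.com/ (all archive URIs invented)
SAMPLE = '''<http://example.com/>; rel="original",
<http://archive.example.org/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://archive.example.org/web/20000620180259/http://example.com/>; rel="first memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://archive.example.org/web/20091027204954/http://example.com/>; rel="memento"; datetime="Tue, 27 Oct 2009 20:49:54 GMT",
<http://archive.example.org/web/20160301120000/http://example.com/>; rel="last memento"; datetime="Tue, 01 Mar 2016 12:00:00 GMT"'''

if __name__ == "__main__":
    mementos = list_mementos(SAMPLE)
    print(f"{len(mementos)} mementos")  # prints "3 mementos"
    for when, uri in mementos:
        print(when.isoformat(), uri)
```

The “original” and “self” links describe the live page and the TimeMap itself, so only links whose rel includes the memento token are counted. Because an aggregator merges TimeMaps from many archives, the same count can include copies held by archives the user never knew about.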
“The trick with memento is it leverages web archives you didn’t even know exist,” Nelson said.
In essence, Memento works like Mink on a much larger scale, looking beyond WARCreate archives to the Internet Archive and the public archives of different nations. Together, all these pieces coalesce into an impressive system that could prove to be the salvation of the web.
More information on the many projects and research being conducted by Weigle and her colleagues can be found at https://ws-dl.cs.odu.edu/.