The Meta-Archive: Project Summary & Final Report

The Library of Congress, National Digital Information Infrastructure and Preservation Program (Washington, DC). Oct. 2010.
J.A. Smith.

For over five years, the National Digital Information Infrastructure and Preservation Program (NDIIPP) supported the development of a significant digital preservation project: the MetaArchive. The MetaArchive was built on LOCKSS (Lots of Copies Keep Stuff Safe), but its preservation targets were institutionally unique materials rather than vendor-owned digital products. The LOCKSS model depends on a partnership of several institutions, each acting as a remote archiver for portions of the other partners’ digital publications. A key goal of such distributed archiving is to overcome problems common to every digital preservation effort. For example, bit degradation and general media fragility can be mitigated by periodically refreshing the bits and replicating content across multiple locations. This is the central idea behind “Lots of Copies Keep Stuff Safe.”
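The replicate-and-refresh idea can be illustrated with a minimal sketch. It assumes each site holds a full copy of one archival unit, computes a SHA-256 fixity digest for every copy, treats the digest shared by a strict majority of sites as authoritative, and overwrites any disagreeing copy with a healthy one. This is a generic majority-vote fixity check chosen for illustration; it is not the LOCKSS polling protocol, which this report does not describe, and the function and site names are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest used as the fixity value for one copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Compare copies held at several sites and repair any that disagree.

    `replicas` maps a hypothetical site name to that site's copy of one
    archival unit. The digest held by a strict majority of sites is treated
    as authoritative; copies with any other digest are replaced by a
    majority copy. Returns the repaired replicas and the repaired sites.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {site: digest(data) for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # No strict majority: refuse to guess which copy is correct.
        raise RuntimeError("no majority agreement among replicas")
    good_copy = next(data for site, data in replicas.items() if digests[site] == winner)
    repaired: dict[str, bytes] = {}
    fixed_sites: list[str] = []
    for site, data in replicas.items():
        if digests[site] == winner:
            repaired[site] = data
        else:
            repaired[site] = good_copy  # refresh the damaged copy from a healthy peer
            fixed_sites.append(site)
    return repaired, fixed_sites


# Usage: one of three sites holds a copy with a single corrupted byte.
original = b"Digitized letter, 1864"
sites = {"site_a": original, "site_b": original, "site_c": b"Digitized lett\x00r, 1864"}
repaired, fixed = audit_and_repair(sites)
print(fixed)  # ['site_c']
```

Requiring a strict majority means a damaged copy can be repaired automatically only while fewer than half of the copies are bad, which is one way to see why more copies keep content safer.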

When the MetaArchive project was proposed to NDIIPP in 2003, the LOCKSS program had already demonstrated its success as a tool for preserving vendor-copyrighted digital publications, notably the journals Science, the British Medical Journal, and the Proceedings of the National Academy of Sciences. The Emory University Library was one of the original LOCKSS partners, together with Indiana University and the New York Public Library. In 2002–2003, Stanford University had proposed, and secured support from the Andrew W. Mellon Foundation for, further development of the archiving tools used in a LOCKSS network. Recognizing that scholarly content exists in formats other than the public eJournal, the Emory University Library sought to extend the toolset for broader applicability. Martin Halbert and Katherine Skinner successfully proposed this LOCKSS-derived project to the Library of Congress’s NDIIPP.

MetaArchive’s plan was to target specific examples of the heritage materials held by the project’s partner institutions, as a proof of concept of the general applicability of the LOCKSS paradigm. This would be the first Private LOCKSS Network (PLN), in contrast with the vendor-supported networks previously implemented. Together with the Georgia Institute of Technology, Florida State University, Auburn University, Virginia Polytechnic Institute and State University, and the University of Louisville, Emory developed and refined a set of tools to enable the archiving of a variety of materials across the distributed caches of the partner institutions. In short, the partners’ networked infrastructure formed a meta-archive of their content, with content submitted and made accessible through the generalized tools.

All of the partner institutions play a role in curating and preserving the cultural heritage of the southern United States. Accordingly, the MetaArchive’s primary targets were the southern cultural and historical digital records most in need of preservation. The member institutions created a conspectus (defined by OCLC as “an inventory of research libraries’ existing collection strengths and current collecting intensity”). The materials spanned several common content types, from PDF documents to JPEG, PNG, and other image formats, and even the source code needed to reproduce complete scholarly websites.

The project has enjoyed considerable success in many areas and made important strides toward a practical digital preservation paradigm. Over the five years of the project, the team contributed several enhancements to the LOCKSS system, including security modules, plugins for adding targeted content to the archive, and utilities for monitoring and recovering MetaArchive assets. Project members have been active in promoting digital preservation strategies through conferences and publications (see Appendix A for examples). Further evidence of the project’s success is the creation and continued expansion of the MetaArchive Cooperative.

As a Private LOCKSS Network, the MetaArchive is of course not a preservation panacea. Maintaining such a partnership has costs, and the risks, while distributed, are not eliminated. Content can still disappear from a cache or fail to replicate. Security, network capacity, format obsolescence driven by innovation, and simply locating the preserved object are among the continuing challenges that every preservation program must address. As content is distributed across more partner institutions, identifying vulnerabilities becomes more complex. Even Google, which operates what is probably the most effective distributed digital cache in the world, routinely experiences individual system failures that affect cached information, and must then refresh that content from the source. Partners may also choose to leave, so mechanisms are needed to ensure that a departing partner’s content is removed from the distributed cache and that the remaining partners’ content is removed from the departing partner’s share of the system. These removal mechanisms and guidelines have not yet been established. Such problems do not invalidate the benefits of a Private LOCKSS Network; rather, they serve as a checklist for ensuring the long-term integrity of the MetaArchive preservation partnership as a whole.