Lazy Preservation: Reconstructing Websites for the Lazy Webmaster.
Proceedings of ACM WIDM 2006. November 2006
F. McCown, J.A. Smith, M.L. Nelson, and J. Bollen.
Download: lazyp-widm06.pdf
Backup of websites is often not considered until after a catastrophic event has
occurred to either the website or its webmaster. We introduce “lazy
preservation” – digital preservation performed as a result of the normal
operation of web crawlers and caches. Lazy preservation is especially suitable
for third parties; for example, a teacher reconstructing a missing website used
in previous classes. We evaluate the effectiveness of lazy preservation by
reconstructing 24 websites of varying sizes and composition using Warrick, a
web-repository crawler. Because of varying levels of completeness in any one
repository, our reconstructions sampled from four different web repositories:
Google (44%), MSN (30%), Internet Archive (19%) and Yahoo (7%). We also
measured the time required for web resources to be discovered and cached
(10-103 days) as well as how long they remained in cache after deletion (7-61
days).
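The paper's Warrick crawler reconstructs a lost site by pulling copies from the caches of Google, MSN, Yahoo, and the Internet Archive. As an illustration only, the minimal sketch below recovers pages from a single repository, the Internet Archive's Wayback Machine, through its public availability endpoint; this is not Warrick itself, and the endpoint, the JSON field names, and the example.com URLs are assumptions made for the sketch.

# Sketch of single-repository "lazy preservation": for each lost URL, ask the
# Wayback Machine availability API for the closest archived snapshot and, if
# one exists, download it. Warrick additionally queries search-engine caches,
# which are omitted here.
import json
import os
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available?url={url}"

def closest_snapshot(url):
    """Return the URL of the closest archived snapshot of `url`, or None."""
    query = WAYBACK_API.format(url=urllib.parse.quote(url, safe=""))
    with urllib.request.urlopen(query) as response:
        data = json.load(response)
    snapshot = data.get("archived_snapshots", {}).get("closest")
    if snapshot and snapshot.get("available"):
        return snapshot["url"]
    return None

def reconstruct(urls, out_dir="reconstructed"):
    """Download whatever snapshots exist for the given lost URLs."""
    os.makedirs(out_dir, exist_ok=True)
    for url in urls:
        snapshot_url = closest_snapshot(url)
        if snapshot_url is None:
            print("not found in repository:", url)
            continue
        # Store each recovered page under a filename derived from its URL.
        name = urllib.parse.quote(url, safe="") + ".html"
        with urllib.request.urlopen(snapshot_url) as response, \
                open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(response.read())
        print("recovered", url, "from", snapshot_url)

if __name__ == "__main__":
    # Hypothetical list of URLs from the vanished site.
    reconstruct(["http://example.com/", "http://example.com/about.html"])

A full reconstruction would merge results from several repositories, since (as the abstract notes) no single cache holds a complete copy of any given site.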
@inproceedings{mccown:lazyp,
  author    = {Frank McCown and Joan A. Smith and Michael L. Nelson and Johan Bollen},
  title     = {Lazy Preservation: Reconstructing Websites for the Lazy Webmaster},
  booktitle = {Proceedings of the Eighth {ACM} International Workshop on Web Information and Data Management ({WIDM})},
  year      = {2006},
  month     = {November}
}