| Project Summary
This proposal seeks $50,000 to support a user needs assessment that will help the U.S. Federal government documents community migrate from a print-based to a digital collection program. In particular, we request funds to analyze the bibliographic structure of document types, the social workings of collection development, the legal requirements, and the economic viability of adapting the LOCKSS peer-to-peer distributed archiving technology to the Depository Program. We will use this information to assess what further technical research and development will be needed. This technical work will be the subject of a prospective e-government grant proposal to be submitted in October 2003.

LOCKSS is designed to preserve access to web-published documents such as e-journals. Currently in beta test for materials that are intended to be stable over time, LOCKSS allows individual libraries to take custody of stable content delivered via HTTP and safeguard their community's access. LOCKSS ensures that such locally held content maintains its integrity through a polling and reputation system. LOCKSS is designed to run on inexpensive, consumer-grade hardware and to require almost no technical administration. The software is distributed as open source on www.sourceforge.net. Funding of this proposal would allow Stanford, in concert with selected members of the government documents community, to explore the technical, economic, social and legal viability of various LOCKSS architecture models for the GPO depository program.

Project Description

Introduction
LOCKSS is designed to preserve access to web-published documents such as e-journals. Currently in beta test for materials that are intended to be stable over time, LOCKSS allows individual libraries to take custody of stable content in all formats delivered via HTTP and safeguard their community's access. LOCKSS ensures that such locally held content maintains its integrity through a polling and reputation system. LOCKSS is designed to run on inexpensive, consumer-grade hardware and to require almost no technical administration. The software is distributed as open source on www.sourceforge.net. Funding of this proposal would allow the Stanford LOCKSS team and selected members of the government documents community to explore the technical, economic, social and legal viability of various LOCKSS architecture models for the GPO depository program.

Problem Statement
Congress established the Federal Depository Library Program (FDLP) in 1860 to ensure that the United States public has access to its government's information. Authorized by 44 US Code Section 1902, the program involves the acquisition, format conversion, and distribution of depository materials and the coordination of Federal depository libraries in the 50 states, the District of Columbia and U.S. territories. The mission of the FDLP is to disseminate information products from all three branches of the Government to more than 1,350 libraries nationwide. Libraries that have been designated as Federal depositories maintain these information products as part of their existing collections and are responsible for assuring that the public has free access to the material provided by the FDLP.

The program operates on a two-tiered structure. Fifty-three of the 1,350 libraries have "Regional" status: these libraries automatically receive every document distributed under the program and are expected to maintain access to the material in perpetuity.
The remaining 1,300 libraries have "Selective" status. Selective libraries acquire a selection of materials that reflects the needs and interests of their local constituencies. Once five years have elapsed, an individual library may withdraw material, provided it has first been offered to other selective libraries in the region. Selective libraries do not acquire documents at the individual title or piece level. Rather, they identify broad categories of material from specific agencies in which they are interested, as designated by "item numbers" listed in a GPO basic document, the List of Classes of US Government Publications Available for Selection by Depository Libraries. As a result, each of the 1,300 selective libraries in the program has a "profile" under which it receives as little as 2 percent and as much as 99 percent of the available material.

The FDLP as described ensures long-term public access, through a geographically dispersed network of repositories, to government information on print, microform and tangible electronic media (e.g., CD-ROMs, VideoDiscs, magnetic tape). These repositories serve as trusted guardians of the content and of the public's right to unfettered and efficient access to it.

Increasingly, government agencies are producing less paper and relying on digital versions of documents made available solely on government servers. At present, there is no program for the systematic distribution of these electronic documents through the depository program. Long-term public access to this increasingly significant body of content is shifting from the model of a distributed set of trusted repositories (including public, academic, and government agency libraries) to a centralized model. For example, rather than distributing materials to depository libraries, the Government Printing Office instead provides access to a large electronic collection through its own server, GPO Access.
Content is no longer distributed among a number of autonomous, trusted repositories. Instead, it is increasingly geographically concentrated, with a small number of Federal servers located largely in the Washington, DC area serving everyone via the Internet.

The shift away from a highly organized system of distribution is further compounded by the fact that government departments, agencies, and in some cases individuals working within agencies maintain a considerable amount of autonomy in populating and managing the many websites that comprise the government Internet domains. As a consequence, there is very little consistency across or even within government agencies with respect to the way in which materials are created, made available, and maintained. At the agency level, factors such as content refreshment (replacing old with new), content organization, budget reductions, loss of interest, or shifting politics have a significant impact on the public's ability to find and use government information.

An informal search on Google highlights the problems of access and persistence. Some government documents can be found through Google; others cannot. There seems to be no pattern as to which are indexed. Some government documents are available through academic institutions but appear to be no longer available, or at least not easily located, on U.S. government sites, even using such specialized search engines as GPO Access, Google Uncle Sam, Firstgov, US GovSearch, SearchGov.com and Fedworld.

To assure that their communities retain long-term access to this important literature, depository libraries must identify and adapt an inexpensive, robust, and independent mechanism for securing government e-documents. Such a mechanism could establish for the digital age the advantages (both operational and philosophical) of the distributed model that served the paper document depository program successfully for so long. We believe that LOCKSS may well be that mechanism.

Proposed Work
The LOCKSS project has proven the concept that a distributed peer-to-peer archiving system can capture and maintain scientific e-journals. However, this application of the LOCKSS principles makes a number of assumptions that are not true in the realm of government documents. The purpose of this planning grant is to determine whether, and how well, the LOCKSS technology can be adapted to the U.S. Federal Government Depository Program by conducting a yearlong user needs assessment. Members of the "govdocs" community will provide the assessment in consultation with the technical team.

There will be two deliverables from this grant: a document specifying the needs of the community in the three areas listed below, and a technical grant proposal based on these needs, to be submitted in October 2003.

I. Bibliographic Content Structure
II. Social and Economic Aspects of Collection Development
III. Legal Aspects of Collection Development

Project Participants

Stanford University Libraries Staff

Government Document Partners
This project will involve eight government documents partner institutions representing a range of community interests. Six of the eight partners are Federal depository libraries with significant print collections and a strong interest in long-term preservation and local management of content and access. These six institutions include two regional depository libraries (Colorado and Minnesota) and four selective depository libraries (North Texas, CSU San Bernardino, US National Agricultural Library and Stanford).

Although not a formal depository, the California Digital Library (CDL) represents and supports a network of twelve selective depositories. The CDL is currently investigating technical models that would support long-term preservation of and access to government information.

The National Agricultural Library (NAL) is one of four national libraries. NAL is managed by the US Department of Agriculture and plays a unique dual role as both an active "publisher" of government information and a "library".

The US Government Printing Office (GPO) will serve in the critical role of agency sponsor for this project. GPO is the Federal agency designated by statute to manage the Federal Depository Library Program. In recent years the GPO has begun developing and managing electronic government documents on in-house servers, with access to this Electronic Collection provided via the GPO Access interface. As a result of these roles, GPO is uniquely positioned to explore legal and technical aspects of extending the LOCKSS model to support distributed archiving of electronic government documents.

All parties will be active participants in the project, helping to define the technical requirements for adapting LOCKSS technology to government documents content as well as identifying the legal and community issues involved.

Other Relevant Projects
Other relevant projects (CDL, other NSF, Mellon, CRL, OCLC/GPO). The Stanford Libraries staff are aware that other players are looking into the development of solutions in the area of government documents. According to the 1 July 2002 "Proposal to review technologies for acquiring, assembling into meaningful research collections, and persistently managing the web-based documents of the US Federal and State Governments," submitted to the Andrew W. Mellon Foundation:
Ours is a parallel development, informed by and in communication with the CDL team and others, based on quite different technological models. We have initiated a dialogue (as of late July 2002) with CDL to explore areas of common interest and to share issues and challenges. Similarly, the Government Printing Office has announced a collaboration with the Online Computer Library Center (OCLC) to explore vehicles to manage electronic documents. Given the key players, SDSC and OCLC particularly, we can safely predict that their approaches will be oriented toward small numbers of large databases, centrally managed, as opposed to the LOCKSS model of many decentralized instances of smaller systems. Through the different philosophical, social, and technological models of LOCKSS, we intend, at a minimum, to provide a different perspective on the problem set and to contribute to a broader, possibly more future-oriented analysis of the problem space and the palette of available solutions.

About LOCKSS
The LOCKSS (Lots of Copies Keep Stuff Safe) project was initiated in October 1998. Based on Java[tm] technology and Linux, the LOCKSS system is an open-source, easy-to-use, distributed system that runs on low-cost computers without central administration. Designed as an Internet appliance, the LOCKSS system preserves access to authoritative versions of web-published materials, applying contemporary automation to the old idea of preventing loss by multiplying copies. The PC runs an enhanced web cache that collects new issues of the e-journal and continually compares its contents with other caches on other participating computers. If files have been corrupted or altered, they can be repaired or replaced with intact copies from the publisher or from other caches. The LOCKSS program is currently
in a worldwide beta test focused on integrity, usability, and software performance, including impact on network traffic. The beta software has been released as open source and is available on www.sourceforge.net. More information about LOCKSS and the beta test can be found at http://lockss.stanford.edu

Currently, a total of 42 publishers and 56 libraries, including the Library of Congress, are testing the system to protect the integrity of, and maintain permanent access to, valuable electronic data. The LOCKSS system makes it feasible and affordable, even for smaller libraries, to preserve access to the e-journals to which they subscribe and to safeguard their community's access to them. Individual libraries can also monitor the level of redundancy within the system.
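
The compare-and-repair behavior described in this section can be illustrated with a short Python sketch. It is an illustration only, not the LOCKSS polling protocol itself: it assumes that every peer votes honestly with a SHA-256 digest of its copy and that a strict majority identifies the intact version, and it leaves out the reputation system entirely. The function names (`digest`, `run_poll`, `repair`) and the library identifiers are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one cached copy."""
    return hashlib.sha256(content).hexdigest()


def run_poll(caches: dict[str, bytes]) -> tuple[str, list[str]]:
    """Tally digests across peers; return the winning digest and the dissenters.

    Raises ValueError when no digest is held by a strict majority, because
    this sketch then has no basis for choosing which copy is intact.
    """
    votes = Counter(digest(content) for content in caches.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(caches):
        raise ValueError("no strict majority; cannot decide which copy is intact")
    dissenters = [peer for peer, content in caches.items()
                  if digest(content) != winner]
    return winner, dissenters


def repair(caches: dict[str, bytes]) -> list[str]:
    """Overwrite each dissenting copy with one that matches the majority digest."""
    winner, dissenters = run_poll(caches)
    good_copy = next(c for c in caches.values() if digest(c) == winner)
    for peer in dissenters:
        caches[peer] = good_copy
    return dissenters


if __name__ == "__main__":
    # Three hypothetical libraries hold the same document; one copy is damaged.
    caches = {
        "library_a": b"Report text",
        "library_b": b"Report text",
        "library_c": b"Rep0rt text",
    }
    print("repaired peers:", repair(caches))
```

In this simplified model, a library whose copy disagrees with the majority is treated as damaged and refreshed from a peer. A real deployment would also have to handle dishonest or faulty voters, which is the role the proposal assigns to the reputation system.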
|