Elephants and dung-trucks
Nicholson
Baker’s controversial book Double Fold shone an unwelcome spotlight on
preservation activities in libraries. In the last issue of
Information Today,
however, I suggested that rather than shoot the messenger, librarians should
recruit him to their cause.
What is certain is that preservation is set to become ever more controversial.
Baker’s primary concerns relate to the use of microfilm for reformatting
brittle books and newspapers, but as librarians begin to grapple with digital
preservation they will become the target for much greater and more critical
public scrutiny.
Why?
Because preserving digital materials is far more difficult than dealing with
brittle paper. Moreover, no one yet knows how to do it effectively. As Jeff
Rothenberg, a senior research scientist at the RAND Corporation, pointed out in
a 1999 report for the Council on Library and Information Resources (CLIR),
unless the matter is addressed urgently “our increasingly digital heritage is
in grave risk of being lost.”
Far more fragile
The
nub of the matter is that digital materials are far more fragile than brittle
paper. As Abby Smith, director of programs at CLIR, puts it: “Brittle paper
was a problem of a fragile medium, but you can rescue the information by
reformatting it on to another medium. Digital information, by contrast, has so
many more dependencies that paper-based materials do not. You have hardware that
is always becoming obsolete; and you have software that is also becoming
obsolete.”
That
is, unlike paper or microfilm — where the meaning is transparently inscribed
on the surface of the medium — digital documents are opaque bit streams only
understandable to humans when interpreted by a machine. The hardware and
software needed to do this interpretation, however, are constantly superseded.
More than 200 digital storage formats alone, for instance, have been deployed
since the 1960s, with none lasting more than 10 years.
As
Baker expresses it: “If you put some books and papers in a locked storage
closet and come back fifteen years later, the documents will be readable without
the typesetting systems and printing presses and binding machines that produced
them; if you lock up computer media for the same interval (some once-standard
eight-inch floppy disks from the mid-eighties, say), the documents they hold
will be extremely difficult to reconstitute.”
And
it is not just text: images, video and multimedia files are all at risk. The
extent of this threat was graphically demonstrated to the British Broadcasting
Corporation when, fifteen years ago, it decided to celebrate the 900th
anniversary of the 1086 Domesday Book by creating a huge digital archive to
depict life in the 1980s. Costing £2.5 million, the project involved around a
million people in Britain.
Once complete, the results were stored on 12in videodiscs designed to be read by the
Acorn BBC computer. A decade and a half later, however, the discs were obsolete
and unreadable, and the Acorn computer a museum piece. While the data was
eventually recovered, it required a time-consuming digital archaeology effort to
do so.
Moreover, had this work been put off indefinitely, at some point the BBC data would have
become irrecoverable — since without constant nurturing, digital files
eventually become as unreadable as the Linear B and Cuneiform scripts were to
the modern age before archaeologists deciphered them. As Baker put it in Double
Fold: “We will certainly get more adept at long-term data storage, but
even so, a collection of live book-facsimiles on a computer network is like a
family of elephants at a zoo: if the zoo runs out of money for hay and bananas,
for vets and dung-trucks, the elephants will sicken and die.”
The
good news is that governments and libraries are beginning to act. In February,
for instance, the Librarian of Congress received approval for the National
Digital Information Infrastructure and Preservation Program (NDIIPP) — a
project for which Congress appropriated $100 million in funding.
The
aim of the NDIIPP, says Guy Lamolinara, confidential assistant to the associate
librarian for strategic initiatives at the Library of Congress, is to “develop
a national strategy to collect, archive and preserve the burgeoning amounts of
digital content, especially materials that are created only in digital formats,
for current and future generations.”
The
bad news is that there is still no known long-term solution for preserving
digital resources, although the quantity of material is growing day by day.
Moreover, despite initiatives like the NDIIPP, there remains a serious funding
shortage.
In
addition, more sophisticated techniques are needed. These can include migration
(“porting”) in which files are updated, or sometimes entirely rewritten;
emulation, where older hardware is mimicked in order to allow old software and
files to run on new machines without having to be re-written; and encapsulation,
where electronic files are wrapped in a digital envelope that describes how the
files are stored, and how to re-create the software, hardware or operating
systems to decode the contents. However, none of these somewhat complicated
approaches has yet been successfully implemented.
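To make the idea of encapsulation concrete, here is a minimal Python sketch. It is an illustration only, not a description of any system mentioned in this article: the envelope layout, the field names and the functions encapsulate and unwrap are all assumptions made for the example. It wraps a file's bytes together with a plain-language note on how to read them and a checksum that lets a future reader confirm the contents are intact.

```python
import hashlib
import json


def encapsulate(payload: bytes, media_type: str, rendering_notes: str) -> str:
    """Wrap raw bytes in a self-describing JSON envelope (hypothetical layout)."""
    envelope = {
        # What kind of content this is, stated in a widely understood form.
        "media_type": media_type,
        # Human-readable instructions for decoding the payload in future.
        "rendering_notes": rendering_notes,
        # Checksum so later readers can detect corruption of the payload.
        "sha256": hashlib.sha256(payload).hexdigest(),
        # The content itself, hex-encoded so the envelope stays plain text.
        "payload_hex": payload.hex(),
    }
    return json.dumps(envelope, indent=2)


def unwrap(envelope_text: str) -> bytes:
    """Recover the payload and verify it against the recorded checksum."""
    envelope = json.loads(envelope_text)
    payload = bytes.fromhex(envelope["payload_hex"])
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        raise ValueError("payload does not match recorded checksum")
    return payload
```

The sketch also shows the limits of the approach: the envelope is itself a digital file, so it only helps if its own format (here JSON and hexadecimal text) remains understandable.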
Some,
therefore, are exploring alternative solutions. Raymond Lorie, a research fellow
at IBM’s Almaden Research Centre, for instance, has proposed the development
of a “universal virtual computer”. This would require that every time a
digital file was saved (whatever hardware or software was being used) a separate
file was simultaneously saved in a format understandable to the universal
computer. To stave off obsolescence, the specifications of the universal
computer would be compacted into around 10 to 20 pages of text, and distributed
as widely as possible. However, the universal computer too remains at concept
stage.
“There
are many technological issues still to be solved,” says Janet Gertz, director
for preservation, Columbia University Libraries (but speaking in a personal
capacity), “and just as many fiscal issues, since maintaining and preserving
digital files is much more expensive than maintaining and preserving paper
copies.”
In
the meantime, adds Gertz: “Information already has been lost and will continue
to be lost.”
Huge catastrophes
That
digital preservation is a huge challenge for libraries is incontrovertible. Yet
today the KB is the only library in the world with an operational system focused
on the deposit and long-term preservation of digital publications. The library
has been collecting electronic material since 1994, and at the end of last year
introduced a full-scale electronic deposit process using IBM’s newly developed
Digital Information Archiving System (DIAS) as a “dedicated archiving
environment” into which electronic materials can be transferred. There is,
however, a huge amount of work still to be done before the library can feel
confident about the future, says Steenbakkers.
And
as we head breakneck into the digital future, and more and more library
resources are born digital (with no analogue equivalent), the lack of effective
preservation techniques will become an increasingly serious problem. Not only is
there an explosion of data on the web, but a flood of new eBooks and e-journals
— and increasingly the latter have no hard-copy version. A report
commissioned last year by the UK’s Joint Committee on Voluntary Deposit (JCVD)
estimated that by 2005 the number of “pure” e-serials (with no print
equivalent) alone will have grown from 3,220 to 7,032 (equating to 192,672
e-serial issues annually).
At
the same time, libraries are getting involved in the creation of institutional
archives — an activity enthusiastically promoted by organisations like the
Scholarly Publishing and Academic Resources Coalition (SPARC). The challenge
here, points out Michael Day, of UK-based UKOLN, in a recent JISC-funded report
is that “institutions that set up repositories may not always be aware of
their responsibility to ensure the long-term preservation of content. Even when
they are, they may not have the organisational infrastructure or technical
knowledge to do this successfully.”
What
is undeniable is that an increasing proportion of a library’s role will
consist of the provision and maintenance of digital data. Yet too few have
adequately grasped the preservation nettle. What libraries must appreciate, says
Steenbakkers, is that “the effective preservation of born-digital resources is
not a matter of choice, but necessity.”
Preserving the published record
But
this is not about the preservation of library holdings alone. As Double Fold
demonstrated, the public also expects libraries to preserve the published
record.
Indeed, national governments specifically task certain libraries to do this. All works
under copyright protection published in the US, for instance, are subject to
mandatory deposit — requirements similar to those in most other countries.
While
historically these deposit requirements were confined to print publications,
many countries are extending the law to cover electronic materials too. In the
UK, for instance, a Private Member’s Bill was passed unopposed in March that will
make the deposit of digital materials compulsory.
“One
thing we are very, very heavily tied up with thinking about at the moment is web
archiving,” says Shenton. “After all, the remit of the British Library is to
preserve and care for the national published archive, and the web is a form of
quasi-publishing.”
Some,
such as the National Library of Australia, are taking a selective approach.
Every few months the library archives “significant” national websites, most
notably the 2000 Sydney Olympics. The Royal Library in Sweden, by contrast, has
adopted a more comprehensive approach. Since 1996, its Kulturarw3
project has been regularly archiving everything with a Swedish web address.
But
can web preservation be treated as an isolated national activity? How can a
seamless, linked record of our times be salami-sliced by geographical borders
for archival purposes?
Librarians,
however, prefer a more traditional approach. “At the moment there is a debate
as to scope,” agrees Shenton, but adds: “No society has ever collected
absolutely everything. So what we are trying to do is to get collection
development, selection, and retention policies in place.”
To
avoid another Double Fold, however, librarians might be advised to
consult with the public before finalising their digital-age selection policies.
New responsibilities
And
what about the preservation of electronic publications such as eBooks and
e-journals? Traditional legal deposit assumes publishers file new publications
with the deposit library and walk away, leaving libraries to care for them.
But
as Baker points out, the digital environment introduces new responsibilities.
Rather than simply warehousing digital material, someone is going to have to
actively manage it. After all, what benefit would it be to society if deposited
electronic publications were unreadable within ten years?
But
who, I asked Leo Voogt, director of global library relations at Elsevier, will
be responsible for the vital preservation work? “The National Library will be
making the technical preservation decisions,” he replied. “That is really
both their expertise and their mission.”
What
these agreements signal is that it will not be publishers who take on the
responsibility of nurturing Baker’s elephants, or emptying their dung-trucks,
but librarians. Commenting on the agreements with KAP and Elsevier, Steenbakkers
says: “The parties have agreed that each will bear their own costs with regard
to the depositing and the preservation of the electronic publications.”
Nor
is digital preservation a matter only for deposit libraries. As more and more
libraries become involved in the creation of institutional archives, so they too
will need to engage in the nurturing and dung-shifting activities demanded by
digital preservation.
Too little, too late?
But
given the financial pressures libraries already labour under, how will they fund
these new activities? The NDIIPP’s $100 million suddenly begins to look like
being too little, too late. Certainly libraries are going to need a far greater
injection of funding if they are to avert the threatened digital Dark Ages.
They
will also need appropriate funding, says Steenbakkers. “To date
governments have provided very little money for the library and archive
community to get to grips with this issue, either in Europe or the US. In
addition, agencies granting research subsidies are simply playing safe, and
providing only small amounts of money for consultants to write more and more
reports on the problem, rather than allocating the substantial funds that are
essential if we are to develop real-life practical solutions.”
To
add to the urgency, financial pressures are increasingly causing libraries to
abandon print in favour of digital products. “As prices continue to
sky-rocket,” explains Gertz, “it becomes more and more difficult financially
to justify the cost of keeping a paper copy in addition to a digital copy when
almost no one is using the paper copy.”
This
will inevitably exacerbate the preservation challenge, since it will accelerate
the growth of digital-only publications, leading to more and more of the
published record being produced on media for which there is currently no
long-term preservation solution. A development best characterised as alarming.
Librarians
should not forget that it is on their heads that public ire will fall if things
go awry. As Smith commented to Technology Review last October: “People
count on libraries to archive human creativity. It’s important for people to
know, though, that libraries are at a loss about how to solve this problem.”
To
obtain the money they need to find a solution, libraries are going to have to
engage more with the public, as well as with governments and other funding
bodies, and convince them of the severity of the threat we face.