This Century's Review: Weeding the Digital Library?

A Privileged Source of Information


Weeding the Digital Library?

Mirabelle Madignier

For Paul Valéry, the motto of the library is more about selecting than reading (“élire plus que lire”). To withdraw a work is an act of critical rejection.

In traditional libraries, this function of de-selection is known as ‘weeding’. As the term suggests, weeding involves removing books that are no longer used. Librarians perform this operation regularly to free up space for more recent acquisitions. Beyond its purely physical functions (removal of damaged works, clearing of stacks), weeding is above all a tool for putting into operation a particular collection policy. Libraries exist, on the one hand, to acquire new works with the aim of responding to users’ demands, and, on the other, to conserve an essential cultural inheritance, that of the written word. In a system in which libraries are no longer isolated from each other but form an organised network, collection policies are ideally diverse but complementary. In the best of all possible worlds, one can thus speak of ‘shared conservation’, where a number of establishments decide to seek an agreement with the aim of distributing among themselves the different responsibilities of conservation and weeding.

But in the era of the digital library, are these modern concepts of librarianship of any relevance?

Although this may not seem to be a question of any great importance, it deserves some consideration. Naturally, there are different perspectives. Digital libraries do not replace traditional libraries and, although digital collections are already beginning to supplement print collections, they are no substitute for them. Nevertheless, digital library projects are increasingly common, and it is legitimate to question the principles of this digitisation.

Of the current crop of digital library projects, some are organised by book production’s established players (authors, publishers and libraries), who have been involved in digitisation for several years. Others, of more recent origin, have emerged on the initiative of corporations connected with the internet, particularly those running search engines. These corporations, which by now have left the university computing labs far behind, have in a few short years become the leaders of the internet: true commercial organisations with a considerable turnover thanks to a business model founded on advertising revenue. It so happens that access to content (whether cultural or some other kind) is one of the crucial stakes in the bitter war currently being waged between the search engines.

It is for this reason that they are running ambitious digitisation projects that libraries alone do not have the resources to undertake. Certain agreements have therefore been reached between the established players of publishing and the search engines. Google’s project (Book Search), certainly the one that has received the most media attention, began at the end of 2004 and envisages the digitisation of 15 million books by 2010.

The project was launched on the basis of agreements with five major library systems: the libraries of Harvard University, Stanford University, the University of Michigan and Oxford University, and the New York Public Library. Google is continuing to canvass potential new partners all across the world, such as the National Library of China, which has just signed up to the project and which will provide 80 million pages (Livres Hebdo 628, p.68).

But Google uses a contestable interpretation of copyright law that has already put its gigantic project at odds with publishers, authors and other copyright-holders. In short, copyright-holders are not consulted about the digitisation of those of their works held by the libraries involved. A number of court cases have already been launched in the USA and in Europe (the libraries concerned hold extensive European collections).

Less well known, the Open Content Alliance (OCA) is another American digitisation project, carried out by a powerful consortium that includes two other search engines (Yahoo and MSN Search).

Learning from the mistakes of its competitor, the OCA intends in theory to respect copyright by using the ‘Creative Commons’ licence, which allows authors to specify the ways in which documents can be used. The other major difference from Google Book Search is that the OCA claims to be an open access tool, accessible from any search engine.

These gigantic projects have given rise to worries in Europe about the possible creation of information-access monopolies. Numerous European digital library projects have been launched, causing fragmentation, and the European Commission is now trying to address this situation. For the moment the challenge is to create a critical mass of digitised documents. The national digitisation programmes carried out thus far have been insufficient and a mass digitisation policy is essential over the coming years. The need has arisen for a specialised industry to respond to the demand for digitisation, indexing, storage and preservation.

Digitisation is an extremely expensive process that involves defining precisely not only which documents to digitise, but also what the priorities for such a choice should be, and in what form the documents should appear in order both to respect copyright and to fulfil the needs of the user.

The choices being made should therefore be not simply commercial or technological, but also cultural and political.

In conclusion, it is important to remember that digital library projects are still in their infancy.

No one can say what will become of them in 10, 50 or 200 years, but their future looks bright. The digital economy that these library projects reflect has the power to rekindle in the collective imagination the myth of the universal library, containing all the world’s knowledge and accessible to all. It would seem that in passing from the local library, the university library and the general or specialist library to the digital library, direct access is being given to Borges’ ideal: the total library, encompassing all possible worlds. However, this does not release us from the need to proceed methodically and to put the user first when making choices concerning digitisation. Faced with the technological and commercial race under way, it is important to establish clear priorities in those areas in which a demand has already been identified. Moreover, it would be desirable to create a dialogue between the different actors, without which there is a risk of endless duplication, both costly and pointless (except where duplication has been deemed necessary). The concepts of diversification and complementarity must govern digitisation; without them the digital library could become a field with no borders where the weeds grow undisturbed, to the extent, perhaps, of choking the cultures of the written word…

“I suspect that the human species – the unique species – is about to be extinguished, but the Library will endure: illuminated, solitary, infinite, perfectly motionless, equipped with precious volumes, useless, incorruptible, secret.” (Jorge Luis Borges, “The Library of Babel”, translated by James E. Irby, in Borges, Labyrinths. Harmondsworth: Penguin, 1970.)
