Save the Libraries – With Open Source

by Glyn Moody

For some in the world of free software, libraries are things that you call, rather than visit. But the places where books are stored – especially those that make them freely available to the public – are important repositories of the world's knowledge, of relevance to all. So coders should care about that kind of library as well as the other, and should be concerned that there is a threat to libraries' ability to provide ready access to knowledge that they have created themselves. The good news is that open source can save them.

The story begins in 1967, even before RMS had his idea about the benefits of hackers sharing code, when the Online Computer Library Center (OCLC) was created:

OCLC was founded in 1967 by Fred Kilgour, a pioneering Ohio librarian, with a simple idea: Instead of having every library in the country separately catalog a book -- laboriously entering its title, author, and subjects in just the right format -- why not have one person enter the cataloging information, upload it to a central computer, and then let everyone else download a copy from there?

The kinship with free software and many collaborative content projects like Wikipedia is evident. But the OCLC's WorldCat has not followed the same development path as Stallman's GNU project:

Today [WorldCat] has around 50 million book records. But OCLC, the group that owns and operates it, has been a different story. It started small -- a little office in Ohio, a set of membership dues to share the cost of running the servers. But OCLC's control passed from librarians and academics to business people (its senior executive comes from consulting firm Deloitte & Touche). They realized they had a monopoly on their hands and as costs for running servers have gone down, their prices have gone up. They charge you once to get your records added to WorldCat and charge you again to get them back out and charge you a third time for a whole series of additional fees and services.

And these prices are high. A friend who runs a small public library with around 5000 cardholders was asked to pay $5400 to contribute his records and $700 to get records out, plus a whole series of "User Support" and "New Member Implementation" fees -- all far more than he could afford.

Clearly, the original kinship with GNU has long gone: now, users are expected to cough up not just to use WorldCat's records, but even to contribute. In other words, WorldCat has moved to the Microsoft model, where you have to pay for the program, and also for support in order to file bug reports to improve the program.

As the rest of the post quoted above explains, the situation looks like it is going to get even worse. In particular, it seems that it is going to get harder to export bibliographic data without constraints to other catalogues, including free alternatives such as Open Library.

The open source community has been here before, when the communally created CDDB database was bought and the terms governing its use were modified, essentially making it hard to export data to other, free alternatives – just as is happening in the world of libraries today:

Things changed dramatically when the open CDDB.com server was bought by a company that wanted to make money from the contributions that users had made. The index file created by the Internet community could no longer be copied. Patents were obtained and granted. A large public outcry resulted, and led to the start of several projects to create an Open Source competitor for the commercial CDDB.com (now Gracenote).

In other words, the open source community simply routed around the damage by not submitting data to the closed CDDB, and by supporting instead free alternatives like Freedb and MusicBrainz, both of which are thriving.

Against this background, an obvious solution to the libraries' dependence on OCLC's central repository of bibliographic information is to start up an open equivalent along the lines of Freedb and MusicBrainz: basic hosting costs are now small, and altruists willing to run mirrors would doubtless start popping up given that the benefit to the community as a whole is so great. Users will rapidly switch their contributions to a truly free database that all can access, and this will soon surpass any legacy system, which will, in any case, find itself cut off from the community it has hitherto depended on (and taken for granted).

Unfettered access won't be the only way that users benefit. As has become evident in recent years, releasing large quantities of primary data for free allows all kinds of innovative secondary uses to be developed, some of which can be exploited commercially – but without needing to close off the primary materials, or charge for them. As a rich and diverse ecosystem arises, users will gain new tools and capabilities that simply were not possible with a locked-down database, and companies will gain a host of new business opportunities.

What's striking about the current discussions swirling around the OCLC saga is that they are being conducted in something of a vacuum, despite the fact that open source has a rich store of relevant experience that librarians could usefully refer to. The only problem is that little of it is to be found in books.

Glyn Moody writes about openness at opendotdotdot.
