Conditions for the Digital Library of Alexandria

I have been in the middle of a major rethink of search engines’ efforts to digitize books. When those efforts began, I was ebulliently enthusiastic; I even wrote an article celebrating their potential to tame information overload. But major research librarians have been raising some important questions about search engines’ practices:

Several major research libraries have rebuffed offers from Google and Microsoft to scan their books into computer databases, saying they are put off by restrictions these companies want to place on the new digital collections. The research libraries, including a large consortium in the Boston area, are instead signing on with the Open Content Alliance [OCA], a nonprofit effort aimed at making their materials broadly available.

As the article notes, “many in the academic and nonprofit world are intent on pursuing a vision of the Web as a global repository of knowledge that is free of business interests or restrictions.”

As noble as I think this project is, I doubt it can ultimately compete with the monetary brawn of a Google. And why should delicate old books get scanned three or four times in the duplicative efforts of Google, Microsoft, the OCA, and who knows what other private competitors? I also worry that a fragmented archiving system might create a library of Babel. So what is to be done?

My new position is: leverage current copyright challenges to Google’s book search program to guarantee that it serves the public interest. Here’s how that might work:

Google’s plans to scan and index hundreds of thousands of copyrighted books have provoked extraordinary public controversy and private litigation. The project aims to archive and provide text-based indexing for an enormous number of books. Google’s scanning of copyrighted books is prima facie infringement, but Google is presently asserting a fair use defense. The debate has largely centered on the rival property rights of Google and the owners of the copyrights in the books it would scan and index.

Given Google’s alliance with some of the leading libraries in the world, journalistic narratives have largely portrayed the Google Book Search project as an untrammeled advance in public access to knowledge. However, other libraries are beginning to question the restrictive terms of the contracts that Google strikes when it agrees to scan and create a digital database of a library’s books. While each library is guaranteed access to the books it agrees to have scanned, it is not guaranteed access to the entire index of scanned works.

Those restrictive terms foreshadow potential future restrictions on, and tiering of, Google’s book search services. Well-funded libraries may pay a premium to gain access to all sources; lesser institutions may be left to scrounge among digital scraps. If permitted to become prevalent, such tiered access to information would threaten to rigidify and reinforce existing inequalities in access to knowledge and in life chances. Such tiering divides society into two groups: those who can afford to access the information, and those who cannot. To the extent that the latter group’s relative poverty is not its own fault, information tiering inequitably subjects it to yet another disadvantage, whereby others’ wealth can be leveraged into status, educational, or occupational advantage.

Given the diciness of the fair use case for projects like Google Book Search, courts should condition the legality of such archiving of copyrighted content on universal access to the contents of the resulting database. Landmark cases like Sony v. Universal have set a precedent for taking such broad public interests into account in the course of copyright litigation. Given the importance of “commerciality” in the first of the four fair use factors, suspicion of tiered access could also be factored into that prong of the test. A more ambitious (if less likely) solution would require Congress to set such terms in a legislative settlement of the issue.

However the matter is ultimately settled, any outcome in favor of dominant categorizers should be conditioned on their maintaining open access to search results. Such a condition would help assure that the type of “tiered access” common for legal resources would not further pervade the networked world. If Google’s proposed extension of the fair use defense succeeds, such a holding should be limited to current versions of the services that conduce to a common informational infrastructure. To the extent that Google or other search engines limit access to parts of their indexes, their public-spirited defenses of their archiving and indexing projects are suspect.

PS: For more thoughts on the future of digital archiving, see Diane Leenheer Zimmerman’s Can Our Culture Be Saved?

PPS: This is crossposted from Co-Op, and is part of a series, which starts here.