Technology Review, May 2005
But a digital book needn't be loaned out to be shared. And
Oxford's various libraries have already created digital images of
many of their greatest treasures, from ninth-century illuminated
Latin manuscripts to 19th-century children's alphabet books.
Most of these images can be examined at high resolution on the
Web. The only catch is that scholars have to know what they're
looking for in advance, since very few of the digital pages are
searchable. Optical character recognition (OCR) technology cannot
yet interpret handwritten script, so exposing the content of
these books to today's search engines requires typing their texts
into separate files linked to the original images. A three-person
team at Oxford, in collaboration with librarians at the University
of Michigan and 70 other universities, is doing just that for a large
collection of early English books, but the entire effort produces
searchable text for only 200 books per month. At that rate, making
a million books searchable would take more than 400 years.
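
The arithmetic behind that estimate is easy to check. The sketch below is only a back-of-the-envelope illustration, not anything the Oxford team uses: it assumes the rate of 200 books per month never changes, and the function and constant names are invented for the example.

```python
# Back-of-the-envelope check of the transcription timeline.
# Assumption (not from the article): the monthly rate stays constant.
BOOKS_PER_MONTH = 200      # rate reported for the Oxford-Michigan effort
TARGET_BOOKS = 1_000_000   # "a million books"
MONTHS_PER_YEAR = 12


def years_to_transcribe(target_books: int, books_per_month: int) -> float:
    """Return the years needed to key in target_books at a fixed monthly rate."""
    months = target_books / books_per_month
    return months / MONTHS_PER_YEAR


if __name__ == "__main__":
    years = years_to_transcribe(TARGET_BOOKS, BOOKS_PER_MONTH)
    print(f"{years:.1f} years")
```

At 5,000 months of work, the result lands a little above 416 years, consistent with "more than 400."
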
That's where Google's resources will make a difference.
Susan Wojcicki, a product manager at Google's Mountain View,
CA, campus and leader of the Google Print project, puts it bluntly:
"At Google we're good at doing things at scale."
Google has already copied and indexed some eight billion
Web pages, which lends credibility to its claim that it can digitize
a big chunk of the 60 million volumes (counting duplicates) held
by Harvard, Oxford, Stanford, the University of Michigan, and
the New York Public Library in a matter of years. It will be a
complex task, but one that is in some ways familiar for the company.
"It's not just feeding the books into some kind of digitization
machine, but then actually taking the digital files, moving
those files around, storing them, compressing them, OCR-ing
them, indexing them, and serving them up," points out Wojcicki.
"At that point it becomes similar to all of Google's other
businesses, where we're managing large amounts of data." But the
entire project, Wojcicki admits, hinges on those digitization
machines: a fleet of proprietary robotic cameras, still under
development, that will turn the digitization of printed books into a
true assembly-line process and, in theory, lower the cost to about
$10 per book, compared to a minimum of $30 per book today.
Neither Google nor its partner libraries have announced exactly
how the process will work. But John Wilkin, associate university
librarian at the University of Michigan, says it will go
something like this: "We put a whole shelfful of books onto a
cart, keeping the order intact. We check them out by waving them
under a bar code reader. Overnight, software takes all the bar
codes, extracts machine-readable records from the university's
electronic catalogue, and sends the records to Google, so they
can match them with the books. Then we move the cart into
Google's operations room."
This room will contain multiple workstations so that several
books can be digitized in parallel. Google is designing the
machines to minimize the impact on books, according to Wilkin.
"They scan the books in order and return the cart to us," he
continues. "We check them back in and mark the records to show
they've been scanned. Finally, the digital files are shipped in a
raw format to a Google data center and processed to produce
something you could use."
The Book Web
Exactly how readers will be able to use the material, however, is
still a bit foggy. Google will give each participating library a copy
of the books it has digitized while keeping another for itself.
Initially, Google will use its copy to augment its existing Google
Print program, which mixes relevant snippets from recently
published books into the usual results returned by its Web search
tool. A user who clicks on a Google Print result is presented with
an image of the book page containing his or her keyword, along
with links to the sites of retailers selling the print version of the
book and keyword-related ads sold to the highest bidders
through Google's AdSense program.
Does it bother librarians that Moby-Dick might be served up
alongside an ad for the latest Moby CD? "To say we haven't
worried about it would be wrong," says Wilkin. "But Google has a
good citizen profile. The way they use AdSense doesn't trouble
me. And if suddenly access were controlled, and there was a cost
to view the materials, we could still offer them for free ourselves,
or at least the out-of-copyright materials."
In fact, Google may put the entire texts of these public-domain
materials online itself. In the future, Google could even use those
materials to create a kind of literary equivalent of the Web, says
Wojcicki. "Imagine taking the whole Harvard library and saying,
'Tell me about every book that has this specific person in it.' That
in itself would be very powerful for scholars. But then you could
start to see linkages between books"---that is, which books cite
other books, and in what contexts, in the same way that websites
refer to other sites through hyperlinks. "Just imagine the power
that that would bring!"
(Wojcicki's example shows how history can, indeed, come
full circle. Google founders Larry Page and Sergey Brin devel-
"No one thinks the library is disappearing
as a physical space. The real question is,
what's the 'value proposition' they offer
in a digital future?" ---Abby Smith