Technology Review, March/April 2008
matching visual elements thus gathers steam until a whole path can
be re-created from those paving stones. The more images the sys-
tem starts with, the more realistic the result, especially if the original
pictures were taken from a variety of angles and perspectives.
That's because the second computational exercise, Snavely says,
is to compare images in which shared features are depicted from
different angles. "It turns out that the first process aids the second,
giving us information about where the cameras must be. We're able
to recover the viewpoint from which each photo was taken, and
when the user selects a photo, they are taken to that viewpoint." By
positing a viewpoint for each image---calculating where the cam-
era must have been when the picture was taken---the software can
mimic the way binocular vision works, producing a 3-D effect.
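The article doesn't show the math, but the geometric core of this step is classic multi-view geometry: once the software has posited a viewpoint for each camera, a feature matched across two photos pins down a point in 3-D space by triangulation. The sketch below is a minimal NumPy illustration with invented camera parameters, not Photo Tourism's actual code:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of a 3-D point X into a camera with
    intrinsics K, rotation R, and translation t."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def triangulate(K, R1, t1, R2, t2, u1, u2):
    """Linear (DLT) triangulation: recover the 3-D point that projects
    to pixel u1 in camera 1 and pixel u2 in camera 2."""
    P1 = K @ np.hstack([R1, t1[:, None]])
    P2 = K @ np.hstack([R2, t2[:, None]])
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Run in reverse across thousands of matched features, the same constraint lets the system estimate the cameras themselves; the point cloud and the viewpoints emerge together.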
As Szeliski knew, however, the human eye is the most fickle of
critics. So he and his two colleagues sought to do more than just
piece smaller parts into a larger whole; they also worked on transition effects intended to let images meet as seamlessly as possible.
The techniques they refined include dissolves, or fades, the char-
acteristic method by which film and video editors blend images.
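A dissolve is simply a weighted blend of two frames, with the weight sliding from one image to the other. A minimal NumPy sketch of the idea (the actual Photo Tourism transitions also warp images to account for viewpoint):

```python
import numpy as np

def dissolve_frames(img_a, img_b, n_frames):
    """Generate a cross-dissolve: a sequence of frames fading
    from img_a to img_b. Both images must share shape and dtype."""
    frames = []
    for alpha in np.linspace(0.0, 1.0, n_frames):
        blended = (1.0 - alpha) * img_a.astype(np.float64) \
                  + alpha * img_b.astype(np.float64)
        frames.append(blended.round().astype(img_a.dtype))
    return frames
```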
In a demo that showed the Trevi Fountain in Italy, Photo Tourism
achieved a stilted, rudimentary version of what Photosynth would
produce: a point cloud assembled from images that represent dif-
ferent perspectives on a single place. More impressive was the soft-
ware's ability to chug through banks of images downloaded from
Flickr based on descriptive tags---photos that, of course, hadn't been
taken for the purpose of producing a model. The result, Szeliski
remembers, was "surprising and fresh" even to his veteran's eyes.
"What we had was a new way to visualize a photo collection, an
interactive slide show," Szeliski says. "I think Photo Tourism was
surprising for different reasons to insiders and outsiders. The
insiders were bewildered by the compelling ease of the experience."
The outsiders, he says, could hardly believe it was possible at all.
And yet the Photo Tourism application had an uncertain future.
Though it was a technical revelation, developed on Linux and able
to run on Windows, it was still very much a prototype, and the road
map for developing it further was unclear.
In the spring of 2006, as Snavely was presenting Photo Tourism
at an internal Microsoft workshop, Blaise Agüera y Arcas, then a
new employee, walked by and took notice. He had arrived recently
thanks to the acquisition of his company, Seadragon, which devel-
oped a software application he describes as "a 3-D virtual memory
manager for images." Seadragon's eye-popping appeal lay in its
ability to let users load, browse, and manipulate unprecedented
quantities of visual information, and its great technical achieve-
ment was its ability to do so over a network. (Photosynth's ability
to work with images from Flickr and the like, however, comes from
technology that originated with Photo Tourism.)
Agüera y Arcas and Snavely began talking that day. By the sum-
mer of 2006, demos were being presented. The resulting hybrid
product---part Photo Tourism and part Seadragon---aggregates
a large cluster of like images (whether photos or illustrations),
weaving them into a 3-D visual model of their real-world subject.
It even lends three-dimensionality to areas where the 2-D photos
come together. Each individual image is reproduced with perfect
fidelity, but in the transitions between them, Photosynth fills in
the perceptual gaps that would otherwise prevent a collection of
photos from feeling like part of a broader-perspective image. And
besides being a visual analogue of a real-life scene, the "synthed"
model is fully navigable. As Snavely explains, "The dominant mode
of navigation is choosing the next photo to visit, by clicking on
controls, and the system automatically moving the viewpoint in
3-D to that new location. A roving eye is a good metaphor for this."
The software re-creates the photographed subject as a place to be
appreciated from every documented angle.
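The navigation Snavely describes amounts to animating the virtual camera from one recovered viewpoint to another. As a toy illustration (function name and linear-only motion are my own simplification; a real viewer would also blend orientation, e.g. with quaternion slerp, and cross-fade the photos along the way):

```python
import numpy as np

def camera_path(pos_a, pos_b, n_steps):
    """Linearly interpolate camera positions to move the viewpoint
    smoothly from one photo's location to another's."""
    a = np.asarray(pos_a, dtype=float)
    b = np.asarray(pos_b, dtype=float)
    return [(1.0 - t) * a + t * b for t in np.linspace(0.0, 1.0, n_steps)]
```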
Photosynth's startling technical achievement is like pulling a
rabbit from a hat: it produces a lifelike 3-D interface from the 2-D
medium of photography. "This is something out of nothing," says
Alexei A. Efros, a Carnegie Mellon professor who specializes in
computer vision. The secret, Efros explains, is the quantity of photo-
graphs. "As you get more and more visual data, the quantity becomes
quality," he says. "And as you get amazing amounts of data, it starts
to tell you things you didn't know before." Thanks to improved pat-
tern recognition, indexing, and metadata, machines can infer three-
dimensionality. Sooner than we expect, Efros says, "vision will be
the primary sensor for machines, just as it is now for humans."
WHAT IT MIGHT BECOME
Microsoft's work on Photosynth exemplifies the company's strategy
for the 100-person-strong Live Labs. Part Web-based skunk works,
part recruiting ground for propeller-heads for whom the corporate
parent is not a good fit, Live Labs aims in part to "challenge what
people think Microsoft is all about," says Gary Flake, a 40-year-
old technical fellow who is the lab's founder and director. Its more
immediate aim is to bring Web technologies to market.