The Google Legacy 59
Chapter Three: Google Technology
elimination of such troublesome jobs as backing up data, Google’s hardware innovations give
it a competitive advantage few of its rivals can equal as of mid-2005.
PageRank, with its layers of additional computations added over the years, is a software
problem of considerable difficulty. The Google system must find Web pages and perform
dozens, if not hundreds, of analyses of those Web pages. Consider the links pointing to a Web
page. Google must keep track of them for more than eight billion Web pages. For a single Web
page with one link pointing to it, the problem is trivial. One link equals one pointer. But what
happens when a site has 10,000 links pointing to it? The problem becomes many times larger
and more computationally demanding. Some of these links are likely to come from sites that
have more traffic than others. Some of the links may come from sites that have spoofed
Google for fun or profit. The calculations to sort out the “value” of each of these links add to
the computational work associated with PageRank. Keeping track of these factors is a big job.
Sizing up different factors against one another for a single page can be hard without a
calculator to help. Take the same task and multiply it by a couple of billion Web pages, and the
computing task becomes one for a supercomputer.
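To make the link-weighing idea concrete, here is a minimal sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is not Google's production system, which layers many additional signals on top. The function name pagerank, the damping value of 0.85, the iteration count, and the four-page toy Web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated iteration (illustrative only).

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are assumed values.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: page "c" has the most inbound links,
# page "d" has none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy example the page with the most inbound links ends up with the highest score and the page with no inbound links ends up with the lowest. The work per pass grows with the total number of links, which is why the same calculation over billions of pages becomes a job for a very large computing system.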
Yet this task is everyday stuff for Google and its PageRank process. Users do not give much
thought to what technology underpins a routine query or the 300 million queries Google
handles each day. In an average second, Google’s technology handles roughly 3,500 queries in
dozens of languages from users worldwide.
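As a back-of-the-envelope check, the daily query volume can be converted to an average per-second rate. The sketch below assumes the 300 million queries per day cited above and a load spread evenly across the day. Real traffic has peaks and troughs, so this is an average, not a peak figure. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily query volume cited in the text (mid-2005 estimate).
queries_per_day = 300_000_000

# Average rate, assuming queries arrive evenly around the clock.
avg_queries_per_second = queries_per_day / SECONDS_PER_DAY

print(f"{avg_queries_per_second:,.0f} queries per second on average")
```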
Google’s technology cannot be separated from search. Search was the prime mover in the
Google universe. Once Messrs. Brin and Page were able to fiddle with a limited number of
commodity computers and make their PageRank algorithm work, Google was headed down a
road that it still follows.
The software requires a suitable hardware and network infrastructure in which to operate.
Without Google’s hardware and software, there would be no Google. Hardware and software
are inextricably linked at Google. With each new advance in software, Google’s engineers
must make correspondingly significant advances in hardware. And when hardware engineers
come up with an advance, the software engineers greedily use that advance to extend the
functionality of their software.
What Google owns is its own snappy, turbocharged supercomputer, interesting software tools,
and several thousand people trying to figure out what else the Googleplex can do. Some of the
tinkerers come at the problem from bits and bytes, writing code, and weaving applications out
of the available functions. The result is a brilliant product.
Others come at the problem from the soldering iron and screwdriver angle. These engineers
look for ways to build hardware and physical systems that can perform the calculations needed
to make PageRank work. Google’s approach to data centers, the racks in the data centers, and
the devices in the racks in the data centers is as clever as the company’s search system. The
hardware has to be more than clever. The hardware has to work 24x7, under continuous load,
and in locations from Switzerland to Beijing. The synergy between software and hardware is
perhaps one of Google’s major accomplishments.