can be easily partitioned both within a request and across different requests.
Similarly, whereas Web email transactions do modify user data, requests from
different users are essentially independent of each other, creating natural units of
data partitioning and concurrency.
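This independence can be sketched concretely. A minimal illustration (the function name and hashing scheme here are hypothetical, not from the text): because requests from different users touch disjoint data, user data can be spread across shards with a simple hash, and requests for different users can then be served concurrently with no cross-shard coordination.

```python
import hashlib

def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user to one of num_shards partitions.

    Requests from different users are independent, so each request can be
    routed to its user's shard and processed without coordinating with
    other shards.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Because the mapping is deterministic, any front-end server can route any request to the right shard without consulting shared state.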
● Workload churn—Users of Internet services are isolated from the service’s
implementation details by relatively well-defined and stable high-level APIs (e.g.,
simple URLs), making it much easier to deploy new software quickly. Key pieces of
Google’s services have release cycles on the order of a couple of weeks compared to
months or years for desktop software products. Google’s front-end Web server
binaries, for example, are released on a weekly cycle, with nearly a thousand
independent code changes checked in by hundreds of developers. The core of
Google’s search services has been reimplemented nearly from scratch every 2 to 3
years. This environment creates significant incentives for rapid product innovation
but makes it hard for a system designer to extract useful benchmarks even from
established applications. Moreover, because Internet services are still a relatively
new field, new products and services frequently emerge, and their success with users
directly affects the resulting workload mix in the datacenter. For example, video
services such as YouTube have flourished in relatively short periods and may present
a very different set of requirements from the existing large customers of computing
cycles in the datacenter, potentially affecting the optimal design point of WSCs in
unexpected ways. A beneficial side effect of this aggressive software deployment
environment is that hardware architects are not necessarily burdened with having to
provide good performance for immutable pieces of code. Instead, architects can
consider the possibility of significant software rewrites to take advantage of new
hardware capabilities or devices.
● Platform homogeneity—The datacenter is generally a more homogeneous
environment than the desktop as a target platform for software development. Large
Internet services operations typically deploy a small number of hardware and system
software configurations at any given time. Significant heterogeneity arises primarily
from the incentives to deploy more cost-efficient components that become available
over time. Homogeneity within a platform generation simplifies cluster-level
scheduling and load balancing, and it reduces the maintenance burden for platform
software (kernels, drivers, etc.). Similarly, homogeneity allows more efficient
supply chains and repair processes, because both automated and manual
repairs benefit from accumulating experience with fewer types of systems. In contrast,
software for desktop systems can make few assumptions about the hardware or
software platform on which it is deployed, and its complexity and performance
characteristics may suffer from the need to support thousands or even millions of
hardware and system software configurations.
● Fault-free operation—Because Internet service applications run on clusters of
thousands of machines—each of them not dramatically more reliable than PC-class
hardware—the multiplicative effect of individual failure rates means that some type
of fault is expected every few hours or less (more details are provided in Chapter 6).
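The multiplicative effect can be made concrete with a back-of-the-envelope calculation (the per-machine reliability figure and cluster size below are illustrative assumptions, not numbers from the text): under independent failures, the expected time between failures anywhere in the cluster shrinks in proportion to the number of machines.

```python
def cluster_mtbf_hours(machine_mtbf_years: float, num_machines: int) -> float:
    """Expected hours between failures somewhere in the cluster,
    assuming independent, identically reliable machines."""
    hours_per_year = 365 * 24  # 8,760
    return machine_mtbf_years * hours_per_year / num_machines

# A server that fails once every 4 years looks reliable in isolation,
# but 10,000 such machines together see a fault every few hours:
print(round(cluster_mtbf_hours(4.0, 10_000), 1))  # -> 3.5
```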
As a result, although it may be reasonable for desktop-class software to assume
fault-free hardware operation for months or years, this is not true for datacenter-