Once built with monolithic architectures, interactive online services are undergoing a shift to microservice architectures [1, 4, 5, 42, 47], where a large application is built by connecting loosely coupled, single-purpose microservices. On the one hand, microservice architectures provide software engineering benefits such as modularity and agility as the scale and complexity of the application grow [36, 49]. On the other hand, staged designs for online services inherently provide better scalability and reliability, as shown in pioneering works like SEDA [105]. However, while the interactive nature of online services implies an end-to-end service-level objective (SLO) of a few tens of milliseconds, individual microservices face stricter latency SLOs, at the sub-millisecond scale for leaf microservices [100, 110].
Microservice architectures are more complex to operate compared to monolithic architectures [22, 35, 36], and the complexity grows with the number of microservices. Although microservices are designed to be loosely coupled, their failures are usually highly interdependent. For example, one overloaded service in the system can easily trigger failures of other services, eventually causing cascading failures [3]. Overload control for microservices is difficult because microservices call each other on data-dependent execution paths, creating dynamics that cannot be predicted or controlled from the runtime [38, 48, 88, 111]. Microservices are often composed of services written in different programming languages and frameworks, further complicating their operational problems. By leveraging fully managed cloud services (e.g., Amazon's DynamoDB [6], ElastiCache [7], S3 [19], Fargate [12], and Lambda [15]), responsibilities for scalability and availability (as well as operational complexity) are mostly shifted to cloud providers, motivating serverless microservices [20, 33, 41, 43–45, 52, 53].
Serverless Microservices. Simplifying the development and management of online services is the largest benefit of building microservices on serverless infrastructure. For example, scaling the service is automatically handled by the serverless runtime, deploying a new version of code is a push-button operation, and monitoring is integrated with the platform (e.g., CloudWatch [2] on AWS). Amazon promotes serverless microservices with the slogan "no server is easier to manage than no server" [44]. However, current FaaS systems have high runtime overheads (Table 1) that cannot always meet the strict latency requirements imposed by interactive microservices. Nightcore fills this performance gap.
Nightcore focuses on mid-tier services implementing stateless business logic in microservice-based online applications. These mid-tier microservices bridge the user-facing frontend and the data storage, and fit naturally in the programming model of serverless functions. Online data-intensive (OLDI) microservices [100] represent another category of microservices, where the mid-tier service fans out requests to leaf microservices for parallel data processing. Microservices in OLDI applications are mostly stateful and memory intensive, and therefore are not a good fit for serverless functions. We leave serverless support of OLDI microservices as future work.
The programming model of serverless functions expects function invocations to be short-lived, which seems to contradict the assumption of service-oriented architectures that services are long-running. However, FaaS systems like AWS Lambda allow clients to maintain long-lived connections to their API gateways [8], making a serverless function "service-like". Moreover, because AWS Lambda re-uses execution contexts for multiple function invocations [13], user code in serverless functions can also cache reusable resources (e.g., database connections) between invocations for better performance [17].
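To illustrate, here is a minimal sketch of this caching pattern for a Python Lambda handler; sqlite3 is only a stand-in for whatever remote-database client a real service would use.

```python
# Sketch: caching a reusable resource (here, a database connection)
# across AWS Lambda invocations. Module-level state lives in the
# execution context, which Lambda may re-use between invocations [13],
# so warm invocations skip the setup cost [17].
import sqlite3  # stand-in for a real remote-database client library

_connection = None  # survives across invocations in a warm execution context


def handler(event, context):
    global _connection
    if _connection is None:
        # Paid only when a fresh execution context is created (cold start).
        _connection = sqlite3.connect(":memory:")
    # ... use _connection to serve the request ...
    return {"statusCode": 200}
```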
Optimizing FaaS Runtime Overheads. Reducing start-up latencies, especially cold-start latencies, is a major research focus for FaaS runtime overheads [57, 64, 67, 89, 90, 98]. Nightcore assumes sufficient resources have been provisioned and relevant function containers are in warm states, which can be achieved on AWS Lambda by using provisioned concurrency (AWS Lambda strongly recommends provisioned concurrency for latency-critical functions [40]). As techniques for optimizing cold-start latencies [89, 90] become mainstream, they can be applied to Nightcore.
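For concreteness, provisioned concurrency can be enabled through the AWS SDK; the sketch below uses Python's boto3 and assumes a function named my-func with a published version 1.

```python
# Sketch: reserve warm execution environments for a latency-critical
# function via AWS Lambda provisioned concurrency [40].
# Assumes a function "my-func" with a published version "1" exists.
import boto3

lambda_client = boto3.client("lambda")
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-func",
    Qualifier="1",  # provisioned concurrency targets a version or alias
    ProvisionedConcurrentExecutions=10,  # warm environments to keep ready
)
```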
Invocation latency overheads of FaaS systems are largely overlooked, as recent studies on serverless computing focus on data-intensive workloads such as big data analysis [75, 95], video analytics [59, 69], code compilation [68], and machine learning [65, 98], where function execution times range from hundreds of milliseconds to a few seconds. However, a few studies [62, 84] point out that the millisecond-scale invocation overheads of current FaaS systems make them a poor substrate for microservices with microsecond-scale latency targets. For serverless computing to be successful in new problem domains [71, 76, 84], it must address microsecond-scale overheads.
3 DESIGN
Nightcore is designed to run serverless functions with sub-millisecond-scale execution times, and to efficiently process internal function calls, which are generated during the execution of a serverless function (not by an external client). Nightcore exposes a serverless function interface that is similar to AWS Lambda: users provide stateless function handlers written in supported programming languages. The only addition to this simple interface is that Nightcore's runtime library provides APIs for fast internal function invocations.
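To make the interface concrete, the sketch below shows what such a handler could look like in Python; the invoke_func parameter is a hypothetical placeholder for the fast internal-invocation API that Nightcore's runtime library provides, not its actual signature.

```python
# Illustrative sketch only: `invoke_func` is a hypothetical placeholder
# for Nightcore's internal-invocation API; the real interface may differ.
import json


def handler(event, invoke_func):
    """Stateless handler, as in AWS Lambda; `invoke_func` issues
    internal function calls without leaving the worker server."""
    order = json.loads(event)
    # Internal call: generated during this function's execution, not by
    # an external client, so it can take Nightcore's fast path (Sec. 3.1).
    stock = json.loads(invoke_func("CheckStock", json.dumps(order["items"])))
    return json.dumps({"ok": stock["available"]})
```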
3.1 System Architecture
Figure 2 depicts Nightcore’s design which mirrors the design of
other FaaS systems starting with the separation of frontend and
backend. Nightcore’s frontend is an API gateway for serving ex-
ternal function requests and other management requests (e.g., to
register new functions), while the backend consists of a number
of independent worker servers. This separation eases availability
and scalability of Nightcore, by making the frontend API gateway
fault tolerant and horizontally scaling backend worker servers. Each
worker server runs a Nightcore engine process and function con-
tainers, where each function container has one registered serverless
function, and each function has only one container on each worker
server. Nightcore’s engine directly manages function containers
and communicates with worker threads within containers.
Internal Function Calls. Nightcore optimizes internal function
calls locally on the same worker server, without going through the
API gateway. Figure 2 depicts this fast path in Nightcore’s runtime
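As a rough illustration of this fast path (a sketch under assumed names, not Nightcore's implementation), an engine can route an internal call straight to the callee's local container:

```python
# Rough illustration (not Nightcore's implementation): dispatching an
# internal function call on the worker server that issued it, bypassing
# the API gateway. Each registered function has exactly one container
# on each worker server (Section 3.1).

class Engine:
    def __init__(self):
        # Maps function name -> the single local container for it.
        self.containers = {}

    def dispatch_internal(self, func_name, payload):
        # Internal calls never leave this worker server: route directly
        # to the callee's local function container.
        container = self.containers[func_name]
        return container.invoke(payload)
```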