The third event, timeout, is the problematic one: this
event is actually never emitted by http.IncomingMessage
objects, so the listener on line 10 is dead code. There is
a timeout event on http.ClientRequest, however, so
presumably the event should have been registered on req,
not res. We contacted the author of min-req-promise,
who confirmed our analysis of the issue.
Note that there are no compile-time or runtime diag-
nostics to alert the developer to this problem: not only is it
very difficult to infer precise types for variables in JavaScript
in general, but there is not even anything semantically
wrong with registering a handler for a timeout event on
http.IncomingMessage. While the http library will never
emit this event, client code could do so itself by calling the
emit method (although in this case it does not). Moreover,
since dead-listener bugs do not cause a crash at runtime,
they may go undetected for a long time: in the case of
min-req-promise, the bug had been present since its initial
version (released in March 2018).
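The silent nature of such a registration can be illustrated with a minimal sketch. It does not reproduce the min-req-promise code: FakeResponse is a hypothetical stand-in for an object such as http.IncomingMessage, built directly on Node's EventEmitter, and the events it emits are chosen for illustration only.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for a library object; not the real http.IncomingMessage.
class FakeResponse extends EventEmitter {}

const res = new FakeResponse();
let timeoutCalls = 0;

// Registering a listener for an event the library never emits is accepted
// without any warning or error.
res.on("timeout", () => {
  timeoutCalls += 1;
});

// In this sketch the "library" only ever emits data and end.
res.emit("data", "chunk");
res.emit("end");

const registered = res.listenerCount("timeout"); // the listener is registered ...
const callsBeforeManualEmit = timeoutCalls;       // ... but has never run

// Client code may still emit the event itself, which is why the
// registration cannot simply be rejected as invalid.
res.emit("timeout");
const callsAfterManualEmit = timeoutCalls;
```

The listener only runs once client code emits the event explicitly, so neither the runtime nor a type checker has grounds to reject the registration.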
At present, the only way for a developer to detect this
sort of problem is to carefully reason about types and the
events they support (as we have done above), or to write
extensive unit tests to ensure all events are handled as
expected. In the above example, this would require adding
a test involving a request that times out, which is an edge
case that is easy to overlook.
Clearly, a more automated approach is desirable.
2.3 Automatically detecting dead listeners
We have argued that the dynamic nature of the JavaScript
event-driven APIs makes it unrealistic to detect dead listen-
ers at runtime. However, an approach based on static analy-
sis faces the usual dilemma of having to trade off precision
against performance: an imprecise analysis is likely to report
many false positives, while a very precise analysis will not
usually scale to realistic code bases.
Ideally, a static analysis would analyze client code as
in Figure 2 along with the implementation of the Node.js
standard libraries and any other third-party libraries it
depends on, derive a precise model of which types support
which events, and then flag dead listeners based on this
information. In practice, we know of no static analyzer for
JavaScript precise enough to derive such a model that scales
to the size and complexity of the libraries involved. As a
comparatively benign example, the Node.js http package
transitively depends on more than 60 modules, for a total of
around 20,000 lines of code. While this is quite manageable
for, say, type inference or taint tracking, it is out of reach for
techniques that precisely model event dispatch, such as that
of Madsen et al. [7].
The usual answer is to instead provide the analysis
with simplified models of the libraries involved. This is
indeed a good approach for frequently used and well-
documented packages like http, but the modern JavaScript
library landscape is vast, with npm alone hosting well over
one million packages. While many of these are very rarely
used, the number of popular packages is still too large to
allow manual modeling, especially since packages tend to
go in and out of style quite frequently.
2.4 Approach
Our proposed solution to this dilemma is to turn the
size of the JavaScript ecosystem to our advantage in a
two-step approach illustrated in Figure 1: first, we mine
large amounts of open-source code from GitHub and other
hosting platforms for real-world examples of event-listener
registrations; then we perform a statistical analysis to deter-
mine whether a certain pattern is rare and hence suggestive
of incorrect API usage, or whether it is common and therefore
likely to be a correct use. This allows us to automatically
derive models instead of writing them by hand.
In the next two sections we explain the data mining and
classification steps in more detail.
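As a rough illustration of the two steps, the sketch below counts mined pairs and flags those that are rare for their access path. It is not the statistical analysis used in this work: the relative-frequency test, the threshold value, the function names countPairs and flagRarePairs, and the counts are all assumptions made up for the example.

```javascript
// Step 1 (stand-in for mining): tally how often each event is registered
// on each access path. Input is a list of [accessPath, event] pairs.
function countPairs(pairs) {
  const counts = new Map(); // accessPath -> Map(event -> count)
  for (const [path, event] of pairs) {
    if (!counts.has(path)) counts.set(path, new Map());
    const perEvent = counts.get(path);
    perEvent.set(event, (perEvent.get(event) || 0) + 1);
  }
  return counts;
}

// Step 2 (stand-in for classification): flag pairs whose share of all
// registrations on the same access path falls below a threshold.
function flagRarePairs(counts, threshold) {
  const flagged = [];
  for (const [path, perEvent] of counts) {
    let total = 0;
    for (const n of perEvent.values()) total += n;
    for (const [event, n] of perEvent) {
      if (n / total < threshold) flagged.push([path, event]);
    }
  }
  return flagged;
}

// Made-up counts, for illustration only.
const mined = [];
for (let i = 0; i < 70; i++) mined.push(["require(http).request()", "error"]);
for (let i = 0; i < 29; i++) mined.push(["require(http).request()", "response"]);
mined.push(["require(http).request()", "data"]);

// 0.05 is an arbitrary, hypothetical cut-off.
const flagged = flagRarePairs(countPairs(mined), 0.05);
```

With these invented counts, only the single data registration falls below the cut-off and is flagged as a candidate dead listener.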
3 DATA MINING
The mining step is implemented as a simple, context- and
flow-insensitive static analysis that finds event-listener
registrations and records them as listener-registration pairs of
the form ⟨a, e⟩, where a represents the object on which
the listener is registered and e the event for which it is
registered.
It is important that both a and e are represented in a
code base-independent way to enable the classification step
to meaningfully collate results obtained on many different
code bases.
For events, this is easy: e is the event name annotated
with the emitter package. For instance, timeout events on
a’s rooted in the http package are considered to be different
from timeout events rooted in the process package. This
is important, as events with the same name in different
packages may behave differently.
To represent objects, we use a notion of access paths sim-
ilar to the one proposed by Mezzetti et al. [8]: starting from
an import of a package, the access path records a sequence
of property reads, method calls and function parameters
that need to be traversed to reach a particular point in
the program. More precisely, a conforms to the following
grammar:
a ::= require(m)   an import of package m
    | a.f          property f of an object represented by a
    | a()          result of a function represented by a
    | a(i)         ith argument of function represented by a
    | a_new()      instance of a class represented by a
Note that access paths are always rooted at a package
import, so we can always tell which package any program
element derives from.
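One way to picture this representation is as a small recursive data structure with one constructor per production. The sketch below is an illustration under assumptions, not the implementation used in the mining step: the constructor names, the render and rootPackage helpers, and the plain-text spelling _new() for the instance production are all hypothetical.

```javascript
// One constructor per grammar production (names are hypothetical).
const imp = (m) => ({ kind: "import", module: m });
const prop = (base, f) => ({ kind: "prop", base, name: f });
const call = (base) => ({ kind: "call", base });
const arg = (base, i) => ({ kind: "arg", base, index: i });
const inst = (base) => ({ kind: "new", base });

// Render an access path as a string independent of any one code base.
function render(a) {
  switch (a.kind) {
    case "import": return `require(${a.module})`;
    case "prop":   return `${render(a.base)}.${a.name}`;
    case "call":   return `${render(a.base)}()`;
    case "arg":    return `${render(a.base)}(${a.index})`;
    case "new":    return `${render(a.base)}_new()`;
    default: throw new Error(`unknown access path kind: ${a.kind}`);
  }
}

// Every path bottoms out in an import, so the root package is always defined.
function rootPackage(a) {
  let current = a;
  while (current.kind !== "import") current = current.base;
  return current.module;
}

// Result of calling createReadStream on the fs import.
const streamPath = call(prop(imp("fs"), "createReadStream"));
// Instance of the Socket class exported by net.
const socketPath = inst(prop(imp("net"), "Socket"));

const renderedStream = render(streamPath);
const renderedSocket = render(socketPath);
const streamRoot = rootPackage(streamPath);
```

Because the rendered string mentions only the package and the operations applied to it, two unrelated code bases that obtain an object in the same way produce the same path, which is what allows their registrations to be collated.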
For instance, in Figure 2, the access path associated
with the variable req would be require(http).request(),
meaning that req is initialized to the result of calling the
method request on the result of importing the http module.²
2. Note that the argument to the request method is not recorded
in the access path; see the Discussion subsection below for more on this
point.