408 The CMS Collaboration / Physics Letters B 767 (2017) 403–430
and $H_{\mathrm{T}}^{\mathrm{miss}}$ [90], although the acceptances differ. An advantage of
the γ + jets process is its much larger production cross section
compared to the Z → νν +jets process.
In the case of events with $N_{\mathrm{b}} \geq 2$, the μ + jets sample is also
used to estimate the small Z → νν + jets background because of
the limited event counts in the μμ + jets and γ + jets control samples.
The method relies on the use of W → μν + jets events to predict
the Z → νν + jets background [25,27,28]. The method corrects
for $\mathrm{t\bar{t}}$ contamination in the μ + jets sample, which can be
significant in the presence of jets identified as originating from b quarks.
However, while the $\mathrm{t\bar{t}}$ contamination increases with increasing
$N_{\mathrm{b}}$, the Z → νν + jets background is reduced to a sub-dominant level
relative to other backgrounds. The method is validated in data control
regions defined by samples of events categorised according to
$N_{\mathrm{b}}$. In summary, only the μ + jets sample is used to estimate the
total SM background for events with $N_{\mathrm{b}} \geq 2$, whereas all three data
control samples are used for events with $N_{\mathrm{b}} \leq 1$.
To maximise sensitivity to new-physics signatures with a large
number of b quarks, a method is employed that allows event yields
for a given b quark jet multiplicity to be predicted with a higher
statistical precision than obtained directly from simulation,
particularly for events with a large number of b quark jets
($N_{\mathrm{b}} \geq 2$) [28]. The method relies on generator-level
information contained in the simulation to determine the distribution
of $N_{\mathrm{b}}$ for a sample of events categorised according to
$N_{\mathrm{jet}}$ and $H_{\mathrm{T}}$. First, simulated events
are categorised according to the number of jets per event that
are matched to underlying b quarks ($N_{\mathrm{b}}^{\mathrm{gen}}$),
c quarks ($N_{\mathrm{c}}^{\mathrm{gen}}$), and light-flavoured quarks
or gluons ($N_{\mathrm{q}}^{\mathrm{gen}}$). Second, the efficiency
$\epsilon$ with which b quark jets are identified, and the misidentification
probabilities for c quarks and light-flavour partons, $f_{\mathrm{c}}$ and
$f_{\mathrm{q}}$, respectively, are also determined from simulation, with
each quantity averaged over jet $p_{\mathrm{T}}$ and η per event category.
Corrections to $\epsilon$, $f_{\mathrm{c}}$, and $f_{\mathrm{q}}$ are applied
on a jet-by-jet basis as a function of $p_{\mathrm{T}}$ and η so that they
match the corresponding quantities measured in data [71]. Finally,
$N_{\mathrm{b}}^{\mathrm{tag}}$, $N_{\mathrm{c}}^{\mathrm{tag}}$, and
$N_{\mathrm{q}}^{\mathrm{tag}}$ are, respectively, the number
of jets identified (“tagged”) as originating from b quarks per
event when the underlying parton is a b quark, c quark, or a
light-flavoured quark or gluon, and
$P(N_{\mathrm{b}}^{\mathrm{tag}}; N_{\mathrm{b}}^{\mathrm{gen}}, \epsilon)$,
$P(N_{\mathrm{c}}^{\mathrm{tag}}; N_{\mathrm{c}}^{\mathrm{gen}}, f_{\mathrm{c}})$,
and $P(N_{\mathrm{q}}^{\mathrm{tag}}; N_{\mathrm{q}}^{\mathrm{gen}}, f_{\mathrm{q}})$
are the binomial probabilities for this to happen.
These quantities are sufficient to estimate how events are
distributed according to $N_{\mathrm{b}}$ per ($N_{\mathrm{jet}}$, $H_{\mathrm{T}}$)
category when summing over all relevant combinations that satisfy the requirements
$N_{\mathrm{jet}} = N_{\mathrm{b}}^{\mathrm{gen}} + N_{\mathrm{c}}^{\mathrm{gen}} + N_{\mathrm{q}}^{\mathrm{gen}}$
and
$N_{\mathrm{b}} = N_{\mathrm{b}}^{\mathrm{tag}} + N_{\mathrm{c}}^{\mathrm{tag}} + N_{\mathrm{q}}^{\mathrm{tag}}$.
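The binomial sum described above can be sketched in a few lines of Python. This is only an illustration, not the analysis code: the function names, the dictionary layout of the simulated yields, and the numerical values of the tagging efficiency and misidentification probabilities are assumptions made for the example, and a single averaged value per flavour is used in place of the per-category, data-corrected quantities.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k tagged jets out of n, each tagged with probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def nb_distribution(n_gen_b, n_gen_c, n_gen_q, eff_b, f_c, f_q):
    """Return [P(N_b = 0), P(N_b = 1), ...] for one generator-level flavour composition."""
    n_jet = n_gen_b + n_gen_c + n_gen_q
    probs = [0.0] * (n_jet + 1)
    # Sum over all (N_b^tag, N_c^tag, N_q^tag) combinations; each contributes
    # the product of three binomial probabilities to N_b = sum of the tags.
    for tag_b in range(n_gen_b + 1):
        p_b = binom_pmf(tag_b, n_gen_b, eff_b)
        for tag_c in range(n_gen_c + 1):
            p_c = binom_pmf(tag_c, n_gen_c, f_c)
            for tag_q in range(n_gen_q + 1):
                p_q = binom_pmf(tag_q, n_gen_q, f_q)
                probs[tag_b + tag_c + tag_q] += p_b * p_c * p_q
    return probs


def predicted_yields(composition_yields, eff_b, f_c, f_q):
    """Predicted yield per N_b for one (N_jet, H_T) category.

    composition_yields maps (N_b^gen, N_c^gen, N_q^gen) to a simulated yield
    (hypothetical input layout chosen for this sketch).
    """
    max_jets = max(sum(key) for key in composition_yields)
    totals = [0.0] * (max_jets + 1)
    for (n_b, n_c, n_q), simulated_yield in composition_yields.items():
        dist = nb_distribution(n_b, n_c, n_q, eff_b, f_c, f_q)
        for n_tag, prob in enumerate(dist):
            totals[n_tag] += simulated_yield * prob
    return totals


# Illustrative numbers only (assumed, not taken from the paper).
yields = predicted_yields({(2, 0, 1): 10.0, (0, 1, 2): 90.0},
                          eff_b=0.65, f_c=0.12, f_q=0.01)
for n_b, value in enumerate(yields):
    print(f"N_b = {n_b}: {value:.3f}")
```

Because every simulated event contributes to all $N_{\mathrm{b}}$ bins with a probability weight, rather than to a single bin after a pass/fail tagging decision, the high-$N_{\mathrm{b}}$ predictions are less affected by the limited size of the simulated sample.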
The event yields determined with the method described above
are subsequently used to determine the transfer factors binned
according to $N_{\mathrm{b}}$ (in addition to $N_{\mathrm{jet}}$ and
$H_{\mathrm{T}}$). The uncertainties in the
transfer factors obtained from simulation are evaluated through
sets of closure tests based on events from the data control regions [28].
Each set uses the observed event counts in up to eleven
bins in $H_{\mathrm{T}}$ for a given sample of events, along with the
corresponding ($H_{\mathrm{T}}$-dependent) transfer factors obtained from
simulation, to determine $H_{\mathrm{T}}$-dependent predictions
$N_{\mathrm{pred}}(H_{\mathrm{T}})$ for yields
in another event sample. The two samples are taken from different
data control regions, or are subsets of the same data control
sample with differing requirements on $N_{\mathrm{jet}}$ or $N_{\mathrm{b}}$.
The predictions $N_{\mathrm{pred}}(H_{\mathrm{T}})$ are compared with the
$H_{\mathrm{T}}$-binned observed yields $N_{\mathrm{obs}}(H_{\mathrm{T}})$,
and the level of closure is defined by the deviation of
the ratio $(N_{\mathrm{obs}} - N_{\mathrm{pred}})/N_{\mathrm{pred}}$ from zero.
A large number of tests are performed to probe key aspects of the modelling
that may introduce an $N_{\mathrm{jet}}$- or $H_{\mathrm{T}}$-dependent source
of bias in the transfer factors [28].
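A single closure test can be illustrated with the short Python sketch below. It rests on assumptions that are not spelled out in this section: the transfer factor in each $H_{\mathrm{T}}$ bin is taken to be the ratio of simulated yields in the two samples, the statistical uncertainty comes only from Poisson fluctuations of the two observed counts, and all function names and input numbers are hypothetical.

```python
from math import sqrt


def closure_test(n_obs_a, n_mc_a, n_mc_b, n_obs_b):
    """Predict sample B from sample A in each H_T bin and compare with data.

    Every argument is a list with one entry per H_T bin. Returns a list of
    (n_pred, ratio, stat_err) tuples, where ratio = (N_obs - N_pred) / N_pred.
    """
    results = []
    for obs_a, mc_a, mc_b, obs_b in zip(n_obs_a, n_mc_a, n_mc_b, n_obs_b):
        # Assumed form of the transfer factor: ratio of simulated yields.
        transfer_factor = mc_b / mc_a
        n_pred = obs_a * transfer_factor
        ratio = (obs_b - n_pred) / n_pred
        # Assumed Poisson uncertainties on both observed counts, propagated
        # to N_obs / N_pred; simulation statistics are neglected here.
        stat_err = (obs_b / n_pred) * sqrt(1.0 / obs_b + 1.0 / obs_a)
        results.append((n_pred, ratio, stat_err))
    return results


# Hypothetical yields in three H_T bins.
for n_pred, ratio, err in closure_test(
    n_obs_a=[1000, 400, 100],
    n_mc_a=[980.0, 410.0, 95.0],
    n_mc_b=[490.0, 200.0, 50.0],
    n_obs_b=[510, 190, 55],
):
    print(f"N_pred = {n_pred:.1f}, closure = {ratio:+.3f} +/- {err:.3f}")
```

A ratio compatible with zero within its uncertainty in every bin indicates that the simulation-based transfer factor extrapolates correctly between the two samples.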
Systematic uncertainties are determined from core sets of closure
tests, the results of which are shown in Fig. 2. Five sets of
tests are performed independently for each of the two $N_{\mathrm{jet}}$
categories, and a further three sets are common to both $N_{\mathrm{jet}}$
categories. The tests aim to probe for the presence of statistically
significant biases that could arise due to limitations in the method.
For each $N_{\mathrm{jet}}$ category, the first three sets of closure tests
are performed using the μ + jets sample. The first set probes the
modelling of the $\alpha_{\mathrm{T}}$ distribution for events containing genuine
$p_{\mathrm{T}}^{\mathrm{miss}}$ from neutrinos (open circle markers). Two sets
(crosses, squares) probe the relative composition between W + jets and top events
and the modelling of the reconstruction of b quark jets. The fourth
set (triangles) validates the modelling of vector boson production
by connecting the μ + jets and μμ + jets control samples, which
are enriched in W + jets and Z + jets events, respectively. The
fifth set (Swiss crosses) deals with the consistency between the
γ + jets and μμ + jets samples, which are both used to provide
an estimate of the Z → νν + jets background. Three further sets
of closure tests (stars, inverted triangles, diamonds), one per data
control sample, probe the simulation modelling of the $N_{\mathrm{jet}}$
distribution for a range of background compositions.

Fig. 2. Ratio $(N_{\mathrm{obs}} - N_{\mathrm{pred}})/N_{\mathrm{pred}}$ as a function of $H_{\mathrm{T}}$ for different event categories
and/or control regions for (upper) events with two or three jets, and (lower) events
with four or more jets; “b tag” refers to a reconstructed b quark candidate. Error
bars represent statistical uncertainties only, while the grey shaded bands represent
the $N_{\mathrm{jet}}$- and $H_{\mathrm{T}}$-dependent uncertainties assumed in the transfer factors, as
determined from the procedure described in the text.

The closure tests reveal no significant biases or dependency on
$N_{\mathrm{jet}}$ or $H_{\mathrm{T}}$. Systematic uncertainties in the transfer
factors are determined from the variance in
$(N_{\mathrm{obs}} - N_{\mathrm{pred}})/N_{\mathrm{pred}}$, weighted
to account for statistical uncertainties, for all closure tests within
an individual $H_{\mathrm{T}}$ bin in the range $200 < H_{\mathrm{T}} < 375$ GeV and for
each $N_{\mathrm{jet}}$ category. For the region $H_{\mathrm{T}} > 375$ GeV, all tests within
200 GeV-wide intervals in $H_{\mathrm{T}}$, defined by pairs of adjacent bins,
are combined to determine the systematic uncertainty, which is
assumed to be fully correlated for bins within each interval, and fully
uncorrelated for different $H_{\mathrm{T}}$ intervals and $N_{\mathrm{jet}}$ categories. The
magnitudes of the systematic uncertainties are indicated by shaded
grey bands in Fig. 2 and summarised in Table 2. The same
(uncorrelated) value of systematic uncertainty is assumed for each $N_{\mathrm{b}}$
category. An independent study is performed to assess the effect
of uncertainties in the simulation modelling of the efficiency and