The number of fake or non-prompt leptons satisfying the tight criteria can then be
calculated by inverting the matrix defined by the two equations:
$$N^{l} = N^{l}_{r} + N^{l}_{f}, \qquad N^{t} = \varepsilon_{r} N^{l}_{r} + \varepsilon_{f} N^{l}_{f},$$
where $N^{l}$ ($N^{t}$) is the number of events in data satisfying the loose (tight) lepton selection,
$N^{l}_{r}$ ($N^{l}_{f}$) is the number of events with a real prompt (fake or non-prompt) lepton in the
loose lepton sample, and $\varepsilon_{r}$ ($\varepsilon_{f}$) is the efficiency for these events to fulfil the tight lepton
selection. By generalizing the resulting formula to extract $\varepsilon_{f} N^{l}_{f}$, a weight is assigned to
each event selected in the loose lepton data sample, providing a prediction for both the
yields and the kinematic distributions of the fake and non-prompt lepton background.
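As an illustration, the inversion and the per-event weighting can be sketched in Python. This is a minimal sketch, not the analysis code: the function names, the constant efficiencies and the event counts are hypothetical, and the per-event weight shown is the standard form obtained by solving the two equations for the fake yield in the tight sample, with an indicator equal to 1 for events that also pass the tight selection. In practice the efficiencies would typically depend on the lepton and event kinematics.

```python
import numpy as np


def solve_matrix_method(n_loose, n_tight, eff_real, eff_fake):
    """Invert the 2x2 system and return (N^l_r, N^l_f)."""
    matrix = np.array([[1.0, 1.0],
                       [eff_real, eff_fake]])
    counts = np.array([n_loose, n_tight], dtype=float)
    n_real_loose, n_fake_loose = np.linalg.solve(matrix, counts)
    return n_real_loose, n_fake_loose


def fake_weight(is_tight, eff_real, eff_fake):
    """Weight for each loose event; the sum over events gives eff_fake * N^l_f."""
    is_tight = np.asarray(is_tight, dtype=float)
    return eff_fake / (eff_real - eff_fake) * (eff_real - is_tight)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
n_loose, n_tight = 1000, 700
eff_real, eff_fake = 0.9, 0.2

n_real_loose, n_fake_loose = solve_matrix_method(n_loose, n_tight, eff_real, eff_fake)
n_fake_tight = eff_fake * n_fake_loose

# One flag per loose event: True if the event also passes the tight selection.
is_tight = np.array([True] * n_tight + [False] * (n_loose - n_tight))
weights = fake_weight(is_tight, eff_real, eff_fake)

print(f"fake yield in tight sample (matrix inversion): {n_fake_tight:.3f}")
print(f"fake yield in tight sample (sum of weights):   {weights.sum():.3f}")
```

Because each loose event carries its own weight, the same weights can be used to fill any kinematic histogram, which is what gives a shape prediction in addition to the yield.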
When applying the matrix method in the case of high jet and b-tagged jet multiplicities,
the number of events in data satisfying the loose and tight lepton selections is significantly
reduced, leading to large fluctuations in the background predictions. In order to mitigate
this problem, instead of tagging the jets by applying the b-tagging algorithm, their
probabilities to be b-tagged are parameterized as a function of the jet $p_{\mathrm{T}}$. This allows all events
in the sample before b-tagging is applied to be used in predicting the normalization and
shape of the background from fake or non-prompt leptons after b-tagging. The tagging
probabilities are derived using an inclusive sample of fake or non-prompt leptons. The
resulting predictions of this background estimate are in agreement with those obtained by
applying the b-tagging algorithm directly, and have greatly reduced statistical uncertainties.
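One way to turn per-jet tagging probabilities into an event weight is sketched below in Python. The text does not give the functional form of the parameterization or the way the per-jet probabilities are combined, so everything here is an assumption: `toy_tag_prob` is a hypothetical placeholder for the measured probability as a function of jet pT, and the jets are treated as being tagged independently, so that the probability of exactly k tagged jets follows a Poisson-binomial distribution.

```python
from typing import Callable, List, Sequence


def tag_multiplicity_probs(jet_pts: Sequence[float],
                           tag_prob: Callable[[float], float]) -> List[float]:
    """Return [P(0 tags), P(1 tag), ..., P(n tags)] for an event with n jets.

    Assumes each jet is tagged independently with probability tag_prob(pt).
    """
    probs = [1.0]  # with no jets processed, P(0 tags) = 1
    for pt in jet_pts:
        p = tag_prob(pt)
        updated = [0.0] * (len(probs) + 1)
        for k, q in enumerate(probs):
            updated[k] += q * (1.0 - p)   # this jet is not tagged
            updated[k + 1] += q * p       # this jet is tagged
        probs = updated
    return probs


def toy_tag_prob(pt_gev: float) -> float:
    """Hypothetical tagging probability versus jet pT (not a measured curve)."""
    return min(0.05 + 0.001 * pt_gev, 0.5)


# Example event with three jets (pT in GeV, illustrative values).
probs = tag_multiplicity_probs([50.0, 100.0, 150.0], toy_tag_prob)

# Weight with which this pre-tag event would enter a region requiring
# exactly three b-tagged jets, instead of being accepted or rejected outright.
weight_3b = probs[3]
print(probs, weight_3b)
```

Every pre-tag event contributes to every b-tag multiplicity with a fractional weight, which is why the statistical uncertainty of the prediction is much smaller than when events are kept or discarded by the tagger.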
In the dilepton channel, the background contribution from fake or non-prompt leptons
is very small and is estimated from simulation and normalized to data in a control region
with two same-charge leptons.
5 Event categorization
Events satisfying the object selection are categorized into analysis regions according to the
number of leptons, jets and b-tagged jets. The regions enhanced in signal H → aa → 4b
events relative to the backgrounds are referred to as signal regions (SRs). The other regions,
referred to as control regions (CRs), are used to constrain the background predictions and
related systematic uncertainties (see section 7) through a profile likelihood fit to the data
(see section 8). The signal and backgrounds are derived consistently in the signal and
control regions in a combined fit. The discrimination of signal from background is further
enhanced in the signal regions by using multivariate techniques, as described in section 6.
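The categorization by lepton, jet and b-tagged jet multiplicity can be sketched as a simple lookup, using the region labels defined later in this section. The sketch is illustrative only: the function and table names are hypothetical, a multiplicity without "≥" is assumed to mean an exact count, and only the regions explicitly listed in this section for single-lepton and same-flavour dilepton events are encoded.

```python
from typing import Optional, Tuple

# (label, type, n_leptons, (jet_min, jet_max), (btag_min, btag_max));
# None as an upper bound means "at least".
REGIONS = [
    ("(1l, 3j, 3b)",     "SR", 1, (3, 3),    (3, 3)),
    ("(1l, 4j, 3b)",     "SR", 1, (4, 4),    (3, 3)),
    ("(1l, 4j, 4b)",     "SR", 1, (4, 4),    (4, 4)),
    ("(2l, 3j, 3b)",     "SR", 2, (3, 3),    (3, 3)),
    ("(2l, >=4j, 3b)",   "SR", 2, (4, None), (3, 3)),
    ("(2l, >=4j, >=4b)", "SR", 2, (4, None), (4, None)),
    ("(1l, 3j, 2b)",     "CR", 1, (3, 3),    (2, 2)),
    ("(1l, 4j, 2b)",     "CR", 1, (4, 4),    (2, 2)),
    ("(1l, >=5j, 3b)",   "CR", 1, (5, None), (3, 3)),
    ("(1l, >=5j, >=4b)", "CR", 1, (5, None), (4, None)),
    ("(2l, >=3j, 2b)",   "CR", 2, (3, None), (2, 2)),
]


def _in_range(value: int, bounds: Tuple[int, Optional[int]]) -> bool:
    low, high = bounds
    return value >= low and (high is None or value <= high)


def classify(n_leptons: int, n_jets: int, n_btags: int,
             same_flavour: bool = True) -> Optional[Tuple[str, str]]:
    """Return (label, "SR" or "CR") for an event, or None if no region matches.

    The dilepton regions encoded here apply to same-flavour pairs only.
    """
    if n_leptons == 2 and not same_flavour:
        return None
    for label, kind, n_lep, jets, btags in REGIONS:
        if n_leptons == n_lep and _in_range(n_jets, jets) and _in_range(n_btags, btags):
            return label, kind
    return None


print(classify(1, 4, 4))   # single-lepton event with four jets, all b-tagged
print(classify(2, 6, 5))   # same-flavour dilepton event with six jets, five b-tagged
```

The encoded regions do not overlap, so the order of the table does not affect the result.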
The H → aa → 4b decay chain is expected to have multiple b-tagged jets, often three
or four, satisfying the object selection. The dominant background arises from $t\bar{t}$ events in
the single-lepton channel and Z + jets events in the dilepton channel, which can also have
different jet and b-tagged jet multiplicities or leptons of different flavour in the case of the
dilepton channel. The regions are referred to as ($n_{\ell}\ell$, $n_{j}$j, $n_{b}$b), indicating $n_{\ell}$ leptons,
$n_{j}$ selected jets and $n_{b}$ b-tagged jets. The SRs contain at least three b-tagged jets and are (1ℓ,
3j, 3b), (1ℓ, 4j, 3b) and (1ℓ, 4j, 4b) for single-lepton events, and (2ℓ, 3j, 3b), (2ℓ, ≥4j, 3b)
and (2ℓ, ≥4j, ≥4b) for same-flavour dilepton events. The CRs are (1ℓ, 3j, 2b), (1ℓ, 4j, 2b),
(1ℓ, ≥5j, 3b) and (1ℓ, ≥5j, ≥4b) for single-lepton events, (2ℓ, ≥3j, 2b) for same-flavour