Fuzzy Jets: Applications and Advantages of an Infrared-Safe Clustering Algorithm in High Energy Physics Experiments


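Fuzzy jets are a new class of clustering algorithm for jet reconstruction in high energy physics experiments, in particular at the Large Hadron Collider (LHC). Traditional jet clustering usually relies on sequential recombination, a hierarchical agglomerative strategy. Fuzzy jets instead build on mixture models that are made infrared and collinear (IRC) safe. The mixture models are fitted by maximum likelihood, so jet properties such as size adapt dynamically to the data and carry additional information.

Unlike agglomerative clustering, fuzzy jets do not partition the particles with hard boundaries: jet membership is soft ("fuzzy"), which makes the jet definition more flexible. This fuzziness is quantified and used as additional information. In analyses of boosted topologies in particular, it complements existing jet tagging variables. The study also examines pileup, that is, conditions with a high multiplicity of simultaneous interactions and therefore dense particle environments. With small modifications to the algorithm, fuzzy jets remain stable under these conditions while retaining their clustering performance, which matters for interpreting experimental results, especially in searches for new physics.

The paper "Fuzzy Jets" was written by Lester Mackey, Benjamin Nachman, Ariel Schwartzman and Conrad Stansbury. It was published in June 2016 in the Journal of High Energy Physics (JHEP06) and is available open access through Springer. Beyond high energy physics data analysis, the approach may also be useful for clustering problems in other fields that involve large, complex data sets.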

JHEP06(2016)010

every particle can belong to every jet with some probability. This can be seen explicitly in figure 1 where the densities of all three clusters are everywhere nonzero, so $q_{ij} > 0$ for all $j$. The idea of probabilistic membership was recently studied in the context of the Q-jets algorithm [18] in which the same event is interpreted many times by injecting randomness into the clustering procedure. Unlike Q-jets, fuzzy jets allocates the soft membership functions deterministically throughout the clustering procedure. However, like Q-jets, there is an ambiguity in how to assign kinematic properties to the clustered jets. Fuzzy jets are defined by their shape (and location), not their constituents. This is in contrast to anti-$k_t$ jets, which are defined by their constituents without an explicit shape determined from the clustering procedure. One simple assignment scheme is to define the momentum of a fuzzy jet $j$ as

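$$p_{\text{jet } j} = \sum_{i=1}^{n} p_i \begin{cases} 1 & j = \operatorname{argmax}_k \, q_{ik} \\ 0 & \text{else} \end{cases} \,. \qquad (2.2)$$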

In other words, this procedure assigns every particle to its most probable associated jet. This scheme will be known as the hard maximum likelihood (HML) scheme, but is not the only possible assignment algorithm. The dual problem in sequential recombination is the jet area, which must be defined [19], whereas the jet kinematics are the 'natural' coordinates.
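The HML scheme of eq. (2.2) can be illustrated with a minimal Python sketch. The function name `hml_jet_momenta`, the `(px, py, pz, E)` tuple layout, and the example numbers are assumptions made for illustration and do not come from the paper. Ties in the argmax are broken in favor of the lowest jet index, which is also a choice of this sketch.

```python
def hml_jet_momenta(momenta, q):
    """Sum particle four-momenta into jets with hard maximum likelihood.

    momenta: list of (px, py, pz, E) tuples, one per particle (assumed layout).
    q: membership matrix, q[i][j] = probability that particle i belongs to jet j.
    """
    n_jets = len(q[0])
    jets = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_jets)]
    for p, row in zip(momenta, q):
        # argmax_k q_ik: index of the most probable jet for this particle.
        best = max(range(n_jets), key=row.__getitem__)
        for c in range(4):
            jets[best][c] += p[c]
    return [tuple(j) for j in jets]


# Hypothetical event with three particles and two fuzzy jets.
momenta = [(10.0, 0.0, 0.0, 10.0), (0.0, 5.0, 0.0, 5.0), (1.0, 1.0, 0.0, 2.0)]
q = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
jets = hml_jet_momenta(momenta, q)
```

Here the third particle has a 40% membership in the second jet, but HML still gives its full momentum to the first jet. That discarded soft information is what other assignment schemes could use differently.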

We now specialize the likelihood in eq. (2.1) to the case of clustering particles into jets at a collider like the LHC. Consider a mixture model in two dimensions with $x_i = \vec{\rho}_i$. The resulting mixture model (MM) jets are inherently not IR safe: particle $p_T$ does not appear in the likelihood and therefore arbitrarily low energy particles can influence the clustering procedure. Therefore, we add a modification to the log likelihood:

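$$\log L(\{p_{T,i}, \vec{\rho}_i\} \mid \theta) = \sum_{i=1}^{n} p_{T,i}^{\alpha} \, \log\left( \sum_{j=1}^{k} \pi_j \, f(\vec{\rho}_i \mid \theta_j) \right), \qquad (2.3)$$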

where $\alpha$ is a weighting factor. Equation (2.3) is the log of eq. (2.1) with the term $p_{T,i}^{\alpha}$ inserted in the outer sum. For $\alpha > 0$, the resulting modified mixture model (mMM) jets are IR safe, and when $\alpha = 1$, the jets are C safe. Therefore, for $\alpha = 1$, the jets are IRC safe. Different choices of component densities $f$ in eq. (2.3) give rise to different IRC safe MM jet algorithms. We have studied several possibilities for $f$, but for the remainder of this paper will specialize to (wrapped) Gaussian $f = \Phi$. The resulting fuzzy jets are called modified Gaussian Mixture Model jets (mGMM) and are parameterized by the locations $\mu_j$, the covariance matrices $\Sigma_j$, and the cluster weights $\pi_j$. We initialize $\pi_j = 1/k$ and $\Sigma_j = I$.
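The safety statements above can be checked numerically with a short Python sketch of eq. (2.3). It rests on one simplifying assumption: an ordinary 2D Gaussian stands in for the wrapped Gaussian, so the periodicity of $\phi$ is ignored. The function names and all numerical values are hypothetical.

```python
import math


def gaussian_2d(x, mu, cov):
    """Density of an ordinary (unwrapped) 2D Gaussian; cov = ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def modified_log_likelihood(pt, rho, pi, mu, cov, alpha=1.0):
    """Eq. (2.3): sum_i pt_i^alpha * log(sum_j pi_j * Phi(rho_i | mu_j, cov_j))."""
    total = 0.0
    for pt_i, rho_i in zip(pt, rho):
        mixture = sum(w * gaussian_2d(rho_i, m, c) for w, m, c in zip(pi, mu, cov))
        total += pt_i ** alpha * math.log(mixture)
    return total


identity = ((1.0, 0.0), (0.0, 1.0))
pi = [0.5, 0.5]                    # pi_j = 1/k with k = 2
mu = [(0.0, 0.0), (1.0, 2.0)]      # hypothetical jet locations
cov = [identity, identity]         # Sigma_j = I

base = modified_log_likelihood([50.0, 30.0], [(0.0, 0.0), (1.0, 2.0)], pi, mu, cov)

# Collinear split: the first particle becomes two halves at the same position.
split = modified_log_likelihood(
    [25.0, 25.0, 30.0], [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0)], pi, mu, cov)

# Soft emission: one additional particle with negligible transverse momentum.
soft = modified_log_likelihood(
    [50.0, 30.0, 1e-9], [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)], pi, mu, cov)
```

With `alpha=1.0` the values of `base` and `split` agree, because the weight $p_T^{\alpha}$ is linear in $p_T$, and `soft` differs from `base` only by a term proportional to the soft particle's $p_T$. Repeating the split comparison with `alpha=2.0` gives different values, in line with the statement that collinear safety requires $\alpha = 1$.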

Footnote: Soft assignments for jets during clustering were studied in the context of the "optimal jet finder" [17], which maximizes a function of the soft assignments.

Footnote: One must take care in selecting a class of densities appropriate for the angular quantity $\phi$. For more details on the wrapped Gaussian distribution and motivation for its use in this context, see appendix A.

Footnote: When $f$ is a circular step function, the algorithm is related to the Snowmass iterative cone algorithm [20] via the 'Snowmass Potential' [21].

Since practical procedures for maximizing the modified likelihood in eq. (2.3) may converge to stationary points that are not globally optimal, the output of a fuzzy jet algorithm will depend on an initial setting of the cluster parameters $\theta$ and $\pi$. One simple procedure, used exclusively for the rest of the paper, is to seed fuzzy jets based on the output of a sequential recombination jet algorithm. This guarantees an IRC safe initial condition and therefore the entire procedure is IRC safe. We now discuss practically how one can find the maximum of the fuzzy jets likelihood.
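As a rough illustration of the seeding step, the sketch below builds initial mGMM parameters from a list of seed jet axes. The text only states that the seeds come from a sequential recombination algorithm, so two things here are assumptions of this sketch: that the seed jet axes are used directly as the initial locations, and that the number of seed jets fixes $k$. The function name `init_mgmm` and the dictionary layout are hypothetical. The weights and covariances follow the initialization $\pi_j = 1/k$ and $\Sigma_j = I$ given in the text.

```python
def init_mgmm(seed_axes):
    """Initial mGMM parameters from seed jet axes.

    seed_axes: list of (y, phi) positions of jets found by some sequential
    recombination algorithm (computed elsewhere; not implemented here).
    """
    k = len(seed_axes)
    identity = ((1.0, 0.0), (0.0, 1.0))
    return {
        "pi": [1.0 / k] * k,                         # pi_j = 1/k
        "mu": [tuple(axis) for axis in seed_axes],   # assumed: mu_j = seed axis
        "cov": [identity] * k,                       # Sigma_j = I
    }


# Hypothetical axes of three seed jets.
params = init_mgmm([(0.3, 1.2), (-1.1, -2.0), (2.0, 0.4)])
```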

3 Clustering fuzzy jets: the EM algorithm

One iterative procedure for maximizing the mixture model likelihood in eq. (2.1) is the Expectation-Maximization (EM) algorithm [22–24]. After initializing the cluster locations and prior density $\pi$, the following two steps are repeated:

Expectation: Given the current values of $\theta_j$, compute the fuzzy membership probabilities

$$q_{ij} = \frac{\pi_j \, \Phi(\vec{\rho}_i \mid \mu_j, \Sigma_j)}{\sum_{j'=1}^{k} \pi_{j'} \, \Phi(\vec{\rho}_i \mid \mu_{j'}, \Sigma_{j'})}$$

Maximization: Given $q_{ij}$, maximize the expected modified complete log likelihood over the parameters $\pi$, $\mu$, $\Sigma$.
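The expectation step can be sketched in a few lines of Python. As a simplifying assumption, an ordinary 2D Gaussian replaces the wrapped Gaussian, so the periodicity of $\phi$ is ignored. The function names and the example values are hypothetical.

```python
import math


def gaussian_2d(x, mu, cov):
    """Density of an ordinary (unwrapped) 2D Gaussian; cov = ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def e_step(rho, pi, mu, cov):
    """Membership probabilities q[i][j] for particle positions rho."""
    q = []
    for rho_i in rho:
        weighted = [w * gaussian_2d(rho_i, m, c) for w, m, c in zip(pi, mu, cov)]
        norm = sum(weighted)  # normalizes the row so that sum_j q_ij = 1
        q.append([w / norm for w in weighted])
    return q


identity = ((1.0, 0.0), (0.0, 1.0))
rho = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]   # hypothetical particle positions
q = e_step(rho, pi=[0.5, 0.5], mu=[(0.0, 0.0), (2.0, 0.0)], cov=[identity, identity])
```

The third particle sits halfway between the two jet locations and is shared equally between them, while the first two particles lean strongly toward the jet they sit on. The transverse momenta do not enter this step. A production implementation would likely work with log densities to avoid a zero normalization for particles far from every jet; this sketch does not.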

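The expected modified complete log likelihood has the form

$$\sum_{i=1}^{n} \sum_{j=1}^{k} p_{T,i}^{\alpha} \left( q_{ij} \log \Phi(\vec{\rho}_i ; \vec{\mu}_j, \Sigma_j) + q_{ij} \log \pi_j \right). \qquad (3.1)$$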

Note that the expected modified complete log likelihood is not the same as the expected modified log likelihood, shown in eq. (2.3). They differ in that the complete log likelihood has the second sum outside the logarithm while eq. (2.3) has the sum inside the logarithm. The power of the EM algorithm is that maximizing the complete log likelihood yields a fixed point iteration that monotonically improves the original log likelihood. This desirable property of the EM algorithm is still true when $\alpha > 0$; for a proof, see appendix B. Many choices for $f$ have closed form maxima for the M step; in the Gaussian $f = \Phi$ case outlined above, the updates are given by

$$\mu_j^{*} = \sum_{i=1}^{n} \tilde{q}_{ij} \, x_i, \qquad \Sigma_j^{*} = \sum_{i=1}^{n} \tilde{q}_{ij} \, (x_i - \mu_j^{*})(x_i - \mu_j^{*})^{T}, \qquad \pi_j^{*} = \frac{1}{\sum_{i=1}^{n} p_{T,i}^{\alpha}} \sum_{i=1}^{n} p_{T,i}^{\alpha} \, q_{ij}, \qquad (3.2)$$

where $\tilde{q}_{ij} = q_{ij} \, p_{T,i}^{\alpha} / \sum_{l=1}^{n} q_{lj} \, p_{T,l}^{\alpha}$. The well-known k-means clustering algorithm [25] can be recovered as the limit of expectation-maximization in a Gaussian mixture model with $\Sigma = \sigma^2 I$, $\sigma^2 \to 0$. Figure 2 illustrates GMM clustering using the EM algorithm with $k = 2$ clusters. The EM algorithm readily accommodates constraints on the model parameters. One constraint we will consider throughout the rest of the paper is $\Sigma_j = \sigma_j^2 I$ for all $j$, which requires the curves of constant likelihood in $(y, \phi)$ to be circular. We will see in the next section that the learned value of $\sigma_j$ is useful for distinguishing jets originating from different physics processes. Note that since the modified complete log likelihood is IRC safe, the EM algorithm does not break the IRC safety of the original log likelihood.
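The maximization step for the unconstrained Gaussian case can be sketched as follows, taking the memberships $q_{ij}$ as given. The sketch uses the $p_T^{\alpha}$-weighted, per-jet normalized memberships for the location and covariance updates and the $p_T^{\alpha}$-weighted average membership for the cluster weight. It treats positions as plain 2D vectors (no wrapping in $\phi$) and does not impose the circular constraint on $\Sigma_j$; both simplifications, the function name `m_step`, and the example values are assumptions of this sketch.

```python
def m_step(pt, rho, q, alpha=1.0):
    """Closed-form Gaussian M step with pT^alpha weights.

    pt: transverse momenta; rho: list of (y, phi) positions;
    q: memberships q[i][j] whose rows each sum to 1.
    Returns (pi, mu, cov) with cov[j] = ((sxx, sxy), (sxy, syy)).
    """
    n, k = len(rho), len(q[0])
    w = [p ** alpha for p in pt]                  # pT_i^alpha
    w_total = sum(w)
    pi_new, mu_new, cov_new = [], [], []
    for j in range(k):
        wq = [w[i] * q[i][j] for i in range(n)]   # pT_i^alpha * q_ij
        norm = sum(wq)
        qt = [v / norm for v in wq]               # normalized weights for jet j
        mx = sum(t * r[0] for t, r in zip(qt, rho))
        my = sum(t * r[1] for t, r in zip(qt, rho))
        # Weighted covariance about the updated mean.
        sxx = sum(t * (r[0] - mx) ** 2 for t, r in zip(qt, rho))
        sxy = sum(t * (r[0] - mx) * (r[1] - my) for t, r in zip(qt, rho))
        syy = sum(t * (r[1] - my) ** 2 for t, r in zip(qt, rho))
        pi_new.append(norm / w_total)
        mu_new.append((mx, my))
        cov_new.append(((sxx, sxy), (sxy, syy)))
    return pi_new, mu_new, cov_new


# Hypothetical inputs: four particles, two jets, soft memberships.
pt = [3.0, 1.0, 2.0, 2.0]
rho = [(0.0, 0.0), (1.0, 0.0), (4.0, 1.0), (4.0, 3.0)]
q = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
pi_new, mu_new, cov_new = m_step(pt, rho, q)
```

Because the weights include $p_T^{\alpha}$, a hard particle pulls a jet's location toward itself more strongly than a soft particle with the same membership. If all memberships for a jet collapse onto a single point, the returned covariance is singular, which a fuller implementation would need to guard against.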
