‘missing heads’ [10]. We favor the term ‘truncated’ over the term ‘incomplete’ because the
latter is often used for the concept of ‘event log incompleteness’, which refers to the fact
that an event log will most likely not contain all the possible combinations of behaviors
because there are too many of them [12]. For instance, when there is a loop in the
process model, the number of unique combinations is infinite. Event logs are thus most
likely incomplete in this sense, even when they contain no truncated traces.
There are several reasons for the existence of truncated traces. They might
exist because of a flawed event log extraction process that cuts the traces at a fixed date,
leaving truncated the traces that finish after that date. This issue, named ‘the snapshots
challenge’, has been identified by van der Aalst as one of the five challenges that occur
when extracting event logs [6, chapter 5.3]. This type of truncated trace could be
avoided by extracting only the traces in which no event happens after the extraction date.
However, once the data is extracted, we cannot know which traces are truncated. As
another example, truncated traces can exist because the remaining events have not happened
yet. This is especially relevant when working with streaming data. Finally, truncated traces
can result from a wrong execution (e.g., a ticket was supposed to be closed but the
agent forgot to do it) or from a failure of the information system. In the next section, we
introduce a classifier to automatically detect truncated traces.
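The extraction-time filter described above can be sketched as follows. This is a minimal illustration, not the extraction procedure of any particular tool. It assumes that events are available as `(case_id, activity, timestamp)` tuples and that events recorded after the cutoff are still visible at extraction time. The function name `traces_without_later_events` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def traces_without_later_events(events, cutoff):
    """Keep only the cases in which no event happens after `cutoff`.

    `events` is an iterable of (case_id, activity, timestamp) tuples
    (an assumed layout, not one prescribed by the text).
    Returns a dict mapping case_id to its time-ordered list of activities.
    """
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    kept = {}
    for case_id, case_events in by_case.items():
        # A case with any event after the cutoff would be cut by the snapshot,
        # so it is dropped entirely instead of being kept in truncated form.
        if all(timestamp <= cutoff for timestamp, _ in case_events):
            kept[case_id] = [activity for _, activity in sorted(case_events)]
    return kept


# Hypothetical sample data: case c2 continues after the cutoff.
events = [
    ("c1", "open", datetime(2021, 1, 5)),
    ("c1", "close", datetime(2021, 1, 9)),
    ("c2", "open", datetime(2021, 1, 8)),
    ("c2", "close", datetime(2021, 1, 14)),
]
kept = traces_without_later_events(events, cutoff=datetime(2021, 1, 10))
print(kept)
```

Such a filter only addresses traces that straddle the cutoff. It does not help with traces truncated for the other reasons listed above, nor with a log that has already been extracted.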
3 Truncated Trace Classifier
A TTC takes the current execution of a trace as input and predicts whether it is truncated.
As shown in Table 1, we generate one input sample and one target for each prefix length
of each trace. The input sample represents the current state of the process to which we
apply a TTC. The target is a binary label that is ‘true’ when the trace is truncated and
‘false’ otherwise.
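The sample generation can be illustrated with a short sketch. It assumes that every trace in the training log is complete, so that only the full-length prefix is labeled ‘false’ and every shorter prefix is labeled ‘true’. That labeling rule, the function names `prefix_samples` and `build_dataset`, and the toy log are assumptions made for illustration and do not reproduce Table 1.

```python
from typing import List, Tuple

Trace = List[str]
Sample = Tuple[Trace, bool]


def prefix_samples(trace: Trace) -> List[Sample]:
    """Return one (prefix, is_truncated) pair per prefix length of `trace`."""
    samples = []
    for length in range(1, len(trace) + 1):
        prefix = trace[:length]
        # Assumption: the trace is complete, so a prefix is truncated
        # exactly when it is shorter than the whole trace.
        is_truncated = length < len(trace)
        samples.append((prefix, is_truncated))
    return samples


def build_dataset(log: List[Trace]) -> List[Sample]:
    """Concatenate the prefix samples of every trace in the log."""
    return [sample for trace in log for sample in prefix_samples(trace)]


# Toy log with two traces (hypothetical activity names).
log = [["a", "b", "c"], ["a", "c"]]
dataset = build_dataset(log)
for prefix, target in dataset:
    print(prefix, target)
```

A trace with n events yields n samples, of which n - 1 carry the target ‘true’, so the two classes are usually imbalanced in a dataset built this way.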