and found that ARM CPUs are not vulnerable to the attacks
described in this paper.
IBM.
Finally, we also notified IBM security about the
findings reported in this work. IBM responded that none of
its CPUs, including System Z and POWER, are affected.
The RIDL Attack.
In a concurrent independent work,¹ the
RIDL attack [56] analyzes additional buffers present inside
Intel CPUs, with specific attention to the Line Fill Buffer
(LFB) and load ports. There, they show that faulty loads from
the LFB or load ports leak information across various security
domains. We note, however, that Fallout is different from (and
complementary to) RIDL, because the two attacks exploit
different microarchitectural elements (the LFB and load ports for
RIDL; the Store Buffer and WTF optimization for Fallout). In
particular, RIDL can be used to recover values recently placed
in the LFB, whereas Fallout allows the attacker to recover the
value of a specific attacker-chosen write in the store buffer.
2 Background
In this section, we provide the background required to under-
stand our attack, including a description of caches and cache
attacks, transient execution attacks, and Intel Transactional
Synchronization Extensions.
2.1 Caches and Cache Attacks
Caches are an essential part of modern processors. They are
small and fast memories where the CPU stores copies of
data from the main memory to hide the main memory access
latency. Modern CPUs have a variety of different caches and
buffers for various purposes. The main cache hierarchy is the
instruction and data cache hierarchy consisting of multiple
levels, which vary in size and latency. The L1 is the smallest
and fastest cache. The L3 cache, also called the last-level
cache (LLC), is typically the largest and slowest.
Cache Organization.
Modern caches are typically set-
associative, i.e., a cache line is stored in a fixed set, as deter-
mined by part of its virtual or physical address. Addresses
that map to the same set are called congruent. On modern
processors, the last-level cache is typically physically indexed
and shared across cores. It is also often inclusive of L1 and L2,
which means that all data stored in L1 and L2 is also stored in
the last-level cache. The cache hierarchy thus exposes the
latency difference between a main memory access (cache miss)
and a cache access (cache hit), i.e., exactly the latency
difference that caches introduce. This difference can be
exploited in side channels against a non-colluding victim, or in
covert channels, where sender and receiver collude to transmit
information.
¹ Both teams made contact on May 7th, provided each other with an
overview of their findings, and coordinated public disclosure as well as
communication with Intel. For a complete timeline describing the flow of
information related to this disclosure, see mdsattacks.com.
Cache Attacks.
Different cache attack techniques have
been proposed in the past, such as Prime+Probe [45, 47] and
Flush+Reload [58]. Flush+Reload and its variants [17,
19, 36, 60] work on shared memory at a cache-line granularity.
The attacker repeatedly flushes a cache line and measures
how long it takes to reload it. The reload time will be high
unless another process has accessed the cache line in the
meantime, bringing it back into the cache. In contrast,
Prime+Probe attacks work
without shared memory, and only at a cache-set granularity.
The attacker repeatedly accesses a set of congruent memory
addresses, filling an entire cache set with its own cache lines,
and measures how long that takes. As this is repeated in a loop,
the cache set is always filled with the attacker’s cache lines.
Hence the access time will always be rather low. However,
if another process accesses a memory location in the same
cache set, it will evict one of the attacker’s cache lines and
the access time will increase.
Cache attacks have been used to break cryptographic
implementations [11, 12, 38, 45, 47, 58, 59], infer user in-
put [19,36,48], and break system-level security [18,24]. Both
Prime+Probe and Flush+Reload have also been used in high-
performance covert channels [17, 38, 42], also as a building
block of transient execution attacks such as Meltdown [37],
Spectre [32], and Foreshadow [55, 57] that we detail below.
2.2 Superscalar Processors
To achieve their high performance, modern processors are
often superscalar, that is, they perform multiple operations
in parallel. In current implementations, e.g., in modern Intel
processors (see Fig. 1), execution of a program is divided
between two main parts: the frontend and the execution engine.
The frontend is responsible for processing the machine-code
instructions of the program, decoding them into a stream of
micro-ops (µOPs) that are sent to the execution engine for
execution.
Out-of-order Execution.
The execution engine consists
of multiple execution units, which can execute various µOPs.
To allow superscalar execution, the execution engine follows
a variant of Tomasulo’s algorithm [54], which executes µOPs
when the data they depend on is available, rather than following
strict program order. Once executed, the µOPs arrive at the
reorder buffer, whose purpose is to retire µOPs in program
order, ensuring that the architecturally-visible effects of µOPs
take effect in the order the programmer specified.
Speculative Execution.
The stream of µOPs that the
frontend generates does not necessarily correspond to the
sequence of instructions in the program. A major cause of
deviation is branch prediction. When the frontend reaches a
branch instruction, it often does not yet know where execution
will proceed. Instead of waiting, the frontend attempts to pre-
dict the outcome of the branch and proceed from there. In the
case that the prediction is correct, the generated µOPs match
the program and can be processed. Otherwise, at some later
stage, the processor notices the misprediction. The frontend