Hardware Security and Trust: Design and Deployment of Integrated Circuits in a Threatened Environment
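Hardware Security and Trust: Design and Deployment of Integrated Circuits in a Threatened Environment is an in-depth treatment of hardware security, edited by Nicolas Sklavos, Ricardo Chaves, Giorgio Di Natale, and Francesco Regazzoni. Responding to a national-security concern that has grown over the past decade, the book covers security and trust issues in ASICs (application-specific integrated circuits), COTS (commercial off-the-shelf) components, FPGAs (field-programmable gate arrays), microprocessors/DSPs (digital signal processors), and embedded systems. Working from current research, the contributors discuss why the security and trustworthiness of the microelectronics that support modern infrastructure matter.

Beyond the basic concepts and techniques of hardware security, the book covers how to design and deploy circuits effectively in the presence of threats such as hardware Trojans. A hardware Trojan is a design flaw or function maliciously implanted in a chip that can be activated without the user's knowledge and can compromise the security of the system. Understanding and preventing these threats is essential to protecting modern electronic devices and systems from unauthorized access, tampering, and sabotage.

The book is aimed at researchers, engineers, security professionals, and policy makers. It offers a comprehensive reference framework that helps readers deepen their understanding of hardware security and advance the development and deployment of security measures. By examining current challenges and best practices, it helps readers respond to an increasingly complex threat environment and keep microelectronic devices reliable and secure in critical domains. According to the copyright notice, the book is protected by Springer International Publishing, with all rights reserved, including the rights of translation and reprinting.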
1 AES Datapaths on FPGAs: A State of the Art Analysis 7
Fig. 1.4 The SRL16 (previous Xilinx FPGAs) and SRL32 (current Xilinx FPGAs) LUT modes
typically not requiring any additional functional logic components. This specific
routing is performed when mapping, placing, and routing the structure onto the
FPGA. However, ShiftRows and InvShiftRows (used on encryption and decryption,
respectively) have opposite shifting directions. Thus the routing path of each
operation cannot be shared.
Performing the (Inv)ShiftRows operation through routing is often the preferred
choice in several proposed 128-bit datapaths such as Bulens et al. [2] and Liu et al.
[17]. However, this implies that a particular implementation can only handle one
ciphering mode. With this approach, two AES cores need to be deployed when
supporting encryption and decryption, as used in HELION Standard and HELION
Fast AES cores [13]. In order to support both encryption and decryption on a single
AES design, both routing options need to coexist. If properly designed, and given
the similarity of the remaining computations, only minimum multiplexing logic is
needed, as presented in Chaves et al. [4].
In smaller datapaths of 32-bit and 8-bit widths, performing the (Inv)ShiftRows
through routing is not viable, since the 16 bytes of the State are not available at
the same time. The predominant state-of-the-art solution for the (Inv)ShiftRows in
compact FPGA structures is to use addressable memory, as introduced in Chodowiec
and Gaj [5]. These authors show how a RAM can be used to temporarily store
the State matrix between rounds, and perform either the ShiftRows or InvShiftRows
by properly addressing the writing and reading operations of the consecutive 32-bit
columns, or 8-bit cells, of the State [8, 11]. The authors further optimize this byte
shift operation by eliminating the need to specify the writing address. This approach
is optimized on Xilinx FPGAs using particular LUTs. On these devices, several LUTs
have an operational mode called SRL32 (SRL16 in older versions). This mode allows
a single LUT to work as a 32-bit deep shift register with an addressable reading
port, resulting in improved resource usage efficiency, as depicted in Fig. 1.4. This
approach can be found in 32-bit [5, 20, 23] and 8-bit [6, 25] AES designs.
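
The addressing idea can be illustrated with a small software model. This is a minimal sketch, not the circuit of any cited design: it assumes the 16 State bytes are stored column by column (address = row + 4 × column), and the function names are hypothetical. Bytes are written in natural order, and only the read address sequence changes between ShiftRows and InvShiftRows.

```python
def read_addresses(inverse=False):
    """Read addresses that deliver the State bytes in (Inv)ShiftRows order.

    Assumption: bytes are stored column by column, address = row + 4 * column.
    ShiftRows rotates row r left by r positions; InvShiftRows rotates it right.
    """
    sign = -1 if inverse else 1
    return [row + 4 * ((col + sign * row) % 4)
            for col in range(4)
            for row in range(4)]


def shift_rows(state, inverse=False):
    """Apply (Inv)ShiftRows purely by choosing the order in which bytes are read."""
    return [state[address] for address in read_addresses(inverse)]


state = list(range(16))                          # byte i sits at address i
shifted = shift_rows(state)                      # encryption direction
restored = shift_rows(shifted, inverse=True)     # decryption direction
```

Because both directions differ only in the address sequence, a single memory can serve encryption and decryption, which is the property the memory-based approach relies on.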
8 J.C. Resende and R. Chaves
1.3.3 (Inv)SubBytes Implementations: Logic Versus Memory
Another major implementation differentiation in the state of the art is in the byte
substitution operation. These vary from a fine-grained implementation of the byte
substitution (Logic-based) [6, 14, 26], to more coarse grained ones using lookup
table (Memory-based) approaches [2, 17].
Logic-based structures implement the byte substitution operations by hard-wiring
their actual mathematical definition (Sect. 1.2.1) through logic components. If one
recalls Eq. (1.1), the SubBytes substitution requires five XOR operations for each bit,
but first the multiplicative inverse of the input byte, in the GF(2^8) finite field, needs
to be calculated. The problem with the multiplicative inverse is that there is no direct
function to calculate it. It can be calculated through the Extended Euclidean
Algorithm, but this solution is better suited for software than for hardware [7].
Another approach to computing this multiplicative inverse, more oriented to hardware
implementations, is to use Composite Fields [24, 26]. Within logic-based SubBytes
implementations, some subsets of Composite Fields are faster, more compact, or
allow for additional security features compared with other subsets [3, 18, 22, 26].
The logic-based solution for the InvSubBytes computation is similar to SubBytes,
but modifications are still needed.
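
The two steps of the definition (inverse, then per-bit XORs) can be made concrete in software. The sketch below assumes that Eq. (1.1), which is not reproduced in this excerpt, is the standard AES affine transformation with constant 0x63. It finds the inverse by brute-force search purely for illustration; it uses neither the Extended Euclidean Algorithm nor Composite Fields. All function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # a 9th bit appeared: reduce
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse by exhaustive search; 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def sub_byte(a):
    """SubBytes for one byte: inverse in GF(2^8), then the affine transformation.

    Each output bit XORs five bits of the inverse with one bit of the
    constant 0x63, i.e. five XOR operations per bit.
    """
    inv = gf_inverse(a)
    out = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8))
               ^ (0x63 >> i)) & 1
        out |= bit << i
    return out


# The resulting 256-entry table is what a memory-based SBox would store.
SBOX = [sub_byte(a) for a in range(256)]
```

The same table is what the memory-based approach precomputes, so the sketch also shows why the two styles are functionally interchangeable.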
Overall, logic-based SubBytes implementations are the most area efficient but also
the slowest approaches, when compared to memory-based solutions. In a memory-
based SubBytes, byte substitution is implemented using a 256-byte lookup SBox
table [5, 7, 19]. On FPGAs this can be implemented through the use of multiple
FPGA LUTs [2, 17], or even BRAMs [5, 10]. Memory-based approaches can lead
to faster circuits at the cost of memory blocks.
On ASIC technology, the decision of using either logic-based or memory-based
SubBytes should be carefully analyzed [15]. However, on FPGAs, the use of logic-
based implementations has been losing relevancy in comparison to the memory-based
counterpart, mainly due to technology improvements. On older or more economical
FPGAs, one FPGA LUT can only be configured as a 4-input arbitrary function, with
two LUTs per FPGA Slice. On more high end FPGAs, such as the Xilinx Virtex 5
and onwards technologies, each Slice contains four 6-input LUTs that can be easily
combined into a single 8-input lookup table (the exact specification of the AES SBox)
with a relatively low latency. If both SubBytes and InvSubBytes operations need to
be deployed, either a 9-bit lookup table needs to be considered, or two 8-bit lookup
tables multiplexed.
Another easily accessible solution is the use of embedded dual-port memory
blocks, BRAMs, that exist within the FPGA. These memory blocks can easily
store the 2 Kbit (256 × 8 bits) needed for each byte substitution operation.
Implementations that only allow for one ciphering mode often consider the use of
LUT-based SBoxes, for shorter clock latency (512 LUTs for 128-bit datapaths [2, 17]
and 32 LUTs for 8-bit datapaths [25]). Architectures that allow for both ciphering
modes often incorporate pipelined BRAM-based implementations, since they can
easily store all tables in their larger memories (8 BRAMs for 128-bit datapaths [10]
and two BRAMs for 32-bit datapaths [5]).
1.3.4 Implementing the MixColumns: Logic
After the SubBytes and ShiftRows operations, in the encryption mode, the
MixColumns operation is computed by performing a matrix multiplication in GF(2^8).
In this operation each 32-bit State column is multiplied by the left matrix of Eq. (1.2),
depicting the multiplication coefficients. Similarly to the SubBytes operation, the
MixColumns can also be implemented using logic or lookup tables.
In the MixColumns operation each byte is multiplied by a set of four constants
({03}, {02}, {01}, and {01} in the case of encryption). As described in Sect. 1.2.3,
the multiplication by 2, in GF(2^8), can be computed by shifting the input value once
to the left. If the resulting 9th bit is ‘1’, the entire result has to be bitwise XORed
(subtraction in GF(2^8)) with ‘0x11B’, in order to perform the modular reduction. The
multiplication by 3 can be achieved by adding the multiplications by 1 (the input value
itself) and by 2 (with the addition in GF(2^8) being performed by a bitwise XOR).
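
As a minimal sketch of these two multiplications (the function names `xtime` and `times3` are illustrative, not taken from the chapter):

```python
def xtime(a):
    """Multiply a byte by {02} in GF(2^8): shift left, then reduce if needed."""
    a <<= 1
    if a & 0x100:      # the 9th bit is '1'
        a ^= 0x11B     # XOR with the full 9-bit reduction constant
    return a


def times3(a):
    """Multiply a byte by {03}: ({02} x a) XOR ({01} x a)."""
    return xtime(a) ^ a
```

For example, `xtime(0x57)` gives `0xAE` without reduction, while `xtime(0xAE)` overflows into the 9th bit and reduces to `0x47`.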
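To conclude the MixColumns matrix multiplication, the multiplied values are
added in GF(2^8) by a XOR tree, as

$$
\begin{aligned}
r_{0i} &= 02 \times a_{0i} \oplus 03 \times a_{1i} \oplus 01 \times a_{2i} \oplus 01 \times a_{3i} \\
r_{1i} &= 01 \times a_{0i} \oplus 02 \times a_{1i} \oplus 03 \times a_{2i} \oplus 01 \times a_{3i} \\
r_{2i} &= 01 \times a_{0i} \oplus 01 \times a_{1i} \oplus 02 \times a_{2i} \oplus 03 \times a_{3i} \\
r_{3i} &= 03 \times a_{0i} \oplus 01 \times a_{1i} \oplus 01 \times a_{2i} \oplus 02 \times a_{3i}
\end{aligned}
\tag{1.3}
$$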
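
Eq. (1.3) translates directly into a few XORs per output byte. The following is a software sketch of one State column, with hypothetical function names; it models the arithmetic only, not the LUT mapping on an FPGA.

```python
def xtime(a):
    """Multiply a byte by {02} in GF(2^8) (shift, then conditional reduction)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mix_column(column):
    """Apply Eq. (1.3) to one 4-byte State column [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = column
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # r0 = 02.a0 + 03.a1 + 01.a2 + 01.a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # r1 = 01.a0 + 02.a1 + 03.a2 + 01.a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # r2 = 01.a0 + 01.a1 + 02.a2 + 03.a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # r3 = 03.a0 + 01.a1 + 01.a2 + 02.a3
    ]
```

A commonly used check vector is the column `[0xDB, 0x13, 0x53, 0x45]`, which maps to `[0x8E, 0x4D, 0xA1, 0xBC]`.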
Overall, in a logic-based MixColumns operation, the matrix coefficient
multiplications are relatively simple: each byte requires one 1-bit shift, one 8-bit
conditional XOR with the constant ‘0x1B’ to perform the modular reduction (computing
×02), and one 8-bit wide XOR to compute the addition (e.g., ×03 = ×02 ⊕ ×01).
Figure 1.5 illustrates the multiplication of the four coefficients, given one input byte.
Fig. 1.5 Circuit example for the GF(2^8) encryption multiplication
On a 128-bit datapath, the MixColumns requires a total of 128 7-input functions, or
256 6-input FPGA LUTs. On FPGAs this operation can be performed with relatively
low latency, in comparison with the SubBytes stage, as suggested by [2, 5, 10, 17].
On 8-bit datapaths, a single State byte is provided in each clock cycle. As such,
the resulting bytes cannot be completed in a single cycle, since each byte
resulting from the MixColumns operation depends on four State bytes. Given this, for
8-bit datapaths, registered accumulation can be used. One such approach was first
introduced by Hämäläinen et al. [12] for ASIC technology, and later adapted for
FPGA by Chu and Benaissa [6]. The resulting structure is depicted in Fig. 1.6.
In this design, the input byte is shifted and XORed in order to obtain the 4
coefficient multiplications ({03; 01; 01; 02}). The resulting values are then XORed
with zero in the first iteration and temporarily stored in four 8-bit registers. In the
following cycles, each new input byte undergoes the same transformations but is
XORed with the previously stored 4 bytes. After 4+1 cycles, one matrix multiplication
for one State column is performed. After 16+1 cycles, the entirety of the MixColumns
operation can be completed. The issue with this approach [6, 12] is that it requires a
32-bit parallel-to-serial converter, given the 8-bit datapath, as depicted at the bottom
of Fig. 1.6.
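
The accumulation principle can be sketched as a behavioral model. This is an illustration under simplifying assumptions, not the wiring of Fig. 1.6: the model selects the coefficient for each register from the MixColumns matrix on every cycle, and it ignores the extra output cycle and the parallel-to-serial conversion. The names `MIX`, `scale`, and `mix_column_serial` are hypothetical.

```python
# MixColumns coefficient matrix: row k produces output byte r_k.
MIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def xtime(a):
    """Multiply a byte by {02} in GF(2^8)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def scale(coefficient, a):
    """Multiply a byte by {01}, {02}, or {03} using shift and XOR only."""
    if coefficient == 1:
        return a
    if coefficient == 2:
        return xtime(a)
    return xtime(a) ^ a            # coefficient 3


def mix_column_serial(column_bytes):
    """Accumulate one State column, consuming one input byte per clock cycle."""
    registers = [0, 0, 0, 0]       # four 8-bit accumulators, cleared first
    for j, a in enumerate(column_bytes):
        for k in range(4):
            registers[k] ^= scale(MIX[k][j], a)
    return registers
```

After the fourth input byte the four registers hold the complete output column, which then has to be emitted one byte at a time on an 8-bit datapath.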
Instead of performing the 4 coefficient multiplications in parallel, Sasdrich and
Güneysu [25] proposed an 8-bit-only accumulative implementation that performs
one coefficient multiplication per iteration, as illustrated in Fig. 1.7.
Fig. 1.6 Chu and Benaissa [6] Accumulative MixColumns 8-by-32-by-8 bits
Fig. 1.7 Sasdrich and Güneysu [25] 8-bit Accumulative MixColumns
With this approach, a significant area reduction can be achieved by further folding
the matrix multiplication and by not needing the parallel-to-serial converter.
Additional resources can be saved by preloading a Round Key byte into the register, thus
intrinsically performing the AddRoundKey operation. However, this area compression
implies a significant throughput reduction, since it requires 96 clock cycles to
complete the MixColumns and AddRoundKey operations. It should be noted that
none of these solutions [6, 12, 25] addresses the InvMixColumns operation, which
is more complex given the used coefficients ({0B; 0D; 09; 0E}).
1.3.5 Implementing the InvMixColumns: Logic
The InvMixColumns operation is identical to the MixColumns, but with the
coefficients {0B; 0D; 09; 0E}, resulting in a more complex datapath. The required three
modular shifts (×08; ×04; ×02) and respective XORs (see Table 1.1 and Eq. (1.2))
create a dependency of up to 23 input signals for each bit of the 32-bit matrix
multiplication result, as depicted in Fig. 1.8. Because of this complexity, only two
state-of-the-art proposals have presented results for architectures with logic-based
InvMixColumns [2, 5].
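
A software sketch makes the cost of the three chained shifts visible. It assumes the usual binary decomposition of the coefficients (for example {0E} = {08} ⊕ {04} ⊕ {02}), which is presumably what Table 1.1 lists, although the table is not reproduced in this excerpt. The function names are hypothetical.

```python
def xtime(a):
    """Multiply a byte by {02} in GF(2^8)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def times(coefficient, a):
    """Multiply by {09}, {0B}, {0D}, or {0E} using three chained modular shifts."""
    x2 = xtime(a)
    x4 = xtime(x2)
    x8 = xtime(x4)
    return {
        0x09: x8 ^ a,
        0x0B: x8 ^ x2 ^ a,
        0x0D: x8 ^ x4 ^ a,
        0x0E: x8 ^ x4 ^ x2,
    }[coefficient]


# First row of the InvMixColumns matrix; the other rows are rotations of it.
INV_ROW = [0x0E, 0x0B, 0x0D, 0x09]


def inv_mix_column(column):
    """Apply InvMixColumns to one 4-byte State column."""
    result = []
    for k in range(4):
        accumulator = 0
        for j in range(4):
            accumulator ^= times(INV_ROW[(j - k) % 4], column[j])
        result.append(accumulator)
    return result
```

Applied to `[0x8E, 0x4D, 0xA1, 0xBC]`, the sketch returns `[0xDB, 0x13, 0x53, 0x45]`, undoing the MixColumns transformation of that column.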
In the single-mode structure presented by Bulens et al. [2], the authors implement
the extra required logic for the InvMixColumns (+150 Slices). Chodowiec and Gaj [5],
on the other hand, presented a 32-bit datapath that can operate in either encryption or
decryption mode. This approach allows resources to be shared between the two matrix
multiplications.
Chodowiec and Gaj [5] realized that, by applying a different, slightly simpler,
matrix multiplication before the MixColumns operation, one can compute both the
MixColumns and InvMixColumns by sharing resources. With c(x) and d(x) being the
polynomials defining the MixColumns and InvMixColumns operations, respectively,
and given that

$$
c(x) \bullet d(x) = 01 \;\Leftrightarrow\; c(x) \bullet d^{2}(x) = d(x) \tag{1.4}
$$
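
The relation in Eq. (1.4) can be checked numerically. The sketch below assumes the standard AES polynomials, c(x) = {03}x^3 + {01}x^2 + {01}x + {02} and d(x) = {0B}x^3 + {0D}x^2 + {09}x + {0E}, multiplied modulo x^4 + 1 with coefficients in GF(2^8). The helper names are hypothetical, and the value of d^2(x) is computed by the code rather than quoted from the chapter.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly_mul(p, q):
    """Multiply two 4-term polynomials modulo x^4 + 1.

    Coefficients are GF(2^8) bytes listed from the constant term upward,
    so x^(i+j) wraps around to x^((i+j) mod 4).
    """
    out = [0, 0, 0, 0]
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[(i + j) % 4] ^= gf_mul(p_i, q_j)
    return out


C = [0x02, 0x01, 0x01, 0x03]   # c(x), MixColumns
D = [0x0E, 0x09, 0x0D, 0x0B]   # d(x), InvMixColumns
D_SQUARED = poly_mul(D, D)     # the "slightly simpler" extra multiplication

identity = poly_mul(C, D)            # left-hand side of Eq. (1.4)
recovered = poly_mul(C, D_SQUARED)   # right-hand side of Eq. (1.4)
```

In this computation d^2(x) has only two nonzero coefficients, {04}x^2 + {05}, which is consistent with the text describing the extra multiplication as slightly simpler than a full InvMixColumns.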