SIMD prefixes. The 128-bit data processing instructions in AVX cover floating-point and integer data movement
primitives.
Additional enhancements to the 128-bit data processing primitives in AVX include 16 new instructions with the
following capabilities:
• Non-unit-strided fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching
primitives:
— broadcast of a single data element into a 128-bit destination,
— masked move primitives to load or store SIMD data elements conditionally (a short intrinsics sketch follows this list).
• Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data
manipulation primitives:
— permute primitives to facilitate efficient manipulation of floating-point data elements in 128-bit SIMD
registers.
• Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:
— new variable blend instructions support a four-operand syntax with non-destructive source operands.
Branching conditions that depend on floating-point or integer data can benefit from Intel AVX. This is
more flexible than the non-VEX-encoded instruction syntax, which uses the XMM0 register as an implied
mask for blend selection. While variable blend with the implied-XMM0 syntax is supported in SSE4 using
SIMD prefix encoding, the VEX-encoded 128-bit variable blend instructions support only the more flexible
four-operand syntax.
— Packed TEST instructions for floating-point data.
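The sketch below is illustrative rather than part of the instruction set reference. It assumes a C compiler that exposes the Intel intrinsics in <immintrin.h> and a build with AVX enabled (for example, -mavx); the function and variable names are hypothetical. It shows how the broadcast, masked move, and four-operand variable blend primitives described above are typically reached from C.

#include <immintrin.h>

/* dst[i] = src[i] * (*scale) for every i where cond[i] > 0;
 * the other elements of dst are left unchanged. */
void scaled_masked_copy(float *dst, const float *src, const int *cond,
                        const float *scale)
{
    /* VBROADCASTSS: replicate one scalar into all four elements. */
    __m128 s = _mm_broadcast_ss(scale);

    /* Build a mask whose elements are all-ones where cond[i] > 0. */
    __m128i c    = _mm_loadu_si128((const __m128i *)cond);
    __m128i mask = _mm_cmpgt_epi32(c, _mm_setzero_si128());

    /* VMASKMOVPS (load form): fetch only the selected elements. */
    __m128 v = _mm_maskload_ps(src, mask);

    /* VMASKMOVPS (store form): write back only the selected elements. */
    _mm_maskstore_ps(dst, mask, _mm_mul_ps(v, s));
}

/* Branch-free select: element-wise minimum of a and b, using the
 * four-operand VBLENDVPS instead of a data-dependent branch. */
__m128 select_min(__m128 a, __m128 b)
{
    __m128 lt = _mm_cmplt_ps(a, b);   /* per-element condition        */
    return _mm_blendv_ps(b, a, lt);   /* pick a[i] where lt[i] is set */
}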
1.5.5 AVX2 and 256-bit Vector Integer Processing
AVX2 promotes the vast majority of the 128-bit integer SIMD instruction sets to operate with 256-bit wide YMM
registers. AVX2 instructions are encoded using the VEX prefix and require the same operating system support as AVX.
Generally, most of the promoted 256-bit vector integer instructions follow the 128-bit lane operation, similar to the
promoted 256-bit floating-point SIMD instructions in AVX.
The new functionality in AVX2 generally falls into the following categories:
• Fetching non-contiguous data elements from memory using vector-index memory addressing. These “gather”
instructions introduce a new memory-addressing form, consisting of a base register and multiple indices
specified by a vector register (either XMM or YMM). Element sizes of 32 and 64 bits are supported, for both
floating-point and integer data types (a short intrinsics sketch follows this list).
• Cross-lane functionality is provided by several new broadcast and permute instructions. Some of the 256-bit
vector integer instructions promoted from the legacy SSE instruction sets also exhibit cross-lane behavior,
e.g., the VPMOVZX/VPMOVSX family.
• AVX2 complements the AVX instructions that are typed for floating-point operation with an equivalent set
operating on 32-bit and 64-bit integer data elements.
• Vector shift instructions with a per-element shift count. Element sizes of 32 and 64 bits are supported.
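The short sketch below is likewise illustrative only; it assumes a C compiler exposing the AVX2 intrinsics in <immintrin.h> and an AVX2-enabled build (for example, -mavx2), with hypothetical function names. It shows the gather form of memory addressing and the per-element variable shift.

#include <immintrin.h>

/* VPGATHERDD: out[i] = table[idx[i]] for eight 32-bit elements.
 * The base register is 'table', the vector index is 'idx', and the
 * scale factor is sizeof(int) = 4. */
__m256i gather8(const int *table, __m256i idx)
{
    return _mm256_i32gather_epi32(table, idx, 4);
}

/* VPSLLVD: per-element left shift, out[i] = a[i] << count[i]. */
__m256i shift_each(__m256i a, __m256i count)
{
    return _mm256_sllv_epi32(a, count);
}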
1.6 GENERAL PURPOSE INSTRUCTION SET ENHANCEMENTS
Enhancements in the general-purpose instruction set consist of several categories:
• A rich collection of instructions to manipulate integer data at bit-granularity. Most of the bit-manipulation
instructions employ VEX-prefix encoding to support three-operand syntax with non-destructive source
operands. Two of the bit-manipulation instructions (LZCNT, TZCNT) are not encoded using VEX. The VEX-
encoded bit-manipulation instructions include: ANDN, BEXTR, BLSI, BLSMSK, BLSR, BZHI, PEXT, PDEP, SARX,
SHLX, SHRX, and RORX.
• The enhanced integer multiply instruction (MULX), in conjunction with some of the bit-manipulation instructions,
allows software to accelerate arithmetic on large integers (wider than 128 bits); a short intrinsics sketch follows this list.
• The INVPCID instruction targets system software that manages processor context IDs.
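The sketch below is illustrative rather than part of the reference. It assumes an x86-64 C compiler exposing the BMI1/BMI2 intrinsics in <immintrin.h> and a build with -mbmi -mbmi2; the function names are hypothetical. It shows MULX reached through _mulx_u64, which is useful in large-integer arithmetic because it does not modify the arithmetic flags, and PEXT/PDEP for bit-field extraction and deposit.

#include <immintrin.h>
#include <stdint.h>

/* 64x64 -> 128-bit multiply via MULX; returns the low half and stores
 * the high half through 'hi'. Because MULX leaves the flags untouched,
 * the compiler can interleave it with an ADC carry chain when building
 * multiplications wider than 128 bits. */
uint64_t mul_64x64(uint64_t a, uint64_t b, uint64_t *hi)
{
    unsigned long long high;
    uint64_t lo = (uint64_t)_mulx_u64(a, b, &high);
    *hi = high;
    return lo;
}

/* PEXT gathers the even-indexed bits of x into the low 32 bits of the
 * result; PDEP then scatters those bits back out to the odd positions. */
uint64_t move_even_bits_to_odd(uint64_t x)
{
    uint64_t even = _pext_u64(x, 0x5555555555555555ULL);
    return _pdep_u64(even, 0xAAAAAAAAAAAAAAAAULL);
}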