The x86 Instruction Set Architecture
CS232: Computer Architecture II
This set of notes provides an overview of the x86 instruction set architecture and its use in modern software. The goal
is to familiarize you with the ISA to the point that you can code simple programs and can read disassembled binary
code comfortably. Substantial portions of the ISA are ignored completely for the sake of simplicity. The notes use the
assembly notation used by the GNU tools, including the assembler as (used by the compiler gcc) and the debugger
gdb. Other tools may define other notations, but such things are merely cosmetic so long as you pay attention to what
you are using at the time.
The Basics: Registers, Data Types, and Memory
You may have heard or seen the term “Reduced Instruction Set Computing,” or RISC, and its counterpart, “Complex
Instruction Set Computing,” or CISC. While these terms were never entirely clear and have been further muddied by
years of marketing, the x86 ISA is certainly vastly more complex than that of MIPS. On the other hand, much of
the complexity has to do with backwards compatibility, which is mostly irrelevant to someone writing code today.
Furthermore, we need use only a limited subset of the ISA in this class.
The 32-bit flavors of x86, also called IA32 or Intel Architecture 32, have eight 32-bit integer registers. The registers
are not entirely general-purpose, meaning that some instructions limit your choice of register operands to fewer than
eight. Two other special-purpose 32-bit registers are also available, namely the instruction pointer (program
counter) and the flags (condition codes). We shall ignore the floating-point and multimedia registers. Unlike most
RISC machines, the registers have names stemming from their historical special purposes, as described below.
%eax      accumulator (for adding, multiplying, etc.)
%ebx      base (address of array in memory)
%ecx      count (of loop iterations)
%edx      data (e.g., second operand for binary operations)
%esi      source index (for string copy or array access)
%edi      destination index (for string copy or array access)
%ebp      base pointer (base of current stack frame)
%esp      stack pointer (top of stack)
%eip      instruction pointer (program counter)
%eflags   flags (condition codes and other things)
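Register names by access width (each row names parts of the same register):

32-bit   16-bit   8-bit high   8-bit low
EAX      AX       AH           AL
EBX      BX       BH           BL
ECX      CX       CH           CL
EDX      DX       DH           DL
ESI      SI
EDI      DI
EBP      BP
ESP      SP

Bit layout of EAX: bits 31-0 form EAX, bits 15-0 form AX, bits 15-8 form AH, and bits 7-0 form AL.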
The character “%” is used to denote a register in assembly code and is not considered a part of the register name itself;
note also that register names are not case sensitive. The letter “E” in each name indicates that the “extended” version
of the register is desired (extended from 16 bits). Registers can also be used to store 16- and 8-bit values, which is
useful when writing smaller values to memory or I/O ports. As shown above, the low 16 bits of a register
are accessed by dropping the “E” from the register name, e.g., %si. Finally, the two 8-bit halves of the low 16 bits of
the first four registers can be used as 8-bit registers by replacing “X” with “H” (high) or “L” (low).
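The aliasing between the 32-, 16-, and 8-bit names can be sketched in C with masks and shifts. This is only a model of the bit positions described above, not how the hardware is programmed; the function names (low16, high8, low8, write_low8) are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers that model how the narrower register names
   alias parts of a 32-bit register such as %eax. */

/* %ax: the low 16 bits of %eax. */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* %ah: bits 15..8 of %eax (the high half of %ax). */
uint8_t high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* %al: bits 7..0 of %eax (the low half of %ax). */
uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing %al replaces only bits 7..0; the other 24 bits keep their value. */
uint32_t write_low8(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

With %eax holding 0x12345678, this model gives 0x5678 for %ax, 0x56 for %ah, and 0x78 for %al; storing 0xAB into %al leaves 0x123456AB in %eax.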
The x86 ISA supports both 2’s complement and unsigned integers in widths of 32, 16, and 8 bits, single and
double-precision IEEE floating-point, 80-bit Intel floating-point, ASCII strings, and binary-coded decimal (BCD). Most
instructions are independent of data type, but some require that you select the proper instruction for the data types of the
operands. For example, try multiplying 32-bit representations of -1 and 1 to produce a 64-bit result: the answer
depends on whether the operands are treated as signed or unsigned.
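The multiplication exercise can be worked through in C, which makes the choice of data type explicit through its integer types. The sketch below is illustrative only; the function names mul_signed and mul_unsigned are hypothetical. On x86 the two cases correspond to separate signed and unsigned multiply instructions (imul and mul).

```c
#include <stdint.h>

/* Treat both 32-bit operands as 2's complement and widen to 64 bits. */
int64_t mul_signed(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* Treat both 32-bit operands as unsigned and widen to 64 bits. */
uint64_t mul_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}
```

The 32-bit pattern 0xFFFFFFFF is -1 when read as 2's complement and 4294967295 when read as unsigned. Multiplying it by 1, mul_signed(-1, 1) yields the 64-bit pattern 0xFFFFFFFFFFFFFFFF (that is, -1), while mul_unsigned(0xFFFFFFFF, 1) yields 0x00000000FFFFFFFF. The low 32 bits agree; the high 32 bits do not, which is why the instruction must match the data type.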
Use of memory is more flexible in x86 than in MIPS: in addition to load and store operations, many x86 operations
accept memory locations as operands. For example, a single instruction serves to read the value in a memory location,
add a constant, and store the sum back to the memory location. With x86, memory is 8-bit (byte) addressable and uses
32-bit addresses, although few machines today fully populate this 4 GB address space.
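As a small illustration of a memory operand, consider the C function below. The name add_five is hypothetical, and the assembly in the comment is one plausible translation under the stated assumption about register use, not the output of any particular compiler.

```c
#include <stdint.h>

/* Add a constant to the 32-bit value stored at address p.
   Assuming p is held in %eax, the body can be a single x86 instruction
   with a memory operand, written in GNU notation as:
       addl $5, (%eax)
   A load/store ISA such as MIPS needs a load, an add, and a store
   to do the same work. */
void add_five(int32_t *p) {
    *p += 5;
}
```

Calling add_five on a variable holding 10 leaves 15 in that memory location.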
One aspect of x86’s treatment of memory may confuse you: it is little endian. Little endian means that if you store a
32-bit register into memory and then look at the four bytes of memory one by one, in order of increasing address, you
will find the little end of the 32 bits (the low eight bits) first, followed by the next eight bits, then the next, and
finally the high eight bits of the stored value. Thus
0x12345678 becomes 0x78, 0x56, 0x34, 0x12 in consecutive memory locations. Obviously, values read from memory