Microprocessor Design/Instruction Set Architectures

ISAs

An instruction set, or instruction set architecture (ISA), is the set of basic instructions that a processor can decode and execute.

ISAs can be broadly divided into RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer).

CISC

Complex Instruction Set Computer (CISC)

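The earliest ISA designs were CISC. Compilers did not yet exist, so programmers wrote programs by hand, directly in the machine's instruction set.

To make programming easier, more and more instructions had to be added. Adding so many instructions made the architecture increasingly complex. In general, making the hardware more complex and specializing it for particular instructions is inefficient. In other words, the more instructions were added, the less efficient the design became in terms of performance.

Consequently, on a CISC architecture the best performance can generally be obtained by using only the simplest instructions.

The best-known commercial CISC ISAs are the Motorola 68k and the Intel x86 architecture.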

RISC

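RISC was first discussed at IBM in the late 1970s.

Researchers found that most programs did not make good use of the many addressing modes available to a given instruction.

They realized that reducing the number of addressing modes and replacing multi-cycle instructions with single-cycle instructions brought several advantages:

compilers become easier to write and to optimize.
program performance improves because simple instructions are used.
the clock rate can be raised, because the minimum cycle time is well defined.

The best-known commercial RISC ISAs are the PowerPC, ARM, MIPS, and SPARC architectures.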

VLIW

We will discuss VLIW Processors in a later section.

Vector processors

We will discuss Vector Processors in a later section.

Computational RAM

Computational RAM (C-RAM) is semiconductor random access memory with processors incorporated into the design to build an inexpensive massively-parallel computer.

Memory Arrangement

By convention, instructions are laid out contiguously in memory.

Each instruction is the size of one computer word or, in architectures such as CISC, may be larger.

A register inside the microprocessor called the program counter (PC) holds the address of such an instruction.

Consider a simple RISC design with three stages: instruction fetch, instruction decode and register fetch, and instruction execution. Instruction fetch can be thought of as loading the instruction at the address the PC currently points to into the instruction register, and then advancing the PC to the address of the next instruction.
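The fetch step described above can be sketched in a few lines of Python. This is only a toy model: the 4-byte word size, the dictionary used as memory, the instruction strings, and the name `fetch` are all assumptions made for illustration, not part of any real ISA.

```python
# Toy model of the instruction-fetch step (all names are illustrative).
WORD_SIZE = 4  # assumed: every instruction occupies one 4-byte word


def fetch(memory, pc):
    """Return (instruction_register, next_pc) for one fetch step."""
    instruction_register = memory[pc]  # read the instruction the PC points to
    next_pc = pc + WORD_SIZE           # advance the PC to the following instruction
    return instruction_register, next_pc


# Instructions stored contiguously, keyed by byte address.
memory = {0: "LOAD r1, (r2)", 4: "ADD r3, r1, r1", 8: "STORE r3, (r2)"}

pc = 0
ir, pc = fetch(memory, pc)  # ir now holds the first instruction, pc points to the second
```

A variable-length (CISC-style) design would need to decode part of the instruction before it knew how far to advance the PC, which this sketch does not attempt to model.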

In addition to fetches of the executable instructions, many (but not all) instructions also fetch data values from memory ("load") into a data register, or write data values from a data register to memory ("store").

The address of the particular memory word accessed in such a load or store instruction is called the "effective address".

In the simplest instruction sets, the effective address is always contained in some address register.

Other instruction sets have more complex "effective address" calculations — we will discuss such "addressing modes" later.
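As a rough illustration of the simplest case, where the effective address is just the contents of an address register, the sketch below models a load and a store. The register names, the dictionary-based memory, and the helper names `load` and `store` are hypothetical.

```python
# Register-indirect load and store: the effective address is whatever
# value the named address register currently holds (illustrative model).

def load(registers, memory, dest, addr_reg):
    effective_address = registers[addr_reg]
    registers[dest] = memory[effective_address]  # memory -> data register


def store(registers, memory, src, addr_reg):
    effective_address = registers[addr_reg]
    memory[effective_address] = registers[src]   # data register -> memory


registers = {"r1": 0, "r2": 100, "r3": 7}
memory = {100: 42, 104: 0}

load(registers, memory, "r1", "r2")   # r1 <- memory[100]
registers["r2"] = 104                 # point the address register elsewhere
store(registers, memory, "r3", "r2")  # memory[104] <- r3
```

More elaborate addressing modes would replace the single register lookup with a calculation, for example a register plus a constant offset.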

Common Instructions

Move, Load, Store

Move instructions cause data from one register to be moved or copied to another register.

Load instructions put data from an external source, such as memory, into a register.

Store instructions move data from a register to an external destination.

Instructions that move (or copy) data from one place to another are the most frequently used instructions in most programs.^[1]

Branch and Jump

Branch and jump instructions are instructions that change the value of the PC register.

The biggest benefit of these instructions is that a program can call a piece of code (a procedure) and then return to the code that called it.

These instructions are used to implement interrupts, procedures, and functions.
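The call-and-return behaviour described above can be sketched with a tiny interpreter. The opcode names (`WORK`, `JUMP`, `CALL`, `RET`, `HALT`), the use of list indices as addresses, and the return stack are assumptions chosen for illustration; real machines may keep the return address in a link register instead.

```python
# Minimal interpreter showing how JUMP, CALL and RET change the PC.
# Addresses are list indices, one instruction per slot (an assumption).

def run(program):
    pc = 0
    return_stack = []  # saved return addresses for CALL/RET
    trace = []         # sequence of PC values that were executed
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "HALT":
            return trace
        elif op == "JUMP":
            pc = arg                     # unconditional change of the PC
        elif op == "CALL":
            return_stack.append(pc + 1)  # remember where to come back to
            pc = arg
        elif op == "RET":
            pc = return_stack.pop()      # resume after the matching CALL
        else:                            # "WORK": any ordinary instruction
            pc += 1


program = [
    ("WORK", None),  # 0
    ("CALL", 4),     # 1: call the procedure at address 4
    ("WORK", None),  # 2: execution resumes here after RET
    ("HALT", None),  # 3
    ("WORK", None),  # 4: procedure body
    ("RET", None),   # 5
]

trace = run(program)  # [0, 1, 4, 5, 2, 3]
```

The trace shows the PC leaving the main sequence at the call, running the procedure, and returning to the instruction after the call.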

Arithmetic Instructions

The Arithmetic Logic Unit (ALU) is used to perform arithmetic and logical instructions. The capability of the ALU typically is greater with more advanced central processors, but RISC machines' ALUs are deliberately kept simple and so have only some of these functions. An ALU will, at minimum, perform addition, subtraction, NOT, AND, OR, and XOR, and usually also single-bit rotates and shifts. Many CISC machine ALUs can also perform multi-bit rotates and shifts (with a barrel shifter) and integer multiplication and division. While many modern CPUs can also do floating point mathematical operations, these are usually handled by the FPU, a different part of the machine. We describe the ALU in more detail in the ALU design chapter.
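The minimum operation set listed above can be modelled as a small function. This is a behavioural sketch only: the 8-bit width, the operation names, and the function name `alu` are assumptions, and inputs are expected to already fit in 8 bits.

```python
# Behavioural sketch of a minimal 8-bit ALU (width chosen for illustration).
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0xFF: keeps results within the register width


def alu(op, a, b=0):
    """Apply one ALU operation to operands a and b (each in 0..MASK)."""
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "NOT":
        result = ~a
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "SHL":    # single-bit shift left
        result = a << 1
    elif op == "ROL":    # single-bit rotate left: the top bit wraps to bit 0
        result = (a << 1) | (a >> (WIDTH - 1))
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result & MASK  # discard bits that overflow the register
```

For example, `alu("ADD", 200, 100)` wraps around to 44 because the true sum, 300, does not fit in 8 bits. A real ALU would normally also produce status flags such as carry and zero, which this sketch omits.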

Input / Output

Input instructions fetch data from a specified input port, while output instructions send data to a specified output port. There is very little distinction between input/output space and memory space: in both cases the microprocessor presents an address and then either accepts data from, or sends data to, the data bus. However, the sort of operations available in the input/output space are typically more limited than those available in memory space.

NOP

NOP, short for "no operation", is an instruction that produces no result and causes no side effects. NOPs are useful for timing and preventing hazards.

Instruction Length

There are several different ways people balance the various advantages and disadvantages of various instruction lengths.

Fixed-length instructions are less complicated for a CPU to handle than variable-width instructions for several reasons, and are therefore somewhat easier to optimize for speed. Such reasons include: CPUs with variable-length instructions have to check whether each instruction straddles a cache line or virtual memory page boundary; CPUs with fixed-length instructions can skip all that. ^[2]

There simply are not enough bits in a 16 bit instruction to accommodate 32 general-purpose registers, and also do "Ra = Rb (op) Rc" -- i.e., independently select 2 source and 1 destination register out of a general purpose register bank of 32 registers, and also independently select one of several ALU operations.

And so people who design instruction sets must make one or more of the following compromises:

sacrifice code density and use longer fixed-width instructions, typically 32 bit, such as the MIPS and DLX and ARM.
sacrifice fixed-width instructions, requiring a more complicated decoder to handle both short 16 bit instructions and longer 3-operand instructions, such as ARM Thumb
sacrifice 3-operands, using no more than 2 operands in all instructions for everything, such as the Atmel AVR. 3-operand instructions allow better reuse of data^[2]; without 3-operand instructions, programs occasionally require extra copy instructions when both variable input operands to some ALU operation need to be preserved for some later instruction(s).
sacrifice registers, so only 16 or 8 programmer-visible registers.
sacrifice the concept of general purpose register -- perhaps only 16 or 8 "data registers" are visible to 3-operand ALU instructions, as in the 68000, or the destination is restricted to one or two "accumulators", but other registers (such as "address registers") are visible to other instructions.
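The bit budget behind these trade-offs can be worked out directly. The helper names below are hypothetical, and the calculation assumes every register operand gets its own full-width field with no other fields in the instruction.

```python
# How many bits remain for the opcode once the register fields are paid for?

def register_field_bits(num_registers):
    """Bits needed to select one register out of num_registers."""
    return (num_registers - 1).bit_length()


def opcode_bits_left(instruction_bits, num_registers, num_operands):
    """Bits left over after encoding num_operands register fields."""
    return instruction_bits - num_operands * register_field_bits(num_registers)


# 16-bit instruction, 32 registers, 3 operands: 16 - 3*5 = 1 bit left,
# enough to distinguish only two operations.
tight = opcode_bits_left(16, 32, 3)

# The compromises listed above each free up bits:
wider = opcode_bits_left(32, 32, 3)            # longer instructions: 17 bits left
two_operand = opcode_bits_left(16, 32, 2)      # 2 operands: 6 bits left
fewer_registers = opcode_bits_left(16, 16, 3)  # 16 registers: 4 bits left
```

With a single bit left for the opcode, a 16-bit three-operand format over 32 registers cannot select among "several ALU operations", which is the constraint the text describes.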

Instruction format

Any one particular machine-language instruction for any one particular CPU can typically be divided up into fields. For example, certain bits in an "ADD" instruction indicate the operation -- that this is actually an "ADD" rather than an "XOR" or "subtract" instruction. Other bits indicate which register is the source, other bits indicate which register is the destination, etc.
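Splitting an instruction word into fields amounts to shifting and masking. The 16-bit layout below (four 4-bit fields: opcode, destination, and two sources) is a made-up format used only for illustration; it does not correspond to any particular CPU.

```python
# Hypothetical 16-bit format:  [15:12] opcode  [11:8] dest  [7:4] src1  [3:0] src2

def encode_fields(opcode, dest, src1, src2):
    """Pack four 4-bit fields into one 16-bit instruction word."""
    return ((opcode & 0xF) << 12) | ((dest & 0xF) << 8) | ((src1 & 0xF) << 4) | (src2 & 0xF)


def decode_fields(word):
    """Unpack a 16-bit instruction word into its four fields."""
    return {
        "opcode": (word >> 12) & 0xF,  # which operation (ADD, XOR, ...)
        "dest": (word >> 8) & 0xF,     # destination register number
        "src1": (word >> 4) & 0xF,     # first source register number
        "src2": word & 0xF,            # second source register number
    }


word = encode_fields(opcode=1, dest=2, src1=3, src2=4)  # 0x1234
fields = decode_fields(word)
```

In hardware the same split is just wiring: each field's bits are routed to the part of the decoder that needs them.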

A few processors not only have fixed instruction widths but also have a single instruction format -- a fixed set of fields that is the same for every instruction.

Many processors have fixed instruction widths but several instruction formats. The bits stored in a special fixed-location "instruction type" field (in the same place in every instruction for that CPU) indicate which of those instruction formats, that is, which particular field layout, a specific instruction uses. For example, the MIPS processors have R-type, I-type, J-type, FR-type, and FI-type instruction formats.^[3] The J1 processor has 3 instruction formats: Literal, Branch, and ALU.^[4] The Microchip PIC mid-range has 4 instruction formats: byte-oriented register operations, bit-oriented register operations, 8-bit literal operations, and branch instructions with an 11-bit literal.^[5]
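As a simplified sketch of format selection, the code below classifies a 32-bit MIPS instruction word by its 6-bit opcode field (bits 31..26) and then applies the matching field layout. It covers only the three integer formats and treats every opcode other than 0, 2, and 3 as I-type, which is a deliberate simplification of the real MIPS opcode map; the function name `decode_mips` is hypothetical.

```python
# Simplified MIPS32 decoder: the opcode field picks the field layout.

def decode_mips(word):
    opcode = (word >> 26) & 0x3F      # same position in every format
    if opcode == 0:                   # R-type: register-register operations
        return {
            "format": "R",
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if opcode in (2, 3):              # J-type: j and jal
        return {"format": "J", "opcode": opcode, "target": word & 0x3FFFFFF}
    return {                          # everything else treated as I-type here
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


add_fields = decode_mips(0x012A4020)   # add  $t0, $t1, $t2
addi_fields = decode_mips(0x21280005)  # addi $t0, $t1, 5
jump_fields = decode_mips(0x08000040)  # j with target field 0x40
```

Because the opcode sits in the same place in every format, the decoder can choose the layout before it looks at any other bits.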

Occasionally a new CPU uses different instruction formats from some other CPU, making it "not binary-compatible" with that other CPU. However, the new CPU can sometimes be designed to have "source code backward-compatibility" with the other CPU -- it is "assembly-language compatible but not binary-compatible" with programs written for that CPU. Examples include the 8080, which was source-compatible but not binary-compatible with the 8008, and the 8086, which was source-compatible but not binary-compatible with programs written for the 8085, the 8080, and the 8008.