Some simple questions on instructions
- When would you write to memory and never read (or use) the written value?
In principle, there’s no greater sin than writing data and not using it because
you are throwing away time. In practice there are several circumstances that may force
this pointless action. First, you might save data when entering a subroutine. Typically,
registers that are reused in the subroutine are saved. However, the data in a saved
register may not be required and the save/restore operation wastes time.
A second occasion where data is written but not read occurs when the saved data is
to be accessed in a conditional clause; for example, IF x == y THEN P=A+B ELSE P=A+C.
Here, if x is equal to y then variable C is not accessed. If this is the only place
C is read, then storing it was a pointless act.
- What is self-modifying code, where and when is it used, and what are its advantages?
Self-modifying code is code that can modify itself at runtime; for example, suppose
memory location X contains the code 0x1111111100000000 which means ADD 1 to r0. Now
suppose that the op-code for the operation SUB 1 from r0 is 0x1111111100000001. Clearly,
toggling the least significant bit of X will change the operation from addition to
subtraction and vice versa.
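The bit toggle described above can be sketched in a few lines of Python. This is only an illustration: it assumes the two patterns in the text are 16-bit binary words, and the helper name `toggle_add_sub` is hypothetical rather than part of any real instruction set.

```python
# Hypothetical 16-bit encodings, reading the patterns in the text as binary.
ADD_1_R0 = 0b1111111100000000  # "ADD 1 to r0"
SUB_1_R0 = 0b1111111100000001  # "SUB 1 from r0"


def toggle_add_sub(word: int) -> int:
    """Flip the least significant bit, swapping ADD and SUB."""
    return word ^ 1


memory = {"X": ADD_1_R0}

memory["X"] = toggle_add_sub(memory["X"])  # X now holds SUB 1 from r0
flipped_once = memory["X"]

memory["X"] = toggle_add_sub(memory["X"])  # and back to ADD 1 to r0
flipped_twice = memory["X"]
```

Because the two encodings differ in exactly one bit, a single exclusive-OR switches the operation in either direction.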
In the early days of computing, self-modifying code was very important. If a computer
didn’t have register indirect addressing, you could synthesize it by modifying an
instruction with an absolute address. Consider LDA 500 (load accumulator with the
contents of memory location 500). If we add 1 to this instruction, we generate the
new instruction LDA 501 (assuming that the operand address is right justified). We
have actually operated on the data field of an instruction to transform a fixed absolute
address into a dynamic address. Today, this form of self-modifying code is not necessary
because address register indirect addressing is universal.
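The LDA 500 to LDA 501 trick can be modelled with a toy accumulator machine. The sketch below is illustrative only: the 4-bit op-code over a 12-bit right-justified address, the op-code value, and the memory contents are all assumptions chosen for the example.

```python
OPCODE_SHIFT = 12                       # assumed: 4-bit op-code above a 12-bit address
ADDRESS_MASK = (1 << OPCODE_SHIFT) - 1
LDA = 0x1                               # hypothetical op-code for "load accumulator"


def encode(opcode, address):
    """Build an instruction word with the address right justified."""
    return (opcode << OPCODE_SHIFT) | address


def execute_lda(memory, pc):
    """Execute the LDA stored at memory[pc] and return the value it loads."""
    word = memory[pc]
    assert word >> OPCODE_SHIFT == LDA
    return memory[word & ADDRESS_MASK]


# Location 0 holds the instruction; locations 500 to 502 hold a small table.
memory = {0: encode(LDA, 500), 500: 11, 501: 22, 502: 33}

loaded = []
for _ in range(3):
    loaded.append(execute_lda(memory, 0))
    memory[0] += 1      # self-modification: LDA 500 -> LDA 501 -> LDA 502 -> ...
```

Adding 1 to the whole instruction word only works because the address occupies the least significant bits; a carry out of the address field would corrupt the op-code.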
Self-modifying code can have a darker side. If you modify code at run-time you can
change its behaviour and implement malware by, for example, erasing disk files.
Self-modifying code is also considered bad practice because it makes debugging difficult.
If the code changes at run time, anyone trying to debug a program would see only
the pre-execution code.
Today’s sophisticated processors make self-modifying code either impossible or most
inadvisable. Consider out-of-order execution. If you execute an instruction, modify
it and then execute the modified version, it is difficult to guarantee that the modification
will not take place before the instruction is used the first time. An even more compelling
reason for not using self-modifying code is the cache memory. If an instruction is
cached, it will run from the cache. Changing the instruction in memory may not change
the cached instruction. Moreover, in a system with split instruction and data caches
(which we describe in chapter 9), all store operations are applied to the data cache and
not the instruction cache.
Self-modifying code may fail on pipelined processors because the instruction being
modified may already be in the pipeline when the operation to modify the instruction
in memory takes place.
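The stale-instruction problem can be seen in a deliberately simplified model. The class below is a toy, not a description of any real processor: it assumes stores update memory without informing the instruction cache, and `flush_icache` is a hypothetical stand-in for whatever explicit synchronization a real machine might require.

```python
class ToySplitCacheCPU:
    """Toy model: fetches go through an instruction cache that is never
    told about stores made on the data side."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.icache = {}

    def fetch(self, address):
        if address not in self.icache:           # miss: fill from memory
            self.icache[address] = self.memory[address]
        return self.icache[address]

    def store(self, address, value):
        self.memory[address] = value             # instruction cache untouched

    def flush_icache(self):
        self.icache.clear()


cpu = ToySplitCacheCPU({0x100: "ADD r0, 1"})

first = cpu.fetch(0x100)         # the ADD is now cached
cpu.store(0x100, "SUB r0, 1")    # modify the instruction in memory
stale = cpu.fetch(0x100)         # the cached ADD is fetched again
cpu.flush_icache()
fresh = cpu.fetch(0x100)         # only now is the SUB seen
```

In this model the modified instruction is invisible until the instruction cache is emptied, which is the behaviour the text warns about.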
Self-modifying code is not necessary today. Its disadvantages are that it cannot
be implemented on many processors, it is difficult to debug, and
it can be used as a vehicle to implement malware.
Who’d be an instruction set designer?
No one in their right mind. It’s a no-win game. You have too much to do with too
few resources. In particular, the number of bits in an instruction is fixed (8, 16,
32, 64). When designing an 8-bit processor you can, indeed have to, use multiple
length instructions. When designing a 32-bit processor you have just enough bits to
create a credible instruction set (e.g., ARM, MIPS, SPARC, PowerPC). But you don’t
have enough bits to do everything you want. You could use 48 bits but then you’d
be on your own.
What is the first decision when creating a new architecture?
Assuming that you are not constrained by backward compatibility (e.g., creating a
new ARM variant), the fundamental decision is the general ISA type:
Classic CISC: This is the register-to-memory architecture. It can provide compact
code but memory accesses are far slower than register accesses. However, the old-fashioned
CISC structure at the heart of Intel’s IA32 processors does not seem to have greatly
affected the company’s profitability.
Classic RISC: The RISC architecture is strictly register-to-register or load and
store; that is, all data processing operations take place between registers (usually
with a three-operand format) and the only memory accesses are load register and store
register. MIPS and SPARC are typical RISC processors. ARM is largely RISC but with
multiple memory accesses and support for a stack mechanism.
DSP: DSP processors are designed to process data representing audio and video; that
is, continuous, time-varying data. DSP processors often have separate data and instruction
memory (Harvard format), and concentrate on multiply-and-accumulate operations; that is,
s = a + b·c. Because the data they process is continuous, DSP processors sometimes
have the means to access data efficiently from circular buffers.
VLIW: The very long instruction word processor performs multiple operations per instruction
word (or instruction bundle). The best-known VLIW is the Itanium processor.
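The DSP entry above mentions accumulate operations and circular buffers. A minimal sketch of both follows, using a weighted sum over the most recent samples purely as a familiar example; the class and function names, the buffer size, and the coefficients are all assumptions made for illustration.

```python
class CircularBuffer:
    """Fixed-size buffer whose index wraps around instead of shifting data."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.head = 0                      # index of the most recent sample

    def push(self, sample):
        self.head = (self.head + 1) % len(self.data)   # wrap-around step
        self.data[self.head] = sample                  # overwrite the oldest

    def tap(self, delay):
        """Return the sample that arrived `delay` pushes ago."""
        return self.data[(self.head - delay) % len(self.data)]


def weighted_sum(buffer, coefficients):
    s = 0.0
    for k, c in enumerate(coefficients):
        s = s + c * buffer.tap(k)          # one multiply-accumulate per tap
    return s


buf = CircularBuffer(4)
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:        # the fifth sample overwrites the first
    buf.push(x)

result = weighted_sum(buf, [0.5, 0.25, 0.25])   # 0.5*5 + 0.25*4 + 0.25*3
```

The modulo arithmetic in `push` and `tap` is the part that DSP hardware can perform as a side effect of the addressing mode, so that no instructions are spent on wrapping the pointer.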
What is the second decision?
You have to decide how many on-chip registers to implement. Increasing the number
of registers reduces the processor-memory traffic and improves performance. However,
increasing the number of registers requires more bits to specify a register. A RISC
with a 3-register instruction and 32 registers must devote 3 x 5 = 15 bits to register
selection. Doing so reduces the number of bits left over for operation selection
and any literal that forms part of the instruction. If you have only 4 registers,
you would need only 3 x 2 = 6 bits. But that would provide an impossibly small set
of working registers.
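The trade-off above is simple arithmetic, sketched here for a 32-bit instruction word. The function name `register_field_bits` and the choice of a 32-bit word are assumptions made for the example.

```python
def register_field_bits(num_registers: int, operands: int) -> int:
    """Bits needed to name `operands` registers out of `num_registers`."""
    bits_per_register = (num_registers - 1).bit_length()   # ceil(log2(n))
    return operands * bits_per_register


INSTRUCTION_BITS = 32   # assumed instruction width for this example

bits_32_regs = register_field_bits(32, 3)       # 3 x 5 = 15
bits_4_regs = register_field_bits(4, 3)         # 3 x 2 = 6

left_with_32_regs = INSTRUCTION_BITS - bits_32_regs   # for op-code and literal
left_with_4_regs = INSTRUCTION_BITS - bits_4_regs
```

Going from 4 to 32 registers costs 9 instruction bits in a three-register format, which is the price paid for the reduced memory traffic.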
Suppose you design an instruction with the following format:
Op-code: 10 bits
Destination register: 7 bits
Source register 1: 7 bits
Source register 2: 7 bits
Immediate/literal: 17 bits
a. What is the maximum number of unique op-codes?
b. How many registers are there?
c. What is the range of the immediate if we assume it is signed?
d. What is the total instruction length?
e. What’s in register 0?
f. How many bits is a data word?
g. You eventually settle for a 67-bit data word. Are you mad?
- a. 2^10 = 1,024 instructions
- b. 2^7 = 128 registers
- c. ±2^(17-1) = ±2^16 = ±64K (strictly, -65,536 to +65,535 in two’s complement)
- d. 10 + 3 x 7 + 17 = 48 bits
- e. Nothing, of course. Since there are so many registers, you can easily afford to
give one up and keep it hardwired to 0 (like the MIPS). This gives you automatic
access to the constant zero simply by using r0 as a register in an instruction.
- f. Tricky! We could have a data word to match an instruction word; that is, 48 bits.
The problem here is that if you permit byte operation, each word would hold 48/8
= 6 bytes. To step through a data structure would mean stepping 6 bytes at a time.
Other computers have data words whose size is an integer power of 2 (8, 16, 32, 64, 128),
which means that stepping from word to word means incrementing by 1, 2, 4, 8, or 16
and the least significant address bits are x, x0, x00, x000, or x0000. We could use
a 32-bit data word or a 64-bit data word to avoid the addressing problem. This would
require a Harvard architecture since instruction and data words would be different sizes.
- g. Well why not? You’ve selected a 64-bit data word partially because memory is so
cheap today and partially because the instruction format is 48 bits long which provides
more room for manoeuvre than current 32-bit processors. You’ve also thrown in 3 extra
bits. These three bits are not part of the data field, but are tags that define some
aspect of the data word (you were influenced by the Itanium that has a 65-bit data
word). These bits are:
- NaN (not a number): This is the Itanium bit (which Intel calls NaT, not a thing) and
can be used to indicate that, for some reason, the contents of this location are not
valid. This is important because it can be used to short-circuit calculations. If the
NaN bit is set in one of the variables, the whole result is invalid.
- D (dirty): The dirty bit is borrowed from the cache memory world. It indicates whether
the register has been written to; that is, whether the register has
been used. This bit could be used to decide whether data is to be saved.
- P (parity): A parity bit could be used for error detection. If the parity check fails,
it implies that the register contents have changed (due to an error).
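The answers to parts a to d can be checked by packing and unpacking the 48-bit format. This is a sketch under stated assumptions: the question fixes the field widths but not their order, so placing the op-code in the most significant bits is a choice made here, and the names `encode` and `decode` are hypothetical.

```python
# Field widths come from the question; the field order is an assumption.
FIELDS = [("opcode", 10), ("rd", 7), ("rs1", 7), ("rs2", 7), ("imm", 17)]
INSTRUCTION_BITS = sum(width for _, width in FIELDS)        # 10 + 3*7 + 17
IMM_BITS = 17
IMM_MIN, IMM_MAX = -(1 << (IMM_BITS - 1)), (1 << (IMM_BITS - 1)) - 1


def encode(opcode, rd, rs1, rs2, imm):
    """Pack the five fields into one instruction word."""
    values = {"opcode": opcode, "rd": rd, "rs1": rs1, "rs2": rs2, "imm": imm}
    word = 0
    for name, width in FIELDS:
        # Masking keeps each value inside its field and yields the
        # two's-complement bit pattern for a negative immediate.
        word = (word << width) | (values[name] & ((1 << width) - 1))
    return word


def decode(word):
    """Unpack an instruction word, sign-extending the immediate."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    if fields["imm"] > IMM_MAX:
        fields["imm"] -= 1 << IMM_BITS
    return fields


word = encode(opcode=1023, rd=127, rs1=0, rs2=5, imm=-65536)
fields = decode(word)
```

The largest op-code (1,023), the highest register number (127), and the most negative immediate (-65,536) all survive the round trip, and the packed word occupies exactly 48 bits.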
You are designing a dedicated 16-bit processor. Simulation shows that you require
8 registers with a 2-register instruction format. Most literal operations require
only an unsigned constant in the range 0 to 15. If the total number of instructions required is 50:
a. Is such a processor possible?
b. What is the largest literal that can be loaded from memory?
c. Can’t you improve your answer to part b?
- Using 8 registers requires 3 register-select bits; that is, 6 bits to define two
registers. A constant in the range 0 to 15 requires 4 bits. The total number of bits
is 6 + 4 = 10, which leaves 16 – 10 = 6 bits for the op-code. This provides 2^6 =
64 op-codes which is more than the minimum of 50. Consequently the arrangement is possible.
- If the op-code takes 6 bits and a register is specified by 3 bits, then a load register
instruction requires 9 bits leaving 16 – 9 = 7 bits for a literal. This gives a range
of 0 to 127.
- OK, OK. We said that there were 50 op-codes and we could specify 64. This means we
have 14 unused op-codes. We could define LOAD r0 as an op-code, LOAD r1 as an op-code,
and so on. That is, we move the register select bits into the op-code field. That
provides a constant of 16 – 6 = 10 bits (0 to 1,023).
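The three answers above amount to a bit budget, which can be written out as follows. The variable names are illustrative, and the last step assumes that one of the 50 existing op-codes is already a LOAD, so that giving each of the 8 registers its own LOAD op-code needs 7 more.

```python
WORD_BITS = 16
REGISTER_BITS = (8 - 1).bit_length()       # 8 registers -> 3 bits each
LITERAL_BITS = (15).bit_length()           # constants 0 to 15 -> 4 bits
REQUIRED_OPCODES = 50

# First answer: two register fields plus a 4-bit literal leave the op-code field.
opcode_bits = WORD_BITS - 2 * REGISTER_BITS - LITERAL_BITS
available_opcodes = 1 << opcode_bits
feasible = available_opcodes >= REQUIRED_OPCODES

# Second answer: a load needs the op-code and one register field only.
load_literal_bits = WORD_BITS - opcode_bits - REGISTER_BITS
max_literal = (1 << load_literal_bits) - 1

# Third answer: fold the register number into the op-code (LOAD r0, LOAD r1, ...).
spare_opcodes = available_opcodes - REQUIRED_OPCODES
extra_opcodes_needed = 8 - 1               # assumption: one LOAD already counted
fold_is_possible = extra_opcodes_needed <= spare_opcodes
max_literal_folded = (1 << (WORD_BITS - opcode_bits)) - 1
```

Folding the register select bits into the op-code uses up spare op-codes but buys three more literal bits, taking the range from 0 to 127 up to 0 to 1,023.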
You have a really bright idea. Register-to-register operations require 3n bits to
specify registers, where n is the number of bits required to specify one register.
You design a system that has 64 registers and you can implement operations like ADD
r1,r2,r3. Unfortunately, when you redo your sums you find that you are a single
bit short. You complete your design and you shave off one bit by having an instruction
that can specify 64 destination registers, 64 source 1 registers, and 32 source 2
registers. You are fired. Is there any way in which you can defend your design?
The purpose of registers is to reduce processor-memory traffic. Registers hold temporary
variables and constants and intermediate values formed in calculations. Because the
second source register must be r0 to r31 rather than r0 to r63, you can keep this
half of the register array for frequently accessed constants. Unlike with windowing,
you still have access to all register elements. The only restriction is that you
can’t simultaneously use two source operands in the range r32 to r63.
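The restriction can be expressed as a check that an assembler might apply. The sketch below is hypothetical: the function names are invented, and the idea of swapping the two sources for commutative operations goes beyond the text and is offered only as one way to soften the restriction.

```python
DEST_BITS, SRC1_BITS, SRC2_BITS = 6, 6, 5       # 17 bits instead of 3 x 6 = 18


def can_encode(rd, rs1, rs2):
    """True if the operands fit the 64/64/32 register fields as given."""
    return (0 <= rd < 1 << DEST_BITS
            and 0 <= rs1 < 1 << SRC1_BITS
            and 0 <= rs2 < 1 << SRC2_BITS)


def place_sources(rd, rs1, rs2, commutative):
    """Return an operand order that fits the format, or None if none exists.

    Swapping the sources is an assumption of this sketch and is only
    valid for commutative operations such as ADD.
    """
    if can_encode(rd, rs1, rs2):
        return (rd, rs1, rs2)
    if commutative and can_encode(rd, rs2, rs1):
        return (rd, rs2, rs1)
    return None


as_written = place_sources(1, 2, 3, commutative=True)      # fits unchanged
swapped = place_sources(1, 2, 40, commutative=True)        # r40 moves to source 1
not_swappable = place_sources(1, 2, 40, commutative=False) # e.g. a subtraction
both_high = place_sources(1, 40, 50, commutative=True)     # two sources above r31
```

Only the last two cases fail: a non-commutative operation whose second source lies in r32 to r63, and any operation with both sources in that range.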