Chap 4


Some simple questions on instructions

Question 1

When would you write to memory and never read (or use) the written value?

In principle, there’s no greater sin than writing data and not using it because you are throwing away time. In practice there are several circumstances that may force this pointless action. First, you might save data when entering a subroutine. Typically, registers that are reused in the subroutine are saved. However, the data in a saved register may not be required afterwards, in which case the save/restore operation wastes time.

A second occasion where data is written but not read occurs when the saved data is accessed only in a conditional clause; for example, IF x == y THEN P=A+B ELSE P=A+C. Here, if x is equal to y then variable C is not accessed. If this is the only place C is read, then storing it was a pointless act.

What is self-modifying code, where and when is it used, and what are its advantages and disadvantages?

Self-modifying code is code that can modify itself at runtime; for example, suppose memory location X contains the code 0x1111111100000000, which means ADD 1 to r0. Now suppose that the op-code for the operation SUB 1 from r0 is 0x1111111100000001. Clearly, toggling the least significant bit of X will change the operation from addition to subtraction and vice versa.
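As a minimal sketch of the bit toggle just described, the Python fragment below flips the least significant bit of an instruction word with an exclusive OR. The two constants are the illustrative encodings from the text, not real op-codes, and the function name is hypothetical.

```python
# Illustrative encodings from the example above (not a real instruction set).
ADD_1_R0 = 0x1111111100000000  # "ADD 1 to r0"
SUB_1_R0 = 0x1111111100000001  # "SUB 1 from r0"


def toggle_operation(instruction_word: int) -> int:
    """Flip the least significant bit, swapping ADD and SUB in this encoding."""
    return instruction_word ^ 1


word = ADD_1_R0
word = toggle_operation(word)   # now the SUB encoding
word = toggle_operation(word)   # back to the ADD encoding
```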

In the early days of computing, self-modifying code was very important. If a computer didn’t have register indirect addressing, you could synthesize it by modifying an instruction with an absolute address. Consider LDA 500 (load accumulator with the contents of memory location 500). If we add 1 to this instruction, we generate the new instruction LDA 501 (assuming that the operand address is right justified). We have actually operated on the data field of an instruction to transform a fixed absolute address into a dynamic address. Today, this form of self-modifying code is not necessary because address register indirect addressing is universal.
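The LDA 500 to LDA 501 trick can be sketched with a toy accumulator machine. Everything in this sketch is an assumption made for illustration: the decimal encoding (op-code times 1000 plus a right-justified address), the op-code value, and the function names do not correspond to any real machine.

```python
OPCODE_LDA = 1  # hypothetical op-code for "load accumulator"


def encode(opcode, address):
    """Build an instruction word with the address right justified."""
    return opcode * 1000 + address


def decode(word):
    """Split an instruction word into (opcode, address)."""
    return divmod(word, 1000)


def sum_array(memory, instr_location, count):
    """Sum `count` consecutive words using only absolute addressing.

    The LDA instruction stored at memory[instr_location] is incremented
    after each use, so LDA 500 becomes LDA 501, LDA 502, and so on.
    """
    total = 0
    for _ in range(count):
        opcode, address = decode(memory[instr_location])
        if opcode != OPCODE_LDA:
            raise ValueError("expected an LDA instruction")
        accumulator = memory[address]   # execute LDA address
        total += accumulator
        memory[instr_location] += 1     # self-modification of the address field
    return total


memory = [0] * 1000
memory[500:504] = [10, 20, 30, 40]      # the data to be summed
memory[0] = encode(OPCODE_LDA, 500)     # the instruction that will be modified
result = sum_array(memory, 0, 4)
print(result)             # 100
print(decode(memory[0]))  # (1, 504)
```

Note that the instruction is left pointing at location 504 when the loop ends, so the program cannot be rerun without first restoring the original instruction. This is one reason such code is awkward to debug.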

Self-modifying code can have a darker side. If you modify code at run-time you can change its behaviour and implement malware by, for example, erasing disk files.

Self-modifying code is also considered bad practice because it makes debugging difficult. If the code changes at run time, anyone trying to debug a program would see only the pre-execution code.

Today’s sophisticated processors make self-modifying code either impossible or most inadvisable. Consider out-of-order execution. If you execute an instruction, modify it, and then execute the modified version, it is difficult to guarantee that the modification will not take place before the instruction is used the first time. An even more compelling reason for not using self-modifying code is the cache memory. If an instruction is cached, it will run from the cache. Changing the instruction in memory may not change the cached instruction. Moreover, in a system with split instruction and data caches (which we describe in chapter 9), a write to an instruction is a data operation and is applied to the data cache, not the instruction cache.

Self-modifying code may fail on pipelined processors because the instruction being modified may already be in the pipeline when the operation to modify the instruction in memory takes place.

Self-modifying code is not necessary today. Its disadvantages are that it cannot be implemented on many processors, it is difficult to debug, and it can be used as a vehicle to implement malware.

Question 2

Who’d be an instruction set designer?

No one in their right mind. It’s a no-win game. You have too much to do with too few resources. In particular, the number of bits in an instruction is fixed (8, 16, 32, 64). When designing an 8-bit processor you can, indeed have to, use multiple-length instructions. When designing a 32-bit processor you have just enough bits to create a credible instruction set (e.g., ARM, MIPS, SPARC, PowerPC). But you don’t have enough bits to do everything you want. You could use 48 bits but then you’d be on your own.

What is the first decision when creating a new architecture?

Assuming that you are not constrained by backward compatibility (as you would be when creating, say, a new ARM variant), the fundamental decision is the general ISA type:

Classic CISC: This is the register-to-memory architecture. It can provide compact code but memory accesses are far slower than register accesses. However, the old-fashioned CISC structure at the heart of Intel’s IA32 processors does not seem to have greatly affected the company’s profitability.

Classic RISC: The RISC architecture is strictly register-to-register or load and store; that is, all data processing operations take place between registers (usually with a three-operand format) and the only memory accesses are load register and store register. MIPS and SPARC are typical RISC processors. ARM is largely RISC but with multiple memory accesses and support for a stack mechanism.

DSP: DSP processors are designed to process data representing audio and video; that is, continuous, time-varying data. DSP processors often have separate data and instruction memory (Harvard format), and concentrate on multiply-and-accumulate operations; that is, s = a + b·c. Because the data they process is continuous, DSP processors sometimes have the means to access data efficiently from circular buffers.

VLIW: The very long instruction word processor performs multiple operations per instruction word (or instruction bundle). The best-known VLIW is the Itanium processor.
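To make the DSP entry above more concrete, the sketch below combines the two features it mentions: the multiply-and-accumulate step s = a + b·c and a circular buffer holding the most recent samples. The choice of a simple FIR-style weighted sum, and all class and function names, are assumptions made for illustration. A real DSP would perform the address wrap-around in hardware.

```python
class CircularBuffer:
    """Fixed-size buffer in which the newest sample overwrites the oldest."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.head = 0  # index of the next slot to be written

    def push(self, sample):
        self.data[self.head] = sample
        self.head = (self.head + 1) % len(self.data)  # wrap around

    def recent(self, k):
        """Return the k-th most recent sample (k = 0 is the newest)."""
        return self.data[(self.head - 1 - k) % len(self.data)]


def weighted_sum(buffer, coefficients):
    """Accumulate coefficient * sample, one multiply-accumulate per tap."""
    s = 0.0
    for k, c in enumerate(coefficients):
        s = s + c * buffer.recent(k)  # s = a + b.c
    return s


buf = CircularBuffer(3)
for sample in (1.0, 2.0, 3.0):
    buf.push(sample)
first = weighted_sum(buf, [0.5, 0.25, 0.25])   # 0.5*3 + 0.25*2 + 0.25*1

buf.push(4.0)                                  # overwrites the oldest sample (1.0)
second = weighted_sum(buf, [0.5, 0.25, 0.25])  # 0.5*4 + 0.25*3 + 0.25*2
```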

What is the second decision?

You have to decide how many on-chip registers to implement. Increasing the number of registers reduces the processor-memory traffic and improves performance. However, increasing the number of registers requires more bits to specify a register. A RISC with a 3-register instruction and 32 registers must devote 3 x 5 = 15 bits to register selection. Doing so reduces the number of bits left over for operation selection and any literal that forms part of the instruction. If you have only 4 registers, you would need only 3 x 2 = 6 bits. But that would provide an impossibly small set of working registers.
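The trade-off can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above; the function name is hypothetical, and the 32-bit instruction length used in the last line is an assumption chosen to match a typical RISC.

```python
def register_select_bits(num_registers, registers_per_instruction):
    """Bits an instruction must devote to register selection."""
    bits_per_register = (num_registers - 1).bit_length()  # ceil(log2(n))
    return registers_per_instruction * bits_per_register


print(register_select_bits(32, 3))       # 15
print(register_select_bits(4, 3))        # 6
print(32 - register_select_bits(32, 3))  # 17 bits left for op-code and literal
```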

Question 3

Suppose you design an instruction with the following format:

Op-code: 10 bits

Destination register: 7 bits

Source register 1: 7 bits

Source register 2: 7 bits

Immediate/literal: 17 bits

a. What is the maximum number of unique op-codes?

b. How many registers are there?

c. What is the range of the immediate if we assume it is signed?

d. What is the total instruction length?

e. What’s in register 0?

f. How many bits is a data word?

g. You eventually settle for a 67-bit data word. Are you mad?

Solution 3

a. 2^10 = 1,024 op-codes

b. 2^7 = 128 registers
c. ±2^(17-1) = ±2^16 = ±64K (strictly, -65,536 to +65,535 in two’s complement)
d. 10 + 3 x 7 + 17 = 48 bits
e. Nothing, of course. Since there are so many registers, you can easily afford to give one up and keep it hardwired to 0 (like the MIPS). This gives you automatic access to the constant zero simply by using r0 as a register in an instruction.
f. Tricky! We could have a data word to match the instruction word; that is, 48 bits. The problem here is that if you permit byte operations, each word would hold 48/8 = 6 bytes. To step through a data structure would mean stepping 6 bytes at a time. Other computers have data words whose size is an integer power of 2 (8, 16, 32, 64, 128), which means that stepping from word to word means incrementing the byte address by 1, 2, 4, 8, or 16, and the least significant address bits of an aligned word are x, x0, x00, x000, or x0000. We could use a 32-bit data word or a 64-bit data word to avoid the addressing problem. This would require a Harvard architecture since instruction and data words would be different sizes.
g. Well, why not? You’ve selected a 64-bit data word partially because memory is so cheap today and partially because the instruction format is 48 bits long, which provides more room for manoeuvre than current 32-bit processors. You’ve also thrown in 3 extra bits. These three bits are not part of the data field, but are tags that define some aspect of the data word (you were influenced by the Itanium that has a 65-bit data word). These bits are:

NaN (not a number) This is the Itanium bit and can be used to indicate that, for some reason, the contents of this location are not valid. This is important because it can be used to short-circuit calculations. If the NaN bit is set in one of the variables, the whole result is invalid.

D (dirty) The dirty bit is borrowed from the cache memory world. It indicates whether the contents of this register are valid or not; that is, whether the register has been used. This bit could be used to decide whether data is to be saved.
P (parity) A parity bit could be used for error detection. If the parity check fails, it implies that the register contents have changed (due to an error).
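Parts a to d of Solution 3 follow directly from the field widths, as the following sketch shows. The dictionary layout and the function name are hypothetical; the widths are those given in the question.

```python
# Field widths in bits, as stated in Question 3.
FIELDS = {"opcode": 10, "dest": 7, "src1": 7, "src2": 7, "immediate": 17}


def signed_range(bits):
    """Two's complement range representable in `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


max_opcodes = 1 << FIELDS["opcode"]                    # part a
num_registers = 1 << FIELDS["dest"]                    # part b
immediate_range = signed_range(FIELDS["immediate"])    # part c
instruction_length = sum(FIELDS.values())              # part d

print(max_opcodes)         # 1024
print(num_registers)       # 128
print(immediate_range)     # (-65536, 65535)
print(instruction_length)  # 48
```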

Question 4

You are designing a dedicated 16-bit processor. Simulation shows that you require 8 registers with a 2-register instruction format. Most literal operations require only an unsigned constant in the range 0 to 15. If the total number of instructions is 50:

a. is such a processor possible?

b. what is the largest literal that can be loaded from memory?

c. Can’t you improve your answer to part b?

Solution 4

Using 8 registers requires 3 register-select bits; that is, 6 bits to define two registers. A constant in the range 0 to 15 requires 4 bits. The total number of bits is 6 + 4 = 10, which leaves 16 – 10 = 6 bits for the op-code. This provides 2^6 = 64 op-codes, which is more than the minimum of 50. Consequently, the arrangement is feasible.

If the op-code takes 6 bits and a register is specified by 3 bits, then a load register instruction requires 9 bits leaving 16 – 9 = 7 bits for a literal. This gives a range of 0 to 127.
OK, OK. We said that there were 50 op-codes and we could specify 64. This means we have 14 unused op-codes. We could define LOAD r0 as an op-code, LOAD r1 as an op-code, and so on; eight of the 14 spare op-codes are enough to cover all eight registers. That is, we move the register-select bits into the op-code field. That provides a constant of 16 – 6 = 10 bits (0 to 1,023).
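The bit budgeting in Solution 4 can be reproduced as follows. This is a sketch of the arithmetic only; the variable names are hypothetical and the numbers come from the question.

```python
def bits_needed(count):
    """Bits required to distinguish `count` different values."""
    return (count - 1).bit_length()


WORD_BITS = 16
NUM_REGISTERS = 8
NUM_INSTRUCTIONS = 50

# Part a: two register fields plus a 4-bit literal (0 to 15).
register_bits = bits_needed(NUM_REGISTERS)                   # 3
literal_bits = bits_needed(16)                               # 4
opcode_bits = WORD_BITS - 2 * register_bits - literal_bits   # 6
feasible = 2 ** opcode_bits >= NUM_INSTRUCTIONS              # 64 >= 50

# Part b: a load-literal instruction needs an op-code and one register field.
load_literal_bits = WORD_BITS - opcode_bits - register_bits  # 7
max_literal_b = 2 ** load_literal_bits - 1                   # 127

# Part c: fold the register number into spare op-codes, one per register.
spare_opcodes = 2 ** opcode_bits - NUM_INSTRUCTIONS          # 14
if spare_opcodes >= NUM_REGISTERS:
    wide_literal_bits = WORD_BITS - opcode_bits              # 10
else:
    wide_literal_bits = load_literal_bits
max_literal_c = 2 ** wide_literal_bits - 1                   # 1023
```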

Question 5

You have a really bright idea. Register-to-register operations require 3n bits to specify registers, where n is the number of bits required to specify one register. You design a system that has 64 registers so that you can implement operations like ADD r1,r2,r3. Unfortunately, when you redo your sums you find that you are a single bit short. You complete your design by shaving off one bit: an instruction can specify one of 64 destination registers, one of 64 source 1 registers, and one of 32 source 2 registers. You are fired. Is there any way in which you can defend your decision?

Solution 5

The purpose of registers is to reduce processor-memory traffic. Registers hold temporary variables, constants, and intermediate values formed in calculations. Because the second source register must be r0 to r31 rather than r0 to r63, you can keep this half of the register array for frequently accessed constants. Unlike with windowing, you still have access to all register elements. The only restriction is that you can’t simultaneously use two source operands in the range r32 to r63.
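A sketch of how an assembler might pack the 6 + 6 + 5 = 17 register-select bits is given below. The field order, the function name, and the idea of swapping the two sources for commutative operations are all assumptions added for illustration; the source states only the restriction on source 2.

```python
DEST_BITS, SRC1_BITS, SRC2_BITS = 6, 6, 5  # 17 bits rather than 3 x 6 = 18


def encode_registers(dest, src1, src2, commutative=False):
    """Pack three register numbers as dest | src1 | src2 (hypothetical layout).

    If src2 is in r32 to r63, the sources are swapped when the operation
    is commutative and src1 is in r0 to r31; otherwise encoding fails.
    """
    for reg in (dest, src1, src2):
        if not 0 <= reg < 64:
            raise ValueError(f"no such register: r{reg}")
    if src2 >= 32:
        if commutative and src1 < 32:
            src1, src2 = src2, src1
        else:
            raise ValueError("source 2 must be in the range r0 to r31")
    return (dest << (SRC1_BITS + SRC2_BITS)) | (src1 << SRC2_BITS) | src2


plain = encode_registers(1, 2, 3)                       # ADD r1,r2,r3
swapped = encode_registers(1, 3, 40, commutative=True)  # encoded as ADD r1,r40,r3
largest = encode_registers(63, 63, 31)                  # all fields at maximum
```

With this layout the largest encodable value is 2^17 - 1, which confirms that the three fields fit in 17 bits. A non-commutative operation such as subtraction with both sources in r32 to r63 would need an extra register move, which is the cost of the saved bit.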