Performance – Some Examples
Example 1
A computer executes $2.5 \times 10^{10}$ instructions in 12 seconds. All instructions except loads take 1 cycle, and loads take 4 cycles. The clock rate is 3,000 MHz. What fraction of the instruction workload consists of register loads?
Solution 1
We need to find the average number of cycles per instruction (which must be greater than 1) and then work out what fraction of loads produces this figure.
In 12 seconds $2.5 \times 10^{10}$ instructions are executed. This period corresponds to $12 \times 3{,}000 \times 10^{6} = 3.6 \times 10^{10}$ cycles.
Consequently, an instruction is executed in $(3.6 \times 10^{10})/(2.5 \times 10^{10}) = 1.44$ cycles/instruction on average.
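If the fraction of load instructions is $f$, the fraction of non-load instructions is $1 - f$, and the average number of cycles per instruction is $4f + 1(1 - f) = 3f + 1$.
If $3f + 1 = 1.44$, then $3f = 0.44$ and $f = 0.44/3 = 0.147$. That is, 14.7% of instructions are load operations.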
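The arithmetic in Solution 1 can be checked with a short C sketch. This is only an illustration, not part of the original example: the function names `average_cpi`, `load_fraction`, and `print_example1`, and their parameters, are assumptions introduced here. The second function solves load_cycles × f + other_cycles × (1 − f) = CPI for f.

```c
#include <stdio.h>

/* Average cycles per instruction: total cycles divided by the number of
   instructions executed. Hypothetical helper; names are illustrative. */
double average_cpi(double seconds, double clock_hz, double instructions)
{
    return (seconds * clock_hz) / instructions;
}

/* Solve load_cycles*f + other_cycles*(1 - f) = cpi for the load fraction f. */
double load_fraction(double cpi, double load_cycles, double other_cycles)
{
    return (cpi - other_cycles) / (load_cycles - other_cycles);
}

/* Print the figures for Example 1: 12 s at 3,000 MHz, 2.5e10 instructions,
   loads take 4 cycles and everything else takes 1 cycle. */
void print_example1(void)
{
    double cpi = average_cpi(12.0, 3000e6, 2.5e10);
    double f = load_fraction(cpi, 4.0, 1.0);
    printf("CPI = %.2f, load fraction = %.3f\n", cpi, f);
}
```

Calling `print_example1()` from a `main` function should report a CPI of 1.44 and a load fraction of about 0.147.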
Example 2
Consider the following loop:
int sum = 0;
int x[64];
for (int j = 0; j < max; j++) {
    sum += x[j] / max;
}
This can be encoded in an ARM-like assembly language as follows:
        ADR  r0,x          ; r0 points to array X
        MOV  r1,#max       ; r1 contains max
        MOV  r2,#0         ; r2 is loop variable j initialized to 0
        MOV  r3,#0         ; r3 is sum initialized to 0
Next    LDR  r4,[r0]       ; get x[j]
        DIV  r4,r4,#max    ; calculate x[j]/max (not ARM code)
        ADD  r3,r3,r4      ; add new element to running total
        ADD  r2,r2,#1      ; increment loop variable j
        ADD  r0,r0,#4      ; point to next element in X
        CMP  r2,r1         ; test for end of loop
        BNE  Next          ; repeat until all done
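Note that the integers of array X are expressed as 32-bit values, so the pointer in r0 is incremented by 4 to step to the next element. The cycles per instruction (CPI) for each instruction type are:

Instruction           CPI
ADR                   2
LDR                   4
MOV, ADD, CMP, BNE    1
DIV                   15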
Solution 2
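Counting each of the 11 instructions in the listing once gives a total of 29 cycles, so the static average CPI is 29/11 = 2.64.
This figure ignores the fact that the loop body is executed many times. With 50 iterations, the four setup instructions run once and the seven loop instructions run 50 times, so the average CPI over all executed instructions is
$[(2 + 1 + 1 + 1) + 50(4 + 15 + 1 + 1 + 1 + 1 + 1)]/(4 + 50 \times 7) = 1205/354 = 3.40$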
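Both averages can be reproduced by weighting each instruction's cycle count by the number of times it executes. The C sketch below is illustrative only; the `struct instr` type and the functions `weighted_cpi` and `example2_cpi` are names assumed here, and the cycle counts are those given for ADR, LDR, DIV, and the single-cycle instructions.

```c
#include <stddef.h>

/* One entry per instruction in the listing: its cycle count and the
   number of times it is executed. Hypothetical type for illustration. */
struct instr {
    int cycles;
    long count;
};

/* Average CPI = total cycles / total instructions executed. */
double weighted_cpi(const struct instr *code, size_t n)
{
    long cycles = 0, executed = 0;
    for (size_t i = 0; i < n; i++) {
        cycles += code[i].cycles * code[i].count;
        executed += code[i].count;
    }
    return (double)cycles / (double)executed;
}

/* CPI of the 11-instruction listing when the loop body runs
   `iterations` times; iterations = 1 gives the static average. */
double example2_cpi(long iterations)
{
    struct instr code[] = {
        {2, 1}, {1, 1}, {1, 1}, {1, 1},          /* ADR, MOV, MOV, MOV (setup) */
        {4, iterations},                         /* LDR */
        {15, iterations},                        /* DIV */
        {1, iterations}, {1, iterations},        /* ADD, ADD */
        {1, iterations},                         /* ADD */
        {1, iterations}, {1, iterations}         /* CMP, BNE */
    };
    return weighted_cpi(code, sizeof code / sizeof code[0]);
}
```

Here `example2_cpi(1)` corresponds to 29/11 and `example2_cpi(50)` to 1205/354. Raising the iteration count pushes the result toward the loop body's own average of 24/7, because the setup instructions matter less and less.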
The code can be improved by replacing the loop counter with a comparison against the end address of the array and by taking the division out of the loop:
        ADR  r0,x          ; r0 points to array X
        ADD  r1,r0,#4*max  ; r1 contains base + 4 x max (multiplication by 4 at compile time)
        MOV  r3,#0         ; r3 is sum initialized to 0
Next    LDR  r4,[r0]       ; get x[j]
        ADD  r3,r3,r4      ; add new element to running total
        ADD  r0,r0,#4      ; point to next element in X
        CMP  r0,r1         ; test for end of loop
        BNE  Next          ; repeat until all done
        DIV  r3,r3,#max    ; calculate sum/max (take the division out of the loop)
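To see what the rewrite gains, the cycle counts of the two versions can be compared. The C sketch below is an illustration under stated assumptions: it uses the per-instruction cycle counts given earlier, treats BNE as 1 cycle whether or not the branch is taken, and takes the iteration count `n` as a parameter (50 in the worked solution). All function names are hypothetical.

```c
/* Original version: 4 setup instructions (2+1+1+1 = 5 cycles), then
   7 loop instructions (4+15+1+1+1+1+1 = 24 cycles) per iteration. */
long original_cycles(long n)       { return 5 + 24 * n; }
long original_instructions(long n) { return 4 + 7 * n; }

/* Revised version: 3 setup instructions (ADR 2 + ADD 1 + MOV 1 = 4 cycles),
   5 loop instructions (LDR 4 + ADD 1 + ADD 1 + CMP 1 + BNE 1 = 8 cycles)
   per iteration, and one DIV (15 cycles) after the loop. */
long revised_cycles(long n)        { return 4 + 8 * n + 15; }
long revised_instructions(long n)  { return 3 + 5 * n + 1; }

/* Ratio of execution times, assuming the same clock for both versions. */
double speedup(long n)
{
    return (double)original_cycles(n) / (double)revised_cycles(n);
}
```

For `n = 50` these functions give 1205 cycles for the original code and 419 cycles (254 instructions) for the revised code, a speedup of roughly 2.9. In integer arithmetic the two versions need not produce the same sum, because dividing each element by max truncates every term, whereas dividing the total truncates only once.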