| Source | Optimization | % of Total Cycles | # of Cycles | Performance Improvement |
|---|---|---:|---:|---:|
| fir0a | None | 64.34 | 27230.60 | -- |
| fir8a | Separated Inner Loop into Extension Instruction | 25.15 | 5065.00 | >5x |
| fir8b | Shuffled Extension Instruction Schedule | 17.46 | 3189.00 | >8x |
| fir8c | Manually Unrolled Loops | 15.24 | 2711.40 | >10x |
| fir16 | Used 16 Multipliers | 6.96 | 1128 | >24x |
| fir32 | Used 32 Multipliers | 4.36 | 687 | >39x |
Table 1: Performance improvement of each function variant over the straight C code baseline (fir0a).
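The Performance Improvement column appears to be the baseline cycle count (fir0a) divided by each variant's cycle count, rounded down. The short Python sketch below reproduces that arithmetic under this assumption. The cycle counts are copied from Table 1; the names `CYCLES`, `BASELINE`, and `speedup` are hypothetical and exist only for illustration.

```python
# Cycle counts copied from Table 1; fir0a (straight C code) is the baseline.
CYCLES = {
    "fir0a": 27230.60,
    "fir8a": 5065.00,
    "fir8b": 3189.00,
    "fir8c": 2711.40,
    "fir16": 1128.00,
    "fir32": 687.00,
}

BASELINE = CYCLES["fir0a"]


def speedup(cycles: float, baseline: float = BASELINE) -> float:
    """Speedup of a variant over the baseline: baseline cycles / variant cycles."""
    return baseline / cycles


if __name__ == "__main__":
    for name, c in CYCLES.items():
        if name == "fir0a":
            continue  # the baseline has no improvement entry in the table
        s = speedup(c)
        # int() truncates toward zero, matching the ">Nx" floor used in the table
        print(f"{name}: {s:.2f}x (>{int(s)}x)")
```

Running it gives 5.38x, 8.54x, 10.04x, 24.14x and 39.64x, which match the >5x, >8x, >10x, >24x and >39x entries.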