The starting point of the design is the original "golden" reference MATLAB model of the GSC. The next step is to define a fully parameterized fixed-point arithmetic model. This model is directly coupled to the original MATLAB model to maintain lockstep with this golden reference. There are two critical aspects for efficiency in this step:
- The ability to intuitively associate fixed-point parameters with variables in the MATLAB algorithm description
- The ability to quickly evaluate the effects of the fixed-point arithmetic on the overall performance of the algorithm
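As a minimal sketch of these two aspects, the Python fragment below attaches a fixed-point format to each variable through a lookup table and runs a quantized computation next to its floating-point reference. It is not the AccelDSP flow or the GSC model: the `FixedFormat` class, the `FORMATS` table, the chosen word lengths, and the weighted-sum computation are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedFormat:
    word_length: int  # total bits, including the sign bit
    frac_length: int  # bits to the right of the binary point


def quantize(value, fmt):
    """Round to the nearest representable value and saturate on overflow."""
    scale = 1 << fmt.frac_length
    lowest = -(1 << (fmt.word_length - 1))
    highest = (1 << (fmt.word_length - 1)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


# Hypothetical format table: one entry per variable of the algorithm.
FORMATS = {
    "x": FixedFormat(16, 14),
    "w": FixedFormat(16, 14),
    "acc": FixedFormat(32, 28),
}


def weighted_sum_float(x, w):
    """Floating-point reference, standing in for the golden model."""
    return sum(xi * wi for xi, wi in zip(x, w))


def weighted_sum_fixed(x, w, formats):
    """Same computation with every variable quantized to its format."""
    acc = 0.0
    for xi, wi in zip(x, w):
        xq = quantize(xi, formats["x"])
        wq = quantize(wi, formats["w"])
        acc = quantize(acc + xq * wq, formats["acc"])
    return acc


x = [0.7071, -0.25, 0.5, 0.1234]
w = [0.3, 0.9, -0.45, 0.6]
reference = weighted_sum_float(x, w)
fixed = weighted_sum_fixed(x, w, FORMATS)
error = abs(reference - fixed)
print(f"float={reference:.6f} fixed={fixed:.6f} error={error:.2e}")
```

Because the formats live in one table keyed by variable name, changing a word length and rerunning the comparison is a one-line edit, which is the kind of quick evaluation the second point calls for.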
Defining a fully parameterized fixed-point arithmetic model is an iterative process. In the case of the GSC with QRD-RLS, the numerical performance of the implicit matrix inversion operation is measured by the attenuation shown in the overall beampattern. We evaluated several input bit-widths with the intermediate variables sized to avoid overflows; the effect on the attenuation in the beampattern is shown in Figure 9.
Figure 9. GSC beampattern in fixed point
We selected a 16-bit implementation that achieves an almost ideal interference rejection, as shown in Figure 9.
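To give a feel for why word length limits the achievable attenuation, the sketch below uses a much simpler stand-in than the GSC: a two-element array with half-wavelength spacing whose second weight is chosen to cancel a single interferer. Quantizing that weight leaves a residual response in the interferer direction, and the depth of the null depends on the number of bits. The array geometry, the 20-degree interferer angle, and the assumed format (one sign bit, one integer bit, remaining bits fractional) are illustrative assumptions, not values from the design described here.

```python
import cmath
import math


def quantize(value, word_length):
    """Round and saturate to an assumed format: sign, 1 integer bit, fraction."""
    frac_length = word_length - 2
    scale = 1 << frac_length
    lowest = -(1 << (word_length - 1))
    highest = (1 << (word_length - 1)) - 1
    return max(lowest, min(highest, round(value * scale))) / scale


def null_depth_db(word_length, interferer_deg=20.0):
    """Residual response toward the interferer, in dB relative to one element."""
    # Phase difference between the two elements for half-wavelength spacing.
    phase = math.pi * math.sin(math.radians(interferer_deg))
    ideal_weight = -cmath.exp(-1j * phase)  # cancels the interferer exactly
    quantized_weight = complex(
        quantize(ideal_weight.real, word_length),
        quantize(ideal_weight.imag, word_length),
    )
    response = 1.0 + quantized_weight * cmath.exp(1j * phase)
    return 20.0 * math.log10(max(abs(response), 1e-300))


depths = {wl: null_depth_db(wl) for wl in (8, 12, 16)}
for wl, depth in depths.items():
    print(f"{wl:2d} bits: null depth {depth:7.1f} dB")
```

In this toy case the null deepens as bits are added, which mirrors the trend, though not the actual numbers, of a bit-width study such as the one behind Figure 9.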
Generation of Hardware Implementation
The final step in our methodology is to generate the hardware implementation. There are two critical aspects to achieve efficiency:
- The ability to automatically generate an implementation that is bit-accurate against the fixed-point model of the DSP algorithm
- The ability to tailor the hardware architecture of the implementation to meet area/speed requirements
The generation of a suitable hardware implementation is also done iteratively to balance resource utilization and speed of operation. For the QRD-RLS algorithm, there are two points where the area/speed of the implementation can be affected:
- The degree of resource sharing of the givens_rotation function
- The computation style used to rotate row elements in the Givens rotation function, such as Newton-Raphson (using multipliers) or CORDIC (multiplier-less) microarchitectures
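The multiplier-less style can be illustrated with a textbook CORDIC Givens rotation, sketched below in Python on integer fixed-point codes. Vectoring mode zeroes the second element of a pair using only shifts and adds and records the direction of each micro-rotation; rotation mode replays those directions on the remaining row elements. This is a generic sketch, not the microarchitecture that AccelDSP generates: the function names, `FRAC_BITS`, and `ITERATIONS` are hypothetical, and the gain compensation is done in floating point here only to check the result.

```python
import math

FRAC_BITS = 14    # hypothetical fraction length of the fixed-point format
ITERATIONS = 16   # hypothetical number of micro-rotations

# Each micro-rotation stretches the vector; the total gain is a known constant.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def to_fixed(value):
    return round(value * (1 << FRAC_BITS))


def to_float(code):
    return code / (1 << FRAC_BITS) / GAIN  # undo scaling and CORDIC gain


def vectoring(x, y):
    """Rotate (x, y) onto the positive x axis; return result and directions."""
    flip = x < 0
    if flip:                      # 180-degree pre-rotation keeps x non-negative
        x, y = -x, -y
    directions = []
    for i in range(ITERATIONS):
        d = -1 if y >= 0 else 1   # always turn toward y = 0
        x, y = x - d * (y >> i), y + d * (x >> i)
        directions.append(d)
    return x, y, (flip, directions)


def rotation(x, y, recorded):
    """Apply the recorded micro-rotations to another pair of row elements."""
    flip, directions = recorded
    if flip:
        x, y = -x, -y
    for i, d in enumerate(directions):
        x, y = x - d * (y >> i), y + d * (x >> i)
    return x, y


# Zero the second element of the pair (3, 4), then rotate the pair (1, 2).
rx, ry, recorded = vectoring(to_fixed(3.0), to_fixed(4.0))
cx, cy = rotation(to_fixed(1.0), to_fixed(2.0), recorded)
print(to_float(rx), to_float(ry))
print(to_float(cx), to_float(cy))
```

The inner loops contain only shifts, additions, and sign tests, which is why a CORDIC datapath needs few or no hardware multipliers; the price is one iteration per bit of precision, either unrolled in parallel hardware or executed sequentially on shared logic.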
We performed iterations of this step exploring different combinations of resource utilization and speed of operation of the QRD-RLS function. The results of RTL synthesis, summarized in Figure 10, are for a Xilinx Virtex-4 XC4VSX55 target device.
Metric | Multiplier Parallel | Multiplier Sequential | CORDIC Parallel | CORDIC Sequential |
---|---|---|---|---|
LUTs | 5% | 4% | 10% | 9% |
DSP48s | 51 | 19 | 1 | 1 |
Sustainable Data Rate | 1.9 MSPS | 0.8 MSPS | 1.8 MSPS | 0.18 MSPS |
The results indicate that the sequential implementations reduce LUT utilization only slightly (by about one percentage point) while lowering the sustainable data rate. With a goal of maximum speed of operation and minimum use of hardware multipliers, we picked the CORDIC parallel implementation for place and route of the netlist from RTL synthesis. The place-and-route results, obtained with the Xilinx ISE software for the same target device used during synthesis, are shown in Figure 11.
Metric | CORDIC Parallel |
---|---|
Occupied Slices | 3076 (12%) |
DSP48s | 1 |
Sustainable Data Rate | 1.7 MSPS |
The hardware implementation results show that the QRD-RLS function can be implemented in 12% of the logic resources of an XC4VSX55 device with a sustainable data rate of 1.7 megasamples per second.
Conclusion
You can create efficient fixed-point hardware implementations in Xilinx FPGAs of DSP algorithms that use matrix inversion operations. This efficiency comes from the high level of automation that the AccelDSP Synthesis tool provides. The results show the effectiveness of this methodology in the implementation of a challenging algorithm in fixed-point arithmetic hardware.
To get more information about Xilinx DSP solutions, visit www.xilinx.com/dsp/.
About the authors
Ramon Uribe is a Sr. Principal IP Development Engineer at Xilinx. His focus is on the development of linear algebra intellectual property models for DSP applications. Prior to Xilinx, Ramon worked in design and development of DSP algorithms and applications at Bell Laboratories, Motorola and Tellabs. He received a BSEE degree from the University of Illinois at Chicago, and an MSEE from Stanford University. He can be reached at [email protected].
Tom Cesear is a Principal Engineer with the DSP Engineering Group managing the North American IP development team at Xilinx. Prior to Xilinx, Tom was the Chief Scientist and Director of IP at AccelChip. He has more than 20 years of entrepreneurial experience in all phases of marketing, strategic business development, and high-speed DSP design at companies such as bit-tru, Centerpoint Broadband Technologies, Mentor Graphics, dQdt and Hughes Aircraft Company. Dr. Cesear holds a Ph.D. and M.S. in Electrical Engineering from UCSD, and a B.S. in Electrical Engineering from Purdue University. He can be reached at [email protected].