This work evaluates how accurately minimum-width transistor area models rank the actual layout area of FPGA architectures. Both the original VPR area model and the newer COFFE area model are compared against actual layouts, using up to 3 metal layers, of various FPGA building blocks. We found that the prediction accuracy of both models varies significantly across the building blocks. In particular, the original VPR model overestimates the layout area of larger buffers, full adders, and multiplexers by as much as 38%, while underestimating the layout area of smaller buffers and multiplexers by as much as 58%, for an overall prediction error variation of 96%. The newer COFFE model also significantly overestimates the layout area of full adders, by 13%, and underestimates the layout area of multiplexers by a maximum of 60%, for a prediction error variation of 73%. Such variations are particularly significant considering that sensitivity analyses are not routinely performed in FPGA architectural studies. Our results suggest that such analyses are extremely important in studies that employ minimum-width area models.
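The "prediction error variation" figures quoted above appear to be the sum of the largest overestimate and the largest underestimate for each model. The following is a minimal sketch of that arithmetic under this reading; the function name `error_variation` is hypothetical and does not come from the paper.

```python
def error_variation(max_overestimate_pct, max_underestimate_pct):
    """Spread between the worst overestimate and the worst underestimate.

    Assumption: the abstract's "prediction error variation" is the sum of
    the two extreme errors, in percentage points.
    """
    return max_overestimate_pct + max_underestimate_pct


# Extreme errors as reported in the abstract.
vpr_variation = error_variation(38, 58)    # original VPR model
coffe_variation = error_variation(13, 60)  # COFFE model

print(f"VPR:   {vpr_variation} percentage points")
print(f"COFFE: {coffe_variation} percentage points")
```

Under this reading, the computed values match the 96% and 73% figures stated in the abstract.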
To improve computing performance in real-time applications, modern embedded platforms include hardware accelerators that speed up the most compute-intensive parts of tasks. A recent trend in the design of real-time embedded systems is to integrate FPGAs that are reconfigured with accelerators at runtime, to cope with timing-constrained, dynamic workloads. One of the major limitations of partial FPGA reconfiguration in real-time systems is that the reconfiguration port can perform only one reconfiguration at a time, which can lead to scheduling problems such as starvation and priority inversion. This paper shows how priority inversion and starvation can be solved by making the reconfiguration process preemptive, i.e., allowing it to be interrupted at any time and resumed later without restarting it from scratch. Such a feature is crucial for the design of runtime-reconfigurable real-time systems, but is not yet available in today's platforms. Furthermore, the paper identifies and analyzes the tradeoff between achieving a guaranteed bound on the reconfiguration delay for low-priority tasks and the maximum delay induced for high-priority tasks when preempting an ongoing reconfiguration. Experimental results on the Xilinx Zynq-7000 platform show that the proposed implementation of preemptive reconfiguration introduces a low runtime overhead, thus effectively solving priority inversion and starvation.
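The effect of preemption on a single reconfiguration port can be illustrated with a toy discrete-time simulation. This is only a sketch under simplifying assumptions, not the paper's implementation: time advances in unit steps, preemption and resumption cost nothing, a larger `priority` value means higher priority, and the names `Request` and `simulate` are hypothetical. It does not model the bounded-delay tradeoff for low-priority tasks discussed in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    priority: int  # larger value = higher priority (assumption)
    arrival: int   # time step at which the request is issued
    length: int    # time steps of port occupancy needed


def simulate(requests, preemptive):
    """Return {name: completion_time} for one shared reconfiguration port."""
    remaining = {r.name: r.length for r in requests}
    finish = {}
    current = None  # name of the request currently using the port
    t = 0
    while len(finish) < len(requests):
        ready = [r for r in requests
                 if r.arrival <= t and r.name not in finish]
        if ready:
            best = max(ready, key=lambda r: r.priority)
            # Non-preemptive: pick a new request only when the port is idle.
            # Preemptive: always serve the highest-priority ready request;
            # an interrupted request keeps its remaining work and resumes.
            if current is None or preemptive:
                current = best.name
        if current is not None:
            remaining[current] -= 1
            if remaining[current] == 0:
                finish[current] = t + 1
                current = None
        t += 1
    return finish


requests = [
    Request("low", priority=1, arrival=0, length=10),
    Request("high", priority=2, arrival=2, length=3),
]
non_preemptive = simulate(requests, preemptive=False)
with_preemption = simulate(requests, preemptive=True)
print("non-preemptive:", non_preemptive)
print("preemptive:    ", with_preemption)
```

In this example the high-priority request must wait for the entire low-priority reconfiguration when the port is non-preemptive, whereas with preemption it is served on arrival and the low-priority request resumes from where it stopped.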
Introduction to the Special Section on FCCM 2016
This paper presents a thorough account of RIPL, a high-level image processing domain-specific language (DSL) for FPGAs. We motivate its design, which is based on algorithmic skeletons, with requirements from the image processing domain, and show that RIPL's skeletons suffice to describe example algorithms elegantly. At its core, RIPL employs a dataflow intermediate representation. We give a precise account of the compilation scheme from RIPL to a subset of the Dataflow Process Network model that can be efficiently compiled to FPGA designs. The strengths of RIPL compared to the state of the art are expressivity, conciseness, speed, and portability. RIPL is expressive compared to Darkroom, an image processing FPGA DSL, and compared to the Xilinx OpenCV C++ HLS library, because RIPL supports user-defined functions and non-linear image processing pipelines. RIPL is concise compared to equivalent dataflow implementations: for a visual saliency algorithm, the RIPL program is 111x shorter than an equivalent implementation in the CAL dataflow language. RIPL is fast compared to equivalent software implementations: we report 8.3x faster runtimes for visual saliency. RIPL is portable: we demonstrate it with a complete working camera architecture that is designed to be portable across any Zynq-based system.