Although FPGAs have grown in capacity, FPGA-based soft processors have grown very little because of the difficulty of achieving higher performance in exchange for area. Superscalar out-of-order processors promise large performance gains, and the memory subsystem is a key part of such a processor that must help supply increased performance. In this paper we describe and explore microarchitectural and circuit-level trade-offs in the design of such a memory system. We show significant instructions-per-cycle gains from providing various levels of out-of-order memory access, memory dependence speculation, and two-level cache design (up to 2.1x from the memory system alone). With careful microarchitecture and circuit design, we also achieve an L1 TLB and cache lookup with 29% less logic delay than the simpler Nios II/f memory system.
This paper presents the list of significant papers from the first 25 years of the Field-Programmable Logic and Applications conference (FPL). These 27 papers are those that have most strongly influenced theory and practice in the field.
We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlations on reconfigurable platforms. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. Our framework adaptively learns from the stream of input data and updates a corresponding ensemble of lower-dimensional data structures, also known as a sketch matrix. A new sketching methodology is introduced that tailors the problem of transforming big data with dense correlations into an ensemble of lower-dimensional subspaces so that it is suitable for hardware-based acceleration. The new method is scalable, and it significantly reduces costly memory interactions and enhances matrix-computation performance by leveraging the coarse-grained parallelism present in the dataset. SSketch provides an automated optimization methodology for creating the most accurate data sketch for a given set of user/platform constraints, including runtime, power, and memory. To facilitate automation, SSketch takes advantage of a HW/SW co-design approach: it provides an API that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on various datasets with more than 11 million non-zeros demonstrate up to 200-fold speedup on our hardware-accelerated realization compared to a software-based deployment on a general-purpose processor.
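As a rough illustration of the one-pass, adaptive sketching idea described above, the following Python sketch keeps a small orthonormal basis and stores each incoming sample only as its coefficients in that basis. It is not the SSketch algorithm or its API: the class name `StreamingSubspaceSketch`, the parameters `max_dim` and `tol`, and the Gram-Schmidt update rule are all assumptions chosen for brevity.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


class StreamingSubspaceSketch:
    """Hypothetical one-pass sketch: each sample is seen once, then discarded."""

    def __init__(self, max_dim, tol=1e-6):
        self.max_dim = max_dim  # assumed memory budget: max basis vectors kept
        self.tol = tol          # assumed relative residual threshold
        self.basis = []         # orthonormal basis vectors learned so far
        self.coeffs = []        # per-sample coefficients (the stored sketch)

    def update(self, x):
        # Project the new sample onto the current basis.
        c = [dot(x, b) for b in self.basis]
        r = list(x)
        for ci, b in zip(c, self.basis):
            r = [ri - ci * bi for ri, bi in zip(r, b)]
        norm_r = math.sqrt(dot(r, r))
        norm_x = math.sqrt(dot(x, x))
        # Grow the basis only when the sample is poorly represented
        # and the memory budget still allows it.
        if norm_r > self.tol * max(norm_x, 1.0) and len(self.basis) < self.max_dim:
            self.basis.append([ri / norm_r for ri in r])
            c.append(norm_r)
        self.coeffs.append(c)

    def reconstruct(self, i):
        """Approximate sample i from its stored coefficients."""
        n = len(self.basis[0]) if self.basis else 0
        out = [0.0] * n
        for ci, b in zip(self.coeffs[i], self.basis):
            out = [o + ci * bi for o, bi in zip(out, b)]
        return out
```

Once the budget `max_dim` is reached, later samples are approximated by their projection onto the existing basis, which is where accuracy is traded for memory in this toy version.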
Commercial Off-the-Shelf (COTS) FPGAs are becoming increasingly powerful. In addition to their huge hardware resources, they are also integrated into complete systems-on-chip (SoCs), e.g., in the latest Xilinx Zynq or Altera Stratix platforms. However, cooperation between FPGAs and their surroundings, and the flexibility of hardware task management, could still be improved. For instance, mechanisms have yet to be automated to allow multi-user approaches. A reconfigurable resource can be shared between applications or users only if it has a context-switch ability allowing applications to be paused and resumed in response to system demands. Here, we present a High-Level Synthesis (HLS) design flow producing a context-switch-capable circuit. The design flow manipulates the intermediate representation of an HLS tool to build the context extraction mechanism and to optimize the performance of the circuit produced. The method is based on efficient checkpoint selection and insertion of a powerful scan-chain into the initial circuit. This scan-chain can extract flip-flop or memory content. Experiments with the resulting system show that it has a low hardware overhead for many benchmark applications, and that the added hardware has a negligible impact on application performance. Comparison with current standard methods highlights the efficiency of our contributions.
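To make the scan-chain mechanism concrete, the following Python model shows how a chain of flip-flops can be shifted out one bit per cycle to save a context and shifted back in to restore it. This is only a behavioral toy under stated assumptions: the names `ScanChain`, `extract_context`, and `restore_context` are hypothetical, and the model ignores checkpoint selection and memory content, which the design flow above also handles.

```python
class ScanChain:
    """Hypothetical model of flip-flops linked into a single shift register."""

    def __init__(self, flops):
        # flops[0] is nearest scan-in, flops[-1] is nearest scan-out.
        self.flops = list(flops)

    def shift(self, scan_in):
        """One scan clock cycle: emit the last bit, shift the rest by one."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out


def extract_context(chain):
    """Shift the whole state out; the chain is left filled with zeros."""
    return [chain.shift(0) for _ in range(len(chain.flops))]


def restore_context(chain, context):
    """Shift a saved context back in, in the same order it came out."""
    for bit in context:
        chain.shift(bit)
```

In this model, saving or restoring a context costs one cycle per flip-flop in the chain, which is one reason the number and placement of checkpoints matters for overhead.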