ACM Transactions on

Reconfigurable Technology and Systems (TRETS)

Latest Articles

Distributed Inference over Decision Tree Ensembles on Clusters of FPGAs

Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given... (more)

The FPOA, a Medium-grained Reconfigurable Architecture for High-level Synthesis

In this article, we present a novel type of medium-grained reconfigurable architecture that we term the Field Programmable Operation Array (FPOA).... (more)

A Design Flow Engine for the Support of Customized Dynamic High Level Synthesis Flows

High Level Synthesis is a set of methodologies aimed at generating hardware descriptions starting from specifications written in high-level languages.... (more)

FRoC 2.0: Automatic BRAM and Logic Testing to Enable Dynamic Voltage Scaling for FPGA Applications

In earlier technology nodes, FPGAs had low power consumption compared to other compute chips such as CPUs and GPUs. However, in the 14nm technology node, FPGAs are consuming unprecedented power in the 100+W range, making power consumption a pressing concern. To reduce FPGA power consumption, several researchers have proposed deploying dynamic... (more)

Unrolling Ternary Neural Networks

The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This article demonstrates, for the case where the neural network... (more)


Editorial: A Message from the new Editor-in-Chief


2018 TRETS Best Paper Award Winner:

We are pleased to announce the winner of the 2018 TRETS Best Paper:

General-Purpose Computing with Soft GPUs on FPGAs

Muhammed Al Kadi, Benedikt Janssen, Jones Yudi, and Michael Huebner.


New Editor-in-Chief:

TRETS welcomes Deming Chen as its new Editor-in-Chief for the term March 1, 2019, to February 28, 2022. Deming is a Professor in the Electrical and Computer Engineering Department at University of Illinois at Urbana-Champaign as well as a Research Professor in the Coordinated Science Laboratory and an Affiliate Professor in the CS Department.


New Page Limit:

Effective immediately, the page limit for TRETS submissions has increased to 32 pages.



Forthcoming Articles
GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Due to the irregular nature of connections in most graph datasets, partitioning graph algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic. Previous research has shown that FPGAs can outcompete software-based graph processing in shared memory contexts, but it remains an open question if this advantage can be maintained in distributed systems. In this work, we present GraVF-M, a framework designed to ease the implementation of FPGA-based graph processing accelerators for multi-FPGA platforms with distributed memory. Based on a lightweight description of the algorithm kernel, the framework automatically generates optimized RTL code for the whole multi-FPGA design. We exploit an aspect of the programming model to present a familiar message-passing paradigm to the user, while under the hood implementing a more efficient architecture that can reduce the necessary inter-FPGA network traffic by a factor equal to the average degree of the input graph. A performance model based on a theoretical analysis of the factors influencing performance serves to evaluate the efficiency of our implementation. With a throughput of up to 5.8 GTEPS on a 4-FPGA system, the designs generated by GraVF-M compare favorably to state-of-the-art frameworks from the literature.

RAiSD-X: A Fast and Accurate FPGA System for the Detection of Positive Selection in Thousands of Genomes

Detecting traces of positive selection in genomes carries theoretical significance and has practical applications, from shedding light on the forces that drive adaptive evolution to the design of more effective drug treatments. The size of genomic datasets currently grows at an unprecedented pace, fueled by continuous advances in DNA sequencing technologies, leading to ever-increasing compute and memory requirements for meaningful genomic analyses. The majority of existing methods for positive selection detection either are not designed to handle whole genomes or scale poorly with the sample size; they inevitably resort to a run-time versus accuracy trade-off, raising an alarming concern for the feasibility of future large-scale scans. To this end, we present RAiSD-X, a high-performance system that relies on a decoupled access-execute processing paradigm for efficient FPGA acceleration, and couples a novel, to our knowledge, sliding-window algorithm for the recently introduced ? statistic with a mutation-driven hashing technique to rapidly detect patterns in the data. RAiSD-X achieves up to three orders of magnitude faster processing than widely used software implementations, and more importantly, it can exhaustively scan thousands of human chromosomes in minutes, yielding a scalable full-system solution for future studies of positive selection in species of flora and fauna.

A DSL-Based Hardware Generator in Scala for Fast Fourier Transforms and Sorting Networks

We present a hardware generator for computations with regular structure including the fast Fourier transform (FFT), sorting networks, and others. The input of the generator is a high-level description of the algorithm; the output is a token-based, synchronized design in the form of RTL-Verilog. Building on prior work, the generator uses several layers of domain-specific languages (DSLs) to represent and optimize at different levels of abstraction to produce a RAM- and area-efficient hardware implementation. Two of these layers and DSLs are novel. The first one allows the use and domain-specific optimization of state-of-the-art streaming permutations. The second DSL enables the automatic pipelining of a streaming hardware dataflow and the synchronization of its data-independent control signals. The generator including the DSLs are implemented in Scala, leveraging its type system, and uses concepts from lightweight modular staging (LMS) to handle the constraints of streaming hardware. Particularly, these concepts offer genericity over hardware number representation, including seamless switching between fixed-point arithmetic and FloPoCo generated IEEE floating-point operators, while ensuring type-safety. We show benchmarks of generated FFTs, sorting networks and Walsh-Hadamard transforms that outperform prior generators.

All ACM Journals | See Full Journal Index

Search TRETS
enter search term and/or author name