ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Latest Articles

Leakier Wires: Exploiting FPGA Long Wires for Covert- and Side-channel Attacks

In complex FPGA designs, implementations of algorithms and protocols from third-party sources are common. However, the monolithic nature of FPGAs means that all sub-circuits share common on-chip infrastructure, such as routing resources. This presents an attack vector for all FPGAs that contain designs from multiple vendors, especially for FPGAs... (more)

Mitigating Electrical-level Attacks towards Secure Multi-Tenant FPGAs in the Cloud

A rising trend is the use of multi-tenant FPGAs, particularly in cloud environments, where partial access to the hardware is given to multiple third... (more)

A Protection and Pay-per-use Licensing Scheme for On-cloud FPGA Circuit IPs

A novel scheme based on security primitives for licensing hardware intellectual properties (HWIPs) on Field Programmable Gate Arrays (FPGAs) in public... (more)

Recent Attacks and Defenses on FPGA-based Systems

The field-programmable gate array (FPGA) is a programmable chip widely used in many areas, including automotive electronics, medical... (more)

Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing

Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data... (more)

Novel Congestion-estimation and Routability-prediction Methods based on Machine Learning for Modern FPGAs

Effectively estimating and managing congestion during placement can save substantial placement and... (more)


Editorial: A Message from the new Editor-in-Chief


2018 TRETS Best Paper Award Winner:

We are pleased to announce the winner of the 2018 TRETS Best Paper Award:

General-Purpose Computing with Soft GPUs on FPGAs

Muhammed Al Kadi, Benedikt Janssen, Jones Yudi, and Michael Huebner.


New Editor-in-Chief:

TRETS welcomes Deming Chen as its new Editor-in-Chief for the term March 1, 2019, to February 28, 2022. Deming is a Professor in the Electrical and Computer Engineering Department at the University of Illinois at Urbana-Champaign, as well as a Research Professor in the Coordinated Science Laboratory and an Affiliate Professor in the CS Department.


New Page Limit:

Effective immediately, the page limit for TRETS submissions has increased to 32 pages.



Forthcoming Articles
GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Due to the irregular nature of connections in most graph datasets, partitioning graph algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic. Previous research has shown that FPGAs can outcompete software-based graph processing in shared-memory contexts, but it remains an open question whether this advantage can be maintained in distributed systems. In this work, we present GraVF-M, a framework designed to ease the implementation of FPGA-based graph processing accelerators for multi-FPGA platforms with distributed memory. Based on a lightweight description of the algorithm kernel, the framework automatically generates optimized RTL code for the whole multi-FPGA design. We exploit an aspect of the programming model to present a familiar message-passing paradigm to the user, while under the hood implementing a more efficient architecture that can reduce the necessary inter-FPGA network traffic by a factor equal to the average degree of the input graph. A performance model based on a theoretical analysis of the factors influencing performance serves to evaluate the efficiency of our implementation. With a throughput of up to 5.8 GTEPS on a 4-FPGA system, the designs generated by GraVF-M compare favorably to state-of-the-art frameworks from the literature.
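The traffic reduction the abstract describes can be illustrated with a simple counting model (this is an illustrative sketch in Python, not the actual GraVF-M hardware design): naive message passing sends one inter-FPGA message per cross-partition edge, while broadcasting each updated vertex value once to every remote partition that holds a neighbor cuts the message count by roughly the average degree.

```python
# Illustrative model of inter-FPGA traffic: per-edge messages vs.
# per-vertex broadcasts to neighbor-holding partitions.
from collections import defaultdict

def messages_per_edge(edges, part):
    """Naive model: one inter-FPGA message per cross-partition edge."""
    return sum(1 for u, v in edges if part[u] != part[v])

def messages_per_vertex(edges, part):
    """Optimized model: each vertex sends its updated value once to
    every remote partition holding at least one of its neighbors."""
    targets = defaultdict(set)  # vertex -> set of remote partitions
    for u, v in edges:
        if part[u] != part[v]:
            targets[u].add(part[v])
    return sum(len(p) for p in targets.values())

# Toy graph: 4 vertices over 2 partitions, all edges crossing.
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(messages_per_edge(edges, part))    # 4 messages, one per edge
print(messages_per_vertex(edges, part))  # 2 messages, one per source vertex
```

Here each source vertex has two cross-partition neighbors, so the per-vertex scheme halves the traffic, matching the average-degree factor claimed above.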

The FPOA, A Medium-Grained Reconfigurable Architecture for High-Level Synthesis

In this paper, we present a novel type of medium-grained reconfigurable architecture that we term the Field Programmable Operation Array (FPOA). This device has been designed specifically for the implementation of HLS-generated circuitry. At the core of the FPOA is the OP-block. Unlike a standard LUT, an OP-block performs multi-bit operations through gate-based logic structures, translating into greater speed and efficiency in digital circuit implementation. Our device is not optimized for a specific application domain. Rather, we have created a device that is optimized for a specific circuit structure, namely the structures generated by HLS. This gives the FPOA a significant advantage, as it can be used across all application domains. In this work, we add support for both distributed and block memory to the FPOA architecture. Experimental results show up to a 13.5x reduction in logic area and a 9.5x reduction in critical path delay for circuit implementation using the FPOA compared to a standard FPGA.

FRoC 2.0: Automatic BRAM and Logic Testing to Enable Dynamic Voltage Scaling for FPGA Applications

In earlier technology nodes, FPGAs had low power consumption compared to other compute chips such as CPUs and GPUs. However, at the 14-nm technology node, FPGAs consume unprecedented power in the 100+ Watt range, making power consumption a pressing concern. To reduce FPGA power consumption, researchers have proposed deploying dynamic voltage scaling (DVS). While the previously proposed solutions show promising results, they are restricted to applications that only use the soft logic part of the FPGA. In this work, we present the first DVS solution that is able to fully handle FPGA applications that use BRAMs. Our solution not only robustly tests the soft logic component of the application but also tests all components connected to the BRAMs. We extend a previously proposed CAD tool, FRoC, to automatically generate calibration bitstreams that are used to measure the application's critical-path delays on silicon. The calibration bitstreams also include testers that ensure all used SRAM cells operate safely while scaling Vdd. We experimentally show that using our DVS solution, we are able to save 32% of the total power consumed by a discrete Fourier transform application running with a fixed Vdd and clocked at the Fmax reported by static timing analysis.
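The control idea behind such a DVS scheme can be sketched abstractly (a hedged sketch in Python; the test hooks, voltages, and guard band are our assumptions standing in for FRoC's on-silicon calibration bitstreams, not the paper's actual tool flow): lower Vdd step by step while both the soft-logic path test and the BRAM/SRAM test pass, then back off by a safety margin.

```python
# Hypothetical DVS calibration loop: scale Vdd down until either the
# soft-logic timing test or the BRAM/SRAM test fails, then apply a
# guard band above the lowest passing voltage.

def find_min_vdd(run_logic_test, run_bram_test, vdd_nominal=0.90,
                 vdd_floor=0.60, step=0.01, guard_band=0.03):
    """Return the lowest Vdd (in volts) at which both tests pass,
    plus a guard band, capped at the nominal voltage."""
    vdd = vdd_nominal
    lowest_pass = vdd_nominal
    while vdd >= vdd_floor:
        if run_logic_test(vdd) and run_bram_test(vdd):
            lowest_pass = vdd
            vdd = round(vdd - step, 3)
        else:
            break
    return round(min(lowest_pass + guard_band, vdd_nominal), 3)

# Toy stand-ins: logic paths fail below 0.70 V, SRAM cells below 0.74 V
# (so the BRAM test, not the logic test, sets the floor here).
logic_ok = lambda v: v >= 0.70
bram_ok = lambda v: v >= 0.74
print(find_min_vdd(logic_ok, bram_ok))  # 0.77
```

The toy numbers make the point of the paper's contribution visible: without the BRAM test, the loop would settle at an unsafe voltage for the SRAM cells.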

Distributed Inference over Decision Tree Ensembles on Clusters of FPGAs

Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given problem. In particular, the computational demands and I/O of many tasks in Machine Learning often require a cluster of accelerators to make a relevant difference in performance. In this paper, we explore the efficient construction of FPGA clusters using inference over Decision Tree Ensembles as the target application. The paper explores several levels of the problem: 1) a lightweight inter-FPGA communication protocol and routing layer to facilitate the communication between the different FPGAs; 2) the data partitioning and distribution strategies that maximize performance; and 3) an in-depth analysis of how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at a significantly higher throughput than that achieved by existing systems.
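One natural partitioning strategy for this workload is model parallelism: split the ensemble's trees across nodes, score each subset locally, and combine the partial votes. The sketch below (Python; the tree encoding and round-robin policy are our assumptions for illustration, not the paper's actual design) shows why the combine step is cheap, since only one partial sum per node crosses the interconnect.

```python
# Illustrative model-parallel inference over a decision tree ensemble:
# trees are partitioned across compute nodes, each node produces a
# partial sum, and the final prediction averages over all trees.

def eval_tree(tree, x):
    """Walk a binary tree stored as nested tuples:
    leaf -> ('leaf', value); split -> ('split', feature, threshold, lo, hi)."""
    while tree[0] == 'split':
        _, feat, thr, lo, hi = tree
        tree = lo if x[feat] <= thr else hi
    return tree[1]

def partition_trees(trees, n_nodes):
    """Round-robin assignment of trees to nodes (one list per node)."""
    return [trees[i::n_nodes] for i in range(n_nodes)]

def ensemble_predict(trees, x, n_nodes=4):
    # Each "node" reduces its subset to a single partial sum locally;
    # only n_nodes scalars need to be gathered and averaged.
    partials = [sum(eval_tree(t, x) for t in subset)
                for subset in partition_trees(trees, n_nodes)]
    return sum(partials) / len(trees)

trees = [('split', 0, 0.5, ('leaf', 0.0), ('leaf', 1.0)),
         ('leaf', 1.0),
         ('split', 1, 2.0, ('leaf', 1.0), ('leaf', 0.0))]
print(ensemble_predict(trees, [0.8, 1.0], n_nodes=2))  # 1.0
```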

A Design Flow Engine for the Support of Customized Dynamic High Level Synthesis Flows

High Level Synthesis is a set of methodologies aimed at generating hardware descriptions starting from specifications written in high level languages. While these methodologies share many elements with traditional compilation flows, there are characteristics of the addressed problem which require ad hoc management. In particular, unlike most traditional compilation flows, the complexity and the execution time of the High Level Synthesis techniques are much less relevant than the quality of the produced results. For this reason, fixed-point analyses as well as successive refinement optimizations can be accepted, provided that they can improve the quality of the generated designs. This paper presents the design flow engine developed for Bambu, an open source High Level Synthesis tool based on GNU GCC. The engine allows the execution of complex and customized synthesis flows and supports dynamic addition of passes, cyclic dependences, and selective pass invalidation and skipping. Experimental results show the benefits of this type of design flow with respect to static linear design flows when applied to High Level Synthesis.
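Two of the listed features, selective invalidation and skipping, can be sketched in a few lines (a minimal Python sketch under our own naming, not Bambu's actual API, and deliberately omitting cyclic-dependence handling): passes declare dependencies, a later pass may invalidate an earlier result, and already-valid passes are skipped rather than re-run.

```python
# Minimal dependency-driven pass engine: passes can be added dynamically,
# results are cached until invalidated, and valid passes are skipped.

class Engine:
    def __init__(self):
        self.passes = {}    # name -> (deps, fn)
        self.valid = set()  # passes whose results are up to date

    def add_pass(self, name, deps, fn):
        """Passes may be registered at any time, even mid-run."""
        self.passes[name] = (deps, fn)

    def invalidate(self, name):
        """Selective invalidation: only this pass will be re-run."""
        self.valid.discard(name)

    def run(self, goal, ir):
        deps, fn = self.passes[goal]
        for d in deps:          # satisfy dependencies first
            ir = self.run(d, ir)
        if goal in self.valid:  # selective skipping
            return ir
        ir = fn(ir, self)
        self.valid.add(goal)
        return ir

eng = Engine()
eng.add_pass('analyze', [], lambda ir, e: ir | {'analyzed': True})

def optimize(ir, e):
    e.invalidate('analyze')  # optimizing stales the analysis result
    return ir | {'optimized': True}

eng.add_pass('optimize', ['analyze'], optimize)
# 'analyze' appears again here, forcing its re-run after invalidation.
eng.add_pass('synthesize', ['optimize', 'analyze'],
             lambda ir, e: ir | {'rtl': True})
result = eng.run('synthesize', {})
print(result)  # {'analyzed': True, 'optimized': True, 'rtl': True}
```

The point of the example is the second visit to 'analyze': it is re-executed only because 'optimize' invalidated it, whereas a static linear flow would either always re-run every pass or never refresh a stale result.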

Introduction to the Special Issue on Security in FPGA-Accelerated Cloud and Datacenters

