
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Latest Articles

FeatherNet: An Accelerated Convolutional Neural Network Design for Resource-constrained FPGAs

Convolutional Neural Network (ConvNet or CNN) algorithms are characterized by a large number of model parameters and high computational complexity. These two characteristics make implementation on resource-limited FPGAs challenging, and the challenges are magnified for low-end FPGAs. While previous work has...

Fast Adjustable NPN Classification Using Generalized Symmetries

NPN classification of Boolean functions is a powerful technique used in many logic synthesis and technology mapping tools in both standard cell and...

NEWS

New Editor-in-Chief:

TRETS welcomes Deming Chen as its new Editor-in-Chief for the term March 1, 2019, to February 28, 2022. Deming is a Professor in the Electrical and Computer Engineering Department at the University of Illinois at Urbana-Champaign, as well as a Research Professor in the Coordinated Science Laboratory and an Affiliate Professor in the CS Department.

2017 TRETS Best Paper Award Winner:

We are pleased to announce the winner of the 2017 TRETS Best Paper:

Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs

by Nachiket Kapre and Jan Gray

------------------------------------------

New Page Limit:

Effective immediately, the page limit for TRETS submissions has increased to 32 pages.

-----------------------------------------------

 

Forthcoming Articles
A Novel FPGA Implementation of a Time-to-Digital Converter Supporting Run-Time Estimation and Compensation

Time-to-digital converters (TDCs) are widely used in applications that require the measurement of the time interval between events. In previous designs using a feedback loop and an extended delay line, process-voltage-temperature (PVT) variation often decreases the accuracy of measurements. To overcome the loss of accuracy caused by PVT variation, this study proposes a novel design of a synthesizable TDC that employs run-time estimation and compensation of PVT variation. A delay line consisting of a series of buffers is used to detect the period of a ring oscillator designed to measure the time interval between two events. By comparing the detected period with the system clock, the variation of the oscillation period is compensated at run-time. The proposed TDC is successfully implemented on a low-cost Xilinx Spartan-6 LX9 FPGA with a 50-MHz oscillator. Experimental results show that the proposed TDC is robust to PVT variation, with a resolution of 19.1 ps. Compared with previous designs, the proposed TDC achieves an approximately five times better trade-off among area, resolution, and reference clock frequency.

Exact and Practical Modulo Scheduling for High-level Synthesis

Loop pipelining is an essential technique in high-level synthesis (HLS) to increase the throughput and resource utilisation of FPGA-based accelerators. It relies on modulo schedulers to compute an operator schedule that allows subsequent loop iterations to overlap partially when executed, while still honouring all precedence and resource constraints. Modulo schedulers face a bi-criteria problem: minimise the initiation interval (II), i.e. the number of time steps after which new iterations are started, and minimise the schedule length. We present Moovac, a novel exact formulation that models all aspects (including the II minimisation) of the modulo scheduling problem as a single integer linear program (ILP), and discuss simple measures to prevent excessive runtimes, to challenge the old preconception that exact modulo scheduling is impractical. We substantiate this claim by conducting an experimental study covering 188 loops from two established HLS benchmark suites, four different time limits, and three bounds for the schedule length, to compare our approach against a highly-tuned exact formulation and a state-of-the-art heuristic algorithm. In the fastest configuration, an accumulated runtime of under 35 minutes is spent on scheduling all loops, and proven optimal IIs are found for 175 test instances.
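To make the two constraint classes named in the abstract concrete, the following is a minimal sketch of a validity check for one candidate modulo schedule. It is not the Moovac ILP formulation; it only tests whether a given schedule honours precedence and resource constraints for a chosen II. The function name, the data layout, the example loop, and the assumption that every operation occupies its resource for exactly one time step are all illustrative and do not come from the paper.

```python
from collections import defaultdict


def is_valid_modulo_schedule(start, latency, deps, resource, capacity, ii):
    """Check one candidate schedule against precedence and resource constraints.

    start:    op -> start time step within one iteration
    latency:  op -> time steps until the op's result is available
    deps:     list of (src, dst, distance); distance is the number of
              iterations the dependence spans (0 = same iteration)
    resource: op -> resource type (ops not listed are unconstrained)
    capacity: resource type -> number of available units
    ii:       candidate initiation interval
    """
    # Precedence: dst in iteration i + distance starts at start[dst] + distance * ii
    # relative to iteration i, and must not start before src has finished.
    for src, dst, distance in deps:
        if start[src] + latency[src] > start[dst] + distance * ii:
            return False

    # Resources (assumed: one time step of occupancy per op). Ops whose start
    # times are congruent modulo II collide across overlapping iterations.
    usage = defaultdict(int)
    for op, res in resource.items():
        slot = (res, start[op] % ii)
        usage[slot] += 1
        if usage[slot] > capacity[res]:
            return False
    return True


# Hypothetical loop body: acc += a[i] * b[i], with a single memory port.
latency = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1}
deps = [
    ("load_a", "mul", 0),
    ("load_b", "mul", 0),
    ("mul", "add", 0),
    ("add", "add", 1),  # loop-carried dependence on the accumulator
]
resource = {"load_a": "mem_port", "load_b": "mem_port"}
capacity = {"mem_port": 1}
start = {"load_a": 0, "load_b": 1, "mul": 3, "add": 6}

valid_at_ii_2 = is_valid_modulo_schedule(start, latency, deps, resource, capacity, 2)
valid_at_ii_1 = is_valid_modulo_schedule(start, latency, deps, resource, capacity, 1)
```

In this example the schedule is accepted for II = 2 and rejected for II = 1, because both loads would then need the single memory port in the same modulo slot. An exact scheduler searches over start times and the II together rather than checking a fixed candidate as done here.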

Mitigating Electrical-Level Attacks towards Secure Multi-Tenant FPGAs in the Cloud

A rising trend is the use of multi-tenant FPGAs, particularly in cloud environments, where partial access to the hardware is given to multiple third parties. This leads to new types of attacks on FPGAs, which operate not only on the logic level, but also on the electrical level through the common power delivery network. Because FPGAs are configured from the software side, attackers can launch hardware attacks from software, impacting the security of an entire system. In this paper, we present a first attempt at a countermeasure against attacks on the electrical level, based on a bitstream checking methodology. Bitstreams are translated back into flat technology-mapped netlists, which are then checked for properties that indicate potentially malicious runtime behaviour of FPGA logic. Our approach can provide a metric of the potential risk of an FPGA bitstream being used in active fault or passive side-channel attacks against other users of the FPGA fabric or the entire SoC platform.

Leakier Wires: Exploiting FPGA Long Wires for Covert- and Side-Channel Attacks

In complex FPGA designs, implementations of algorithms and protocols from third-party sources are common. However, the monolithic nature of FPGAs means that all sub-circuits share common on-chip infrastructure. This presents an attack vector for all FPGAs that contain designs from multiple vendors: hardware imperfections can be used to infer high-level state and break security guarantees. In this paper, we show that "long" routing wires present a new source of information leakage on FPGAs, by influencing the delay of adjacent long wires. We show that the effect is measurable for both static and dynamic signals, and that it can be detected using small on-board circuits. We characterize the channel in detail and show that it is measurable even when multiple competing circuits (including multiple long-wire transmitters) are present, and that it can be replicated on different generations and families of Xilinx devices (Virtex 5, Virtex 6, Artix 7, and Spartan 7). We exploit the leakage to create a covert channel with 6 kbps bandwidth and 99.9% accuracy, and a side channel that can recover signals kept constant for only 128 cycles with an accuracy of more than 98.4%. Finally, we propose countermeasures to reduce the impact of this information leakage.

Automata Processing in Reconfigurable Architectures: In-the-cloud Deployment, Cross-platform Evaluation and Fast Symbol-only Reconfiguration

We present a general automata processing framework on FPGAs, which can generate an RTL kernel for automata processing together with AXI- and PCIe-based I/O circuitry. We implement the framework on both local nodes and cloud platforms (Amazon AWS and Nimbix) with novel features. A full performance comparison of the framework is conducted against state-of-the-art automata processing engines on CPUs, GPUs, and Micron's Automata Processor using the ANMLZoo benchmark suite and some real-world datasets. Results show that FPGAs enable extremely high-throughput automata processing compared to von Neumann architectures. We also collect the resource utilization and power consumption on the two cloud platforms, and find that the I/O circuitry consumes most of the hardware resources and power. Furthermore, we propose a fast, symbol-only reconfiguration mechanism based on the framework for large pattern sets that cannot fit on one device and need to be partitioned. The proposed method supports multiple passes of the input stream and reduces the re-compilation cost from hours to seconds.

Editorial: A Message From the New Editor-in-Chief
