Time-to-digital converters (TDCs) are widely used in applications that require the measurement of the time interval between events. In previous designs using a feedback loop and an extended delay line, process-voltage-temperature (PVT) variation often decreases the accuracy of measurements. To overcome the loss of accuracy caused by PVT variation, this study proposes a novel design of a synthesizable TDC that employs run-time estimation and compensation of PVT variation. A delay line consisting of a series of buffers is used to detect the period of a ring oscillator designed to measure the time interval between two events. By comparing the detected period with the system clock, the variation of the oscillation period is compensated at run-time. The proposed TDC is implemented on a low-cost Xilinx Spartan-6 LX9 FPGA with a 50-MHz oscillator. Experimental results show that the proposed TDC is robust to PVT variation, with a resolution of 19.1 ps. Compared with previous designs, the proposed TDC achieves an approximately five times better trade-off among area, resolution, and reference clock frequency.
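The compensation idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's circuit: the paper detects the oscillator period with a buffer delay line, whereas this sketch swaps in a simple counting calibration against the reference clock. The function names and the example counts are hypothetical; only the 50-MHz reference comes from the abstract.

```python
def estimate_osc_period_ps(osc_cycles, ref_cycles, ref_clock_hz=50_000_000):
    """Estimate the ring-oscillator period from a calibration window.

    Assumption: the oscillator is counted for `ref_cycles` periods of the
    reference clock. The paper uses a delay line instead of this counter.
    """
    if osc_cycles <= 0:
        raise ValueError("osc_cycles must be positive")
    window_ps = ref_cycles * 1e12 / ref_clock_hz  # window length in ps
    return window_ps / osc_cycles


def interval_ps(raw_count, osc_period_ps):
    """Convert a raw oscillator count into a time interval in ps."""
    return raw_count * osc_period_ps


# Hypothetical numbers: the same raw count maps to different intervals
# once the oscillator has slowed down under PVT drift.
nominal = estimate_osc_period_ps(osc_cycles=10_000, ref_cycles=1_000)
drifted = estimate_osc_period_ps(osc_cycles=9_500, ref_cycles=1_000)
print(interval_ps(50, nominal), interval_ps(50, drifted))
```

Re-estimating the period at run time means the conversion factor follows the oscillator as it drifts, instead of relying on a value fixed at design time.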
Loop pipelining is an essential technique in high-level synthesis (HLS) to increase the throughput and resource utilisation of FPGA-based accelerators. It relies on modulo schedulers to compute an operator schedule that allows subsequent loop iterations to overlap partially when executed, while still honouring all precedence and resource constraints. Modulo schedulers face a bi-criteria problem: minimise the initiation interval (II), i.e. the number of time steps after which new iterations are started, and minimise the schedule length. We present Moovac, a novel exact formulation that models all aspects (including the II minimisation) of the modulo scheduling problem as a single integer linear program (ILP), and discuss simple measures to prevent excessive runtimes, to challenge the old preconception that exact modulo scheduling is impractical. We substantiate this claim by conducting an experimental study covering 188 loops from two established HLS benchmark suites, four different time limits, and three bounds for the schedule length, to compare our approach against a highly-tuned exact formulation and a state-of-the-art heuristic algorithm. In the fastest configuration, an accumulated runtime of under 35 minutes is spent on scheduling all loops, and proven optimal IIs are found for 175 test instances.
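To make the constraints mentioned above concrete, the following sketch checks whether a given schedule is a valid modulo schedule for a candidate II. It is not the Moovac ILP; it only verifies the precedence and resource constraints that any modulo schedule must honour. The data layout, the function name, and the assumption that resource-limited operators are fully pipelined (each occupies its resource for one time step) are all hypothetical choices made for illustration.

```python
from collections import Counter


def is_valid_modulo_schedule(start, latency, resource, limits, edges, ii):
    """Check precedence and modulo resource constraints for interval `ii`.

    start:    op -> start time step
    latency:  op -> latency in time steps
    resource: op -> resource type (ops without an entry are unlimited)
    limits:   resource type -> number of available instances
    edges:    (src, dst, distance) with distance = iterations crossed
    """
    # Precedence: dst, taken `distance` iterations later, must not start
    # before src has finished.
    for src, dst, distance in edges:
        if start[src] + latency[src] > start[dst] + distance * ii:
            return False

    # Resources: ops whose start times are congruent modulo ii collide
    # once iterations overlap, so count usage per congruence class.
    usage = Counter()
    for op, t in start.items():
        res = resource.get(op)
        if res is not None:
            usage[(res, t % ii)] += 1
    return all(n <= limits[res] for (res, _), n in usage.items())


# Two loads share one memory port; the add consumes both results.
start = {"ld_a": 0, "ld_b": 1, "add": 2}
latency = {"ld_a": 1, "ld_b": 1, "add": 1}
resource = {"ld_a": "mem", "ld_b": "mem"}
limits = {"mem": 1}
edges = [("ld_a", "add", 0), ("ld_b", "add", 0)]
print(is_valid_modulo_schedule(start, latency, resource, limits, edges, 2))
print(is_valid_modulo_schedule(start, latency, resource, limits, edges, 1))
```

With a single memory port, II = 1 would place both loads in the same congruence class, so the smallest feasible II for this toy loop is 2.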
A rising trend is the use of multi-tenant FPGAs, particularly in cloud environments, where partial access to the hardware is given to multiple third parties. This leads to new types of attacks on FPGAs, which operate not only on the logic level, but also on the electrical level through the common power delivery network. Because FPGAs are configured from the software side, attackers can launch hardware attacks from software, impacting the security of an entire system. In this paper, we present a first attempt at a countermeasure against attacks on the electrical level, which is based on a bitstream checking methodology. Bitstreams are translated back into flat technology-mapped netlists, which are then checked for properties that indicate potential malicious runtime behaviour of FPGA logic. Our approach can provide a metric of potential risk of the FPGA bitstream being used in active fault or passive side-channel attacks against other users of the FPGA fabric or the entire SoC platform.
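The abstract above does not list the netlist properties that are checked. As one plausible example, assumed here rather than taken from the paper, a checker could flag combinational loops, since a loop with no register in it can form a ring oscillator. The sketch below uses a hypothetical netlist representation (a fan-out dictionary plus a set of sequential cells) and a standard depth-first search for cycles.

```python
def has_combinational_loop(fanout, sequential):
    """Return True if the netlist has a cycle through combinational cells only.

    fanout:     cell name -> iterable of cells driven by it (hypothetical format)
    sequential: set of cell names that are registers and break timing paths
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {}

    def comb_successors(cell):
        return [n for n in fanout.get(cell, ()) if n not in sequential]

    for root in fanout:
        if root in sequential or colour.get(root, WHITE) != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(comb_successors(root)))]
        while stack:
            cell, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                colour[cell] = BLACK
                stack.pop()
            elif colour.get(nxt, WHITE) == GREY:
                return True  # back edge: loop without any register
            elif colour.get(nxt, WHITE) == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(comb_successors(nxt))))
    return False


ring = {"inv1": ["inv2"], "inv2": ["inv3"], "inv3": ["inv1"]}
registered = {"lut": ["ff"], "ff": ["lut"]}
print(has_combinational_loop(ring, set()))
print(has_combinational_loop(registered, {"ff"}))
```

A real checker would need more than this single property, and a loop count alone is only a rough risk indicator.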
In complex FPGA designs, implementations of algorithms and protocols from third-party sources are common. However, the monolithic nature of FPGAs means that all sub-circuits share common on-chip infrastructure. This presents an attack vector for all FPGAs that contain designs from multiple vendors: hardware imperfections can be used to infer high-level state and break security guarantees. In this paper, we show that "long" routing wires present a new source of information leakage on FPGAs, by influencing the delay of adjacent long wires. We show that the effect is measurable for both static and dynamic signals, and that it can be detected using small on-board circuits. We characterize the channel in detail and show that it is measurable even when multiple competing circuits (including multiple long-wire transmitters) are present and can be replicated on different generations and families of Xilinx devices (Virtex 5, Virtex 6, Artix 7, and Spartan 7). We exploit the leakage to create a covert channel with 6 kbps bandwidth and 99.9% accuracy, and a side channel which can recover signals kept constant for only 128 cycles, with an accuracy of more than 98.4%. Finally, we propose countermeasures to reduce the impact of this information leakage.
We present a general automata processing framework on FPGAs, which can generate an RTL kernel for automata processing together with AXI- and PCIe-based I/O circuitry. We implement the framework on both local nodes and cloud platforms (Amazon AWS and Nimbix) with novel features. A full performance comparison of the framework is conducted against state-of-the-art automata processing engines on CPUs, GPUs, and Micron's Automata Processor using the ANMLZoo benchmark suite and some real-world datasets. Results show that FPGAs enable extremely high-throughput automata processing compared to von Neumann architectures. We also collect the resource utilization and power consumption on the two cloud platforms, and find that the I/O circuitry consumes most of the hardware resources and power. Furthermore, we propose a fast, symbol-only reconfiguration mechanism based on the framework for large pattern sets that cannot fit on one device and need to be partitioned. The proposed method supports multiple passes of the input stream and reduces the re-compilation cost from hours to seconds.
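As a rough software model of what such a kernel computes, the sketch below steps a homogeneous automaton (each state carries its own symbol set, in the style of ANML state transition elements) over an input stream. It is an illustrative assumption, not the framework's RTL; the function name, data layout, and the choice that start states are enabled on every input symbol are hypothetical.

```python
def run_automaton(symbol_sets, edges, start_states, report_states, stream):
    """Simulate a homogeneous NFA and return (offset, state) reports.

    symbol_sets:   state -> set of symbols the state matches
    edges:         state -> iterable of successor states
    start_states:  states enabled on every input symbol (assumption)
    report_states: set of states that emit a report when they match
    """
    reports = []
    active = set()
    for offset, symbol in enumerate(stream):
        # States enabled this cycle: start states plus successors of
        # the states that matched on the previous symbol.
        enabled = set(start_states)
        for state in active:
            enabled.update(edges.get(state, ()))
        active = {s for s in enabled if symbol in symbol_sets[s]}
        reports.extend((offset, s) for s in sorted(active & report_states))
    return reports


edges = {0: [1]}  # topology: state 0 feeds state 1
print(run_automaton({0: {"a"}, 1: {"b"}}, edges, {0}, {1}, "xabab"))
# Same topology, different symbol sets: only the per-state symbols change.
print(run_automaton({0: {"c"}, 1: {"d"}}, edges, {0}, {1}, "cd"))
```

The second call hints at why a symbol-only update can be cheap: the connectivity in `edges` stays fixed and only the per-state symbol sets are replaced. Whether the framework's mechanism works exactly this way is not stated in the abstract.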