Effectively estimating and managing congestion during placement can save substantial placement and routing runtime. In this paper, we present a machine-learning model for accurately and efficiently estimating congestion during FPGA placement. Compared with the state-of-the-art machine-learning congestion-estimation model, our results show a 25% improvement in prediction accuracy. This makes our model competitive with congestion estimates produced using a global router. However, our model runs, on average, 291x faster than the global router. Overall, we are able to reduce placement runtimes by 17% and router runtimes by 19%. An additional machine-learning model is also presented that uses the output of the first congestion-estimation model to determine whether or not a placement is routable. This second model has an accuracy in the range of 93% to 98%, depending on the classification algorithm used to implement the learning model, and runtimes of a few milliseconds, thus making it suitable for inclusion in any placer with no worry of additional computational overhead.
A rising trend is the use of multi-tenant FPGAs, particularly in cloud environments, where partial access to the hardware is given to multiple third parties. This leads to new types of attacks on FPGAs, which operate not only on the logic level, but also on the electrical level through the common power delivery network. Since FPGAs are configured from the software side, attackers can launch hardware attacks from software, impacting the security of an entire system. In this paper, we present a first attempt at a countermeasure against attacks on the electrical level, which is based on a bitstream checking methodology. Bitstreams are translated back into flat technology-mapped netlists, which are then checked for properties that indicate potential malicious runtime behaviour of FPGA logic. Our approach can provide a metric of potential risk of the FPGA bitstream being used in active fault or passive side-channel attacks against other users of the FPGA fabric or the entire SoC platform.
In this paper, we present a novel type of medium-grained reconfigurable architecture that we term the Field Programmable Operation Array (FPOA). This device has been designed specifically for the implementation of HLS-generated circuitry. At the core of the FPOA is the OP-block. Unlike a standard LUT, an OP-block performs multi-bit operations through gate-based logic structures, translating into greater speed and efficiency in digital circuit implementation. Our device is not optimized for a specific application domain. Rather, we have created a device that is optimized for a specific class of circuit structures, namely those generated by HLS. This gives the FPOA a significant advantage as it can be used across all application domains. In this work, we add support for both distributed and block memory to the FPOA architecture. Experimental results show up to a 13.5x reduction in logic area and a 9.5x reduction in critical path delay for circuit implementation using the FPOA compared to a standard FPGA.
In complex FPGA designs, implementations of algorithms and protocols from third-party sources are common. However, the monolithic nature of FPGAs means that all sub-circuits share common on-chip infrastructure. This presents an attack vector for all FPGAs that contain designs from multiple vendors: hardware imperfections can be used to infer high-level state and break security guarantees. In this paper, we show that "long" routing wires present a new source of information leakage on FPGAs, by influencing the delay of adjacent long wires. We show that the effect is measurable for both static and dynamic signals, and that it can be detected using small on-board circuits. We characterize the channel in detail and show that it is measurable even when multiple competing circuits (including multiple long-wire transmitters) are present and can be replicated on different generations and families of Xilinx devices (Virtex 5, Virtex 6, Artix 7, and Spartan 7). We exploit the leakage to create a covert channel with 6 kbps bandwidth and 99.9% accuracy, and a side channel which can recover signals kept constant for only 128 cycles, with an accuracy of more than 98.4%. Finally, we propose countermeasures to reduce the impact of this information leakage.
Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing, previously utilized the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism. We show how BISMO can be scaled up on Xilinx FPGAs using an arithmetic architecture that better utilizes 6-LUTs. The improved BISMO achieves a peak performance of 15.4 binary TOPS on the Ultra96 board with a Xilinx UltraScale+ MPSoC.
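The bit-serial idea referred to above can be illustrated in software. The following is a minimal sketch, assuming unsigned integer operands, and is not BISMO's hardware architecture: each operand is split into bit planes, every pair of planes is multiplied as a binary (0/1) matrix using AND and a population count, and the partial products are summed with power-of-two weights. All function names here are hypothetical and chosen for illustration only.

```python
def bit_plane(matrix, bit):
    """Extract one bit position from every entry, giving a 0/1 matrix."""
    return [[(x >> bit) & 1 for x in row] for row in matrix]


def binary_matmul(a, b):
    """Multiply two 0/1 matrices; each dot product is an AND plus a popcount."""
    cols = list(zip(*b))
    return [[sum(x & y for x, y in zip(row, col)) for col in cols] for row in a]


def bit_serial_matmul(a, b, a_bits, b_bits):
    """Unsigned integer matmul built only from binary matmuls.

    Assumes every entry of `a` fits in `a_bits` bits and every entry of `b`
    fits in `b_bits` bits (an assumption of this sketch, not of the paper).
    """
    rows, cols = len(a), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(a_bits):
        a_i = bit_plane(a, i)
        for j in range(b_bits):
            partial = binary_matmul(a_i, bit_plane(b, j))
            # Weight the binary partial product by 2^(i + j) and accumulate.
            for r in range(rows):
                for c in range(cols):
                    result[r][c] += partial[r][c] << (i + j)
    return result


def reference_matmul(a, b):
    """Plain integer matmul, used only to check the bit-serial result."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
```

In this sketch the number of binary matrix products is `a_bits * b_bits`, which is one way to see why the cost of a bit-serial scheme scales with the precision actually required rather than with a fixed word width.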
Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given problem. In particular, the computational demands and I/O of many tasks in machine learning often require a cluster of accelerators to make a relevant difference in performance. In this paper, we explore the efficient construction of FPGA clusters using inference over decision tree ensembles as the target application. The paper explores several levels of the problem: 1) a lightweight inter-FPGA communication protocol and routing layer to facilitate the communication between the different FPGAs; 2) the data partitioning and distribution strategies maximizing performance; and 3) an in-depth analysis of how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at a significantly higher throughput than that achieved by existing systems.
Using security primitives, a novel scheme for licensing hardware intellectual properties (HWIPs) on FPGAs in public clouds is proposed. The proposed scheme enforces a pay-per-use model, allows HWIP's installation only on specific on-cloud FPGAs, and efficiently protects the HWIPs from being cloned, reverse engineered, or used without the owner's authorization by any party including a cloud insider. It also provides protection for the users' designs integrated with the HWIP on the same FPGA. This enables cloud tenants to license HWIPs in the cloud from the HWIP vendors at a relatively low price based on usage instead of paying the expensive unlimited HWIP license fee. The scheme includes a protocol for FPGA authentication, HWIP secure decryption and usage by the clients without the need for the HWIP vendor to be involved or divulge their master keys. A complete prototype test-bed implementation showed that the proposed scheme is very feasible with relatively low resource utilization. Experiments also showed that a HWIP could be licensed and set up in the on-cloud FPGA in 0.9 s. This is 15 times faster than setting up the same HWIP from outside the cloud, which takes about 14 s based on the average global Internet speed.
With the increasing popularity of chips in various applications, integrated circuit (IC) security issues not only cause huge economic losses for the whole country but also pose enormous threats to national defense security. A field-programmable gate array (FPGA) is a kind of programmable chip that is widely applied in areas such as automotive electronics, military, consumer electronics, and medical services. Unlike an application-specific integrated circuit (ASIC), an FPGA is configured by a bitstream, which is essentially a binary file and leaves the device more vulnerable to cloning, reverse engineering, bitstream replay attacks, and similar threats. This survey reviews the security and trust issues related to FPGA-based systems from the market perspective. For each party involved in FPGA supply and demand, we present the security and trust problems they need to be aware of, together with the associated solutions.