Accelerating FPGA Developments from C to Bitstreams by Partial Reconfiguration

Degree type

Doctor of Philosophy (PhD)

Graduate group

Electrical and Systems Engineering

Discipline

Electrical Engineering

Subject

Compile Time
FPGA
Latency Insensitive
Partial Reconfiguration
Streaming

Copyright date

2023

Abstract

Divide-and-conquer and incremental compilation strategies are widely used in software compilation. Divide-and-conquer means that separate source files are compiled independently, by multiple threads, into object files, which are then linked into an executable; incremental compilation means that the tools only need to recompile the modified source files and quickly relink the object files. To enable these strategies for FPGAs, this dissertation presents an open-source framework called PRflow, which can reduce compile times by an order of magnitude. PRflow supports different optimization levels to offer trade-offs among compile time, area, and performance. -O0 (PRflow RISCV) maps applications to a cluster of on-chip RISC-V cores within seconds for quick verification and debugging. -O1 (PRflow) compiles the separate parts of an application into partial FPGA bitstreams for different partially reconfigurable regions on the chip. The separate parts can be compiled in parallel within 24 minutes. The interconnections between them are set up by sending configuration packets to a network-on-a-chip (NoC), without re-routing physical wires. -O2 (PRflow DW) supports interconnect customization with a fixed-page-size overlay on top of a commercial FPGA to meet high inter-page bandwidth requirements, which can improve performance by up to 10× compared with -O1. -O3 (PRflow HiPR) supports overlay customization for arbitrary inter-page throughput and various page-size requirements, with incremental compile times similar to those of -O1 and -O2. HiPR extracts the interconnect information among separate sub-functions and generates a customized overlay with the PR regions defined. Users can then perform quick incremental compilation of individual sub-functions at the cost of an acceptable one-time overlay compilation overhead.
-O3 compiles applications with the most aggressive optimization strategies, similar to commercial tools. We demonstrate the PRflow framework on the Xilinx Alveo-U50 data-center card with an xcu50-fsvh2104-2-e FPGA chip (16 nm FinFET) by mapping the complete Rosetta HLS benchmark set. PRflow reduces compile times from 2–3 hours (state-of-the-art Vitis) to 10–24 minutes. We expect PR-based compilation, as in PRflow, to become an important compilation strategy as the increasing scale of FPGAs greatly slows down compile times.
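
The divide-and-conquer and incremental strategies described in the abstract can be illustrated with a minimal Python sketch. It is not PRflow's code or interface: `compile_page`, `incremental_build`, `link_packets`, the cache layout, and the packet format are all hypothetical names chosen for illustration, and the per-page compile is a stand-in that only returns a label. The sketch assumes each separately compiled part is identified by a hash of its source, that stale parts are rebuilt in parallel, and that "linking" amounts to emitting source-to-destination configuration packets for a NoC.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fingerprint(source: str) -> str:
    """Hash a part's source text so that changes can be detected."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def compile_page(name: str, source: str) -> str:
    """Hypothetical stand-in for compiling one part to a partial bitstream.

    A real flow would run synthesis, placement, and routing here; this
    stub only returns a label derived from the source hash.
    """
    return f"{name}.partial.{fingerprint(source)[:8]}"


def incremental_build(sources: dict, cache: dict):
    """Recompile only the parts whose source changed; reuse the rest.

    `cache` maps part name -> (fingerprint, artifact) and is updated in place.
    Returns (artifacts for every part, sorted names of rebuilt parts).
    """
    stale = [
        name for name, src in sources.items()
        if cache.get(name, (None, None))[0] != fingerprint(src)
    ]
    # Divide and conquer: stale parts are independent, so build them in parallel.
    with ThreadPoolExecutor() as pool:
        built = list(pool.map(lambda n: compile_page(n, sources[n]), stale))
    for name, artifact in zip(stale, built):
        cache[name] = (fingerprint(sources[name]), artifact)
    artifacts = {name: cache[name][1] for name in sources}
    return artifacts, sorted(stale)


def link_packets(edges, page_of: dict):
    """'Link' parts by listing (source page, destination page) packets.

    This mirrors the idea of configuring a NoC instead of re-routing wires;
    the tuple format is an assumption made for this sketch.
    """
    return [(page_of[src], page_of[dst]) for src, dst in edges]
```

In this sketch, editing one part and calling `incremental_build` again with the same cache rebuilds only that part, while the artifacts of untouched parts are reused unchanged.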

Date of degree

2023
