As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route to higher performance through parallel processing. The rise of multi-core systems in all domains of computing has opened the door to heterogeneous multi-processors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute-intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is currently not a push-button task. Often the programmer has to expose the application’s fine- and coarse-grained parallelism by using special APIs. CUDA is such a parallel-computing API that is driven by the GPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
Alexandros Papakonstantinou, Electrical & Computer Eng. Dept., University of Illinois, Urbana-Champaign
Karthik Gururaj, Computer Science Dept., University of California, Los Angeles
John A. Stratton, Deming Chen, Electrical & Computer Eng. Dept., University of Illinois, Urbana-Champaign
Jason Cong, Computer Science Dept., University of California, Los Angeles
Wen-Mei W. Hwu, Electrical & Computer Eng. Dept., University of Illinois, Urbana-Champaign
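To make the source-to-source idea in the abstract concrete, the following is a minimal sketch of how an SPMD CUDA kernel might be rewritten as plain C with explicit loops over threads and thread blocks. It is an illustration under stated assumptions, not the code FCUDA actually emits: the kernel (`vec_add`), the function names `vec_add_block` and `vec_add_grid`, and the loop structure are all hypothetical, and any AutoPilot-specific annotations are omitted.

```c
#include <assert.h>

/* CUDA original, shown for reference only (not compiled here):
 *
 *   __global__ void vec_add(const float *a, const float *b,
 *                           float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block (hypothetical name). The implicit CUDA threads
 * become an explicit loop. Its iterations are independent, which is
 * the fine-grained parallelism a high-level synthesis tool could
 * exploit, for example by unrolling or pipelining the loop. */
static void vec_add_block(const float *a, const float *b, float *c,
                          int n, int block_idx, int block_dim)
{
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n)              /* same bounds guard as the kernel */
            c[i] = a[i] + b[i];
    }
}

/* The whole grid (hypothetical name). Thread blocks are independent
 * of each other, which is the coarse-grained parallelism that could
 * be mapped to several concurrent cores. Here the blocks simply run
 * one after another. */
void vec_add_grid(const float *a, const float *b, float *c,
                  int n, int grid_dim, int block_dim)
{
    assert(grid_dim > 0 && block_dim > 0);
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx)
        vec_add_block(a, b, c, n, block_idx, block_dim);
}
```

Calling `vec_add_grid(a, b, c, n, grid_dim, block_dim)` with `grid_dim * block_dim >= n` computes the same result as launching the CUDA kernel with that grid and block size. The sketch runs sequentially; the point is only that both levels of parallelism remain visible as independent loop iterations.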