CUDA Papers

A collection of research papers and projects utilizing CUDA technology

Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architecture


Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. We show that the automatic transformations can result in significant speedup compared to the original code using two benchmark applications.


Xiao-Long Wu, Nady Obeid, Wen-Mei Hwu, University of Illinois at Urbana-Champaign
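
The abstract's central idea, isolating a reduction from the code that uses its intermediate results, can be illustrated with a small sketch. This is not the authors' code or their transformation; it is a hypothetical plain C++ example running on the CPU, and the function names `fused` and `split` and the computation itself are assumptions chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Original form (hypothetical): the running sum is a reduction, and each
// iteration also consumes the intermediate sum. The two are interleaved,
// so every iteration depends on the one before it.
std::vector<float> fused(const std::vector<float>& a) {
    std::vector<float> out(a.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i];          // reduction step
        out[i] = sum * a[i];  // user of the intermediate result
    }
    return out;
}

// Transformed form (hypothetical): the reduction is isolated in its own
// loop, which stores every intermediate value. The second loop then has
// no loop-carried dependence.
std::vector<float> split(const std::vector<float>& a) {
    std::vector<float> partial(a.size());
    std::vector<float> out(a.size());

    // Loop 1: reduction only, keeping each intermediate sum.
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i];
        partial[i] = sum;
    }

    // Loop 2: users of the intermediate results; iterations are independent.
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = partial[i] * a[i];
    }
    return out;
}
```

Once separated, the first loop is a plain running sum that could in principle be handled by a parallel scan, and the second loop is the kind of independent per-element work that could be mapped to one GPU thread per element. How the paper automates this kind of separation is not described in the abstract.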
