CUDA Papers

A collection of research papers and projects utilizing CUDA technology

Program Optimization Strategies for Data-Parallel Many-Core Processors

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/report/phd-thesis-shane-ryoo.pdf

Abstract

Program optimization for highly parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. With the introduction of inexpensive, single-chip, massively parallel platforms, more developers will be creating highly data-parallel applications for these platforms while lacking the substantial experience and knowledge needed to maximize application performance. In addition, hand-optimization even by motivated and informed developers takes a significant amount of time and generally still underutilizes the performance of the hardware by double-digit percentages. This creates a need for structured and automatable optimization techniques that are capable of finding a near-optimal program configuration for this new class of architecture.

My work discusses various strategies for optimizing programs on a highly data-parallel architecture with fine-grained sharing of resources. I first investigate useful strategies in optimizing a suite of applications. I then introduce program optimization carving, an approach that discovers high-performance application configurations for data-parallel, many-core architectures. Instead of applying a particular phase ordering of optimizations, it starts with an optimization space of major transformations and then reduces the space by examining the static code and pruning configurations that do not maximize desirable qualities in isolation or combination. Careful selection of pruning criteria for applications running on the NVIDIA GeForce 8800 GTX reduces the optimization space by as much as 98% while finding configurations within 1% of the best performance. Random sampling, in contrast, can require nearly five times as many configurations to find performance within 10% of the best. I also examine the technique’s effectiveness when varying pruning criteria.
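The pruning idea in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the transformation names (`TILE_SIZES`, `UNROLL_FACTORS`, `PREFETCH`), the two static scores, and every constant in `static_metrics` are assumptions invented for this example. It enumerates a space of configurations, scores each one from made-up static properties without running anything, and keeps only the configurations that no other configuration beats on both scores, which is one plausible reading of "maximize desirable qualities in isolation or combination".

```python
from itertools import product

# Hypothetical optimization space: every name and number here is illustrative.
TILE_SIZES = [8, 16]
UNROLL_FACTORS = [1, 2, 4]
PREFETCH = [False, True]


def static_metrics(tile, unroll, prefetch):
    """Return two made-up static scores (higher is better) for a configuration.

    These formulas are placeholders, not the metrics used in the thesis.
    """
    # Stand-in for work done per instruction: unrolling trims loop overhead.
    efficiency = unroll / (unroll + 1)
    # Stand-in for how busy the hardware stays: register use limits
    # how many thread blocks can be resident at once.
    regs_per_thread = 8 + 2 * unroll + (4 if prefetch else 0)
    threads_per_block = tile * tile
    resident_blocks = min(8, 8192 // (regs_per_thread * threads_per_block))
    utilization = resident_blocks * threads_per_block
    return efficiency, utilization


def dominates(a, b):
    """True if a is at least as good as b on every score and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def carve(space):
    """Keep only configurations whose scores no other configuration dominates."""
    scored = {cfg: static_metrics(*cfg) for cfg in space}
    return [
        cfg for cfg, m in scored.items()
        if not any(dominates(other, m) for other in scored.values())
    ]


space = list(product(TILE_SIZES, UNROLL_FACTORS, PREFETCH))
survivors = carve(space)
print(f"{len(survivors)} of {len(space)} configurations survive pruning")
```

In this toy space the survivors are the configurations that score highest on one metric or offer the best trade-off between the two; only those would remain as candidates for further evaluation. How well such pruning tracks real performance depends entirely on the choice of metrics, which is the question the abstract says the thesis examines.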

Authors

Shane Ryoo, University of Illinois at Urbana-Champaign

GPU Optimization
