An Adaptive Performance Modeling Tool for GPU Architectures
November 24, 2010
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the more promising implementations. It can also be incorporated into a tool to help programmers better assess the performance bottlenecks in their code. We analyze each GPU kernel and identify how the kernel exercises major GPU microarchitecture features. To identify the performance bottlenecks accurately, we introduce an abstract interpretation of a GPU kernel, work flow graph, based on which we estimate the execution time of a GPU kernel. We validated our performance model on the NVIDIA GPUs using CUDA (Compute Unified Device Architecture). For this purpose, we used data parallel benchmarks that stress different GPU microarchitecture events such as uncoalesced memory accesses, scratch-pad memory bank conflicts, and control flow divergence, which must be accurately modeled but represent challenges to the analytical performance models. The proposed model captures full system complexity and shows high accuracy in predicting the performance trends of different optimized kernel implementations. We also describe our approach to extracting the performance model automatically from a kernel code.
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu, University of Illinois at Urbana-Champaign
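
The three microarchitecture events named in the abstract (uncoalesced memory accesses, scratch-pad bank conflicts, and control flow divergence) can be illustrated with a toy per-warp counter. The sketch below is not the paper's work flow graph model; it is a minimal illustration under assumed parameters. The warp size of 32, the 128-byte memory segment, the 32 scratch-pad banks, and all function names are assumptions chosen for the example, and the real values vary by GPU generation.

```python
# Toy per-warp counters for three GPU performance events.
# All constants and function names are illustrative assumptions,
# not values or APIs taken from the paper.

WARP_SIZE = 32  # assumed number of threads per warp


def memory_transactions(stride_words, word_bytes=4, segment_bytes=128, base=0):
    """Count distinct memory segments one warp touches.

    Thread `tid` reads the word at base + tid * stride_words * word_bytes.
    One segment means a fully coalesced access; more segments mean
    the access is split into more transactions.
    """
    addresses = [base + tid * stride_words * word_bytes for tid in range(WARP_SIZE)]
    return len({addr // segment_bytes for addr in addresses})


def bank_conflict_degree(stride_words, num_banks=32):
    """Return the worst-case number of distinct words mapped to one bank.

    Thread `tid` reads scratch-pad word tid * stride_words. Threads that
    read the same word are assumed to be served by a single broadcast,
    so only distinct words count toward the conflict degree.
    """
    words_per_bank = {}
    for tid in range(WARP_SIZE):
        word = tid * stride_words
        words_per_bank.setdefault(word % num_banks, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


def divergent_paths(branch_taken):
    """Count the distinct branch outcomes within one warp.

    `branch_taken` maps a thread index to a truthy or falsy value.
    A result of 2 means both sides of the branch must be executed serially.
    """
    return len({bool(branch_taken(tid)) for tid in range(WARP_SIZE)})


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print("stride", stride,
              "transactions", memory_transactions(stride),
              "bank conflict degree", bank_conflict_degree(stride))
    print("paths for tid % 2 == 0:", divergent_paths(lambda tid: tid % 2 == 0))
```

Under these assumptions, a unit stride touches one segment and causes no bank conflicts, while a stride of 32 words touches 32 segments and serializes all 32 scratch-pad accesses. An analytical model of the kind the abstract describes would need to turn such counts into time estimates, which this sketch does not attempt.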