CUDA Papers

A collection of research papers and projects utilizing CUDA technology

Data Layout Transformation for Structured-Grid Codes on GPU

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/workshop/sung.pdf Abstract We present data layout transformation as an effective performance optimization for memory-bound structured-grid applications for GPUs. Structured-grid applications are a class of applications that compute grid cell values on a regular 2D, 3D or higher-dimensional grid. Each output point is computed as a function of itself and its nearest neighbors. Stencil code is an instance of […]

Categories: GPU Optimization
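
To make the layout idea concrete, here is a minimal sketch, assuming a 3D grid whose cells each store nf fields and one GPU thread per cell along x. It contrasts two generic layouts (array-of-structures and structure-of-arrays) rather than the specific transformation proposed in the paper; all function names are hypothetical.

```python
def aos_index(x, y, z, f, nx, ny, nf):
    """Array-of-structures: all nf fields of one cell are adjacent."""
    cell = (z * ny + y) * nx + x
    return cell * nf + f


def soa_index(x, y, z, f, nx, ny, nz):
    """Structure-of-arrays: each field is its own contiguous 3D array."""
    cell = (z * ny + y) * nx + x
    return f * (nx * ny * nz) + cell


def x_stride(index_fn, *dims):
    """Element distance between the same field of x-neighboring cells,
    i.e. between the addresses read by adjacent threads along x."""
    return index_fn(1, 0, 0, 0, *dims) - index_fn(0, 0, 0, 0, *dims)


def aos_to_soa(buf, nx, ny, nz, nf):
    """Copy a flat AoS buffer into SoA order (the layout transformation)."""
    out = [None] * len(buf)
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                for f in range(nf):
                    out[soa_index(x, y, z, f, nx, ny, nz)] = \
                        buf[aos_index(x, y, z, f, nx, ny, nf)]
    return out
```

With AoS, `x_stride(aos_index, nx, ny, nf)` is `nf`, so adjacent threads touch addresses nf elements apart; with SoA, `x_stride(soa_index, nx, ny, nz)` is 1, the unit-stride pattern that GPU memory systems can coalesce.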

An Adaptive Performance Modeling Tool for GPU Architectures

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/conference/sara.pdf Abstract This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the more promising implementations. It can also be incorporated into a tool to help programmers better assess the performance bottlenecks […]

Categories: GPU Optimization · Tags: Analytical model, Multiple Data Stream Architectures, Parallel programming, Performance estimation
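
The paper's analytical model is not reproduced here. As a much simpler stand-in, the sketch below uses a roofline-style lower bound (a different, well-known technique) to show how a predicted runtime can prune an auto-tuner's search space. The `Candidate` type, its fields, and any peak throughput numbers passed in are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    flops: float        # floating-point operations the variant performs
    bytes_moved: float  # bytes it reads from and writes to DRAM


def roofline_time(c, peak_flops, peak_bw):
    """Lower-bound runtime in seconds: a kernel cannot finish faster than
    its compute time or its memory-transfer time, whichever is larger."""
    return max(c.flops / peak_flops, c.bytes_moved / peak_bw)


def prune(candidates, peak_flops, peak_bw, keep):
    """Return the `keep` variants with the smallest predicted time, so the
    auto-tuner only has to benchmark those."""
    ranked = sorted(candidates,
                    key=lambda c: roofline_time(c, peak_flops, peak_bw))
    return ranked[:keep]
```

A bound like this ignores latency, occupancy and instruction mix, which is exactly the kind of detail a dedicated GPU performance model aims to capture.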

Accelerating Iterative Field-Compensated MR Image Reconstruction on GPUs

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/conference/isbi2010.pdf Abstract We propose a fast implementation for iterative MR image reconstruction using Graphics Processing Units (GPUs). In MRI, iterative reconstruction with conjugate gradient algorithms allows for accurate modeling of the physics of the imaging system. Specifically, methods have been reported to compensate for the magnetic field inhomogeneity induced by the susceptibility differences near the air/tissue […]

Categories: Image Reconstruction, MRI · Tags: Conjugate Gradient, Field inhomogeneity, Iterative reconstruction, MRI
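
For readers new to the method, below is a minimal real-valued conjugate gradient solver on a small explicit symmetric positive-definite matrix, assuming dense Python lists. In MR reconstruction the system is typically the normal equations of a complex-valued forward model that is never formed explicitly, so this is only a sketch of the iteration whose matrix-vector products a GPU implementation accelerates; the helper names are illustrative.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A, starting at x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x (x is zero initially)
    p = list(r)          # first search direction is the residual
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)                # the costly step on large problems
        alpha = rs / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs               # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges to approximately `[1/11, 7/11]`.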

Multi-GPU Implementation for Iterative MR Image Reconstruction with Field Correction

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/conference/ismrm2010.pdf Abstract Many advanced MRI acquisition and reconstruction methods see limited application because of their high computational cost. For instance, iterative reconstruction algorithms (e.g. for non-Cartesian k-space trajectories or magnetic field inhomogeneity compensation) can improve image quality but suffer from low reconstruction speed. General-purpose computing on graphics processing units (GPUs) has demonstrated significant performance speedups and cost […]

Categories: Image Reconstruction, Medical, MRI, Multi-GPU · Tags: Image Reconstruction

Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architecture

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/conference/fgc2010.pdf Abstract Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a […]

Categories: Algorithms, GPU Optimization, Reduction · Tags: reduction
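
The paper targets reductions inside more complex code structures; the sketch below only shows the basic building block such work parallelizes: a tree reduction over an associative and commutative operator, modeled sequentially in Python. The function name and the padding-to-a-power-of-two strategy are illustrative assumptions, not the paper's method.

```python
import operator


def tree_reduce(values, op=operator.add, identity=0):
    """Sequential model of a GPU-style tree reduction.

    In each step, "thread" i < stride combines buf[i] with buf[i + stride],
    then the stride halves, so n values need ceil(log2(n)) steps. The
    pairing reorders operands, so `op` must be associative AND commutative.
    """
    if not values:
        return identity
    # Pad to a power of two with the identity, as a block-level kernel might.
    size = 1
    while size < len(values):
        size *= 2
    buf = list(values) + [identity] * (size - len(values))
    stride = size // 2
    while stride > 0:
        # Every i < stride only reads slots >= stride, which this step never
        # writes, so this loop matches a parallel execution of the step.
        for i in range(stride):
            buf[i] = op(buf[i], buf[i + stride])
        stride //= 2
    return buf[0]
```

For example, `tree_reduce([3, 7, 2], max, float("-inf"))` returns 7, and the default operator sums its input.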

Sparse regularization in MRI iterative reconstruction using GPUs

Posted by admin on November 24, 2010

http://impact.crhc.illinois.edu/ftp/conference/xiaolong-2010.pdf Abstract Regularization is a common technique used to improve image quality in inverse problems such as MR image reconstruction. In this work, we extend our previous Graphics Processing Unit (GPU) implementation of MR image reconstruction with compensation for susceptibility-induced field inhomogeneity effects by incorporating an additional quadratic regularization term. Regularization techniques commonly impose the prior information that MR images are relatively […]

Categories: Image Reconstruction, Medical, MRI · Tags: Medical, MRI, Reconstruction
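
A standard way to write a quadratically regularized reconstruction is shown below. The notation is ours, not necessarily the paper's: F is the forward model (assumed to include the field-inhomogeneity term), d the measured k-space data, R a regularization operator (for example a finite-difference operator), and λ > 0 a weight.

```latex
\hat{x} = \arg\min_{x}\; J(x), \qquad
J(x) = \lVert F x - d \rVert_2^2 + \lambda \lVert R x \rVert_2^2

\frac{\partial J}{\partial x^{*}} = F^{H}(F x - d) + \lambda R^{H} R x = 0
\quad\Longrightarrow\quad
\left(F^{H} F + \lambda R^{H} R\right)\hat{x} = F^{H} d
```

The matrix on the left is Hermitian positive semidefinite (positive definite when the null spaces of F and R intersect only at zero), so the system can still be solved with conjugate gradient; the regularizer adds only products with R and R^H in each iteration.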

Benchmarking GPUs to Tune Dense Linear Algebra

Posted by admin on November 23, 2010

http://portal.acm.org/ft_gateway.cfm?id=1413402&type=pdf&doid2=1413370.1413402 http://www2.computer.org/portal/c/document_library/get_file?folderId=97697&name=DLFE-3337.pdf Abstract We present performance results for dense linear algebra using the 8-series NVIDIA GPUs. Our GEMM routine runs 60% faster than the vendor implementation and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80-90% of the peak GEMM rate. Our parallel LU running on two GPUs […]

Categories: Algorithms, Linear Algebra · Tags: Factorization, GEMM, Linear Algebra
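
As background for the GEMM result, the sketch below computes C = A·B tile by tile, assuming row-major nested lists and a hypothetical `tile` parameter. It illustrates generic tiling, not necessarily the blocking scheme used in the paper: on a GPU each output tile would typically map to a thread block, with the A and B sub-tiles of the `kk` loop staged in fast on-chip memory and reused across the tile.

```python
def matmul_tiled(A, B, tile=2):
    """Return C = A @ B for nested lists, accumulating one
    tile x tile block of C at a time (edge tiles may be smaller)."""
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ti in range(0, n, tile):            # output tile rows
        for tj in range(0, m, tile):        # output tile columns
            for kk in range(0, k, tile):    # walk the shared dimension
                # Multiply the A[ti.., kk..] tile by the B[kk.., tj..] tile.
                for i in range(ti, min(ti + tile, n)):
                    for j in range(tj, min(tj + tile, m)):
                        acc = C[i][j]
                        for p in range(kk, min(kk + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C
```

The result is independent of `tile`; only the order of memory accesses changes, which is what tiling exploits.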

High Performance Discrete Fourier Transforms on Graphics Processors

Posted by admin on November 23, 2010

http://portal.acm.org/ft_gateway.cfm?id=1413373&type=pdf&doid2=1413370.1413373 http://www2.computer.org/portal/c/document_library/get_file?folderId=97697&name=DLFE-3346.pdf Abstract We present novel algorithms for computing Fourier transforms with high performance on GPUs. We present hierarchical, mixed radix FFT algorithms for both power-of-two and non-power-of-two sizes. Our hierarchical FFT algorithms efficiently exploit shared memory on GPUs using a Stockham formulation. We reduce the memory transpose overheads in hierarchical algorithms by combining the […]

Categories: FFT · Tags: Discrete FFT
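
The paper's algorithms are hierarchical and mixed-radix. As a much smaller illustration of the Stockham formulation it mentions, here is a radix-2 Stockham FFT, assuming power-of-two lengths. It shows the indexing idea only and is not the paper's GPU kernel.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham FFT (decimation in frequency).

    Each pass reads one buffer and writes the other, so the result comes
    out in natural order without a separate bit-reversal permutation.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in x]
    dst = [0j] * n
    m, s = n, 1                    # current sub-transform length and stride
    while m > 1:
        half = m // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / m)   # twiddle factor
            for q in range(s):
                a = src[q + s * p]
                b = src[q + s * (p + half)]
                dst[q + s * (2 * p)] = a + b
                dst[q + s * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src        # ping-pong between the two buffers
        m, s = half, 2 * s
    return src
```

The ping-pong buffers are what make the Stockham form attractive on GPUs: every pass has a regular, permutation-free access pattern.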

Bandwidth Intensive 3-D FFT kernel for GPUs using CUDA

Posted by admin on November 23, 2010

http://portal.acm.org/ft_gateway.cfm?id=1413376&type=pdf&doid2=1413370.1413376 http://www2.computer.org/portal/c/document_library/get_file?folderId=97697&name=DLFE-3317.pdf Abstract Most GPU performance “hypes” have focused around tightly-coupled applications with small memory bandwidth requirements e.g., N-body, but GPUs are also commodity vector machines sporting substantial memory bandwidth; however, effective programming methodologies thereof have been poorly studied. Our new 3-D FFT kernel, written in NVidia CUDA, achieves nearly 80 GFLOPS on a top-end […]

Categories: FFT · Tags: 3D, FFT
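
A 3-D DFT is separable into 1-D transforms along each axis. The sketch below shows that decomposition on a nested list indexed `a[z][y][x]`, assuming a naive 1-D DFT as a stand-in for a real FFT. Each pass reads and writes the whole volume, which is consistent with the abstract's framing of the 3-D FFT as bandwidth-intensive; the code is illustrative, not the paper's kernel.

```python
import cmath


def dft1(v):
    """Naive O(n^2) 1-D DFT; stands in for any 1-D FFT."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft3d(a):
    """3-D DFT of a[z][y][x] as three passes of 1-D transforms."""
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    # Pass 1: transform along x (contiguous rows); builds new lists.
    a = [[dft1(row) for row in plane] for plane in a]
    # Pass 2: transform along y (strided by nx in a flat layout).
    for z in range(nz):
        for x in range(nx):
            col = dft1([a[z][y][x] for y in range(ny)])
            for y in range(ny):
                a[z][y][x] = col[y]
    # Pass 3: transform along z (strided by nx * ny in a flat layout).
    for y in range(ny):
        for x in range(nx):
            col = dft1([a[z][y][x] for z in range(nz)])
            for z in range(nz):
                a[z][y][x] = col[z]
    return a
```

Passes 2 and 3 access memory with large strides in a flat array, which is why GPU 3-D FFT implementations pay close attention to memory access patterns.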

Designing Efficient Sorting Algorithms for Manycore GPUs

Posted by admin on November 23, 2010

http://mgarland.org/files/papers/nvr-2008-001.pdf Abstract We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 […]

Categories: Algorithms, Sorting · Tags: Sorting
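
For background on radix sort, here is the textbook scan-based LSD radix sort, assuming non-negative integer keys below 2**32 and 4-bit digits (both illustrative defaults). It is a sequential sketch of the histogram, prefix-sum and scatter steps, not the paper's tuned GPU implementation.

```python
def radix_sort(keys, key_bits=32, digit_bits=4):
    """LSD radix sort of non-negative integers < 2**key_bits.

    Each pass (1) histograms the current digit, (2) exclusive-scans the
    histogram into per-digit starting offsets, and (3) scatters keys
    stably to those offsets. On a GPU, (1) and (3) would typically run per
    thread block and (2) would be a parallel prefix sum.
    """
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum: offsets[d] = number of keys with digit < d.
        offsets, running = [0] * radix, 0
        for d in range(radix):
            offsets[d], running = running, running + counts[d]
        out = [0] * len(keys)
        for k in keys:             # stable scatter keeps earlier passes' order
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

Stability of the scatter is what lets later (more significant) digit passes preserve the ordering established by earlier ones.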
