CUDA Papers

A collection of research papers and projects utilizing CUDA technology

CUDASA: Compute Unified Device and Systems Architecture

http://www.vis.uni-stuttgart.de/~dachsbcn/download/egpgv08.pdf

Abstract

We present an extension to the CUDA programming language which extends parallelism to multi-GPU systems and GPU-cluster environments. Following the existing model, which exposes the internal parallelism of GPUs, our extended programming language provides a consistent development interface for additional, higher levels of parallel abstraction from the bus and network interconnects. The newly introduced layers provide the key features specific to the architecture and programmability of current graphics hardware while the underlying communication and scheduling mechanisms are completely hidden from the user. All extensions to the original programming language are handled by a self-contained compiler which is easily embedded into the CUDA compile process. We evaluate our system using two different sample applications and discuss scaling behavior and performance on different system architectures.

Authors

M. Strengert, C. Müller, C. Dachsbacher, and T. Ertl
Visualization Research Center (VISUS), University of Stuttgart

Accelerating Monte Carlo Simulations with an NVIDIA Graphics Processor

http://researchcommons.waikato.ac.nz/bitstream/10289/2682/1/Kunnemeyer%20Accelerating.pdf

Abstract

Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA® 8800GT graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectory of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer.
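
The photon-following idea in the abstract can be illustrated with a minimal single-threaded sketch. This is not the authors' CUDA implementation: it is written in Python for readability, and it assumes a homogeneous slab, isotropic scattering, and analog absorption (each interaction either absorbs or scatters the photon). The function name simulate_slab and the parameters mu_a, mu_s, and thickness are hypothetical names chosen for this example. Each photon is followed independently of the others, which is why this kind of loop maps naturally onto one GPU thread per photon.

```python
import math
import random


def simulate_slab(n_photons, mu_a, mu_s, thickness, seed=0):
    """Follow n_photons through a homogeneous slab and count their fates.

    mu_a and mu_s are absorption and scattering coefficients (1/length);
    all names here are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    mu_t = mu_a + mu_s      # total interaction coefficient, must be > 0
    albedo = mu_s / mu_t    # probability that an interaction is a scatter
    counts = {"absorbed": 0, "reflected": 0, "transmitted": 0, "scatters": 0}

    for _ in range(n_photons):
        z, uz = 0.0, 1.0    # launch at the surface, heading into the slab
        while True:
            # Sample a free path length from an exponential distribution.
            step = -math.log(1.0 - rng.random()) / mu_t
            z += uz * step
            if z < 0.0:                 # left through the entry face
                counts["reflected"] += 1
                break
            if z > thickness:           # left through the far face
                counts["transmitted"] += 1
                break
            if rng.random() >= albedo:  # interaction was an absorption
                counts["absorbed"] += 1
                break
            # Isotropic scattering: new direction cosine uniform in [-1, 1].
            counts["scatters"] += 1
            uz = 2.0 * rng.random() - 1.0
    return counts
```

Calling simulate_slab(100000, 1.0, 10.0, 1.0) returns the number of absorbed, reflected, and transmitted photons plus the total number of scattering events. Dividing that last count by the wall-clock time gives a scattering-events-per-second figure of the kind quoted in the abstract, although a Python loop will be far slower than either implementation the paper compares.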

Authors

Paul Martinsen, The Plant and Food Research Institute of New Zealand, Hamilton, New Zealand

Johannes Blaschke, Philipps-Universität Marburg, Hessen, Germany

Rainer Künnemeyer, The University of Waikato, Hamilton, New Zealand

Robert Jordan, The Plant and Food Research Institute of New Zealand, Hamilton, New Zealand