CUDA Papers

A collection of research papers and projects utilizing CUDA technology

Category Archives: Sorting

Efficient Parallel Scan Algorithms for GPUs

Abstract: Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows for nested data-parallel programs to be compiled into flat data-parallel languages. In this paper, we describe the design of efficient scan […]

Designing Efficient Sorting Algorithms for Manycore GPUs

Abstract: We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 […]