Generic parallel partitioning algorithms

Leonor Frias (leofm.sourceforge@gmail.com)

This project provides several implementations of optimal parallel partitioning algorithms in the number of key comparisons. They can be used as a replacement for STL partition.

Description

The partitioning of an array is a basic building block of many key algorithms, such as quicksort and quickselect. It is well-known that an array can be partitioned sequentially and in-place comparing each element exactly once, which is optimal. Besides, the partitioning of an array is typically used in the implementation of STL partition, nth_element and sort. STL stands for Standard Template Library and constitutes the algorithmic core of the C ++ standard library.

In the latest years, the hardware industry has decidedly shifted to parallel chips, due to the current practical impossibility of providing more processing power by further increasing clock-rates. In particular, most of today cheap laptop computers are multi-core (multi)processors, i.e., several independent processors that share the memory system. However, taking advantage of several processing units can only be done explicitly and writing parallel programs is involved and error-prone. Due to the limitations of automated parallelization, a popular alternative is providing parallel implementations of data structures libraries. For instance, the GNU compiler, from version 4.3 includes parallel implementations for several STL algorithms, in particular for STL partition, nth_element and sort. These are available through the so-called libstdc++ parallel mode.

The parallel partitioning algorithms implemented in the libstdc++ consist of a main parallel step and a cleanup step. This is also the case for other practical parallel partitioning algorithms for multi-core computers. All these algorithms might need to compare some elements again during cleanup in order to avoid too much synchronization during the main parallel step. Alternatively, the cleanup can be enhanced to achieve one key comparison per element.

We provide new implementations of several parallel partitioning algorithms with so an enhanced cleanup step and with the original cleanup. Specifically, the implementation corresponds to the piece of work in [1].

The properties regarding comparisons of the resulting enhanced parallel partitioning algorithms are of particular interest when the elements are strings, and partitioning is used repeatedly, as in quicksort and quickselect. See article [2] and the Sourceforge project https://sourceforge.net/projects/stringbsts/ for further details.

Downloads

Implementation at SourceForge.net: http://sourceforge.net/projects/parpartition/

HowTo

The implementation consists of a set of C ++ header files, and hence it can be used by simply including it from client code. Specifically, parallel_partition.h is the main header. The interface and behavior of the partitioning algorithms is the same as in STL partition, except that a new parameter is added to specify the number of threads. Besides, input iterators are required to provide random access. In particular, the interface is

template <class RandomAccessIterator, class Predicate>
RandomAccessIterator parallel_partition(RandomAccessIterator first, RandomAccessIterator last, Predicate pred, const int num_threads)

where parallel_partition should be replaced by one of the following:

STRIDED: (enhanced_)strided_partition
BLOCKED: (enhanced_)blocked_partition
F&A: (enhanced_)fetch_and_add_partition
F&A(MCSTL): enhanced_fetch_and_add_mcstl_partition

Specifically, the name on the left corresponds to the notation in [1]. Besides, the prefix enhanced is used to denote the implementation with the new cleanup step (otherwise, it denotes the implementation with the original cleanup).

Furthermore, the source code for performance tests (see [1] for more information on these tests) can be retrieved from the svn repository https://parpartition.svn.sourceforge.net/svnroot/parpartition provided by SourceForge.net. It can also be used as an usage example.

Publications

1: L. Frias and J. Petit.
Parallel partition revisited.
In Experimental Algorithms, 7th International Workshop, WEA 2008, Provincetown, MA, USA, May 30-June 1, 2008, Proceedings, volume 5038 of Lecture Notes in Computer Science, pages 142-153, Berlin, Heidelberg, 2008. Springer.
2: L. Frias and J. Petit.
Combining digital access and parallel partition for quicksort and quickselect.
In IWMSE '09: Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering, pages 33-40, Washington, DC, USA, 2009. IEEE Computer Society.