The partitioning of an array is a basic building block of many key algorithms, such as quicksort and quickselect. It is well-known that an array can be partitioned sequentially and in-place comparing each element exactly once, which is optimal.
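As an illustration of the sequential case, the following minimal sketch partitions a range in place and evaluates the predicate exactly once per element. The name partition_once is hypothetical and chosen only for this example; the scheme is the classic two-pointer scan and is not taken from the implementation described here.

```cpp
#include <cassert>
#include <iterator>
#include <utility>

// Hypothetical helper: rearranges [first, last) so that all elements
// satisfying pred come before all elements that do not, and returns an
// iterator to the first element of the second group.
// Every element is tested with pred exactly once.
template <class BidirectionalIterator, class Predicate>
BidirectionalIterator partition_once(BidirectionalIterator first,
                                     BidirectionalIterator last,
                                     Predicate pred) {
    while (true) {
        // Skip elements on the left that are already in place.
        while (true) {
            if (first == last) return first;
            if (!pred(*first)) break;  // *first belongs to the right side
            ++first;
        }
        --last;
        // Skip elements on the right that are already in place.
        while (true) {
            if (first == last) return first;  // *first was already tested
            if (pred(*last)) break;           // *last belongs to the left side
            --last;
        }
        // Both *first and *last are misplaced and already tested: swap them.
        std::swap(*first, *last);
        ++first;
    }
}
```

Wrapping the predicate in a counter is a simple way to check the claim: for a range of n elements, the counter ends at n.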
In addition, array partitioning is typically used in the implementation of the STL algorithms partition, nth_element and sort. STL stands for Standard Template Library, which constitutes the algorithmic core of the C++ standard library.
In recent years, the hardware industry has decidedly shifted to parallel chips, because it is currently impractical to provide more processing power by further increasing clock rates. In particular, most of today's inexpensive laptop computers are multi-core (multi)processors, i.e., they contain several independent processors that share the memory system. However, taking advantage of several processing units must be done explicitly, and writing parallel programs is involved and error-prone. Given the limitations of automated parallelization, a popular alternative is to provide parallel implementations of algorithm and data structure libraries. For instance, the GNU compiler, from version 4.3 onwards, includes parallel implementations of several STL algorithms, in particular of STL partition, nth_element and sort. These are available through the so-called libstdc++ parallel mode.
The parallel partitioning algorithms implemented in libstdc++ consist of a main parallel step and a cleanup step. This is also the case for other practical parallel partitioning algorithms for multi-core computers. To avoid excessive synchronization during the main parallel step, all these algorithms might need to compare some elements again during cleanup. Alternatively, the cleanup can be enhanced so that each element is compared exactly once overall.
We provide new implementations of several parallel partitioning algorithms, both with such an enhanced cleanup step and with the original cleanup. Specifically, the implementation corresponds to the piece of work in .
The properties regarding comparisons of the resulting enhanced parallel partitioning algorithms are of particular interest when the elements are strings, and partitioning is used repeatedly, as in quicksort and quickselect. See article  and the Sourceforge project https://sourceforge.net/projects/stringbsts/ for further details.
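To make the role of repeated partitioning concrete, here is a minimal sequential quickselect sketch over strings that counts key comparisons. The function name quickselect, its comparisons parameter and the middle-element pivot choice are assumptions made for this illustration; the sketch uses std::partition and is not the parallel implementation described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the k-th smallest string (0-based) of v.
// comparisons is incremented once per string comparison performed.
std::string quickselect(std::vector<std::string> v, std::size_t k,
                        std::size_t& comparisons) {
    assert(k < v.size());
    auto first = v.begin();
    auto last = v.end();
    const auto target = v.begin() + static_cast<std::ptrdiff_t>(k);
    while (true) {
        if (last - first == 1) return *first;
        // Move the middle element to the end and use it as the pivot.
        std::iter_swap(first + (last - first) / 2, last - 1);
        const std::string& pivot = *(last - 1);
        // One partitioning round: one comparison per remaining element.
        auto mid = std::partition(first, last - 1,
                                  [&](const std::string& s) {
                                      ++comparisons;
                                      return s < pivot;
                                  });
        std::iter_swap(mid, last - 1);  // pivot reaches its final position
        if (target == mid) return *mid;
        if (target < mid) last = mid;   // continue in the left part
        else first = mid + 1;           // continue in the right part
    }
}
```

Each round partitions the current range again, so any comparison repeated inside a partitioning step is paid in every round; with long strings that share prefixes, each such comparison can be expensive.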
The implementation consists of a set of C++ header files, and hence it can be used by simply including them from client code. Specifically, parallel_partition.h is the main header. The interface and behavior of the partitioning algorithms are the same as in STL partition, except that a new parameter is added to specify the number of threads. In addition, the input iterators are required to provide random access. In particular, the interface is
template <class RandomAccessIterator, class Predicate> RandomAccessIterator parallel_partition(RandomAccessIterator first, RandomAccessIterator last, Predicate pred, const int num_threads)
where parallel_partition should be replaced by one of the following:
Specifically, the name on the left corresponds to the notation in . In addition, the prefix enhanced denotes the implementation with the new cleanup step; a name without this prefix denotes the implementation with the original cleanup.
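A call through this interface might look as follows. To keep the sketch compilable on its own, parallel_partition is defined here as a hypothetical stand-in that has the documented signature but ignores num_threads and forwards to std::partition; real client code would instead include parallel_partition.h and call one of the library's function names. The helper evens_to_front is likewise an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in with the documented interface. It is NOT the
// library's implementation: it ignores num_threads and runs sequentially.
template <class RandomAccessIterator, class Predicate>
RandomAccessIterator parallel_partition(RandomAccessIterator first,
                                        RandomAccessIterator last,
                                        Predicate pred,
                                        const int num_threads) {
    (void)num_threads;  // unused in this sequential stand-in
    return std::partition(first, last, pred);
}

// Hypothetical client code: moves the even values of v to the front and
// returns how many there are.
std::size_t evens_to_front(std::vector<int>& v, int num_threads) {
    auto middle = parallel_partition(v.begin(), v.end(),
                                     [](int x) { return x % 2 == 0; },
                                     num_threads);
    return static_cast<std::size_t>(middle - v.begin());
}
```

As with STL partition, the returned iterator marks the boundary between the elements that satisfy the predicate and those that do not, so existing calls to std::partition on random access iterators only need the extra thread-count argument.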
Furthermore, the source code for the performance tests (see  for more information on these tests) can be retrieved from the svn repository https://parpartition.svn.sourceforge.net/svnroot/parpartition provided by SourceForge.net. It can also serve as a usage example.