Intro

In this document, a performance evaluation of three different implementations of the QuickSort algorithm is presented. The document compare between:

Testbed

For the hardware:

## Processor: Intel(R) Core(TM) i5 CPU M 450 @ 2.40GHz
## Processor architecture: x86_64
## #Processors: 4
## #cores per processor: 2
## Main memory: 3832292 kB
## Caches: 
## L1d cache:             32K
## L1i cache:             32K
## L2 cache:              256K
## L3 cache:              3072K

For the software:

## Operating system: GNU/Linux
## Distribution: Ubuntu 15.10
## Kernel version: 4.2.0-23-generic
## Compiler version: g++ (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010

Software spec.

Experiments

The design of experiments is:

fac.design(nfactors=2, replications=15, repeat.only=FALSE,
                       blocks=1, randomize=TRUE, seed=24625,
                       factor.names=list(size=(1:20)*50000, tlevel=c(2:6)))

Results

First plot of the data, 8 threads is used in the parallel version:

The LibC is worse than the two others, and the gap increases with the increased size. This mainly due to the comparison function calls, as it is shown by profiling.

Let’s focus on the sequential and parallel implementations:

With the confidence interval:

With the regression line:

For array size over 300K, the parallel version outperforms the sequential one

Analyzing the number of threads in the parallel verion

For a high number of threads (16 and 32) we notice a high variance and high number of outliers; this can be explained by the increasing number of cache misses.

Just focusing on 4 and 8 threads: