# M2R Parallel Systems

## Table of Contents

## Sitemap

## Parallel Systems

### General Informations

These lectures take place in every monday from 9h30 to 12h30. The coordinator for these lectures is Arnaud Legrand

The next lecture will be on Monday 30/11 in room D 207 from 9h45 to 12h45

### Objectives

Today, parallel computing is omnipresent across a large spectrum of computing platforms. At the ``microscopic'' level, processor cores have used multiple functional units in concurrent and pipelined fashions for years, and multiple-core chips are now commonplace with a trend toward rapidly increasing numbers of cores per chip. At the ``macroscopic'' level, one can now build clusters of hundreds to thousands of individual (multi-core) computers. Such distributed-memory systems have become mainstream and affordable in the form of commodity clusters. Furthermore, advances in network technology and infrastructures have made it possible to aggregate parallel computing platforms across wide-area networks in so-called ``grids.''

An efficient exploitation of such platforms requires a deep understanding of both architecture, software and infrastructure mechanisms and of advanced algorithmic principles. The aim of this course is thus twofold. It aims at introducing the main trends and principles in the area of high performance computing infrastructures, illustrated by examples of the current state of the art. It intends to provide a rigorous yet accessible treatment of parallel algorithms, including theoretical models of parallel computation, parallel algorithm design for homogeneous and heterogeneous platforms, complexity and performance analysis, and fundamental notions of scheduling and work-stealing. These notions will always be presented in connection with real applications and platforms.

### Program and expected schedule

To be decided. Check last year's schedule to get a foretaste.

Tentative schedule (in permanent modification).

`28/09/2008: Arnaud Legrand`

Introduction to parallel computing. High Performance Architectures Processors (superscalar, simultaneous multi-threading, multi-core, GPU…). Symmetric MultiProcessors. OS features for cluster computing Multi-threading. From clusters to Grids. 01_parallel_architectures.pdf`05/10/2009: Vincent Danjean`

High Performance Networks: bandwidth, latency, DMA, PIO, overlapping. How to Efficiently Program High Performance Architectures ? System and Low Level approaches. MPI, pthreads. 02_parallel_programming.pdf`12/10/2009: Vincent Danjean`

How to Efficiently Communicate on Distributed Architectures ? Research aspects of mixing different HP API (e.g. how to efficiently use MPI and pthreads, how to efficiently use threads on hierarchical platforms, ….) 03_communications.pdf`19/10/2009: Arnaud Legrand`

Performance Evaluation. Network models and topologies. Many parallel algorithms on a ring and on a grid. 04_parallel_algorithms.pdf`02/11/2009: Arnaud Legrand`

End of the previous lecture. From fine-grain to coarse-grain. PRAM, sorting networks and application to implementation on NOWs. 05_theory.pdf`09/11/2009: Arnaud Legrand`

Modeling parallel programs and platforms. Fundamental characteristics: Work and Depth. Dataflow graph representation of an execution. BSP programs. Introduction to Scheduling! Classical scheduling techniques and application to Resource management (PBS, LSF, SGE, OAR). 06_scheduling.pdf`13/11/2009: Vincent Danjean`

Parallel programming in practice: MPI, POSIX threads. 07_MPI_tutorial.tgz`23/11/2009: Jean-Louis Roch`

Work, depth and work-stealing. Analysis on processors with changing speeds. Parallelism extraction and algorithmic schemes by recursive splitting. Examples: iterated product/accumulate, cascading divide-and-conquer.

08_algopar_cours1.pdf 08_exercise-parallel-merge-sort-english-questions.pdf 08_exercise-parallel-merge-sort-english-answers.pdf 08_max-AC-ultrafast.pdf

`30/11/2009: Jean-Louis Roch`

Work-stealing and data locality. Sorting and merging, FFT, matrix operations. Adaptive algorithms and cascading divide & conquer: prefix computation, data compression, linear system solving. 09_algopar_cours1.pdf`7/12/2009 Derrick Kondo`

Desktop Grids. On the convergence of cloud computing and desktop grids; Google Map Reduce.`14/12/2009 Jean-Louis Roch`

Adpative parallel algorithms: coupling sequential and parallel code. Parallel complexity - NC class and P-complete problems. Evaluation of arithmetic expressions and straight-line programs. 11_algopar.pdf`11/01/2010`

Presentation of students' individual studies.`18/01/2010`

Presentation of students' individual studies.

### Course Organization

The course gives 6 credits (ECTS). In addition to the lectures, each student performs an individual study: either the analysis of the parallelization of a given application (to be proposed by the student) or the presentation of a research paper (to be chosen among a proposed list of papers) with answers to given questions.

### Project

#### Things to Note

- Most of the projects have an "open-ended" flavor to them. The idea is to approach them like small research projects in which you define your own methodology and (some of) your objectives. What I am looking for is "mature" approaches expected from graduate students. In particular, your first task should be to come up with a precise formulation of the project.
- Some projects may be harder than others, and expectations will be tailored to the projects.
- At most 2 students can pick the same project.
- Group projects involving 2 students are possible, but expectations will be higher for the end result and I refuse to get involved in "I did everything and my partner did nothing because he/she was out skiing all the time" arguments.
- Students are strongly encouraged to define their own projects, but these projects have to be approved by me beforehand and approval will be contingent on the project being sufficiently involved.
- Looking at previous work, papers, results in the literature is encouraged. Downloading actual code that does the project is ok for comparison with your own implementation, but using that code instead of your own implementation constitutes ground for getting a zero.
For the projects that require that you write a parallel applications, it is understood that you should write a sequential version (if need be) and that you compute speedups with respect to the sequential version. It is also understood that you will perform in-depth performance measurements. It is your responsibility to come up with interesting things to say in your report! One way to do this is coming up with multiple versions of your implementation so that you can study what worked and what didn't in terms of performance. Producing only one implementation doesn't really give you anything interesting to talk about. Performance analysis/modeling is a plus.

**IMPORTANT:**You should discuss progress with me, and not hesitate to ask me questions and for directions!

#### What to Turn in

You must turn in your code, a report (PDF) around 10-pages, and be prepared to present your project to the class and answer questions (about 15/20 minutes).

Projects are due on `January 6 2010`

, with presentations starting January 11 2010.

#### List of Possible Projects

Again, this is a non exhaustive list of **possible** projects. Feel free
to propose me other subjects and/or to come discuss with me if nothing
really suits you here. If the one you wanted to work on is already
taken, we can try to think to another subject in the same area but
with a different perspective. h5. System and Architecture

`Parallel Sorting on a Cluster`

(This project is assigned to Sofiane Kara Mostefa and Aastha Sharma) Sorting a list of numbers is a key operation in many applications and it is well-known that it is difficult to parallelize it efficiently. For instance, due to the fact that the amount of data is large when compared to the amount of computation, the cost of I/O may be overwhelming. In this project you will consider the following problem: