Introduction to HPC with MPI for Data Science

Springer, Feb 3, 2016 - 282 pages

This gentle introduction to High Performance Computing (HPC) for Data Science using the Message Passing Interface (MPI) standard has been designed as a first course for undergraduates on parallel programming on distributed memory models, and requires only basic programming notions.

The book is divided into two parts: the first covers high performance computing using C++ with the Message Passing Interface (MPI) standard, and the second provides high-performance data analytics on computer clusters.

In the first part, the fundamental notions of blocking versus non-blocking point-to-point communications, global communications (such as broadcast or scatter), and collaborative computations (reduce) are described, together with the Amdahl and Gustafson speed-up laws, before addressing parallel sorting and parallel linear algebra on computer clusters. The common ring, torus and hypercube topologies of clusters are then explained, and global communication procedures on these topologies are studied. This first part closes with the MapReduce (MR) model of computation, which is well suited to processing big data, using the MPI framework.
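As a rough illustration of the two speed-up laws named above, here is a minimal C++ sketch. It is not taken from the book: the function names `amdahl_speedup` and `gustafson_speedup` and the parameter `serial_fraction` are assumptions chosen for this example. Amdahl's law assumes a fixed problem size, while Gustafson's law assumes the problem grows with the number of processors.

```cpp
#include <cassert>

// Hypothetical helper (name is an assumption, not from the book).
// Amdahl's law: speedup for a FIXED problem size, where serial_fraction
// is the share of the sequential running time that cannot be parallelized
// and p is the number of processors.
double amdahl_speedup(double serial_fraction, int p) {
    assert(p >= 1);
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p);
}

// Hypothetical helper (name is an assumption, not from the book).
// Gustafson's law: scaled speedup when the workload grows with p, where
// serial_fraction is the share of the PARALLEL running time spent in
// serial code.
double gustafson_speedup(double serial_fraction, int p) {
    assert(p >= 1);
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    return p - serial_fraction * (p - 1);
}
```

With a 10% serial fraction on 10 processors, `amdahl_speedup(0.1, 10)` evaluates to about 5.26, whereas `gustafson_speedup(0.1, 10)` evaluates to 9.1. Under Amdahl's law the speedup can never exceed 1 / serial_fraction, however many processors are added.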

In the second part, the book focuses on high-performance data analytics. Flat and hierarchical clustering algorithms are introduced for data exploration, along with how to program these algorithms on computer clusters, followed by machine learning classification and an introduction to graph analytics. This part closes with a concise introduction to data core-sets, which let big data problems be reduced to tiny data problems.

Exercises are included at the end of each chapter in order for students to practice the concepts learned, and a final section contains an overall exam which allows them to evaluate how well they have assimilated the material covered in the book.

Table of Contents

List of Tables

Part I High Performance Computing (HPC) with the Message Passing Interface (MPI)

1 A Glance at High Performance Computing (HPC)

2 The Message Passing Interface

3 Topology of Interconnection Networks

4 Parallel Sorting

5 Parallel Linear Algebra

6 The MapReduce Paradigm

Part II High Performance Computing (HPC) for Data Science (DS)

7 Partition-Based Clustering with k-Means

8 Hierarchical Clustering

9 Practice and Theory of Classification with the k-NN Rule

10 Fast Approximate Optimization in High Dimensions with Core-Sets and Fast Dimension Reduction

11 Parallel Algorithms for Graphs

Appendix A Written Exam (3 h)

A Resource Manager and Job Scheduler on Clusters of Machines

Index

Common terms and phrases

Amdahl’s law approximation argc argv barycenter Big Data bitonic sequences block broadcast called centroid char choose classifier communication primitives complexity computer cluster consider core-sets cost function Data Science data-set define dendrogram denote densest sub-graph dimension distributed memory elements Euclidean distance example Figure filename follows global graph Gray code Hamming distance hierarchical clustering hypercube illustrates implementation initialization input interconnection network interface isomorphism iterations k-medoids k-NN label linear linkage Lloyd’s log n machines MapReduce matrix product merge sort minimize MPI_COMM_WORLD MPI_INT multi-core nearest neighbor nodes number obtain optimal parallel algorithms parallel programming partition perform permutation pivot problem processors QuickSort rank recursively regular topology rejection sampling ring topology sampling sequential smallest enclosing ball source code speedup squared Euclidean distance stage stored sub-lists Theorem torus tree variance vector vertices Voronoi diagram

About the author (2016)

Frank Nielsen is a Professor at École Polytechnique in France, where he teaches graduate (vision/graphics) and undergraduate (Java/algorithms) courses, and a senior researcher at Sony Computer Science Laboratories Inc. His research includes computational information geometry for imaging and learning, and he is the author of 3 textbooks and 3 edited books. He is also on the Editorial Board of the Springer Journal of Mathematical Imaging and Vision.

Bibliographic information

Title	Introduction to HPC with MPI for Data Science, Undergraduate Topics in Computer Science
Author	Frank Nielsen
Edition	illustrated
Publisher	Springer, 2016
ISBN	3319219030, 9783319219035
Length	282 pages
