Practical Performance Measurement and Analysis of Parallel Programs
on Clusters (and Grids)

 

CCGrid 2005 Tutorial

 

                                                              Bernd Mohr                                                                

                                                    Forschungszentrum Jülich

                                       Zentralinstitut für Angewandte Mathematik

                                                              52425 Jülich

                                                                  Germany

                                                                           

                                                       b.mohr@fz-juelich.de                                                           

 

 

Length: 3-hour lecture

 

Content level:   20% Introductory  /  60% Intermediate  /  20% Advanced

 

Abstract:  This tutorial gives an introduction to the theory and practical application of experimental performance measurement and analysis of parallel programs executing on cluster computer systems. We introduce the basic terminology, methodology, and techniques of performance measurement and analysis and give practical advice on how to use them effectively. Next, we describe commercial and research performance analysis tools for MPI, OpenMP, and hybrid MPI/OpenMP applications that are available for these machines, along with practical tips and hints for their usage. Here we concentrate on portable tools, which are available for any (or at least almost any) parallel computer system, and on tools for Linux clusters. We show how these tools can be used to diagnose and locate typical performance bottlenecks in real-world parallel programs. Finally, we give an overview of the issues and problems of performance analysis in a Grid environment, discuss to what extent “traditional” performance tools can be used there, and give pointers to research projects and tools for measuring the performance of Grid applications.

 

Prerequisites:  Basic understanding of parallel computing and its de facto standard programming models, MPI and OpenMP. Knowledge of a programming language such as Fortran or C is helpful.

 


Detailed Lecture Outline

 

 

Part I: Terminology and Methodology

 

·      Motivation

·      Application performance analysis techniques

·      Performance metrics

·      Program instrumentation techniques

§        Source code/static instrumentation

§        Binary/dynamic instrumentation

§        MPI instrumentation (PMPI)

§        OpenMP instrumentation (OPARI, POMP)

·      Measurement techniques

§        Sampling

§        Profiling

§        Event tracing

·      Tips and tricks for practical performance analysis and tuning
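
The source code instrumentation and profiling items in Part I can be illustrated with a minimal sketch in C. It is not taken from the tutorial material or from any of the tools listed below. The names region_profile, region_enter, region_exit, work, and profile_demo are hypothetical, and the sketch assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC. Each instrumented region accumulates a call count and a total wall-clock time, which is the summary information a profile keeps. An event trace would instead record every enter and exit together with its timestamp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical profile record for one manually instrumented code region. */
typedef struct {
    const char *name;   /* label used in the report */
    long calls;         /* number of completed enter/exit pairs */
    double total_sec;   /* accumulated wall-clock time in seconds */
    double enter_sec;   /* timestamp of the most recent enter */
    int active;         /* 1 while execution is inside the region */
} region_profile;

/* Wall-clock time in seconds from a monotonic clock (assumes POSIX). */
static double wall_time(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Instrumentation call placed at the start of the region. */
static void region_enter(region_profile *r) {
    assert(!r->active);          /* this sketch does not support nesting */
    r->active = 1;
    r->enter_sec = wall_time();
}

/* Instrumentation call placed at the end of the region. */
static void region_exit(region_profile *r) {
    double now = wall_time();
    assert(r->active);           /* exit must follow a matching enter */
    r->total_sec += now - r->enter_sec;
    r->calls += 1;
    r->active = 0;
}

/* Stand-in for the application code being measured. */
static double work(int n) {
    volatile double sum = 0.0;
    for (int i = 1; i <= n; i++) {
        sum += 1.0 / i;
    }
    return sum;
}

/* Print the accumulated summary for one region. */
static void region_report(const region_profile *r) {
    printf("%s: %ld calls, %.6f s total\n", r->name, r->calls, r->total_sec);
}

/* Run the instrumented region several times and return its profile. */
static region_profile profile_demo(int iterations) {
    region_profile r = {"work", 0, 0.0, 0.0, 0};
    for (int k = 0; k < iterations; k++) {
        region_enter(&r);
        work(100000);
        region_exit(&r);
    }
    region_report(&r);
    return r;
}
```

Calling profile_demo(5) from a main function prints one summary line for the region. The measured time also includes the cost of the timer calls themselves, so very short regions are better measured over many repetitions.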

 

Part II: Resources and Tools

 

·      Portable commercial tools

§        GuideView

§        Vampirtrace / Vampir

§        Paraver

·      Portable open-source tools

§        TAU

§        MPICH MPE

§        SvPablo

§        KOJAK

§        Pointers to more tools (mpiP, Paradyn, MARMOT, ...)

·      Tools for Linux clusters (IA32, IA64, x86-64)

§        for IA32

o       PerfSuite

o       HPCToolkit

o       VTune

§        for IA64

o       Perfmon

o       Pfmon

o       Histx

·      If all else fails:

§        Timing routines

§        Hardware performance counter interfaces (PAPI)

·      Performance analysis of Grid applications