Practical Performance Measurement and Analysis of Parallel Programs on Clusters (and Grids)
CCGrid 2005 Tutorial
Bernd Mohr
Forschungszentrum Jülich
Zentralinstitut für Angewandte Mathematik
52425 Jülich
Germany
b.mohr@fz-juelich.de
Length: 3-hour lecture
Content level: 20% introductory / 60% intermediate / 20% advanced
Abstract: This tutorial will give an introduction to the theory and practical application of experimental performance measurement and analysis of parallel programs executing on cluster computer systems. We will introduce the basic terminology, methodology, and techniques of performance measurement and analysis, and give practical advice on how to use them effectively. Next, we describe commercial and research performance analysis tools for MPI, OpenMP, and hybrid MPI/OpenMP applications available for these machines, along with practical tips and hints for their usage. Here we concentrate on portable tools that are available for any (or at least almost any) parallel computer system, and on tools for Linux clusters. We show how these tools can be used to diagnose and locate typical performance bottlenecks in real-world parallel programs. Finally, we give an overview of the issues and problems of performance analysis in a Grid environment, discuss to what extent “traditional” performance tools can be used there, and give pointers to research projects and tools for measuring the performance of Grid applications.
Prerequisites: Basic understanding of parallel computing and its de facto programming standards, MPI and OpenMP. Knowledge of a programming language such as Fortran or C is helpful.
Detailed Lecture Outline
Part I: Terminology and Methodology
· Application performance analysis techniques
· Performance metrics
· Program instrumentation techniques
   § Source code/static instrumentation
   § Binary/dynamic instrumentation
   § MPI instrumentation (PMPI)
   § OpenMP instrumentation (OPARI, POMP)
· Measurement techniques
   § Sampling
   § Profiling
   § Event tracing
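Part I pairs source-level instrumentation with two measurement techniques, profiling and event tracing. The block below is a minimal sketch in C that shows how one stream of enter/exit events can feed both. It is not the mechanism of any tool in this outline. The names region_enter(), region_exit(), print_profile() and the fixed-size tables are hypothetical. The calls are assumed to be inserted by hand into the source, a step that tools such as TAU (or OPARI, for OpenMP constructs) automate. Recursive regions are not handled.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical minimal source-level instrumentation layer (not the API of
   any tool named in this outline). The same enter/exit hooks feed both a
   profile (aggregated per region) and an event trace (time-stamped records).
   Assumption: regions are not entered recursively. */

#define MAX_REGIONS 16
#define MAX_EVENTS  1024

typedef struct {
    const char *name;
    long        calls;      /* completed visits */
    double      total_sec;  /* accumulated inclusive wall-clock time */
    double      enter_sec;  /* start time of the currently open visit */
} RegionProfile;

typedef struct {
    int    region;    /* index into profile[] */
    int    is_enter;  /* 1 = enter event, 0 = exit event */
    double time_sec;  /* time stamp */
} TraceEvent;

static RegionProfile profile[MAX_REGIONS];
static int           num_regions;
static TraceEvent    trace[MAX_EVENTS];
static int           num_events;

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Find a region by name, registering it on first use. */
static int region_id(const char *name) {
    for (int i = 0; i < num_regions; ++i)
        if (strcmp(profile[i].name, name) == 0) return i;
    assert(num_regions < MAX_REGIONS && "increase MAX_REGIONS");
    profile[num_regions].name = name;
    return num_regions++;
}

/* Append an event to the trace buffer; events beyond MAX_EVENTS are dropped. */
static void trace_event(int id, int is_enter, double t) {
    if (num_events < MAX_EVENTS)
        trace[num_events++] = (TraceEvent){ id, is_enter, t };
}

void region_enter(const char *name) {
    int id = region_id(name);
    double t = now_sec();
    profile[id].enter_sec = t;
    trace_event(id, 1, t);
}

void region_exit(const char *name) {
    int id = region_id(name);
    double t = now_sec();
    profile[id].calls += 1;
    profile[id].total_sec += t - profile[id].enter_sec;
    trace_event(id, 0, t);
}

/* Profile view: one summary line per region. */
void print_profile(void) {
    for (int i = 0; i < num_regions; ++i)
        printf("%-12s calls=%ld total=%.6f s\n",
               profile[i].name, profile[i].calls, profile[i].total_sec);
}
```

Calling print_profile() at the end of a run gives the profile view. The trace[] array keeps every enter/exit event with its time stamp, which is the kind of data a timeline viewer such as Vampir displays. The sketch also shows the basic trade-off. A profile needs constant memory per region, while a trace grows with the number of events.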
Part II: Resources and Tools
· Commercial tools
   § GuideView
   § Vampirtrace / Vampir
   § Paraver
· Portable open-source tools
   § TAU
   § MPICH MPE
   § SvPablo
   § KOJAK
   § Pointers to more tools (mpiP, Paradyn, MARMOT, ...)
· Tools for Linux clusters (IA32, IA64, x86-64)
   § For IA32
      o PerfSuite
      o HPCToolkit
      o VTune
   § For IA64
      o Perfmon
      o Pfmon
      o Histx
· If everything else fails:
   § Timing routines
   § Hardware performance counter interfaces (PAPI)
· Performance analysis of Grid applications
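The “If everything else fails: Timing routines” item leaves the programmer to insert timers by hand. Before trusting such timers, it is worth checking the clock's resolution and the cost of one timer call. A region shorter than either cannot be measured meaningfully. The block below is a minimal sketch in C, assuming a POSIX system with clock_gettime() and clock_getres(). The helper names wall_time(), timer_resolution(), timer_overhead() and report_timer() are hypothetical. In MPI programs, the standard routines MPI_Wtime() and MPI_Wtick() play the same roles.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper names; only clock_gettime()/clock_getres() are POSIX.
   MPI programs would typically use MPI_Wtime()/MPI_Wtick() instead. */

/* Current wall-clock time in seconds from a monotonic clock. */
double wall_time(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Clock resolution reported by the operating system, in seconds. */
double timer_resolution(void) {
    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);
    return (double)res.tv_sec + (double)res.tv_nsec * 1e-9;
}

/* Mean cost of one wall_time() call, estimated from n back-to-back calls. */
double timer_overhead(int n) {
    assert(n > 0);
    double start = wall_time();
    for (int i = 0; i < n; ++i)
        (void)wall_time();
    return (wall_time() - start) / n;
}

/* Print both figures so they can be compared with measured region times. */
void report_timer(void) {
    printf("resolution: %.3e s, overhead per call: %.3e s\n",
           timer_resolution(), timer_overhead(10000));
}
```

A region is then timed as `t0 = wall_time(); work(); t = wall_time() - t0;`, where work() stands for the (hypothetical) code being measured. If a measured time is close to timer_resolution() or timer_overhead(), the region should be repeated in a loop rather than timed once.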