High-Performance I/O for Scientific Applications

Robert Latham
Mathematics and Computer Science Division, Argonne National Lab, IL, USA

Abstract: For all their hype, parallel file systems alone are not sufficient for achieving high-performance and usable I/O on large clusters. In addition to the performance that a parallel file system can afford, functionality is also needed for mapping high-level or domain-specific application abstractions onto file system ones and for managing the many processes that make up these applications. Because of these additional demands, high-performance I/O (HPIO) systems in parallel machines have evolved into multi-tiered solutions. At the top are high-level I/O libraries such as HDF5 and PnetCDF, which give application developers familiar interfaces and abstractions such as multidimensional typed variables. Below this layer sits MPI-IO, which manages groups of processes and aggregates the operations performed by these groups. The parallel file system sits at the lowest software layer; it provides a unified view of a large number of I/O resources and manages concurrent access to them.

This talk will focus on the application programmer's point of view. We will discuss how the layers of this software stack lead to an effective solution, what happens to I/O requests as they move through the stack, the tradeoffs of interfacing at the various layers, and general guidelines for obtaining high performance. Attendees will come out of the tutorial with an understanding of how HPIO systems are constructed and how they differ from other storage systems, knowledge of what happens in an HPIO system after an application makes an I/O call, familiarity with the tools available for making use of these resources, and some guidelines for extracting performance from these complicated systems.

Table of Contents:
- Introduction and I/O stacks
  - Application I/O vs. parallel I/O
  - Bridging the gap with I/O stacks
  - I/O stacks for computational science
    1. High-Level Libraries
    2. I/O Middleware
    3. Parallel File System
- I/O interfaces and formats, with examples
  - POSIX file system interface
    1. Examples
    2. POSIX and PFS interaction
  - MPI-IO interface
    1. Collective I/O
    2. Noncontiguous I/O
    3. Nonblocking and Asynchronous I/O
    4. Examples
    5. Optimizations
    6. MPI-IO Implementations
  - Parallel netCDF (PnetCDF)
    1. File Layout
    2. Usage
    3. Examples
    4. Interaction with MPI-IO
  - Hierarchical Data Format (HDF5)
    1. File Layout
    2. Usage
    3. Examples
    4. Interaction with MPI-IO
- I/O best practices
  - Choosing an I/O interface
  - Guidelines for I/O performance
  - Tuning I/O stacks with hints
    1. MPI-IO hints
    2. File system-specific hints
    3. High-level library hints
  - Enlisting the experts
- Conclusions and supplemental material