Improving Access to Shared Data in a Partitioned Global Address Space Programming Model

Partitioned Global Address Space (PGAS) programming languages offer an attractive, high-productivity model for programming large-scale parallel machines. PGAS languages, such as Unified Parallel C (UPC), combine the simplicity of shared-memory programming with the efficiency of the message-passing paradigm. PGAS languages partition the application's address space into private, shared-local, and shared-remote memory. The latency of shared-remote accesses is typically much higher than that of local, private accesses, especially when the underlying hardware is a distributed-memory machine and remote accesses imply communication over a network.

To achieve good performance, an optimizing compiler must be able to handle two features commonly found in PGAS languages: (i) shared data distribution; and (ii) parallel loop constructs. When developing a parallel application, the programmer identifies data that is shared among threads and specifies how the shared data is distributed among the threads. This thesis introduces new static analyses that allow the compiler to distinguish between local shared data and remote shared data. The compiler then uses this information to reduce the time to access shared data using three techniques: (i) when the compiler can prove that a shared data item is local to the accessing thread, the address of the shared data item is retrieved and subsequent accesses to it are performed as traditional memory accesses; (ii) when several remote shared-data accesses are performed and all of the remote shared data is owned by the same processor, a single coalesced shared-data access can replace the individual shared-data accesses; and (iii) when shared-data accesses require explicit communication to move shared data, the compiler can overlap the communication with other computation to hide the communication latency.
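
As a rough illustration of the locality question behind the first two techniques, the sketch below is written in plain C and assumes a block-cyclic distribution of the kind UPC uses for shared arrays, where element i of an array with block size B belongs to thread (i / B) mod THREADS. The function names (owner_of, is_local, coalesced_messages) are hypothetical and chosen only for this example. The code evaluates ownership at run time, whereas the analyses described above are static, so it should be read as a model of the reasoning rather than as the compiler's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that owns element `index` of a shared array laid out
   block-cyclically: `block_size` consecutive elements per block,
   blocks dealt round-robin over `threads` threads (assumed layout). */
static size_t owner_of(size_t index, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);
    return (index / block_size) % threads;
}

/* Nonzero when `my_thread` owns the element, i.e. the access could be
   performed as an ordinary memory access (technique i). */
static int is_local(size_t index, size_t block_size, size_t threads,
                    size_t my_thread)
{
    return owner_of(index, block_size, threads) == my_thread;
}

/* Number of remote transfers needed to read elements
   [first, first + count) when each run of consecutive elements with the
   same remote owner is fetched as one coalesced access (technique ii).
   Runs owned by `my_thread` need no transfer. */
static size_t coalesced_messages(size_t first, size_t count,
                                 size_t block_size, size_t threads,
                                 size_t my_thread)
{
    size_t messages = 0;
    size_t i = 0;

    while (i < count) {
        size_t owner = owner_of(first + i, block_size, threads);
        size_t run = 1;

        /* Extend the run while the next element has the same owner. */
        while (i + run < count &&
               owner_of(first + i + run, block_size, threads) == owner) {
            run++;
        }
        if (owner != my_thread) {
            messages++;
        }
        i += run;
    }
    return messages;
}
```

Under these assumptions, thread 0 reading 16 consecutive elements spread over 4 threads needs 3 coalesced transfers when the block size is 4, compared with 12 individual remote accesses when the block size is 1.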

Greg Steffan
Last modified: Mon Dec 22 10:25:10 EST 2008