To achieve good performance, an optimizing compiler must be able to handle two features commonly found in PGAS languages: (i) shared-data distribution; and (ii) parallel loop constructs. When developing a parallel application, the programmer identifies data that is shared among threads and specifies how that shared data is distributed among them. This thesis introduces new static analyses that allow the compiler to distinguish between local shared data and remote shared data. The compiler then uses this information to reduce the time to access shared data through three techniques: (i) when the compiler can prove that a shared data item is local to the accessing thread, the address of the item is retrieved once and subsequent accesses are performed as traditional memory accesses; (ii) when several remote shared-data accesses are performed and all of the remote shared data is owned by the same processor, a single coalesced shared-data access can replace the individual accesses; (iii) when shared-data accesses require explicit communication to move shared data, the compiler can overlap the communication with independent computation to hide the communication latency.