Partitioned Global Address Space (PGAS) environments simplify writing parallel code for clusters because they make data movement implicit: dereferencing a global pointer automatically moves data between nodes. However, this does not free the programmer from reasoning about locality, since poor data placement can lead to excessive and even unnecessary communication. For this reason, modern PGAS languages such as X10, Chapel, and UPC allow programmers to express data layout constraints and explicitly move computation. This places an extra burden on the programmer and is less effective for applications with limited or data-dependent locality (e.g., graph analytics).
This paper proposes Alembic, a new static analysis that frees programmers from having to manually move computation to exploit locality in PGAS programs. It works by identifying regions of code that access the same cluster node, then transforming the code to migrate parts of the execution automatically by passing around continuations, so that a larger proportion of accesses are to local data. We implement the analysis and transformation for C++ in LLVM and show that on irregular application kernels, Alembic can achieve 82% of the performance of hand-tuned code (by comparison, naïve compiler-generated communication achieves only 13%).
Alembic talk slides (alembic-oopsla.pdf, 1.22 MiB)