Corelab Seminar

Paris Koutris
Topology-aware Parallel Data Processing

The analysis of massive datasets requires a large number of processors. When designing and implementing algorithms for such large clusters, prior research has largely assumed a uniform topology. However, this assumption rarely holds in practical settings, where machines and networks are heterogeneous. This necessitates an end-to-end investigation of how one can model, design, and deploy topology-aware algorithms for fundamental data processing tasks at large scale. In this talk, I will start by describing a simple theoretical parallel model that jointly captures the cost of computation and communication. Using this model, I will then explore algorithms with theoretical guarantees for basic data processing tasks. Finally, I will show how these algorithms have the potential to run significantly faster than their topology-oblivious counterparts.