Corelab Seminar

Vaggelis Chatziafratis
Hierarchical Clustering as a Tool for Unsupervised Learning: An Optimization Viewpoint

Hierarchical Clustering (HC) is a widely studied problem in unsupervised learning and exploratory data analysis, usually tackled by simple agglomerative procedures like average-linkage, single-linkage or complete-linkage. Applications of HC include reasoning about text documents, understanding the Evolution of species and the Tree of life, decomposing social networks like Facebook, or even organizing large data centers efficiently. Surprisingly, despite the plethora of heuristics for tackling the problem, until recently there was no optimization objective associated with it; this is in stark contrast with flat clustering objectives like k-means, k-median and k-center. In this talk, we will give an overview of the optimization objectives for Hierarchical Clustering, we will analyze the popular Average-Linkage procedure and mention some of its drawbacks. Then, we will propose better algorithms with provably better approximation guarantees. (Full paper:
Most of the talk is based on joint works with Moses Charikar and Rad Niazadeh.