Distributed Frameworks for Alternating Least Squares (Poster presentation)

    Alternating Least Squares (ALS) is a popular method for latent factor modeling in recommender systems, with particularly good performance on the implicit rating problem. For very large data sets, implementing ALS on a multi-machine, shared-nothing architecture is problematic because of the large amount of communication as the algorithm alternates between the rows and the columns of the rating matrix. Graph-parallel frameworks were proposed as an answer to the drawback of MapReduce, which provides no permanent storage for the rating matrix. In these frameworks, every nonzero element of the rating matrix corresponds to an edge. However, the "think like a vertex" philosophy of these frameworks means that the same message must be repeated for every graph node that resides on the same server. Moreover, while in typical graph algorithms the message sent along an edge is usually a single number or a node ID, for ALS the message size is quadratic in the number of latent factors.
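To illustrate the quadratic message size, the following minimal sketch assumes the common ALS variant in which a vertex ships the precomputed normal-equation terms for each neighbor: the k x k outer product of its factor vector plus a k-vector, so the payload grows as O(k^2) in the number of latent factors k. The function name `als_edge_message` is illustrative, not from any of the frameworks discussed.

```python
import numpy as np

def als_edge_message(item_factor, rating):
    # Hypothetical per-edge ALS payload: the rank-one contribution to the
    # neighbor's normal equations, i.e. the k x k matrix y y^T and the
    # k-vector r * y. The matrix part makes the message O(k^2).
    y = np.asarray(item_factor, dtype=float)
    return np.outer(y, y), rating * y

k = 10
matrix_part, vector_part = als_edge_message(np.random.rand(k), 4.0)
print(matrix_part.shape)  # (10, 10): quadratic in the number of factors
print(vector_part.shape)  # (10,): linear part of the payload
```

Contrast this with PageRank, where the per-edge message is a single rank value, independent of any model dimension.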

    We present our ongoing experiments with Giraph and with Apache Flink and its Pregel implementation, Spargel. We implemented communication primitives that bundle identical messages destined for many or all nodes of a partition and send them only once, resulting in a significant speedup. Our method also appears to speed up algorithms with seemingly low communication intensity, such as PageRank.
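The idea behind the primitive can be sketched as follows; this is not the authors' implementation, only an illustration of the grouping step under assumed names (`broadcast_once_per_partition`, a hash `partition_of`): instead of emitting one copy of an identical message per destination vertex, group the destinations by partition, ship the (possibly large) message once per partition, and fan it out to the local vertices on arrival.

```python
from collections import defaultdict

def broadcast_naive(message, targets):
    # Baseline vertex-centric behavior: one copy of the message per
    # target vertex, even when many targets share a physical partition.
    return [(t, message) for t in targets]

def broadcast_once_per_partition(message, targets, partition_of):
    # Sketched primitive: group targets by partition and send the
    # message once per partition together with the local fan-out list.
    by_partition = defaultdict(list)
    for t in targets:
        by_partition[partition_of(t)].append(t)
    return [(part, local_targets, message)
            for part, local_targets in by_partition.items()]

targets = list(range(100))
partition_of = lambda v: v % 4  # 4 partitions, hash placement assumed
print(len(broadcast_naive("msg", targets)))                            # 100 sends
print(len(broadcast_once_per_partition("msg", targets, partition_of))) # 4 sends
```

With 100 destination vertices spread over 4 partitions, the network carries 4 copies of the message instead of 100, which matters most when the payload is the O(k^2) ALS message.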

    Márton Balassi, Róbert Pálovics and András A. Benczúr
    Large-Scale Recommender Systems, in conjunction with RecSys 2014