LiveJournal data

    The data set is intended for research purposes only and is freely available under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license, which basically states that you are free to use the labels and that we make no warranties about them. You can download and use the data for research in any institution, public or private. The "nc-sa" (non-commercial, share-alike) rule applies if you want to redistribute the labels publicly.

    When using the data set, you should acknowledge the source by citing it as:

    Miklós Kurucz, András A. Benczúr, Attila Pereszlényi. Large-Scale Principal Component Analysis on LiveJournal Friends Network. In Proc. Workshop on Social Network Mining and Analysis, held in conjunction with the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), August 24-27, 2008, Las Vegas, NV.

    The friend connection graph is in graph.zip.
    The format is a standard adjacency list: first the id of a node, then the list of nodes to which it has an out-edge.
    This graph is directed.
    The names of the users associated with the node ids are in name2id.zip.
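
    As an illustration of the adjacency-list format described above, here is a minimal Python sketch for reading such a graph. It assumes one node per line, whitespace-separated fields, and integer node ids; none of these details are confirmed by this description, so check the extracted file before relying on them. The names parse_adjacency and in_degrees are hypothetical helpers, not part of the data set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_adjacency(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Parse lines of the assumed form 'node target1 target2 ...'."""
    graph: Dict[int, List[int]] = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated fields
        if not fields:
            continue  # skip blank lines
        node = int(fields[0])  # assumption: integer node ids
        targets = [int(t) for t in fields[1:]]
        # A node with no out-edges still gets an (empty) entry.
        graph.setdefault(node, []).extend(targets)
    return graph


def in_degrees(graph: Dict[int, List[int]]) -> Dict[int, int]:
    """Count incoming edges per node; the graph is directed."""
    counts: Dict[int, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


# Tiny made-up sample in the assumed format, not real LiveJournal data.
sample = ["0 1 2", "1 2", "2"]
graph = parse_adjacency(sample)
```

    Because the graph is directed, an edge from 0 to 1 does not imply an edge from 1 to 0, which is why in-degrees are computed in a separate pass. For the real file, pass an open file handle for the extracted graph instead of the sample list.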

    The following data is present only where the user provided it.
    Location data (country only) is in location.zip.
    Birth data (year and/or month and day) is in birth.zip.
    Data regarding education is in schools.zip.
    A user may have multiple school records.
    The number of blog posts a user has created is in activity.zip.
    The interests of the users are in interests.zip.
    Users can have multiple interests.

    Most of this data was downloaded from LiveJournal in early 2008.
    We would like to thank Kleinberg et al. for providing parts of this data.
