Temporal Katz centrality - data sets

    Temporal Katz centrality is a centrality measure for dynamic networks that can be updated online as edges arrive in the edge stream. It applies a time decay based on the time elapsed since each edge activation.
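
    To illustrate what "updateable by the edge stream" with time decay can look like, here is a minimal sketch, not the published algorithm. It assumes an exponential decay function and a single per-edge weight; the class name `TemporalKatzSketch` and the parameters `beta` and `decay` are hypothetical, and edges are assumed to arrive in non-decreasing time order.

```python
import math


class TemporalKatzSketch:
    """Online, time-decayed Katz-style score (illustrative sketch only)."""

    def __init__(self, beta: float, decay: float) -> None:
        self.beta = beta      # weight of each edge on a walk (assumed parameter)
        self.decay = decay    # exponential decay rate per time unit (assumed)
        self.score = {}       # last stored score per node
        self.last_seen = {}   # time at which that score was stored

    def score_at(self, node, t: float) -> float:
        """Return the stored score of `node`, decayed to time `t`."""
        if node not in self.last_seen:
            return 0.0
        elapsed = t - self.last_seen[node]
        return self.score[node] * math.exp(-self.decay * elapsed)

    def update(self, source, target, t: float) -> None:
        """Process one edge activation source -> target at time `t`."""
        # The new edge itself counts as a walk of length one, and it also
        # extends every (decayed) walk that currently ends at `source`.
        incoming = self.beta * (1.0 + self.score_at(source, t))
        self.score[target] = self.score_at(target, t) + incoming
        self.last_seen[target] = t
```

    Each update touches only the target node, so the cost per edge is constant. With `decay = math.log(2)`, for example, a score halves for every time unit in which its node receives no new edge.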

    You can find details about this research on GitHub:

    Roland-Garros 2017 Twitter collection

    We collected tweets about Roland-Garros, the French Open tennis tournament, for Days 1-15 of the 2017 event. The dataset contains the mentions between Twitter users as well as the accounts of the tennis players who participated in the tournament. The schedule of Days 1-15, which was downloaded from the official event website, is also provided as ground truth.

    You can find more information about the dataset in this GitHub repository:

    OpenCL implementation of Similarity kernel

    An OpenCL implementation of the similarity kernel based on various distances:

    • L2
    • L1
    • Jensen-Shannon
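
    For reference, the three distances listed above can be written in a few lines of plain Python. This is a CPU-side sketch for illustration, not the OpenCL code. The final `similarity` function, which turns a distance into a kernel value via `exp(-gamma * d)`, is an assumption: the actual kernel form and the name `gamma` are hypothetical here.

```python
import math


def l1_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x, y):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) of two probability vectors."""
    def kl(a, m):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log(ai / mi) for ai, mi in zip(a, m) if ai > 0)

    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def similarity(x, y, distance, gamma=1.0):
    """Hypothetical kernel form: exp(-gamma * distance(x, y))."""
    return math.exp(-gamma * distance(x, y))
```

    Note that `jensen_shannon` expects non-negative vectors that each sum to one, while the L1 and L2 distances accept arbitrary real vectors of equal length.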

    GitHub link:

    MOL BUBI Analytics Challenge - training and test data

    - Description of the files can be found at
    - The train and test files have the same columns and format


    Co-cluster is a clustering framework implemented in C++. It is capable of clustering and bi-clustering with several different distance measures, and it handles sparse data sets effectively. It can run on multiple input data sets, with a different distance measure for each input, and aggregate the distances with predefined weights. Download from GitHub:

    Correlation Learning

    The source code below extends the Lemur RankLib toolkit.

    RecSys Challenge 2015 - Team Budapest


    • session_time: Unix timestamp of the session.
    • session_hour: hour of the day at session_time.
    • session_hour_threshold: 2 if session_hour is between 5 and 18; 1 if session_hour is between 3 and 5 or between 18 and 20; 0 otherwise.
    • session_day: day of the week at session_time.
    • session_length: length of the session in seconds.
    • session_length_diff: difference of session_length from 1,200 seconds.
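
    The session features above could be derived from a session's click timestamps roughly as follows. This is a sketch under stated assumptions, not the team's code: the function names are hypothetical, session_time is taken to be the first click, hours are computed in UTC, interval boundaries are treated as half-open, days are numbered from Monday = 0, and session_length_diff is taken as session_length minus 1,200.

```python
from datetime import datetime, timezone

REFERENCE_LENGTH = 1200  # seconds, from the feature description


def hour_threshold(hour: int) -> int:
    """Bucket the hour of day; half-open boundaries are an assumption."""
    if 5 <= hour < 18:
        return 2
    if 3 <= hour < 5 or 18 <= hour < 20:
        return 1
    return 0


def session_features(click_times):
    """Compute the session features from a list of Unix timestamps."""
    start, end = min(click_times), max(click_times)
    # Assumption: hours and weekdays are computed in UTC.
    moment = datetime.fromtimestamp(start, tz=timezone.utc)
    length = end - start
    return {
        "session_time": start,
        "session_hour": moment.hour,
        "session_hour_threshold": hour_threshold(moment.hour),
        "session_day": moment.weekday(),  # Monday = 0 (assumed encoding)
        "session_length": length,
        "session_length_diff": length - REFERENCE_LENGTH,
    }
```

    If the original features used local time or a different boundary convention, only `hour_threshold` and the time zone passed to `fromtimestamp` would need to change.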

    RecSys Challenge 2014

    Data quality improvement and data integration

    A short summary of our group's data quality improvement and data integration solutions.

    Twitter influence subgraphs

    Anonymized Twitter influence subgraphs are available here