OpenCL implementation of Similarity kernel

    An OpenCL implementation of a similarity kernel based on several distance measures:

    • L2
    • L1
    • Jensen-Shannon
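    The three distances above can be illustrated with a minimal CPU sketch in plain Python. This is not the OpenCL code from the repository. The exponential kernel form exp(-d / bandwidth), the `bandwidth` parameter and all function names are assumptions made for illustration, since the listing does not say how a distance is turned into a similarity.

```python
import math


def l2_distance(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def l1_distance(x, y):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def _kl_divergence(p, q):
    """Kullback-Leibler divergence; terms with p_i == 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions.

    p and q are expected to be non-negative and to sum to 1.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl_divergence(p, m) + 0.5 * _kl_divergence(q, m)


def similarity(x, y, distance, bandwidth=1.0):
    """Assumed kernel form: exp(-distance / bandwidth). Hypothetical choice."""
    return math.exp(-distance(x, y) / bandwidth)
```

    For example, `similarity(x, y, l1_distance)` gives 1.0 for identical vectors and decays towards 0 as the vectors move apart.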

    GitHub link:

    MOL BUBI Analytics Challenge - training and test data

    - Description of the files can be found at
    - The train and test files have the same columns and format


    Co-cluster is a clustering framework implemented in C++. It supports clustering and bi-clustering with several distance measures and handles sparse datasets efficiently. It can run on multiple input datasets, apply a different distance measure to each input, and aggregate the resulting distances with predefined weights. Download from GitHub:
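    The weighted aggregation described above can be sketched as follows. This is a Python illustration of the idea only, not the C++ framework's code or API. The function `aggregate_distance`, its arguments and the use of a weighted sum as the aggregation rule are assumptions.

```python
import math


def l2_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def l1_distance(x, y):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def aggregate_distance(views_a, views_b, measures, weights):
    """Combine per-input distances into one value (hypothetical helper).

    views_a, views_b: one feature vector per input dataset for items a and b.
    measures: one distance function per input dataset.
    weights: one predefined weight per input dataset.
    Assumption: the aggregate is a plain weighted sum.
    """
    if not (len(views_a) == len(views_b) == len(measures) == len(weights)):
        raise ValueError("one view, measure and weight is needed per input")
    return sum(
        weight * measure(a, b)
        for a, b, measure, weight in zip(views_a, views_b, measures, weights)
    )
```

    Here an item is described by one vector per input dataset, so `aggregate_distance([[0, 0], [1, 2]], [[3, 4], [2, 4]], [l2_distance, l1_distance], [0.5, 2.0])` applies L2 to the first input and L1 to the second before weighting.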

    Correlation Learning

    The source code below extends the Lemur RankLib toolkit.

    RecSys Challenge 2015 - Team Budapest


    • session\_time: unix timestamp of the session.
    • session\_hour: hour of the day at session\_time.
    • session\_hour\_threshold: 2 if session\_hour is between 5 and 18, 1 if it is between 3 and 5 or between 18 and 20, and 0 otherwise.
    • session\_day: day of the week at session\_time.
    • session\_length: length of the session in seconds.
    • session\_length\_diff: difference of session\_length from 1,200 sec.
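    The features listed above could be derived as in the following Python sketch. Several details are assumptions because the list does not specify them: timestamps are interpreted in UTC, the hour ranges are treated as half-open (5 <= hour < 18, and so on), the day of the week is numbered from Monday = 0, and session_length_diff is the signed difference session_length - 1200. The function name `session_features` is hypothetical.

```python
from datetime import datetime, timezone


def session_features(session_time, session_length):
    """Build the session feature dict from a unix timestamp and a length in seconds."""
    moment = datetime.fromtimestamp(session_time, tz=timezone.utc)  # assumed UTC
    hour = moment.hour

    # Assumed half-open boundaries; the original list leaves them unspecified.
    if 5 <= hour < 18:
        hour_threshold = 2
    elif 3 <= hour < 5 or 18 <= hour < 20:
        hour_threshold = 1
    else:
        hour_threshold = 0

    return {
        "session_time": session_time,
        "session_hour": hour,
        "session_hour_threshold": hour_threshold,
        "session_day": moment.weekday(),  # Monday = 0 (assumption)
        "session_length": session_length,
        "session_length_diff": session_length - 1200,  # signed (assumption)
    }
```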

    RecSys Challenge 2014

    Data quality improvement and data integration

    A short summary of our group's data quality improvement and data integration solutions.

    Twitter influence subgraphs

    Anonymized Twitter influence subgraphs are available here

    Lecture series on distributed data processing technologies

    A series of lectures on distributed data processing technologies, architectures and tools related to the Big Data and Cloud SZTAKI projects.