Downloads

    Lecture series on distributed data processing technologies

    A series of lectures on distributed data processing technologies, architectures and tools related to the Big Data and Cloud SZTAKI projects.

    Cross-Lingual Web Classification

    If you use this data, please cite

    A. Garzó, B. Daróczy, T. Kiss, D. Siklósi, A.A. Benczúr
    Cross-Lingual Web Spam Classification
    In Proc. WICOW 2013 in conjunction with WWW 2013

    ECML/PKDD 2010 Discovery Challenge Data Set

    The Web Quality datasets on this site are provided to advance research on Web document classification. The labels are intended for research purposes only. We advise you not to use them directly for search engine ranking or filtering.

    Wimmut: searching and navigating Wikipedia

    Download our test queries with assessments, together with a Java application that provides a user-friendly graphical interface for searching Wikipedia content and navigating the network of pages.

    Reticular Alignment: algorithm for multiple sequence alignment

    Reticular Alignment is our new method for multiple sequence alignment. Unlike previous corner-cutting methods, our approach does not define a compact part of the dynamic programming table. Instead, it defines a set of optimal and suboptimal alignments at each step during the progressive alignment.

    MCMC for metabolic networks

    In our model, the evolution of a metabolic network is characterized by the gain and loss of reactions connecting two metabolites, and can be described as a discrete-space, continuous-time Markov process.

    LiveJournal data

    The data set is intended for research purposes only and is freely available under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license, which basically states that you are free to use the labels and that we make no warranties about them. You can download and use the data for research at any institution, public or private.

    Temporal Features for Web Spam Detection

    Temporal features for Web spam detection calculated from monthly snapshots of the .uk domain between October 2006 and May 2007.
