Data mining, data analysis

    The main focus of our group is data mining: finding correlations and hidden relationships in large amounts of data. We deal with complex and heterogeneous data sets and build predictive models. We have taken part in several data mining contests, and we apply and improve state-of-the-art techniques in this field.

    Entity resolution

    Data integration and data analysis often require solving the problem of entity resolution (also called record linkage or de-duplication): data items must be grouped together when they correspond to the same real-world object.
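    To make the grouping step concrete, here is a minimal sketch. It is not our production method. It assumes a hypothetical pairwise matching rule (same normalized e-mail address, or same normalized name and birth year) and a hypothetical record layout with `id`, `name`, `email` and `born` fields. Matching pairs are merged with a union-find structure, so records that match only indirectly (A matches B, B matches C) still end up in the same group.

```python
from collections import defaultdict


def normalize(s):
    # Lowercase and collapse whitespace so formatting differences do not block a match
    return " ".join(s.lower().split())


def is_match(a, b):
    # Hypothetical matching rule: same non-empty e-mail, or same name and birth year
    if a["email"] and b["email"] and normalize(a["email"]) == normalize(b["email"]):
        return True
    return normalize(a["name"]) == normalize(b["name"]) and a["born"] == b["born"]


def resolve_entities(records):
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Naive O(n^2) comparison of all pairs; large data sets need blocking instead
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                union(i, j)

    # Collect record ids by their root, i.e. by resolved real-world entity
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec["id"])
    return sorted(groups.values())


RECORDS = [
    {"id": 1, "name": "Anna Kovacs", "email": "anna@example.com", "born": 1980},
    {"id": 2, "name": "anna  kovacs", "email": "", "born": 1980},
    {"id": 3, "name": "A. Kovacs", "email": "ANNA@example.com", "born": 1981},
    {"id": 4, "name": "Bela Nagy", "email": "bela@example.com", "born": 1975},
]

print(resolve_entities(RECORDS))
```

    Records 2 and 3 do not match each other directly, but both match record 1, so all three are grouped as one entity. Real systems typically replace the hand-written rule with learned similarity scores and avoid comparing every pair.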

    Analyzing social networks

    Online social networks are used by hundreds of millions of people and are growing rapidly. In parallel, the need to analyze these networks keeps growing. We investigate how information flows within a network, and in another project we try to identify the same users across different networks.

    "Big Data": handling extremely large datasets

    Handling and analyzing huge amounts of data is a timely challenge. Our group researches, develops and applies special tools and technologies aimed at the problems posed by overwhelming volumes of data.

    Machine learning

    Data miners often face challenges that are easier to solve with a machine learning algorithm. We successfully apply and develop such algorithms, along with related efficient tools.

    Search technologies

    Research on and application of data search systems has been a focus of our group for almost ten years, with both scientific and practical achievements.

    Data warehousing

    Besides developing and supporting relational data warehouse systems for our partners, we are looking for new ways to push the current limits of scalable, distributed data management technologies and the related tools.

    Network visualization

    We have developed a tool that can handle large amounts of heterogeneous data and supports efficient searching and browsing of a network of nodes.

    Bioinformatics

    The analysis of large sets of biological data requires a deep understanding of data mining concepts. To handle problems in the field of bioinformatics, we use specialized algorithms, and we extend existing solutions or make them more efficient.