Doctoral defense of Károly Csalogány, June 3, 2010

    Identifying and preventing spam was cited as one of the top challenges for web search engines in a 2002 paper. Amit Singhal, principal scientist of Google Inc., estimated that the search engine spam industry had a revenue potential of $4.5 billion in 2004, had it been able to completely fool all search engines on all commercially viable queries. Given the large and ever-increasing financial gains that result from high search engine rankings, it is no wonder that a significant amount of human and machine resources is devoted to artificially inflating the rankings of certain web pages.

    We give five methods for spam detection. SpamRank and Link Based Similarity work on the link structure in the neighborhood of each page, Commercial Intent and Language Model Disagreement rely on the textual content, and stacked graphical learning, a recent machine learning technique, provides a powerful way of combining link-based and content-based methods.

    Data mining seminar
    Thursday, June 3, 2010 - 10:00