Content-based trust and bias classification via biclustering

    In this paper we improve trust, bias and factuality classification of Web data at the domain level. Unlike most of the literature in this area, which aims at extracting opinions and handling short text at the micro level, we aim to help a researcher or an archivist obtain a large collection that, at a high level, originates from unbiased and trustworthy sources. Our method generates features as Jensen-Shannon distances from the cluster centers of a host-term biclustering. On top of the distance features, we apply kernel methods and also combine them with baseline text classifiers. We test our method on DC2010, the ECML/PKDD Discovery Challenge 2010 data set. Our method improves on the best text classification NDCG results achieved so far by 3-10% for neutrality, bias and trustworthiness. The fact that the ECML/PKDD Discovery Challenge 2010 participants reached an AUC only slightly above 0.5 indicates the difficulty of the task.
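
    The distance features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each host is represented by a term-count vector, that host cluster labels are already available from some biclustering step (not shown here), that a cluster center is the mean of its members' normalized term distributions, and that "distance" means the square root of the base-2 Jensen-Shannon divergence. All function names and the toy data are hypothetical.

```python
import math

def normalize(counts):
    """Turn a term-count vector into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty term vector")
    return [c / total for c in counts]

def kl(p, m):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    div = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(div, 0.0))  # guard against tiny negative rounding

def cluster_centers(host_dists, labels, k):
    """Assumed center definition: mean distribution of each cluster's hosts.
    Assumes every cluster 0..k-1 has at least one member."""
    dim = len(host_dists[0])
    sums = [[0.0] * dim for _ in range(k)]
    sizes = [0] * k
    for dist, lab in zip(host_dists, labels):
        sizes[lab] += 1
        for j, v in enumerate(dist):
            sums[lab][j] += v
    return [[v / sizes[c] for v in sums[c]] for c in range(k)]

def distance_features(host_dists, centers):
    """One feature per cluster center: the host's JS distance to that center."""
    return [[js_distance(d, c) for c in centers] for d in host_dists]

# Toy data (hypothetical): 4 hosts, 3 terms, 2 host clusters.
host_counts = [[4, 0, 0], [3, 1, 0], [0, 0, 5], [0, 1, 4]]
labels = [0, 0, 1, 1]  # assumed output of a biclustering step

host_dists = [normalize(c) for c in host_counts]
centers = cluster_centers(host_dists, labels, k=2)
features = distance_features(host_dists, centers)
```

    Each row of `features` could then be passed to a kernel-based classifier or concatenated with baseline text features, in the spirit of the combination the abstract describes.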

    Year: 
    2012
    Authors: 
    D. Siklósi, B. Daróczy and A.A. Benczúr
    Publication: 
    WebQuality '12: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, pages 41-47. ACM, New York, NY, USA, 2012