Web spam classification: a few features worth more

    In this paper we investigate how much various classes of Web spam features, some requiring very high computational effort, add to the classification accuracy. We realize that advances in machine learning, an area that has received less attention in the adversarial IR community, yields more improvement than new features and result in low cost yet accurate spam filters. Our original contributions are as follows:

    • We collect and handle a large number of features based on recent advances in Web spam filtering.