Term and document frequencies

    This page provides host-level aggregate term vectors for the most frequent terms. To encourage the use of cross-lingual features, sites auto-detected as English, French or German are processed separately. See also the content-based and link-based features.
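    The sketch below illustrates what a host-level aggregate term vector with term and document frequencies can look like. It is a hypothetical sketch, not the procedure used to build this dataset. It makes several assumptions: the input is (host, language, page text) triples; a host's term frequency (TF) is the total count of a term over all of its pages; its document frequency (DF) is the number of its pages that contain the term; and the "most frequent terms" are the top_k terms by total TF within each language. The tokenizer is also a simplification. The dataset's actual file format and definitions may differ.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer (assumption): lowercase runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def host_term_vectors(pages, top_k=1000):
    """Aggregate page texts into per-host term vectors, separately per language.

    pages: iterable of (host, lang, text) tuples (assumed input format).
    Returns {lang: {host: {term: (tf, df)}}}, restricted to each language's
    top_k terms by total term frequency (assumed selection rule).
    """
    tf = defaultdict(lambda: defaultdict(Counter))  # lang -> host -> term -> total count
    df = defaultdict(lambda: defaultdict(Counter))  # lang -> host -> term -> pages containing it
    for host, lang, text in pages:
        tokens = tokenize(text)
        tf[lang][host].update(tokens)
        df[lang][host].update(set(tokens))  # count each term once per page

    result = {}
    for lang, hosts in tf.items():
        # Each language gets its own vocabulary of most frequent terms.
        totals = Counter()
        for counts in hosts.values():
            totals.update(counts)
        vocab = {term for term, _ in totals.most_common(top_k)}
        result[lang] = {
            host: {t: (counts[t], df[lang][host][t]) for t in counts if t in vocab}
            for host, counts in hosts.items()
        }
    return result


# Usage example with made-up toy data.
pages = [
    ("a.example", "en", "spam spam offer"),
    ("a.example", "en", "offer now"),
    ("b.example", "en", "news today"),
    ("c.example", "fr", "bonjour bonjour"),
]
vectors = host_term_vectors(pages, top_k=2)
print(vectors["en"]["a.example"])  # {'spam': (2, 1), 'offer': (2, 2)}
```

    Keeping a separate vocabulary for each language mirrors the separate processing of English, French and German sites described above. Each host's vector is therefore comparable only with other hosts in the same language.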

    1. Natural Language Processing features
    2. Training labels
    3. URLs and hyperlinks
    4. Content-based and link-based Web spam features

    For inquiries, please contact Miklós Erdélyi.
    Last updated: 17 May 2010.