The Hungarian Search Engine

    8 October, 2003
    • Key components developed in 2003
    • Serves Hungarian Telecom Intranet (approx 1M pages, 5000 users) since March 2004
    • Public service

    Our search engine is well suited to languages with a particularly complicated syntax that require extensive use of natural language processing tools. Our design adapts flexibly to languages as complex as Hungarian, where linguistic tools and ranking methods must interact closely. We handle agglutination, compounds, possibly missing accents and multilingual document sets at all levels of the core engine.
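
    As an illustration of what matching with missing accents can involve, here is a minimal sketch in Python. It is not the engine's actual code; the function names fold_accents and matches are hypothetical, and the approach (Unicode decomposition followed by removal of combining marks) is only one common way to do this.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip diacritics: decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, term: str, ignore_accents: bool = True) -> bool:
    """Compare a query word with an indexed term, case-insensitively.

    ignore_accents is a hypothetical switch standing in for the
    user-selectable accent handling described in the text.
    """
    q, t = query.casefold(), term.casefold()
    if ignore_accents:
        q, t = fold_accents(q), fold_accents(t)
    return q == t


if __name__ == "__main__":
    print(fold_accents("árvíztűrő tükörfúrógép"))
    print(matches("tukor", "tükör"))
    print(matches("tukor", "tükör", ignore_accents=False))
```

    Folding also conflates distinct Hungarian words such as "kor" (age), "kör" (circle) and "kór" (disease), which may be one reason to leave the choice to the user.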

    Our product scales best to mid-range subdomains, intranets or document collections of a few tens of millions of documents. We apply efficient crawling and indexing policies that enable breaking-news search and avoid most traps caused by techniques for presenting personalized Web pages.

    Services include:

    • Agglutination handling, stemming and missing-accent matching at the user’s choice
    • Simple and boolean search, search within results
    • Document highlights, cached documents with jump to search word
    • Over 100 file formats (PDF, MS Office, ZIP, …) and media search
    • Indexing local file systems
    • Search for similar pages
    • Search alert function
    • Full combination of restrictions to date, content type, language, domain
      as well as links to a given page
    • Flexible wildcard (*, ?) and expression search
    • Usage and search statistics
    • Runs on Solaris and Linux with standard XML interface and JSP frontend
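
    The wildcard search listed above (* and ?) can be sketched as follows. This is a hedged illustration in Python rather than the engine's implementation: the function names wildcard_to_regex and match_terms and the sample vocabulary are hypothetical, and it assumes that * stands for any run of characters and ? for exactly one character.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed semantics: '*' matches any run of characters (including none),
    '?' matches exactly one character, everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_terms(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary terms that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [term for term in vocabulary if rx.fullmatch(term)]


if __name__ == "__main__":
    words = ["ház", "házak", "házban", "haza", "kör", "kor", "körte"]
    print(match_terms("ház*", words))
    print(match_terms("k?r", words))
```

    Scanning the whole vocabulary is acceptable for a small example; a production index would more likely narrow the candidates by the literal prefix before applying the pattern.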