Fisher kernels for image descriptors: a theoretical overview and experimental results

    Visual words have recently proved to be a key tool in image
    classification. The best-performing Pascal VOC and ImageCLEF systems use
    Gaussian mixtures or k-means clustering to define visual words based on
    the content-based features of points of interest. In most cases, Gaussian
    Mixture Modeling (GMM) with a Fisher information based distance over
    the mixtures yields the most accurate classification results.
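    For illustration, the k-means variant of the visual-word step can be sketched as follows. A codebook of cluster centers is learned from local descriptors. Each descriptor of an image then votes for its nearest center, and the image becomes a normalized histogram of votes. This is a minimal pure-Python sketch under our own assumptions: Euclidean distance, initial centers sampled from the data, and a fixed number of Lloyd iterations. The function names kmeans and bovw_histogram are hypothetical and are not taken from the systems cited above.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(x, centers):
    """Index of the visual word (cluster center) closest to descriptor x."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))

def kmeans(points, k, iters=20, seed=0):
    """Hypothetical Lloyd-style k-means; initial centers are sampled data points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers

def bovw_histogram(descriptors, centers):
    """Hard-assignment bag-of-visual-words histogram, normalized to sum to 1."""
    counts = [0] * len(centers)
    for x in descriptors:
        counts[nearest(x, centers)] += 1
    n = len(descriptors)
    return [c / n for c in counts]
```

A GMM-based codebook replaces the hard vote with soft posterior weights. The Fisher kernel goes one step further and keeps gradient information per component instead of a single count.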

    In this paper we review the theoretical foundations of the Fisher kernel
    method. We indicate that it yields a natural metric over images
    characterized by low-level content descriptors generated from a Gaussian
    mixture.
    We support the theoretical observations by reproducing standard
    measurements on the Pascal VOC 2007 data. Our accuracy is comparable to
    that of the most recent best-performing image classification systems.
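    As a rough illustration of the Fisher kernel over a Gaussian mixture, the sketch below computes a Fisher vector for one image in two steps. First it computes soft assignments of each descriptor to the mixture components. It then computes gradients of the log-likelihood with respect to the component means and standard deviations. These gradients are scaled by the closed-form approximation of the Fisher information for a diagonal-covariance GMM that is common in the image-classification literature. The sketch assumes the GMM weights, means and standard deviations are already estimated, for example by EM. It may differ from the exact normalization used in this paper, and all function names are hypothetical. The Fisher kernel between two images is then approximated by the dot product of their Fisher vectors.

```python
import math

def log_gaussian_diag(x, mu, sigma):
    """Log density of a Gaussian with diagonal covariance (per-dimension std devs)."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - mi) ** 2 / (2 * s * s)
        for xi, mi, s in zip(x, mu, sigma)
    )

def posteriors(x, weights, means, sigmas):
    """Soft assignment gamma_k(x) of descriptor x to each mixture component."""
    logs = [math.log(w) + log_gaussian_diag(x, m, s)
            for w, m, s in zip(weights, means, sigmas)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fisher_vector(descriptors, weights, means, sigmas):
    """Hypothetical Fisher vector: normalized gradients w.r.t. means and std devs.

    Assumes a diagonal-covariance GMM with precomputed parameters.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]  # whitened offset
                g_mu[k][d] += gamma[k] * z
                g_sigma[k][d] += gamma[k] * (z * z - 1.0)
    fv = []
    for k in range(K):
        # Approximate Fisher-information scaling for each component.
        fv.extend(v / (T * math.sqrt(weights[k])) for v in g_mu[k])
        fv.extend(v / (T * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])
    return fv

def fisher_kernel(fv_a, fv_b):
    """Linear kernel on Fisher vectors, approximating the Fisher kernel."""
    return sum(a * b for a, b in zip(fv_a, fv_b))
```

For K components and D-dimensional descriptors the vector has 2KD entries, however many interest points the image has. This fixed length is what lets images with different numbers of descriptors be compared with a single dot product.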

    Bálint Daróczy, András A. Benczúr, Lajos Rónyai
    Annales Univ. Sci. Budapest., Sectio Computatorica special issue dedicated to Professors Zoltán Daróczy and Imre Kátai on the occasion of their 75th birthday