SZTAKI @ ImageCLEF 2012 Photo Annotation

    In this paper we describe our approach to the ImageCLEF 2012 Photo Annotation
    task. We used both visual and textual modalities for all submissions. We described
    each image with a fixed-length representation built from different similarity measures,
    which allowed us to combine a large variety of descriptors before classification and
    thereby improve classification quality. This representation is a combination of several
    visual and textual similarity values between the given image and a reference set of
    well-selected training images. We trained Gaussian Mixture Models (GMMs) to define
    a generative model for low-level descriptors extracted from the training set using
    Harris-Laplace point detection. We used two descriptors, one based on grayscale
    gradients and one on color moments. To measure the visual similarity between two
    images, we extracted several dense Fisher vectors per image. Besides calculating
    visual features, we adopted a biclustering method to cluster the Flickr tags and the
    images simultaneously. Additionally, we measured the similarity of images according
    to their Flickr tags using the Jensen-Shannon divergence.
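The fixed-length similarity representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of cosine similarity, the function names, and the toy descriptors are all assumptions made for the example; the paper combines several visual and textual similarity measures against a reference set of training images.

```python
import math

def cosine(u, v):
    # Cosine similarity between two descriptor vectors (illustrative choice;
    # the paper uses several visual and textual similarity measures).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_representation(image_desc, reference_descs):
    """Fixed-length vector: similarity of one image to each reference image.

    The length depends only on the size of the reference set, so similarities
    computed from heterogeneous descriptors can be concatenated and fed to a
    single classifier.
    """
    return [cosine(image_desc, r) for r in reference_descs]

# Hypothetical descriptors of three well-selected reference training images.
refs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rep = similarity_representation([1.0, 0.5], refs)
# rep has length len(refs) regardless of the underlying descriptor type.
```

In this scheme, concatenating such vectors for each descriptor type yields one fixed-length input per image, which is what allows the modalities to be fused before classification.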
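The tag-based similarity can likewise be sketched with the Jensen-Shannon divergence between two tag distributions. This is a hedged example, not the paper's code: the tag dictionaries, the log base, and the conversion from divergence to similarity are assumptions for illustration.

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for dicts mapping tag -> probability.
    return sum(pi * math.log2(pi / q[t]) for t, pi in p.items() if pi > 0)

def jensen_shannon(p, q):
    # Jensen-Shannon divergence: symmetrized KL against the mixture m,
    # bounded in [0, 1] when using base-2 logarithms.
    tags = set(p) | set(q)
    pf = {t: p.get(t, 0.0) for t in tags}
    qf = {t: q.get(t, 0.0) for t in tags}
    m = {t: 0.5 * (pf[t] + qf[t]) for t in tags}
    return 0.5 * kl(pf, m) + 0.5 * kl(qf, m)

# Hypothetical tag distributions of two Flickr images.
a = {"beach": 0.5, "sea": 0.5}
b = {"beach": 0.5, "mountain": 0.5}
sim = 1.0 - jensen_shannon(a, b)  # one way to turn a divergence into a similarity
```

Because the mixture `m` is nonzero wherever either distribution is, the divergence is always finite, which makes it a convenient measure for sparse tag vocabularies.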

    B. Daróczy, D. Siklósi and A.A. Benczúr
    In Working Notes of the ImageCLEF 2012 Workshop at the CLEF 2012 Conference, Rome, Italy