SZTAKI @ ImageCLEF 2012 Photo Annotation
In this paper we describe our approach to the ImageCLEF 2012 Photo Annotation task.
We used both visual and textual modalities for all submissions. We described each
image with a fixed-length representation built from several similarity measures.
This allowed us to combine a large variety of descriptors before classification and
thereby improve classification quality. The representation concatenates visual and
textual similarity values between the given image and a reference set of carefully selected
training images. We trained Gaussian mixture models (GMMs) as a generative model
of low-level descriptors extracted from the training set at Harris-Laplacian interest
points. We used two local descriptors: one based on grayscale gradients and one on
color moments. To measure the visual similarity between two images, we extracted
several dense Fisher vectors per image. In addition to the visual features, we adopted a biclustering method to
cluster Flickr tags and images simultaneously. Additionally, we measured the textual
similarity between images by the Jensen-Shannon divergence of their Flickr tag distributions.
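The fixed-length representation built from similarities to a reference set can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the cosine similarity, feature dimensionality, and reference-set size are all assumptions chosen for the example.

```python
import numpy as np

def similarity_descriptor(image_feat, reference_feats, sim):
    """Fixed-length descriptor: similarity of one image to each reference image.

    The descriptor length equals the reference-set size, regardless of how the
    underlying image features were computed.
    """
    return np.array([sim(image_feat, ref) for ref in reference_feats])

def cosine_sim(a, b):
    # Illustrative similarity; the paper combines several visual and textual measures.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
references = rng.normal(size=(50, 128))   # 50 hypothetical reference images, 128-d features
query = rng.normal(size=128)              # one hypothetical query image
desc = similarity_descriptor(query, references, cosine_sim)
print(desc.shape)  # (50,) -- same length for every query image
```

Because every image maps to a vector of the same length, descriptors of very different kinds (visual and textual) can be concatenated and fed to a single classifier.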
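The Fisher-vector step can be sketched with a simplified variant that keeps only the gradient with respect to the GMM means (the full Fisher vector also includes weight and variance terms and a normalization). The descriptor dimensionality and mixture size below are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(local_descs, gmm):
    """Simplified Fisher vector: gradient of the GMM log-likelihood w.r.t. the means."""
    gamma = gmm.predict_proba(local_descs)             # (N, K) soft assignments
    mu, sigma = gmm.means_, np.sqrt(gmm.covariances_)  # diagonal covariances: (K, D)
    n = local_descs.shape[0]
    fv = np.vstack([
        (gamma[:, k:k + 1] * (local_descs - mu[k]) / sigma[k]).sum(axis=0) / n
        for k in range(gmm.n_components)
    ])
    return fv.ravel()  # fixed length K*D, independent of the number of local descriptors

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 8))                  # hypothetical low-level training descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(train)   # generative model of local descriptors
fv = fisher_vector(rng.normal(size=(60, 8)), gmm)  # one image's local descriptors
print(fv.shape)  # (32,)
```

The fixed K*D length is what makes Fisher vectors comparable across images with different numbers of interest points.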
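The tag-based similarity can be sketched by treating each image's Flickr tags as a discrete distribution over a shared tag vocabulary and computing the Jensen-Shannon divergence; the tag counts below are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base-2 logs)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()   # normalize raw counts to distributions
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence; zero-probability terms contribute nothing.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical tag-count histograms over a shared 4-tag vocabulary
tags_a = [3, 1, 0, 2]
tags_b = [2, 0, 1, 2]
print(round(js_divergence(tags_a, tags_b), 3))
```

With base-2 logarithms the divergence lies in [0, 1], it is symmetric, and it is zero exactly when the two tag distributions coincide, which makes it a convenient similarity value to place in the fixed-length descriptor.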