Image-based matching and recognition

We perform basic research in the areas of spatial and spatio-temporal recognition to develop new methods for

  • recognizing previously seen objects from novel views,
  • classifying previously unseen objects into object categories, and
  • recognizing human activities and spatio-temporal events.

The overall methodology is based on image measurements by receptive fields, expressed in terms of Gaussian derivatives and differential invariants as obtained from scale-space theory.
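As an illustration of what such receptive field measurements look like in practice, the following is a minimal sketch that computes Gaussian derivative responses up to order two and a few differential invariants built from them. The function names `gaussian_jet` and `differential_invariants` are hypothetical and chosen only for this example, and the use of `scipy.ndimage.gaussian_filter` with sampled Gaussian derivative kernels is an assumption, not necessarily the discretization used in the cited papers.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaussian_jet(image, sigma):
    """Gaussian derivative responses up to order two at scale sigma.

    The `order` tuple gives the derivative order along (rows, columns),
    i.e. along (y, x).
    """
    image = np.asarray(image, dtype=float)
    Lx = gaussian_filter(image, sigma, order=(0, 1))
    Ly = gaussian_filter(image, sigma, order=(1, 0))
    Lxx = gaussian_filter(image, sigma, order=(0, 2))
    Lxy = gaussian_filter(image, sigma, order=(1, 1))
    Lyy = gaussian_filter(image, sigma, order=(2, 0))
    return Lx, Ly, Lxx, Lxy, Lyy


def differential_invariants(image, sigma):
    """Rotationally invariant combinations of the Gaussian derivatives."""
    Lx, Ly, Lxx, Lxy, Lyy = gaussian_jet(image, sigma)
    gradient_magnitude = np.sqrt(Lx ** 2 + Ly ** 2)
    laplacian = Lxx + Lyy
    det_hessian = Lxx * Lyy - Lxy ** 2
    return gradient_magnitude, laplacian, det_hessian


# Usage on a synthetic bright blob centred in a 65 x 65 image
y, x = np.mgrid[:65, :65]
blob = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 5.0 ** 2))
grad_mag, lap, det_h = differential_invariants(blob, sigma=2.0)
```

At the centre of a bright blob, the gradient magnitude vanishes, the Laplacian is negative and the determinant of the Hessian is positive, which is the kind of local signature that the descriptors and detectors discussed on this page build upon.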

More recently, we have developed a generalized framework for detecting scale-invariant interest points in images. Several of the new interest point detectors obtained from this framework perform significantly better than previous, more commonly used interest point detectors:

  • Lindeberg (2015) "Image matching using generalized scale-space interest points", Journal of Mathematical Imaging and Vision, 52(1): 3-36. (Download PDF)
  • Lindeberg (2013) "Scale selection properties of generalized scale-space interest point detectors", Journal of Mathematical Imaging and Vision, 46(2): 177-210. (Download PDF)

We have also performed an exhaustive experimental investigation of the information content in receptive field based image descriptors in terms of Gaussian derivatives and differential invariants up to order two, and shown that there are new receptive field based image descriptors with better discriminability properties compared to previously known image descriptors within the same class:

  • Linde and Lindeberg (2012) "Composed complex-cue histograms: An investigation of the information content in receptive field based image descriptors for object recognition", Computer Vision and Image Understanding, 116(4): 538-560. (Download PDF)

Specifically, our experimental results show that in many cases coarsely quantized image descriptors such as binary image descriptors perform surprisingly well for object recognition and object categorization.
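To make the idea of a coarsely quantized descriptor concrete, here is a minimal sketch of a binary histogram descriptor: each receptive field response is reduced to one bit per pixel, the bits from several responses are composed into a joint code, and the codes are accumulated in a histogram with 2^k bins for k cues. The function name `binary_composed_histogram` is hypothetical, and thresholding at zero is an assumption made for illustration; the quantization used in the cited paper may differ.

```python
import numpy as np


def binary_composed_histogram(responses):
    """Joint histogram of sign-quantized responses.

    responses: sequence of k equally shaped arrays (one per cue).
    Returns a normalized histogram with 2**k bins.
    """
    responses = [np.asarray(r) for r in responses]
    code = np.zeros(responses[0].shape, dtype=np.int64)
    for bit, r in enumerate(responses):
        # Assumed quantization: one bit per cue, set where the response is positive
        code |= (r > 0).astype(np.int64) << bit
    hist = np.bincount(code.ravel(), minlength=2 ** len(responses))
    return hist / hist.sum()


# Usage with two toy "response maps"
a = np.array([[1.0, -1.0], [1.0, -1.0]])
b = np.array([[1.0, 1.0], [-1.0, -1.0]])
descriptor = binary_composed_histogram([a, b])
```

In this toy case every combination of signs occurs exactly once, so the four bins receive equal weight. Two images or regions can then be compared by any histogram distance of choice.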

In earlier work, we have developed approaches for spatio-temporal recognition based on local spatio-temporal descriptors in terms of local position-dependent histograms of either spatio-temporal gradients or optic flow:

  • Laptev and Lindeberg (2004) "Local descriptors for spatio-temporal recognition", Proc. ECCV'04 Workshop on Spatial Coherence for Visual Motion Analysis, (Prague, Czech Republic), 2004, Springer LNCS 3667: 91-103. (Download PDF)
  • Laptev, Caputo, Schuldt and Lindeberg (2007) "Local velocity-adapted motion events for spatio-temporal recognition", Computer Vision and Image Understanding, 108(3): 207-229. (Download PDF)

These approaches can be seen as extensions of SIFT from space to space-time, and constitute precursors to other spatio-temporal image descriptors such as HOF. Whereas the spatio-temporal image descriptors in these papers were integrated with spatio-temporal interest point detection

  • Laptev and Lindeberg (2003) "Space-time interest points", Proc. International Conference on Computer Vision (ICCV'03), (Nice, France), volume I:432-439. (Download PDF)

these position-dependent histograms of either spatio-temporal gradients or optic flow can also be computed densely at every point in space-time, in the same way as the pure image descriptor in SIFT is computed densely in dense SIFT.

A more general set of spatio-temporal interest point detectors, which specifically have better spatio-temporal scale selection properties than the spatio-temporal Harris operator, is presented in

  • Lindeberg (2018) "Spatio-temporal scale selection in video data", Journal of Mathematical Imaging and Vision, 60(4): 525-562. (Download PDF)

  • Lindeberg (2017) "Spatio-temporal scale selection in video data", Proc. SSVM 2017: Scale-Space and Variational Methods in Computer Vision, (Kolding, Denmark), Springer LNCS 10302: 3-15. (Download PDF)

We have also developed earlier approaches to spatial recognition based on regional histograms of Gaussian derivatives or differential invariants and shown that cue combination in terms of higher-dimensional histograms of multiple receptive field responses improves the performance compared to previously used histogram descriptors:

  • Linde and Lindeberg (2004) "Object recognition using composed receptive field histograms of higher dimensionality", Proc. ICPR 2004: International Conference on Pattern Recognition, (Cambridge, U.K.), August 2004, volume 2: 1-6. (Download PDF)

Regarding the detection of interest points, which underlies a main paradigm of image-based matching based on local image descriptors, we developed the Laplacian and determinant of the Hessian interest points with automatic scale selection based on the detection of scale-space extrema in:

  • Lindeberg (1998) "Feature detection with automatic scale selection", International Journal of Computer Vision, 30(2): 77-116. (Download PDF)
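The principle of automatic scale selection can be illustrated with a small numerical experiment. The sketch below evaluates the scale-normalized Laplacian, t (L_xx + L_yy), and the scale-normalized determinant of the Hessian, t^2 (L_xx L_yy - L_xy^2), with t = sigma^2, at the centre of a synthetic Gaussian blob and picks the scale at which each response is strongest. The helper name `scale_normalized_responses`, the synthetic test image and the sampled range of scales are assumptions made for this example, and only the search over scale at a fixed image position is shown, not the full detection of extrema over both space and scale.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def scale_normalized_responses(image, sigma):
    """Scale-normalized Laplacian and determinant of the Hessian at scale sigma."""
    image = np.asarray(image, dtype=float)
    t = sigma ** 2  # variance-based scale parameter
    Lxx = gaussian_filter(image, sigma, order=(0, 2))
    Lxy = gaussian_filter(image, sigma, order=(1, 1))
    Lyy = gaussian_filter(image, sigma, order=(2, 0))
    laplacian_norm = t * (Lxx + Lyy)
    det_hessian_norm = t ** 2 * (Lxx * Lyy - Lxy ** 2)
    return laplacian_norm, det_hessian_norm


# Synthetic Gaussian blob with standard deviation 6, centred in a 129 x 129 image
blob_sigma = 6.0
y, x = np.mgrid[:129, :129]
blob = np.exp(-((x - 64) ** 2 + (y - 64) ** 2) / (2 * blob_sigma ** 2))

# Evaluate both operators at the blob centre over a range of scales
sigmas = np.arange(2.0, 12.5, 0.5)
lap_at_centre = []
det_at_centre = []
for s in sigmas:
    lap, det = scale_normalized_responses(blob, s)
    lap_at_centre.append(abs(lap[64, 64]))
    det_at_centre.append(det[64, 64])

selected_sigma_laplacian = float(sigmas[int(np.argmax(lap_at_centre))])
selected_sigma_det_hessian = float(sigmas[int(np.argmax(det_at_centre))])
```

For an ideal Gaussian blob of variance t_0, both scale-normalized responses at the centre are maximized over scale at t = t_0, so the selected scales should land close to the size of the blob. This is the property that makes the selected scale reflect the size of the underlying image structure.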

The interest point detectors in SIFT and SURF can be seen as approximations of such scale-space extrema of the scale-normalized Laplacian and the scale-normalized determinant of the Hessian, respectively, computed using a Laplacian pyramid in SIFT and a Haar wavelet basis in SURF.
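The relation to the Laplacian can be made explicit with a short derivation. It assumes that the pyramid in SIFT is built from differences of Gaussian-smoothed images at neighbouring scales sigma and k sigma, as it is commonly described, and uses the notation L(·; t) for the scale-space representation at scale t = sigma^2; the symbols k and t are introduced here for illustration only.

```latex
\begin{align}
  \partial_t L &= \tfrac{1}{2}\,\nabla^2 L
    && \text{(diffusion equation satisfied by the Gaussian scale space)} \\
  L(\cdot;\, k^2 t) - L(\cdot;\, t)
    &\approx (k^2 - 1)\, t \;\partial_t L
    && \text{(first-order Taylor expansion in } t\text{)} \\
    &= \frac{k^2 - 1}{2}\; t\, \nabla^2 L
     = \frac{k^2 - 1}{2}\; \nabla^2_{\mathrm{norm}} L .
\end{align}
```

Up to the constant factor (k^2 - 1)/2, which is approximately k - 1 for k close to 1, a difference of Gaussians at neighbouring scales therefore approximates the scale-normalized Laplacian, so extrema of one over space and scale approximate extrema of the other.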