This work describes a technique that
produces a content-based representation of a video shot consisting of a
background (still) mosaic and one or more foreground moving
objects. Segmentation of moving objects is based on ego-motion
compensation and on background modelling using tools from robust
statistics. Region matching is carried out by an algorithm that
operates on the Mahalanobis distance between region descriptors in two
consecutive frames and uses singular value decomposition to compute a
set of correspondences satisfying both the principle of proximity and
the principle of exclusion. The sequence is represented as a layered
graph, and specific techniques are introduced to cope with crossing
and occlusion.
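
The region-matching step can be illustrated with a minimal sketch. It assumes the classical SVD-based correspondence scheme (in the style of Scott and Longuet-Higgins) with a Gaussian affinity built on the Mahalanobis distance; the text above does not spell out the exact formulation, so the function name `svd_match`, the shared covariance `cov`, and the scale `sigma` are illustrative assumptions rather than the system's actual interface.

```python
import numpy as np

def svd_match(desc_a, desc_b, cov, sigma=1.0):
    """Match region descriptors of two frames (hypothetical helper).

    desc_a: (m, d) descriptors from the first frame.
    desc_b: (n, d) descriptors from the second frame.
    cov:    (d, d) descriptor covariance, assumed known and invertible.
    Returns a list of (i, j) index pairs.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    cov_inv = np.linalg.inv(np.asarray(cov, dtype=float))

    # Squared Mahalanobis distance between every pair of descriptors.
    diff = a[:, None, :] - b[None, :, :]                    # shape (m, n, d)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)   # shape (m, n)

    # Proximity: a Gaussian affinity favours nearby descriptors.
    g = np.exp(-d2 / (2.0 * sigma ** 2))

    # Replace the singular values of G by ones: P = U V^T.
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    p = u @ vt

    # Exclusion: keep (i, j) only if P[i, j] dominates both its row and its
    # column, so each region takes part in at most one correspondence.
    row_best = p.argmax(axis=1)
    col_best = p.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(row_best) if col_best[j] == i]
```

For example, with three regions per frame listed in a different order, `svd_match([[0, 0], [10, 10], [20, 0]], [[10.5, 9.5], [0.2, -0.1], [19.8, 0.3]], np.eye(2), sigma=2.0)` pairs each region with its nearest counterpart. Regions that appear or disappear between frames are simply left unmatched.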
The diagram on the left shows all the steps of
our system. The input is a video shot; the output consists of shape
descriptors for each moving object.
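
One step of such a pipeline, background modelling with robust statistics, can be sketched as follows. This is only an illustration: it assumes the frames have already been registered by ego-motion compensation and uses the per-pixel temporal median and median absolute deviation (MAD) as the robust estimators, which the description above does not confirm. The names `robust_background` and `foreground_mask` and the thresholds `k` and `min_scale` are hypothetical.

```python
from statistics import median

def robust_background(frames):
    """Per-pixel median and scaled MAD over a stack of registered frames.

    frames: list of 2-D lists of grey levels, all with the same size.
    """
    height, width = len(frames[0]), len(frames[0][0])
    med = [[0.0] * width for _ in range(height)]
    scale = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [f[y][x] for f in frames]
            m = median(values)
            mad = median(abs(v - m) for v in values)
            med[y][x] = m
            # 1.4826 makes the MAD comparable to a Gaussian standard deviation.
            scale[y][x] = 1.4826 * mad
    return med, scale

def foreground_mask(frame, med, scale, k=3.0, min_scale=1.0):
    """Flag pixels that deviate from the background by more than k scales.

    min_scale (an assumed noise floor) avoids a zero threshold where the
    background never changed in the training frames.
    """
    return [
        [abs(frame[y][x] - med[y][x]) > k * max(scale[y][x], min_scale)
         for x in range(len(frame[0]))]
        for y in range(len(frame))
    ]
```

Because the median and the MAD ignore a minority of outlying samples, a moving object that covers a pixel in only a few frames does not contaminate the background estimate at that pixel.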