This method produces a large number of vectors and is sensitive to camera motion and object motion. Sensitivity to motion can be reduced by extracting features from the whole frame, among which the histogram is the most commonly used. The disadvantage of global features is that they tend to perform poorly at detecting the boundary between two similar shots. To balance resistance to motion against the ability to discriminate between similar shots, region-based features have been proposed. A region-based method divides each frame into equal-sized blocks and extracts a set of features per block. Based on the assumption that color content does not change rapidly within a shot but does change across shots, color is the most commonly used feature; others include edges and textures.
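
As a rough illustration of the region-based idea, the sketch below splits a grayscale frame into a grid of equal-sized blocks and concatenates one normalized intensity histogram per block. The function name `block_histograms`, the nested-list frame representation, and the `grid` and `bins` parameters are assumptions made for this example; they are not taken from any particular method discussed here, and a real system would typically use color histograms.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of intensities in 0..255


def block_histograms(frame: Frame, grid: int = 2, bins: int = 4) -> List[float]:
    """Concatenate one normalized intensity histogram per block.

    Assumes the frame height and width are divisible by `grid`;
    any leftover rows or columns are ignored in this sketch.
    """
    height, width = len(frame), len(frame[0])
    block_h, block_w = height // grid, width // grid
    pixels_per_block = block_h * block_w
    features: List[float] = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * bins
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    # Map an intensity in 0..255 to a bin index in 0..bins-1.
                    hist[frame[y][x] * bins // 256] += 1
            features.extend(count / pixels_per_block for count in hist)
    return features
```

Setting `grid=1` reduces this to a single global histogram, which makes the tradeoff concrete: fewer blocks give more tolerance to motion, while more blocks preserve spatial layout and help separate shots with similar overall color.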

Assuming that the features computed in the first stage are similar within a shot but vary across shots, one can compute the distance between the feature vectors of adjacent frames and compare it against a threshold. A distance above the threshold usually corresponds to a hard cut. A fixed threshold, however, has difficulty detecting gradual transitions, because the change between adjacent frames is slow. In addition, most videos contain frame-to-frame variations as well as sudden changes within a shot, such as the appearance of a new object. Computing the distance over multiple frames helps cancel these variations. Another method is to vary the threshold according to the average distance within the current shot or the statistics of the whole video. The problem has also been formulated as a binary classification task in which the two classes are "transition" and "no transition".
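
The thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions, not a specific published detector: the L1 distance, the function name `detect_cuts`, and the parameters `window`, `factor`, and `floor` are all hypothetical choices. When `fixed_threshold` is given, it is used directly; otherwise the threshold adapts to the average of the most recent distances within the current shot.

```python
from typing import List, Optional, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(
    features: Sequence[Sequence[float]],
    fixed_threshold: Optional[float] = None,
    window: int = 5,
    factor: float = 3.0,
    floor: float = 0.3,
) -> List[int]:
    """Return indices i where a hard cut is declared between frames i-1 and i."""
    cuts: List[int] = []
    recent: List[float] = []  # recent distances within the current shot
    for i in range(1, len(features)):
        distance = l1_distance(features[i - 1], features[i])
        if fixed_threshold is not None:
            threshold = fixed_threshold
        else:
            # Adaptive variant: scale the recent average distance,
            # but never drop below a minimum value (an assumed safeguard).
            mean = sum(recent) / len(recent) if recent else 0.0
            threshold = max(floor, factor * mean)
        if distance > threshold:
            cuts.append(i)
            recent = []  # a new shot starts; discard old statistics
        else:
            recent = (recent + [distance])[-window:]
    return cuts
```

Because each decision uses only the distance between two adjacent frames, a sketch like this would still miss gradual transitions, whose per-frame change stays small; handling them requires comparing frames further apart or accumulating distances over several frames.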

