06 August 2020
10:00 Master's Defense (fully remote)
Theme
Detection of Violent Events in Video Sequences Based on the Census Transform Histogram Operator
Student
Felipe Faria de Souza
Advisor
Helio Pedrini
Brief summary
Video surveillance systems have been widely used to monitor scenes in a variety of environments, such as airports, banks, schools, industrial facilities, bus and train stations, highways, and stores. Due to the large amount of footage produced by surveillance cameras, visual inspection by camera operators becomes a tiring, time-consuming, and error-prone task. A key challenge is the development of intelligent surveillance systems capable of analyzing long video sequences captured by a network of cameras in order to identify a given behavior. In this work, we propose and analyze several classification techniques based on the CENTRIST (Census Transform Histogram) operator in the context of identifying violent events in video scenes. Additionally, we evaluate other traditional descriptors, such as HOG (Histogram of Oriented Gradients) and HOF (Histogram of Optical Flow), as well as descriptors extracted from pre-trained deep learning models. To restrict the evaluation to regions of interest in the video frames, we investigate background removal techniques. A sliding-window approach is used to assess smaller regions of the scene in combination with a voting criterion, and is then applied together with block filtering based on the scene's optical flow. To demonstrate the effectiveness of our method for discriminating violence in crowd scenes, we compare the results with other approaches available in the literature on two public datasets (Violence in Crowds and Hockey Fights). The combination of CENTRIST and HOG proved more effective than either operator used individually, reaching approximately 88% accuracy against 81% using only HOG and 86% using only CENTRIST. By refining the proposed method, we found that evaluating frame blocks with the sliding-window approach made the method more effective. We also evaluate and discuss techniques for generating visual words with sparse coding, measuring distances with a Gaussian mixture model, and measuring distances between clusters. Ways of restricting attention to the actors present in the scenes using optical flow were analyzed with the Otsu method and with block filtering based on a threshold given by the average optical flow of the scene. Finally, we dynamically compute the threshold for class voting, which in most cases yielded superior results, surpassing our most competitive configurations: 91.46% accuracy on the Violence in Crowds dataset and 92.79% on the Hockey Fights dataset.
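As an illustration of the core descriptor discussed in the abstract, the sketch below computes a CENTRIST feature for a grayscale frame block. It is a minimal NumPy sketch assuming 8-bit grayscale input and the standard 8-neighbor census transform; the function names (census_transform, centrist), the border handling, and the L1 normalization are illustrative choices and are not taken from the thesis.

import numpy as np

def census_transform(gray):
    # 8-bit census transform: compare each pixel with its 8 neighbors;
    # the bit is 1 when the center is >= the neighbor. Borders are skipped
    # for simplicity, so the output is (h-2, w-2).
    h, w = gray.shape
    center = gray[1:h-1, 1:w-1].astype(np.int32)
    ct = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1+dy:h-1+dy, 1+dx:w-1+dx].astype(np.int32)
        ct |= ((center >= neighbor).astype(np.uint8) << bit)
    return ct

def centrist(gray):
    # CENTRIST descriptor: 256-bin histogram of the census-transform values,
    # L1-normalized so blocks of different sizes are comparable.
    ct = census_transform(gray)
    hist, _ = np.histogram(ct, bins=256, range=(0, 256))
    return hist.astype(np.float32) / max(hist.sum(), 1)

if __name__ == "__main__":
    # Example: describe a single 64x64 block, as a sliding window might extract.
    block = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    print(centrist(block).shape)  # (256,)

In a pipeline like the one summarized above, such a per-block descriptor would typically be fed to a classifier, with the per-block decisions combined by the voting criterion mentioned in the abstract.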
Examination Board
Members:
Hélio Pedrini IC / UNICAMP
Alexandre Gonçalves Silva INF / UFSC
André Santanchè IC / UNICAMP
Substitutes:
Esther Luna Colombini IC / UNICAMP
Moacir Antonelli Ponti ICMC / USP