18 Dec 2020
15:00 Master's Defense, held entirely remotely
Evaluating Attention-Based Models for Violence Classification in Videos
Marcos Vinícius Adão Teixeira
Advisor / Faculty
Sandra Eliza Fontes de Avila
Brief abstract
Technological advances have contributed to an increase in the sharing of videos online. This type of media is primarily focused on entertainment and is consumed on demand. Given this large volume of information, automatic techniques for identifying the type of content in videos have been studied over the years. In particular, research on identifying sensitive content, such as violent, pornographic, or grotesque material, has grown; it aims to detect and analyze sensitive events for various applications. In this work, we focus on the classification of violence in videos. Several works in the literature propose solutions ranging from local descriptors to deep neural networks. Most approaches use the entire representation of the video as input to extract the appropriate features for classification. However, in the real world, some scenes may contain noisy and irrelevant parts that confuse the algorithm. We investigated the effectiveness of attention-based models in dealing with this problem. Despite the success of attention-based models in different tasks, such as speech recognition, image captioning, and machine translation, such methods have not yet been explored in the context of violence. To conduct this work, we searched the literature for attention-based models related to the video classification task, adapted them to our context of violence, and compared them with traditional strategies. We used the EfficientNet network, state of the art for image classification, to extract features for all approaches, using RGB video frames and optical flow as input. We also extended the initial implementations to work with multimodal features using a late fusion approach.
After conducting a detailed survey of violence datasets, we chose three datasets to evaluate the methods studied: Hockey Fights, MediaEval 2015, and RWF-2000. Each dataset presents a different concept of violence, which made the experiments more interesting and challenging. We conducted quantitative experiments, analyzing the performance of attention-based models and comparing them with traditional methods, and qualitative experiments, analyzing the relevance scores produced by the attention-based models. The best results for each dataset were obtained using an attention-based model, demonstrating the effectiveness of the approach in the context of violence. However, not all attention-based models produced better results than traditional approaches; in these cases, adopting an additional module in the model is not justified. On the other hand, the best attention-based models achieved better results than many more expensive approaches proposed in the literature, highlighting the advantage of their use. We emphasize that this work is the first to explore attention-based models to classify violence in videos.
Examination committee
Sandra Eliza Fontes de Avila IC/UNICAMP
Jefersson Alex dos Santos DCC/UFMG
Paula Dornhofer Paro Costa FEEC/UNICAMP
Esther Luna Colombini IC/UNICAMP
Álan Lívio Vasconcelos Guedes DI/PUC-Rio