27 Oct 2020
14:00 Doctoral defense Fully remote
Title
Visual Rhythm-based Convolutional Neural Networks and Adaptive Fusion for a Multi-stream Architecture Applied to Human Action Recognition
Student
Helena de Almeida Maia
Advisor
Helio Pedrini
Brief summary
The large amount of video data produced and released every day makes visual inspection by a human operator impractical. However, the content of these videos can be useful for several important tasks, such as surveillance and health monitoring. Therefore, automatic methods are needed to detect and understand relevant events in videos. The problem addressed in this work is human action recognition in videos, which aims to classify the action being performed by one or more actors. The complexity of the problem and the volume of video data suggest the use of deep learning techniques. However, unlike for image problems, there is no wide variety of well-established video-specific architectures, nor are there annotated data sets as large as those available for images. To get around these limitations, we propose and analyze a multi-stream architecture composed of image-based networks pre-trained on the ImageNet data set. Different image representations are extracted from the videos and serve as input to the streams, in order to provide complementary information to the system. Here, we propose new streams based on visual rhythm, which encode longer-term information than static frames and optical flow. As important as the definition of representative and complementary aspects is the choice of suitable combination methods that exploit the strengths of each modality. Thus, we also analyze different fusion approaches to combine the modalities. In order to define the best parameters of our fusion methods using the training set, we have to reduce overfitting in the individual modalities; otherwise, 100% accurate outputs would not offer a realistic and relevant representation for the fusion method. Thus, we investigate an early stopping technique to train the individual networks. In addition to reducing overfitting, this method also reduces training costs, as it usually requires less time to complete the classification process, and it adapts to new streams and data sets thanks to its trainable parameters. The experiments are performed on the UCF101 and HMDB51 data sets, two challenging benchmarks in the context of action recognition.
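As an illustration of the visual rhythm idea mentioned in the summary above, a minimal sketch follows. It assumes one common construction (sampling a single pixel row from each frame and stacking those rows in temporal order), which may differ from the variant used in the thesis. The function name `horizontal_visual_rhythm`, the nested-list frame format, and the default choice of the middle row are hypothetical and used only for illustration.

```python
from typing import List, Optional

# A grayscale frame is represented as H rows of W pixel values (assumed format).
Frame = List[List[int]]


def horizontal_visual_rhythm(frames: List[Frame],
                             row: Optional[int] = None) -> List[List[int]]:
    """Build a T x W image by taking one pixel row from each of T frames.

    frames: the video, as a list of T grayscale frames of equal size.
    row: index of the row to sample; defaults to the middle row (an assumption).
    """
    if not frames:
        raise ValueError("video has no frames")
    if row is None:
        row = len(frames[0]) // 2
    # Each output line is the sampled row of one frame, kept in temporal order,
    # so the vertical axis of the result corresponds to time.
    return [list(frame[row]) for frame in frames]


# Synthetic clip: 8 frames of 4 x 6 pixels, where frame k is filled with value k.
clip = [[[k] * 6 for _ in range(4)] for k in range(8)]
rhythm = horizontal_visual_rhythm(clip)
```

The result is a single 2D image summarizing the whole clip, which is why such a representation can be fed to an image-based network while still covering a longer time span than a single frame.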
Examination Board
Full members:
Hélio Pedrini IC / UNICAMP
Rodrigo Luis de Souza da Silva DCC / UFJF
Tiago José de Carvalho IFSP
Esther Luna Colombini IC / UNICAMP
Tiago Fernandes Tavares FEEC / UNICAMP
Alternates:
André Santanchè IC / UNICAMP
Alexandre Mello Ferreira IC / UNICAMP
Gilson Antonio Giraldi LNCC