11 Mar
14:00 Master's Defense (fully remote)
Prediction of Secondary Protein Structures using Machine Learning and BLAST
Gabriel Bianchin de Oliveira
Advisors
Zanoni Dias (advisor) / Hélio Pedrini (co-advisor)
Abstract
Proteins, which are sequences of amino acids, are fundamental to several biological processes in living beings. Physical and chemical interactions between the amino acids that make up a protein give rise to local and global three-dimensional structures. With technological advances in biology, protein sequencing has become simple and fast. On the other hand, determining the local three-dimensional structures (secondary structures) and the global ones (tertiary structures) remains costly. Three-dimensional structures strongly influence protein function and support the development of applications such as medicines and biosensors. As a route to determining global protein structure from the amino acid sequence, secondary structure analysis has become the main intermediate step in the literature. Two approaches are most commonly used to predict secondary structures: template-based methods, which use tools that find similar proteins, and template-free methods, which use machine learning classifiers. Several methodologies for secondary structure prediction have been proposed in recent work, but the problem remains open. Moreover, most current approaches rely on evolutionary information in addition to the amino acid sequence, and are therefore unable to predict secondary structures from the amino acid chain alone. In this research, we propose several template-based and template-free classifiers for protein secondary structure prediction. In addition to analyzing each classifier individually, we investigated fusing template-based predictors with template-free predictors, as well as fusing all classifiers. Our predictors can classify secondary structures from amino acid sequences with or without evolutionary information, which most methods in the literature cannot do.
The results obtained on three different datasets show that our classifiers are competitive with the approaches in the literature.
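As a rough illustration of what fusing a predictor based on similar-protein search (such as BLAST) with a machine learning classifier can look like, here is a minimal sketch of soft voting. It averages per-residue class probabilities, optionally weighted, over the common three-state alphabet (H = helix, E = strand, C = coil) and keeps the most probable state for each residue. The soft-voting rule, the `fuse_predictions` function, the weights, and the probability values are all hypothetical. The abstract does not say which fusion strategy the thesis actually uses.

```python
from typing import Dict, List, Optional

# Assumed three-state secondary structure alphabet: helix, strand, coil.
STATES = ("H", "E", "C")

# One probability dict per residue, e.g. {"H": 0.7, "E": 0.2, "C": 0.1}.
Prediction = List[Dict[str, float]]


def fuse_predictions(predictors: List[Prediction],
                     weights: Optional[List[float]] = None) -> str:
    """Hypothetical soft-voting fusion: weighted mean of per-residue
    probabilities from each predictor, then argmax per residue."""
    if not predictors:
        raise ValueError("need at least one predictor")
    length = len(predictors[0])
    if any(len(p) != length for p in predictors):
        raise ValueError("all predictors must cover the same residues")
    if weights is None:
        weights = [1.0] * len(predictors)  # plain (unweighted) averaging
    if len(weights) != len(predictors):
        raise ValueError("one weight per predictor is required")
    total = sum(weights)

    labels = []
    for i in range(length):
        # Weighted average probability of each state at residue i.
        fused = {
            s: sum(w * p[i].get(s, 0.0) for w, p in zip(weights, predictors)) / total
            for s in STATES
        }
        # Ties resolve to the earlier state in STATES order.
        labels.append(max(STATES, key=lambda s: fused[s]))
    return "".join(labels)
```

For example, with one made-up output from a search-based predictor and one from a classifier over three residues, equal weights yield `"HEC"`. Giving the classifier three times the weight flips the second residue to helix (`"HHC"`). Soft voting is chosen here only because it is simple and keeps each predictor's confidence. Majority voting or a learned meta-classifier would be equally plausible alternatives.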
Examination Board
Zanoni Dias IC / UNICAMP
Ricardo Cerri DC / UFSCar
Guilherme Pimentel Telles IC / UNICAMP
Esther Luna Colombini IC / UNICAMP
Felipe Rodrigues da Silva Embrapa