23 Jun 2021
09:00 Doctoral defense Fully remote
Title
Prediction-based software maintenance: a machine learning perspective
Student
Luiz Alberto Ferreira Gomes
Advisor / Faculty
Advisor: Mário Lucio Côrtes / Co-advisor: Ricardo da Silva Torres
Brief summary
Software maintenance in Free/Libre Open Source Software (FLOSS) relies mainly on information extracted from bug reports registered in Bug Tracking Systems (BTS). Such systems are considered essential for communication and collaboration in both Closed Source Software (CSS) and FLOSS environments. This is particularly true for FLOSS, an environment characterized by many users and developers with different specialties, spread around the world. These users and developers interact with a BTS through bug reports, which let them communicate with those responsible for maintaining the software. A user first fills in a bug report, providing a title, a description, and a severity level. Once the report is submitted, a member of the maintenance team reviews the information provided and approves or rejects the report. If the report is approved, the team member adds further information, such as its priority and the person assigned to fix the bug. In this scenario, the number of bug reports in medium-sized and large FLOSS projects is often high. Handling this volume of reports manually can be tedious and error-prone, and a wrong decision can seriously affect the planning of maintenance activities for the project. Because of this difficulty and the evident importance of bug report information for FLOSS maintenance planning, both industry and the academic community have shown great interest in this problem, and much research has been carried out in this area. These efforts have relied mainly on traditional Machine Learning (ML) and Text Mining (TM) techniques. Such techniques have been applied successfully to real problems in many areas, including BTS-related ones such as automatically assigning the person responsible for fixing a bug. In a bug report, the severity level is one of the most critical variables for maintenance planning.
It measures the impact of the bug on the execution of the software system and the time required for the bug to be resolved. At the same time, many studies point out that most bugs that adversely affect the user experience and maintenance planning across FLOSS versions are long-lived bugs. A good portion of these long-lived bugs can annoy users for a long time and, even in small numbers, make it difficult for the management team to plan its activities. In this context, our research focused on the two critical areas mentioned above, both related to bug reports, and provides the following contributions: a review of recent research on automatic bug severity prediction, analyzing more than ten aspects of experiments published in the literature; a survey and characterization of the long-lived bug population in six popular FLOSS projects; and a comparison of five well-known ML algorithms on the task of predicting long-lived bugs in those projects. In addition, as our latest contribution, we propose using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art deep neural network for Natural Language Processing (NLP), as a feature extractor, in contrast to the conventional Term Frequency-Inverse Document Frequency (TF-IDF) method, to provide input features for the pre-selected ML algorithms in long-lived bug prediction. Our research has produced a detailed and comprehensive view of the state of the art in severity level prediction approaches, indicating that traditional ML and TM applied to the unstructured textual attributes of bug reports played a central role in the approaches presented. In addition, we demonstrate that long-lived bugs can be predicted with good precision even when using simple ML algorithms and TM methods, such as a neural network and TF-IDF, on the unstructured textual attributes of a bug report.
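For readers unfamiliar with TF-IDF, the idea of turning the free text of a bug report into numeric features can be sketched as follows. This is only an illustrative sketch, not the method evaluated in the thesis: the toy reports, the labels "long" and "short", and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `predict`) are all hypothetical, and a simple nearest-neighbor lookup stands in for the ML algorithms mentioned in the summary.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would clean text further.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy data: report descriptions and whether the bug stayed open long.
reports = [
    "crash when opening large file",
    "crash on startup after update",
    "typo in settings dialog label",
    "wrong label in about dialog",
]
labels = ["long", "long", "short", "short"]

def predict(new_report):
    # Assumption: a 1-nearest-neighbor lookup stands in for a trained classifier.
    vectors = tfidf_vectors(reports + [new_report])
    query, train = vectors[-1], vectors[:-1]
    best = max(range(len(train)), key=lambda i: cosine(query, train[i]))
    return labels[best]

print(predict("crash when saving file"))
```

A term that occurs in every report gets weight zero, which is why common words contribute nothing to the comparison. Replacing `tfidf_vectors` with embeddings from a pre-trained language model would correspond, in spirit, to the BERT-based feature extraction that the summary contrasts with TF-IDF.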
Examination Board
Members:
Mário Lucio Côrtes IC / UNICAMP
Hélio Pedrini IC / UNICAMP
Islene Calciolari Garcia IC / UNICAMP
Rosana Teresinha Vaccare Braga ICMC / USP
Maria Adriana Vidigal de Lima FACOM / UFU
Alternates:
Eliane Martins IC / UNICAMP
Adler Diniz de Souza IMC / UNIFEI
Alexandre Mello Ferreira EEP