21 January 2021
10:00 Master's Defense (fully remote)
Title
Failure Detectors: Testing Platform and Comparative Study
Student
Valdomiro Luis Scannapieco Neto
Advisor
Luiz Eduardo Buzato
Abstract
Almost thirty years ago, between 1991 and 1992, Chandra, Toueg and Hadzilacos introduced the concept of failure detectors and showed how to use them to solve consensus in asynchronous distributed systems subject to partial failures. In the years that followed, the failure detector abstraction proved to be an essential tool for engineering highly available distributed systems. In short, failure detectors are an elegant tool that lets designers of distributed systems factor out the timing assumptions that consensus algorithms rely on to detect failures. Many failure detection algorithms have since been published, each supposedly offering a better solution to failure detection, usually on the basis of an ad hoc evaluation of the proposed algorithm. The lack of a common benchmark or testing platform for failure detectors is an additional obstacle for system engineers who need to choose a suitable failure detector for their application. In this context, it is reasonable to ask: what is the best failure detector for a given application running on a given distributed system?
In this work, the application is an active replication system built on consensus-based total order broadcast (DTOC), which is the common denominator of a large number of real applications. Chen, Toueg and Aguilera (CTA) proposed metrics to characterize the quality of service provided by a failure detector. These metrics quantify (i) how fast a failure detector detects actual failures and (ii) how well it avoids false detections. This dissertation proposes, implements and evaluates a testing platform for failure detectors based on the widely accepted CTA metrics, and then uses this platform to seek an answer to the question above.
The contributions of this research are: (i) an experimental method for uniformly evaluating the behavior of failure detectors, (ii) the implementation of a testing platform that supports the method, and (iii) a comparative study of four well-known failure detectors.
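To make the CTA metrics more concrete, here is a minimal Python sketch, not the platform developed in this dissertation, that estimates some of them from a single recorded trace of a failure detector's output while process p monitors process q. Following Chen, Toueg and Aguilera, the output is either T (trust) or S (suspect); the detection time T_D runs from q's crash until p starts suspecting q permanently, a mistake is an S-transition while q is still up, T_M is the duration of a mistake, λ_M is the mistake rate and P_A is the query accuracy probability. The trace format, the names `cta_metrics`, `transitions` and `QoS`, and the simple count-over-time estimators are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical trace format: (time, output) pairs, output "T" (trust) or "S" (suspect).
Trace = List[Tuple[float, str]]


@dataclass
class QoS:
    detection_time: Optional[float]        # T_D; None if no permanent suspicion was observed
    mistake_rate: float                    # estimate of lambda_M (mistakes per unit of up time)
    avg_mistake_duration: Optional[float]  # mean T_M over completed mistakes
    query_accuracy: float                  # estimate of P_A (fraction of up time spent trusting)


def transitions(trace: Trace) -> Trace:
    """Keep only real T<->S changes; the output is assumed to start as "T"."""
    state, changes = "T", []
    for t, out in sorted(trace):
        if out != state:
            changes.append((t, out))
            state = out
    return changes


def cta_metrics(trace: Trace, start: float, end: float,
                crash_time: Optional[float] = None) -> QoS:
    """Estimate CTA-style QoS metrics for a detector at p monitoring q.

    Assumes every trace time lies in [start, end], end > start, and that
    crash_time is None when q stays up for the whole observation.
    """
    changes = transitions(trace)
    up_end = crash_time if crash_time is not None else end
    up_time = up_end - start

    state, since = "T", start
    trusted = 0.0                # time spent trusting q while q is up
    mistakes = 0                 # S-transitions while q is up
    durations: List[float] = []  # completed false suspicions (samples of T_M)
    for t, out in changes:
        if t >= up_end:
            break                # later outputs concern a crashed q
        if out == "S":           # T -> S while q is up: a new mistake
            trusted += t - since
            mistakes += 1
        else:                    # S -> T: the false suspicion ends
            durations.append(t - since)
        state, since = out, t
    if state == "T":
        trusted += up_end - since

    detection_time = None
    if crash_time is not None and changes and changes[-1][1] == "S":
        # Permanent suspicion starts at the final S-transition, or at the
        # crash itself if q was already being suspected at that moment.
        detection_time = max(0.0, changes[-1][0] - crash_time)

    return QoS(
        detection_time=detection_time,
        mistake_rate=mistakes / up_time,
        avg_mistake_duration=sum(durations) / len(durations) if durations else None,
        query_accuracy=trusted / up_time,
    )
```

For example, `cta_metrics([(1.0, "S"), (1.5, "T"), (4.0, "S"), (4.2, "T"), (7.0, "S")], start=0.0, end=10.0, crash_time=6.0)` counts two false suspicions before the crash at time 6.0 and a detection time of 1.0. A real testing platform would aggregate such estimates over many runs and detectors rather than a single trace.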
Examination Board
Full members:
Luiz Eduardo Buzato IC / UNICAMP
Eliane Martins IC / UNICAMP
Regina Lucia de Oliveira Moraes FT / UNICAMP
Alternates:
Guido Costa Souza de Araújo IC / UNICAMP
Daniel Cason Università della Svizzera Italiana, USI