07 April 2021
14:00 Doctoral defense Fully distance
Theme
Protocols for Private Set Intersection (PSI) and for Private Classification based on Naive Bayes
Student
Amanda Cristina Davi Resende
Advisor / Teacher
Diego de Freitas Spider
Brief summary
Due to the care in controlling access to sensitive / confidential information, there is a great concern in building computer systems that maintain the privacy of the entities that provide them. Thus, this work presents protocols for Private Set Intersection (PSI) and Machine Learning preserving privacy. PSI protocols allow two parties to calculate the intersection of their private sets without revealing any additional information beyond the result of the intersection, for one part (unidirectional) or both parts (mutual). Despite the various PSI protocols available in the literature, only recently have techniques been applied to existing protocols in order to make them more efficient when one party has a much smaller set of elements than the other. This is a realistic scenario in many cases, featuring an unbalanced configuration. Thus, we build on modern cryptographic engineering techniques and propose optimizations for promising unidirectional PSI protocols based on public key cryptography that are safe against both semi-honest and malicious opponents. We show that our improvements and optimizations generate protocols that improve the communication complexity and the execution time of previous proposals in the unbalanced configuration. In addition, we propose a Naive Bayes classifier that preserves privacy and apply it to the private text classification problem. In this scenario, one party (Alice) has a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice will learn only the result of the classifier applied to her text input and Bob will learn nothing. Our solution is based on Secure Multipart Computing (MPC) and provides a fast and secure solution for the classification of unstructured text. Applying our solution to the case of spam detection (the solution is generic and can be used in any other scenario where the Naive Bayes classifier can be used), we can classify an SMS as spam or non-spam (ham) in less than 340 ms in the case that the dictionary size of Bob's model includes all words (n = 5200) and Alice's SMS has a maximum of m = 160 unigrams. In the case of n = 369 in = 8 (the average of a spam SMS in the database), our solution takes just 21 ms. Based on our results, we believe that our proposals are practical alternatives to the solutions currently in place to preserve privacy, especially, but not only, in applications such as discovering contacts on existing social networks and classifying an SMS as spam or not. spam (ham).
Examination Board
Headlines:
Diego de Freitas Aranha IC / UNICAMP - AUNIV
Marco Aurélio Amaral Henriques FEEC / UNICAMP
Jeroen Antonius Maria van de Graaf DCC / UFMG
Marcos Antonio Simplicio Junior PCS / EPUSP
Conrado Porto Lopes Gouvêa kryptus
Substitutes:
Julio Cesar Lopez Hernandez IC / UNICAMP
Anderson Clayton Alves Nascimento UW
Mário Sérgio Ferreira Alvim Júnior DCC / UFMG