21 February 2025
09:00, Master's Defense, Room 53, IC Building 2
Title
Training and Inference of Weightless Neural Networks on Encrypted Data
Student
Leonardo Henrique Neumann
Advisor
Edson Borin
Co-advisor
Antonio Carlos Guimarães Junior
Abstract
The mass adoption of machine learning algorithms has raised concerns within the data privacy research community, calling for efforts to develop privacy-preserving techniques. Among these approaches, homomorphic evaluation of machine learning algorithms stands out for its ability to compute directly on encrypted data, offering robust confidentiality guarantees. While there has been significant progress in efficient homomorphic encryption (HE) algorithms for inference in Convolutional Neural Networks (CNNs), there are still no efficient solutions for encrypted training. Current solutions often rely on interactive protocols, which, while preserving privacy, impose substantial communication costs. This limitation highlights the demand for faster, privacy-preserving machine learning solutions that can maintain data confidentiality and model performance across a wide range of applications.
This work presents a new approach to privacy-preserving machine learning through homomorphic evaluation of the Wilkie, Stonham, and Aleksander Recognition Algorithm (WiSARD) (Aleksander et al., 1984) and subsequent state-of-the-art Weightless Neural Networks (WNNs) using the TFHE fully homomorphic encryption (FHE) scheme. We present several contributions, including extensions to TFHE, parameter optimizations, and modifications to the WiSARD model to improve accuracy. Our approach enables FHE-based training and inference, along with complementary techniques such as homomorphic balancing.
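For readers unfamiliar with weightless models, the sketch below illustrates the plaintext WiSARD algorithm that the thesis evaluates homomorphically (it does not use TFHE or any encryption). Class names, parameter names, and the use of Python sets as RAM nodes are illustrative assumptions; real implementations also apply a fixed pseudo-random permutation to the input bits before tuple extraction.

```python
# Minimal plaintext sketch of WiSARD: each class owns a "discriminator"
# made of RAM nodes; consecutive n-bit tuples of the binary input form
# addresses into those RAMs. Training marks the addressed positions;
# inference counts how many RAMs recognize their address.

class WiSARD:
    def __init__(self, input_bits, tuple_size, classes):
        assert input_bits % tuple_size == 0
        self.tuple_size = tuple_size
        self.n_rams = input_bits // tuple_size
        # one set-backed RAM per node per class, storing seen addresses
        self.rams = {c: [set() for _ in range(self.n_rams)] for c in classes}

    def _addresses(self, bits):
        # split the bit vector into tuples and read each as an integer address
        t = self.tuple_size
        for i in range(self.n_rams):
            chunk = bits[i * t:(i + 1) * t]
            yield int("".join(map(str, chunk)), 2)

    def train(self, bits, label):
        # one-pass training: mark each addressed RAM position as seen
        for ram, addr in zip(self.rams[label], self._addresses(bits)):
            ram.add(addr)

    def classify(self, bits):
        # score = number of RAM nodes that recognize their address
        scores = {c: sum(addr in ram for ram, addr in
                         zip(rams, self._addresses(bits)))
                  for c, rams in self.rams.items()}
        return max(scores, key=scores.get)

model = WiSARD(input_bits=8, tuple_size=2, classes=["a", "b"])
model.train([1, 1, 0, 0, 1, 1, 0, 0], "a")
model.train([0, 0, 1, 1, 0, 0, 1, 1], "b")
print(model.classify([1, 1, 0, 0, 1, 1, 0, 0]))  # → a
```

Because training is a single table-update pass rather than iterative gradient descent, the model is a natural fit for homomorphic evaluation, where each operation on encrypted data is expensive.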
We evaluate our homomorphic WiSARD models against state-of-the-art approaches on three benchmark datasets: MNIST, HAM10000, and Wisconsin Breast Cancer. Our results demonstrate significant performance improvements, completing encrypted training in minutes rather than the days required by previous work. For MNIST, we achieved 91.71% accuracy after just 3.5 minutes of encrypted training, increasing to 93.76% after 3.5 hours. On HAM10000, we achieved 67.85% accuracy in just 1.5 minutes, increasing to 69.85% after 1 hour. Compared to Glyph (Lou et al., 2020), the state of the art in homomorphic training, these results represent performance gains of up to 1200 times with a maximum accuracy loss of 5.4%. For HAM10000, we even achieved a 0.65% accuracy improvement while being 60 times faster.
Our models offer a good balance between speed, accuracy, and privacy preservation. We also demonstrate the practicality of our approach on consumer-grade hardware, training on over 1,000 MNIST images in 12 minutes, or on the entire Wisconsin Breast Cancer dataset in just 11 seconds, using a single core and less than 200 MB of memory. Our technique stands out for its flexibility in scenarios such as distributed, federated, and continual learning.
Examination Board
Members:
Edson Borin | IC / UNICAMP |
Hilder Vitor Lima Pereira | IC / UNICAMP |
Priscila Machado Vieira Lima | UFRJ |
Substitutes:
Allan Mariano de Souza | IC / UNICAMP |
Marco Aurélio Amaral Henriques | FEEC / UNICAMP |