17 Jul
14:00 CEST Doctoral defense Fully remote
Partitioning Convolutional Neural Networks for Inference on Constrained Internet-of-Things Devices
Fabíola Martins Campos de Oliveira
Advisor
Edson Borin
Brief summary
Billions of devices will make up the Internet of Things (IoT) in the coming years, generating an enormous amount of data that will need to be processed and interpreted efficiently. Most data is currently processed in the cloud; however, this paradigm cannot be adopted for the huge amount of data generated by the IoT, mainly because of bandwidth limits and the latency requirements of many applications. Mist computing can be used to process this data, using the network infrastructure and the devices themselves. In this context, deep learning techniques are well suited to inferring information from this data, but the memory requirements of deep neural networks can prevent even inference from being performed on a single resource-constrained device. In addition, the computational requirements of deep neural networks can lead to impractical execution times. To enable the execution of neural network models on resource-constrained IoT systems, the code can be partitioned and distributed among multiple devices. Different partitioning approaches are possible; however, some of them reduce the inference rate that the system can achieve or increase the amount of communication between devices. The objective of this thesis is to distribute the inference execution of deep neural networks among several constrained IoT devices. Three automatic partitioning algorithms are proposed. They model the deep neural network as a dataflow graph and draw on the characteristics of IoT systems, such as inference rate, communication, and memory limitations, to define their objective functions and constraints. The first algorithm is Kernighan-and-Lin-based Partitioning, whose objective function is to reduce communication while respecting the memory constraints of each device.
The second algorithm is Deep Neural Networks Partitioning for Constrained IoT Devices, which, beyond what the first algorithm offers, can also optimize the inference rate of the neural network and properly account for the memory required by the shared parameters and biases of convolutional neural networks. Finally, the third algorithm is Multilevel Deep Neural Networks Partitioning for Constrained IoT Devices, which uses the multilevel approach to reduce the size of the graph and take advantage of the capabilities of the previous algorithm. The main contribution of this thesis is a study and new algorithms for partitioning deep neural networks among constrained IoT devices using different objective functions while respecting the memory constraints of each device. In contrast, algorithms in the literature generally offer communication reduction as the only objective function and do not consider memory constraints, which allows them to produce invalid partitionings. In addition, the proposed algorithms produce better results than the approaches in the literature in most cases. Another contribution is that the proposed algorithms can be used to partition any computation that can be expressed as a dataflow graph among any set of devices.
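As an illustration of the two quantities the summary refers to, the following minimal Python sketch evaluates a candidate partitioning of a dataflow graph: the communication it causes between devices and whether it respects each device's memory limit. It is not any of the thesis's algorithms; the function names, the toy graph, and all the numbers are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# Assumed representation: an edge is (producer, consumer, data sent per inference).
Edge = Tuple[str, str, int]


def communication_cost(edges: List[Edge], assignment: Dict[str, int]) -> int:
    """Sum the data carried by edges whose endpoints sit on different devices."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def memory_per_device(vertex_memory: Dict[str, int],
                      assignment: Dict[str, int]) -> Dict[int, int]:
    """Add up the memory of the vertices assigned to each device."""
    usage: Dict[int, int] = {}
    for vertex, device in assignment.items():
        usage[device] = usage.get(device, 0) + vertex_memory[vertex]
    return usage


def is_valid(vertex_memory: Dict[str, int],
             assignment: Dict[str, int],
             capacity: Dict[int, int]) -> bool:
    """A partitioning is valid only if no device exceeds its memory capacity."""
    usage = memory_per_device(vertex_memory, assignment)
    return all(used <= capacity[device] for device, used in usage.items())


# Hypothetical four-layer network and two devices (arbitrary units).
vertex_memory = {"conv1": 40, "conv2": 60, "fc1": 80, "fc2": 20}
edges = [("conv1", "conv2", 30), ("conv2", "fc1", 10), ("fc1", "fc2", 5)]
capacity = {0: 100, 1: 100}

split_a = {"conv1": 0, "conv2": 0, "fc1": 1, "fc2": 1}
split_b = {"conv1": 0, "conv2": 1, "fc1": 1, "fc2": 1}

for name, split in (("split_a", split_a), ("split_b", split_b)):
    print(name,
          "communication:", communication_cost(edges, split),
          "valid:", is_valid(vertex_memory, split, capacity))
```

In this toy case, `split_b` overloads device 1, which is the kind of invalid partitioning that can result when memory constraints are ignored.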
Examination Board
Edson Borin IC / UNICAMP
Kalinka Regina Lucas Jaquie Castelo Branco ICMC / USP
Job Ueyama ICMC / USP
Edmundo Roberto Mauro Madeira IC / UNICAMP
Sandra Eliza Fontes de Avila IC / UNICAMP
Luiz Fernando Bittencourt IC / UNICAMP
Eurípedes Guilherme de Oliveira Nóbrega FEM / UNICAMP
José Marcos Silva Nogueira DCC / UFMG