12 Mar 2021
10:00 Master's Defense, fully remote
Theme
Exploring Associative Processing with the RV-Across simulator
Student
Jonathas Evangelista da Silveira
Advisor
Lucas Francisco Wanner
Brief summary
Recent academic work points to a performance bottleneck between processor and memory. This bottleneck is especially evident in applications, such as Machine Learning, that process large amounts of data. In these applications, data movement accounts for a significant share of both processing time and energy consumption. New multi-core architectures, accelerators, and Graphics Processing Units (GPUs) can improve the performance of these applications through parallel processing. However, these architectures do not eliminate the need to move data, which must still traverse the levels of a memory hierarchy before being processed. Our work explores Processing in Memory (PIM), specifically Associative Processing, as an alternative for speeding up applications by processing their data in parallel in memory, improving system performance and saving energy. Associative Processing provides high-performance, low-power parallel computing using a Content-Addressable Memory (CAM). Through the CAM's ability to compare and write in parallel, complemented by special control registers and lookup tables, operations between data vectors can be performed in a small and constant number of cycles per operation. In our work, we analyze the potential of Associative Processing in terms of execution time and energy consumption across different application kernels. To this end, we developed RV-Across, a RISC-V-based associative processing simulator for testing, validating, and modeling associative operations. The simulator facilitates the design of associative and near-memory processing architectures, offering interfaces both for building new operations and for high-level experimentation. We created an architectural model with associative processing for the simulator and compared it with CPU-based and multi-core alternatives. For performance evaluation, we built a latency and energy model based on data from the literature. We applied the model to compare different scenarios, varying input characteristics and the size of the Associative Processor across the applications. Our results highlight the direct relationship between data size and the potential improvement in associative processing performance. For 2D convolution, the Associative Processing model achieved relative gains of 2x in latency, 2x in energy consumption, and 13x in the number of load/store operations. In matrix multiplication, the speedup increases linearly with the matrix dimension, reaching 8x for 200x200 byte matrices and surpassing parallel execution on an 8-core CPU. The advantages of associative processing evidenced in our results reveal an alternative for systems that must balance processing and energy expenditure, such as embedded devices.
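
To make the parallel compare-and-write mechanism concrete, below is a minimal illustrative sketch in Python (a hypothetical example, not code from RV-Across or the thesis) of bit-serial associative addition over two vectors: for each bit position, every entry of a full-adder lookup table triggers one parallel "compare" pass that tags the matching rows and one parallel "write" pass that updates them, so the number of passes depends on the word width and table size, not on the number of vector elements.

import numpy as np

BITS = 8  # word width; cycle count scales with BITS, not with vector length

def associative_add(a, b):
    """Element-wise (a + b) mod 2**BITS via truth-table compare/write passes."""
    a = np.asarray(a, dtype=np.uint16)
    b = np.asarray(b, dtype=np.uint16)
    result = np.zeros_like(a)
    carry = np.zeros_like(a)  # one carry bit per CAM row
    # Full-adder lookup table: (a_bit, b_bit, carry_in) -> (sum_bit, carry_out)
    truth = {(x, y, c): (x ^ y ^ c, (x & y) | (c & (x ^ y)))
             for x in (0, 1) for y in (0, 1) for c in (0, 1)}
    for bit in range(BITS):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        new_sum = np.zeros_like(a)
        new_carry = np.zeros_like(a)
        for (x, y, c), (s, c_out) in truth.items():
            # "Compare" phase: tag every row matching this truth-table entry.
            match = (a_bit == x) & (b_bit == y) & (carry == c)
            # "Write" phase: update all tagged rows in parallel.
            new_sum = np.where(match, s, new_sum)
            new_carry = np.where(match, c_out, new_carry)
        result |= new_sum.astype(np.uint16) << np.uint16(bit)
        carry = new_carry
    return result

if __name__ == "__main__":
    print(associative_add([3, 200, 17, 250], [5, 100, 40, 10]))
    # -> [  8  44  57   4]   (sums are modulo 256 for 8-bit words)

In this sketch the inner loop runs a fixed 8 passes per bit regardless of how many rows the vectors have, which is the property the abstract refers to as a small and constant number of cycles per operation.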
Examination Board
Titular Members:
Lucas Francisco Wanner IC / UNICAMP
Alba Cristina Magalhães Alves de Melo CIC / UnB
Rodolfo Jardim de Azevedo IC / UNICAMP
Alternate Members:
Sandro Rigo IC / UNICAMP
João Paulo Labegalini de Carvalho University of Alberta / Canada