31 January 2024
14:00 Master's Defense Room 53 of the IC2 building
Theme
Source Code Pattern Recognition and Rewriting for MLIR Using String-Based Automata
Student
Vinícius Couto Espindola
Advisor / Teacher
Guido Costa Souza de Araújo - Co-supervisor: Hervé Cédric Yviquel
Brief summary
A typical compiler flow relies on a unidirectional sequence of translation/optimization steps that reduce the program's level of abstraction, making it difficult to preserve high-level information at each transformation step. Modern ISA extensions and hardware accelerators can benefit from the compiler's ability to detect and elevate portions of the program for acceleration instructions or optimized library calls. Although recent studies based on MLIR (Multi-Level IR) have been proposed to raise the level of code abstraction, they depend on specialized languages, compiler recompilation or knowledge of specialized pattern description languages. This thesis discusses the Source-code Matching and Replacement (SMR) algorithm, a source-code-focused approach to pattern recognition and rewriting in MLIR, which does not require the intervention of a compiler expert. Leveraging the flexibility of MLIR, an intermediate representation, SMR offers an innovative approach to detecting and replacing languages ​​directly in source code. SMR employs a two-step acyclic directed graph recognition algorithm, implemented through string automata, inspired by previous work on pattern matching in trees. Initially, the Control-Dependency Graph (CDG) of the pattern is compared with the CDG of the code to be rewritten, discarding segments that do not have a control flow structure similar to the desired pattern. In the second step, the previously filtered code fragments have their Data-Dependency Graphs (DDGs) built and compared with the standard DDG. Experimental results demonstrate that SMR effectively identifies patterns in Fortran (FIR) and C (CIL), rewriting them as BLAS calls to raise the level of code abstraction for optimization. In addition to the core algorithm, this thesis explores the preprocessing capabilities of SMR, facilitating the integration of diverse front-ends and dialects, and ensuring the tool's adaptability to various front-ends and dialects. Challenges such as MLIR framework constraints, frontend sensitivities, and dialects are also discussed. Additional analysis also indicates performance improvements when using SMR for code replacement in areas such as approximate computing and hardware acceleration.
Examination Board
Headlines:
Guido Costa Souza de Araújo IC / UNICAMP
Sandro Rigo IC / UNICAMP
Guilherme de lima Ottoni Meta
Substitutes:
Lucas Francisco Wanner IC / UNICAMP
Alexandro José Baldassin DEMAC / UNESP