01 April 2024
13:30, Doctoral defense, Room 85 of IC2
Theme
Improved Image Filling Based on Vision Transformers and Pencil-Sketch
Student
Jose Luis Flores Campana
Advisor
Hélio Pedrini; Co-advisor: Helena de Almeida Maia
Brief summary
Image filling is a computer vision technique focused on restoring damaged or missing regions of an image. Since the advent of deep neural networks, especially convolutional neural networks (CNNs), image filling has made great progress in restoring damaged images. However, the limited receptive field of CNNs can lead to unreliable results, since they struggle to capture the global context of the image. Recently, Transformers have been adopted in computer vision to address this limitation: they learn long-range dependencies through self-attention mechanisms, which makes them well suited to producing realistic results when images contain large missing regions and complex scenes. However, the quadratic computational and memory costs of self-attention make Transformers prohibitive for high-resolution images and resource-constrained devices. To overcome this problem, we propose a Vision Transformer architecture with variable hyperparameters that (i) subdivides the feature maps into a variable number of multiscale slices, (ii) distributes the feature map across a variable number of heads to balance the complexity of the self-attention operation, and (iii) includes a new strategy based on depth-wise convolution to reduce the number of feature-map channels fed to each Transformer block. Furthermore, to generate more consistent results, some approaches incorporate auxiliary information to guide the model's understanding of structural information. Therefore, to address the inconsistency between structure and texture, and to avoid generating artifacts, we developed a new image-filling method that uses pencil-sketch information to guide the restoration of both structural elements and texture. Unlike previous work that employs edges, lines, or segmentation maps, we leverage the richness of pencil-sketch information and the ability of Transformers to learn long-range dependencies to properly combine structure and texture, producing more consistent results. We conducted experiments on three datasets from the literature: Places2, CelebA, and Paris StreetView. Our method consistently achieved the best results for the FID and LPIPS metrics on CelebA and obtained results competitive with state-of-the-art methods on Places2 and Paris StreetView. Furthermore, our model performed best in terms of model size, number of parameters, and FLOPs.
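To make the channel-reduction idea concrete, below is a minimal sketch, not the thesis implementation: it assumes a PyTorch-style block (names such as ReducedChannelTransformerBlock, reduced_channels, and num_heads are illustrative) in which a depth-wise plus point-wise convolution shrinks the channel dimension before multi-head self-attention is applied over the spatial positions.

# Illustrative sketch only; module and hyperparameter names are assumptions,
# not the architecture described in the thesis.
import torch
import torch.nn as nn


class ReducedChannelTransformerBlock(nn.Module):
    """Reduces feature-map channels with a depth-wise separable convolution,
    then applies multi-head self-attention over the spatial positions."""

    def __init__(self, in_channels: int, reduced_channels: int, num_heads: int):
        super().__init__()
        # Depth-wise convolution (one filter per channel) followed by a
        # point-wise 1x1 convolution that shrinks the channel dimension,
        # lowering the cost of the subsequent self-attention.
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3,
                      padding=1, groups=in_channels),             # depth-wise
            nn.Conv2d(in_channels, reduced_channels, kernel_size=1),  # point-wise
        )
        self.norm = nn.LayerNorm(reduced_channels)
        # A variable number of heads balances the self-attention complexity.
        self.attn = nn.MultiheadAttention(reduced_channels, num_heads,
                                          batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.reduce(x)                      # (B, C', H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C') token sequence
        tokens = self.norm(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attended              # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = ReducedChannelTransformerBlock(in_channels=64,
                                           reduced_channels=32, num_heads=4)
    out = block(torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 32, 32, 32])

Shrinking the channel (embedding) dimension and varying the number of heads are the two knobs that keep the quadratic cost of self-attention manageable on larger feature maps, which is the motivation stated in the abstract.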
Examination Board
Members:
Hélio Pedrini IC / UNICAMP
Marcelo da Silva Reis IC / UNICAMP
Leo Sampaio Ferraz Ribeiro IC / UNICAMP
Samuel Botter Martins Banco Itaú
Luiz Maurílio da Silva Maciel ICE/UFJF
Substitutes:
Andre Santanche IC / UNICAMP
Fátima de Lourdes dos Santos Nunes Marques EACH / USP
Ronaldo Cristiano Prati CMCC / UFABC