Alexsandro Oliveira Alexandrino
On variants of the genome rearrangement distance problem
Considering a pair of genomes from individuals of related species, the goal of genome rearrangement distance problems is to estimate how distant these genomes are from each other based on genome rearrangements, which are mutational events that modify the genetic material or the relative position from segments of a genome. Using the Principe of Parsimony, the term distance, or rearrangement distance, refers to the minimum number of rearrangements necessary to transform one genome into the other. Seminal works in genome rearrangements assumed that both genomes being compared have the same set of genes (balanced genomes) and, furthermore, only the relative order of genes and their orientations, when they are known, are used in the mathematical representation of the genomes. These restrictions imply that it is possible to transform one genome into the other using only conservative rearrangements, which are rearrangements that do not alter the genetic material from a genome. In this case, the genomes are represented as permutations, originating the Sorting Permutations by Rearrangements problems. The main problems of Sorting Permutations by Rearrangements considered DCJs, reversals, transpositions, or the combination of both reversals and transpositions, and these problems have their complexity known. Besides these problems, other ones were studied involving the combination of transpositions with one or more of the following rearrangements: transreversals, revrevs, and reversals. Although there are approximation results for these problems, their complexity remained open. Some of the results of this thesis are the complexity proofs for these problems. Furthermore, we present a new 1.375-approximation algorithm, which has better time complexity, for the Sorting Permutations by Transpositions. As the field has progressed, new works started to consider unbalanced genomes and to incorporate the size distribution of intergenic regions. When considering unbalanced genomes, it is necessary to use insertions and deletions to transform one genome into another. In this thesis, we studied Rearrangement Distance problems on unbalanced genomes considering only gene order and their orientations (when they are known), as well as Intergenic Rearrangement Distance problems, which incorporate the information regarding the size distribution of intergenic regions, besides the use of gene order and their orientations (when they are known). We present new structures and concepts for problems that include reversals, transpositions, and the combination of reversals and transpositions. These structures and concepts are used in complexity proofs and approximation algorithms. Furthermore, we performed experiments in simulated and real genomes, showing the applicability of our algorithms.
Advisors: Zanoni Dias and Ulisses Dias
PhD Dissertation (In Portuguese), University of Campinas, Institute of Computing
Download
Slides