Genome Rearrangements
In the mathematical theory of genome rearrangements, genomes
are represented by lists of elements (genes) and the
rearrangements are operations that modify these lists. Typical
examples of such events are reversals and transpositions.
I have been interested in these problems since my first
contact with them on a Combinatorial Pattern Matching (CPM)
conference, in Asilomar, California, USA, in 1994. From this date
up to around 2007, several students of mine, including
Maria Emília Machado Telles Walter, Zanoni Dias,
Cleber Valgas Gomes Mira, Vinicius Fortuna, and André Atanasio
Maranhão Almeida worked on theoretical aspects of the problem.
They helped me shape an interesting idea, which is to apply the
mathematical theory of permutations to rearragement problems,
leading to an embrionary form of what we called algebraic
rearrangement theory.
After studying the theory of genome rearrangements for some
time, I felt the urge to apply some of the things I learned to
actual genomes. My
Masters students Patrícia Pilisson Côgo and Karina Zupo de
Oliveira applied the theory to Vibrio genomes, in an
informal collaboration with Fabiano Thompson, then at the
University of Campinas, now at the Federal University of Rio de
Janeiro, who studies these bacteria.
Côgo's contribution included working out several practical
aspects of transforming real genomes into permutations, such as
identifying homologous regions, treating regions without
counterpart in the other genome, and treating duplications. We
used the DCJ distance
to construct a phylogenetic tree of the 6 Vibrio genomes
completely sequenced up to 2005.
Côgo implemented her tests in a series of perl scripts.
Oliveira took them a step further, rewriting the entire pipeline
in Java, using a modern IDE and version control. In the
process, she noticed that the way we were constructing protein
families to assess homology was very time consuming. She
suggested instead that we should use Protein Clusters,
a protein family database and tools available from NCBI. This
reduced the time to add a new species from a month to less than
a day. As a result, she could pairwise compare
ten Vibrio species plus an outgroup (E. coli).
Côgo and Oliveira used DCJ as a distance measure after
reducing the genomes to equal content without repeated genes.
Another student of mine, Pedro Feijão, a doctoral student,
started to experiment with something we called
SCJ, a single-cut counterpart of DCJ. It is a measure for
which several hard problems turn out to have tractable
solutions, including small parsimony. He also wrote lots of
Java code to implement SCJ.
In 2011, Pedro Feijão discovered a way of extending the
algebraic theory to linear genomes, yielding a distance very
close to, but not exactly the same as, DCJ. Like DCJ, this
algebraic distance is easy to compute between two genomes, but
medians, which are hard for DCJ, are easy for the algebraic
distance, if one is willing to dive into orthogonal matrices
(see below).
Algebraic rearrangement theory may be a
tough subject for many, but we love it. Feijão tried hard to
make it palatable to a wider audience in his PhD thesis, and I
think he did a wonderful job. We also
investigated this same distance in the more general settings of
non-genomic permutations and of general matrices, with my
students João Paulo Pereira Zanetti and Priscila Biller.
Biller also run many
experiments with SCJ, to try and find out its suitability to
reconstructing phylogenies, with good results. She improved on
Feijão's initial Java code, and wrote a manual to help others use the code.
In 2013, Sophia Yancopoulos kindly invited to me co-author a
contribution to the MAGE workshop in honor of David Sankoff.
The text explores the DCJ and algebraic distances, trying to
uncover the reasons for their similarities and differences.
The algebraic distance is an alternative to DCJ, since both are
easy to compute and model many of the rearrangement events
observed in genome evolution. Our interest turned to finding
genomes that are close to three given genomes, an important step
in phylogeny reconstruction. Besides the traditional medians,
which minimize the sum of the algebraic distances, we are
experimenting with minimax genomes, which minimize the maximum
of the distances.
The algebraic distance has also an interesting relationship
with matrix theory. Genomes can be seen as matchings of gene
extremities. We can code such a matching as a matrix as
follows. Consider the adjacency matrix of this matching, but
add 1's in the diagonal for the unmatched vertices. Voilá! You
have a genome matrix. It turns out that the algebraic distance
between two genomes is exactly half of the rank distance between
the corresponding matrices.
Looking at genomes as matrices has an added benefit: one can
solve the median problem in polynomial time. However, the
solution is not always "genomic".
Here is some of our work on the subject, in reverse
chronological order:
Talks
- J. Meidanis.
Center genomes with respect to rank distance.
Presented at the
Bazilian Symposium of Bioinformatics,
On-line, Brazil, in November 25th, 2020.
Download pdf
- J. Meidanis.
Counting scenarios, intermediate genomes, and dealing with
missing genes with the rank distance.
Presented at the
Comparative Genomics in Campo Grande Workshop.
Campo Grande, Brazil, in December 2019.
Download pdf
- J. Meidanis.
Rank Distance Sheds Light on Genome Evolution.
Presented at the
22nd Conference of the International Linear Algebra Society (ILAS 2019).
Rio de Janeiro, Brazil, in July 2019.
Download Abstract
Download presentation
- J. Meidanis.
Genome Matrices and The Median Problem.
Presented at
the Department of Biochemistry,
University of Sao Paulo, Brazil, in June 2019.
Download pdf
- J. Meidanis.
Counting Sorting Scenarios and Intermediate
Genomes for the Rank Distance.
Presented at
the 6th International Conference on Algorithms for Computational Biology,
Berkeley, CA, USA, in May 2019.
Download pdf
- J. Meidanis.
Genome Matrices and The Median Problem.
Presented at
the Brazilian Symposium on Bioinformatics,
Niteroi, Brazil, in October 2018.
Download pdf
- J. Meidanis.
Genome Matrices and The Median Problem.
Presented at
the Department of Mathematics,
Universidade Federal Fluminense, Brazil, in December 2017.
Download pdf
- J. Meidanis.
Genome Matrices and The Median Problem.
Presented at
the II Symposium on Biological Sciences,
Ouro Preto, Brazil, in November 2017.
Download pdf
- J. Meidanis.
Genome Matrices and Applications.
15-minute recruiting presentation at
the Institute of Computing,
University of Campinas, Brazil, in August 2017.
Download pdf
- J. Meidanis.
Genome Matrices and The Median Problem.
Presented at
the Department of Computer Science and Operations Research,
Université de Montréal, Canada, in December 2016.
Presented at
the Cheriton School of Computer Science,
University of Waterloo, Canada, and
at
the School of Computer Science,
University of Windsor, Canada, and
at
the Discrete Math Seminar Series,
Simon Fraser University, Canada, in May 2017.
Presented at
the Institute of Computing,
University of Campinas, Brazil, in July 2017.
Download pdf
Publications
- J. P. P. Zanetti, L. P. Oliveira, L. Chindelevitch, J. Meidanis.
Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, Early Access.
10.1109/TCBB.2023.3277733 .
This is an extended version of the homonymous AlCoB 2019 paper.
- J. P. P. Zanetti, L. P. Oliveira, L. Chindelevitch, J. Meidanis.
Generalizations of the Genomic Rank Distance to Indels.
Bioinformatics, Volume 39, Issue 3, March 2023,
10.1093/bioinformatics/btad087 .
This is an extended version of the homonymous AlCoB 2019 paper.
- P. Biller, J. P. P. Zanetti, J. Meidanis.
Center Genome With Respect to the Rank Distance.
BSB: XIII Brazilian Symposium on Bioinformatics,
Online, Brazil, November 2020.
Download draft
- J. Meidanis, L. Chindelevitch.
Fast median computation for symmetric, orthogonal matrices under
the rank distance.
Linear Algebra and its Applications, Volume 614, 1
April 2021, Pages 394-414.
10.1016/j.laa.2020.10.030.
This article presents a speed-up of the orthogonal algorithm for
symmetric matrices, including genomes.
- D. Sankoff, C. Zheng, Y. Zhang, J. Meidanis, E. Lyons,
H. Tang.
Models for Similarity Distributions of Syntenic Homologs and
Applications to Phylogenomics.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics 16(3) 727-737 (2019).
10.1109/TCBB.2018.2849377
- J. P. P. Zanetti, L. Chindelevitch, J. Meidanis.
Generalizations of the Genomic Rank Distance to Indels.
AlCoB: 6th International Conference on Algorithms for
Computational Biology,
Berkeley CA, USA, May 2019.
- J. P. P. Zanetti, L. Chindelevitch, J. Meidanis.
Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance.
AlCoB: 6th International Conference on Algorithms for
Computational Biology,
Berkeley CA, USA, May 2019.
Download draft
Appendix
- L. Chindelevitch, J. Meidanis.
A cubic algorithm for the generalized rank median of three genomes.
RECOMB-CG: 16th Annual Satellite
Workshop on Comparative Genomics,
Sherbrooke, Canada, October 2018.
Journal version: Chindelevitch, L., La, S. & Meidanis, J. A
cubic algorithm for the generalized rank median of three
genomes. Algorithms Mol Biol 14, 16 (2019).
This article presents the MI algorithm for
genome medians.
10.1186/s13015-019-0150-y
- L. Chindelevitch, J. Meidanis.
On the Rank-Distance Median of 3 Permutations.
RECOMB-CG: 15th Annual Satellite
Workshop on Comparative Genomics,
Barcelona, Spain, October 2017. The journal version of this
paper, which appeared
in
BMC Bioinformatics 2018 19 (Suppl6):142 with additional
author J. P. P. Zanetti, presents the orthogonal algorithm for genome
medians.
Download draft
- J. Meidanis, P. Biller, J. P. P. Zanetti.
A Matrix-Based Theory for Genome Rearrangements.
Technical Report, Institute of Computing, University of
Campinas, 2017
Download pdf
- J. P. P. Zanetti, P. Biller, and J. Meidanis.
Median Approximations for Genomes Modeled as Matrices.
Bull Math Biol (2016) 78: 786.
doi:10.1007/s11538-016-0162-4.
Simulated data: add/remove adjacencies up to 30%.
Simulated data: DCJ operation up to 30%.
Real data.
- J. P. P. Zanetti, P. Biller, and J. Meidanis.
On the Matrix Median Problem.
Proceedings of the WABI 2013: 13th Workshop on Algorithms in
Bioinformatics, Sophia Antipolis, France, September 2013.
Download draft
- P. Feijao, J. Meidanis.
The Algebraic Theory for Genome Rearrangements.
Poster presented at WABI 2013: 13th Workshop on Algorithms in
Bioinformatics, Sophia Antipolis, France, September 2013.
Download pdf
- J. Meidanis, and S. Yancopoulos.
The emperor has no caps! A Comparison of DCJ and Algebraic
Distances. Proceedings of MAGE: Models and Algorithms for Genome
Evolution, Montreal, Canada, August 2013.
Download preliminary text
- J. P. P. Zanetti, P. Biller, J. Meidanis.
On the matrix median problem.
Poster presented at MAGE: Models and Algorithms
for Genome Evolution,
Montreal, Canada, August 2013.
Download pdf
- P. Biller, P. Feijão, and J. Meidanis.
Rearrangement-based phylogeny using the Single-Cut-or-Join operation.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 2013.
Download pdf.
Code,
data, and
companion manual.
- J. Meidanis, P. Feijao, P. Biller, J. P. P. Zanetti.
On the algebraic genome median.
Poster presented at RECOMB-CG: 10th Annual Satellite
Workshop on Comparative Genomics,
Niteroi, Brazil, October, 2012.
Download pdf
- P. Feijão.
On genome rearrangement models.
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in October 2012.
Download pdf
- P. Feijão, J. Meidanis.
Extending the Algebraic Formalism for Genome Rearrangements to
Include Linear Chromosomes.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 10(4), 819-831, 2012. An abridged version
appeared in the
Proceedings of the 2012 BSB: Brazilian Symposium on
Bioinformatics, Campo Grande, Brazil, August 2012.
Download abridged version
- P. Biller.
Experimentos em rearranjo de genomas
com a operação Single-Cut-or-Join.
(In Portuguese, with Abstract in English)
MSc Thesis, University of Campinas, March 2010. 101 pp.
Download pdf
- P. Feijão, J. Meidanis.
SCJ: A Breakpoint-Like Distance that Simplifies Several
Rearrangement Problems.
IEEE/ACM Transactions on Computational Biology and
Bioinformatics, vol. 8, no. 5, pp. 1318-1329,
Sept.-Oct. 2011,
doi:10.1109/TCBB.2011.34. This is an extended version of
the WABI 2009 paper.
- K. Z. de Oliveira.
Construção de Filogenias Baseadas em Genomas Completos.
(Phylogeny Construction based on Complete Genomes),
in Portuguese, with Abstract in English)
MSc Thesis, University of Campinas, March 2010. 101 pp.
Download pdf
Supplementary material
- P. Feijão, J. Meidanis.
SCJ: a variant of breakpoint distance for which sorting, genome
median, and genome halving problems are easy.
Proceedings of the WABI 2009: 9th Workshop on Algorithms in
Bioinformatics, Philadelphia, USA, September 2009.
Download draft
- P. P. Côgo.
Comparação de Genomas Completos de Espécies da
Família Vibrionacea Empregando Rearranjos de Genomas.
(In Portuguese, with Abstract in English)
MSc Thesis, University of Campinas, March 2008. 59 pp.
Download pdf
- P. P. Côgo, J. Meidanis.
A rearrangement-based approach to compare whole genomes of
Vibrionaceae strains.
Technical Report IC-08-05. Institute of Computing, University of
Campinas, 2008.
Download pdf
- P. P. Côgo, J. Meidanis, F. Thompson.
Comparison Between Complete Genomes of Vibrio Species.
Poster presented at the ISMB: 14th Annual International
Conference on Intelligent Systems for Molecular Biology,
Fortaleza, Brazil, August, 2006.
Download pdf
- P. P. Côgo, J. Meidanis.
Comparison Between Complete Genome Sequences
of Vibrionaceae Species.
Poster presented at the X-Meeting: 1st International
Conference of the Brazilian Association for Bioinformatics and
Computational Biology, Caxambu, Brazil, October, 2005.
Download pdf
- C. V. G. Mira.
Análise Algébrica de Problemas de Rearranjo em Genomas:
Algoritmos e Complexidade.
(In Portuguese)
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in October 2007.
Download pdf
- A. A. M. Almeida.
Comparação Algébrica de Genomas:
o caso da distância de reversão.
(In Portuguese)
MSc Thesis, Institute of Computing, University of Campinas,
April 2007, 125p.
Download pdf
- C. V. G. Mira, J. Meidanis.
A New Data Structure for Genome Rearrangement Problems.
Poster in ISMB/ECCB7. Vienna, Austria, 2007.
Download pdf
- C. V. G. Mira, J. Meidanis.
Sorting by Block-Interchanges and Signed Reversals.
Proceedings of the Fourth International Conference on
Information Technology: New Generations (
ITNG'07), p. 670-676. IEEE Computer Society, Los Alamitos,
USA, 2007.
Download pdf
- V. Fortuna.
Distâncias de Transposição entre Genomas.
(In Portuguese)
MSc Thesis, Institute of Computing, University of Campinas,
February 2005, 77p.
Download pdf
- C. V. G. Mira, J. Meidanis.
Analisys of Sorting by Transpositions based on Algebraic Formalism.
Poster in RECOMB'04. San Diego, USA, 2004.
Download presentation
Download poster
- Z. Dias.
Rearranjo de Genomas: Uma Coletânea de Artigos.
(In Portuguese)
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in November
2002.
Download pdf
- J. Meidanis, M. E. M. T. Walter, Z. Dias.
A Lower Bound on the Reversal and Transposition Diameter.
Journal
of Computational Biology, Vol.9, No.5, pp. 743-745, 2002.
This is a compact version of a piece of work described below
(*).
- Z. Dias, J. Meidanis.
Sorting by Prefix Transpositions.
Proceedings of SPIRE'2002 - String Processing and
Information Retrieval.
Lecture Notes in Computer Sciences, vol.2476. September,
11-13, 2002. Lisbon, Portugal.
Download pdf
View Presentation
- J. Meidanis, Z. Dias.
The Genome Rearrangement Distance by Fusion, Fission, and
Transposition with Arbitrary Weights.
Technical Report IC-02-01, Institute of Computing,
University of Campinas.
Download pdf
- Z. Dias, J. Meidanis.
Genome Rearrangements Distance by Fusion, Fission, and
Transposition is Easy.
Proceedings of SPIRE'2001 - String Processing and
Information Retrieval. November,
13-15, 2001. Laguna de San Rafael, Chile.
Summary.
A version appeared as Technical Report IC-01-07, Institute of Computing, University of Campinas.
- M. E. M. T. Walter, Z. Dias, J. Meidanis.
A New Approach for Approximating The Transposition Distance.
Proceedings of SPIRE'2000 - String Processing and
Information Retrieval. September,
27-29, 2000. La Coruna, Spain.
Download pdf
- J. Meidanis, Z. Dias.
An Alternative Algebraic Formalism for Genome Rearrangements.
Comparative Genomics, David Sankoff and Joseph Nadeau,
editors, Kluwer Academic Publishers, 2000, pp. 213-223.
Download pdf
- M. E. M. T. Walter.
Algoritmos para Problemas em Rearranjo de Genomas.
(In Portuguese)
PhD Thesis, presented at
the Institute of Computing, University of Campinas, in December
1999.
Download pdf
- (*) M. E. M. T. Walter, Z. Dias, J. Meidanis.
A Lower Bound on the Reversal and Transposition Diameter.
This is a revised and more detailed version of part of the
paper presented in SPIRE'98. It computes the reversal and
transposition distance between a permutation p and the
one obtained from p by reversing each gene. A compact
version was published in 2002 in the Journal of Computational
Biology.
Download pdf
- M. E. M. T. Walter, Z. Dias, J. Meidanis.
Reversal and Transposition Distance of Linear Chromosomes,
Proceedings of SPIRE'98 - String Processing and
Information Retrieval: A South American Symposium. September,
9-11, 1998, pp. 96-102. Santa Cruz de la Sierra, Bolivia.
Download pdf
- J. Meidanis, M. E. M. T. Walter, Z. Dias.
Transposition Distance Between a Permutation and Its Reverse.
Proceedings of WSP'97 - Fourth South American Workshop
on String Processing. November,
12-13, 1997, pp. 70-79. Valparaiso, Chile.
Download pdf
- J. Meidanis, M. E. M. T. Walter, Z. Dias.
Distancia de Reversao de Cromossomos Circulares.
(In Portuguese.)
Proceedings of SEMISH - XXIV Seminario Integrado de Software
e Hardware. August,
2-8, 1997, pp. 119-131. Brasilia, Brazil.
Download pdf
Joao Meidanis
Last modified: Thu Dec 14 17:26:15 -03 2023
by JM