Exercises marked with (*) require further reading/search beyond the suggested texts.
2. (DCJ without caps) Consider a graph formed by all synteny block extremities as vertices and all adjacencies from genomes A and B as edges. The connected components of this graph are cycles, paths with an odd number of edges, and paths with an even number of edges. Derive a formula for the DCJ distance using the sizes of each such component.
Answer:
Let us call the graph considered in this question the Line-Adjaceny Graph, or LAG, because it is the line graph of the Adjacency Graph. A cycle with 2k edges in the LAG contains k YAF breakpoints and 1 cycle, so it will contribute b - c = k - 1 to the DCJ distance.
A path with 2k+1 edges in LAG will correspond to a path with 2k+3 edges in the YAF construction, because of the extra caps. This will necessarily be an AA- or BB-path, and must therefore be closed with an extra edge to form a cycle with 2k+4 edges. This cycle will contribute b - c = (k + 2) - 1 = k + 1 to the DCJ distance, and this is the contribution of the 2k+1 path we started with.
A path with 2k edges in LAG will correspond to a path with 2k+2 edges in the YAF construction, because of the extra caps. This will necessarily be an AB-path, and its extemities will be identified to form a cycle with 2k+2 edges. This cycle will contribute b - c = (k + 1) - 1 = k to the DCJ distance, and this is the contribution of the 2k path we started with.
To summarize, each component in LAG contributes as follows towards the DCJ distance:
2k-cycle: k - 1 (2k+1)-path: k + 1 2k-path: k
In addition, we can offer the following formula to compute the DCJ distance from a LAG:
dDCJ = N - C - Peven / 2,
where N is the number of synteny blocks, C is the number of cycles and Peven is the number of even paths in the LAG. Notice that isolated vertices must be counted as even paths (they are paths of length zero).
© 2015 Joao Meidanis