# Last edited on 2017-02-11 11:30:46 by stolfilocal

Creating a dataset of real matches

ESCHERICHIA: EDH1 x E933

  In 2012, Rafael ran this program:

    lastz.v1.02.00 \
      bas/EDH1.bas \
      bas/E933.bas \
      --notrivial

  Parameters:

    EDH1 = /Escherichia_coli/ var K 12 DH1        4630707 bases
    E933 = /Escherichia_coli/ var O157 H7 EDL933  5521804 bases

  Scoring matrix:

         A     C     G     T
        91  -114   -31  -123
      -114   100  -125   -31
       -31  -125   100  -114
      -123   -31  -114    91

    O = 400, E = 30, K = 3000, L = 3000, M = 0

  The output was saved as lav/EDH1_E933_LASTZ.lav

DROSOPHILA: DSIM x DYAK

  The following sequences were downloaded from NCBI in FASTA (text) format:

    bas/CM000365.1.bas - Drosophila simulans chromosome 4, complete
    bas/CM000191.2.bas - Drosophila yakuba chromosome 4, complete

  In 2012, Rafael prepared these "normalized" files by removing all
  nucleotides except ATCG:

    bas/DSIM.bas - from bas/CM000365.1.bas
    bas/DYAK.bas - from bas/CM000191.2.bas

  In 2012, Rafael ran this program:

    lastz.v1.02.00 DSIM.bas DYAK.bas --notrivial --identity=70..100

  Parameters:

    DSIM = /Drosophila simulans/   807946 bases
    DYAK = /Drosophila yakuba/    1363842 bases

  Scoring matrix:

         A     C     G     T
        91  -114   -31  -123
      -114   100  -125   -31
       -31  -125   100  -114
      -123   -31  -114    91

    O = 400, E = 30, K = 3000, L = 3000, M = 0

  The output was saved as lav/DSIM_DYAK_LASTZ.lav

CONVERSION TO CDV FORMAT:

  Running {dm_lav_to_cdv}.

  For each ".lav" file produced by LASTZ, create candidate files with
    (a) all pairings,
    (b) pairings with at least 2048 rungs and 85% matches,
    (c) pairings with at least 256 rungs and 90% matches.

    for tt in \
        DSIM:807946:DYAK:1363842:LASTZ:0:00 \
        DSIM:807946:DYAK:1363842:LASTZ:2048:85 \
        DSIM:807946:DYAK:1363842:LASTZ:256:90 \
        \
        EDH1:4630707:E933:5521804:LASTZ:0:00 \
        EDH1:4630707:E933:5521804:LASTZ:2048:85 \
        EDH1:4630707:E933:5521804:LASTZ:256:90 \
      ; do
        convert_lav_file.sh `echo ${tt} | tr ':' ' '`
      done
    ls -l cdv/*.cdv

  Got only the un-reversed matches; there were many reversed matches.
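
  The loop above packs each job into one colon-separated tuple and
  expands it with tr. A minimal sketch of that idiom follows. The
  function show_args is a hypothetical stand-in for convert_lav_file.sh,
  and the field names (seqA, lenA, ..., minpct) are only a guess at what
  each position means, inferred from the tuples and from the (b) and (c)
  criteria; the real script's argument handling is not recorded in
  these notes.

```shell
#!/bin/sh
# Hypothetical stand-in for convert_lav_file.sh: it only reports the
# arguments it receives. Field names are assumptions, not the real ones.
show_args() {
  echo "nargs=$# seqA=$1 lenA=$2 seqB=$3 lenB=$4 tool=$5 minrungs=$6 minpct=$7"
}

for tt in \
    DSIM:807946:DYAK:1363842:LASTZ:0:00 \
    DSIM:807946:DYAK:1363842:LASTZ:2048:85 \
  ; do
    # The substitution is left unquoted on purpose: the shell's word
    # splitting turns the space-separated string into seven arguments.
    show_args $(echo "${tt}" | tr ':' ' ')
  done
```

  The idiom relies on no field containing spaces or glob characters,
  which holds for the tuples used in these notes. The $(...) form is
  equivalent to the backquotes used in the original loops.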

CREATING FALSE CANDIDATES

  Scrambling the filtered candidates (at least 2048 rungs and 85%
  matches) to create false candidates:

    for tt in \
        DSIM:DYAK:LASTZ:2048:85 \
        EDH1:E933:LASTZ:2048:85 \
      ; do
        scramble_cands.sh `echo ${tt} | tr ':' ' '`
      done
    ls -l cdv/*_FF.cdv