# Last edited on 2014-08-05 16:59:16 by stolfilocal

THE RAW NUCLEOTIDE SUBDIR

  The sub-directory "bas" contains samples of DNA sequences.

  250-base samples:

    The files 250A.bas and 250B.bas were created by hand.
    They have an almost identical common segment.

    The file 250E.bas is a segment of 250 bases from Drosophila simulans.
    It is the reference sequence for the mutated versions
    250E-mt{NN}.bas and 250E-in{NN}.bas.

  500-base samples:

    The files 500A.bas and 500B.bas were created by hand.
    They have a "homologous" part that covers about 80% of their length.

  2500-base samples:

    The file 2500A.bas is the first 2500 bases from ../match/DSIM.bas
    (presumably /Drosophila simulans/ chromosome 4).

    The file 2500B.bas was created from 2500A.bas by replacing the
    first 800 bases and the last 800 bases by disjoint subsequences
    from the end of ../match/bas/DSIM.bas, with 790 and 810 bases
    respectively.

    The file 2500C.bas was created from 2500B.bas by making small
    edits every 22 bases or so, with ../mutate_bas_lines.gawk.

  mutated versions:

    The files {NAME}-mt{NN}.bas are copies of {NAME}.bas with a
    sequence of {NN} consecutive bases replaced by arbitrary
    different bases.

    The files {NAME}-in{NN}.bas are copies of {NAME}.bas with a
    sequence of {NN} consecutive bases inserted at some point.

FILTERED VERSIONS

  The filtered versions are in directory "eqs". They were obtained
  as follows:

    # Build the filtered test outputs:
    tdir="${STOLFIHOME}/programs/c/JSLIBS/libdnaenc/tests/010_filter"
    ( cd "${tdir}" && make -k all )
    fdir="${tdir}/out"

    # Sequences whose filtered versions are wanted:
    seqs=( \
      100A \
      250A 250B 500A 500B 1500A 2500A 2500B 2500C \
      500A-in01 500A-in02 500A-in04 500A-in08 \
      500A-mt01 500A-mt02 500A-mt04 500A-mt08 \
    )

    ls "${fdir}"/*.eqs

    # Copy each sequence's outputs, if its level-00 file exists and is non-empty:
    for seq in "${seqs[@]}" ; do
      if [[ -s "${fdir}/${seq}-ot-00.eqs" ]]; then
        cp -avu "${fdir}/${seq}"-ot-??.eqs eqs/
      fi
    done
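
  As a rough illustration of the two kinds of mutated versions
  described earlier ("-mt{NN}" and "-in{NN}"), the bash sketch below
  applies each operation to a short string of bases. The function
  names mutate_run and insert_run, the 0-based position argument, and
  the fixed substitution A->C->G->T->A are assumptions made for this
  example only. These notes do not say how the actual files were
  produced or where in the sequence the edits were placed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: they illustrate the "-mt{NN}" and "-in{NN}"
# naming convention only, and are NOT the tools used to build the samples.

# mutate_run SEQ POS NN
# Replace NN consecutive bases starting at 0-based index POS. Each base
# is mapped to a different one (A->C, C->G, G->T, T->A), so every
# replaced base is guaranteed to differ from the original.
mutate_run() {
  local seq="$1" pos="$2" nn="$3"
  local run="${seq:pos:nn}"
  run="$(printf '%s' "$run" | tr 'ACGTacgt' 'CGTAcgta')"
  printf '%s%s%s\n' "${seq:0:pos}" "$run" "${seq:pos+nn}"
}

# insert_run SEQ POS INS
# Insert the bases INS before 0-based index POS; the length of INS
# plays the role of {NN}.
insert_run() {
  local seq="$1" pos="$2" ins="$3"
  printf '%s%s%s\n' "${seq:0:pos}" "$ins" "${seq:pos}"
}

mutate_run ACGTACGT 2 4    # prints ACTACGGT   (like an "-mt04" version)
insert_run ACGTACGT 3 TT   # prints ACGTTTACGT (like an "-in02" version)
```

  The helpers work on a bare string of bases, so they make no
  assumption about the layout of the .bas files themselves.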