# Last edited on 2008-06-15 06:52:04 by stolfi

PROJECT GOALS

  This project aims at building data sets for use in bioinformatics
  projects by J. Stolfi, Helena C. G. Leitão and collaborators, at
  IC-UNICAMP and IC-UFF, such as Bayesian detection of coding regions
  (Renatha Capua's dissertation), mutual information through Fourier
  analysis (Luciana Pessôa's dissertation), and so on.

  This project is not connected to IC-UNICAMP's defunct Laboratory for
  Bio-Informatics (LBI), nor to current or past bioinformatics projects
  of other researchers at UNICAMP or UFF.

MAIN DIRECTORY STRUCTURE

  The main sub-directories are

    reese - Exon/intron dataset by Martin Reese
    saxon - Exon/intron database (EID) by Saxonov et al.
    ncbir - Whole genome datasets from NCBI RefSeq
    synth - Synthetic datasets
    tools - General-purpose tools
    docs  - Miscellaneous documents and tables
    codes - Translation code tables

  Other tools can be found at ${STOLFIHOME}/programs/c/DNA .

  This project used to be the directory ${STOLFIHOME}/programs/c/DATA .

MISC TOOLS

  Splitting bibliographic entries:

    gawk \
      ' // { \
          while(match($0, /[(][^()]*[0-9][\)]/)) \
            { $0 = gensub(/[(]([^()]*[0-9])[\)][. ]*/, "@@ \\1\n", "g", $0); }; \
          print; \
        } \
      ' \
      | gawk \
          ' BEGIN { n = 10; } \
            /[@][@]/ { \
              n++; \
              gsub(/[@][@]/, ("[" n "].\n# [" n "]"), $0); \
            } \
            // { print; } \
          '