Hacking at the Voynich manuscript - Side notes
100 Preparing a clean Voynichese sample for analysis

Last edited on 2025-05-04 21:19:09 by stolfi

SUMMARY

  We prepare clean Voynichese samples of prose and labels (without
  weirdos, unreadable characters, or contentious readings) for the
  statistical analyses that will go into the "lexeme structure"
  technical report.

  Redid it on 2025-04-29 to fix a bug in "". It was inserting blank
  lines in the "raw.tlw" files AFTER the first token of a new
  line/paragraph, instead of BEFORE it. Must check all other notes --
  did they depend on those blank lines?

SETTING UP THE ENVIRONMENT

  Links:

    ln -s ../.. work
    ln -s work/basify_weirdos.gawk
    ln -s work/combine_counts.gawk
    ln -s work/compute_cum_cum_freqs.gawk
    ln -s work/compute_cum_freqs.sh
    ln -s work/compute_freqs.gawk
    ln -s work/extract_section_from_evt.sh
    ln -s work/format_words_filled.sh
    ln -s work/format_counts_packed.gawk
    ln -s work/select_units.gawk
    ln -s work/show_first_last_lines.sh
    ln -s work/totalize_fields.gawk
    ln -s work/update_paper_include.sh
    ln -s work/vms_wc.sh
    ln -s work/words_from_evt.gawk

REFERENCE DATA

  The source data will be the interlinear release 1.6e6, already
  chopped into sections.

DIRECTORY STRUCTURE

  The data files generated by this note (text, word counts, tables,
  etc.) for each sample text will live in the subdirectories "gen/",
  "gen/LANG/", "gen/LANG/BUK/", and "gen/LANG/BUK/SEC.K/", where

    LANG  the sample's language. Two samples should have the same
          LANG only if they use the same spelling for shared words.
          Thus, English and Italian are different LANGs. Different
          encodings of Chinese (pinyin, GR, RomanNum) are different
          LANGs. Medieval French and Modern French are different
          LANGs. The Bible (with modern spelling) and War of the
          Worlds are the same LANG. After much analysis, it seems
          that we can assign a single LANG ("voyn") to all parts of
          the VMS.

    BUK   the book. Two samples with the same LANG and BUK should be
          by the same author and part of the same book.

          For Voynichese, BUK is

            "maj" = whole text, only the "majority vote" transcription lines.
            "lab" = only lines of "maj" that have labels or single words.
            "prs" = only the lines of "maj" excluded from "lab".
            "ini" = from "prs", only the first token after each break.
            "fin" = from "prs", only the last token before each break.
            "mid" = from "prs", every line minus its "ini" and "fin" tokens.
            "tak" = whole text, only Takahashi's transcription lines.

          For the other languages, BUK is a book tag (e.g. "wow" for
          War of the Worlds, "ptt" for the Pentateuch).

    SEC   the major division within the book. The divisions must be
          disjoint. Partitioning the book into divisions is worth the
          trouble only if the usage of common lexemes is expected to
          vary significantly between divisions (due to differences in
          subject matter and/or style), and those differences are
          considered relevant for the analysis.

          For the VMS, each classical section (Biological,
          Pharmaceutical, etc.) is a separate division, except that
          we split the Herbal section into two divisions "hea" and
          "heb". In the Culpeper herbal, the preamble, plant
          descriptions, and recipes could be in three separate
          divisions. In the Pentateuch, we could let each of the five
          books be a separate division. And so on.

    K     the sub-division of SEC. For Voynichese, a subdivision is a
          maximal string of *consecutive* pages that belong to the
          same SEC; e.g. the Herbal-A consists of two separate sets
          of pages, "hea.1" and "hea.2". For other languages, we
          usually don't need to have more than one subdivision per
          SEC.

  In this note, each sub-division will be called a "section". Whether
  a BUK is partitioned into sections or not, it always has a
  pseudo-section "tot.1" which is the entire sample (hence the union
  of all other sections).

  Some of these data files are formatted as LaTeX tables and
  commands, and placed in the folders "tex/", "tex/LANG/",
  "tex/LANG/BUK/" and "tex/LANG/BUK/SEC.K/" as appropriate.
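
  The sub-division rule above (K numbers the maximal runs of
  consecutive pages with the same SEC) can be sketched in Python as
  follows. This is only an illustration, not the script used by this
  note: the function name "assign_subdivisions" and the page names in
  the example are hypothetical, and the page-to-SEC assignment is
  assumed to be given in manuscript order.

```python
from itertools import groupby

def assign_subdivisions(pages):
    """Tag each page with "SEC.K".

    `pages` is a list of (page, sec) pairs in manuscript order.
    K counts the maximal runs of consecutive pages with the same SEC,
    starting from 1 for each SEC.
    """
    runs_seen = {}  # SEC -> number of runs of that SEC seen so far
    tagged = []
    for sec, run in groupby(pages, key=lambda item: item[1]):
        runs_seen[sec] = runs_seen.get(sec, 0) + 1
        tag = f"{sec}.{runs_seen[sec]}"
        tagged.extend((page, tag) for page, _ in run)
    return tagged

# Hypothetical page list, for illustration only.
example_pages = [
    ("p1", "hea"), ("p2", "hea"),
    ("p3", "heb"),
    ("p4", "hea"),
    ("p5", "bio"),
]
example_tagged = assign_subdivisions(example_pages)
```

  In this made-up example the second run of "hea" pages gets the tag
  "hea.2", mirroring the way Herbal-A splits into "hea.1" and "hea.2".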

  The folders "inp/", "inp/LANG/", "inp/LANG/BUK/" and
  "inp/LANG/BUK/SEC.K/" contain files or links created by hand that
  are inputs to various scripts.

OUTPUT DATA FILES PER BOOK

  Each folder "gen/LANG/BUK/" contains the following files:

    "raw.evt" contains the text of that book according to some
      specific transcription, extracted or derived from the global
      VMS source EVT file. The file is in the EVT format, with each
      weirdo converted to an equivalent basic EVA char, or to "*" if
      impossible.

    "fnums.txt" contains the numbers of the logical pages (like
      "f11r", "f100v", "f86v5") that occur in that book.

    "sections-occ.tags" is the list of sections that occur in that
      book, in publishable order.

    "sections-use.tags" is the subset of the above that are worth
      analyzing separately, in the same order. It varies depending
      on the BUK.

    "raw-gud-bad-tw-counts.txt" is a table with counts and
      percentages of tokens and lexemes in "raw.evt", for each of
      the sections in "sections-use.tags", including the invalid
      ("bad") and valid ("gud") tokens and lexemes (see below).
      Apart from '#'-comments and blank lines, each line of this
      file has the format

        "SEC.K RAWTK GUDTK GUDTKPPM BADTK BADTKPPM RAWWD GUDWD GUDWDPPM BADWD BADWDPPM"

      where

        SEC.K is a section tag, like "hea.1", "cos.2", "unk.3".

        RAWTK,GUDTK,BADTK are the counts of total, good, and bad
          tokens in the section.

        RAWWD,GUDWD,BADWD are the counts of total, good, and bad
          lexemes in the section.

        {xx}TKPPM = 100*{xx}TK/RAWTK, with 1 decimal, where {xx} is
          "GUD" or "BAD".

        {xx}WDPPM = 100*{xx}WD/RAWWD, with 1 decimal.

  Each folder "tex/LANG/BUK/" contains the following files:

    "raw-gud-bad-tw-counts.tex" is the table
      "raw-gud-bad-tw-counts.txt" from "gen/LANG/BUK/", formatted as
      a LaTeX table.

    "raw-gud-bad-summary.tex" defines the entries of that table as
      separate LaTeX macros.
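
  As an illustration of the "raw-gud-bad-tw-counts.txt" line format
  described above, here is a minimal Python sketch of a reader for
  it. The names "SectionCounts" and "parse_counts" are hypothetical,
  not part of this note's scripts. The two sample rows are copied
  from the voyn/maj statistics at the end of this note; the rate
  fields are read as floats so that either integer or one-decimal
  values are accepted.

```python
from typing import NamedTuple

class SectionCounts(NamedTuple):
    sec: str         # section tag SEC.K, e.g. "hea.1"
    raw_tk: int      # total tokens
    gud_tk: int      # good tokens
    gud_tk_rate: float
    bad_tk: int      # bad tokens
    bad_tk_rate: float
    raw_wd: int      # total lexemes
    gud_wd: int      # good lexemes
    gud_wd_rate: float
    bad_wd: int      # bad lexemes
    bad_wd_rate: float

def parse_counts(text):
    """Parse the 11-field table, skipping '#'-comments and blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        f = line.split()
        if len(f) != 11:
            raise ValueError(f"expected 11 fields, got {len(f)}: {line!r}")
        rows.append(SectionCounts(
            f[0],
            int(f[1]), int(f[2]), float(f[3]), int(f[4]), float(f[5]),
            int(f[6]), int(f[7]), float(f[8]), int(f[9]), float(f[10]),
        ))
    return rows

sample = """
# sec    raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
hea.1   6867  6704  976   163   23  2132  1981  928   151   70
unk.8      2     2  666     0    0     2     2  666     0    0
"""
counts = parse_counts(sample)

# Sanity check: every raw count should split into good plus bad.
for row in counts:
    assert row.raw_tk == row.gud_tk + row.bad_tk
    assert row.raw_wd == row.gud_wd + row.bad_wd
```

  The final loop relies only on the fact that the good and bad sets
  partition the raw set, so it can serve as a quick consistency check
  on any such table.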

OUTPUT DATA FILES PER SECTION

  The main output files are "DIR/raw.evt" and "DIR/XXX.EEE" where DIR
  is any of the folders "gen/LANG/BUK/SEC.K", XXX is "raw", "gud", or
  "bad", and EEE is "tlw", "wfr", or "wdf".

  The file "DIR/raw.evt" contains the lines from
  "gen/LANG/BUK/raw.evt" that belong to section SEC.K.

  The file "DIR/raw.tlw" contains the raw sequence of tokens and
  paragraph delimiters that occur in the "DIR/raw.evt" file, one
  token per line, in the format "TYPE LOC STRING", where

    LOC is the EVT-style line location code.

    TYPE is the type of the token ("p" = punctuation, "s" = symbol,
      "a" = alpha word).

    STRING is the token in EVA encoding.

  There is a line "# =" wherever raw.evt has a "=" delimiter --
  namely, between paragraphs, labels, titles, etc. (but not at line
  breaks within paragraphs).

  The file "DIR/raw.wfr" contains the corresponding lexemes with
  occurrence counts and relative frequencies.

  The file "DIR/raw.wdf" contains the lexemes of "DIR/raw.tlw" as a
  running text, with ~72 characters per line, separated by simple
  spaces or line breaks, without locators, paragraph breaks, section
  breaks, etc.

  The files "DIR/gud.EEE" and "DIR/bad.EEE", where EEE is "tlw",
  "wfr", or "wdf", are the subsets of "DIR/raw.EEE" with only the
  good and only the bad entries, respectively.

  List of Voynichese "books":

REMOVING BAD WORDS

  The "bad" tokens and lexemes are those with unreadable characters,
  weirdos, or combinations that are considered "invalid" for some
  reason.

  The excluded lexemes are saved in "DIR/bad.wfr", and the balance is
  saved in "DIR/gud.wfr". Most other files are derived from
  "gud.wfr", the frequency file for good lexemes.

  Weirdos are defined as characters and combinations that are not
  part of the basic glyph set

    e i a o q y d l r s n m k t f p ch sh ckh cth cfh cph

  Note that we exclude { g j u v x z } as well as any { c h } that
  are not part of the compound glyphs listed above.

  We believe that this selection will not introduce a significant
  bias in the grammar-fitting percentages.
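
  The basic glyph set above can be turned into a simple test for
  weirdo-free tokens. The Python sketch below is an assumption about
  how such a test could be written, not the filter actually used by
  the scripts of this note; the name "is_basic" is hypothetical. It
  covers only the weirdo criterion: contentious readings and other
  "invalid" combinations are not checked.

```python
import re

# One or more basic glyphs. Compound glyphs come first, so that "c"
# and "h" are accepted only as parts of ch, sh, ckh, cth, cfh, cph.
BASIC_TOKEN = re.compile(r"(?:ckh|cth|cfh|cph|ch|sh|[eiaoqydlrsnmktfp])+")

def is_basic(token):
    """True if the whole token is a sequence of basic glyphs."""
    return BASIC_TOKEN.fullmatch(token) is not None

# Tokens taken from the rejected-word lists later in this note,
# plus a few tokens built only from basic glyphs.
rejected = ["cheg", "xar", "ockhh", "shhy", "ckcho", "aithy"]
accepted = ["daiin", "chedy", "qokeedy", "cthy", "s"]

weirdo_flags = {tok: not is_basic(tok) for tok in rejected + accepted}
```

  An unreadable character such as "*" or "?" is outside the glyph
  set, so tokens containing one fail this test as well.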

  Tokens that contain weirdos are probably abbreviations or symbols,
  which should not be counted in the totals; or embellished words,
  which are likely to be chosen for embellishment independently of
  whether or not they fit the grammar. As for tokens that have
  discrepant readings, the divergence should not be strongly
  correlated to their fitness to the grammar.

DO IT

  Do it. (See the output at end of this note.)

    make_all_data_and_tables.sh

  >> OLD >>

  Here are the numbers.

    type            nbad         [?]          [bchv...]    [ai?n]
    --------------  ------------ ------------ ------------ ------------
    voyn/maj/tot.1  1708(2526)   1612(2407)   96(119)      114(396)
    voyn/prs/tot.1  1580(2358)   1501(2257)   79(101)      114(396)
    voyn/lab/tot.1  161(168)     143(150)     18(18)       0(0)
    voyn/ini/tot.1  246(282)     241(277)     5(5)         29(44)
    voyn/mid/tot.1  1147(1698)   1100(1646)   47(52)       86(316)
    voyn/fin/tot.1  294(335)     267(306)     27(29)       26(36)
    voyn/tak/tot.1  497(626)     127(154)     370(472)     2(2)

  Bad words that were rejected only because of lowercase weirdos:

    From voyp/vms/tot.1.wfr:

      v(7) x(7) c(4) cheg(3) xar(3) amg(2) cto(2) g(2) aikhckhy(1)
      aithy(1) arg(1) arxor(1) axor(1) chckshy(1) chcpar(1) chcs(1)
      checta(1) chepchx(1) chocty(1) chodalg(1) choekchcey(1)
      choikhy(1) chokolg(1) cholxy(1) chxar(1) ckcho(1) ckchol(1)
      ckshy(1) cky(1) coy(1) cpheeg(1) cseo(1) ctar(1) ctchy(1)
      ctechy(1) ctoiin(1) ctos(1) dag(1) daing(1) dchog(1) dkeeeg(1)
      docodal(1) doithy(1) gaiin(1) kedarxy(1) lxor(1) ockey(1)
      ockhh(1) oetalchg(1) ogam(1) olgy(1) org(1) oxar(1) oxor(1)
      oxy(1) pchocty(1) qocky(1) qodaikhy(1) qokeefcy(1) qokg(1)
      rokaix(1) salxar(1) sarg(1) shecphhedy(1) shhy(1) shokog(1)
      shxam(1) soleeg(1) teyteg(1) todashx(1) vo(1) vr(1) vs(1)
      xoiin(1) xol(1) yhal(1) ykceol(1) ypcheg(1) ytcharg(1)

    From voyl/vms/tot.1.wfr:

      cfhhy(1) chockhhy(1) chodalg(1) ddsschx(1) docfhhy(1) gy(1)
      oalcheg(1) ocsesy(1) oecs(1) ofacfom(1) okaramog(1) okeeog(1)
      opalg(1) opchaldg(1) oteedyg(1) soshxar(1) ydashgarain(1)
      yskhy(1)

OUTPUT OF "MAKE"

  Sample voyn/maj:

    lines   words     bytes file
  ------- ------- --------- ------------
     1066    2132     64512 gen/voyn/maj/hea.1/raw.evt
      134     268      8660 gen/voyn/maj/hea.2/raw.evt
      316     632     24711 gen/voyn/maj/heb.1/raw.evt
       61     122      4644 gen/voyn/maj/heb.2/raw.evt
       13      26      1132 gen/voyn/maj/cos.1/raw.evt
      393     786     19115 gen/voyn/maj/cos.2/raw.evt
      186     372      9994 gen/voyn/maj/cos.3/raw.evt
      902    1804     62353 gen/voyn/maj/bio.1/raw.evt
      335     670     15343 gen/voyn/maj/zod.1/raw.evt
      174     348     10021 gen/voyn/maj/pha.1/raw.evt
      284     568     15718 gen/voyn/maj/pha.2/raw.evt
       80     160      6158 gen/voyn/maj/str.1/raw.evt
     1084    2168     90650 gen/voyn/maj/str.2/raw.evt
       28      56      1835 gen/voyn/maj/unk.1/raw.evt
       26      52      1801 gen/voyn/maj/unk.2/raw.evt
        7      14       461 gen/voyn/maj/unk.3/raw.evt
       48      96      2972 gen/voyn/maj/unk.4/raw.evt
       35      70      2844 gen/voyn/maj/unk.5/raw.evt
       45      90      3845 gen/voyn/maj/unk.6/raw.evt
       39      78      3002 gen/voyn/maj/unk.7/raw.evt
        1       2        67 gen/voyn/maj/unk.8/raw.evt
     5514   11901    360159 gen/voyn/maj/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
     7047   20961    164869 gen/voyn/maj/hea.1/raw.tlw
      882    2632     21493 gen/voyn/maj/hea.2/raw.tlw
     2959    8819     70279 gen/voyn/maj/heb.1/raw.tlw
      570    1697     13835 gen/voyn/maj/heb.2/raw.tlw
      205     605      4403 gen/voyn/maj/cos.1/raw.tlw
     2032    5810     44962 gen/voyn/maj/cos.2/raw.tlw
     1123    3252     26380 gen/voyn/maj/cos.3/raw.tlw
     7171   21317    174100 gen/voyn/maj/bio.1/raw.tlw
     1674    4718     36687 gen/voyn/maj/zod.1/raw.tlw
     1123    3269     26971 gen/voyn/maj/pha.1/raw.tlw
     1763    5114     42370 gen/voyn/maj/pha.2/raw.tlw
      763    2281     19088 gen/voyn/maj/str.1/raw.tlw
    11056   32880    283328 gen/voyn/maj/str.2/raw.tlw
      220     653      5215 gen/voyn/maj/unk.1/raw.tlw
      142     424      3534 gen/voyn/maj/unk.2/raw.tlw
       49     145      1129 gen/voyn/maj/unk.3/raw.tlw
      337     991      7977 gen/voyn/maj/unk.4/raw.tlw
      351    1044      8939 gen/voyn/maj/unk.5/raw.tlw
      492    1473     12624 gen/voyn/maj/unk.6/raw.tlw
      392    1171      9897 gen/voyn/maj/unk.7/raw.tlw
        2       6        49 gen/voyn/maj/unk.8/raw.tlw
    40372  119300    978182 gen/voyn/maj/tot.1/raw.tlw

    lines file
  ------- ------------
     2132 gen/voyn/maj/hea.1/raw.wfr
      554 gen/voyn/maj/hea.2/raw.wfr
     1189 gen/voyn/maj/heb.1/raw.wfr
      331 gen/voyn/maj/heb.2/raw.wfr
       83 gen/voyn/maj/cos.1/raw.wfr
     1019 gen/voyn/maj/cos.2/raw.wfr
      620 gen/voyn/maj/cos.3/raw.wfr
     1597 gen/voyn/maj/bio.1/raw.wfr
      884 gen/voyn/maj/zod.1/raw.wfr
      561 gen/voyn/maj/pha.1/raw.wfr
      808 gen/voyn/maj/pha.2/raw.wfr
      483 gen/voyn/maj/str.1/raw.wfr
     3225 gen/voyn/maj/str.2/raw.wfr
      162 gen/voyn/maj/unk.1/raw.wfr
      103 gen/voyn/maj/unk.2/raw.wfr
       46 gen/voyn/maj/unk.3/raw.wfr
      239 gen/voyn/maj/unk.4/raw.wfr
      246 gen/voyn/maj/unk.5/raw.wfr
      297 gen/voyn/maj/unk.6/raw.wfr
      235 gen/voyn/maj/unk.7/raw.wfr
        2 gen/voyn/maj/unk.8/raw.wfr
     8591 gen/voyn/maj/tot.1/raw.wfr

    lines file
  ------- ------------
     1981 gen/voyn/maj/hea.1/gud.wfr
      509 gen/voyn/maj/hea.2/gud.wfr
     1111 gen/voyn/maj/heb.1/gud.wfr
      288 gen/voyn/maj/heb.2/gud.wfr
       72 gen/voyn/maj/cos.1/gud.wfr
      868 gen/voyn/maj/cos.2/gud.wfr
      429 gen/voyn/maj/cos.3/gud.wfr
     1382 gen/voyn/maj/bio.1/gud.wfr
      555 gen/voyn/maj/zod.1/gud.wfr
      483 gen/voyn/maj/pha.1/gud.wfr
      694 gen/voyn/maj/pha.2/gud.wfr
      402 gen/voyn/maj/str.1/gud.wfr
     2779 gen/voyn/maj/str.2/gud.wfr
      153 gen/voyn/maj/unk.1/gud.wfr
       97 gen/voyn/maj/unk.2/gud.wfr
       43 gen/voyn/maj/unk.3/gud.wfr
      228 gen/voyn/maj/unk.4/gud.wfr
      214 gen/voyn/maj/unk.5/gud.wfr
      247 gen/voyn/maj/unk.6/gud.wfr
      208 gen/voyn/maj/unk.7/gud.wfr
        2 gen/voyn/maj/unk.8/gud.wfr
     6883 gen/voyn/maj/tot.1/gud.wfr

    lines file
  ------- ------------
      151 gen/voyn/maj/hea.1/bad.wfr
       45 gen/voyn/maj/hea.2/bad.wfr
       78 gen/voyn/maj/heb.1/bad.wfr
       43 gen/voyn/maj/heb.2/bad.wfr
       11 gen/voyn/maj/cos.1/bad.wfr
      151 gen/voyn/maj/cos.2/bad.wfr
      191 gen/voyn/maj/cos.3/bad.wfr
      215 gen/voyn/maj/bio.1/bad.wfr
      329 gen/voyn/maj/zod.1/bad.wfr
       78 gen/voyn/maj/pha.1/bad.wfr
      114 gen/voyn/maj/pha.2/bad.wfr
       81 gen/voyn/maj/str.1/bad.wfr
      446 gen/voyn/maj/str.2/bad.wfr
        9 gen/voyn/maj/unk.1/bad.wfr
        6 gen/voyn/maj/unk.2/bad.wfr
        3 gen/voyn/maj/unk.3/bad.wfr
       11 gen/voyn/maj/unk.4/bad.wfr
       32 gen/voyn/maj/unk.5/bad.wfr
       50 gen/voyn/maj/unk.6/bad.wfr
       27 gen/voyn/maj/unk.7/bad.wfr
        0 gen/voyn/maj/unk.8/bad.wfr
     1708 gen/voyn/maj/tot.1/bad.wfr

  Good/bad statistics for voyn/maj:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    hea.1   6867  6704  976   163   23    2132  1981  928   151   70
    hea.2    868   823  947    45   51     554   509  917    45   81
    heb.1   2901  2820  971    81   27    1189  1111  933    78   65
    heb.2    557   510  913    47   84     331   288  867    43  129
    cos.1    195   155  790    40  204      83    72  857    11  130
    cos.2   1746  1590  910   156   89    1019   868  850   151  148
    cos.3   1006   795  789   211  209     620   429  690   191  307
    bio.1   6975  6697  960   278   39    1597  1382  864   215  134
    zod.1   1370   988  720   382  278     884   555  627   329  371
    pha.1   1023   944  921    79   77     561   483  859    78  138
    pha.2   1588  1452  913   136   85     808   694  857   114  140
    str.1    755   670  886    85  112     483   402  830    81  167
    str.2  10768 10097  937   671   62    3225  2779  861   446  138
    unk.1    213   202  943    11   51     162   153  938     9   55
    unk.2    140   134  950     6   42     103    97  932     6   57
    unk.3     47    44  916     3   62      46    43  914     3   63
    unk.4    317   306  962    11   34     239   228  950    11   45
    unk.5    342   309  900    33   96     246   214  866    32  129
    unk.6    489   431  879    58  118     297   247  828    50  167
    unk.7    387   357  920    30   77     235   208  881    27  114
    unk.8      2     2  666     0    0       2     2  666     0    0
    tot.1  38556 36030  934  2526   65    8591  6883  801  1708  198

  Sample voyn/prs:

    lines   words     bytes file
  ------- ------- --------- ------------
     1065    2130     64485 gen/voyn/prs/hea.1/raw.evt
      134     268      8660 gen/voyn/prs/hea.2/raw.evt
      316     632     24711 gen/voyn/prs/heb.1/raw.evt
       61     122      4644 gen/voyn/prs/heb.2/raw.evt
        4       8       870 gen/voyn/prs/cos.1/raw.evt
      206     412     13662 gen/voyn/prs/cos.2/raw.evt
       85     170      7150 gen/voyn/prs/cos.3/raw.evt
      775    1550     58885 gen/voyn/prs/bio.1/raw.evt
       36      72      6945 gen/voyn/prs/zod.1/raw.evt
       89     178      7635 gen/voyn/prs/pha.1/raw.evt
      135     270     11650 gen/voyn/prs/pha.2/raw.evt
       80     160      6158 gen/voyn/prs/str.1/raw.evt
     1084    2168     90650 gen/voyn/prs/str.2/raw.evt
       28      56      1835 gen/voyn/prs/unk.1/raw.evt
       26      52      1801 gen/voyn/prs/unk.2/raw.evt
        7      14       461 gen/voyn/prs/unk.3/raw.evt
       33      66      2563 gen/voyn/prs/unk.4/raw.evt
       35      70      2844 gen/voyn/prs/unk.5/raw.evt
       45      90      3845 gen/voyn/prs/unk.6/raw.evt
       39      78      3002 gen/voyn/prs/unk.7/raw.evt
        0       0         0 gen/voyn/prs/unk.8/raw.evt
     4540    9953    332777 gen/voyn/prs/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
     7045   20956    164841 gen/voyn/prs/hea.1/raw.tlw
      882    2632     21493 gen/voyn/prs/hea.2/raw.tlw
     2959    8819     70279 gen/voyn/prs/heb.1/raw.tlw
      570    1697     13835 gen/voyn/prs/heb.2/raw.tlw
      186     557      4105 gen/voyn/prs/cos.1/raw.tlw
     1606    4703     37794 gen/voyn/prs/cos.2/raw.tlw
      904    2692     22690 gen/voyn/prs/cos.3/raw.tlw
     6915   20658    170013 gen/voyn/prs/bio.1/raw.tlw
     1015    3040     25966 gen/voyn/prs/zod.1/raw.tlw
      942    2810     24107 gen/voyn/prs/pha.1/raw.tlw
     1455    4336     37509 gen/voyn/prs/pha.2/raw.tlw
      763    2281     19088 gen/voyn/prs/str.1/raw.tlw
    11056   32880    283328 gen/voyn/prs/str.2/raw.tlw
      220     653      5215 gen/voyn/prs/unk.1/raw.tlw
      142     424      3534 gen/voyn/prs/unk.2/raw.tlw
       49     145      1129 gen/voyn/prs/unk.3/raw.tlw
      307     916      7559 gen/voyn/prs/unk.4/raw.tlw
      351    1044      8939 gen/voyn/prs/unk.5/raw.tlw
      492    1473     12624 gen/voyn/prs/unk.6/raw.tlw
      392    1171      9897 gen/voyn/prs/unk.7/raw.tlw
        0       0         0 gen/voyn/prs/unk.8/raw.tlw
    38269  113923    943994 gen/voyn/prs/tot.1/raw.tlw

    lines file
  ------- ------------
     2131 gen/voyn/prs/hea.1/raw.wfr
      554 gen/voyn/prs/hea.2/raw.wfr
     1189 gen/voyn/prs/heb.1/raw.wfr
      331 gen/voyn/prs/heb.2/raw.wfr
       73 gen/voyn/prs/cos.1/raw.wfr
      868 gen/voyn/prs/cos.2/raw.wfr
      533 gen/voyn/prs/cos.3/raw.wfr
     1536 gen/voyn/prs/bio.1/raw.wfr
      641 gen/voyn/prs/zod.1/raw.wfr
      485 gen/voyn/prs/pha.1/raw.wfr
      684 gen/voyn/prs/pha.2/raw.wfr
      483 gen/voyn/prs/str.1/raw.wfr
     3225 gen/voyn/prs/str.2/raw.wfr
      162 gen/voyn/prs/unk.1/raw.wfr
      103 gen/voyn/prs/unk.2/raw.wfr
       46 gen/voyn/prs/unk.3/raw.wfr
      226 gen/voyn/prs/unk.4/raw.wfr
      246 gen/voyn/prs/unk.5/raw.wfr
      297 gen/voyn/prs/unk.6/raw.wfr
      235 gen/voyn/prs/unk.7/raw.wfr
        0 gen/voyn/prs/unk.8/raw.wfr
     8105 gen/voyn/prs/tot.1/raw.wfr

    lines file
  ------- ------------
     1980 gen/voyn/prs/hea.1/gud.wfr
      509 gen/voyn/prs/hea.2/gud.wfr
     1111 gen/voyn/prs/heb.1/gud.wfr
      288 gen/voyn/prs/heb.2/gud.wfr
       63 gen/voyn/prs/cos.1/gud.wfr
      733 gen/voyn/prs/cos.2/gud.wfr
      380 gen/voyn/prs/cos.3/gud.wfr
     1325 gen/voyn/prs/bio.1/gud.wfr
      379 gen/voyn/prs/zod.1/gud.wfr
      418 gen/voyn/prs/pha.1/gud.wfr
      587 gen/voyn/prs/pha.2/gud.wfr
      402 gen/voyn/prs/str.1/gud.wfr
     2779 gen/voyn/prs/str.2/gud.wfr
      153 gen/voyn/prs/unk.1/gud.wfr
       97 gen/voyn/prs/unk.2/gud.wfr
       43 gen/voyn/prs/unk.3/gud.wfr
      216 gen/voyn/prs/unk.4/gud.wfr
      214 gen/voyn/prs/unk.5/gud.wfr
      247 gen/voyn/prs/unk.6/gud.wfr
      208 gen/voyn/prs/unk.7/gud.wfr
        0 gen/voyn/prs/unk.8/gud.wfr
     6525 gen/voyn/prs/tot.1/gud.wfr

    lines file
  ------- ------------
      151 gen/voyn/prs/hea.1/bad.wfr
       45 gen/voyn/prs/hea.2/bad.wfr
       78 gen/voyn/prs/heb.1/bad.wfr
       43 gen/voyn/prs/heb.2/bad.wfr
       10 gen/voyn/prs/cos.1/bad.wfr
      135 gen/voyn/prs/cos.2/bad.wfr
      153 gen/voyn/prs/cos.3/bad.wfr
      211 gen/voyn/prs/bio.1/bad.wfr
      262 gen/voyn/prs/zod.1/bad.wfr
       67 gen/voyn/prs/pha.1/bad.wfr
       97 gen/voyn/prs/pha.2/bad.wfr
       81 gen/voyn/prs/str.1/bad.wfr
      446 gen/voyn/prs/str.2/bad.wfr
        9 gen/voyn/prs/unk.1/bad.wfr
        6 gen/voyn/prs/unk.2/bad.wfr
        3 gen/voyn/prs/unk.3/bad.wfr
       10 gen/voyn/prs/unk.4/bad.wfr
       32 gen/voyn/prs/unk.5/bad.wfr
       50 gen/voyn/prs/unk.6/bad.wfr
       27 gen/voyn/prs/unk.7/bad.wfr
        0 gen/voyn/prs/unk.8/bad.wfr
     1580 gen/voyn/prs/tot.1/bad.wfr

  Good/bad statistics for voyn/prs:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    hea.1   6866  6703  976   163   23    2131  1980  928   151   70
    hea.2    868   823  947    45   51     554   509  917    45   81
    heb.1   2901  2820  971    81   27    1189  1111  933    78   65
    heb.2    557   510  913    47   84     331   288  867    43  129
    cos.1    185   146  784    39  209      73    63  851    10  135
    cos.2   1491  1353  906   138   92     868   733  843   135  155
    cos.3    884   713  805   171  193     533   380  711   153  286
    bio.1   6828  6555  959   273   39    1536  1325  862   211  137
    zod.1   1010   701  693   309  305     641   379  590   262  408
    pha.1    926   858  925    68   73     485   418  860    67  137
    pha.2   1426  1309  917   117   81     684   587  856    97  141
    str.1    755   670  886    85  112     483   402  830    81  167
    str.2  10768 10097  937   671   62    3225  2779  861   446  138
    unk.1    213   202  943    11   51     162   153  938     9   55
    unk.2    140   134  950     6   42     103    97  932     6   57
    unk.3     47    44  916     3   62      46    43  914     3   63
    unk.4    302   292  963    10   33     226   216  951    10   44
    unk.5    342   309  900    33   96     246   214  866    32  129
    unk.6    489   431  879    58  118     297   247  828    50  167
    unk.7    387   357  920    30   77     235   208  881    27  114
    unk.8      0     0    0     0    0       0     0    0     0    0
    tot.1  37385 35027  936  2358   63    8105  6525  804  1580  194

  Sample voyn/lab:

    lines   words     bytes file
  ------- ------- --------- ------------
        1       2        27 gen/voyn/lab/hea.1/raw.evt
        0       0         0 gen/voyn/lab/hea.2/raw.evt
        0       0         0 gen/voyn/lab/heb.1/raw.evt
        0       0         0 gen/voyn/lab/heb.2/raw.evt
        9      18       262 gen/voyn/lab/cos.1/raw.evt
      187     374      5453 gen/voyn/lab/cos.2/raw.evt
      101     202      2844 gen/voyn/lab/cos.3/raw.evt
      127     254      3468 gen/voyn/lab/bio.1/raw.evt
      299     598      8398 gen/voyn/lab/zod.1/raw.evt
       85     170      2386 gen/voyn/lab/pha.1/raw.evt
      149     298      4068 gen/voyn/lab/pha.2/raw.evt
        0       0         0 gen/voyn/lab/str.1/raw.evt
        0       0         0 gen/voyn/lab/str.2/raw.evt
        0       0         0 gen/voyn/lab/unk.1/raw.evt
        0       0         0 gen/voyn/lab/unk.2/raw.evt
        0       0         0 gen/voyn/lab/unk.3/raw.evt
       15      30       409 gen/voyn/lab/unk.4/raw.evt
        0       0         0 gen/voyn/lab/unk.5/raw.evt
        0       0         0 gen/voyn/lab/unk.6/raw.evt
        0       0         0 gen/voyn/lab/unk.7/raw.evt
        1       2        67 gen/voyn/lab/unk.8/raw.evt
     1231    3335     37703 gen/voyn/lab/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
        1       3        24 gen/voyn/lab/hea.1/raw.tlw
        0       0         0 gen/voyn/lab/hea.2/raw.tlw
        0       0         0 gen/voyn/lab/heb.1/raw.tlw
        0       0         0 gen/voyn/lab/heb.2/raw.tlw
       18      46       294 gen/voyn/lab/cos.1/raw.tlw
      425    1105      7164 gen/voyn/lab/cos.2/raw.tlw
      218     558      3687 gen/voyn/lab/cos.3/raw.tlw
      255     657      4084 gen/voyn/lab/bio.1/raw.tlw
      658    1676     10717 gen/voyn/lab/zod.1/raw.tlw
      180     457      2862 gen/voyn/lab/pha.1/raw.tlw
      307     776      4859 gen/voyn/lab/pha.2/raw.tlw
        0       0         0 gen/voyn/lab/str.1/raw.tlw
        0       0         0 gen/voyn/lab/str.2/raw.tlw
        0       0         0 gen/voyn/lab/unk.1/raw.tlw
        0       0         0 gen/voyn/lab/unk.2/raw.tlw
        0       0         0 gen/voyn/lab/unk.3/raw.tlw
       29      73       415 gen/voyn/lab/unk.4/raw.tlw
        0       0         0 gen/voyn/lab/unk.5/raw.tlw
        0       0         0 gen/voyn/lab/unk.6/raw.tlw
        0       0         0 gen/voyn/lab/unk.7/raw.tlw
        2       6        49 gen/voyn/lab/unk.8/raw.tlw
     2102    5375     34187 gen/voyn/lab/tot.1/raw.tlw

    lines file
  ------- ------------
        1 gen/voyn/lab/hea.1/raw.wfr
        0 gen/voyn/lab/hea.2/raw.wfr
        0 gen/voyn/lab/heb.1/raw.wfr
        0 gen/voyn/lab/heb.2/raw.wfr
       10 gen/voyn/lab/cos.1/raw.wfr
      225 gen/voyn/lab/cos.2/raw.wfr
      112 gen/voyn/lab/cos.3/raw.wfr
      127 gen/voyn/lab/bio.1/raw.wfr
      303 gen/voyn/lab/zod.1/raw.wfr
       92 gen/voyn/lab/pha.1/raw.wfr
      155 gen/voyn/lab/pha.2/raw.wfr
        0 gen/voyn/lab/str.1/raw.wfr
        0 gen/voyn/lab/str.2/raw.wfr
        0 gen/voyn/lab/unk.1/raw.wfr
        0 gen/voyn/lab/unk.2/raw.wfr
        0 gen/voyn/lab/unk.3/raw.wfr
       15 gen/voyn/lab/unk.4/raw.wfr
        0 gen/voyn/lab/unk.5/raw.wfr
        0 gen/voyn/lab/unk.6/raw.wfr
        0 gen/voyn/lab/unk.7/raw.wfr
        2 gen/voyn/lab/unk.8/raw.wfr
      882 gen/voyn/lab/tot.1/raw.wfr

    lines file
  ------- ------------
        1 gen/voyn/lab/hea.1/gud.wfr
        0 gen/voyn/lab/hea.2/gud.wfr
        0 gen/voyn/lab/heb.1/gud.wfr
        0 gen/voyn/lab/heb.2/gud.wfr
        9 gen/voyn/lab/cos.1/gud.wfr
      208 gen/voyn/lab/cos.2/gud.wfr
       72 gen/voyn/lab/cos.3/gud.wfr
      122 gen/voyn/lab/bio.1/gud.wfr
      233 gen/voyn/lab/zod.1/gud.wfr
       81 gen/voyn/lab/pha.1/gud.wfr
      136 gen/voyn/lab/pha.2/gud.wfr
        0 gen/voyn/lab/str.1/gud.wfr
        0 gen/voyn/lab/str.2/gud.wfr
        0 gen/voyn/lab/unk.1/gud.wfr
        0 gen/voyn/lab/unk.2/gud.wfr
        0 gen/voyn/lab/unk.3/gud.wfr
       14 gen/voyn/lab/unk.4/gud.wfr
        0 gen/voyn/lab/unk.5/gud.wfr
        0 gen/voyn/lab/unk.6/gud.wfr
        0 gen/voyn/lab/unk.7/gud.wfr
        2 gen/voyn/lab/unk.8/gud.wfr
      721 gen/voyn/lab/tot.1/gud.wfr

    lines file
  ------- ------------
        0 gen/voyn/lab/hea.1/bad.wfr
        0 gen/voyn/lab/hea.2/bad.wfr
        0 gen/voyn/lab/heb.1/bad.wfr
        0 gen/voyn/lab/heb.2/bad.wfr
        1 gen/voyn/lab/cos.1/bad.wfr
       17 gen/voyn/lab/cos.2/bad.wfr
       40 gen/voyn/lab/cos.3/bad.wfr
        5 gen/voyn/lab/bio.1/bad.wfr
       70 gen/voyn/lab/zod.1/bad.wfr
       11 gen/voyn/lab/pha.1/bad.wfr
       19 gen/voyn/lab/pha.2/bad.wfr
        0 gen/voyn/lab/str.1/bad.wfr
        0 gen/voyn/lab/str.2/bad.wfr
        0 gen/voyn/lab/unk.1/bad.wfr
        0 gen/voyn/lab/unk.2/bad.wfr
        0 gen/voyn/lab/unk.3/bad.wfr
        1 gen/voyn/lab/unk.4/bad.wfr
        0 gen/voyn/lab/unk.5/bad.wfr
        0 gen/voyn/lab/unk.6/bad.wfr
        0 gen/voyn/lab/unk.7/bad.wfr
        0 gen/voyn/lab/unk.8/bad.wfr
      161 gen/voyn/lab/tot.1/bad.wfr

  Good/bad statistics for voyn/lab:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    hea.1      1     1  500     0    0       1     1  500     0    0
    hea.2      0     0    0     0    0       0     0    0     0    0
    heb.1      0     0    0     0    0       0     0    0     0    0
    heb.2      0     0    0     0    0       0     0    0     0    0
    cos.1     10     9  818     1   90      10     9  818     1   90
    cos.2    255   237  925    18   70     225   208  920    17   75
    cos.3    122    82  666    40  325     112    72  637    40  353
    bio.1    147   142  959     5   33     127   122  953     5   39
    zod.1    360   287  795    73  202     303   233  766    70  230
    pha.1     97    86  877    11  112      92    81  870    11  118
    pha.2    162   143  877    19  116     155   136  871    19  121
    str.1      0     0    0     0    0       0     0    0     0    0
    str.2      0     0    0     0    0       0     0    0     0    0
    unk.1      0     0    0     0    0       0     0    0     0    0
    unk.2      0     0    0     0    0       0     0    0     0    0
    unk.3      0     0    0     0    0       0     0    0     0    0
    unk.4     15    14  875     1   62      15    14  875     1   62
    unk.5      0     0    0     0    0       0     0    0     0    0
    unk.6      0     0    0     0    0       0     0    0     0    0
    unk.7      0     0    0     0    0       0     0    0     0    0
    unk.8      2     2  666     0    0       2     2  666     0    0
    tot.1   1171  1003  855   168  143     882   721  816   161  182

  Statistics for voyn/tak:

    lines   words     bytes file
  ------- ------- --------- ------------
     5391   11713    361531 gen/voyn/tak/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
    39548  116936    960572 gen/voyn/tak/tot.1/raw.tlw

    lines file
  ------- ------------
     8150 gen/voyn/tak/tot.1/raw.wfr

    lines file
  ------- ------------
     7653 gen/voyn/tak/tot.1/gud.wfr

    lines file
  ------- ------------
      497 gen/voyn/tak/tot.1/bad.wfr

  Good/bad statistics for voyn/tak:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    tot.1  37840 37214  983   626   16    8150  7653  938   497   60

  Statistics for voyn/ini:

    lines   words     bytes file
  ------- ------- --------- ------------

    lines   words     bytes file
  ------- ------- --------- ------------
     5726   16301    126670 gen/voyn/ini/tot.1/raw.tlw

    lines file
  ------- ------------
     2159 gen/voyn/ini/tot.1/raw.wfr

    lines file
  ------- ------------
     1913 gen/voyn/ini/tot.1/gud.wfr

    lines file
  ------- ------------
      246 gen/voyn/ini/tot.1/bad.wfr

  Good/bad statistics for voyn/ini:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    tot.1   4849  4567  941   282   58    2159  1913  885   246  113

  Statistics for voyn/fin:

    lines   words     bytes file
  ------- ------- --------- ------------

    lines   words     bytes file
  ------- ------- --------- ------------
     5726   16301    122721 gen/voyn/fin/tot.1/raw.tlw

    lines file
  ------- ------------
     2042 gen/voyn/fin/tot.1/raw.wfr

    lines file
  ------- ------------
     1748 gen/voyn/fin/tot.1/gud.wfr

    lines file
  ------- ------------
      294 gen/voyn/fin/tot.1/bad.wfr

  Good/bad statistics for voyn/fin:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    tot.1   4849  4514  930   335   69    2042  1748  855   294  143

  Statistics for voyn/mid:

    lines   words     bytes file
  ------- ------- --------- ------------

    lines   words     bytes file
  ------- ------- --------- ------------
    28240   83880    694625 gen/voyn/mid/tot.1/raw.tlw

    lines file
  ------- ------------
     5633 gen/voyn/mid/tot.1/raw.wfr

    lines file
  ------- ------------
     4486 gen/voyn/mid/tot.1/gud.wfr

    lines file
  ------- ------------
     1147 gen/voyn/mid/tot.1/bad.wfr

  Good/bad statistics for voyn/mid:

  #        tokens                        lexemes
  #        ----------------------------- -----------------------------
  # sec      raw   gud  ppt   bad  ppt     raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ----   ----- ----- ---- ----- ----
    tot.1  27400 25702  937  1698   61    5633  4486  796  1147  203

  voyn/{prs,lab}/hea.1/raw.wfr: 6867   voyn/maj/hea.1/raw.wfr: 6867
  voyn/{prs,lab}/hea.2/raw.wfr: 868   voyn/maj/hea.2/raw.wfr: 868
  voyn/{prs,lab}/heb.1/raw.wfr: 2901   voyn/maj/heb.1/raw.wfr: 2901
  voyn/{prs,lab}/heb.2/raw.wfr: 557   voyn/maj/heb.2/raw.wfr: 557
  voyn/{prs,lab}/cos.1/raw.wfr: 195   voyn/maj/cos.1/raw.wfr: 195
  voyn/{prs,lab}/cos.2/raw.wfr: 1746   voyn/maj/cos.2/raw.wfr: 1746
  voyn/{prs,lab}/cos.3/raw.wfr: 1006   voyn/maj/cos.3/raw.wfr: 1006
  voyn/{prs,lab}/bio.1/raw.wfr: 6975   voyn/maj/bio.1/raw.wfr: 6975
  voyn/{prs,lab}/zod.1/raw.wfr: 1370   voyn/maj/zod.1/raw.wfr: 1370
  voyn/{prs,lab}/pha.1/raw.wfr: 1023   voyn/maj/pha.1/raw.wfr: 1023
  voyn/{prs,lab}/pha.2/raw.wfr: 1588   voyn/maj/pha.2/raw.wfr: 1588
  voyn/{prs,lab}/str.1/raw.wfr: 755   voyn/maj/str.1/raw.wfr: 755
  voyn/{prs,lab}/str.2/raw.wfr: 10768   voyn/maj/str.2/raw.wfr: 10768
  voyn/{prs,lab}/unk.1/raw.wfr: 213   voyn/maj/unk.1/raw.wfr: 213
  voyn/{prs,lab}/unk.2/raw.wfr: 140   voyn/maj/unk.2/raw.wfr: 140
  voyn/{prs,lab}/unk.3/raw.wfr: 47   voyn/maj/unk.3/raw.wfr: 47
  voyn/{prs,lab}/unk.4/raw.wfr: 317   voyn/maj/unk.4/raw.wfr: 317
  voyn/{prs,lab}/unk.5/raw.wfr: 342   voyn/maj/unk.5/raw.wfr: 342
  voyn/{prs,lab}/unk.6/raw.wfr: 489   voyn/maj/unk.6/raw.wfr: 489
  voyn/{prs,lab}/unk.7/raw.wfr: 387   voyn/maj/unk.7/raw.wfr: 387
  voyn/{prs,lab}/unk.8/raw.wfr: 2   voyn/maj/unk.8/raw.wfr: 2

# END