Hacking at the Voynich manuscript
Notebook - volume 4

Warning: these notebooks aren't strictly chronological logs.
  Sometimes I go back and redo things, clarify comments,
  delete garbage, etc.

Summary of previous notebooks
=============================

  On 97-07-05 I obtained Landini's interlinear transcription of the VMs, version 1.6
  (landini-interln16.evt) from
  http://sun1.bham.ac.uk/G.Landini/evmt/intrln16.zip
  
  I manually extracted from it a homogeneous, full-text sample
  bio-m-evt.evt, consisting of pages 147-166 (f75r--f84v) of the
  "biological" section, in Currier's Language B, hand 2.  This section
  includes Currier's and Friedman's transcriptions.  Currier's seems
  to be the most complete of them.
  
  The two versions have many differences (affecting 5-10% of the
  words), and often disagree even in the grouping of symbols: where
  one sees two words the other sees a single word, what is [A] for one
  may be [CI] for the other, and so on.
  
  So I decided to break all characters down to individual "logical"
  strokes, and use one (computer) character to encode each stroke.
  I called this new encoding "jsa" (Jorge's Super-Analytic). 
  
  After mapping to jsa, I generated a "consensus" version
  of the biological section:
  
    cat bio-m-evt.evt \
      | fsg2jsa \
      > bio-m-jsa.evt
      
    cat bio-m-jsa.evt \
      | make-consensus-interlin \
      > bio-x-jsa.evt
  
    cat bio-x-jsa.evt \
      | egrep '^<.*;J> ' \
      | sed \
          -e 's/{[^}]*}//g' \
      > bio-j-jsa.evt

    extract-words-from-interlin \
        -chars "qocilgysxju" \
        bio-j-jsa.evt \
        bio-j-jsa
        
     lines   words     bytes file        
    ------ ------- --------- ------------
      7054    7054     62690 bio-j-jsa.wds
      2132    2132     24925 bio-j-jsa.dic
      4661    4661     40897 bio-j-jsa-gut.wds
       992     992      9720 bio-j-jsa-gut.dic
       840     840      2445 bio-j-jsa-fun.wds
         2       2         5 bio-j-jsa-fun.dic
      1553    1553     19348 bio-j-jsa-bad.wds
      1138    1138     15200 bio-j-jsa-bad.dic

   Digraph counts:

                  q     o     c     i     l     g     y     s     x     j     u   TOT
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
            .  1398   965  1877   361    60     .     .     .     .     .     .  4661
      q     1     .  1229    18     .     1   154     .     .     .   700     .  2103
      o    21   486     1    63  1087  1071     .     .     .     .     .     .  2729
      c     4   167   176  6137  1209   232  2114  2921  1019     .     .     . 13979
      i     4     1     1     8  1997     2     .     .   560  1616    37   457  4683
      l     .     .     .     .     .     .    16     .     .     .  1566     .  1582
      g    52     .    74  2150     4     4     .     .     .     .     .     .  2284
      y  2790    26     2    47    13    43     .     .     .     .     .     .  2921
      s   463     1    99  1013     1     2     .     .     .     .     .     .  1579
      x   827    24   105   488     5   167     .     .     .     .     .     .  1616
      j    46     .    76  2175     6     .     .     .     .     .     .     .  2303
      u   453     .     1     3     .     .     .     .     .     .     .     .   457
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT  4661  2103  2729 13979  4683  1582  2284  2921  1579  1616  2303   457 40897
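
  The table above can be reproduced in outline as follows.  This is only
  a minimal sketch: it assumes the counts were taken over the word list
  (one word per line, as in bio-j-jsa-gut.wds) with "." standing for a
  word break on either side.  That reading is consistent with the totals:
  the first row sums to 4661 (the number of words), and the grand total
  40897 equals the byte count of bio-j-jsa-gut.wds, i.e. one digraph per
  character plus one per word.  The function name and the sample words
  are hypothetical.

```python
from collections import Counter

def digraph_counts(words, boundary="."):
    """Count adjacent symbol pairs; `boundary` stands for a word break."""
    counts = Counter()
    for word in words:
        padded = boundary + word + boundary
        # Each word of length n contributes n + 1 pairs.
        for first, second in zip(padded, padded[1:]):
            counts[first, second] += 1
    return counts

# Hypothetical sample; the real input would be the lines of bio-j-jsa-gut.wds.
sample = ["qoljcy", "ccgcy", "qoljcy"]
counts = digraph_counts(sample)
for (first, second), n in sorted(counts.items()):
    print(first, second, n)
```

  Row and column totals then follow by summing over the second or the
  first symbol of each pair.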

  Some conclusions we get from this and other data:
  
    The valid \i/ sequences are \ij/  \is/ \iis/ \iiu/ \iiiu/ \ix/;
    the others are likely to be scription or transcription errors.
    
    \ci/ and \o/ are lexically similar but distinct glyphs. 
    
    The suffixes \ij/, \iis/, \iiu/, and \iiiu/ are preceded 
    almost exclusively by \ci/ and strictly word-final.  It seems 
    plausible that these are errors:
       
       \oij/     (4 occurrences) should be \ciij/    ( 32 occurrences)    
       \oiiu/    (2 occurrences) should be \ciiiu/   (109 occurrences)    
       \ciiu/    (4 occurrences) should be \ciiiu/   (109 occurrences)    
       \oiiiu/   (9 occurrences) should be \ciiiiu/  (329 occurrences)   
       \ciiiiiu/ (4 occurrences) should be \ciiiiu/  (329 occurrences) 
       \ciiix/   (2 occurrences) should be \ciix/    (403 occurrences) 
       
    \ciiis/ (19 occurrences) may also be a misreading of \ciis/ (291 occurrences).

    \cg/ is always a glyph.
    
    \qo/ is a combination that occurs only in word-initial position.
    
    \qc/ is likely to be a misreading/miswriting of \qo/.
    
    \cy/ is always a glyph, almost certainly a final form of \ci/.
    
    \qj/, \lj/, \qg/, \lg/ are glyphs.
    
    \cs/ is a glyph closely related to (but distinct from) \c/.
    
    \ccg/ is almost always followed by \ci/ or \cy/.
    
  Here "glyph" means a group of strokes that can be treated as a single symbol
  for analysis; it may actually be part of a larger, still unrecognized symbol.
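
  If the suspect forms listed above are accepted as errors, they could be
  normalized with a small rewrite table.  The sketch below assumes the
  forms are word endings (the notes call them suffixes and strictly
  word-final) and leaves the less certain \ciiis/ -> \ciis/ case out.
  The names SUSPECT_ENDINGS and normalize_ending are hypothetical.

```python
# Suspect ending -> likely intended ending, as listed in the notes.
SUSPECT_ENDINGS = {
    "oij": "ciij",
    "oiiu": "ciiiu",
    "ciiu": "ciiiu",
    "oiiiu": "ciiiiu",
    "ciiiiiu": "ciiiiu",
    "ciiix": "ciix",
}

def normalize_ending(word):
    """Rewrite one suspect ending (longest first); otherwise return word unchanged."""
    for bad in sorted(SUSPECT_ENDINGS, key=len, reverse=True):
        if word.endswith(bad):
            return word[: len(word) - len(bad)] + SUSPECT_ENDINGS[bad]
    return word

# Hypothetical words, for illustration only.
for word in ["qoljoiiiu", "ciiiu", "ciiiiiu", "oij"]:
    print(word, "->", normalize_ending(word))
```

  Each table key begins with \c/ or \o/, so an accepted ending such as
  \ciiiu/ never matches a shorter suspect key and is left alone.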
  
  Summarizing again:
  
    \iiiu/, \iiu/, \iis/, \ij/  
    
        The ziggies: strictly final, preceded always by \ci/ or,
        more rarely, by \o/.
        
    \cy/ 
    
        Almost always final, but occasionally followed by other letters.
        Preceded by about the same letters as \ci/; indeed, it is 
        probably the final form of \ci/.
        
    \cg/ 
    
        May be followed by many letters, most often \cy/ and \ci/.
        Almost always preceded by \c/, or initial; rarely by \ix/
        or \o/.
        
    \cs/ 
    
        Most often followed by \c/, somewhat less often by \o/,
        \ci/, or word break.  Most often initial, but also 
        preceded by \ix/, gallows, \c/, \cy/, \cg/, \is/.
        
    \lg/, \qg/, \lj/, \qj/ 
    
        The capitals: very similar to each other, different from the rest.
        Probably to be combined with \c/ on both sides.
        It is very likely that \l/ and \q/ are exactly equivalent.
        Also, \lg/ and \qg/ may be the capital form of \cg/, used 
        mainly in the first line of each paragraph (and perhaps of each page?)
        
    \qo/ 
    
        Strictly initial, almost always followed by a capital.
        Sometimes misread as \qc/?
    
    \ix/ 
     
        Usually initial or preceded by \ci/ or \o/;
        followed by any letter except the ziggies, \qo/,
        \ix/, and \is/.
        
    \is/ 
    
        Similar to \ix/ except that it cannot be
        followed by capitals or \cg/, either.

    \ci/
    
        May be followed only by the ziggies, \ix/, or \ir/.
        Often follows a capital, but also \cg/,
        \cs/, \c/, \ix/, \is/, or word break.
        
    \o/ 
    
        Similar to \a/, but is very often word-initial.
                   

  Other conclusions:
  
    * The manuscript does not appear to use any hyphenation mark.  Either
      words are not broken across lines, which would be unusual, or they
      are broken without any extra marks.  Such word breaks may 
      result in statistical anomalies at the beginning and end of lines.
      Could this explain Currier's claim that lines are "functional units"?

    * Note that parsing sequences like \cij/, \ciis/, and \ciiis/ requires
      some care: the right parsings are c+ij, c+iis, ci+iis.  

    * The parsing of \ciis/ is ambiguous: ci+is or c+iis.  Declaring 
      \ciiis/ to be a misreading of \ciis/ would remove the ambiguity.

    * The parsing of \ciiiu/ is ambiguous, too; but since the \iu/
      series does not seem to follow a bare \c/, it seems safe to parse
      it as ci+iiu.
    
    * The gallows characters \qj/ and \lj/ appear to be closely related:
      for every common word with \lj/, there appears to be
      a word with \qj/ that occurs with about 1/4 the frequency.
      
    * There seems to be a kinship between the glyphs \cs/ 
      (when not attached to the following \c/s),
      \ir/, and the gallows \lj/ and \qj/ (also, when unattached).
      
    * The same phenomenon can be noted with respect to prefixes
      containing \cc/ and \csc/: for every word beginning with \cc/,
      there is a word where the first \cc/ is replaced by \csc/,
      with practically the same frequency.
      
    * There appears to be much confusion between the suffixes \iu/
      and \iiiu/. 
      
    * There appears to be much confusion between \o/ and \ci/.
      
  The strings of \c/, \cs/, \lj/, \qj/, \lg/, \qg/ must be treated
  together, after collapsing the glyphs listed above, since there
  seem to be glyphs consisting of gallows preceded and followed by 
  \c/ or \cc/.  When this is taken into account, we can see that 
  a single \c/ is not a glyph, but \cs/ is.  In fact, after
  shrinking \ci/ to `a', \cs/ to `z', the gallows to `H' or `P', the 
  only possible glyphs of the form [czHP]* with length at most 3 are
  
       freq glyph    
       ---- -----
        795 H       
         52 P       
        152 z       
        138 cc      
         70 zc      
        482 Hc      
        484 ccc     
        439 zcc    ? 
        493 Hcc    ?
         19 cHc     
          4 cPc  
          
  The ones marked `?' may be composite, z+cc and H+cc, but this hypothesis
  does not seem very likely (perhaps they are *sometimes* composite?)
  
  The significant strings of length 4 that cannot be parsed into the glyphs above
  are 
          
         20 cHcc    
          4 cPcc    

  Strings with 4 or more [czHP]'s tend to be quite ambiguous.
  
97-07-29 stolfi
===============

  Let's try to determine whether the \lg/ and \qg/ gallows
  are legitimate glyphs, or merely an ornate form of some other
  glyph.  It seems that these sequences only occur at the beginning
  of a paragraph.  Let's check:
  
    cat bio-j-jsa.txt \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
          -e 's/ir/v/g' \
          -e 's/iin/m/g' \
          -e 's/in/m/g' \
      | tac \
      | number-lines-from-end-of-paragraph \
      | tac \
      | number-lines-in-paragraph \
      | egrep 'P' \
      | sort +0 -1n
  
      1   3 _Pccoe cPcoe zoe?Hcoe Hc8a qoHc8a qoHcc8a qoHcca oeHcc8a Hca?qoHc8a qoPor oea //_
      1   4 _H??e ccc8a qoHcca oHc8a 8aHc8a oHcc8a oezcc8 oPzcc8 aHzcc8a qo?jc8a oPoea //_
      1   4 _Pcccoe 8ar qoHcca ccccHa qoH??e 8ae c?cc8a Pcc8a roe qoHc8a roe //_
      1   4 _PoeHczcoe oPa zcca qoPzcc8a qoHcc8a qoHoe zca Hoezc8 qoHa //_
      1   4 _Poeccc8a zcc8a qoHcc8a qoHan o8a cccH?cz oHae //_
      1   4 _Poezcca oeHzcc8a zccoe aHcca oHcca cccor zccc8a oe //_
      1   4 _Por?ar?or aHcca Hcca o??ar ozcca qoHa ccca oHcca e8a or??e //_
      1   4 _Pzccoe8a 8Hzcca qoHoPa ror oPor oePocHca oeHa8a //_
      1   5 _Hzcc8a qoHc?m zcc8a qoHaz oHae qoPzcc8a qoHa eccc8a qoP??eo?? //_
      1   5 _P8aezcor zcHoe qoHa Pzcai? zcc8a oHae8a 8ar oHar oHc8a 8a roe //_
      1   5 _Pccca Hzccoe qoHc?m o?gccc8a oHaezc8a oeHav oHam oHcc8a //_
      1   5 _Poe zcar zcar Pccca oHzcca oHaoz am oHzcca 8aeHccca?ra //_
      1   5 _Poeam oeHcc8a qoHccca 8aHcc8a qoHcc8a oPccc8a zcoe ora //_
      1   6 _HoHoe oePccc8a qoHcc8a qoHc8ae zcoe qoHae oH8ae 8??e oezcc8a //_
      1   6 _P?cc8a 8oePccc8a qoHcc8a qoHc8a qoHoe?Pccc8a roea //_
      1   6 _Pcccoe?Hae 8ae Horcca qoHca qoHa rccc8a qoHae oeHae?zcc8a ccHa //_
      1   6 _Poe oe zcae??jc?m oHcca eHcca qoHae oHcc?s8a oHczc8a //_
      1   6 _PoeHcca qoHoe oHzc8a oeHa orH??r zcccPc?c8a oeHae //_
      1   6 _Poeccc8a qoHar zcca qoHe oe?c?cca qoHan ccca qoHa qoHar //_
      1   6 _c?jcc??a rcccHa zccPccc8a qoH??r oe qocHcca ?eccca qoHc8a Horom?a //_
      1   6 _qHor zcc8a zcca Hcc8a zae ram ???Pcc8a 8ar ccc8a qoPccc8a roroe //_
      1   6 _qo8a zcar acH?ca qoHccca Hccca oeHan oPccc8a q?goe zcHa orae //_
      1   7 _?g??ae?zc8a zcocPcc8a ??H??r zcc8a oPzcc8a oHzc8a qoHc8c?e zc8a zoe8a //_
      1   7 _Horoezcz8a oPccca zccPcca qoHam zccHca qoHc8a 8aea //_
      1   7 _Poe?Hc8a ezcccHca oeHa oH oeHcca rccca qcHca rccca rae //_
      1   7 _Poe?zcoe Hccca qoHoe zcc8a qoHzcca zaea Hccca zHoe?Pcca //_
      1   7 _Poecca cPaeov o?jc?m oHam cc?cca Ham aeor oeHcca qoHae //_
      1   7 _Pzcoe?PcccPc8a qoHcc8a 8a qoHc8a 8am zccHcc8a qoHam ccccHca 8ar ccccHca ak //_
      1   8 _Hccc8a ePccc8a oPzcc8a cccPoe Pccc8ar zcc8a qoPccc8a //_
      1   8 _Poe8aHa 8aeoe oHc8a ccHav oPccc8a q?Hae c?cc8a cccPccc8a oPccca //_
      1   8 _Pzcaroe zccHca qoHzc8a qo?jae8a oPc?cc8a qoHar or am?oe //_
      1   8 _zaHam oHcc8a ccc8a qoHc?m cPcca oPcccca oHa?z?am ??Hara //_
      1   9 _Hoecc8a qoHc8a qoPoe qoHc8or cco???ccc8a qoHae?ccc8a Hak //_
      1   9 _P??ecc8 ??ccc8a z?ccHca oeHa 8ar oPaeHam oqoPccc8a ??r?a?m oPoea oroea //_
      1   9 _Par zcca ?jcc8a zccHae 8ae 8ar oe Pccc8a zccH 8c?m oPae?zccHa //_
      1   9 _Pccc8ar zcc8ae qoHar zcc8ae oHcc8a qoHc8a ?oHaie zcc8a qoHa??ezcc?? //_
      1   9 _Pccor ccccPcc8a qoHc8a ezc?c8a qoHcc8a rzcc8?Hc8a qoPzc8a qoPa //_
      1   9 _Poe8zcc8a oeH??ra qoHoeoe oH??e8a oHc8oe?or oeoroe //_
      1   9 _Poearar oHor oPcccca aHcca oPccaea ezcc8a qo?gcc8ae eHo8ae oPa Horoe?s //_
      1   9 _Pzc?c?e8a oPaezcc8a qoHzcc8a qoHc8a 8or zcca oPccc8a 8ae ?so?Pccak //_
      1  10 _qcHc8a zcc8a qoHoe o8ae ccae 8ar qoPzcc8a qoHc8a qoHc8a qoHc8a 8ae //_
      1  11 _Pccc8ar oPccc8a qoHc8a oPccc8a qoP8a 8an cccHa?s ccc?gcca qoHak z //_
      1  11 _Poe zcc8a qocc8a qoHam cccPcca qoe eHam zcc8a qoe //_
      1  11 _Pzcoe Hc?m oeHar zcca qoHc?m 8??e oeHam oHan zcae qoHa //_
      1  12 _Poeaecc8a Pz?cc8a oPccc8a qoHa?s aHzccoe qoHcc8a oHa ezcc8 ccPzccc8a aHae //_
      1  13 _Har zcccHca qoHae qoHc?m cc?c?e Hc8a rccc8a Pccc8a rzc?oe 8ae //_
      1  13 _Pccc8a ezcccHcc8a qoHcca qoHam oeHa oPccc8a Pccc8a //_
      1  14 _Hcc??8cca ezcc8a 8aea 8ae 8zccc8a Pcccoe ePoe oe?c?cc8a qoHa //_
      1  14 _Po?sae?zca qoHc8a cPcae zcc8a zccoe Hccc8a eHam zcc8a qoea //_
      1  15 _Poecc?a qoHc8a zcoe ?Pccc8a oePccc8a o?gzc8a oea //_
      1  15 _Poezca 8ae zcc8a qoHan 8ar cc?s8a ?c?c??ga qo?jar zcc8a e8a //_
      1  15 _cPc8or zcc8ae qoPcc8a 8zcc8a zcc8a Hccc8a ezcccHae zcccPca 8am 8a //_
      1  16 _P??e?8ar?av??e qoHoe ccca qoPccc8a qoPccc8a 8a?ecccz ??eHc8a e??ea //_
      1  16 _Po??c?m oe qoHcc8a Hccoe Hccae?oeHcc aPccca oPccc8a Haea //_
      1  16 _Poecc8a qoPccoe qoHor oePccc8a oPoe or???sa //_
      1  17 _P8oe ?gzcc8a q?PoeHc?m ocHcor oHcc8a qoHcc8a qoq?cccoe oe?jom a8arzcca //_
      1  18 _Hor zcc8a oHc8a qoHc8a oHc8ar cPc8??roe //_
      1  18 _PoHan oHcc8a or cccza zoe?zcca qoHcca qoHc8a oeHc8a ccca?Hae 8a qoe //_
      1  18 _Poe?zca ?c??sca?Hcc8 qoP oHcc8a o??jc?c??ga ?oHc???8c??? ??ai??? //_
      1  19 _PaHc8a oePccc8a qoHc8a zPcca ccc8a roe 8??r oPccc8a qoHc8a //_
      1  19 _qo?gcccoe oPccc8a qoHc?m oPccc8a?eccca cPcar oe oHaeor //_
      1  21 _Pccoe?zcc8a zcH??e oHc8a oPccoe?or Hcc8a oPccc??a?z?am cPccca oez //_
      1  25 _Hzcc8??r zcc8a qoPccc8a qoHc8a 8a?qoHoe o?ja //_
      1  29 _???q??cc???c??? ???Pc?c??a ?o??gzc??a ?q???g?c???c???c???n aza //_
      4   1 _Poe oeHccc?i?zccoe qoHcca // =_
      4   1 _azccca oezcca oeHzcc??a z?ccPzcca // =_
      5   1 _??o8c??cca ?jar oHc?m oPar oHc?m oeHca // =_
      6   1 _Hcccoe Hc8a Pcccoe Hc?m zcc8a qoHam oHca qoHc8a ccc8a // =_
      6   1 _Pcccoe zcc8a qoHam zcoe8a // =_
      8   1 _Pccc8ae oHc8a zcccHcc8cca qoHa ccc8a ccara // =_
     11   1 _Poezc8a qoHcc8a zcHa oeoeccca // =_
     14   1 _zcca oPccca cHcca oecca // =_
     16   1 _8am 8oe z?c?r cPcca e?gcc?c?e zccar qoecca // =_
     16   1 _eccc8a??Pccc8a qoHam 8??r // =_
      5   2 _zor oeHa qoHa?Ha?Hor c?cca?Ha HoHoe oPccc8a qoHc?m zccHa qoHc?m oe //_
      6   2 _Pccc8a qoHccc8a oHam cccHca zcc8a oHc?8a qoHa qoHc8a oe?oHc8a oHc8a rak //_
      8   2 _Poeccc8 oHan zcc8a zcc8a 8ae ccc8ar qoHcca ??Hcca ez??r?am ora //_
      8   2 _aH???e or zcc8a ???qoe?Hcc8a 8am 8Hc?m cPc?8a oe8a //_
     12   2 _8zcc8a Pccc8a qoHan ccc8a 8??ecce qoHcc8a qoH??e oeccca //_
     12   2 _Hzcc8a qoHc?m ccc8ae Pccc8a cccHa zc?m ccc8a qoHan oe //_
     15   2 _8an ccPam oPaea HoHaea //_
      3   3 _Poe oeor ccca qoHc?m zcc8a qoHc?m 8eccc?sa oe r?c?m?8ar //_
      5   3 _Poe ??e am oeHae zcar zcc8a qoHoe cc8a e8oe 8ar ae //_
      5   3 _Poe Har zcc8a qoHc8a oHae zcca qoHar?cccHca oHccca qoHccc8a cc???ca qoHam //_
      9   3 _8am cceccPzccca Hc?e cccoe 8a?? c?roHca 8am //_
      2   4 _8c?m zcca??jcc8a eH?? oPccc8a qoHc8a oHca Hae 8an oHcca oHa //_
      2   4 _??ccca 8ae?zccc8a cPccc8a oHc?m zcca qoPccca oe 8or?ccc8ccca //_
      2   4 _Hccc8a Pccc8a qoHcca ?soe oe8av zcccHca qoe e c?ccc8a qoHcc8a eoe?ccc8a //_
      3   4 _???qoHcc8a aHcca zccca or or am acPam ?cccHcca 8??? aHa //_
      3   4 _zam zcc8a Pz?cc8a qoHar zccoe qoeccca qoHa qoHae qoHa?? //_
     23   4 _Pccca Hzcca qoH?ca qoHae ?zc??ca qo?? ccc8a qoHak //_
      5   5 _zae?cccoe Har zcc8a zae?Hc8a zav qoHc8a q?Pc?c8a ecccPcc8a e8ar //_
      7   5 _zccca q?Pccc8a qoe cccc8a qoHcar cccca eoea 8a //_
     13   5 _azcca e?zcca?Ham eor a?m zccHca ?P?cca ???oe?Hcc8a?Har oHa ear //_
     25   5 _qoHc8a eHca Hae zcc8a qoPccca qoe Pccc8a oHcca cccHca zcca eoe ???e zccca?8a? //_
      4   6 _zaeHcca zor zcccHca 8am oHar ccPccc8a cPcca oHa oeor oHcca raecce //_
     11   6 _Pccc8a qoHca oHcoe qoe zccor zcc8a qcHc8az //_
     14   6 _Horc?m zcc8a ??ccor or zccH oHar?Pcc8a oPc?c?eor oHae zcc8a //_
      6   7 _Poecc8arcn zccHca qoHa aHar aeoe eHam oezcc8a oHc?m oHar oPar HoePa //_
      8   7 _Pccc8a rzccae 8ae8a qoHcc8a rzcc8a qo?jcc8a qoHcc8a eoccc8a //_
      9   7 _qoHccza qoHc8a qo?jcc8a cccPcca //_
      4   8 _Poe or oeHc?m ocHca qoHam oHc?m oHar zcca qoeHa //_
      9   8 _Par?o8a zcccPca cccoe qoHae?8ar oHc8a oea //_
      7   9 _Poezc8ae oHc8av oPzcc8ae qoHc8a zcc8a Pzcc?c8ae HzccoHcc8a ozccPoez //_
      7  10 _8zcc8a qoHcc8a qoPccc8a qoHaea 8ar oHam oHccae ak //_
      8  10 _???Pam zc?c?e qoHc?m cccHa qoHcc8a qoHar zccHca qoH??e zccc?jca qoHc?m oeHak //_
     10  10 _8zcc8a qo??c8a or?zcc8a Pccc8a qoHc?c8a oHc8a oPccc8a //_
     12  10 _qoHc8a zcca Hae oHc8a aPccc8a Hc8a eoeor //_
     17  10 _Pccc8ar zccPcca ec?cc8ara 8ae c?cae zca?Hoe zcc8a qoHak //_
      2  11 _qoeccca zccHca qoHca ecca oPccca 8an zc8a qoHc8ar ??ecc8a zo?? oHa 8a eccc8a //_
      3  11 _Pzc8a oPcc8a qoHc8a qoHcc8a qoHc8a qoe?Hc8a qoHc8a oHa //_
      8  11 _oHae oPae oHcca eoe oeoe??joe?Hc8a qoHc8a qoHa ePcc8a qoHcc8a eoe //_
     15  11 _zom Har Hc8a?P?c?ca Hcc8oeH8a //_
      5  12 _qoPccc8a qoe ccc8a qoHcca o8am rc?m 8aea //_
      7  12 _???s?c??c8a qocc8a oe cca rzc8a ezcc8a 8ar cc8a Pcc8a //_
      7  12 _Por?zcca oHan ccc8a qoe?zccoe oeccc8a //_
     18  12 _8aezcc8a qoe?zccc8a qoHae8a cccPa 8c?m ae?oeor oec?m ccc8a zcccHca qoHc?? eor //_
      5  14 _8zcc8a aHcc8a zcccH?a 8am oHc8a qoHcc8 qoHc8a e?ccP?c8a //_
      5  14 _Pom oe Hc8a oHc8a qoHa oHcc8a qoHca //_
     13  14 _Pccc8a Hcc8a qoHc8a qoHc8a qoHc8a qoHc8a qoHan oezcc8a //_
      4  15 _aHcz?ca 8c?cc8?a?Hc8a aHc8a 8ar aHc8a?Pca qoHa aHc8a oHae //_
      7  20 _Pccc8a qoHzc8a aHan ccc8a qoHar cca eoe ccc8a qoHa //_
      6  21 _Pccca?H??c?r oeHa 8ar oHca qoHan cccHca qoHcc8a qoHa //_
      7  23 _zcc8 ae?zccl?ca ?sc?m cccPcc8a an oeHcca eHar an oHcca eHan ccc8a 8ar 8aea //_
      5  25 _zoe?ccc8a qoHcc8a qoPccc8a qoH?cc? ??oe zcc8a qoHc8a z?cca o?jcc?s ae ae ccc8c?m?8a //_
      2  28 _8zcc8a qoH8?8aar cHca?s cccP c?an oHan qoHcor zcc8a qoe ?i??iu zccae?s qoHcca //_

  So, `P's occur mostly, but not exclusively, at the beginning of the paragraph
  (64 lines out of 126).
  
  It seems that the end-of-paragraph is often missing in the transcript file, especially when it
  precedes a page break.
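
  For reference, the two line-numbering filters used above can be
  approximated in a few lines.  This is a sketch under assumptions: the
  source of number-lines-in-paragraph and
  number-lines-from-end-of-paragraph is not shown here, so I assume that
  a paragraph ends at a line carrying the "=" marker, that the first
  output column counts from the paragraph start, and that the second
  counts from its end.  The sample lines are hypothetical.

```python
def _number(paragraph):
    """Number one paragraph's lines from its start and from its end."""
    size = len(paragraph)
    return [(i, size - i + 1, line) for i, line in enumerate(paragraph, start=1)]

def number_lines(lines, ends_paragraph):
    """Return (from_start, from_end, line) for every line, paragraph by paragraph."""
    numbered = []
    paragraph = []
    for line in lines:
        paragraph.append(line)
        if ends_paragraph(line):
            numbered.extend(_number(paragraph))
            paragraph = []
    numbered.extend(_number(paragraph))  # trailing lines with no end marker
    return numbered

# Hypothetical sample in the same shape as the listing above.
lines = [
    "_Poe zcc8a //_",
    "_8am oHa //_",
    "_zcca oPccca // =_",
    "_Hor zcc8a // =_",
]
numbered = number_lines(lines, lambda line: line.rstrip("_").endswith("="))
with_p = [(start, end) for start, end, line in numbered if "P" in line]
at_start = sum(1 for start, _ in with_p if start == 1)
print(at_start, "of", len(with_p), "lines with P are paragraph-initial")
```

  A missing "=" marker merges two paragraphs into one, which would
  inflate the counts from the end, as in the lines numbered 25 or 29
  above.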
  

97-07-30 stolfi
===============
  
  Let's repeat the investigation of [czHP] strings, but including the `8' letter:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/' \
          -e 's/[ql]g/P/' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      | enum-contexts -vPAT='[czHP8][czHP8]*' -vCTX=0 \
      | wfreq

       793 0.19 H
       382 0.09 8
       374 0.09 Hc8
       314 0.07 Hcc8
       305 0.07 ccc8
       277 0.06 zcc8
       178 0.04 Hcc
       163 0.04 ccc
       152 0.04 z
       140 0.03 zcc
       102 0.02 Hc
        74 0.02 cc
        56 0.01 cccHc
        49 0.01 zccc8
        49 0.01 Pccc8
        49 0.01 P
        48 0.01 cc8
        46 0.01 zc
        41 0.01 ccccHc
        40 0.01 cccc
        39 0.01 zccHc
        35 0.01 zcccHc
        35 0.01 Hccc8
        34 0.01 zccc
        27 0.01 cccc8
        25 0.01 Hccc
        24 0.01 zccH
        20 0.00 zc8
        18 0.00 cccH
        18 0.00 8zcc8
        16 0.00 Hzcc
        15 0.00 Hzcc8
        14 0.00 cHc
        14 0.00 Pccc
        13 0.00 zcccH
        12 0.00 ccccH
        12 0.00 cHcc
        11 0.00 cccz
        11 0.00 ccH
        11 0.00 8ccc8
         9 0.00 zccHcc
         9 0.00 cccHc8
         8 0.00 cHcc8
         7 0.00 cccHcc8
         7 0.00 Pzcc8
         7 0.00 Hzc8
         6 0.00 zccHcc8
         6 0.00 8cc
         5 0.00 zcccHcc8
         5 0.00 zcH
         5 0.00 ccccz
         5 0.00 cHc8
         5 0.00 c
         5 0.00 Pcc8
         5 0.00 8cc8
         4 0.00 zzcc8
         4 0.00 ccccHcc
         4 0.00 cccHcc
         4 0.00 cH
         4 0.00 Pcc
         4 0.00 Hcccc
         4 0.00 8ccc
         3 0.00 zcccz
         3 0.00 cccP
         3 0.00 ccHc8
         3 0.00 cPcc
         3 0.00 cPc
         3 0.00 P8
         3 0.00 8zcc
         3 0.00 8zc
         2 0.00 zzcc
         2 0.00 zcccPc
         2 0.00 zcccHcc
         2 0.00 zcccHc8
         2 0.00 zccPcc
         2 0.00 zccHc8
         2 0.00 ccz
         2 0.00 ccccHcc8
         2 0.00 cccPcc8
         2 0.00 cccPcc
         2 0.00 cP
         2 0.00 Pzc8
         2 0.00 Pzc
         2 0.00 Pcccc
         2 0.00 Hzc
         2 0.00 Hczcc
         2 0.00 Hczc
         2 0.00 Hccz
         2 0.00 Hcccc8
         2 0.00 H8
         2 0.00 8zccc8
         2 0.00 8zccc
         2 0.00 8cccc
         2 0.00 8c8
         2 0.00 8Hc8
         1 0.00 zzcccHc
         1 0.00 zzccH
         1 0.00 zzcHcc8
         1 0.00 zcz8
         1 0.00 zccz
         1 0.00 zccccHcc
         1 0.00 zcccc
         1 0.00 zcccHcc8cc
         1 0.00 zccPccc8
         1 0.00 zccP
         1 0.00 zccHccc
         1 0.00 zcHcc
         1 0.00 zcHc
         1 0.00 zPcc
         1 0.00 zHcc
         1 0.00 zH
         1 0.00 ccccc
         1 0.00 ccccPcc8
         1 0.00 cccPccc8
         1 0.00 cccHccc8
         1 0.00 ccc8cc
         1 0.00 ccPzccc8
         1 0.00 ccPzccc
         1 0.00 ccPccc8
         1 0.00 ccP
         1 0.00 ccHcc8
         1 0.00 ccHcc
         1 0.00 cc8cc
         1 0.00 cPccc8
         1 0.00 cPccc
         1 0.00 cPcc8
         1 0.00 cPc8
         1 0.00 cHccz
         1 0.00 cHccc8
         1 0.00 Pzcc
         1 0.00 Hczc8
         1 0.00 Hcz8
         1 0.00 Hcz
         1 0.00 Hccz8
         1 0.00 Hc8zcc8
         1 0.00 Hc8cc
         1 0.00 Hc8c8
         1 0.00 Hc8c
         1 0.00 8zcccz
         1 0.00 8zc8
         1 0.00 8cccc8
         1 0.00 8cccHcc8
         1 0.00 8Hzcc
         1 0.00 8Hcc
         1 0.00 88
     ----- ---- ----
      4282 1.00 TOT

  Apparently the `8' (\cg/) does not tend to be surrounded by [czHP] strokes; it is either preceded
  or followed by them.  Thus `8' seems quite unlike `P'.
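
  The tabulation above can be approximated as follows.  This is a sketch
  under assumptions: enum-contexts and wfreq are my own filters and
  their source is not shown here, so the code assumes that with CTX=0
  the first one emits every maximal run matching the pattern, and that
  the second prints count, fraction, and item in decreasing order of
  count.  The function names and the sample words are hypothetical.

```python
import re
from collections import Counter

def run_frequencies(words, alphabet="czHP8"):
    """Count every maximal run of symbols from `alphabet` across `words`."""
    run = re.compile("[" + re.escape(alphabet) + "]+")
    counts = Counter()
    for word in words:
        counts.update(run.findall(word))
    return counts

def format_table(counts):
    """Format counts as 'count fraction item' lines plus a total line."""
    total = sum(counts.values())
    rows = [f"{n:6d} {n / total:4.2f} {item}" for item, n in counts.most_common()]
    rows.append(f"{total:6d} {1:4.2f} TOT")
    return rows

# Hypothetical collapsed words, in the same form the sed script produces.
sample = ["_qoHc8a_", "_zcc8a_", "_qoHc8a_", "_8ar_"]
counts = run_frequencies(sample)
print("\n".join(format_table(counts)))
```

  Since the runs are maximal, a count for `8' alone covers only those
  occurrences with no [czHP] neighbor on either side.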
  
  Let's look at some `P' strings and try to find similar words with the `P' replaced by something else:
  
    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/' \
          -e 's/[ql]g/P/' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      | egrep '[^czHP]P[^czHP]' \
      | wfreq
      
         8 0.15 _Poe_
         2 0.04 _oPoea_
         2 0.04 _oPar_
         2 0.04 _oPa_
         2 0.04 _Poeccc8a_
         1 0.02 _qoPor_
         1 0.02 _qoPoe_
         1 0.02 _qoPa_
         1 0.02 _qoP_
         1 0.02 _qoP8a_
         1 0.02 _qoHoPa_
         1 0.02 _oePocHca_
         1 0.02 _oPor_
         1 0.02 _oPoe_
         1 0.02 _oPaezcc8a_
         1 0.02 _oPaea_
         1 0.02 _oPae_
         1 0.02 _oPaeHain_
         1 0.02 _ePoe_
         1 0.02 _Poin_
         1 0.02 _Poezcca_
         1 0.02 _Poezca_
         1 0.02 _Poezc8ae_
         1 0.02 _Poezc8a_
         1 0.02 _Poeccc8_
         1 0.02 _Poecca_
         1 0.02 _Poecc8arcn_
         1 0.02 _Poecc8a_
         1 0.02 _Poearar_
         1 0.02 _Poeain_
         1 0.02 _Poeaecc8a_
         1 0.02 _PoeHczcoe_
         1 0.02 _PoeHcca_
         1 0.02 _Poe8zcc8a_
         1 0.02 _Poe8aHa_
         1 0.02 _PoHan_
         1 0.02 _Par_
         1 0.02 _PaHc8a_
         1 0.02 _P8oe_
         1 0.02 _P8aezcor_
         1 0.02 _HoePa_
     ----- ---- ----
        52 1.00 TOT

    set noglob
    foreach f ( \
      '_'.'oe_'  \
      '_o'.'oea_' \
      '_o'.'ar_' \
      '_o'.'a_' \
      '_'.'oeccc8a_' \
    )
      echo " "
      echo "-----------------------------------------------------------------------"
      echo " "
      cat bio-j-jsa-gut.wds \
        | sed \
            -e 's/^/_/g' \
            -e 's/$/_/g' \
            -e 's/[ql]j/H/' \
            -e 's/[ql]g/P/' \
            -e 's/cs/z/g' \
            -e 's/ij/k/g' \
            -e 's/ix/e/g' \
            -e 's/is/r/g' \
            -e 's/iiu/n/g' \
            -e 's/y/i/g' \
            -e 's/ci/a/g' \
            -e 's/cg/8/g' \
        | compare-contexts -rctx 0 -lctx 0 -colw 24 \
            "${f:r}P${f:e}" "${f:r}[^P]${f:e}"  "${f:r}[^P][^P]${f:e}"
    end
    unset noglob
      
    -----------------------------------------------------------------------

         8 1.00 _Poe_             81 0.52 _qoe_             11 0.19 _zcoe_
     ----- ---- ----              25 0.16 _zoe_             11 0.19 _oHoe_
         8 1.00 TOT               17 0.11 _eoe_             11 0.19 _ccoe_
                                  17 0.11 _8oe_              8 0.14 _oeoe_
                                   9 0.06 _roe_              5 0.09 _oroe_
                                   6 0.04 _Hoe_              3 0.05 _Hcoe_
                               ----- ---- ----               2 0.04 _aroe_
                                 155 1.00 TOT                1 0.02 _qooe_
                                                             1 0.02 _oqoe_
                                                             1 0.02 _eHoe_
                                                             1 0.02 _e8oe_
                                                             1 0.02 _aeoe_
                                                             1 0.02 _8roe_
                                                         ----- ---- ----
                                                            57 1.00 TOT

    -----------------------------------------------------------------------

         2 1.00 _oPoea_            2 1.00 _oroea_        ----- ---- ----
     ----- ---- ----           ----- ---- ----               0 1.00 TOT
         2 1.00 TOT                2 1.00 TOT           

    -----------------------------------------------------------------------

         2 1.00 _oPar_            35 0.92 _oHar_             7 1.00 _oeHar_
     ----- ---- ----               1 0.03 _orar_         ----- ---- ----
         2 1.00 TOT                1 0.03 _oear_             7 1.00 TOT
                                   1 0.03 _o8ar_        
                               ----- ---- ----          
                                  38 1.00 TOT           

    -----------------------------------------------------------------------

         2 1.00 _oPa_             25 0.45 _oHa_             21 0.53 _oHca_
     ----- ---- ----              23 0.41 _oea_             10 0.25 _oeHa_
         2 1.00 TOT                6 0.11 _ora_              9 0.23 _oe8a_
                                   2 0.04 _o8a_          ----- ---- ----
                               ----- ---- ----              40 1.00 TOT
                                  56 1.00 TOT           

    -----------------------------------------------------------------------

         2 1.00 _Poeccc8a_         6 0.67 _qoeccc8a_     ----- ---- ----
     ----- ---- ----               2 0.22 _zoeccc8a_         0 1.00 TOT
         2 1.00 TOT                1 0.11 _8oeccc8a_    
                               ----- ---- ----          
                                   9 1.00 TOT           

    -----------------------------------------------------------------------

  It seems that isolated `P' = {\lg/,\qg/} is closely related to `r'=\is/, 
  `q' = \q/, `z' = \cs/, `8' = \cg/, `e' = \ix/, `H' = {\lj/,\qj/}.
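
  The comparison above can be sketched as follows.  In the csh loop,
  ${f:r} and ${f:e} split each template at its dot into the part before
  and the part after the `P'; the three patterns then ask for `P', for
  any one other symbol, and for any two other symbols in that slot.  The
  code below is only an approximation: compare-contexts is my own tool
  and its source is not shown here, so the function name, signature, and
  sample words are hypothetical.

```python
import re
from collections import Counter

def compare_contexts(words, prefix, suffix, target="P"):
    """Count words of the form prefix + X + suffix, for X = target,
    one non-target symbol, and two non-target symbols."""
    other = "[^" + re.escape(target) + "]"
    slots = [re.escape(target), other, other + other]
    columns = []
    for slot in slots:
        pattern = re.compile(re.escape(prefix) + slot + re.escape(suffix))
        columns.append(Counter(w for w in words if pattern.fullmatch(w)))
    return columns

# Hypothetical collapsed words, already wrapped in '_' markers.
sample = ["_Poe_", "_qoe_", "_qoe_", "_zoe_", "_zcoe_", "_oHoe_", "_Poe_"]
with_p, one_other, two_others = compare_contexts(sample, "_", "oe_")
for column in (with_p, one_other, two_others):
    print(column.most_common())
```

  The words carry `_' at both ends, so requiring a full match is
  equivalent to searching for the pattern anywhere in the line.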
  
    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/' \
          -e 's/[ql]g/P/' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      | egrep '[^czHP]Pccc[^czHP]' \
      | wfreq
      
        14 0.22 _oPccc8a_
        14 0.22 _Pccc8a_
         8 0.13 _qoPccc8a_
         4 0.06 _oePccc8a_
         4 0.06 _oPccca_
         4 0.06 _Pcccoe_
         4 0.06 _Pccc8ar_
         3 0.05 _Pccca_
         2 0.03 _qoPccca_
         1 0.02 _oqoPccc8a_
         1 0.02 _ePccc8a_
         1 0.02 _aPccca_
         1 0.02 _aPccc8a_
         1 0.02 _Pccc8ae_
         1 0.02 _8oePccc8a_
     ----- ---- ----
        63 1.00 TOT
      
    set noglob
    foreach f ( \
      '_o'.'ccc8a_' \
      '_'.'ccc8a_' \
      '_qo'.'ccc8a_' \
      '_oe'.'ccc8a_' \
      '_o'.'ccca_' \
      '_'.'cccoe_' \
      '_'.'ccc8ar_' \
      '_'.'ccca_' \
      '_qo'.'ccca_' \
    )
      echo " "
      echo "-----------------------------------------------------------------------"
      echo " "
      cat bio-j-jsa-gut.wds \
        | sed \
            -e 's/^/_/g' \
            -e 's/$/_/g' \
            -e 's/[ql]j/H/' \
            -e 's/[ql]g/P/' \
            -e 's/cs/z/g' \
            -e 's/ij/k/g' \
            -e 's/ix/e/g' \
            -e 's/is/r/g' \
            -e 's/iiu/n/g' \
            -e 's/y/i/g' \
            -e 's/ci/a/g' \
            -e 's/cg/8/g' \
        | compare-contexts -rctx 0 -lctx 0 -colw 24 \
            "${f:r}P${f:e}" "${f:r}[^P]${f:e}"  "${f:r}[^P][^P]${f:e}"
    end
    unset noglob

    -----------------------------------------------------------------------

        14 1.00 _oPccc8a_         23 0.79 _oeccc8a_          1 0.50 _orzccc8a_
     ----- ---- ----               5 0.17 _oHccc8a_          1 0.50 _oecccc8a_
        14 1.00 TOT                1 0.03 _o8ccc8a_      ----- ---- ----
                               ----- ---- ----               2 1.00 TOT
                                  29 1.00 TOT           

    -----------------------------------------------------------------------

        14 1.00 _Pccc8a_          52 0.37 _eccc8a_          23 0.49 _oeccc8a_
     ----- ---- ----              36 0.26 _zccc8a_           5 0.11 _oHccc8a_
        14 1.00 TOT               19 0.13 _cccc8a_           4 0.09 _ezccc8a_
                                  14 0.10 _Hccc8a_           2 0.04 _rzccc8a_
                                   9 0.06 _8ccc8a_           2 0.04 _qoccc8a_
                                   5 0.04 _rccc8a_           2 0.04 _ecccc8a_
                                   4 0.03 _accc8a_           2 0.04 _eHccc8a_
                                   1 0.01 _qccc8a_           1 0.02 _o8ccc8a_
                                   1 0.01 _occc8a_           1 0.02 _eoccc8a_
                               ----- ---- ----               1 0.02 _azccc8a_
                                 141 1.00 TOT                1 0.02 _aeccc8a_
                                                             1 0.02 _acccc8a_
                                                             1 0.02 _8zccc8a_
                                                             1 0.02 _8cccc8a_
                                                         ----- ---- ----
                                                            47 1.00 TOT

    -----------------------------------------------------------------------

         8 1.00 _qoPccc8a_        11 0.65 _qoHccc8a_         2 0.40 _qoezccc8a_
     ----- ---- ----               6 0.35 _qoeccc8a_         2 0.40 _qoHcccc8a_
         8 1.00 TOT            ----- ---- ----               1 0.20 _qoecccc8a_
                                  17 1.00 TOT            ----- ---- ----
                                                             5 1.00 TOT

    -----------------------------------------------------------------------

         4 1.00 _oePccc8a_         1 1.00 _oecccc8a_         1 1.00 _oeoHccc8a_
     ----- ---- ----           ----- ---- ----           ----- ---- ----
         4 1.00 TOT                1 1.00 TOT                1 1.00 TOT

    -----------------------------------------------------------------------

         4 1.00 _oPccca_          12 0.63 _oeccca_           2 0.40 _oezccca_
     ----- ---- ----               6 0.32 _oHccca_           2 0.40 _oecccca_
         4 1.00 TOT                1 0.05 _orccca_           1 0.20 _oeHccca_
                               ----- ---- ----           ----- ---- ----
                                  19 1.00 TOT                5 1.00 TOT

    -----------------------------------------------------------------------

         4 1.00 _Pcccoe_           3 0.27 _zcccoe_           1 0.33 _oecccoe_
     ----- ---- ----               3 0.27 _8cccoe_           1 0.33 _cccccoe_
         4 1.00 TOT                2 0.18 _ecccoe_           1 0.33 _8ccccoe_
                                   2 0.18 _Hcccoe_       ----- ---- ----
                                   1 0.09 _acccoe_           3 1.00 TOT
                               ----- ---- ----          
                                  11 1.00 TOT           

    -----------------------------------------------------------------------

         4 1.00 _Pccc8ar_          1 0.25 _eccc8ar_      ----- ---- ----
     ----- ---- ----               1 0.25 _cccc8ar_          0 1.00 TOT
         4 1.00 TOT                1 0.25 _accc8ar_     
                                   1 0.25 _Hccc8ar_     
                               ----- ---- ----          
                                   4 1.00 TOT           

    -----------------------------------------------------------------------

         3 1.00 _Pccca_           31 0.39 _cccca_           12 0.35 _oeccca_
     ----- ---- ----              23 0.29 _zccca_            6 0.18 _oHccca_
         3 1.00 TOT               15 0.19 _eccca_            3 0.09 _ezccca_
                                   5 0.06 _Hccca_            3 0.09 _azccca_
                                   4 0.05 _rccca_            2 0.06 _ecccca_
                                   1 0.01 _accca_            2 0.06 _8zccca_
                                   1 0.01 _8ccca_            1 0.03 _rcccca_
                               ----- ---- ----               1 0.03 _qoccca_
                                  80 1.00 TOT                1 0.03 _orccca_
                                                             1 0.03 _acccca_
                                                             1 0.03 _Hcccca_
                                                             1 0.03 _8cccca_
                                                         ----- ---- ----
                                                            34 1.00 TOT

    -----------------------------------------------------------------------

         2 1.00 _qoPccca_          8 0.50 _qoeccca_          1 0.50 _qoeHccca_
     ----- ---- ----               8 0.50 _qoHccca_          1 0.50 _qoHcccca_
         2 1.00 TOT            ----- ---- ----           ----- ---- ----
                                  16 1.00 TOT                2 1.00 TOT

  It seems that `Pccc' is closely related to `Hccc', `eccc', `zccc', `8ccc', and `cccc'. 
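
  The tables above come from compare-contexts, whose source is not shown
  in this notebook.  As a rough illustration only, here is a minimal
  sh/awk sketch of how one column of such a table could be tabulated,
  assuming the tool simply counts the distinct strings matched by each
  pattern.  The function name count_contexts and the sample words are
  made up for this example.

```sh
# Hypothetical stand-in for one column of compare-contexts (not the real tool).
# Reads one word per line on stdin, takes the first match of the extended
# regular expression $1 in each line, and prints "count fraction string"
# for every distinct matched string, most frequent first.
count_contexts() {
  awk -v pat="$1" '
    match($0, pat) { n[substr($0, RSTART, RLENGTH)]++; tot++ }
    END { for (m in n) printf "%7d %4.2f %s\n", n[m], n[m] / tot, m }
  ' | LC_ALL=C sort -k1,1nr -k3,3
}

# Made-up sample of recoded words, for illustration only.
printf '%s\n' _oeccc8a_ _oPccc8a_ _oeccc8a_ _oHccc8a_ \
  | count_contexts '^_o[^P]ccc8a_$'
```

  The sketch prints no TOT line and looks only at the first match in each
  word, which is enough when the pattern covers the whole `_'-delimited word.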

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/' \
          -e 's/[ql]g/P/' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      | egrep '[^czHP]Pzcc[^czHP]' \
      | wfreq

         3 0.38 _qoPzcc8a_
         2 0.25 _oPzcc8a_
         1 0.12 _oPzcc8ae_
         1 0.12 _oPzcc8_
         1 0.12 _Pzccoe8a_
     ----- ---- ----
         8 1.00 TOT

    set noglob
    foreach f ( \
      '_qo'.'zcc8a_' \
      '_o'.'zcc8a_' \
      '_o'.'zcc8ae_' \
      '_o'.'zcc8_' \
      '_'.'zccoe8a_' \
    )
      echo " "
      echo "-----------------------------------------------------------------------"
      echo " "
      cat bio-j-jsa-gut.wds \
        | sed \
            -e 's/^/_/g' \
            -e 's/$/_/g' \
            -e 's/[ql]j/H/' \
            -e 's/[ql]g/P/' \
            -e 's/cs/z/g' \
            -e 's/ij/k/g' \
            -e 's/ix/e/g' \
            -e 's/is/r/g' \
            -e 's/iiu/n/g' \
            -e 's/y/i/g' \
            -e 's/ci/a/g' \
            -e 's/cg/8/g' \
        | compare-contexts -rctx 0 -lctx 0 -colw 24 \
            "${f:r}P${f:e}" "${f:r}[^P]${f:e}"  "${f:r}[^P][^P]${f:e}"
    end
    unset noglob

    -----------------------------------------------------------------------

         3 1.00 _qoPzcc8a_         7 0.88 _qoHzcc8a_     ----- ---- ----
     ----- ---- ----               1 0.12 _qoezcc8a_         0 1.00 TOT
         3 1.00 TOT            ----- ---- ----          
                                   8 1.00 TOT           

    -----------------------------------------------------------------------

         2 1.00 _oPzcc8a_         14 0.88 _oezcc8a_          1 1.00 _oeHzcc8a_
     ----- ---- ----               2 0.12 _oHzcc8a_      ----- ---- ----
         2 1.00 TOT            ----- ---- ----               1 1.00 TOT
                                  16 1.00 TOT           

    -----------------------------------------------------------------------

         1 1.00 _oPzcc8ae_     ----- ---- ----           ----- ---- ----
     ----- ---- ----               0 1.00 TOT                0 1.00 TOT
         1 1.00 TOT                                     

    -----------------------------------------------------------------------

         1 1.00 _oPzcc8_           2 1.00 _oezcc8_       ----- ---- ----
     ----- ---- ----           ----- ---- ----               0 1.00 TOT
         1 1.00 TOT                2 1.00 TOT           

    -----------------------------------------------------------------------

         1 1.00 _Pzccoe8a_     ----- ---- ----           ----- ---- ----
     ----- ---- ----               0 1.00 TOT                0 1.00 TOT
         1 1.00 TOT                                     

  Again the `P' seems to be similar to `H' and `e'.
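
  Every pipeline above starts with the same sed recoding of jsa strokes
  into one-letter glyph codes.  A minimal sketch that wraps those rules in
  a single shell function, so that they can be tried on individual words;
  the rules are copied from the commands above, while the function name
  recode and the input strings (obtained by running the rules backwards)
  are assumptions made for this example.

```sh
# Sketch: the stroke-to-glyph recoding used by the gallows-context pipelines
# above, wrapped in one function.  The substitution rules are copied from the
# notebook; the function name "recode" is made up for this example.
recode() {
  sed \
    -e 's/^/_/g' \
    -e 's/$/_/g' \
    -e 's/[ql]j/H/' \
    -e 's/[ql]g/P/' \
    -e 's/cs/z/g' \
    -e 's/ij/k/g' \
    -e 's/ix/e/g' \
    -e 's/is/r/g' \
    -e 's/iiu/n/g' \
    -e 's/y/i/g' \
    -e 's/ci/a/g' \
    -e 's/cg/8/g'
}

# Hypothetical stroke strings, built by running the rules backwards.
printf '%s\n' ccccgci qoqjccgci cscccgcy | recode
```

  The order of the rules matters: because `y' is rewritten to `i' before
  `ci' is rewritten to `a', a `cy' pair also ends up as `a'.  The later
  pipelines that spell out 's/cy/a/g' get the same result for that pair
  but leave other `y' strokes alone.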
  
  And now for something completely different.  Let's look at how the words are distributed among the 
  paragraphs:
  
    cat bio-j-jsa.wds \
        | sed \
            -e 's/[ql]j/H/g' \
            -e 's/[ql]g/P/g' \
            -e 's/ij/k/g' \
            -e 's/ix/e/g' \
            -e 's/is/r/g' \
            -e 's/iiu/n/g' \
            -e 's/cy/a/g' \
            -e 's/ci/a/g' \
            -e 's/in/m/g' \
            -e 's/ir/w/g' \
            -e 's/cs/z/g' \
            -e 's/cg/8/g' \
        | enum-words-in-blocks -vWPB=100 \
        | egrep -v '[^a-zA-Z0-9_ ]' \
        | sort +1 -2 +0 -1n \
        | make-word-location-map -vNBLOCKS=71 \
        > .foo

  The result has been posted as http://www.dcc.unicamp.br/~stolfi/voynich/word-distr-map.html
  
  Recomputing with fewer large blocks:
  
    cat bio-j-jsa.wds \
      | sed \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/cy/a/g' \
          -e 's/ci/a/g' \
          -e 's/in/m/g' \
          -e 's/ir/w/g' \
          -e 's/cs/z/g' \
          -e 's/cg/8/g' \
      | enum-words-in-blocks -vWPB=1010 \
      | egrep -v '[^a-zA-Z0-9_ ]' \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -vCTWD=3 -vNBLOCKS=7 \
      > .foo
        
    cat .foo \
      | gawk '/./ { printf"%5d %-16s ", $1, $2; for (i=3; i<=NF; i++) printf " %2d", int(($(i)*99/$1)+0.5); printf "\n" }' \
      > .bar

  Results posted on my Voynich page.
  
97-08-01 stolfi
===============

  Recomputed the word distributions, adding the average position and deviation,
  and using only good words (so that the blocks would be more uniform):
  
    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/cy/a/g' \
          -e 's/ci/a/g' \
          -e 's/in/m/g' \
          -e 's/ir/w/g' \
          -e 's/cs/z/g' \
          -e 's/cg/8/g' \
      | enum-words-in-blocks -vWPB=100 \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -vCTWD=1 -vPERCENT=1 -vNBLOCKS=47 \
      > .baz
    
  Here are the table lines for the most popular words (at least 20 occurrences) sorted by length
  (L) and total frequency:
  
    TOTAL   AVG   DEV  L WORD             ABSOLUTE FREQUENCY BY BLOCK                      RELATIVE FREQUENCY BY BLOCK   
    ----- ----- -----  - ---------------- -----------------------------------------------  -----------------------------------------------
      127  24.7  12.2  2 oe               15.66...2.11.12.3134926354.623697511211.4..2361  00.00...0.00.00.0000100000.000010000000.0..0000
       40  21.4  12.9  2 or               21.121..4........2124411.2...22.2.......1.121.1  00.000..1........0001100.0...00.0.......0.000.0
       35  22.8  15.1  2 8a               222.2..1.2...11..1111....1...21.3.2112...11..3.  111.1..0.1...00..0000....0...10.1.1001...00..1.
       20  21.7  11.5  2 am               ...1.11..1.12......2..11.3..1.2....1...1......1  ...0.00..0.01......1..00.1..0.1....0...0......0
       81  22.9  12.7  3 qoe              .4.441.1111.3.341..553.13.214123442211...141.12  .0.000.0000.0.000..110.00.000000000000...000.00
       73  26.3  13.1  3 8am              ..1.3..1.3.322334.611.21....12..7224.12.2312224  ..0.0..0.0.000000.100.00....00..1000.00.0000000
       51  22.8  16.1  3 8ar              125312111.....12.122.11.2.1...1.121.2.1..12721.  001100000.....00.000.00.0.0...0.000.0.0..00100.
       50  21.0  13.3  3 8ae              1131211.1.11.13341142.....1..2..111221.21.3...1  0010000.0.00.01110010.....0..0..000000.00.1...0
       31  23.7  14.0  3 zam              1....1..211412..11.1.1.1....1...1..1.1132...21.  0....0..100101..00.0.0.0....0...0..0.0011...10.
       25  23.3  11.4  3 oHa              ..1....221.1......21....213..1.2.11..21....1...  ..0....110.0......10....101..0.1.00..10....0...
       25  26.7  12.7  3 zoe              11.1.......11.....1.1.12..121.1.1.1...21311....  00.0.......00.....0.0.01..010.0.0.0...10100....
       23  23.1  13.4  3 oea              21..1..1.1.........11.2211.1...22...1.....1.11.  10..0..0.0.........00.1100.0...11...0.....0.00.
       79  24.4  13.6  4 qoHa             521311.21..3111.32...2.31352161.141.62323..222.  100000.00..0000.00...0.00010010.000.10000..000.
       69  22.8  14.7  4 zcca             62.21.16111.2..22...224.2.2114311.1.12212121321  10.00.01000.0..00...001.0.0001000.0.00000000000
       67  24.3  13.6  4 ccca             .2.3.154.2213.1.1....21..111332.4123236121111..  .0.0.011.0000.0.0....00..000000.1000001000000..
       39  25.6  12.4  4 oHae             ..2.2...21.....1.1111321.11214..21...211...1.22  ..0.0...00.....0.0000100.00001..00...000...0.00
       37  24.7  11.8  4 oHam             ..1.11.1...3.211..1.112131.123..1....11.2311...  ..0.00.0...1.000..0.000010.001..0....00.0100...
       35  15.4  12.6  4 oHar             21511.13221......211.1..2111..12.1.........1.1.  10100.01110......100.0..1000..01.0.........0.0.
       25  25.9  15.0  4 Hc8a             1..3........1....4112....1..1..1.....1...112121  0..1........0....1001....0..0..0.....0...001010
       21  21.8  14.1  4 oHca             1.1..4.....1.....2..11...2..1..21..1........21.  0.0..2.....0.....1..00...1..0..10..0........10.
      204  25.0  14.4  5 zcc8a            26431595524383364713542211463334552525467789574  00000000000000000000000000000000000000000000000
      172  22.7  13.8  5 ccc8a            46.1438463556719332362.413133235343254436532272  00.0000000000001000000.000000000000000000000000
      113  25.8  12.3  5 qoHae            221411.311...31955...32.148.5432.45.665118312..  000000.000...00100...00.001.0000.00.000001000..
       91  25.3  11.4  5 qoHam            ......221..3874251.5.1111114521.1.74.362.433...  ......000..0110000.0.0000000000.0.10.010.000...
       83  25.2  16.4  5 oHc8a            .15421.23432.211.417...111.....2211....32.31994  .01000.00000.000.001...000.....0000....00.00110
       54  13.6  10.6  5 qoHan            8214212....11621426111.1..11..1..11.1........1.  1001000....00100101000.0..00..0..00.0........0.
       48  23.2  13.4  5 qoHar            4..23.3....1........171..34211..121.222.11111..  1..01.1....0........010..11000..000.000.00000..
       43  21.7  14.4  5 qoHca            .22.1243...1.1.2..1.22.1.12..112111.3.....11112  .00.0011...0.0.0..0.00.0.00..000000.1.....00000
       34  23.8  12.9  5 oHcca            1...2122.1.1.......1..1233.1...3.311.....2111..  0...1011.0.0.......0..0111.0...1.100.....1000..
       31  25.0  12.0  5 cccca            1..11....1...23.11..1.11.1..31.1.131.21.11...1.  0..00....0...11.00..0.00.0..10.0.010.10.00...0.
       23  19.8   9.9  5 zccca            ....1.1.2111.11.1..11.21211.1......2......1....  ....0.0.1000.00.0..00.10100.0......1......0....
       21  25.4   9.9  5 qoHoe            ..........1.3.1..11..1.12...1311....11.....11..  ..........0.1.0..00..0.01...0100....00.....00..
      198  24.0  14.2  6 qoHc8a           41946411238.556699393.1271123..5345284285377583  00000000000.000001000.0000000..0000000000000000
       81  25.3  12.7  6 qoHcca           .11.244.11111311222..2.355.4.113.1264.31224.211  .00.000.00000000000..0.011.0.000.0010.00000.000
       56  22.4  13.7  6 oHcc8a           111.22.128.11....113111335....1211..1.1..11.62.  000.00.001.00....000000001....0000..0.0..00.10.
       52  21.7  12.6  6 eccc8a           11.1..4111331311221.21.2.....3..1.134.242......  00.0..1000110100000.00.0.....1..0.011.010......
       50  23.0  12.8  6 cccHca           31...1321.1.12.11.11.212.25.2.22211.111..1221..  10...0100.0.00.00.00.000.01.0.00000.000..0000..
       37  24.7  14.8  6 zccHca           ..2.1.251.1.....1...211...222.2...1....2.113211  ..0.0.010.0.....0...000...000.0...0....0.001000
       36  21.3  12.9  6 zccc8a           11.21.1..4.1.11.21..3.122...2..2..11..11.2..2..  00.10.0..1.0.00.10..1.011...1..1..00..00.1..1..
       21  24.7  13.5  6 ezcc8a           ...1..1.1.12..111...1.....1..2...2..2...1..11.1  ...0..0.0.01..000...0.....0..1...1..1...0..00.0
      183  21.4  13.3  7 qoHcc8a          449.44212683996225532.3863523222.1584335343353.  001.00000000100000000.0000000000.0000000000000.
       35  24.5  11.4  7 ccccHca          .1...11..121.1.1111..1..213611...1...1111..2.1.  .0...00..010.0.0000..0..101200...0...0000..1.0.
       31  23.6  12.2  7 zcccHca          .1.21.1..112....1.1.......4521....2.11.21.1....  .0.10.0..001....0.0.......1110....1.00.10.0....
       23  20.4  14.8  7 oeccc8a          ..1131..2.2....2...11.....1...1...2..11.1.....2  ..0010..1.1....1...00.....0...0...1..00.0.....1
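
  The AVG and DEV columns summarize where in the text each word occurs.
  The exact conventions of enum-words-in-blocks and make-word-location-map
  are not recorded here, so the following is only a sketch under stated
  assumptions: blocks are numbered from 1, AVG is the mean block index over
  the occurrences of a word, and DEV is the population standard deviation
  of that index.  The names block_stats, wpb and minocc are hypothetical.

```sh
# Hypothetical sketch, not the notebook's own scripts.
# Reads one word per line; the first wpb words form block 1, the next wpb
# words block 2, and so on.  Prints "total avg dev word" for each word that
# occurs at least minocc times, most frequent first.
block_stats() {  # usage: block_stats WPB MINOCC < words
  awk -v wpb="$1" -v minocc="$2" '
    {
      b = int((NR - 1) / wpb) + 1        # 1-based block index (assumption)
      n[$1]++; s[$1] += b; ss[$1] += b * b
    }
    END {
      for (w in n) if (n[w] >= minocc) {
        avg = s[w] / n[w]
        var = ss[w] / n[w] - avg * avg   # population variance (assumption)
        if (var < 0) var = 0             # guard against rounding error
        printf "%5d %5.1f %5.1f %s\n", n[w], avg, sqrt(var), w
      }
    }
  ' | LC_ALL=C sort -k1,1nr -k4,4
}

# Made-up four-word text, one word per block.
printf '%s\n' a a b a | block_stats 1 1
```

  Under these assumptions a word spread evenly over 47 blocks would have
  AVG 24.0 and DEV about 13.6, which is roughly in line with many of the
  rows above.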

  Recomputing the coarse table:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/cy/a/g' \
          -e 's/ci/a/g' \
          -e 's/in/m/g' \
          -e 's/ir/w/g' \
          -e 's/cs/z/g' \
          -e 's/cg/8/g' \
      | enum-words-in-blocks -vWPB=666 \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -vCTWD=3 -vPERCENT=1 -vNBLOCKS=7 \
      > .bar
      
  Results posted on my Voynich page.
  
  For comparison, let's try English and Portuguese:
  
    cat engl.wds | tr '[A-Z]' '[a-z]' | head -4661 \
      | enum-words-in-blocks -vWPB=100 \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -vCTWD=1 -vPERCENT=1 -vNBLOCKS=47 \
      > .baz
    
    TOTAL   AVG   DEV WORD             ABSOLUTE FREQUENCY BY BLOCK                      RELATIVE FREQUENCY BY BLOCK   
    ----- ----- ----- ---------------- -----------------------------------------------  -----------------------------------------------
      199  24.3  14.0 the              9.134354855516419225114342267572.41699419576541  0.000000000000000000000000000000.00000000000000
      165  23.2  13.0 a                14335233324554364463275662314144313.55413634432  00000000000000000000000000000000000.00000000000
      117  25.5  13.1 and              21122322313312.3323521..52255444113423313334332  00000000000000.0000000..00000000000000000000000
      114  23.4  12.9 of               211242314.3123632464.23432.3141.1262.431842..31  000000000.0000000000.00000.0000.0000.000100..00
      114  24.1  14.3 to               233324312221261.2325331321212321332331122246423  000000000000000.0000000000000000000000000000000
      105  23.3  13.3 i                356.12.114113.34.312233117213512125221445132..1  001.00.000000.00.000000001000000000000000000..0
       80  24.7  13.6 in               32.1.21.12.35132212221321.3221221..22221.324421  00.0.00.00.01000000000000.0000000..00000.000000
       59  25.1  12.5 she              ..21.214..11112...1311.14332.114222111...12.321  ..00.001..00000...0000.01000.001000000...00.000
       58  24.4  15.0 was              1312.11131.221..112213.11.12.3..1.12....125432.  0000.00000.000..000000.00.00.0..0.00....001100.
       54  27.0  13.5 her              ..2311.2....1112.1.31...313...1..3652..112.1213  ..0100.0....0000.0.10...101...0..1110..000.0001
       51  26.1  13.2 that             ..122.11111...22..2.3121..21.51.1..11.423.23.2.  ..000.00000...00..0.1000..00.10.0..00.101.01.0.
       50  22.8  11.3 you              ..2..21411..22..11..3.3.144...119.12...3.....1.  ..0..00100..00..00..1.1.011...002.00...1.....0.
       45  21.7  17.0 had              142361...1.11..1.1.1.1.......32.....11124.1213.  010110...0.00..0.0.0.0.......10.....00001.0001.
       43  23.7  15.9 as               11223121.2.1...1..1.1...111.1.1.312.21.1.3.1122  00001000.0.0...0..0.0...000.0.0.100.00.0.1.0000
       42  22.3  13.3 my               222..1...121.123.1..11.1111222.2..2....412....1  000..0...000.001.0..00.0000000.0..0....100....0
       38  18.6  12.6 he               .222311.1.12...1.23..4.1....4.1.2....112...1...  .000100.0.00...0.01..1.0....1.0.0....000...0...
       38  20.6  13.6 at               11.121111142...112...11..21.2.1111.....112111..  00.000000010...000...00..00.0.0000.....000000..
       38  24.1  14.4 with             1.11.1.111.2311..32....111.12.1....1..1.2.2.33.  0.00.0.000.0100..10....000.00.0....0..0.0.0.11.
       34  19.9  10.9 it               11.....1323.1...131.1115..1..2..21....1...11...  00.....0111.0...010.0001..0..1..10....0...00...
       30  24.5  15.4 for              .2.212.11.1..1..1........111..1.11.121.2221...1  .1.101.00.0..0..0........000..0.00.010.1110...0
       30  26.1   9.4 me               .1......1.....11.11.22.2113211..1112..1.12.....  .0......0.....00.00.11.1001100..0001..0.01.....
       27  23.2  10.3 is               ......12..2..1.2..2.22..1212.......12121.......  ......01..1..0.1..1.11..0101.......01010.......
       27  29.8  12.5 mrs              ..1..1.........121.11..2..2.11.1.1..1.1..21231.  ..0..0.........010.00..1..1.00.0.0..0.0..10110.
       26  14.3  11.7 his              .3.324....1..1...22111.......11..2........1....  .1.111....0..0...11000.......00..1........0....
       24  30.9  14.2 we               11.......1.1....2..........2.13..1...2....13.5.  00.......0.0....1..........1.01..0...1....01.2.
       23  25.0  12.9 on               ...1..2.1.1..1...121.1...2.....11.11.2..1.2..1.  ...0..1.0.0..0...010.0...1.....00.00.1..0.1..0.
       22  23.2  13.2 be               ..2....2....22.1.....112..1.1..1......12.111...  ..1....1....11.0.....001..0.0..0......01.000...
       21  25.3  13.1 up               .1....1.1.21.1........1..1121....111...3..1...1  .0....0.0.10.0........0..0010....000...1..0...0
       20  21.9  14.9 an               1.11..1.1.111.11..1........1...2.1.1...1..1..2.  0.00..0.0.000.00..0........0...1.0.0...0..0..1.
       20  22.2  12.1 john             .1..11..11...111.1.......1111.2..1..112........  .0..00..00...000.0.......0000.1..0..001........
       20  23.3  13.5 but              ....2.22..........1..122..1.....111..1....1.11.  ....1.11..........0..011..0.....000..0....0.00.
    
    cat port.wds | tr '[A-Z]' '[a-z]' \
      | egrep -v '^x$' \
      | head -4661 \
      | enum-words-in-blocks -vWPB=100 \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -vCTWD=1 -vPERCENT=1 -vNBLOCKS=47 \
      > .boh
    
97-08-02 stolfi
===============

  A wild guess: the `P' gallows may be just an ornate form of the 
  \s/ plume that can occur on top of the \cc/ ligature (FSG [T], 
  Currier [S], Frogguy <et>).  I.e. the `cPc' combination
  is a variant of `zc' (FSG [S], Currier [Z], Frogguy <e't>).
  
  Let's check:
  
    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
          -e 's/ir/w/g' \
          -e 's/in/m/g' \
      | compare-contexts -lctx 1 -rctx 1 -colw 24 \
          'cPc' \
          'cHc' \
          'zc'
          
    10 0.45 ccPcc            170 0.61 ccHca            529 0.64 _zcc
     5 0.23 _cPcc             46 0.16 ccHcc             87 0.11 ezcc
     2 0.09 ccPca             16 0.06 ccHc8             32 0.04 Hzcc
     2 0.09 _cPca             10 0.04 ocHcc             27 0.03 8zcc
     1 0.05 ocPcc              8 0.03 _cHcc             19 0.02 azcc
     1 0.05 _cPco              6 0.02 ocHca             19 0.02 _zco
     1 0.05 _cPc8              4 0.01 qcHcc             16 0.02 _zca
 ----- ---- ----               3 0.01 _cHc8             13 0.02 ezc8
    22 1.00 TOT                2 0.01 zcHcc             12 0.01 rzcc
                               2 0.01 qcHc8             10 0.01 Pzcc
                               2 0.01 ccHco              8 0.01 zzcc
                               2 0.01 acHca              7 0.01 Hzc8
                               2 0.01 _cHco              6 0.01 _zcH
                               2 0.01 _cHca              6 0.01 _zc8
                               1 0.00 zcHco              4 0.00 ozcc
                               1 0.00 qcHca              3 0.00 _zce
                               1 0.00 ocHco              3 0.00 8zco
                               1 0.00 ccHc_              2 0.00 ezco
                               1 0.00 acHcc              2 0.00 ezca
                           ----- ---- ----               2 0.00 czcc
                             280 1.00 TOT                2 0.00 Pzc8
                                                         2 0.00 Hzca
                                                         1 0.00 zzcH
                                                         1 0.00 rzc8
                                                         1 0.00 ezcz
                                                         1 0.00 ezce
                                                         1 0.00 ezc_
                                                         1 0.00 ezcH
                                                         1 0.00 czco
                                                         1 0.00 czca
                                                         1 0.00 czc8
                                                         1 0.00 _zcr
                                                         1 0.00 _zc_
                                                         1 0.00 Pzco
                                                         1 0.00 Pzca
                                                         1 0.00 8zc8
                                                     ----- ---- ----
                                                       825 1.00 TOT 
                                                                    

  Hmmm... `cPc' is not much like `zc' but not too unlike either.
  It resembles `zc' more than it resembles `cHc'.
  
    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
          -e 's/ir/w/g' \
          -e 's/in/m/g' \
      > .wds
      
    set ff = ( 'cPc' 'zc' )
    set ofiles = ( )
    foreach f ( $ff )
      cat .wds \
        | grep $f \
        | sed -e "s/${f}/@/g" \
        | sort | uniq -c \
        > ${f}.wds
      set ofiles = ( ${ofiles} ${f}.wds )
    end
    
    /n/gnu/bin/join -a1 -e '---' -j1 2 -j2 2 -o 0,1.1,2.1 ${ofiles} \
      | gawk '/./ { printf "%5d %5d  %-16s\n", $3, $2, $1 }' \
      | sort -nr
    unset ff
    unset ofiles
                                                                    
       zc   cPc  word
     ----  ----  -------
       69     3  _@ca_           
       36     1  _@cc8a_         
       23     1  _@cca_          
       11     1  _@oe_           
        5     1  _@ar_           
        1     1  _@ae_           

        0     2  _zcc@a_         
        0     2  _zc@ca_         
        0     2  _cc@ca_         
        0     1  _zco@c8a_       
        0     1  _zc@cc8a_       
        0     1  _ecc@c8a_       
        0     1  _ccc@c8a_       
        0     1  _cc@cc8a_       
        0     1  _cc@c8a_        
        0     1  _c@cc8a_        
        0     1  _@8or_          
                                                                    
  Hmm.  Not exactly impressive.  But not discouraging either...
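
  The same kind of frame comparison can also be done in a single awk pass,
  without temporary files or join.  A minimal sketch, assuming one recoded
  word per line on stdin; the function name frame_counts and the sample
  words are made up.  Unlike the `join -a1' above, it also lists frames
  that occur with only one of the two patterns, whichever it is.

```sh
# Sketch: replace each pattern by "@" to obtain a frame, then count how many
# words fill that frame with pattern A and how many with pattern B.
# Prints "countA countB frame", sorted by countA, then countB, descending.
frame_counts() {  # usage: frame_counts PAT_A PAT_B < words
  awk -v a="$1" -v b="$2" '
    $0 ~ a { f = $0; gsub(a, "@", f); na[f]++; seen[f] = 1 }
    $0 ~ b { f = $0; gsub(b, "@", f); nb[f]++; seen[f] = 1 }
    END { for (f in seen) printf "%5d %5d  %s\n", na[f] + 0, nb[f] + 0, f }
  ' | LC_ALL=C sort -k1,1nr -k2,2nr -k3,3
}

# Made-up sample words, for illustration only.
printf '%s\n' _zcca_ _zcca_ _cPcca_ _zcc8a_ _ccPca_ | frame_counts 'zc' 'cPc'
```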
  

  OK, now something completely different again. Let's look for collocates of the popular words:
  
    cat bio-j-jsa.wds \
      | sed \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
          -e 's/ir/w/g' \
          -e 's/in/m/g' \
      > .wds
      
    foreach f ( `cat .popular.wds` )
      ( echo ' ' ;\
        echo '      after '"$f" ;\
        echo ' -----------------------' ;\
        cat .wds \
          | enum-words-after -vWORD=${f} \
          | wfreq \
          | head -5 \
      ) \
      | gawk '/./ { printf "%-24s\n", $0 }'
    end
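
  The enum-words-after script used in this loop is not reproduced in the
  notebook.  A minimal sketch of what such a filter might do, assuming a
  one-word-per-line stream in which consecutive lines are consecutive
  words of the text; the function name words_after and the sample stream
  are made up.

```sh
# Hypothetical stand-in for enum-words-after (the real script is not shown):
# print every word that immediately follows an occurrence of $1.
words_after() {
  awk -v word="$1" '
    NR > 1 && (prev "") == (word "") { print }   # compare as strings
    { prev = $0 }
  '
}

# Made-up word stream; count the followers of "qoe", most frequent first.
printf '%s\n' qoe ccc8a oe qoe zcc8a qoe ccc8a \
  | words_after qoe | sort | uniq -c | sort -k1,1nr | head -5
```

  The tables that follow are the output of the notebook's own loop, not of
  this sketch.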

          after zcca              after ccca        
     ----------------------- -----------------------
         4 0.06 qoHc?m           4 0.06 qoHc?m      
         4 0.06 qoHc8a           4 0.06 //          
         4 0.06 qoHam            3 0.04 qoHcc8a     
         3 0.04 qoHcc8a          2 0.03 rae         
         3 0.04 qoHa             2 0.03 qoe         

          after zcc8a             after ccc8a             after oeHc8a      
     ----------------------- ----------------------- -----------------------
        16 0.08 qoHc8a          11 0.06 //               4 0.21 //          
        13 0.06 qoHcc8a          9 0.05 qoHc8a           2 0.11 qoHc8a      
        11 0.05 qoHam            8 0.05 qoe              1 0.05 zcccHcc8a   
         9 0.04 //               8 0.05 qoHc?m           1 0.05 r??r        
         8 0.04 qoe              6 0.03 qoHar            1 0.05 qoe?ccca    

          after ezcc8a            after oeccc8a     
     ----------------------- -----------------------
         3 0.14 //               6 0.26 //          
         2 0.10 qoHcc8a          2 0.09 qoHcc8a     
         2 0.10 qoHa             1 0.04 qor         
         1 0.05 qoeccc8a         1 0.04 qoe?zcc8a   
         1 0.05 qoe              1 0.04 qoe         
    

          after qoHc8a            after qoHcc8a           after Hc8a              after oHc8a       
     ----------------------- ----------------------- ----------------------- -----------------------
        14 0.07 qoHcc8a         15 0.08 qoHc8a           2 0.08 qoHc8a           8 0.10 qoHc8a      
        14 0.07 qoHc8a           8 0.04 qoHcc8a          2 0.08 oHc8a            4 0.05 zcc8a       
         8 0.04 zcc8a            8 0.04 qoHae            2 0.08 ccc8a            4 0.05 oHc8a       
         8 0.04 ccc8a            6 0.03 ccc8a            2 0.08 8ar              3 0.04 Hc8a        
         7 0.04 oHc8a            5 0.03 qoHcca           2 0.08 //               3 0.04 //          

          after zccc8a            after cccc8a            after eccc8a      
     ----------------------- ----------------------- -----------------------
         5 0.14 qoHcc8a          3 0.16 qoHcc8a          9 0.17 //          
         3 0.08 qoe              2 0.11 qoHam            3 0.06 zcc8a       
         3 0.08 qoHc8a           2 0.11 qoHae            3 0.06 qoHcc8a     
         2 0.06 qoHcca           2 0.11 //               3 0.06 qoHam       
         1 0.03 zccHa            1 0.05 z??e             3 0.06 qoHa        

          after oHcc8a      
     -----------------------
         5 0.09 //          
         4 0.07 qoHa        
         3 0.05 zcc8a       
         3 0.05 qoHcc8a     
         3 0.05 qoHc8a      

          after zccHca      
     -----------------------
         3 0.08 qoHc8a      
         2 0.05 qoea        
         2 0.05 qoHca       
         1 0.03 zoea        
         1 0.03 zcc8a       

          after cccHca      
     -----------------------
         4 0.08 qoHa        
         4 0.08 //          
         3 0.06 qoHc?m      
         2 0.04 zae         
         2 0.04 qoHcca      

          after ccccHca           after zcccHca     
     ----------------------- -----------------------
         3 0.09 qoHae            3 0.10 qoHc?m      
         2 0.06 qoHc?m           3 0.10 qoHae       
         2 0.06 qoHa             2 0.06 qoHcc8a     
         2 0.06 oHae             2 0.06 qoHc8a      
         2 0.06 eor              1 0.03 zcca        

          after oHcca       
     -----------------------
         2 0.06 qoe         
         2 0.06 oHa         
         1 0.03 raecce      
         1 0.03 qoe?ccca    
         1 0.03 qoHoe       

          after cccca             after zccca       
     ----------------------- -----------------------
         2 0.06 ram              2 0.09 qoHc?m      
         2 0.06 qoHc?m           2 0.09 qoHc8a      
         2 0.06 qoHam            2 0.09 qoHam       
         1 0.03 zcccHcca         2 0.09 or          
         1 0.03 zcccHca          1 0.04 ra          

          after qoHca             after oHca        
     ----------------------- -----------------------
         3 0.07 qoHae            2 0.10 qoHc8a      
         3 0.07 qoHa             1 0.05 zccHcc8a    
         3 0.07 ezcc8a           1 0.05 qoezca      
         2 0.05 qoHc?m           1 0.05 qoHca       
         2 0.05 qoHc8a           1 0.05 qoHan       


          after qoHcca            after oeHcca      
     ----------------------- -----------------------
         6 0.07 qoHc8a           1 0.05 zcoHam      
         4 0.05 oHcca            1 0.05 rccz        
         3 0.04 zcc8a            1 0.05 rccca       
         3 0.04 ram              1 0.05 ram         
         3 0.04 qoHae            1 0.05 rae         

          after oe                after qoe               after or          
     ----------------------- ----------------------- -----------------------
        16 0.13 //               9 0.11 ccc8a            3 0.07 zcc8a       
        10 0.08 zcc8a            9 0.11 //               3 0.07 //          
         6 0.05 ccc8a            6 0.07 zcc8a            2 0.05 or          
         3 0.02 zccc8a           4 0.05 cccc8a           2 0.05 ccca        
         3 0.02 oHcc8a           3 0.04 oe               2 0.05 ae          

          after eoe         
     -----------------------
        10 0.59 //          
         1 0.06 oeoe??joe?Hc8a
         1 0.06 cccor       
         1 0.06 ccca??scca  
         1 0.06 ccca           

          after qoHoe       
     -----------------------
         3 0.14 zcc8a       
         2 0.10 ccoe        
         2 0.10 cc8a        
         1 0.05 zccc8a      
         1 0.05 zccHcca     

          after qoHae             after qoHam             after qoHan             after qoHar       
     ----------------------- ----------------------- ----------------------- -----------------------
         9 0.08 ccc8a            6 0.07 zcc8a            6 0.11 cccHca           7 0.15 zcc8a       
         9 0.08 //               5 0.05 zccHca           3 0.06 ccc8a            4 0.08 oe          
         7 0.06 zcc8a            5 0.05 ccccHca          3 0.06 8ar              2 0.04 zcca        
         5 0.04 8ar              5 0.05 ccc8a            2 0.04 zccoe            2 0.04 cccHca      
         3 0.03 zcccHca          5 0.05 //               2 0.04 oHc8a            2 0.04 ccc8a       


          after oHae              after oHam              after oHar        
     ----------------------- ----------------------- -----------------------
         4 0.10 ccc8a            3 0.08 zcc8a            4 0.11 zcc8a       
         4 0.10 //               3 0.08 //               3 0.09 oHc8a       
         2 0.05 8ae              2 0.05 oHc?m            3 0.09 ccc8a       
         1 0.03 zccca            2 0.05 oHc8a            2 0.06 oe          
         1 0.03 zccc8a           2 0.05 cccHca           2 0.06 8ar         

          after qoHa        
     -----------------------
        20 0.25 //          
         4 0.05 8am         
         3 0.04 zam         
         3 0.04 qoHae       
         3 0.04 ccc8a       

          after 8am               after 8ar               after 8ae         
     ----------------------- ----------------------- -----------------------
         8 0.11 //               5 0.10 //               8 0.16 //          
         5 0.07 ccca             4 0.08 zcc8a            4 0.08 zcc8a       
         3 0.04 zcca             4 0.08 oe               2 0.04 or          
         3 0.04 zcc8a            2 0.04 zcca             2 0.04 oeccc8a     
         2 0.03 zccHca           1 0.02 zccor            2 0.04 eccc8a      

          after 8a          
     -----------------------
        11 0.31 //          
         3 0.09 8am         
         2 0.06 qoHae       
         1 0.03 zcccHa      
         1 0.03 zcca        

          after oHa         
     -----------------------
         7 0.28 //          
         1 0.04 zcoe        
         1 0.04 zcca        
         1 0.04 qoHor       
         1 0.04 qoHca       

          after zam               after zoe         
     ----------------------- -----------------------
         3 0.10 //               3 0.12 zcc8a       
         2 0.06 zcca             2 0.08 ccca        
         2 0.06 zcc8a            2 0.08 ccc8a       
         2 0.06 ccc8a            1 0.04 zcccoe      
         1 0.03 zcccHca          1 0.04 zccca       

          after am          
     -----------------------
         2 0.10 oHc?m       
         2 0.10 //          
         1 0.05 zcc8a       
         1 0.05 z?cc?       
         1 0.05 z?Hae       

          after oea         
     -----------------------
        18 0.78 //          
         2 0.09 zcca        
         1 0.04 qoHcc8a     
         1 0.04 oHae        
         1 0.04 cccc8a      

          after //          
     -----------------------
        75 0.10 =           
        28 0.04 qoHcc8a     
        15 0.02 zoe         
        15 0.02 qoHc8a      
        15 0.02 8zcc8a      

          after =           
     -----------------------
         3 0.04 Poe         
         2 0.03 Poeccc8a    
         2 0.03 Pccc8ar     
         1 0.01 zcca        
         1 0.01 zaHam       

  Once again, for the words that are often followed by `//' (which
  may actually be half-words), this time skipping the `//':
  
    foreach f ( ccc8a oeHc8a ezcc8a oeccc8a eccc8a oHcc8a oe qoe or eoe 8am 8ar 8ae 8a oHa zam oea )
      ( echo ' ' ;\
        echo '      after '"$f" ;\
        echo ' -----------------------' ;\
        cat .wds \
          | grep -v '//' \
          | enum-words-after -vWORD=${f} \
          | wfreq \
          | head -5 \
      ) \
      | gawk '/./ { printf "%-24s\n", $0 }'
    end

          after ccc8a             after oeHc8a      
     ----------------------- -----------------------
         9 0.05 qoHc8a           2 0.11 qoHc8a      
         8 0.05 qoe              1 0.05 zcccHcc8a   
         8 0.05 qoHc?m           1 0.05 r??r        
         6 0.03 qoHar            1 0.05 qoe?ccca    
         6 0.03 qoHae            1 0.05 qoHam       

          after ezcc8a            after eccc8a            after oeccc8a           after oHcc8a      
     ----------------------- ----------------------- ----------------------- -----------------------
         2 0.10 qoHcc8a          3 0.06 zcc8a            2 0.09 qoHcc8a          5 0.09 qoHcc8a     
         2 0.10 qoHa             3 0.06 qoHcc8a          2 0.09 qoHc?m           4 0.07 qoHa        
         2 0.10 =                3 0.06 qoHam            1 0.04 qor              3 0.05 zcc8a       
         1 0.05 qoeccc8a         3 0.06 qoHa             1 0.04 qoe?zcc8a        3 0.05 qoHc8a      
         1 0.05 qoe              2 0.04 qoHc8a           1 0.04 qoe              2 0.04 oeHc?m      

          after oe                after qoe         
     ----------------------- -----------------------
        10 0.08 zcc8a            9 0.11 ccc8a       
         6 0.05 ccc8a            6 0.07 zcc8a       
         3 0.02 zccc8a           4 0.05 cccc8a      
         3 0.02 qoHcc8a          3 0.04 qoe         
         3 0.02 oHcc8a           3 0.04 oe          

          after or          
     -----------------------
         3 0.07 zcc8a       
         2 0.05 or          
         2 0.05 ccca        
         2 0.05 ae          
         1 0.03 zccoeo      

          after eoe         
     -----------------------
         1 0.06 zzcc8a      
         1 0.06 qocHca      
         1 0.06 qoHcca      
         1 0.06 qoHcc8a     
         1 0.06 qoHan       

          after 8am               after 8ar               after 8ae         
     ----------------------- ----------------------- -----------------------
         5 0.07 ccca             4 0.08 zcc8a            4 0.08 zcc8a       
         3 0.04 zcca             4 0.08 oe               3 0.06 qoHc8a      
         3 0.04 zcc8a            2 0.04 zcca             2 0.04 or          
         3 0.04 oHc8a            2 0.04 qoHcca           2 0.04 oeccc8a     
         2 0.03 zccHca           2 0.04 8ar              2 0.04 eccc8a      

          after 8a          
     -----------------------
         3 0.09 qoHcc8a     
         3 0.09 qoHae       
         3 0.09 8am         
         2 0.06 qoHam       
         1 0.03 zzcc8a      

          after oHa         
     -----------------------
         2 0.08 qoHca       
         1 0.04 zoeHa       
         1 0.04 zcoe        
         1 0.04 zcca        
         1 0.04 zc?m        

          after zam         
     -----------------------
         2 0.06 zcca        
         2 0.06 zcc8a       
         2 0.06 ccc8a       
         1 0.03 zoeHcc8a    
         1 0.03 zcccHca     

          after oea         
     -----------------------
         2 0.09 zcca        
         2 0.09 zc?m        
         2 0.09 qoe         
         1 0.04 zoeccca     
         1 0.04 zcoe        
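
  A rough Python sketch of what the `after X' tables above seem to
  contain.  The scripts enum-words-after and wfreq are not listed in
  this notebook, so their behavior is inferred from the output: each
  row is a count, that count divided by the number of occurrences of
  the left word, and the follower.  The function names, the toy
  stream, and the order of ties are assumptions.

```python
from collections import Counter

def followers(words, target, skip=None):
    """Count the words that immediately follow `target` in `words`.

    Tokens equal to `skip` are dropped first (like `grep -v '//'` on a
    one-word-per-line file), so a line-final word gets paired with the
    first word of the next line.
    """
    stream = [w for w in words if w != skip]
    return Counter(b for a, b in zip(stream, stream[1:]) if a == target)

def top_followers(words, target, n=5, skip=None):
    """Return up to n (count, fraction, word) rows, most frequent first."""
    counts = followers(words, target, skip)
    total = sum(counts.values())
    # Ties are broken alphabetically here; the original scripts may differ.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(c, round(c / total, 2), w) for w, c in rows[:n]]

# Toy stream (made up for illustration, not manuscript data).
toy = ["oe", "//", "zcc8a", "oe", "zcc8a", "oe", "//", "ccc8a"]
print(top_followers(toy, "oe"))             # with line breaks counted
print(top_followers(toy, "oe", skip="//"))  # line breaks skipped
```

  Dropping the `//' tokens before pairing leaves the denominator
  unchanged, which matches the tables: `eoe' shows 0.06 for a single
  follower both with and without the `//' rows.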

  Let's compare with English:
  
    cat engl.wds \
      | head -7053 \
      | tr '[A-Z]' '[a-z]' \
      > .e.wds
    
    foreach f ( `cat .popengl.wds` )
      ( echo ' ' ;\
        echo '      after '"$f" ;\
        echo ' -----------------------' ;\
        cat .e.wds \
          | enum-words-after -vWORD=${f} \
          | wfreq \
          | head -5 \
      ) \
      | gawk '/./ { printf "%-24s\n", $0 }'
    end

          after the         
     -----------------------
        11 0.03 door        
         9 0.03 house       
         7 0.02 hall        
         6 0.02 village     
         4 0.01 same        

          after a           
     -----------------------
         7 0.03 very        
         7 0.03 great       
         7 0.03 few         
         5 0.02 little      
         4 0.02 man         

          after and         
     -----------------------
        13 0.07 i           
         9 0.05 the         
         4 0.02 we          
         4 0.02 that        
         4 0.02 he          

          after of          
     -----------------------
        34 0.20 the         
        12 0.07 a           
         9 0.05 his         
         8 0.05 her         
         8 0.05 course      

          after to          
     -----------------------
        23 0.12 the         
        11 0.06 be          
         9 0.05 me          
         7 0.04 her         
         5 0.03 see         

          after i           
     -----------------------
        15 0.09 had         
         8 0.05 was         
         6 0.04 will        
         6 0.04 have        
         6 0.04 asked       

          after in          
     -----------------------
        31 0.27 the         
        15 0.13 a           
         7 0.06 his         
         6 0.05 her         
         3 0.03 an          

          after she         
     -----------------------
         9 0.11 was         
         4 0.05 is          
         3 0.04 seemed      
         3 0.04 said        
         3 0.04 looked      

          after was         
     -----------------------
        19 0.18 a           
         5 0.05 to          
         4 0.04 in          
         3 0.03 waiting     
         3 0.03 very        

          after her         
     -----------------------
         5 0.07 hand        
         5 0.07 as          
         3 0.04 own         
         3 0.04 husband     
         2 0.03 to          

          after that        
     -----------------------
         6 0.08 i           
         3 0.04 we          
         3 0.04 she         
         3 0.04 night       
         3 0.04 he          

          after you         
     -----------------------
         5 0.06 think       
         5 0.06 know        
         4 0.05 could       
         4 0.05 are         
         3 0.04 in          

          after had         
     -----------------------
        13 0.19 a           
         9 0.13 been        
         2 0.03 taken       
         2 0.03 seen        
         2 0.03 occurred    

          after as          
     -----------------------
         9 0.16 i           
         8 0.14 a           
         6 0.11 we          
         5 0.09 she         
         4 0.07 he          

          after my          
     -----------------------
         5 0.08 mind        
         4 0.07 dear        
         3 0.05 first       
         2 0.03 window      
         2 0.03 wife        

          after he          
     -----------------------
         9 0.14 was         
         8 0.13 had         
         3 0.05 turned      
         2 0.03 looked      
         2 0.03 came        

          after at          
     -----------------------
        11 0.22 the         
         5 0.10 once        
         4 0.08 styles      
         3 0.06 her         
         2 0.04 tadminster  

          after with        
     -----------------------
        14 0.24 a           
         9 0.16 the         
         3 0.05 us          
         2 0.03 you         
         2 0.03 some        

          after it          
     -----------------------
        11 0.17 was         
         5 0.08 to          
         5 0.08 is          
         3 0.05 seemed      
         2 0.03 the         

          after for         
     -----------------------
         6 0.15 the         
         5 0.13 some        
         5 0.13 a           
         3 0.08 me          
         2 0.05 us          

          after me          
     -----------------------
         3 0.07 that        
         2 0.04 with        
         2 0.04 to          
         2 0.04 the         
         2 0.04 she         

          after is          
     -----------------------
         4 0.10 a           
         3 0.08 very        
         2 0.05 up          
         2 0.05 to          
         2 0.05 mrs         

          after mrs         
     -----------------------
        24 0.56 inglethorp  
        10 0.23 cavendish   
         3 0.07 inglethorps 
         2 0.05 cavendishs  
         1 0.02 rolleston   

          after his         
     -----------------------
         4 0.09 wife        
         4 0.09 face        
         2 0.04 mothers     
         2 0.04 manner      
         2 0.04 brother     

          after we          
     -----------------------
         7 0.16 had         
         4 0.09 are         
         3 0.07 drove       
         2 0.04 were        
         2 0.04 should      

          after on          
     -----------------------
        15 0.44 the         
         3 0.09 a           
         2 0.06 our         
         2 0.06 his         
         1 0.03 to          

          after be          
     -----------------------
         4 0.12 done        
         4 0.12 a           
         2 0.06 mine        
         2 0.06 able        
         1 0.03 was         

          after up          
     -----------------------
         6 0.19 in          
         4 0.12 to          
         4 0.12 at          
         2 0.06 the         
         2 0.06 my          

          after an          
     -----------------------
         3 0.12 old         
         1 0.04 otherwise   
         1 0.04 orphan      
         1 0.04 inaccessible
         1 0.04 impression  

          after john        
     -----------------------
         5 0.18 cavendish   
         2 0.07 he          
         1 0.04 with        
         1 0.04 will        
         1 0.04 was         

          after but         
     -----------------------
         4 0.12 she         
         4 0.12 im          
         2 0.06 there       
         2 0.06 the         
         2 0.06 as          

          after him         
     -----------------------
         3 0.09 i           
         3 0.09 and         
         2 0.06 from        
         2 0.06 at          
         1 0.03 youve       

  Seems great, let's try it for Portuguese:
  
    cat port.wds \
      | head -7053 \
      | tr '[A-Z]' '[a-z]' \
      > .p.wds
    
    foreach f ( `cat .popport.wds` )
      ( echo ' ' ;\
        echo '      after '"$f" ;\
        echo ' -----------------------' ;\
        cat .p.wds \
          | enum-words-after -vWORD=${f} \
          | wfreq \
          | head -5 \
      ) \
      | gawk '/./ { printf "%-24s\n", $0 }'
    end
  
          after de          
     -----------------------
        94 0.23 x           
        36 0.09 um          
        16 0.04 colagem     
        14 0.03 triângulos  
        13 0.03 tipo        

          after a           
     -----------------------
        18 0.07 superfície  
        18 0.07 figura      
        13 0.05 topologia   
        12 0.04 x           
        12 0.04 aresta      

          after e           
     -----------------------
        46 0.25 x           
        11 0.06 a           
         7 0.04 o           
         7 0.04 faces       
         6 0.03 que         

          after que         
     -----------------------
        13 0.07 a           
        11 0.06 x           
         8 0.05 os          
         8 0.05 cada        
         6 0.03 são         

          after um          
     -----------------------
        22 0.16 complexo    
        10 0.07 vértice     
         7 0.05 ladrilho    
         6 0.04 modelo      
         6 0.04 arco        

          after é           
     -----------------------
        16 0.15 um          
         9 0.08 uma         
         9 0.08 o           
         6 0.06 a           
         5 0.05 possível    

          after da          
     -----------------------
        20 0.19 superfície  
        16 0.15 aresta      
         9 0.08 triangulação
         9 0.08 mesma       
         6 0.06 figura      

          after uma         
     -----------------------
        11 0.11 aresta      
         8 0.08 função      
         5 0.05 variedade   
         5 0.05 superfície  
         5 0.05 configuração

          after o           
     -----------------------
         6 0.06 ladrilho    
         5 0.05 complexo    
         5 0.05 arco        
         4 0.04 problema    
         4 0.04 primeiro    

          after do          
     -----------------------
        26 0.34 complexo    
        11 0.14 ladrilho    
         5 0.06 objeto      
         3 0.04 x           
         3 0.04 plano       

          after aresta      
     -----------------------
        19 0.28 x           
         9 0.13 de          
         5 0.07 orientada   
         5 0.07 dual        
         2 0.03 veja        

          after para        
     -----------------------
        13 0.17 a           
         7 0.09 o           
         6 0.08 todo        
         6 0.08 cada        
         4 0.05 os          

          after complexo    
     -----------------------
        25 0.41 celular     
         5 0.08 original    
         4 0.07 x           
         2 0.03 dado        
         2 0.03 com         

          after em          
     -----------------------
         6 0.09 x           
         5 0.08 cada        
         4 0.06 que         
         3 0.05 vez         
         3 0.05 três        

          after os          
     -----------------------
        14 0.21 vértices    
         5 0.07 arcos       
         4 0.06 pontos      
         4 0.06 dois        
         3 0.04 ângulos     

          after por         
     -----------------------
        11 0.18 um          
         9 0.15 x           
         6 0.10 uma         
         6 0.10 exemplo     
         3 0.05 outro       

          after cada        
     -----------------------
        12 0.20 aresta      
        10 0.16 ladrilho    
         6 0.10 um          
         5 0.08 vértice     
         5 0.08 face        

          after as          
     -----------------------
        16 0.23 arestas     
         8 0.11 faces       
         6 0.09 funções     
         5 0.07 duas        
         3 0.04 relações    

          after com         
     -----------------------
         8 0.14 x           
         5 0.09 a           
         4 0.07 vértices    
         4 0.07 o           
         4 0.07 mesma       

          after arestas     
     -----------------------
         9 0.15 e           
         5 0.08 x           
         5 0.08 de          
         5 0.08 da          
         2 0.03 ve          

          after superfície  
     -----------------------
         5 0.10 e           
         5 0.10 de          
         3 0.06 que         
         3 0.06 como        
         2 0.04 seja        

          after no          
     -----------------------
         8 0.20 sentido     
         4 0.10 mesmo       
         3 0.07 espaço      
         2 0.05 plano       
         2 0.05 máximo      

          after são         
     -----------------------
         4 0.10 os          
         2 0.05 suficientes 
         2 0.05 similares   
         2 0.05 representados
         2 0.05 partes      

          after vértices    
     -----------------------
        17 0.31 de          
         7 0.13 e           
         3 0.05 do          
         2 0.04 distintos   
         2 0.04 a           

          after face        
     -----------------------
         7 0.23 x           
         4 0.13 de          
         3 0.10 esquerda    
         2 0.06 seja        
         2 0.06 e           

          after faces       
     -----------------------
         5 0.15 de          
         4 0.12 adjacentes  
         3 0.09 e           
         2 0.06 vértices    
         2 0.06 triangulares

          after arco        
     -----------------------
        11 0.39 x           
         4 0.14 encontrado  
         3 0.11 de          
         2 0.07 inicial     
         2 0.07 com         

          after como        
     -----------------------
         5 0.14 sendo       
         5 0.14 a           
         4 0.11 uma         
         3 0.08 um          
         2 0.05 x           

          after na          
     -----------------------
         9 0.23 figura      
         4 0.10 fronteira   
         3 0.07 verdade     
         3 0.07 superfície  
         3 0.07 etapa       

          after se          
     -----------------------
         6 0.16 x           
         2 0.05 toda        
         2 0.05 elas        
         2 0.05 comporta    
         2 0.05 a           

          after figura      
     -----------------------
         7 0.18 a           
         3 0.08 note        
         2 0.05 portanto    
         2 0.05 mostra      
         2 0.05 mais        

          after ser         
     -----------------------
         2 0.06 feita       
         2 0.06 colado      
         1 0.03 úteis       
         1 0.03 visto       
         1 0.03 variedades  

          after ao          
     -----------------------
         6 0.19 mesmo       
         5 0.16 longo       
         4 0.12 redor       
         4 0.12 avançarmos  
         2 0.06 ladrilho    

          after ou          
     -----------------------
         6 0.16 seja        
         5 0.14 x           
         3 0.08 em          
         2 0.05 dois        
         2 0.05 curvas      

          after celular     
     -----------------------
         8 0.32 x           
         3 0.12 é           
         3 0.12 e           
         2 0.08 original    
         2 0.08 existem     

          after vértice     
     -----------------------
        11 0.31 de          
         5 0.14 x           
         2 0.06 minimizar   
         2 0.06 mais        
         2 0.06 l           

          after ladrilho    
     -----------------------
         5 0.11 de          
         3 0.07 x           
         3 0.07 vértice     
         2 0.04 é           
         2 0.04 tem         

          after não         
     -----------------------
         3 0.11 é           
         2 0.07 tem         
         2 0.07 são         
         2 0.07 ocorrem     
         1 0.04 têm         

          after dos         
     -----------------------
         3 0.10 vértices    
         3 0.10 triângulos  
         3 0.10 quais       
         2 0.06 x           
         2 0.06 polinômios  

          after topologia   
     -----------------------
         6 0.30 de          
         4 0.20 do          
         3 0.15 da          
         2 0.10 das         
         1 0.05 prova-se    

          after complexos   
     -----------------------
        11 0.61 celulares   
         4 0.22 que         
         1 0.06 orientáveis 
         1 0.06 não         
         1 0.06 admitem     

          after lado        
     -----------------------
         4 0.24 da          
         2 0.12 x           
         2 0.12 do          
         2 0.12 de          
         2 0.12 a           

          after colagem     
     -----------------------
         6 0.18 de          
         2 0.06 só          
         2 0.06 que         
         2 0.06 e           
         2 0.06 deve        

          after tem         
     -----------------------
         2 0.10 um          
         2 0.10 todos       
         1 0.05 x           
         1 0.05 uma         
         1 0.05 todas       

          after das         
     -----------------------
         9 0.36 arestas     
         5 0.20 funções     
         4 0.16 faces       
         2 0.08 energias    
         1 0.04 idéias      
  
97-08-08 stolfi
===============

  Over the past weekend I stayed home and played a bit with the 
  word-pair tables above.  I printed the Voynich word-pair table
  and cut it up into little index cards, one for each left-word.
  Then I tried to group the left-words into classes, based on the
  most popular words that followed them.  I identified the following
  classes:
  
    (1) positional class: a coarse classification, based on how often
    the word occurs in line-final position, i.e. right before "//".
    
      Very often final:
      
        oea 8a qoHa oHa eoe
        
      Moderately often final:
      
        ezcc8a oeccc8a eccc8a
        am 
        ccca ccc8a oeHc8a
        or
        oHae qoHae
        qoe oe
        qoHam oHam
        zam
        8am 8ar 8ae
        oHcc8a cccHca
        Hc8a qoHcca oHc8a
        
      Rarely if ever final:
      
        cccca
        oeHcca oHcca
        oHca zccHca
        zcccHca
        qoHan
        qoHcc8a qoHc8a
        qoHca ccccHca
        zcc8a zccca zcca
        cccc8a zccc8a

    Presumably, if a word is unusually common in that position, the
    cause is that it often occurs at the end of sentences, hence at
    the end of paragraphs, which always end at the end of a line.

    (2) post-contextual class: a finer classification, based on 
    the few most common words following the word in question
    (including "//", if common enough).
    
      Mostly followed by {// zcca}:
      
          oHa oea
          
      Mostly followed by {// zcc8a ccc8a}:
      
          qoHam qoHc?m oHam
          qoHae oHae
          zam ram oHc?m
          8am 8ar 8ae
          or oe
          qoe

      Mostly followed by {// 8am zam}:
      
          qoHa
          
      Mostly followed by {// qoHc8a}:
      
          ccc8a oeHc8a
          ccca
          
      Mostly followed by {// qoHa}:
      
          cccHca oHcc8a
      
      Mostly followed by {// qoHcc8a qoe}:
      
          oeccc8a eccc8a ezcc8a
      
      Mostly followed by {qoHc8a qoHcc8a ccc8a}:
      
          qoHc8a qoHcc8a
          
      Mostly followed by {zcc8a ccc8a oe}:
      
          qoHar oHar
          qoHoe zoe
          
      Mostly followed by {qoHam qoHc?m qoHc8a qoHar //}:
      
          zcca zccca zcc8a 
          zcccHca
          
      Mostly followed by {qoHcc8a}:
      
          zccc8a cccc8a
      
      Mostly followed by {qoHae qoHc?m qoHa}:
      
          ccccHca qoHca
          
      Mostly followed by {qoHc8a qoHca}:
      
          oHca zccHca
      
      Mostly followed by {qoHc8a oHc8a}:
      
          Hc8a oHc8a qoHcca
      
    The `qoHc?m' words are generally instances where Friedman
    has [4ODAM] and Currier has [4ODAN].
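
  The positional class (1) above was assigned by hand from the index
  cards.  A minimal sketch of how the same sorting could be done
  mechanically, assuming a one-word-per-line stream with `//' marking
  line ends.  The two cutoffs are illustrative guesses, not values
  used in this notebook, and the function names are hypothetical.

```python
from collections import Counter

def line_final_fraction(words, eol="//"):
    """Map each word to the fraction of its occurrences followed by `eol`."""
    total = Counter(w for w in words if w != eol)
    final = Counter(a for a, b in zip(words, words[1:])
                    if b == eol and a != eol)
    return {w: final[w] / n for w, n in total.items()}

def positional_class(fraction, very=0.25, moderate=0.05):
    """Bucket a line-final fraction; both cutoffs are illustrative guesses."""
    if fraction >= very:
        return "very often final"
    if fraction >= moderate:
        return "moderately often final"
    return "rarely if ever final"

# Toy stream (made up for illustration, not manuscript data).
toy = ["oea", "//", "zcc8a", "oe", "zcc8a", "oea", "//", "oe", "//"]
for word, frac in sorted(line_final_fraction(toy).items()):
    print(f"{word:8s} {frac:.2f} {positional_class(frac)}")
```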

  The general impression is that of words in a natural language
  (as opposed to random words).
  
  I wrote a script to compute and print word-pair frequencies.
  To save memory, the words are divided into two sets,
  the "keys" K (usually the 20 or so most common words) and the
  "bores" B (all the rest); and only the K-K, K-B, and B-K sub-tables
  are computed.
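
  The key/bore split described above can be sketched in Python as
  follows.  This is only an illustration of the idea: the actual
  script (count-diword-freqs) is not listed here, so the function
  name, the return layout, and the way key-bore counts are stored are
  all assumptions.

```python
from collections import Counter

def diword_tables(words, keys):
    """Count adjacent word pairs, skipping pairs where neither word is a key.

    Returns three Counters keyed by (left, right): key-key, key-bore
    and bore-key.  Bore-bore pairs are never stored, which is where
    the memory saving comes from.
    """
    kk, kb, bk = Counter(), Counter(), Counter()
    for left, right in zip(words, words[1:]):
        if left in keys and right in keys:
            kk[left, right] += 1
        elif left in keys:
            kb[left, right] += 1
        elif right in keys:
            bk[left, right] += 1
    return kk, kb, bk

# Toy stream and key set (made up for illustration, not manuscript data).
toy = ["zcc8a", "qoHc8a", "xyz", "zcc8a", "qoHc8a", "abc", "def", "//"]
kk, kb, bk = diword_tables(toy, keys={"//", "zcc8a", "qoHc8a"})
print(kk[("zcc8a", "qoHc8a")], len(kb), len(bk))
```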
  
    cat bio-j-jsa.wds \
      | sed \
          -e 's/[ql]j/H/g' \
          -e 's/[ql]g/P/g' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
          -e 's/ir/w/g' \
          -e 's/in/m/g' \
      > .wds
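
  The same recoding, sketched in Python to make the rule order
  explicit.  The substitution pairs are copied from the sed command
  above; the function name and the sample stroke strings are
  hypothetical, built by hand to exercise the rules rather than taken
  from the transcription.

```python
import re

# (pattern, replacement) pairs copied from the sed command, in order.
RULES = [
    (r"[ql]j", "H"), (r"[ql]g", "P"), (r"cs", "z"), (r"ij", "k"),
    (r"ix", "e"), (r"is", "r"), (r"iiu", "n"), (r"y", "i"),
    (r"ci", "a"), (r"cg", "8"), (r"ir", "w"), (r"in", "m"),
]

def compact(word):
    """Apply the substitutions one after another, as sed -e ... -e ... does."""
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word

# Hypothetical stroke strings (not taken from the transcription).
for strokes in ["qoqjccgci", "cscccgci", "cy"]:
    print(strokes, "->", compact(strokes))
```

  The order matters: `y' becomes `i' before the `ci' rule runs, so a
  `cy' pair ends up as `a', and `ci' is rewritten before `cg', so a
  `c' that belongs to `ci' is no longer available to form `8'.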

    cat .keys
    
      //
      zcc8a
      ccc8a
      oe
      oHc8a
      qoHcc8a
      qoHc8a
      qoHa
      qoHae
      qoe
      eccc8a
      oHcc8a
      zccc8a
      ccca
      zcca
      cccHca
      zccHca
      ccccHca
      zcccHca
      zccca
      cccca
      zam
      8am
      8ar
      8ae
      oHae
      oHam
      oHar
      qoHan
      qoHam
      qoHar
      qoHcca
      oHcca
      or
    

     lines   words     bytes file        
    ------ ------- --------- ------------
      7054    7054     43161 .wds
        34      34       192 .keys

  To keep the number of distinct words down, I decided to replace
  every word containing a `?' by `???'.
  Here are the tables (as redone on 97-08-08):

    cat .wds \
      | sed -e '/?/s/^.*$/???/g' \
      | enum-word-pairs \
      | count-diword-freqs -v keyfile=.keys

    max word length = 11

    (key,key) word pair counts: 

                ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                            q                                               c   z                                                            
                                            o   q               e   o   z           c   z   c   c                                                   q        
                            z   c       o   H   o       q       c   H   c           c   c   c   c   z   c                               q   q   q   o   o    
                            c   c       H   c   H   q   o       c   c   c   c   z   c   c   c   c   c   c                   o   o   o   o   o   o   H   H    
                    T       c   c       c   c   c   o   H   q   c   c   c   c   c   H   H   H   H   c   c   z   8   8   8   H   H   H   H   H   H   c   c    
                    O   /   8   8   o   8   8   8   H   a   o   8   8   8   c   c   c   c   c   c   c   c   a   a   a   a   a   a   a   a   a   a   c   c   o
                    T   /   a   a   e   a   a   a   a   e   e   a   a   a   a   a   a   a   a   a   a   a   m   m   r   e   e   m   r   n   m   r   a   a   r
                ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
    //            765   .   2   4   1   3  28  15   2   9  10   3   .   4   1   2   .   .   .   .   1   1  13  15   4   3   1   .   .   5  12   2  13   2   1
    zcc8a         204   9   5   5   5   3  13  16   5   3   8   .   1   .   2   3   .   .   1   .   .   .   .   .   .   2   1   .   .   2  11   2   4   1   .
    ccc8a         172  11   1   4   1   3   5   9   1   6   8   3   .   .   .   .   .   3   1   1   .   .   .   2   1   .   .   1   1   4   4   6   4   .   .
    oe            127  16  10   6   2   1   1   .   2   .   .   1   3   3   2   1   2   .   1   .   2   2   1   1   1   1   .   3   .   .   .   1   .   .   1
    oHc8a          83   3   4   1   1   4   2   8   2   1   1   2   .   1   .   .   .   1   .   .   .   1   .   .   .   .   1   1   .   .   .   1   .   1   .
    qoHcc8a       183   4   2   6   .   2   8  15   2   8   2   5   5   .   1   .   1   .   2   .   1   1   .   3   2   .   1   2   .   2   4   5   5   .   .
    qoHc8a        198   6   8   8   2   7  14  14   5   6   4   2   2   1   1   1   2   1   .   .   .   .   .   1   2   3   2   1   2   3   .   .   2   1   .
    qoHa           79  20   1   3   .   .   .   2   1   3   2   2   2   .   1   1   1   .   .   .   .   .   3   4   .   .   .   1   .   .   .   1   .   .   .
    qoHae         113   9   7   9   .   .   .   1   1   2   2   1   .   1   2   2   3   1   1   3   .   .   .   2   5   3   .   .   1   .   .   .   .   1   1
    qoe            81   9   6   9   3   1   .   .   .   1   2   .   .   2   2   .   .   .   .   1   2   1   1   .   .   .   .   2   1   .   .   .   .   1   1
    eccc8a         52   9   3   1   1   .   3   2   3   .   .   .   .   .   1   .   1   .   .   .   .   .   .   .   .   .   .   .   .   .   3   .   .   .   .
    oHcc8a         56   5   3   2   1   2   3   3   4   1   1   .   .   .   .   1   .   .   .   .   .   .   .   .   .   .   1   .   .   .   .   1   .   .   1
    zccc8a         36   .   .   1   1   .   5   3   .   1   3   1   .   .   .   .   .   .   .   .   .   1   .   1   .   .   .   .   .   .   1   1   2   .   .
    ccca           67   4   .   .   1   .   3   1   2   .   2   2   .   .   .   .   .   .   .   .   .   .   1   1   1   .   .   .   1   1   2   1   1   1   .
    zcca           69   3   1   1   2   .   3   4   3   .   .   1   .   .   .   .   .   .   .   .   .   .   .   1   1   .   .   .   .   1   4   1   1   .   .
    cccHca         50   4   1   .   .   .   1   .   4   2   .   1   1   .   .   1   .   1   .   .   .   .   .   1   .   .   .   1   1   2   1   .   2   .   1
    zccHca         37   .   1   .   .   1   .   3   1   .   .   .   .   .   .   .   1   .   .   .   .   .   1   .   .   .   .   .   .   1   .   .   1   .   .
    ccccHca        35   .   .   .   .   .   1   .   2   3   .   .   1   .   .   .   .   .   .   .   .   .   .   .   1   .   2   .   .   1   .   .   .   .   1
    zcccHca        31   .   .   .   1   .   2   2   1   3   1   .   .   .   1   1   .   .   .   .   .   .   .   1   .   1   1   .   .   .   .   1   1   .   .
    zccca          23   .   .   .   1   .   .   2   .   1   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .   1   .   .   1   .   2   .   1   .   2
    cccca          31   .   .   .   .   1   .   .   .   .   .   1   .   .   .   1   .   .   .   1   .   .   .   1   .   1   .   .   .   .   2   .   .   .   .
    zam            31   3   2   2   .   .   .   .   .   .   1   .   .   1   1   2   .   1   .   1   .   .   .   1   .   1   .   .   .   .   .   .   .   .   .
    8am            73   8   3   2   2   2   .   .   .   .   .   .   .   .   5   3   1   2   2   1   .   .   .   1   .   .   1   2   1   .   .   .   .   .   .
    8ar            51   5   4   1   4   .   .   .   .   .   .   .   .   .   .   2   .   1   1   .   .   .   .   .   1   .   1   1   1   .   1   1   1   .   1
    8ae            50   8   4   2   1   .   .   1   .   .   1   2   .   1   .   .   1   .   .   .   .   .   .   2   1   1   .   .   .   .   .   .   .   .   2
    oHae           39   4   1   4   1   .   .   .   .   .   .   .   1   1   .   1   1   .   .   .   1   .   .   .   1   2   .   .   .   .   .   .   .   .   .
    oHam           37   3   3   1   1   2   .   .   .   .   .   .   1   .   1   .   2   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
    oHar           35   1   4   3   2   3   .   1   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .   2   1   .   .   1   .   .   .   .   .   .
    qoHan          54   1   1   3   1   2   .   .   .   .   .   .   .   1   2   .   6   1   .   1   1   1   .   1   3   2   1   2   1   1   .   .   .   .   1
    qoHam          91   5   6   5   1   1   1   1   .   1   .   .   1   .   2   .   2   5   5   1   .   .   2   .   .   1   1   1   .   .   2   1   .   .   .
    qoHar          48   1   7   2   4   .   .   .   .   1   .   .   1   1   1   2   2   1   1   .   .   .   .   .   .   .   .   .   1   .   .   .   .   .   1
    qoHcca         81   3   3   3   1   1   2   6   1   3   .   1   2   .   .   .   .   .   .   .   .   .   .   2   1   1   1   .   .   1   2   .   1   4   1
    oHcca          34   1   .   .   1   .   .   .   .   1   2   1   .   .   .   .   1   .   .   .   .   .   .   1   .   .   .   .   .   .   1   1   1   .   .
    or             40   3   3   1   1   .   .   .   .   .   .   .   .   1   2   .   .   .   1   .   1   .   .   .   .   .   .   .   .   .   .   .   .   .   2
                ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
    TOT          7054 765 204 172 127  83 183 198  79 113  81  52  56  36  67  69  50  37  35  31  23  31  31  73  51  50  39  37  35  54  91  48  81  34  40


  Incidentally:  Rene Zandbergen just posted his guess at the names of the planets:
  
    EVA            Frogguy         JSA                   Tables above   freq 
    -------------  --------------  --------------------  -------------  ----
    okal           olpax           oljciix               oHae             97
    dolchsody      8oxctso89       cgoixcccsocgcy        8oecczo8a         0
    yfain          9ljaiv          cylgciiiu             aPan              0
    ytoaiin        9qpoaiiv        cyqjociiiiu           aHoam             0
    ofar,oeoldain  olja2,ocox8aiv  olgciis,ocoixcgciiiu  oPar,ocoe8an    2,0
    opcholdy       oqjctox89       oqgccoixcgcy          oPccoe8a          0
    okain.am       olpaiv aig      oljciiiu ciiij        oHan aik       16 0
      
97-08-09 stolfi
===============

  I prepared a WWW page with the word pair tables above.
  
  I also prepared analogous tables for the English and Portuguese texts:
  
    cat engl.txt | sed -e 's@$@ //@g' | tr ' ' '\012' | egrep '.' > engl2.wds
    
    cat engl2-keys.dic
    
      //
      the
      a
      an
      and
      of
      in
      on
      at
      to
      for
      with
      as
      up
      but
      i
      you
      he
      she
      it
      is
      was
      had
      be
      my
      his
      her
      him
      me
      that
      mrs
      john
      cynthia
      inglethorp

    cat engl2.wds | tr '[A-Z]' '[a-z]' | head -4661 \
      | enum-word-pairs \
      | count-diword-freqs -v keyfile=engl2-keys.dic \
      > .baz
       
    cat port.txt | sed -e 's@$@ //@g' | tr ' ' '\012' | egrep '.' > port2.wds
     
    cat port2-keys.dic
    
      //
      a
      da
      na
      o
      do
      no
      ao
      as
      das
      os
      dos
      um
      uma
      cada
      de
      em
      por
      para
      e
      ou
      como
      que
      é
      ser
      não
      são
      aresta
      face
      arestas
      faces
      complexo
      vértices
      celular

    cat port2.wds | sed -e 's/^x$/???/g' | head -7000 \
      | enum-word-pairs \
      | count-diword-freqs -v keyfile=port2-keys.dic \
      > .baz 
      
  The results were posted on my Voynich WWW page.
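
  The scripts enum-word-pairs and count-diword-freqs are not listed
  here.  As a rough sketch only, the two steps could look like the
  functions below.  The names enum_pairs and count_key_pairs, the
  "count first second" output format, and the rule of keeping only
  pairs whose two words are both in the key file are assumptions made
  for illustration, not a description of the actual scripts.

```shell
# Hypothetical stand-in for enum-word-pairs:
# reads one word per line, prints each consecutive pair as "prev next".
enum_pairs() {
  awk 'NR > 1 { print prev, $0 } { prev = $0 }'
}

# Hypothetical stand-in for the counting step.
# $1 = key file (one key word per line).
# Keeps only pairs whose two words are both keys (an assumption),
# and prints "count first second", most frequent pairs first.
count_key_pairs() {
  awk -v keyfile="$1" '
    BEGIN { while ((getline k < keyfile) > 0) if (k != "") key[k] = 1 }
    ($1 in key) && ($2 in key) { n[$1 " " $2]++ }
    END { for (p in n) print n[p], p }
  ' | LC_ALL=C sort -k1,1nr -k2,2 -k3,3
}
```

  Usage would be along the lines of
  "cat engl2.wds | enum_pairs | count_key_pairs engl2-keys.dic".
  Since "//" is itself a word in the .wds files, pairs that straddle a
  line break show up as pairs with "//", as in the tables.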
  
  Decided to recompute the tables, adding the left and right probabilities. 
  
     cat .wds \
      | sed -e '/?/s/^.*$/???/g' \
      | enum-word-pairs \
      | count-diword-freqs -v keyfile=.keys \
      > .baz
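
  A minimal sketch of the probability step, assuming that the "right
  probability" of a pair is the percentage of occurrences of the first
  word that are followed by the second word, and the "left probability"
  is the percentage of occurrences of the second word that are preceded
  by the first.  The right-hand reading agrees with the count table
  above: qoHa occurs 79 times and is followed by // 20 times, i.e. 25%.
  The left-hand reading is only a guess by analogy.  The function name
  pair_probs and the "count first second" input format are hypothetical.

```shell
# Input:  lines "count first second" (hypothetical format).
# Output: lines "first second right% left%", rounded to whole percents.
# Row and column totals are summed over the input pairs, so feed it
# all pairs (not only key-word pairs) to get true word totals.
pair_probs() {
  awk '
    { c[$2 " " $3] += $1; row[$2] += $1; col[$3] += $1 }
    END {
      for (p in c) {
        split(p, w, " ")
        printf "%s %s %.0f %.0f\n", w[1], w[2],
          100 * c[p] / row[w[1]], 100 * c[p] / col[w[2]]
      }
    }
  ' | LC_ALL=C sort
}
```

  For example, the input lines "3 a b", "1 a c", "1 d b" give 75% for
  "a" followed by "b" (3 of the 4 occurrences of "a").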

97-08-10 stolfi
===============

  Fiddled with the right-probability table, obtaining the following clustering:
  
                -----  -- -- -- -- -- -- -- -- --  -- -- --  -- -- -- -- -- --  --  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
                                       c  z                      q                                                               
                                 c  z  c  c         z         q  o  q     e  o                                                   
                           c  z  c  c  c  c         c  c  z   o  H  o  o  c  H       q              q        q        q        o 
                           c  c  c  c  c  c  c  z   c  c  c   H  c  H  H  c  c   q   o           o  o     o  o     o  o        H 
                   T       c  c  H  H  H  H  c  c   c  c  c   c  c  c  c  c  c   o   H  q     8  H  H  8  H  H  8  H  H  z     c 
                   O    /  c  c  c  c  c  c  c  c   8  8  8   8  8  c  8  8  8   H   a  o  o  a  a  a  a  a  a  a  a  a  a  o  c 
                   T    /  a  a  a  a  a  a  a  a   a  a  a   a  a  a  a  a  a   a   n  e  e  r  r  r  e  e  e  m  m  m  m  r  a 
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    //            20 ||  |  |  |  |  |  |  |  |  ||  |     || 1| 3| 1|  |     ||  ||  | 1   |        |       1| 1     1  1|  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    cccca         25 ||  |  |  |  |  |  | 3|  | 3||  |     ||  |  |  | 3| 3   ||  ||  |     |        | 3      | 3     6   |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    zccca         52 ||  |  |  | 4|  |  |  |  |  ||  |     || 8|  | 4|  |     ||  ||  |    4|    4   | 4     4|       8   | 8|  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    cccHca        49 || 7|  |  |  | 1|  |  |  | 1||  |    1||  | 1| 3|  | 1  1|| 7|| 3|     |    1   |       3| 1  1  1   | 1|  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    zccHca        27 ||  |  |  | 2|  |  |  |  |  ||  |    2|| 8|  | 2| 2|     || 2|| 2|     |        |        |          2|  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    ccccHca       34 ||  |  |  |  |  |  |  |  |  ||  |     ||  | 2|  |  |    2|| 5|| 2|     | 2      |    5  8|           | 2|  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    zcccHca       54 ||  |  |  |  |  |  |  | 3| 3||  |     || 6| 6| 3|  |     || 3||  | 3  3|       3| 3  3  9| 3         |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    ccca          37 || 5|  |  |  |  |  |  |  |  ||  |     || 1| 4| 1|  | 2   || 2|| 1| 2  1| 1  1  1|        | 1     2  1|  | 1|
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    zcca          39 || 4|  |  |  |  |  |  |  |  ||  | 1  1|| 5| 4| 1|  | 1   || 4|| 1|    2| 1     1|        | 1     5   |  |  |
                =====++==+==+==+==+==+==+==+==+==++==+=====++==+==+==+==+=====++==++==+=====+========+========+===========+==+==+
    zccc8a        58 ||  | 2|  |  |  |  |  |  |  ||  | 2   || 8|13| 5|  | 2   ||  ||  | 8  2|       2|       2| 2     2   |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    ccc8a         46 || 6|  |  |  | 1|  |  |  |  ||  | 2   || 5| 2| 2| 1| 1   ||  || 2| 4   |       3|       3| 1     2   |  |  |
    zcc8a         49 || 4|  |  |  |  |  |  |  | 1||  | 2  2|| 7| 6| 1| 1|     || 2||  | 3  2|        |       1|       5   |  |  |
                =====++==+==+==+==+==+==+==+==+==++==+=====++==+==+==+==+=====++==++==+=====+========+========+===========+==+==+
    qoHc8a        51 || 3|  |  | 1|  |  |  |  |  ||  | 4  4|| 7| 7| 1| 3| 1  1|| 2|| 1| 2  1| 1  1   | 1  1  3|           |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    qoHcc8a       48 || 2|  |  |  |  | 1|  |  |  ||  | 3  1|| 8| 4| 2| 1| 2  2|| 1|| 1| 1   | 1     2|       4| 1  1  2   |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    qoHcca        49 || 3|  |  |  |  |  |  |  |  ||  | 3  3|| 7| 2| 1| 1| 1  2|| 1|| 1|    1| 1      | 1  1  3| 2     2   | 1| 4|
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    oHc8a         43 || 3| 1|  |  | 1|  |  |  |  || 1| 1  4|| 9| 2|  | 4| 2   || 2||  | 1  1|       1|    1  1|    1      |  | 1|
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    eccc8a        51 ||17|  |  | 1|  |  |  | 1|  ||  | 1  5|| 3| 5|  |  |     || 5||  |    1|        |        |       5   |  |  |
    oHcc8a        51 || 8|  |  |  |  |  |  |  | 1||  | 3  5|| 5| 5|  | 3|     || 7||  | 1  1|       1|    1  1|           | 1|  |
                =====++==+==+==+==+==+==+==+==+==++==+=====++==+==+==+==+=====++==++==+=====+========+========+===========+==+==+
    qoHa          60 ||25|  |  | 1|  |  |  | 1| 1||  | 3  1|| 2|  |  |  | 2  2|| 1||  | 2   |       1|       3| 5  1     3|  |  |
                =====++==+==+==+==+==+==+==+==+==++==+=====++==+==+==+==+=====++==++==+=====+========+========+===========+==+==+
    qoHan         61 || 1| 1| 1|11| 1|  | 1| 3|  || 1| 5  1||  |  |  | 3|     ||  || 1|    1| 5  1   | 3  1   | 1  3      | 1|  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    qoe           55 ||11| 1| 2|  |  |  | 1| 2|  || 2|11  7||  |  |  | 1|     ||  ||  | 2  3|    1   |       1|    2     1| 1| 1|
    oe            50 ||12| 1| 1| 1|  |  |  | 1|  || 2| 4  7||  |  |  |  |    2|| 1||  |    1|        |        |    2      |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    8ar           50 || 9|  |  |  | 1| 1|  |  | 3||  | 1  7||  |  | 1|  |     ||  ||  |    7| 1  1  1|    1   |    1  1   | 1|  |
    oHar          54 || 2|  |  |  |  |  |  |  | 2||  | 8 11|| 2|  |  | 8|     ||  ||  |    5| 5  2   | 2      |           |  |  |
    qoHar         54 || 2|  |  | 4| 2| 2|  | 2| 4|| 2| 4 14||  |  |  |  |    2||  ||  |    8|    2   |       2|           | 2|  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    8ae           53 ||15|  |  | 1|  |  |  |  |  || 1| 3  7|| 1|  |  |  | 3   ||  ||  | 1  1| 1      | 1      | 3         | 3|  |
    oHae          46 ||10|  | 2| 2|  |  |  |  | 2|| 2|10  2||  |  |  |  |    2||  ||  |    2| 2      | 5      |           |  |  |
    qoHae         51 || 7|  |  | 2|  |  | 2| 1| 1||  | 7  6||  |  |  |  |     ||  ||  | 1   | 4      | 2     1| 1         |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    8am           49 ||10|  |  | 1| 2| 2| 1| 6| 4||  | 2  4||  |  |  | 2|     ||  ||  |    2|    1   |    1   | 1  2      |  |  |
    oHam          37 || 8|  |  | 5|  |  |  | 2|  ||  | 2  8||  |  |  | 5|    2||  ||  |    2|        |        |           |  |  |
    qoHam         49 || 5|  |  | 2| 5| 5| 1| 2|  ||  | 5  6|| 1| 1|  | 1|    1||  ||  |    1|       1| 1  1  1|    1  2  2|  |  |
    zam           51 || 9|  |  |  | 3|  | 3| 3| 6|| 3| 6  6||  |  |  |  |     ||  ||  | 3   |        | 3      | 3         |  |  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    or            37 || 7|  | 2|  |  | 2|  | 4|  || 2| 2  7||  |  |  |  |     ||  ||  |    2|        |        |           | 4|  |
                -----++--+--+--+--+--+--+--+--+--++--+-----++--+--+--+--+-----++--++--+-----+--------+--------+-----------+--+--+
    oHcca         32 || 2|  |  | 2|  |  |  |  |  ||  |     ||  |  | 2|  | 2   ||  ||  | 5  2|       2|       2| 2     2   |  |  |
                =====++==+==+==+==+==+==+==+==+==++==+=====++==+==+==+==+=====++==++==+=====+========+========+===========+==+==+
    TOT           44 ||10|  |  |  |  |  |  |  |  ||  | 2  2|| 2| 2| 1| 1|     || 1||  | 1  1|        |       1| 1     1   |  |  |
  

  It seems that the ending of one word somewhat determines the beginning
  of the next one.  Here is the same table, with rows and columns
  clustered independently:
  
                -----  -- -- -- -- -- -- -- -- -- -- -- --  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
                                       c  z                     q                                                             
                                 c  z  c  c        z         q  o        q                                   o        e       
                           c  z  c  c  c  c        c  c  z   o  H        o  q  q  q  q                    o  H  o     c       
                           c  c  c  c  c  c  c  z  c  c  c   H  c  q     H  o  o  o  o           o  o  o  H  c  H     c       
                   T       c  c  H  H  H  H  c  c  c  c  c   c  c  o  q  c  H  H  H  H  8  8  8  H  H  H  c  c  c  z  c       
                   O    /  c  c  c  c  c  c  c  c  8  8  8   8  8  H  o  c  a  a  a  a  a  a  a  a  a  a  8  8  c  a  8  o  o 
                   T    /  a  a  a  a  a  a  a  a  a  a  a   a  a  a  e  a  e  m  n  r  r  m  e  e  r  m  a  a  a  m  a  e  r 
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    //            20 ||  |  |  |  |  |  |  |  |  |  |     || 1| 3|  | 1| 1| 1  1      |    1   |        |     |  | 1|  |  |  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    oHcca         32 || 2|  |  | 2|  |  |  |  |  |  |     ||  |  |  | 5| 2| 2  2     2|    2   |        |     |  |  | 2| 2|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    zccca         52 ||  |  |  | 4|  |  |  |  |  |  |     || 8|  |  |  | 4| 4  8      |       4|    4   |     |  |  |  | 4| 8|
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    zccHca        27 ||  |  |  | 2|  |  |  |  |  |  |    2|| 8|  | 2|  | 2|       2   |        |        | 2   |  | 2|  |  |  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    zcccHca       54 ||  |  |  |  |  |  |  | 3| 3|  |     || 6| 6| 3| 3| 3| 9        3|    3  3| 3      |     |  |  |  | 3|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    cccca         25 ||  |  |  |  |  |  | 3|  | 3|  |     ||  |  |  |  |  |    6      |    3  3|        | 3   |  |  | 3|  |  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    cccHca        49 || 7|  |  |  | 1|  |  |  | 1|  |    1||  | 1| 7|  | 3| 3  1  3   |    1   |    1  1|    1|  |  | 1|  | 1|
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    ccccHca       34 ||  |  |  |  |  |  |  |  |  |  |     ||  | 2| 5|  |  | 8     2   | 2      | 5      |    2|  |  |  |  | 2|
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    ccca          37 || 5|  |  |  |  |  |  |  |  |  |     || 1| 4| 2| 2| 1|    2  1  1| 1  1   |    1   |     | 1| 1| 2| 1|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    zcca          39 || 4|  |  |  |  |  |  |  |  |  | 1  1|| 5| 4| 4|  | 1|    5  1  1| 1  1   |        |     |  |  | 1| 2|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    zccc8a        58 ||  | 2|  |  |  |  |  |  |  |  | 2   || 8|13|  | 8| 5| 2  2     2|    2   |        |     |  |  | 2| 2|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    ccc8a         46 || 6|  |  |  | 1|  |  |  |  |  | 2   || 5| 2|  | 4| 2| 3  2  2  3|    1   |        | 1   |  |  | 1|  |  |
    zcc8a         49 || 4|  |  |  |  |  |  |  | 1|  | 2  2|| 7| 6| 2| 3| 1| 1  5      |        |        | 1   |  |  |  | 2|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    qoHc8a        51 || 3|  |  | 1|  |  |  |  |  |  | 4  4|| 7| 7| 2| 2| 1| 3     1   | 1     1| 1  1   | 3  1|  |  | 1| 1|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    qoHcc8a       48 || 2|  |  |  |  | 1|  |  |  |  | 3  1|| 8| 4| 1| 1| 2| 4  2  1  2| 1  1   |       1| 1  2|  |  | 2|  |  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    qoHcca        49 || 3|  |  |  |  |  |  |  |  |  | 3  3|| 7| 2| 1|  | 1| 3  2  1   | 1  2  1| 1      | 1  2| 4|  | 1| 1| 1|
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    oHc8a         43 || 3| 1|  |  | 1|  |  |  |  | 1| 1  4|| 9| 2| 2| 1|  | 1        1|        | 1     1| 4   | 1|  | 2| 1|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    eccc8a        51 ||17|  |  | 1|  |  |  | 1|  |  | 1  5|| 3| 5| 5|  |  |    5      |        |        |     |  |  |  | 1|  |
    oHcc8a        51 || 8|  |  |  |  |  |  |  | 1|  | 3  5|| 5| 5| 7| 1|  | 1        1|        | 1      | 3   |  |  |  | 1| 1|
                =====++==+==+==+==+==+==+==+==+==+==+=====++==+==+==+==+==+===========+========+========+=====+==+==+==+==+==+
    qoHa          60 ||25|  |  | 1|  |  |  | 1| 1|  | 3  1|| 2|  | 1| 2|  | 3        1|    5   |       1|    2|  | 3| 2|  |  |
                =====++==+==+==+==+==+==+==+==+==+==+=====++==+==+==+==+==+===========+========+========+=====+==+==+==+==+==+
    qoHan         61 || 1| 1| 1|11| 1|  | 1| 3|  | 1| 5  1||  |  |  |  |  |       1   | 5  1  3| 1  1  3| 3   |  |  |  | 1| 1|
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    qoe           55 ||11| 1| 2|  |  |  | 1| 2|  | 2|11  7||  |  |  | 2|  | 1         |        |    1  2| 1   | 1| 1|  | 3| 1|
    oe            50 ||12| 1| 1| 1|  |  |  | 1|  | 2| 4  7||  |  | 1|  |  |           |        |       2|    2|  |  |  | 1|  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    8ar           50 || 9|  |  |  | 1| 1|  |  | 3|  | 1  7||  |  |  |  | 1|    1     1| 1      | 1  1  1|     |  |  |  | 7| 1|
    oHar          54 || 2|  |  |  |  |  |  |  | 2|  | 8 11|| 2|  |  |  |  |           | 5     2|    2   | 8   |  |  |  | 5|  |
    qoHar         54 || 2|  |  | 4| 2| 2|  | 2| 4| 2| 4 14||  |  |  |  |  | 2         |        |    2   |    2|  |  |  | 8| 2|
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    8ae           53 ||15|  |  | 1|  |  |  |  |  | 1| 3  7|| 1|  |  | 1|  |           | 1  3  1|        |     |  |  | 3| 1| 3|
    oHae          46 ||10|  | 2| 2|  |  |  |  | 2| 2|10  2||  |  |  |  |  |           | 2     5|        |    2|  |  |  | 2|  |
    qoHae         51 || 7|  |  | 2|  |  | 2| 1| 1|  | 7  6||  |  |  | 1|  | 1         | 4  1  2|        |     |  |  |  |  |  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    8am           49 ||10|  |  | 1| 2| 2| 1| 6| 4|  | 2  4||  |  |  |  |  |           |    1   | 1  1  2| 2   |  |  |  | 2|  |
    oHam          37 || 8|  |  | 5|  |  |  | 2|  |  | 2  8||  |  |  |  |  |           |        |        | 5  2|  |  |  | 2|  |
    qoHam         49 || 5|  |  | 2| 5| 5| 1| 2|  |  | 5  6|| 1| 1|  |  |  | 1  2     1|       1| 1     1| 1  1|  | 2|  | 1|  |
    zam           51 || 9|  |  |  | 3|  | 3| 3| 6| 3| 6  6||  |  |  | 3|  |           |    3  3|        |     |  |  |  |  |  |
                -----++--+--+--+--+--+--+--+--+--+--+-----++--+--+--+--+--+-----------+--------+--------+-----+--+--+--+--+--+
    or            37 || 7|  | 2|  |  | 2|  | 4|  | 2| 2  7||  |  |  |  |  |           |        |        |     |  |  |  | 2| 4|
                =====++==+==+==+==+==+==+==+==+==+==+=====++==+==+==+==+==+===========+========+========+=====+==+==+==+==+==+
    TOT           44 ||10|  |  |  |  |  |  |  |  |  | 2  2|| 2| 2| 1| 1| 1| 1  1      |    1   |        | 1   |  |  |  | 1|  |
  

  Again, with more columns and rows:

            row probabilities
            ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
                        c                                z                                         o                                q                  
                     c  c                    c  z     z  c        e  e                 o           e                                o  q           q   
                     c  c  c        c  z  o  c  c     c  c  z     c  z                 H  o        c                          q  q  H  o  q  q  q  o  q
                     c  c  c  c  z  c  c  H  c  c     c  c  c     c  c     o  o  o  o  c  H        c                 H     q  o  o  c  H  o  o  o  H  o
               T     H  H  c  c  c  c  c  c  c  c  z  H  H  c  z  c  c  o  H  H  H  H  c  c     o  c        8  8  8  c  q  o  H  H  c  c  H  H  H  c  H
               O  /  c  c  c  c  c  8  8  8  8  8  a  c  c  c  o  8  8  H  a  a  a  c  8  c  o  e  8  o  8  a  a  a  8  o  H  a  a  8  8  a  a  c  c  o
               T  /  a  a  a  a  a  a  a  a  a  a  m  a  a  a  e  a  a  a  e  m  r  a  a  a  e  a  a  r  a  e  m  r  a  e  a  e  m  a  a  n  r  a  a  e
            ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    //        27  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  1  .  1  3  3  1  .  .  .  1  .
                                                                                                                                                       
    8a        71 31  2  2  .  .  2  .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  8  .  .  2  .  5  .  2  2  .  2  .  .  .
    oHa       55 27  .  .  .  .  3  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  3  .  3  .  .  3  .  .  .  .  .  .  3  .  .
    oea       99 78  .  .  .  .  8  .  .  .  4  .  .  .  .  .  .  .  .  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  .  .  .
                                                                                                                                                       
    qoHa      65 25  1  .  .  1  1  3  1  .  .  .  3  .  .  .  .  2  .  .  .  2  .  .  2  .  .  .  .  .  .  .  6  .  .  2  1  3  1  .  2  .  1  .  .  1
    Hc8a      63  7  .  .  .  .  .  7  .  7  .  .  .  .  .  .  .  3  .  .  .  3  .  3  .  .  3  .  .  .  .  3  .  7  .  .  .  .  .  .  7  .  .  .  3  .
    oeccc8a   69 26  .  4  .  .  .  .  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  .  .  .  .  .  4  .  .  4  .  4  8  8  .  .  .  .  .  .
    ezcc8a    57 14  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  4  .  4  9  .  4  9  .  .  .  .  .  4
    eccc8a    51 17  1  .  .  1  .  1  5  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  5  .  5  5  3  .  .  .  .  .
    cccc8a    68 10  .  .  5  .  .  .  .  5  .  .  .  .  .  .  .  5  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  5  . 10 10 15  .  .  .  .  .  .
    zccc8a    63  .  .  .  2  .  .  2  .  .  .  .  .  .  .  .  .  2  .  .  .  2  .  .  .  .  2  .  .  .  .  .  2  .  .  8  .  2  5 13  8  .  2  .  5  .
                                                                                                                                                       
    cccHca    57  7  .  .  .  .  1  .  1  .  .  .  .  1  .  .  .  1  .  .  .  1  1  1  1  .  .  .  .  1  .  .  1  .  .  .  7  3  7  1  .  3  .  .  3  .
    zccHca    35  .  2  .  .  .  .  .  2  2  .  .  5  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  8  2  .  5  2  .
    ccccHca   51  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  2  .  5  2  .  .  2  .  .  .  2  2  .  .  .  2  .  .  5  8  5  2  .  2  .  .  .  .
    zcccHca   64  .  .  .  .  3  3  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  3  .  .  .  .  3  3  .  .  3  3  9  9  6  6  .  3  .  3  .
    oHca      47  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  4  .  .  .  .  .  .  .  .  4  4  .  4  .  .  .  .  .  .  9  4  .  4  .  .
    qoHca     72  4  2  .  .  .  .  .  .  4  .  .  2  2  .  .  2  .  6  .  .  .  2  .  .  2  2  .  2  2  .  .  2  .  2  2  6  6  4  .  4  2  .  .  2  .
    oHcca     49  2  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  5  .  2  .  .  .  .  2  2  .  .  .  .  2  .  .  5  .  2  5  .  .  .  2  .  2  2
    qoHcca    58  3  .  .  .  .  .  3  3  1  .  .  .  .  .  .  .  1  1  1  1  2  .  .  2  4  1  .  .  1  .  1  2  1  .  .  1  3  3  2  7  1  .  2  1  .
    cccca     35  .  .  .  .  .  3  .  .  3  .  .  .  .  3  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  3  3  .  .  .  .  . 12  .  .  .  .  3  .  .
    zccca     60  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  4  .  .  8  .  4  .  .  .  .  .  4 17  .  8  .  .  .  4  .
    ccca      43  5  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  2  .  .  .  .  1  .  .  1  1  .  .  .  .  .  1  1  .  2  2  .  8  4  1  1  1  .  1  .
    zcca      50  4  .  .  .  .  .  1  1  .  .  .  .  .  .  .  .  1  1  .  .  .  .  .  .  .  2  .  .  .  .  .  1  1  .  .  4  . 11  4  5  1  1  2  1  1
                                                                                                                                                       
    oHc8a     54  3  .  .  1  .  .  1  4  4  .  1  .  1  .  .  1  2  1  .  1  1  .  1  .  1  1  1  1  .  1  .  .  .  3  1  2  1  .  2  9  .  1  .  .  .
    qoHc8a    57  3  1  .  .  .  .  4  4  3  .  .  .  .  .  .  .  1  .  .  1  1  1  .  1  .  1  .  .  .  .  1  1  1  .  2  2  3  .  7  7  1  .  1  1  .
    oHcc8a    53  8  .  .  .  .  1  3  5  3  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  1  .  .  1  .  .  .  .  .  1  7  1  .  5  5  .  1  1  .  .
    qoHcc8a   57  2  .  1  .  .  .  3  1  1  .  .  .  .  .  .  .  2  .  1  .  1  .  .  2  .  .  1  .  .  1  .  1  1  .  1  1  4  3  4  8  1  2  1  2  1
    ccc8a     57  6  .  .  .  .  .  2  .  1  .  .  .  1  .  .  .  1  .  1  .  .  .  .  .  .  .  .  .  .  1  .  1  .  .  4  .  3  6  2  5  2  3  1  2  .
    zcc8a     58  4  .  .  .  .  1  2  2  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  3  2  1  9  6  7  .  .  .  1  1
                                                                                                                                                       
    8ae       57 15  1  .  .  .  .  3  7  .  .  1  .  .  .  .  .  3  .  .  .  .  .  .  .  .  1  .  3  3  .  1  3  1  .  1  .  .  .  .  1  .  .  .  .  .
    oHae      56 10  2  .  .  .  2 10  2  .  .  2  .  .  .  2  .  .  .  .  .  .  .  .  2  .  2  .  .  .  2  5  2  2  2  .  .  .  2  .  .  .  .  .  .  .
    qoHae     58  7  2  .  .  1  1  7  6  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  1  2  3  4  .  1  .  1  .  .  .  .  .  .  .  .
                                                                                                                                                       
    oe        56 12  1  .  1  1  .  4  7  .  .  2  .  .  .  1  .  .  .  .  .  2  .  .  2  .  1  .  .  .  .  .  2  .  2  .  1  .  .  .  .  .  .  .  .  .
    qoHoe     42  .  .  .  .  4  .  . 14  4  4  4  .  .  .  .  .  .  .  .  .  4  .  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    qoe       61 11  .  .  1  2  . 11  7  1  4  2  1  .  1  2  .  .  .  .  .  2  1  .  .  1  3  .  .  1  1  .  .  .  .  2  .  1  .  .  .  .  .  .  .  .
    zoe       59  .  .  .  3  7  3  7 11  .  3  3  .  .  .  3  .  .  3  .  .  .  .  .  .  .  3  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    8am       58  8  .  1  .  7  2  3  3  2  .  .  .  1  .  .  .  .  .  .  .  2  .  .  .  .  4  .  .  .  2  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .
    oHam      43  3  3  .  .  1  2  1  6  2  .  .  .  .  2  .  1  .  .  .  2  3  2  .  1  .  3  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    qoHam     54  2  3  4  .  1  1  7  5  .  .  .  .  2  .  .  .  .  .  .  1  2  .  .  1  .  2  .  .  .  1  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .
    zam       53  5  .  1  .  1  3  5  7  .  .  1  .  1  1  1  .  .  .  1  .  1  .  .  .  .  3  3  .  .  .  1  1  .  .  1  .  .  .  .  .  1  .  .  .  .
    qoHan     62  1 11  .  1  3  .  5  1  3  1  1  .  1  1  1  .  .  .  .  1  3  1  .  .  .  1  .  .  1  .  3  1  5  .  .  .  .  .  .  .  1  .  .  .  .
    8ar       56  9  .  1  .  .  3  1  7  .  .  .  1  1  .  .  .  .  .  .  1  1  1  1  .  .  7  .  1  1  .  .  .  1  .  .  .  .  1  .  .  .  1  .  1  .
    oHar      62  2  .  .  .  .  2  8 11  8  2  .  .  .  .  .  .  .  .  .  .  2  2  2  .  .  5  .  .  .  .  2  .  5  .  .  .  .  .  .  2  .  .  .  .  .
    qoHar     60  2  4  2  .  2  4  4 14  .  .  2  .  2  .  .  .  .  .  .  .  .  2  .  2  .  8  .  2  2  2  .  .  .  .  .  .  2  .  .  .  .  .  2  .  .
    or        39  7  .  2  .  4  .  2  7  .  .  2  .  .  .  2  .  .  .  .  .  .  .  .  .  .  2  2  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
            ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT       50  5  .  .  .  .  .  2  3  1  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  1  .  .  .  .  .  1  .  .  1  1  1  3  2  3  .  .  .  1  .


            col probabilities
            ---- --  -- -- -- -- -- --  -- -- -- -- -- -- -- --  -- -- -- -- -- -- -- -- -- -- -- --  -- -- -- -- --  --  --  -- -- -- -- -- -- -- -- -- -- --
                                  c                     z                                       o                                                   q         
                         c     c  c            z     z  c                           o           e                      e   e                  q     o  q      
                      c  c     c  c  c      z  c     c  c  z                  o     H  o        c                      c   z      q  q  q  q  o  q  H  o  q   
                      c  c  c  c  c  c      c  c  z  c  c  c         o  o  o  H  o  c  H        c                  H   c   c   q  o  o  o  o  H  o  c  H  o   
               T      c  H  c  c  H  c   z  c  H  c  c  H  c  z   o  H  H  H  c  H  c  c     o  c         8  8  8  c   c   c   o  H  H  H  H  c  H  c  c  H  q
               O  /   8  c  c  8  c  c   a  8  c  c  8  c  c  o   H  a  a  a  8  c  8  c  o  e  8  o   8  a  a  a  8   8   8   H  a  a  a  a  8  c  8  c  o  o
               T  /   a  a  a  a  a  a   m  a  a  a  a  a  a  e   a  e  m  r  a  a  a  a  e  a  a  r   a  e  m  r  a   a   a   a  e  m  n  r  a  a  a  a  e  e
            ---- --  -- -- -- -- -- --  -- -- -- -- -- -- -- --  -- -- -- -- -- -- -- -- -- -- -- --  -- -- -- -- --  --  --  -- -- -- -- -- -- -- -- -- -- --
    //         5  .   2  .  1  .  .  3  44  .  .  2 11  .  4 59   .  2  .  .  3  .  .  5  .  .  4  2   .  5 19  7  .   5   9   2  7 11  9  4  7 13 15 16 14 12

    8a         0  1   .  1  .  .  2  .   .  .  .  1  .  .  .  .   .  .  .  .  1  .  .  .  .  .  .  .   .  .  2  .  .   1   .   .  1  .  .  2  .  .  .  .  .  1
    oHa        0  .   .  .  .  .  .  .   .  .  .  1  .  .  .  .   .  .  .  .  1  .  .  .  .  .  .  .   2  1  .  1  .   .   .   1  .  .  .  .  .  2  .  .  .  .
    oea        0  2   .  .  .  5  .  .   .  .  .  2  .  .  .  .   .  2  .  .  .  .  .  .  .  .  .  .   .  .  .  .  .   .   .   .  .  .  .  .  .  .  .  .  .  .

    qoHa       1  2   1  1  1  .  .  .   5  .  .  1  .  .  .  .   .  .  2  .  .  .  3  .  .  .  .  .   .  .  4  .  .   3   .   1  2  .  .  2  1  .  .  .  4  2
    Hc8a       0  .   1  .  .  .  .  .   .  .  .  .  .  .  .  .   .  .  1  .  2  4  .  .  .  .  .  .   .  1  .  3  .   1   .   .  .  .  .  .  1  .  .  1  .  .
    ezcc8a     0  .   .  .  .  .  .  .   .  .  .  .  .  .  .  .   .  .  .  .  .  .  .  .  .  .  .  .   .  1  .  1  .   .   .   2  .  .  .  .  .  .  1  .  4  1
    eccc8a     0  1   .  1  1  .  .  .   .  1  .  .  .  .  .  .   .  .  .  .  .  .  .  .  .  .  .  .   .  .  .  .  .   .   .   3  .  1  .  .  1  .  1  .  .  .
    oeccc8a    0  .   .  .  .  .  2  .   .  .  .  .  .  .  .  .   .  .  .  .  1  4  .  .  .  .  .  .   .  .  .  .  .   .   .   .  .  .  .  .  .  .  1  .  .  1
    cccc8a     0  .   .  .  .  .  .  3   .  .  .  .  .  .  .  .   .  .  .  .  1  .  .  .  .  .  .  .   .  .  .  .  .   1   .   .  1  .  .  .  .  .  1  .  .  1
    zccc8a     0  .   .  .  .  .  .  3   .  .  .  .  .  .  .  .   .  .  1  .  .  .  .  .  .  .  .  .   .  .  .  .  .   1   .   .  .  .  .  2  1  .  2  2  .  3

    ccccHca    0  .   .  .  .  .  .  .   .  .  .  .  .  .  .  3   .  5  1  .  .  .  1  .  .  .  4  2   .  .  .  1  .   .   4   2  2  .  1  .  .  .  .  .  .  .
    zcccHca    0  .   .  .  1  .  .  .   .  .  .  1  .  .  .  .   .  2  .  .  .  .  .  .  .  .  .  .   .  1  .  .  .   .   .   1  2  1  .  2  1  .  1  1  .  1
    cccHca     0  .   .  .  .  .  .  .   .  .  2  1  .  .  .  .   .  .  1  2  .  4  1  .  .  .  .  2   .  .  .  .  .   1   .   5  1  1  3  .  .  .  .  2  .  .
    zccHca     0  .   .  1  .  .  .  .   3  .  .  .  .  .  .  .   .  .  .  .  1  .  .  .  .  .  .  .   .  .  .  .  .   .   .   1  .  .  1  .  1  4  .  1  .  .
    oHca       0  .   .  .  .  .  .  .   .  .  .  .  .  .  .  .   .  .  1  .  .  .  .  .  .  .  .  .   2  1  .  1  .   1   .   .  .  .  1  .  1  2  .  .  .  .
    qoHca      0  .   .  1  .  .  .  .   1  .  2  .  .  .  .  3   .  .  .  2  2  .  .  2  .  .  4  2   .  .  .  .  3   .  14   3  2  .  1  .  1  .  .  1  .  1
    oHcca      0  .   .  1  .  .  .  .   .  .  .  .  .  .  .  .   7  .  1  .  .  .  .  .  .  4  .  .   .  .  .  .  .   1   .   .  .  .  .  2  .  .  .  1  4  2
    qoHcca     1  .   1  .  .  .  .  .   .  1  .  .  .  .  .  .   3  2  2  .  1  .  3 11  .  .  .  2   .  1  1  1  .   1   4   1  2  1  1  .  3  4  1  1  .  .
    cccca      0  .   .  .  .  .  .  .   .  .  .  1  .  3  .  .   .  .  .  .  1  .  .  .  .  .  .  .   .  1  .  .  .   1   .   .  .  1  .  .  .  2  .  .  .  .
    zccca      0  .   .  1  .  .  .  .   .  .  .  .  .  .  .  .   .  .  .  2  .  .  .  .  .  .  .  4   .  1  .  .  .   .   .   .  .  1  .  .  1  .  .  1  .  .
    ccca       0  .   .  .  .  .  .  .   1  .  .  .  .  .  .  .   .  .  .  2  .  .  .  2  .  .  .  .   .  .  .  1  .   3   .   2  .  2  1  2  .  .  1  1  .  2
    zcca       0  .   .  .  .  .  .  .   .  .  .  .  .  .  .  .   .  .  .  .  .  .  .  .  1  .  .  .   .  .  .  1  .   1   4   3  .  3  1  2  2  4  1  1  4  .

    oHc8a      1  .   .  .  .  .  .  3   .  1  2  .  2  .  .  3   .  2  1  .  4  4  .  2  .  4  4  .   2  .  .  . 11   3   4   2  .  .  .  2  4  .  1  .  .  1
    qoHc8a     3  .   4  3  1  .  .  .   .  3  2  1  2  .  .  .   3  5  2  5  8  4  3  2  1  .  4  .   2  5  1  3  .   3   4   6  5  .  5  .  7  6  7  2  4  4
    oHcc8a     0  .   1  .  .  .  .  .   .  1  .  1  .  .  .  .   .  2  .  .  2  .  .  .  .  .  .  2   .  .  .  .  .   .   .   5  .  .  .  2  1  2  1  .  .  1
    qoHcc8a    2  .   3  1  1  5  5  3   .  .  .  .  .  .  4  .   7  2  2  .  2  4  8  .  .  8  .  .   5  .  2  3  3   9   4   2  7  3  3 10  7  4  4  6  9  2
    ccc8a      2  1   2  .  .  .  2  .   .  .  8  .  .  3  .  3   7  .  1  2  3  .  .  .  .  4  .  .   5  .  1  1  .   5   4   1  5  5  7 12  4  6  2  4  4  9
    zcc8a      3  1   2  .  2  .  2  .   .  2  .  4  .  .  .  3   .  2  1  .  3  .  1  2  3  .  .  .   5  3  .  .  .   .   9   6  2  9  3  4  8  .  7  4 14  9

    8ae        0  1   1  1  .  .  .  .   .  1  .  .  2  .  .  .   .  .  .  .  .  .  .  .  .  .  8  4   .  1  1  1  .   3   .   .  .  .  .  .  .  .  .  .  .  1
    oHae       0  .   2  1  .  .  .  .   .  .  .  1  2  .  4  .   .  .  .  .  .  .  1  .  .  .  .  .   2  3  .  1  3   .   .   .  .  .  .  .  .  .  .  .  .  .
    qoHae      1  1   5  5  2  .  2  .   .  3  2  2  2  9  .  .   .  .  .  2  .  .  .  2  .  4  8  2   5  5  3  9  .   1   .   1  1  .  .  .  .  .  .  .  .  2

    oe         1  2   3  3  2  5  2  6   1  4  .  1  8  .  8  3   .  .  3  .  1  .  5  .  1  4  .  2   .  1  2  1 11   1   .   2  .  .  .  2  .  .  .  .  .  .
    qoHoe      0  .   .  .  1  5  .  .   .  1  .  .  2  .  .  .   .  .  1  .  1  .  1  .  .  .  .  .   .  .  .  .  .   .   .   .  .  .  .  .  .  .  .  .  .  .
    qoe        1  1   5  .  2 21  .  3   1  2  .  .  5  3  8  .   .  .  2  2  1  .  .  2  2  .  .  2   2  .  .  .  .   .   .   .  .  .  .  .  .  .  .  .  .  2
    zoe        0  .   1  .  2  5  .  3   .  1  .  1  2  .  4  .   .  .  .  .  .  .  .  .  .  .  .  2   .  .  .  .  .   .   4   .  .  .  .  .  .  .  .  .  .  .
    8am        1  1   2  1 11  5  5  .   1  1  5  4  .  3  .  .   3  2  3  2  3  4  .  .  3  4  .  .   8  .  3  .  .   .   .   .  .  .  .  .  .  .  .  .  .  .
    oHam       0  .   .  5  1  .  .  .   .  2  .  2  .  6  .  3   .  5  3  5  2  .  1  .  2  8  .  .   .  .  .  .  .   .   .   .  .  .  .  .  .  .  .  .  .  .
    qoHam      3  .   8 13  4 10 28  .   3  5 13  5  2  6  .  3   3  7  7  .  2  9  5  .  4  .  8  .   8  3  .  3  .   .   .   .  .  1  .  2  .  .  .  1  .  .
    zam        0  .   1  .  1  .  2  .   .  1  2  2  2  3  4  .   3  .  1  .  .  .  .  .  1  8  .  .   .  1  .  .  .   .   .   .  .  .  1  .  .  .  .  .  .  1
    qoHan      0  .   1 11  2  5  .  3   .  .  2  .  2  3  4  .   .  2  2  2  2  .  .  .  .  .  .  2   .  3  .  5  .   .   .   .  .  .  1  .  .  .  .  .  .  .
    8ar        0  .   .  .  .  .  2  .   1  1  2  2  .  .  .  .   .  2  1  2  .  4  .  .  3  .  4  2   .  .  .  1  .   .   .   .  .  .  .  2  .  .  .  1  .  .
    oHar       0  .   1  .  .  5  .  .   .  1  .  1  .  .  .  .   .  .  1  2  3  4  .  .  1  .  .  .   .  1  .  3  .   .   .   .  .  .  .  .  .  .  .  .  .  .
    qoHar      0  .   1  3  1  .  2  .   .  3  2  2  2  .  .  .   .  .  .  2  .  .  1  .  3  .  4  2   2  .  .  .  .   .   .   .  .  .  .  .  .  2  .  .  .  .
    or         0  .   .  .  2  .  2  .   .  1  .  .  2  .  4  .   .  .  .  .  .  .  .  .  .  4  .  4   .  .  .  .  .   .   .   .  .  .  .  .  .  .  .  .  .  .
            ---- --  -- -- -- -- -- --  -- -- -- -- -- -- -- --  -- -- -- -- -- -- -- -- -- -- -- --  -- -- -- -- --  --  --  -- -- -- -- -- -- -- -- -- -- --
    TOT       50 27  61 71 52 73 68 32  69 55 51 52 58 41 47 91  43 51 52 45 60 52 42 38 45 56 56 47  59 57 62 66 35  63  71  60 56 59 49 56 58 58 56 53 66 67

  Tried to recompute the table, collapsing the prefixes and suffixes into categories:
  
    --- collapse-words ------------------------
    #! /n/gnu/bin/sed -f
    s/^cc\(..\)$/K\1/g
    s/^zc\(..\)$/K\1/g
    s/^ccc\(..\)$/K\1/g
    s/^zcc\(..\)$/K\1/g
    s/^cccH\(..\)$/K\1/g
    s/^zccH\(..\)$/K\1/g
    s/^ccccH\(..\)$/K\1/g
    s/^zcccH\(..\)$/K\1/g
    s/^cccc\(..\)$/K\1/g
    s/^zccc\(..\)$/K\1/g
    s/^qoH\(..\)$/Q\1/g
    s/^qoHc\(..\)$/Q\1/g
    s/^qoHcc\(..\)$/Q\1/g
    s/^oH\(..\)$/O\1/g
    s/^oHc\(..\)$/O\1/g
    s/^oHcc\(..\)$/O\1/g
    s/^8\(..\)$/B\1/g
    -------------------------------------------
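
  The effect of the sed script above can be sketched in Python as follows.
  This is only an illustration: the names PREFIX_CLASSES and collapse_word
  are hypothetical, and the sketch relies on the observation that every rule
  has the form "known prefix followed by exactly two characters", so at most
  one rule can apply to a given word.

```python
# Prefix groups copied from the sed rules; the class letters K, Q, O, B
# are the replacement letters used in the script.
PREFIX_CLASSES = {
    "K": ["cc", "zc", "ccc", "zcc", "cccH", "zccH", "ccccH", "zcccH", "cccc", "zccc"],
    "Q": ["qoH", "qoHc", "qoHcc"],
    "O": ["oH", "oHc", "oHcc"],
    "B": ["8"],
}

# Invert the table: prefix -> class letter.
CLASS_OF_PREFIX = {
    prefix: label
    for label, prefixes in PREFIX_CLASSES.items()
    for prefix in prefixes
}


def collapse_word(word):
    """Replace a listed prefix by its class letter, keeping the last two characters.

    Words whose prefix (everything but the last two characters) is not
    listed are returned unchanged, as in the sed script.
    """
    prefix, suffix = word[:-2], word[-2:]
    label = CLASS_OF_PREFIX.get(prefix)
    return label + suffix if label else word
```

  For example, collapse_word("qoHcc8a") gives "Q8a", while "qoHa" and
  "eccc8a" are left alone because their prefixes are not listed.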

    cat .keys \
      | collapse-words \
      | sort | uniq \
      > .keys.cooked
      
    cat .wds \
      | sed -e 's/c?m$/am/g' \
      | sed -e '/?/s/^.*$/???/g' \
      | collapse-words \
      | enum-word-pairs \
      | count-diword-freqs -v keyfile=.keys.cooked \
      > .baz
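
  The scripts enum-word-pairs and count-diword-freqs are not reproduced in
  this notebook. A minimal sketch of the kind of tabulation they appear to
  perform, assuming that consecutive tokens (including the "//" line
  markers) are paired and counted, might look like this. The function names
  are hypothetical, the restriction to the words in the key file is
  ignored, and the rounding to whole percents is an assumption.

```python
from collections import Counter


def count_pairs(tokens):
    """Count each (word, next word) pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


def next_word_percentages(pair_counts, word):
    """Percentage of each token that follows `word`, in whole percents."""
    followers = {nxt: n for (cur, nxt), n in pair_counts.items() if cur == word}
    total = sum(followers.values())
    if total == 0:
        return {}
    return {nxt: round(100 * n / total) for nxt, n in followers.items()}
```

  Normalizing the same counts by the total of the second word instead of
  the first would give the corresponding "prev word" percentages.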

  Here are the results, with 0 and 1 mapped to "."

           raw pair counts
           ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                       e                    
                                                                                       c                    
                                                                                       c           q        
               T       B   B   B   K   K   O   O   O   O   O   Q   Q   Q   Q   Q   Q   c           o   q   z
               O   /   a   a   a   8   c   8   a   a   a   c   8   a   a   a   a   c   8   o   o   H   o   a
               T   /   e   m   r   a   a   a   e   m   r   a   a   e   m   n   r   a   a   e   r   a   e   m
           ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
    //       765   .   3  20   4  10   5   3   .   .   .   2  43   9  24   5   2  19   3   .   .   2  10  23
    Bae       50   8   .   2   .   7   .   .   .   .   .   .   .   .   .   .   .   .   2   .   2   .   .   .
    Bam      100   9   .   4   .  10  17   3   .   3   .   .   .   .   .   .   .   .   .   5   .   .   .   1
    Bar       51   5   .   .   .   6   4   .   .   .   .   .   .   .   .   .   .   .   .   4   .   .   .   1
    K8a      452  25   2   4   .  17  14   8   2   3   .   .  56  12  35   8  10  14   5   7   .   6  20   .
    Kca      343  11   3   5   3   4   8   4   3   2   3   2  22   9  31   6   3  13   5   5   4  13   3   3
    O8a      139   8   .   .   .  11   3   6   2   .   .   2  16   2   .   .   2   .   2   2   .   6   2   .
    Oae       40   4   2   .   .   7   3   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
    Oam       76   3   .   .   .   7   8   3   3   3   2   .   .   .   .   .   .   .   .   3   .   .   .   .
    Oar       36   .   .   .   2   8   .   3   .   .   .   .   .   .   .   .   .   .   .   2   .   .   .   .
    Oca       61   4   .   2   .   .   .   .   .   2   .   .   2   .   2   .   .   2   2   .   .   .   2   .
    Q8a      382  10   3   6   4  26  11  16   3   4   2   3  51  14   8   5   5  13   7   2   .   7   6   .
    Qae      114   9   3   4   5  17  12   .   .   .   .   .   .   2   .   .   .   .   .   .   .   .   2   .
    Qam      200   5   2   .   2  30  31   5   3   6   .   2   2   .   4   .   .   .   .   6   .   .   .   2
    Qan       54   .   2   .   3   6  12   2   .   2   .   .   .   .   .   .   .   .   .   .   .   .   .   .
    Qar       49   .   .   .   .  11   8   .   .   .   .   .   .   .   .   .   .   .   .   4   .   .   .   .
    Qca      132   5   .   3   .   6   2   5   .   2   .   5  11   6   6   2   .   5   .   2   2   6   .   1
    eccc8a    52   9   .   .   .   4   2   .   .   .   .   .   6   .   3   .   .   .   .   .   .   3   .   .
    oe       127  16   .   3   .  22  10   4   .   3   .   .   .   .   .   .   .   .   .   2   .   2   .   1
    or        40   3   .   .   .   6   4   .   .   .   .   .   .   .   .   .   .   .   .   .   2   .   .   .
    qoHa      79  20   .   5   .   4   3   2   .   2   .   .   2   3   .   .   .   .   2   .   .   .   2   3
    qoe       81   9   .   .   .  22   6   .   .   2   .   .   .   .   .   .   .   .   .   3   .   .   2   1
    zam       52   3   .   .   .   8   7   .   .   .   .   .   .   .   .   .   .   .   .   2   .   .   .   .
           ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
    TOT     7054 765  50 100  51 452 343 139  40  76  36  61 382 114 200  54  49 132  52 127  40  79  81  52

                next word probabilities
                ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
                                                                         e               
                                                                         c               
                                                                         c        q      
                   T     B  B  B  K  K  O  O  O  O  O  Q  Q  Q  Q  Q  Q  c        o  q  z
                   O  /  a  a  a  8  c  8  a  a  a  c  8  a  a  a  a  c  8  o  o  H  o  a
                   T  /  e  m  r  a  a  a  e  m  r  a  a  e  m  n  r  a  a  e  r  a  e  m
                ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    //            24  .  .  2  .  .  .  .  .  .  .  .  5  .  3  .  .  2  .  .  .  .  .  3
    Kca           48  3  .  .  .  .  2  .  .  .  .  .  6  2  9  .  .  3  .  .  .  3  .  .
    Qca           56  3  .  2  .  4  .  3  .  .  .  3  8  4  4  .  .  3  .  .  .  4  .  .
    Oca           40  6  .  3  .  .  .  .  .  3  .  .  3  .  3  .  .  3  3  .  .  .  3  .
    K8a           55  5  .  .  .  3  3  .  .  .  .  . 12  2  7  .  2  3  .  .  .  .  4  .
    O8a           48  5  .  .  .  7  2  4  .  .  .  . 11  .  .  .  .  .  .  .  .  4  .  .
    Q8a           53  2  .  .  .  6  2  4  .  .  .  . 13  3  2  .  .  3  .  .  .  .  .  .
    eccc8a        53 17  .  .  .  7  3  .  .  .  .  . 11  .  5  .  .  .  .  .  .  5  .  .
    Bam           54  8  .  3  .  9 16  2  .  2  .  .  .  .  .  .  .  .  .  4  .  .  .  .
    Oam           43  3  .  .  .  9 10  3  3  3  2  .  .  .  .  .  .  .  .  3  .  .  .  .
    Qam           51  2  .  .  . 14 15  2  .  2  .  .  .  .  .  .  .  .  .  2  .  .  .  .
    zam           48  5  .  .  . 15 13  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .
    Bar           56  9  .  .  . 11  7  .  .  .  .  .  .  .  .  .  .  .  .  7  .  .  .  1
    Oar           61  2  2  .  5 22  2  8  .  2  2  2  2  .  .  .  .  .  .  5  .  .  .  .
    Qar           59  2  .  .  . 22 16  2  .  .  2  .  .  2  .  .  .  2  .  8  2  .  .  .
    Qan           62  .  3  .  5 11 22  3  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    Bae           53 15  .  3  . 13  .  .  .  .  .  .  .  .  .  .  .  .  3  .  3  .  .  .
    Oae           57  9  4  2  2 17  7  2  .  .  .  2  .  2  2  .  .  .  .  2  .  .  .  .
    Qae           53  7  2  3  4 14 10  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    or            39  7  .  .  . 14  9  .  .  .  .  .  .  .  .  .  .  .  .  2  4  .  .  .
    qoe           61 11  .  .  . 27  7  .  .  2  .  .  .  .  .  .  .  .  .  3  .  .  2  1
    oe            54 12  .  2  . 17  7  3  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    qoHa          64 25  .  6  .  5  3  2  .  2  .  .  2  3  .  .  .  .  2  .  .  .  2  3
                ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT           49 10  .  .  .  6  4  .  .  .  .  .  5  .  2  .  .  .  .  .  .  .  .  .

                prev word probabilities
                ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
                                                                         e               
                                                                         c               
                                                                         c        q      
                   T     B  B  B  K  K  O  O  O  O  O  Q  Q  Q  Q  Q  Q  c        o  q  z
                   O  /  a  a  a  8  c  8  a  a  a  c  8  a  a  a  a  c  8  o  o  H  o  a
                   T  /  e  m  r  a  a  a  e  m  r  a  a  e  m  n  r  a  a  e  r  a  e  m
                ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    //            10  .  5 19  7  2  .  2  2  .  .  3 11  7 11  9  4 14  5  .  2  2 12 44
    K8a            6  3  3  3  .  3  4  5  4  3  2  . 14 10 17 14 20 10  9  5  .  7 24  .
    O8a            .  .  .  .  .  2  .  4  4  .  .  3  4  .  .  .  4  .  3  .  2  7  2  .
    Q8a            5  .  5  5  7  5  3 11  7  5  5  4 13 12  3  9 10  9 13  .  .  8  7  .
    eccc8a         0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .
    Bae            0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  4  .  .  .
    Oae            0  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    Qae            .  .  5  3  9  3  3  .  .  .  2  .  .  .  .  .  .  .  .  .  2  .  2  .
    Bam            .  .  .  3  .  2  4  2  2  3  2  .  .  .  .  .  .  .  .  3  .  .  .  1
    Oam            .  .  .  .  .  .  2  2  7  3  5  .  .  .  .  .  .  .  .  2  .  .  .  .
    Qam            2  .  3  .  3  6  9  3  7  7  .  3  .  .  .  .  2  .  .  4  .  .  .  3
    zam            0  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    Qan            0  .  3  .  5  .  3  .  2  2  2  .  .  .  .  .  .  .  .  .  2  .  .  .
    Bar            0  .  .  .  .  .  .  .  2  .  2  .  .  .  .  .  2  .  .  3  2  .  .  1
    Oar            0  .  .  .  3  .  .  2  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .
    Qar            0  .  .  .  .  2  2  .  .  .  2  .  .  .  .  .  .  .  .  3  2  .  .  .
    Kca            4  .  5  4  5  .  2  2  7  2  8  3  5  7 15 11  6  9  9  3  9 16  3  5
    Oca            0  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  2  .  3  .  .  .  2  .
    Qca            .  .  .  2  .  .  .  3  2  2  2  8  2  5  2  3  .  3  .  .  4  7  .  1
    oe             .  2  .  2  .  4  2  2  .  3  .  .  .  .  .  .  2  .  .  .  2  2  .  1
    qoe            .  .  .  .  .  4  .  .  .  2  2  .  .  .  .  .  .  .  .  2  2  .  2  1
    qoHa           .  2  .  4  .  .  .  .  .  2  .  .  .  2  .  .  2  .  3  .  .  .  2  5
    or             0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .
                ---- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT           49 22 51 60 58 55 50 48 52 49 44 39 56 54 58 53 55 53 59 44 44 59 64 69

97-08-11 stolfi
===============

  Recomputed word pair tables with the new count-diword-freqs:
  
    cat .wds \
      | sed -e 's/c?m$/am/g' \
      | sed -e '/?/s/^.*$/???/g' \
      > .fix.wds
      
    cat .fix.wds \
      | sort | uniq \
      > .fix.dic
      
    cat .fix.wds \
      | enum-word-pairs \
      | count-diword-freqs -v rows=.fix.dic -v cols=.ckeys \
      > .baz
      
    tac .fix.wds \
      | sed -e '/=/d' \
      | enum-word-pairs \
      | count-diword-freqs -v rows=.fix.dic -v cols=.rkeys \
      > .bar
      
  I took all words with 8 or more occurrences, and looked at the
  probabilities "in" and "fn" of the "word" occurring at
  beginning-of-line and end-of-line, respectively.
  
  I added the two probabilities, and got the probability "ex" of each word 
  occurring at an extremal position in the line.  
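
  The "in", "fn" and "ex" figures can be approximated directly from a token
  stream in which "//" separates lines. The sketch below is hypothetical
  (line_position_stats is not one of the scripts used here) and uses plain
  rounded ratios. The tables in this entry show values such as 99 for a
  word that seems to occur only line-initially, so the original tool
  evidently computes or prints the figures somewhat differently.

```python
from collections import Counter


def line_position_stats(tokens, sep="//", min_freq=8):
    """Return {word: (freq, in, fn, ex)} with in/fn/ex in whole percents.

    `in` is the share of occurrences that start a line, `fn` the share
    that end a line, and `ex` their sum.
    """
    freq, first, last = Counter(), Counter(), Counter()
    line = []
    for tok in list(tokens) + [sep]:  # trailing separator flushes the last line
        if tok == sep:
            if line:
                first[line[0]] += 1
                last[line[-1]] += 1
            line = []
        else:
            freq[tok] += 1
            line.append(tok)
    stats = {}
    for word, n in freq.items():
        if n >= min_freq:
            p_in = round(100 * first[word] / n)
            p_fn = round(100 * last[word] / n)
            stats[word] = (n, p_in, p_fn, p_in + p_fn)
    return stats
```

  Note that a word standing alone on a line is counted as both initial and
  final, so "ex" can overstate the true probability when one-word lines
  are common.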
  
  Here are the words, sorted by the probability "in" of being line-initial:
  
    word         freq in fn ex
    ------------ ---- -- -- --
    Poe             8 99  0 99
    8zcc8a         17 88  0 88
    zor            10 79  0 79
    azcc8a          8 74  0 74
    8ccc8a          9 66  0 66
    zoe            25 59  0 59
    Hccc8a         14 49  0 49
    zam            52 44  5 49
    Pccc8a         14 42  7 49
    aHcc8a         12 41  8 49
    8an            12 33  0 33
    zae            14 28 14 42
    zar            11 27 27 54
    aHc8a          12 24  8 32
    eoe            17 23 58 81
    zccor           9 22  0 22
    8am           100 19  8 27
    qoHccc8a       11 18  9 27
    zcoe           11 18  0 18
    8oe            17 17  0 17
    ???          1294 17 18 35
    qoHcca         81 16  3 19
    qoHcc8a       183 15  2 17
    oeHcc8a        14 14  0 14
    qoHoe          21 14  0 14
    qoHca          43 13  4 17
    qoPccc8a        8 12 12 24
    qoe            81 12 11 23
    qoeccca         8 12  0 12
    cccoe          17 11  0 11
    qoHam         200 11  2 13
    zccc8a         36 11  0 11
    oeHcca         19 10  0 10
    ccoe           11  9  0  9
    eor            10  9 39 48
    ezcc8a         21  9 14 23
    qoHan          54  9  1 10
    oeccca         12  8  8 16
    8ar            51  7  9 16
    oezcc8a        14  7 14 21
    qoHae         113  7  7 14
    qoHc8a        198  7  3 10
    Ham            16  6  0  6
    oHan           16  6 12 18
    8ae            50  5 15 20
    eccc8a         52  5 17 22
    oHcca          34  5  2  7
    zccoe          17  5  0  5
    oHc8a          83  4  3  7
    oeccc8a        23  4 26 30
    qoHar          48  4  2  6
    zcca           69  4  4  8
    zccca          23  4  0  4
    cccca          31  3  0  3
    ccc8a         172  2  6  8
    oHae           39  2 10 12
    or             40  2  7  9
    qoHa           79  2 25 27
    ccca           67  1  5  6
    8a             35  0 31 31
    Hae            10  0  9  9
    Hc8a           25  0  7  7
    Hcc8a          14  0  0  0
    aHcca           9  0  0  0
    ae             12  0 24 24
    am             20  0  9  9
    cc8a           16  0 12 12
    cccHa          12  0  0  0
    cccHc8a         8  0  0  0
    cccHca         50  0  7  7
    cccc8a         19  0 10 10
    ccccHa         12  0  0  0
    ccccHca        35  0  0  0
    cccz            8  0  0  0
    e8a             8  0 74 74
    eHam            8  0  0  0
    eHc8a           8  0  0  0
    eccca          15  0 19 19
    oHa            25  0 27 27
    oHam           76  0  3  3
    oHar           35  0  2  2
    oHca           21  0  4  4
    oHcc8a         56  0  8  8
    oHoe           11  0  0  0
    oPccc8a        14  0  7  7
    oe            127  0 12 12
    oe8a            9  0 77 77
    oeHa           10  0  0  0
    oeHam          22  0  4  4
    oeHc8a         19  0 21 21
    oea            23  0 78 78
    oeoe            8  0 49 49
    oeor           13  0 23 23
    oezcca          8  0 12 12
    qoHccca         8  0  0  0
    ram            14  0 21 21
    roe             9  0 33 33
    zca             9  0 11 11
    zcc8a         204  0  4  4
    zccHa          14  0  7  7
    zccHca         37  0  0  0
    zccHcca         8  0  0  0
    zcccHa         12  0  8  8
    zcccHca        31  0  0  0

  By the probability "fn" of being line-final:

    word         freq in fn ex
    ------------ ---- -- -- --
    oea            23  0 78 78
    oe8a            9  0 77 77
    e8a             8  0 74 74
    eoe            17 23 58 81
    oeoe            8  0 49 49
    eor            10  9 39 48
    roe             9  0 33 33
    8a             35  0 31 31
    oHa            25  0 27 27
    zar            11 27 27 54
    oeccc8a        23  4 26 30
    qoHa           79  2 25 27
    ae             12  0 24 24
    oeor           13  0 23 23
    oeHc8a         19  0 21 21
    ram            14  0 21 21
    eccca          15  0 19 19
    ???          1294 17 18 35
    eccc8a         52  5 17 22
    8ae            50  5 15 20
    ezcc8a         21  9 14 23
    oezcc8a        14  7 14 21
    zae            14 28 14 42
    cc8a           16  0 12 12
    oHan           16  6 12 18
    oe            127  0 12 12
    oezcca          8  0 12 12
    qoPccc8a        8 12 12 24
    qoe            81 12 11 23
    zca             9  0 11 11
    cccc8a         19  0 10 10
    oHae           39  2 10 12
    8ar            51  7  9 16
    Hae            10  0  9  9
    am             20  0  9  9
    qoHccc8a       11 18  9 27
    8am           100 19  8 27
    aHc8a          12 24  8 32
    aHcc8a         12 41  8 49
    oHcc8a         56  0  8  8
    oeccca         12  8  8 16
    zcccHa         12  0  8  8
    Hc8a           25  0  7  7
    Pccc8a         14 42  7 49
    cccHca         50  0  7  7
    oPccc8a        14  0  7  7
    or             40  2  7  9
    qoHae         113  7  7 14
    zccHa          14  0  7  7
    ccc8a         172  2  6  8
    ccca           67  1  5  6
    zam            52 44  5 49
    oHca           21  0  4  4
    oeHam          22  0  4  4
    qoHca          43 13  4 17
    zcc8a         204  0  4  4
    zcca           69  4  4  8
    oHam           76  0  3  3
    oHc8a          83  4  3  7
    qoHc8a        198  7  3 10
    qoHcca         81 16  3 19
    oHar           35  0  2  2
    oHcca          34  5  2  7
    qoHam         200 11  2 13
    qoHar          48  4  2  6
    qoHcc8a       183 15  2 17
    qoHan          54  9  1 10
    8an            12 33  0 33
    8ccc8a          9 66  0 66
    8oe            17 17  0 17
    8zcc8a         17 88  0 88
    Ham            16  6  0  6
    Hcc8a          14  0  0  0
    Hccc8a         14 49  0 49
    Poe             8 99  0 99
    aHcca           9  0  0  0
    azcc8a          8 74  0 74
    cccHa          12  0  0  0
    cccHc8a         8  0  0  0
    ccccHa         12  0  0  0
    ccccHca        35  0  0  0
    cccca          31  3  0  3
    cccoe          17 11  0 11
    cccz            8  0  0  0
    ccoe           11  9  0  9
    eHam            8  0  0  0
    eHc8a           8  0  0  0
    oHoe           11  0  0  0
    oeHa           10  0  0  0
    oeHcc8a        14 14  0 14
    oeHcca         19 10  0 10
    qoHccca         8  0  0  0
    qoHoe          21 14  0 14
    qoeccca         8 12  0 12
    zccHca         37  0  0  0
    zccHcca         8  0  0  0
    zccc8a         36 11  0 11
    zcccHca        31  0  0  0
    zccca          23  4  0  4
    zccoe          17  5  0  5
    zccor           9 22  0 22
    zcoe           11 18  0 18
    zoe            25 59  0 59
    zor            10 79  0 79

  By probability "ex" of being line-extreme:

    word         freq in fn ex
    ------------ ---- -- -- --
    Poe             8 99  0 99
    8zcc8a         17 88  0 88
    eoe            17 23 58 81
    zor            10 79  0 79
    oea            23  0 78 78
    oe8a            9  0 77 77
    azcc8a          8 74  0 74
    e8a             8  0 74 74
    8ccc8a          9 66  0 66
    zoe            25 59  0 59
    zar            11 27 27 54
    Hccc8a         14 49  0 49
    Pccc8a         14 42  7 49
    aHcc8a         12 41  8 49
    oeoe            8  0 49 49
    zam            52 44  5 49
    eor            10  9 39 48
    zae            14 28 14 42
    ???          1294 17 18 35
    8an            12 33  0 33
    roe             9  0 33 33
    aHc8a          12 24  8 32
    8a             35  0 31 31
    oeccc8a        23  4 26 30
    8am           100 19  8 27
    oHa            25  0 27 27
    qoHa           79  2 25 27
    qoHccc8a       11 18  9 27
    ae             12  0 24 24
    qoPccc8a        8 12 12 24
    ezcc8a         21  9 14 23
    oeor           13  0 23 23
    qoe            81 12 11 23
    eccc8a         52  5 17 22
    zccor           9 22  0 22
    oeHc8a         19  0 21 21
    oezcc8a        14  7 14 21
    ram            14  0 21 21
    8ae            50  5 15 20
    eccca          15  0 19 19
    qoHcca         81 16  3 19
    oHan           16  6 12 18
    zcoe           11 18  0 18
    8oe            17 17  0 17
    qoHca          43 13  4 17
    qoHcc8a       183 15  2 17
    8ar            51  7  9 16
    oeccca         12  8  8 16
    oeHcc8a        14 14  0 14
    qoHae         113  7  7 14
    qoHoe          21 14  0 14
    qoHam         200 11  2 13
    cc8a           16  0 12 12
    oHae           39  2 10 12
    oe            127  0 12 12
    oezcca          8  0 12 12
    qoeccca         8 12  0 12
    cccoe          17 11  0 11
    zca             9  0 11 11
    zccc8a         36 11  0 11
    cccc8a         19  0 10 10
    oeHcca         19 10  0 10
    qoHan          54  9  1 10
    qoHc8a        198  7  3 10
    Hae            10  0  9  9
    am             20  0  9  9
    ccoe           11  9  0  9
    or             40  2  7  9
    ccc8a         172  2  6  8
    oHcc8a         56  0  8  8
    zcca           69  4  4  8
    zcccHa         12  0  8  8
    Hc8a           25  0  7  7
    cccHca         50  0  7  7
    oHc8a          83  4  3  7
    oHcca          34  5  2  7
    oPccc8a        14  0  7  7
    zccHa          14  0  7  7
    Ham            16  6  0  6
    ccca           67  1  5  6
    qoHar          48  4  2  6
    zccoe          17  5  0  5
    oHca           21  0  4  4
    oeHam          22  0  4  4
    zcc8a         204  0  4  4
    zccca          23  4  0  4
    cccca          31  3  0  3
    oHam           76  0  3  3
    oHar           35  0  2  2
    Hcc8a          14  0  0  0
    aHcca           9  0  0  0
    cccHa          12  0  0  0
    cccHc8a         8  0  0  0
    ccccHa         12  0  0  0
    ccccHca        35  0  0  0
    cccz            8  0  0  0
    eHam            8  0  0  0
    eHc8a           8  0  0  0
    oHoe           11  0  0  0
    oeHa           10  0  0  0
    qoHccca         8  0  0  0
    zccHca         37  0  0  0
    zccHcca         8  0  0  0
    zcccHca        31  0  0  0

  Since there are 765 occurrences of "//" in about 6900 words, the
  expected probability of a word occurring at a specific end of a line
  is about 12%, and about 24% for either end.
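
  A minimal sketch of how the "in" and "fn" columns and the baseline
  could be computed.  It assumes that "in" and "fn" are the
  percentages of a word's occurrences that come right after and right
  before a "//" line-break marker; the function name line_end_stats
  and the flat token-list input are hypothetical, not the tools
  actually used here.  With the counts quoted above, 765/6900 comes to
  a little over 11%, consistent with the rounded 12% figure.

```python
from collections import Counter

def line_end_stats(tokens, sep="//"):
    """Return {word: (freq, in_pct, fn_pct)} for a token list with line-break markers."""
    freq, initial, final = Counter(), Counter(), Counter()
    prev = sep  # assumption: the start of the text counts as following a line break
    for i, tok in enumerate(tokens):
        if tok == sep:
            prev = tok
            continue
        freq[tok] += 1
        if prev == sep:
            initial[tok] += 1          # word is first on its line
        nxt = tokens[i + 1] if i + 1 < len(tokens) else sep
        if nxt == sep:
            final[tok] += 1            # word is last on its line
        prev = tok
    return {w: (n, round(100 * initial[w] / n), round(100 * final[w] / n))
            for w, n in freq.items()}

# Baseline from the counts quoted in the text: 765 line breaks, about 6900 words.
baseline = 765 / 6900

# Toy input (made up for illustration, not VMs data).
stats = line_end_stats("a b c // a d // e a".split())
```

  On real data the token list would come from the word file with the
  line breaks kept as "//" tokens.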
    
  Taking 12% as the split point for "in" or "fn", we get the following 
  tentative categories:
  
  Extremists:

    word         freq in fn ex
    ------------ ---- -- -- --
    ???          1294 17 18 35
    eoe            17 23 58 81
    qoPccc8a        8 12 12 24
    zae            14 28 14 42
    zar            11 27 27 54

  Finalists:

    word         freq in fn ex
    ------------ ---- -- -- --
    8a             35  0 31 31
    8ae            50  5 15 20
    ae             12  0 24 24
    cc8a           16  0 12 12
    e8a             8  0 74 74
    eccc8a         52  5 17 22
    eccca          15  0 19 19
    eor            10  9 39 48
    ezcc8a         21  9 14 23
    oHa            25  0 27 27
    oHan           16  6 12 18
    oe            127  0 12 12
    oe8a            9  0 77 77
    oeHc8a         19  0 21 21
    oea            23  0 78 78
    oeccc8a        23  4 26 30
    oeoe            8  0 49 49
    oeor           13  0 23 23
    oezcc8a        14  7 14 21
    oezcca          8  0 12 12
    qoHa           79  2 25 27
    ram            14  0 21 21
    roe             9  0 33 33

  Initialists:

    word         freq in fn ex
    ------------ ---- -- -- --
    8am           100 19  8 27
    8an            12 33  0 33
    8ccc8a          9 66  0 66
    8oe            17 17  0 17
    8zcc8a         17 88  0 88
    Hccc8a         14 49  0 49
    Pccc8a         14 42  7 49
    Poe             8 99  0 99
    aHc8a          12 24  8 32
    aHcc8a         12 41  8 49
    azcc8a          8 74  0 74
    oeHcc8a        14 14  0 14
    qoHca          43 13  4 17
    qoHcc8a       183 15  2 17
    qoHcca         81 16  3 19
    qoHccc8a       11 18  9 27
    qoHoe          21 14  0 14
    qoe            81 12 11 23
    qoeccca         8 12  0 12
    zam            52 44  5 49
    zccor           9 22  0 22
    zcoe           11 18  0 18
    zoe            25 59  0 59
    zor            10 79  0 79

  Medialists:

    word         freq in fn ex
    ------------ ---- -- -- --
    cccoe          17 11  0 11
    qoHam         200 11  2 13
    zccc8a         36 11  0 11
    oeHcca         19 10  0 10
    ccoe           11  9  0  9
    qoHan          54  9  1 10
    oeccca         12  8  8 16
    8ar            51  7  9 16
    qoHae         113  7  7 14
    qoHc8a        198  7  3 10
    Ham            16  6  0  6
    oHcca          34  5  2  7
    zccoe          17  5  0  5
    oHc8a          83  4  3  7
    qoHar          48  4  2  6
    zcca           69  4  4  8
    zccca          23  4  0  4
    cccca          31  3  0  3
    ccc8a         172  2  6  8
    oHae           39  2 10 12
    or             40  2  7  9
    ccca           67  1  5  6
    Hae            10  0  9  9
    Hc8a           25  0  7  7
    Hcc8a          14  0  0  0
    aHcca           9  0  0  0
    am             20  0  9  9
    cccHa          12  0  0  0
    cccHc8a         8  0  0  0
    cccHca         50  0  7  7
    cccc8a         19  0 10 10
    ccccHa         12  0  0  0
    ccccHca        35  0  0  0
    cccz            8  0  0  0
    eHam            8  0  0  0
    eHc8a           8  0  0  0
    oHam           76  0  3  3
    oHar           35  0  2  2
    oHca           21  0  4  4
    oHcc8a         56  0  8  8
    oHoe           11  0  0  0
    oPccc8a        14  0  7  7
    oeHa           10  0  0  0
    oeHam          22  0  4  4
    qoHccca         8  0  0  0
    zca             9  0 11 11
    zcc8a         204  0  4  4
    zccHa          14  0  7  7
    zccHca         37  0  0  0
    zccHcca         8  0  0  0
    zcccHa         12  0  8  8
    zcccHca        31  0  0  0
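
  The four categories above follow mechanically from comparing "in"
  and "fn" against the 12% split point.  A small sketch of that rule;
  the function name classify is hypothetical, and the inclusive
  comparison (>= 12) is inferred from rows such as `qoe' (12, 11) and
  `oe' (0, 12) in the tables.

```python
def classify(in_pct, fn_pct, split=12):
    """Label a word by its line-initial and line-final percentages."""
    initial = in_pct >= split   # inclusive threshold, inferred from the tables
    final = fn_pct >= split
    if initial and final:
        return "extremist"
    if final:
        return "finalist"
    if initial:
        return "initialist"
    return "medialist"

# A few rows copied from the tables above: word -> (freq, in, fn)
rows = {
    "zar": (11, 27, 27),
    "e8a": (8, 0, 74),
    "zor": (10, 79, 0),
    "qoHam": (200, 11, 2),
    "qoe": (81, 12, 11),
    "oe": (127, 0, 12),
}
labels = {w: classify(i, f) for w, (_, i, f) in rows.items()}
```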


  Note that the average line has about 10 words.  The average number
  of lines per paragraph is at most 10 (but an unknown number
  of paragraph breaks may have been lost in the transcription).
  
  Here are three explanations I can think of for a word w to have a
  marked preference for or aversion to these extremal positions:
  
    (1) Grammar: If w occurs preferably at the end of a sentence, it 
        will often be found at the end of paragraphs, which
        are a significant fraction (10% or more) of all end-of-lines.

        This effect can only boost the end-of-line probability up to
        the fraction F of sentences that end at end-of-line.  The
        extreme cases are `e8a' and `oe8a' (around 75%).  Explaining
        these numbers by cause (1) alone would require at least 3/4 of
        all sentences to end at end-of-line.

        Conversely, if w has preference for beginning-of-sentence,
        it will be found at end-of-line only if a paragraph 
        contains two or more sentences.

        An extreme case is `qoHam', which occurs 200 times, but only 2%
        of those occurrences are at end-of-line.  We tentatively
        conclude that at most 2% of the sentences begin one word
        before end-of-line.  If the second and subsequent sentences of
        a paragraph begin at random positions of the line, and a line
        has about 10 words, then such sentences are less than 20% of
        all sentences, and hence about 80% of all paragraphs contain
        only one sentence.

    (2) Word splitting. In the VMs, words may have been split across 
        line breaks without obvious markings.  The left halves of
        split words would then show up as end-loving, begin-loathing;
        and symmetrically for their right halves.
        
        This explanation cannot account for the many end-loving words
        ending in `8a', like `eccc8a', because `8a' rarely occurs in
        the middle of a word: it is almost always final, and a few
        times initial.  Likewise, it cannot account for end-loathing
        words that begin with `qo', which appears to be strictly
        word-initial.
        
        Also, this effect cannot explain words that 
        avoid both ends of the line, like 

            Hcc8a          14
            aHcca           9
            cccHa          12
            cccHc8a         8
            ccccHa         12
            ccccHca        35
            cccz            8
            eHam            8
            eHc8a           8
            oHoe           11
            oeHa           10
            qoHccca         8
            zccHca         37
            zccHcca         8
            zcccHca        31
            
    (3) False word breaks: In a sense the opposite of (2).
        Suppose w is part of a longer word x, but the letter spacing
        is such that x is often transcribed as two or three separate
        words, one of them being w.  Then w will seem to avoid
        end-of-line, begin-of-line, or both, depending on the position
        of w in x.
        
        This effect can only explain end-avoidance, not
        end-attraction.  Also, it seems unlikely to be due to bad
        judgement by the transcribers; the word spaces in the VMs are
        usually pretty clear, and anyway I only considered word breaks
        where both Friedman and Currier agreed.  So, this explanation
        only flies if the word spaces are bogus by design.
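
  The two bounds used under explanation (1) can be checked with a few
  lines of arithmetic.  This is only a sketch under the assumptions
  stated in the text: about 10 words per line, and non-initial
  sentences starting at a uniformly random word position.  The
  variable names are hypothetical.

```python
words_per_line = 10   # approximate figure quoted in the text

# End-loving words: "fn" percentages of e8a and oe8a, copied from the tables.
fn_pct = {"e8a": 74, "oe8a": 77}
# Under cause (1) alone, the fraction F of sentences ending at end-of-line
# must be at least the largest observed "fn" fraction.
min_F = max(fn_pct.values()) / 100

# Begin-loving word: qoHam is line-final in 2% of its 200 occurrences.
qoHam_fn = 0.02
# If non-initial sentences start at a uniformly random word slot, only one in
# words_per_line of them starts in the last slot, so scale the 2% up.
max_noninitial_share = qoHam_fn * words_per_line
```

  Here min_F is the "at least 3/4" bound and max_noninitial_share is
  the "less than 20%" bound.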
         
  Conclusion: the most likely explanation for most
  anomalous words seems to be (1).  
  
  Posted an improved version of these comments to the "voynich" list.
  
97-08-12 stolfi
===============

  Did a general cleanup.