Hacking at the Voynich manuscript
Notebook - volume 10

Warning: these notebooks aren't strictly chronological logs.
  Sometimes I go back and redo things, clarify comments,
  delete garbage, etc.

97-10-01 stolfi
===============

   Redoing the statistics.  Should I correct the 2/R "mistakes"?
   Let's not do it for now.  But I will combine H+D, P+F, S+T: 

  Tetragram frequencies around line breaks, ignoring spaces:

    cat .voyn.fsg \
      | tr -d ' /=' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=5 \
      | egrep -v '\*' \
      | egrep '^..:..$' \
      > .voyn-nl-2-2.grm
  
    cat .voyn-nl-2-2.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-2-2.frq
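  The helpers enum-ngraphs and compute-freqs are not listed in these
  notes. Below is a minimal sketch of what they are assumed to do. The
  "-v n=5" syntax suggests awk scripts. Under that assumption,
  enum_ngraphs prints every n-character substring of each input line,
  one per line, and compute_freqs turns "uniq -c" output into count,
  relative frequency and item. Both names and the output format are
  hypothetical.

```shell
# Hypothetical stand-in for the local enum-ngraphs script.
# enum_ngraphs N: print every N-character substring of each input line.
enum_ngraphs() {
  awk -v n="$1" '
    { for (i = 1; i + n - 1 <= length($0); i++) print substr($0, i, n) }'
}

# Hypothetical stand-in for compute-freqs: reads "uniq -c" output
# ("count item") and prints count, relative frequency, and item.
compute_freqs() {
  awk '
    { cnt[NR] = $1; item[NR] = $2; tot += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%7d %6.4f %s\n", cnt[i], cnt[i] / tot, item[i]
    }'
}
```

  For example, "printf 'abcd\n' | enum_ngraphs 3" prints "abc" and "bcd".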

  Tetragram frequencies around blanks (spaces and line breaks): 

    cat .voyn.fsg \
      | tr -d '/=' \
      | tr -s ' \012' '__' \
      | enum-ngraphs -v n=7 \
      | egrep -v '\*' \
      | egrep '^..._...$' \
      | sed \
          -e 's/^\(...\)_\(...\)$/\1:\2/g'  \
          -e 's/_//g'  \
          -e 's/^.*\(..\):\(..\).*$/\1:\2/g'  \
      > .voyn-sp-2-2.grm

    cat .voyn-sp-2-2.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-sp-2-2.frq

  Comparisons:

    compare-freqs \
        .voyn-tt-2-2.frq \
        .voyn-nl-2-2.frq \
      | compute-count-ratio \
      | sort +0.0 -0.2r +4 -5nr \
      > .voyn-tt-nl-2-2.cmp

    compare-freqs \
        .voyn-sp-2-2.frq \
        .voyn-nl-2-2.frq \
      | compute-count-ratio \
      | sort +0.0 -0.2r +4 -5nr \
      > .voyn-sp-nl-2-2.cmp

  
97-10-07 stolfi
===============

  Let's compute the distribution of letters at the start, middle, and
  end of the pseudo-"words" (delimited by spaces).
  I will ignore line starts and line ends, as they may
  be special.

    cat .voyn.fsg \
      | tr -d '/=' \
      | sed \
          -e 's/^  *//g' \
          -e 's/  *$//g' \
      | enum-ngraphs -v n=2 \
      > .voyn.dig

    cat .voyn.fsg \
      | tr -d '/=' \
      | sed \
          -e 's/^ *//g' \
          -e 's/ *$//g' \
      | enum-ngraphs -v n=3 \
      > .voyn.trg

    cat .voyn.dig \
      | egrep '^ .$' \
      | sed -e 's/^.\(.\)$/\1/g' \
      | sort | uniq -c | expand \
      > .voyn-ws.frq

    cat .voyn.dig \
      | egrep '^. $' \
      | sed -e 's/^\(.\).$/\1/g' \
      | sort | uniq -c | expand \
      > .voyn-we.frq

    cat .voyn.trg \
      | egrep '^[^ ][^ ][^ ]$' \
      | sed -e 's/^.\(.\).$/\1/' \
      | sort | uniq -c | expand \
      > .voyn-wm.frq

    join \
        -a 1 -a 2  -e 0 -j1 2 -j2 2 -o '0,1.1,2.1' \
        .voyn-wm.frq \
        .voyn-we.frq \
      > .tmp

    join \
        -a 1 -a 2  -e 0 -j1 2 -j2 1 -o '0,1.1,2.2,2.3' \
        .voyn-ws.frq \
        .tmp \
      | gawk ' {printf "%s   %5d %5d %5d\n", $1, $2, $3, $4}' \
      > .voyn-wsme.frq

      let   ini   mid   fin
      --- ----- ----- -----
      4    1456    21     5

      O    1317  2558    27
      S     670   383     4
      T     746   689     1
      8     412  2154    61
      D     102  2071    14
      H     101   816     4

      P      28   112     4
      F       6    27     1
      A     126  1826     0
      C      23  4235     4
      I       1    71     0
      Z       0   343     1

      G      79   120  3126
      K       0     2     5
      L       2     3     3
      M       0    10   395
      N       0    16   458
      6       1     1     3

      *      11    15    10

      E     340   909   947

      2     140    28    57
      R     130   176   561
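  The three n-gram passes and two joins above can be collapsed into one
  awk pass. In the sketch below, ini_mid_fin is a hypothetical name.
  It applies the same tests as the digraph and trigram patterns:

  - a letter is initial when preceded by a blank;
  - a letter is final when followed by a blank;
  - a letter is medial when it has non-blank letters on both sides.

  Leading and trailing blanks are stripped first, so the very first and
  last letters of a line are never counted as initial or final at that
  end.

```shell
# Hypothetical one-pass version of the .voyn.dig/.voyn.trg/join pipeline.
# Reads the transcription on stdin, one text line per input line, and
# prints "letter ini mid fin" sorted by letter.
ini_mid_fin() {
  tr -d '/=' | awk '
    {
      sub(/^ +/, ""); sub(/ +$/, "")
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == " ") continue
        l = (i > 1) ? substr($0, i - 1, 1) : ""   # left neighbor, empty at line start
        r = (i < n) ? substr($0, i + 1, 1) : ""   # right neighbor, empty at line end
        if (l == " ") { ini[c]++; seen[c] = 1 }
        if (r == " ") { fin[c]++; seen[c] = 1 }
        if (l != "" && l != " " && r != "" && r != " ") { mid[c]++; seen[c] = 1 }
      }
    }
    END {
      for (c in seen) printf "%s   %5d %5d %5d\n", c, ini[c], mid[c], fin[c]
    }' | sort
}
```

  If enum-ngraphs works as assumed, "cat .voyn.fsg | ini_mid_fin" should
  give the same rows as .voyn-wsme.frq, up to row order.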

  Note that "E", "2" and "R" are the only letters that 
  occur in significant numbers at all three positions.
  
  Note also that "2" and "R" are easily confused with each other, so
  the numbers are consistent with "2" being exclusively word-initial,
  "R" being exclusively word-final, and there being substantial
  misreadings in both directions (10% of the "R"s misread as "2"s,
  40% of the "2"s misread as "R"s).

  Here is an attempt to recreate the blanks in the VMs according to
  simple rules.  First, prepare a file from the original text where
  every two consecutive characters are separated by " " (blank) or ":"
  (no blank).  Then delete all blanks, separate every two characters
  by ":", and replace some ":" by " " before "[42]" and after
  "[GKLMN6R]":
  
    cat .voyn.fsg \
      | tr -d '/=' \
      | sed -e 's/^  *//g' -e 's/  *$//g' \
      | sed \
          -e 's/\(.\)/\1:/g' \
          -e 's/: :/ /g' \
          -e 's/:$//g' \
      > .voyn-sp-org.fsg
      
    cat .voyn.fsg \
      | tr -d '/= ' \
      | sed \
          -e 's/\(.\)/\1:/g' \
          -e 's/:$//g' \
          -e 's/:\([42]\)/ \1/g' \
          -e 's/\([GKLMN6R]\):/\1 /g' \
      > .voyn-sp-syn.fsg

    compare-spaces \
      .voyn-sp-syn.fsg \
      .voyn-sp-org.fsg \
      | tr -d ':' \
      > .voyn-sp.cmp
      
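      (rows: predicted, .voyn-sp-syn.fsg; columns: original,
      .voyn-sp-org.fsg; the blank label means "blank", ":" means
      "no blank")

                  :
        ----- -----
      |  4707   676
    : |   984 22311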


    cat .voyn-sp.cmp \
      | tr -dc '+\- ' \
      | sed -e 's/\(.\)/\1@/g' \
      | tr '@ ' '\012_' \
      | egrep '.' \
      | sort | uniq -c | expand \
      > .voyn-sp-o-s.frq
      
        676 +
        984 -
       4707 _
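  The whole experiment can be sketched as a single awk pass over the
  original text. This rests on an assumption inferred from the counts
  above and not documented here: compare-spaces writes "+" where the
  rule predicts a blank that the transcription lacks, and "-" where the
  rule misses one. score_blank_rule is a hypothetical name. It prints
  the same four tallies, with "_" for agreeing blanks and ":" for
  agreeing non-blanks.

```shell
# Hypothetical: score the rule "blank before [42], blank after [GKLMN6R]"
# against the blanks actually present in the transcription.
score_blank_rule() {
  tr -d '/=' | awk '
    {
      sub(/^ +/, ""); sub(/ +$/, "")
      # ch[k] = k-th letter of the line; gap[k] = 1 if a blank follows it.
      n = 0
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == " ") { gap[n] = 1; continue }
        ch[++n] = c; gap[n] = 0
      }
      for (k = 1; k < n; k++) {
        org = gap[k]
        syn = (ch[k + 1] ~ /[42]/ || ch[k] ~ /[GKLMN6R]/)
        if (org && syn) both++
        else if (syn) plus++
        else if (org) minus++
        else none++
      }
    }
    END { printf "_ %d\n+ %d\n- %d\n: %d\n", both, plus, minus, none }'
}
```

  For example, "cat .voyn.fsg | score_blank_rule" prints one count per
  line for "_", "+", "-" and ":".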

    cat .voyn-sp.cmp \
      | tr ' ' '_' \
      | enum-ngraphs -v n=3 \
      | egrep '.[-_+].' \
      | sort | uniq -c | expand \
      | sort -b +1.0 -1.2 +0 -1nr \
      > .foo
  
   It seems that many of the errors made by these space-prediction
   rules are due to confusion between "2" and "R" by the transcriber.
   Let's try to "correct" these mistakes by changing, in the original,

     word-initial "R" to "2"
     non-word-initial "2" to "R"

   and then apply the same space-prediction rules to the result:
     
    cat .voyn.fsg \
      | tr -d '/=' \
      | sed \
          -e 's/^  *//g' \
          -e 's/  *$//g' \
          -e 's/ R/ 2/g' \
          -e 's/\([^ ]\)2/\1R/g' \
      | tr -d ' ' \
      | sed \
          -e 's/\(.\)/\1:/g' \
          -e 's/:$//g' \
          -e 's/:\([42]\)/ \1/g' \
          -e 's/\([GKLMN6R]\):/\1 /g' \
      > .voyn-sp-fix.fsg
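  The two substitutions can also be packaged as a filter on text that
  still has its blanks. In this sketch, fix_r2 is a hypothetical name.
  The second rule is applied in a sed loop rather than with a single
  "g" flag. The reason is that sed resumes scanning after each match,
  so one "s/\([^ ]\)2/\1R/g" pass turns "X22" into "XR2", while the
  loop gives "XRR". Line-initial "R"s are left alone, as in the
  pipeline above.

```shell
# Hypothetical filter: word-initial R -> 2, non-word-initial 2 -> R.
# The :a ... ta loop repeats the second rule until nothing changes,
# so runs of consecutive 2s are all rewritten.
fix_r2() {
  sed \
    -e 's/ R/ 2/g' \
    -e ':a' \
    -e 's/\([^ ]\)2/\1R/' \
    -e 'ta'
}
```

  For instance, "R22 X2R R" becomes "RRR XRR 2".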

    compare-spaces \
      .voyn-sp-fix.fsg \
      .voyn-sp-org.fsg \
      | tr -d ':' \
      > .voyn-sp-fix.cmp

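      (rows: .voyn-sp-fix.fsg; columns: .voyn-sp-org.fsg; the blank
      label means "blank", ":" means "no blank")

            R     2           :
        ----- ----- ----- -----
    R |   792    87     0     0
    2 |   130   278     0     0
      |     0     0  4759   507
    : |     0     0   932 22480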


    cat .voyn-sp-fix.cmp \
      | tr -dc '+\- ' \
      | sed -e 's/\(.\)/\1@/g' \
      | tr '@ ' '\012_' \
      | egrep '.' \
      | sort | uniq -c | expand \
      > .voyn-sp-o-s-fix.frq

        507 +
        932 -
       4759 _

    cat .voyn-sp-fix.cmp \
      | tr ' ' '_' \
      | enum-ngraphs -v n=3 \
      | egrep '.[-_+].' \
      | sort | uniq -c | expand \
      | sort -b +1.0 -1.2 +0 -1nr \
      > .foo

  Let's compute the initial/medial/final statistics that these R/2
  changes would give with the original spaces, leaving out the first
  and last word of each line:
  
    cat .voyn-sp-fix.cmp \
      | tr -d '+' \
      | tr '\-' ' ' \
      > .voyn-sp-fixr2.fsg
      
    cat .voyn-sp-fixr2.fsg \
      | tr -d '/=' \
      | sed \
          -e 's/^  *//g' \
          -e 's/  *$//g' \
      | egrep ' .* ' \
      | sed \
          -e 's/^[^ ][^ ]* //g' \
          -e 's/ [^ ][^ ]*$//g' \
      | tr ' ' '\012' \
      | egrep '.' \
      > .voyn-sp-fixr2-nonend.wds

    cat .voyn-sp-fixr2-nonend.wds \
      | sed -e 's/^\(.\).*$/\1/g' \
      | sort | uniq -c | expand \
      > .voyn-sp-fixr2-ws.frq

    cat .voyn-sp-fixr2-nonend.wds \
      | sed -e 's/^.*\(.\)$/\1/g' \
      | sort | uniq -c | expand \
      > .voyn-sp-fixr2-we.frq

    cat .voyn-sp-fixr2-nonend.wds \
      | egrep '...' \
      | sed \
          -e 's/^.\(.*\).$/\1/' \
          -e 's/\(.\)/\1@/g' \
      | tr '@' '\012' \
      | egrep '.' \
      | sort | uniq -c | expand \
      > .voyn-sp-fixr2-wm.frq

    join \
        -a 1 -a 2  -e 0 -j1 2 -j2 2 -o '0,1.1,2.1' \
        .voyn-sp-fixr2-wm.frq \
        .voyn-sp-fixr2-we.frq \
      > .tmp

    join \
        -a 1 -a 2  -e 0 -j1 2 -j2 1 -o '0,1.1,2.2,2.3' \
        .voyn-sp-fixr2-ws.frq \
        .tmp \
      | gawk ' {printf "%s   %5d %5d %5d\n", $1, $2, $3, $4}' \
      > .voyn-sp-fixr2-wsme.frq

      let   ini   mid   fin
      --- ----- ----- -----
      2     189     8     2
      R      15    79   524
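  Since .voyn-sp-fixr2-nonend.wds has one word per line, the three
  tallies can also be taken in a single pass over it. The sketch below
  uses a hypothetical name, word_positions, and counts as in the three
  pipelines above:

  - the first letter of each word counts as initial;
  - the last letter counts as final;
  - every letter strictly inside a word of three or more letters counts
    as medial;
  - a one-letter word counts as both initial and final.

```shell
# Hypothetical: tally first, interior, and last letters of a word list
# (one word per line), printing "letter ini mid fin" sorted by letter.
word_positions() {
  awk '
    NF > 0 {
      w = $1; n = length(w)
      a = substr(w, 1, 1); z = substr(w, n, 1)
      ini[a]++; seen[a] = 1
      fin[z]++; seen[z] = 1
      for (i = 2; i < n; i++) { c = substr(w, i, 1); mid[c]++; seen[c] = 1 }
    }
    END {
      for (c in seen) printf "%s   %5d %5d %5d\n", c, ini[c], mid[c], fin[c]
    }' | sort
}
```

  Usage: "word_positions < .voyn-sp-fixr2-nonend.wds".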

97-10-08 stolfi
===============

  An intermezzo: for Denis's benefit, let's compute a table of digraph
  frequencies in Currier notation.
  
    cat .voyn.fsg \
      | sed \
          -e 's/HZ/q/g' \
          -e 's/PZ/w/g' \
          -e 's/DZ/x/g' \
          -e 's/FZ/y/g' \
          -e 's/IIIE/1/g' \
          -e 's/IIE/h/g' \
          -e 's/IE/g/g' \
          -e 's/IIIR/0/g' \
          -e 's/IIR/t/g' \
          -e 's/IR/u/g' \
          -e 's/IIIL/3/g' \
          -e 's/IIIK/5/g' \
          -e 's/IIK/l/g' \
          -e 's/IK/k/g' \
      | tr 'GTSHPDFLK' '9SZPBFVDJ' \
      | tr 'qwxyhgtulk' 'QWXYHGTULK' \
      > .voyn.cur
      
    cat .voyn.cur \
      | tr -d '/= ' \
      | tr 'IGHTUDL56' '*********' \
      | count-digraph-freqs \
        -vshowentropy=1 \
        -vchars='PFBVQXWYSZC2RNMJ4AEO89IGH1TU0D3KL567' 

    Digraph counts:

           TT     P     F     B     V     Q     X     W     Y     S     Z     C     2     R     N     M     J     4     A     E     O     8     9     *      
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      P   852     .     1     .     .     .     .     .     .    62    26   341     1     .     .     1     .     .   259     3    62     3    88     5     .
      F  1993     .     .     .     .     .     .     .     .    72    30   869     .     1     .     2     .     .   736    11    95     3   170     4     .
      B   195     .     .     .     .     .     .     .     .    92    25     4     .     .     .     .     .     .    13     .    51     3     7     .     .
      V    32     .     .     .     .     .     .     .     .    24     2     1     .     .     .     .     .     .     2     .     3     .     .     .     .
      Q   121     .     .     .     .     .     .     .     .     .     .    31     1     .     .     .     .     .     3     .     5     6    74     .     1
      X   199     .     .     .     .     .     .     .     .     2     1    53     .     .     .     .     .     .     5     .     3    10   125     .     .
      W    21     .     .     .     .     .     .     .     .     .     .     9     .     .     .     .     .     .     2     .     2     3     5     .     .
      Y     4     .     .     .     .     .     .     .     .     .     .     2     .     .     .     .     .     .     .     .     .     2     .     .     .
      S  1453     8    17     4     1    31    66     8     2     1     3  1053     6     4     .     .     .     .    27    13    49    96    62     2     .
      Z  1078     6     6     .     .    19    39     .     .     3     .   866     1     1     .     .     .     2    23     5    38    41    28     .     .
      C  4268    38    79    13     3    39    69     4     .    15     9   953    45     8     .     .     .     2    53     4   175  1898   844    14     3
      2   365     3     4     1     .     1     1     .     .    18    19     2     2     .     .     .     .     3   150     2   133     4    10     1    11
      R   883     2     5     3     .     1     1     1     1   123   145     4     4     1     .     .     .    25   147     3   272    22    54     6    63
      N   503     1     .     .     .     3     .     2     .   117   104     3     5     1     .     .     .    19     9     2   169    30     9     .    29
      M   438     .     2     .     .     .     2     .     1   114    89     1     7     1     .     .     .    16     4     2   127    25    13     1    33
      J    53     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     2     2     .     .     .    48
      4  1676     1     5     2     2     5     1     .     .     .     1    10     2     .     .     .     .     1     .     .  1646     .     .     .     .
      A  1952     .     .     1     .     .     .     1     .     .     .     .     .   405   495   414    43     .     .   552     .     .     .    41     .
      E  2344    64   310    15     8     2     1     1     .   501   344    19    41    28     .     .     2    96    69    41   377   174   114     1   136
      O  3964   571  1434    67    13     9    14     1     .    10    10    19     7   305     7    20     7    13     4  1349    15    41    20    19     9
      8  2740     1     8     .     1     .     .     .     .    41    43    15     2     2     .     .     .    21   417    14    98     4  2059     2    12
      9  3781   107   115    17     2     9     3     2     .   233   190     6   101   121     .     .     1  1277    18   312   556   266    34     9   402
      *   113     .     1     1     .     .     1     .     .     8    14     4     1     1     1     1     .     3    11     3    28     5     6     8    16
          763    50     6    71     2     2     1     1     .    17    23     3   138     4     .     .     .   198     .    26    58   104    59     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 29791   852  1993   195    32   121   199    21     4  1453  1078  4268   365   883   503   438    53  1676  1952  2344  3964  2740  3781   113   763

    Next-symbol probability (× 99):

        TT  P  F  B  V  Q  X  W  Y  S  Z  C  2  R  N  M  J  4  A  E  O  8  9  *   
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      P 99  .  .  .  .  .  .  .  .  7  3 40  .  .  .  .  .  . 30  .  7  . 10  1  .
      F 99  .  .  .  .  .  .  .  .  4  1 43  .  .  .  .  .  . 37  1  5  .  8  .  .
      B 99  .  .  .  .  .  .  .  . 47 13  2  .  .  .  .  .  .  7  . 26  2  4  .  .
      V 99  .  .  .  .  .  .  .  . 74  6  3  .  .  .  .  .  .  6  .  9  .  .  .  .
      Q 99  .  .  .  .  .  .  .  .  .  . 25  1  .  .  .  .  .  2  .  4  5 61  .  1
      X 99  .  .  .  .  .  .  .  .  1  . 26  .  .  .  .  .  .  2  .  1  5 62  .  .
      W 99  .  .  .  .  .  .  .  .  .  . 42  .  .  .  .  .  .  9  .  9 14 24  .  .
      Y 99  .  .  .  .  .  .  .  .  .  . 50  .  .  .  .  .  .  .  .  . 50  .  .  .
      S 99  1  1  .  .  2  4  1  .  .  . 72  .  .  .  .  .  .  2  1  3  7  4  .  .
      Z 99  1  1  .  .  2  4  .  .  .  . 80  .  .  .  .  .  .  2  .  3  4  3  .  .
      C 99  1  2  .  .  1  2  .  .  .  . 22  1  .  .  .  .  .  1  .  4 44 20  .  .
      2 99  1  1  .  .  .  .  .  .  5  5  1  1  .  .  .  .  1 41  1 36  1  3  .  3
      R 99  .  1  .  .  .  .  .  . 14 16  .  .  .  .  .  .  3 16  . 30  2  6  1  7
      N 99  .  .  .  .  1  .  .  . 23 20  1  1  .  .  .  .  4  2  . 33  6  2  .  6
      M 99  .  .  .  .  .  .  .  . 26 20  .  2  .  .  .  .  4  1  . 29  6  3  .  7
      J 99  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  4  4  .  .  . 90
      4 99  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  . 97  .  .  .  .
      A 99  .  .  .  .  .  .  .  .  .  .  .  . 21 25 21  2  .  . 28  .  .  .  2  .
      E 99  3 13  1  .  .  .  .  . 21 15  1  2  1  .  .  .  4  3  2 16  7  5  .  6
      O 99 14 36  2  .  .  .  .  .  .  .  .  .  8  .  .  .  .  . 34  .  1  .  .  .
      8 99  .  .  .  .  .  .  .  .  1  2  1  .  .  .  .  .  1 15  1  4  . 74  .  .
      9 99  3  3  .  .  .  .  .  .  6  5  .  3  3  .  .  . 33  .  8 15  7  1  . 11
      * 99  .  1  1  .  .  1  .  .  7 12  4  1  1  1  1  .  3 10  3 25  4  5  7 14
        99  6  1  9  .  .  .  .  .  2  3  . 18  1  .  .  . 26  .  3  8 13  8  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99  3  7  1  0  0  1  0  0  5  4 14  1  3  2  1  0  6  6  8 13  9 13  0  3

    Previous-symbol probability (× 99):

        TT  P  F  B  V  Q  X  W  Y  S  Z  C  2  R  N  M  J  4  A  E  O  8  9  *   
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      P  3  .  .  .  .  .  .  .  .  4  2  8  .  .  .  .  .  . 13  .  2  .  2  4  .
      F  7  .  .  .  .  .  .  .  .  5  3 20  .  .  .  .  .  . 37  .  2  .  4  4  .
      B  1  .  .  .  .  .  .  .  .  6  2  .  .  .  .  .  .  .  1  .  1  .  .  .  .
      V  0  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      Q  0  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  2  .  .
      X  1  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  3  .  .
      W  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      Y  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      S  5  1  1  2  3 25 33 38 50  .  . 24  2  .  .  .  .  .  1  1  1  3  2  2  .
      Z  4  1  .  .  . 16 19  .  .  .  . 20  .  .  .  .  .  .  1  .  1  1  1  .  .
      C 14  4  4  7  9 32 34 19  .  1  1 22 12  1  .  .  .  .  3  .  4 69 22 12  .
      2  1  .  .  1  .  1  .  .  .  1  2  .  1  .  .  .  .  .  8  .  3  .  .  1  1
      R  3  .  .  2  .  1  .  5 25  8 13  .  1  .  .  .  .  1  7  .  7  1  1  5  8
      N  2  .  .  .  .  2  .  9  .  8 10  .  1  .  .  .  .  1  .  .  4  1  .  .  4
      M  1  .  .  .  .  .  1  . 25  8  8  .  2  .  .  .  .  1  .  .  3  1  .  1  4
      J  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  6
      4  6  .  .  1  6  4  .  .  .  .  .  .  1  .  .  .  .  .  .  . 41  .  .  .  .
      A  6  .  .  1  .  .  .  5  .  .  .  .  . 45 97 94 80  .  . 23  .  .  . 36  .
      E  8  7 15  8 25  2  .  5  . 34 32  . 11  3  .  .  4  6  3  2  9  6  3  1 18
      O 13 66 71 34 40  7  7  5  .  1  1  .  2 34  1  5 13  1  . 57  .  1  1 17  1
      8  9  .  .  .  3  .  .  .  .  3  4  .  1  .  .  .  .  1 21  1  2  . 54  2  2
      9 13 12  6  9  6  7  1  9  . 16 17  . 27 14  .  .  2 75  1 13 14 10  1  8 52
      *  0  .  .  1  .  .  .  .  .  1  1  .  .  .  .  .  .  .  1  .  1  .  .  7  2
         3  6  . 36  6  2  .  5  .  1  2  . 37  .  .  .  . 12  .  1  1  4  2  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

    Symbol entropy: 3.804

    Next-symbol entropy:

           TT     P     F     B     V     Q     X     W     Y     S     Z     C     2     R     N     M     J     4     A     E     O     8     9     *      
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      P 2.228     . 0.011     .     .     .     .     .     . 0.275 0.154 0.529 0.011     .     . 0.011     .     . 0.522 0.029 0.275 0.029 0.338 0.044     .
      F 1.918     .     .     .     .     .     .     .     . 0.173 0.091 0.522     . 0.005     . 0.010     .     . 0.531 0.041 0.209 0.014 0.303 0.018     .
      B 2.038     .     .     .     .     .     .     .     . 0.511 0.380 0.115     .     .     .     .     .     . 0.260     . 0.506 0.093 0.172     .     .
      V 1.288     .     .     .     .     .     .     .     . 0.311 0.250 0.156     .     .     .     .     .     . 0.250     . 0.320     .     .     .     .
      Q 1.589     .     .     .     .     .     .     .     .     .     . 0.503 0.057     .     .     .     .     . 0.132     . 0.190 0.215 0.434     . 0.057
      X 1.476     .     .     .     .     .     .     .     . 0.067 0.038 0.508     .     .     .     .     .     . 0.134     . 0.091 0.217 0.421     .     .
      W 2.064     .     .     .     .     .     .     .     .     .     . 0.524     .     .     .     .     .     . 0.323     . 0.323 0.401 0.493     .     .
      Y 1.000     .     .     .     .     .     .     .     .     .     . 0.500     .     .     .     .     .     .     .     .     . 0.500     .     .     .
      S 1.740 0.041 0.075 0.023 0.007 0.118 0.203 0.041 0.013 0.007 0.018 0.337 0.033 0.023     .     .     .     . 0.107 0.061 0.165 0.259 0.194 0.013     .
      Z 1.313 0.042 0.042     .     . 0.103 0.173     .     . 0.024     . 0.254 0.009 0.009     .     .     . 0.017 0.118 0.036 0.170 0.179 0.137     .     .
      C 2.283 0.061 0.107 0.025 0.007 0.062 0.096 0.009     . 0.029 0.019 0.483 0.069 0.017     .     .     . 0.005 0.079 0.009 0.189 0.520 0.462 0.027 0.007
      2 2.262 0.057 0.071 0.023     . 0.023 0.023     .     . 0.214 0.222 0.041 0.041     .     .     .     . 0.057 0.527 0.041 0.531 0.071 0.142 0.023 0.152
      R 2.867 0.020 0.042 0.028     . 0.011 0.011 0.011 0.011 0.396 0.428 0.035 0.035 0.011     .     .     . 0.146 0.431 0.028 0.523 0.133 0.247 0.049 0.272
      N 2.608 0.018     .     .     . 0.044     . 0.032     . 0.489 0.470 0.044 0.066 0.018     .     .     . 0.179 0.104 0.032 0.529 0.243 0.104     . 0.237
      M 2.676     . 0.036     .     .     . 0.036     . 0.020 0.505 0.467 0.020 0.095 0.020     .     .     . 0.174 0.062 0.036 0.518 0.236 0.151 0.020 0.281
      J 0.594     .     .     .     .     .     .     .     .     .     .     . 0.108     .     .     .     .     .     . 0.178 0.178     .     .     . 0.129
      4 0.180 0.006 0.025 0.012 0.012 0.025 0.006     .     .     . 0.006 0.044 0.012     .     .     .     . 0.006     .     . 0.026     .     .     .     .
      A 2.212     .     . 0.006     .     .     . 0.006     .     .     .     .     . 0.471 0.502 0.474 0.121     .     . 0.515     .     .     . 0.117     .
      E 3.345 0.142 0.386 0.047 0.028 0.009 0.005 0.005     . 0.476 0.406 0.056 0.102 0.076     .     . 0.009 0.189 0.150 0.102 0.424 0.279 0.212 0.005 0.238
      O 2.324 0.403 0.531 0.099 0.027 0.020 0.029 0.003     . 0.022 0.022 0.037 0.016 0.285 0.016 0.039 0.016 0.027 0.010 0.529 0.030 0.068 0.039 0.037 0.020
      8 1.317 0.004 0.025     . 0.004     .     .     .     . 0.091 0.094 0.041 0.008 0.008     .     .     . 0.054 0.413 0.039 0.172 0.014 0.310 0.008 0.034
      9 3.120 0.146 0.153 0.035 0.006 0.021 0.008 0.006     . 0.248 0.217 0.015 0.140 0.159     .     . 0.003 0.529 0.037 0.297 0.407 0.269 0.061 0.021 0.344
      * 3.434     . 0.060 0.060     .     . 0.060     .     . 0.270 0.373 0.171 0.060 0.060 0.060 0.060     . 0.139 0.327 0.139 0.499 0.199 0.225 0.270 0.399
        3.125 0.258 0.055 0.319 0.022 0.022 0.013 0.013     . 0.122 0.152 0.031 0.446 0.040     .     .     . 0.505     . 0.166 0.283 0.392 0.286     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 2.219 0.147 0.261 0.047 0.011 0.032 0.048 0.007 0.002 0.213 0.173 0.402 0.078 0.150 0.099 0.090 0.016 0.234 0.258 0.289 0.387 0.317 0.378 0.031 0.135

    Previous-symbol entropy:

           TT     P     F     B     V     Q     X     W     Y     S     Z     C     2     R     N     M     J     4     A     E     O     8     9     *      
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      P 0.147     . 0.005     .     .     .     .     .     . 0.194 0.130 0.291 0.023     .     . 0.020     .     . 0.387 0.012 0.094 0.011 0.126 0.199     .
      F 0.261     .     .     .     .     .     .     .     . 0.215 0.144 0.468     . 0.011     . 0.036     .     . 0.531 0.036 0.129 0.011 0.201 0.171     .
      B 0.047     .     .     .     .     .     .     .     . 0.252 0.126 0.009     .     .     .     .     .     . 0.048     . 0.081 0.011 0.017     .     .
      V 0.011     .     .     .     .     .     .     .     . 0.098 0.017 0.003     .     .     .     .     .     . 0.010     . 0.008     .     .     .     .
      Q 0.032     .     .     .     .     .     .     .     .     .     . 0.052 0.023     .     .     .     .     . 0.014     . 0.012 0.019 0.111     . 0.013
      X 0.048     .     .     .     .     .     .     .     . 0.013 0.009 0.079     .     .     .     .     .     . 0.022     . 0.008 0.030 0.163     .     .
      W 0.007     .     .     .     .     .     .     .     .     .     . 0.019     .     .     .     .     .     . 0.010     . 0.006 0.011 0.013     .     .
      Y 0.002     .     .     .     .     .     .     .     .     .     . 0.005     .     .     .     .     .     .     .     .     . 0.008     .     .     .
      S 0.213 0.063 0.059 0.115 0.156 0.503 0.528 0.530 0.500 0.007 0.024 0.498 0.097 0.035     .     .     .     . 0.085 0.042 0.078 0.169 0.097 0.103     .
      Z 0.173 0.050 0.025     .     . 0.419 0.461     .     . 0.018     . 0.467 0.023 0.011     .     .     . 0.012 0.075 0.019 0.064 0.091 0.052     .     .
      C 0.402 0.200 0.185 0.260 0.320 0.526 0.530 0.456     . 0.068 0.058 0.483 0.372 0.061     .     .     . 0.012 0.141 0.016 0.199 0.367 0.483 0.373 0.031
      2 0.078 0.029 0.018 0.039     . 0.057 0.038     .     . 0.078 0.103 0.005 0.041     .     .     .     . 0.016 0.284 0.009 0.164 0.014 0.023 0.060 0.088
      R 0.150 0.021 0.022 0.093     . 0.057 0.038 0.209 0.500 0.302 0.389 0.009 0.071 0.011     .     .     . 0.090 0.281 0.012 0.265 0.056 0.088 0.225 0.297
      N 0.099 0.011     .     .     . 0.132     . 0.323     . 0.293 0.325 0.007 0.085 0.011     .     .     . 0.073 0.036 0.009 0.194 0.071 0.021     . 0.179
      M 0.090     . 0.010     .     .     . 0.067     . 0.500 0.288 0.297 0.003 0.109 0.011     .     .     . 0.064 0.018 0.009 0.159 0.062 0.028 0.060 0.196
      J 0.016     .     .     .     .     .     .     .     .     .     .     . 0.023     .     .     .     .     .     . 0.009 0.006     .     .     . 0.251
      4 0.234 0.011 0.022 0.068 0.250 0.190 0.038     .     .     . 0.009 0.020 0.041     .     .     .     . 0.006     .     . 0.527     .     .     .     .
      A 0.258     .     . 0.039     .     .     . 0.209     .     .     .     .     . 0.516 0.023 0.077 0.245     .     . 0.491     .     .     . 0.531     .
      E 0.289 0.281 0.418 0.285 0.500 0.098 0.038 0.209     . 0.530 0.526 0.035 0.354 0.158     .     . 0.178 0.236 0.170 0.102 0.323 0.253 0.152 0.060 0.443
      O 0.387 0.387 0.342 0.530 0.528 0.279 0.269 0.209     . 0.049 0.063 0.035 0.109 0.530 0.086 0.203 0.386 0.054 0.018 0.459 0.030 0.091 0.040 0.433 0.076
      8 0.317 0.011 0.032     . 0.156     .     .     .     . 0.145 0.185 0.029 0.041 0.020     .     .     . 0.079 0.476 0.044 0.132 0.014 0.477 0.103 0.094
      9 0.378 0.376 0.237 0.307 0.250 0.279 0.091 0.323     . 0.423 0.441 0.013 0.513 0.393     .     . 0.108 0.299 0.062 0.387 0.397 0.327 0.061 0.291 0.487
      * 0.031     . 0.005 0.039     .     . 0.038     .     . 0.041 0.081 0.009 0.023 0.011 0.018 0.020     . 0.016 0.042 0.012 0.050 0.017 0.015 0.270 0.117
        0.135 0.240 0.025 0.531 0.250 0.098 0.038 0.209     . 0.075 0.118 0.007 0.531 0.035     .     .     . 0.364     . 0.072 0.089 0.179 0.094     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 2.219 1.681 1.405 2.305 2.411 2.639 2.176 2.678 1.500 3.091 3.046 2.547 2.483 1.815 0.126 0.356 0.917 1.323 2.713 1.740 3.015 1.809 2.262 2.879 2.273
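  count-digraph-freqs is not listed in these notes either. The entropy
  tables above are consistent with the following reading:

  - each entry is -p*log2(p), where p is the digraph count divided by
    the row total. For P followed by C, p = 341/852, which gives 0.529;
  - the TOT row of the next-symbol entropy table matches -p*log2(p) of
    the overall symbol frequencies;
  - those TOT entries add up, within rounding, to the symbol entropy
    3.804.

  Here is a minimal awk sketch under those assumptions. digraph_stats
  is a hypothetical name, and "_" stands for the line boundary
  (presumably the unlabeled row and column). The "next-symbol entropy"
  it prints is the row entropies weighted by row frequency, that is,
  the conditional entropy of the next symbol.

```shell
# Hypothetical sketch of a digraph counter with next-symbol statistics.
# Prints "prev next count p99" for each digraph, where p99 is the
# next-symbol probability scaled to 0..99, then two entropies in bits.
digraph_stats() {
  awk '
    {
      s = "_" $0 "_"                   # "_" marks the line boundary
      for (i = 1; i < length(s); i++) {
        a = substr(s, i, 1); b = substr(s, i + 1, 1)
        cnt[a, b]++; row[a]++; tot++
      }
    }
    END {
      hs = 0; hc = 0
      for (k in cnt) {
        split(k, ab, SUBSEP)
        q = cnt[k] / row[ab[1]]        # P(next = ab[2] given prev = ab[1])
        printf "%s %s %d %d\n", ab[1], ab[2], cnt[k], int(99 * q + 0.5)
        hc -= (row[ab[1]] / tot) * q * log(q) / log(2)
      }
      for (a in row) {
        p = row[a] / tot
        hs -= p * log(p) / log(2)
      }
      printf "symbol entropy: %.3f\n", hs
      printf "next-symbol entropy: %.3f\n", hc
    }'
}
```

  Take the two lines "AB" and "AC". "A" is followed by "B" or "C" with
  probability 1/2 each (printed as 50), and every other symbol has a
  single successor. The next-symbol entropy is therefore 1/3 bit.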

  Denis would like me to remove the paragraph-initial lines.
  
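  Testing the hypothesis that P,F are (often) HOE,DOE in FSG notation,
  i.e. that Currier B,V are often POE,FOE: replace POE and FOE by new
  symbols "b" and "v" and see whether their contexts resemble those
  of B and V.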
  
    cat .voyn.cur \
      | tr -d '/= ' \
      | tr 'IGHTUDL56' '*********' \
      | sed \
          -e 's/POE/b/g' \
          -e 's/FOE/v/g' \
      | count-digraph-freqs \
        -vshowentropy=1 \
        -vchars='PFBVbvQXWYSZC2RNMJ4AEO89IGH1TU0D3KL567'

    Next-symbol probability (× 99):

         P  F  B  V  b  v  Q  X  W  Y  S  Z  C  2  R  N  M  J  4  A  E  O  8  9  *   
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      P  .  .  .  .  .  .  .  .  .  .  8  3 42  .  .  .  .  .  . 32  .  2  . 11  1  .
      F  .  .  .  .  .  .  .  .  .  .  4  2 44  .  .  .  .  .  . 38  1  2  .  9  .  .

      B  .  .  .  .  .  .  .  .  .  . 47 13  2  .  .  .  .  .  .  7  . 26  2  4  .  .
      V  .  .  .  .  .  .  .  .  .  . 74  6  3  .  .  .  .  .  .  6  .  9  .  .  .  .

      R  .  .  .  .  .  .  .  .  .  . 14 16  .  .  .  .  .  .  3 16  . 30  2  6  1  7
      N  .  .  .  .  .  .  1  .  .  . 23 20  1  1  .  .  .  .  4  2  . 33  6  2  .  6
      M  .  .  .  .  .  .  .  .  .  . 26 20  .  2  .  .  .  .  4  1  . 29  6  3  .  7
      E  3 13  1  .  .  .  .  .  .  . 21 14  1  2  1  .  .  .  4  3  2 16  7  5  .  6

      O 14 36  2  .  1  1  .  .  .  .  .  .  .  .  8  .  1  .  .  . 32  .  1  1  .  .

      A  .  .  .  .  .  .  .  .  .  .  .  .  .  . 21 25 21  2  .  . 28  .  .  .  2  .
      9  3  3  .  .  .  .  .  .  .  .  6  5  .  3  3  .  .  . 33  .  8 15  7  1  . 11

      b  7  9  2  2  .  .  .  .  .  . 20 18  .  .  .  .  .  .  4  .  . 24 13  .  .  .
      v  .  6  2  .  .  .  .  .  .  . 32 21  .  .  .  .  .  .  .  .  . 24 13  .  .  2

      Q  .  .  .  .  .  .  .  .  .  .  .  . 25  1  .  .  .  .  .  2  .  4  5 61  .  1
      X  .  .  .  .  .  .  .  .  .  .  1  . 26  .  .  .  .  .  .  2  .  1  5 62  .  .
      W  .  .  .  .  .  .  .  .  .  .  .  . 42  .  .  .  .  .  .  9  .  9 14 24  .  .
      Y  .  .  .  .  .  .  .  .  .  .  .  . 50  .  .  .  .  .  .  .  .  . 50  .  .  .
      S  .  1  .  .  .  .  2  4  1  .  .  . 72  .  .  .  .  .  .  2  1  3  7  4  .  .
      Z  .  1  .  .  .  .  2  4  .  .  .  . 80  .  .  .  .  .  .  2  .  3  4  3  .  .
      C  1  2  .  .  .  .  1  2  .  .  .  . 22  1  .  .  .  .  .  1  .  4 44 20  .  .
      2  1  1  .  .  .  .  .  .  .  .  5  5  1  1  .  .  .  .  1 41  1 36  1  3  .  3
      J  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  4  4  .  .  . 90
      4  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  . 97  .  .  .  .
      8  .  .  .  .  .  .  .  .  .  .  1  2  1  .  .  .  .  .  1 15  1  4  . 74  .  .
      *  .  1  1  .  .  .  .  1  .  .  7 12  4  1  1  1  1  .  3 10  3 25  4  5  7 14
         5  1  9  .  1  .  .  .  .  .  2  3  . 18  1  .  .  . 26  .  3  8 13  8  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --


    Previous-symbol probability (× 99):

         P  F  B  V  b  v  Q  X  W  Y  S  Z  C  2  R  N  M  J  4  A  9  E  O  8  *   
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      P  .  .  .  .  .  .  .  .  .  .  4  2  8  .  .  .  .  .  . 13  2  .  .  .  4  .
      F  .  .  .  .  .  .  .  .  .  .  5  3 20  .  .  .  .  .  . 37  4  .  1  .  4  .
      B  .  .  .  .  .  .  .  .  .  .  6  2  .  .  .  .  .  .  .  1  .  .  1  .  .  .
      V  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      b  .  .  1  3  .  .  .  .  .  .  1  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      v  .  .  1  .  .  .  .  .  .  .  1  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      Q  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  2  .  .  .  .  .
      X  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  3  .  .  .  .  .
      W  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      Y  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      S  1  1  2  3  2  . 25 33 38 50  .  . 24  2  .  .  .  .  .  1  2  1  1  3  2  .
      Z  .  .  .  .  7  . 16 19  .  .  .  . 20  .  .  .  .  .  .  1  1  .  1  1  .  .
      C  5  4  7  9  .  . 32 34 19  .  1  1 22 12  1  .  .  .  .  3 22  .  4 69 12  .
      2  .  .  1  .  2  .  1  .  .  .  1  2  .  1  .  .  .  .  .  8  .  .  3  .  1  1
      R  .  .  2  .  .  2  1  .  5 25  8 13  .  1  .  .  .  .  1  7  1  .  7  1  5  8
      N  .  .  .  .  .  .  2  .  9  .  8 10  .  1  .  .  .  .  1  .  .  .  4  1  .  4
      M  .  .  .  .  .  .  .  1  . 25  8  8  .  2  .  .  .  .  1  .  .  .  3  1  1  4
      J  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  6
      4  .  .  1  6  .  2  4  .  .  .  .  .  .  1  .  .  .  .  .  .  .  . 42  .  .  .
      A  .  .  1  .  .  .  .  .  5  .  .  .  .  . 45 97 94 80  .  .  . 24  .  . 36  .
      E  7 15  7 22  7 11  2  .  5  . 32 30  . 11  3  .  .  4  6  3  3  2  9  6  1 18
      O 67 71 34 40 48 65  7  7  5  .  1  1  .  2 34  1  5 13  1  .  1 55  .  1 17  1
      8  .  .  .  3  .  .  .  .  .  .  3  4  .  1  .  .  .  .  1 21 54  1  3  .  2  2
      9 13  5  9  6 11 15  7  1  9  . 16 17  . 27 14  .  .  2 75  1  1 14 14 10  8 52
      *  .  .  1  .  .  .  .  .  .  .  1  1  .  .  .  .  .  .  .  1  .  .  1  .  7  2
         5  . 36  6 22  4  2  .  5  .  1  2  . 37  .  .  .  . 12  .  2  1  1  4  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

  Perhaps they are POE8/HOE8:

    cat .voyn.cur \
      | tr -d '/= ' \
      | tr 'IGHTUDL56' '*********' \
      | sed \
          -e 's/POE8/b/g' \
          -e 's/FOE8/v/g' \
      | count-digraph-freqs \
        -vshowentropy=1 \
        -vchars='PFBVbvQXWYSZC2RNMJ4AEO89IGH1TU0D3KL567'
        
    Digraph counts:

           TT     P     F     B     V     b     v     Q     X     W     Y     S     Z     C     2     R     N     M     J     4     A     E     O     8     9     *      
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      P   846     .     1     .     .     .     .     .     .     .     .    62    26   341     1     .     .     1     .     .   259     3    56     3    88     5     .
      F  1986     .     .     .     .     .     .     .     .     .     .    72    30   869     .     1     .     2     .     .   736    11    88     3   170     4     .
      B   195     .     .     .     .     .     .     .     .     .     .    92    25     4     .     .     .     .     .     .    13     .    51     3     7     .     .
      V    32     .     .     .     .     .     .     .     .     .     .    24     2     1     .     .     .     .     .     .     2     .     3     .     .     .     .
      b     6     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     2     .     1     .     3     .     .
      v     7     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     6     .     .
      Q   121     .     .     .     .     .     .     .     .     .     .     .     .    31     1     .     .     .     .     .     3     .     5     6    74     .     1
      X   199     .     .     .     .     .     .     .     .     .     .     2     1    53     .     .     .     .     .     .     5     .     3    10   125     .     .
      W    21     .     .     .     .     .     .     .     .     .     .     .     .     9     .     .     .     .     .     .     2     .     2     3     5     .     .
      Y     4     .     .     .     .     .     .     .     .     .     .     .     .     2     .     .     .     .     .     .     .     .     .     2     .     .     .
      S  1453     8    17     4     1     .     .    31    66     8     2     1     3  1053     6     4     .     .     .     .    27    13    49    96    62     2     .
      Z  1078     6     6     .     .     .     .    19    39     .     .     3     .   866     1     1     .     .     .     2    23     5    38    41    28     .     .
      C  4268    38    79    13     3     .     .    39    69     4     .    15     9   953    45     8     .     .     .     2    53     4   175  1898   844    14     3
      2   365     3     4     1     .     .     .     1     1     .     .    18    19     2     2     .     .     .     .     3   150     2   133     4    10     1    11
      R   883     2     5     3     .     .     .     1     1     1     1   123   145     4     4     1     .     .     .    25   147     3   272    22    54     6    63
      N   503     1     .     .     .     .     .     3     .     2     .   117   104     3     5     1     .     .     .    19     9     2   169    30     9     .    29
      M   438     .     2     .     .     .     .     .     2     .     1   114    89     1     7     1     .     .     .    16     4     2   127    25    13     1    33
      J    53     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     2     2     .     .     .    48
      4  1676     1     5     2     2     .     .     5     1     .     .     .     1    10     2     .     .     .     .     1     .     .  1646     .     .     .     .
      A  1952     .     .     1     .     .     .     .     .     1     .     .     .     .     .   405   495   414    43     .     .   552     .     .     .    41     .
      E  2331    64   309    15     8     .     1     2     1     1     .   501   344    19    41    28     .     .     2    96    69    41   377   161   114     1   136
      O  3951   567  1430    67    13     4     4     9    14     1     .    10    10    19     7   305     7    20     7    13     4  1336    15    41    20    19     9
      8  2727     1     8     .     1     .     .     .     .     .     .    41    43    15     2     2     .     .     .    21   414    14    97     4  2050     2    12
      9  3781   106   113    17     2     1     2     9     3     2     .   233   190     6   101   121     .     .     1  1277    18   312   556   266    34     9   402
      *   113     .     1     1     .     .     .     .     1     .     .     8    14     4     1     1     1     1     .     3    11     3    28     5     6     8    16
          763    49     6    71     2     1     .     2     1     1     .    17    23     3   138     4     .     .     .   198     .    26    58   104    59     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 29752   846  1986   195    32     6     7   121   199    21     4  1453  1078  4268   365   883   503   438    53  1676  1952  2331  3951  2727  3781   113   763

    Next-symbol probability (× 99):

          P  F  B  V  b  v  Q  X  W  Y  S  Z  C  2  R  N  M  J  4  A  E  O  8  9  *   
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      P   .  .  .  .  .  .  .  .  .  .  7  3 40  .  .  .  .  .  . 30  .  7  . 10  1  .
      F   .  .  .  .  .  .  .  .  .  .  4  1 43  .  .  .  .  .  . 37  1  4  .  8  .  .
      B   .  .  .  .  .  .  .  .  .  . 47 13  2  .  .  .  .  .  .  7  . 26  2  4  .  .
      V   .  .  .  .  .  .  .  .  .  . 74  6  3  .  .  .  .  .  .  6  .  9  .  .  .  .
      b   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 33  . 17  . 50  .  .
      v   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 14  .  .  . 85  .  .
      Q   .  .  .  .  .  .  .  .  .  .  .  . 25  1  .  .  .  .  .  2  .  4  5 61  .  1
      X   .  .  .  .  .  .  .  .  .  .  1  . 26  .  .  .  .  .  .  2  .  1  5 62  .  .
      W   .  .  .  .  .  .  .  .  .  .  .  . 42  .  .  .  .  .  .  9  .  9 14 24  .  .
      Y   .  .  .  .  .  .  .  .  .  .  .  . 50  .  .  .  .  .  .  .  .  . 50  .  .  .
      S   1  1  .  .  .  .  2  4  1  .  .  . 72  .  .  .  .  .  .  2  1  3  7  4  .  .
      Z   1  1  .  .  .  .  2  4  .  .  .  . 80  .  .  .  .  .  .  2  .  3  4  3  .  .
      C   1  2  .  .  .  .  1  2  .  .  .  . 22  1  .  .  .  .  .  1  .  4 44 20  .  .
      2   1  1  .  .  .  .  .  .  .  .  5  5  1  1  .  .  .  .  1 41  1 36  1  3  .  3
      R   .  1  .  .  .  .  .  .  .  . 14 16  .  .  .  .  .  .  3 16  . 30  2  6  1  7
      N   .  .  .  .  .  .  1  .  .  . 23 20  1  1  .  .  .  .  4  2  . 33  6  2  .  6
      M   .  .  .  .  .  .  .  .  .  . 26 20  .  2  .  .  .  .  4  1  . 29  6  3  .  7
      J   .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  4  4  .  .  . 90
      4   .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  . 97  .  .  .  .
      A   .  .  .  .  .  .  .  .  .  .  .  .  .  . 21 25 21  2  .  . 28  .  .  .  2  .
      E   3 13  1  .  .  .  .  .  .  . 21 15  1  2  1  .  .  .  4  3  2 16  7  5  .  6
      O  14 36  2  .  .  .  .  .  .  .  .  .  .  .  8  .  1  .  .  . 33  .  1  1  .  .
      8   .  .  .  .  .  .  .  .  .  .  1  2  1  .  .  .  .  .  1 15  1  4  . 74  .  .
      9   3  3  .  .  .  .  .  .  .  .  6  5  .  3  3  .  .  . 33  .  8 15  7  1  . 11
      *   .  1  1  .  .  .  .  1  .  .  7 12  4  1  1  1  1  .  3 10  3 25  4  5  7 14
          6  1  9  .  .  .  .  .  .  .  2  3  . 18  1  .  .  . 26  .  3  8 13  8  .  .
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

    Previous-symbol probability (× 99):

          P  F  B  V  b  v  Q  X  W  Y  S  Z  C  2  R  N  M  J  4  A  E  O  8  9  *   
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      P   .  .  .  .  .  .  .  .  .  .  4  2  8  .  .  .  .  .  . 13  .  1  .  2  4  .
      F   .  .  .  .  .  .  .  .  .  .  5  3 20  .  .  .  .  .  . 37  .  2  .  4  4  .
      B   .  .  .  .  .  .  .  .  .  .  6  2  .  .  .  .  .  .  .  1  .  1  .  .  .  .
      V   .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      b   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      v   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      Q   .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  2  .  .
      X   .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  3  .  .
      W   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      Y   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      S   1  1  2  3  .  . 25 33 38 50  .  . 24  2  .  .  .  .  .  1  1  1  3  2  2  .
      Z   1  .  .  .  .  . 16 19  .  .  .  . 20  .  .  .  .  .  .  1  .  1  1  1  .  .
      C   4  4  7  9  .  . 32 34 19  .  1  1 22 12  1  .  .  .  .  3  .  4 69 22 12  .
      2   .  .  1  .  .  .  1  .  .  .  1  2  .  1  .  .  .  .  .  8  .  3  .  .  1  1
      R   .  .  2  .  .  .  1  .  5 25  8 13  .  1  .  .  .  .  1  7  .  7  1  1  5  8
      N   .  .  .  .  .  .  2  .  9  .  8 10  .  1  .  .  .  .  1  .  .  4  1  .  .  4
      M   .  .  .  .  .  .  .  1  . 25  8  8  .  2  .  .  .  .  1  .  .  3  1  .  1  4
      J   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  6
      4   .  .  1  6  .  .  4  .  .  .  .  .  .  1  .  .  .  .  .  .  . 41  .  .  .  .
      A   .  .  1  .  .  .  .  .  5  .  .  .  .  . 45 97 94 80  .  . 23  .  .  . 36  .
      E   7 15  8 25  . 14  2  .  5  . 34 32  . 11  3  .  .  4  6  3  2  9  6  3  1 18
      O  66 71 34 40 66 57  7  7  5  .  1  1  .  2 34  1  5 13  1  . 57  .  1  1 17  1
      8   .  .  .  3  .  .  .  .  .  .  3  4  .  1  .  .  .  .  1 21  1  2  . 54  2  2
      9  12  6  9  6 17 28  7  1  9  . 16 17  . 27 14  .  .  2 75  1 13 14 10  1  8 52
      *   .  .  1  .  .  .  .  .  .  .  1  1  .  .  .  .  .  .  .  1  .  1  .  .  7  2
          6  . 36  6 17  .  2  .  5  .  1  2  . 37  .  .  .  . 12  .  1  1  4  2  .  .
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

  Obviously B/V are not POE8/HOE8: the counts of b/v (6 and 7) are far
  too low compared with B (195) and V (32), and the next-symbol
  frequencies are all wrong.  POE/HOE is still the best fit.
  But surely there is more to the story...
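
  The "× 99" tables above look like conditional frequencies rounded to
  two digits.  For example, P is followed by A 259 times out of 846, and
  259/846 × 99 ≈ 30.3, which gives the 30 shown in the table (× 100 would
  give 31).  Previous-symbol entries divide by the column total instead.
  Since count-digraph-freqs itself is not listed here, this is only a
  minimal sketch under those assumptions.  The function name
  digraph_probs and the use of "_" for the line-break symbol are my own
  choices:

```shell
# digraph_probs: hypothetical stand-in for count-digraph-freqs.
# Reads text on stdin; every line break becomes the boundary symbol "_".
# For each digraph xy it prints:  x y count next prev
#   next = round(99 * count(xy) / count(x followed by anything))
#   prev = round(99 * count(xy) / count(anything followed by y))
digraph_probs() {
  awk '
    { s = s $0 "_" }                  # glue lines, "_" marks each break
    END {
      for (i = 1; i < length(s); i++) {
        x = substr(s, i, 1); y = substr(s, i + 1, 1)
        pair[x, y]++; first[x]++; second[y]++
      }
      for (k in pair) {
        split(k, xy, SUBSEP)
        printf "%s %s %d %d %d\n", xy[1], xy[2], pair[k],
          int(99 * pair[k] / first[xy[1]] + 0.5),
          int(99 * pair[k] / second[xy[2]] + 0.5)
      }
    }' | LC_ALL=C sort
}
```

  For instance, printf 'aab\n' | digraph_probs prints "a a 1 50 99",
  "a b 1 50 99" and "b _ 1 99 99".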
  
97-10-08 stolfi
===============

   Digraph frequencies ignoring blanks and line breaks,
   and collapsing 'DFT' to 'HPS':

    cat .voyn.fsg \
      | tr -d ' /=\012' \
      | tr 'DFT' 'HPS' \
      | enum-ngraphs -v n=2 \
      | egrep -v '\*' \
      > .voyn-tt-2-r.grm
      
    cat .voyn-tt-2-r.grm \
      | sed -e 's/^\(.\)\(.\)$/\1:\2/g' \
      > .voyn-tt-1-1-r.grm

    cat .voyn-tt-1-1-r.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-1-1-r.frq
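
  The enum-ngraphs tool is not listed in this notebook.  Judging only from
  how it is used here (the input is a single line once newlines are
  stripped, and the output is one n-gram per line that then goes through
  sort | uniq -c), a minimal stand-in could look like this.  The name
  enum_ngraphs and its positional argument are hypothetical:

```shell
# enum_ngraphs N: hypothetical stand-in for "enum-ngraphs -v n=N".
# Prints every length-N substring of each input line, one per line,
# in order of occurrence (overlapping windows, step 1).
enum_ngraphs() {
  awk -v n="$1" '{
    for (i = 1; i + n - 1 <= length($0); i++) print substr($0, i, n)
  }'
}

# Counting digraphs as x:y patterns, as in the pipeline above:
#   printf 'OKAE' | enum_ngraphs 2 | sed -e 's/^\(.\)\(.\)$/\1:\2/' \
#     | sort | uniq -c
```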
     
  Digraph frequencies around line breaks, ignoring spaces:

    cat .voyn.fsg \
      | tr -d ' /=' \
      | tr 'DFT' 'HPS' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=3 \
      | egrep -v '\*' \
      | egrep '^.:.$' \
      > .voyn-nl-1-1-r.grm
  
    cat .voyn-nl-1-1-r.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-1-1-r.frq
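
  The sed/tr/enum-ngraphs/egrep combination above ends up pairing the last
  symbol of each line with the first symbol of the next line, as an x:y
  pattern.  The same pairs can be produced more directly with awk.  This
  is an equivalent sketch, not the notebook's code, and the name nl_pairs
  is made up.  Like tr -s '\012' ':', it lets empty lines pass without
  breaking the chain:

```shell
# nl_pairs: hypothetical helper.  For each pair of consecutive non-empty
# lines, prints "x:y", where x is the last symbol of the first line and
# y is the first symbol of the next.  Pairs involving "*" are dropped,
# like the egrep -v '\*' step of the notebook pipeline.
nl_pairs() {
  awk '
    length($0) == 0 { next }          # skip empty lines
    {
      first = substr($0, 1, 1)
      if (havelast && last != "*" && first != "*") print last ":" first
      last = substr($0, length($0), 1); havelast = 1
    }'
}

# Usage, mirroring the notebook:
#   tr -d ' /=' < .voyn.fsg | tr 'DFT' 'HPS' | nl_pairs | sort | uniq -c
```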

  Digraph frequencies around interword blanks (omitting line breaks): 

    cat .voyn.fsg \
      | tr -d '/=\012' \
      | tr 'DFT' 'HPS' \
      | tr -s ' ' ':' \
      | enum-ngraphs -v n=3 \
      | egrep -v '\*' \
      | egrep '^.:.$' \
      > .voyn-sp-1-1-r.grm

    cat .voyn-sp-1-1-r.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-sp-1-1-r.frq

  Now let's do the comparisons. First, line breaks against total occurrences:

    compare-freqs \
        .voyn-tt-1-1-r.frq \
        .voyn-nl-1-1-r.frq \
      | compute-count-ratio \
          -v nmin=10 -v mw=8 -v mc=40 \
      | sort +0.0 -0.2r +4 -5nr \
      > .voyn-tt-nl-1-1-r.cmp
      
    cat .voyn-tt-nl-1-1-r.cmp \
      | print-pattern-classes \
          -v rowchars='AI4FPDHCTSZ2L68OKMNREG' \
          -v colchars='A6KLMNIZFC2PEDHSTR4G8O'

          A  6  K  L  M  N  I  Z  C  2  P  E  H  S  R  4  G  8  O
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
     A |  .  . oo  . oo oo oo  .  .  . -? oo  .  . oo  .  .  .  .
     I |  .  . -? -?  .  . oo  .  .  .  . -?  .  . oo  .  .  .  .
     4 |  .  .  .  .  .  .  .  . oo -? -?  . oo -?  . -?  .  . oo
     P | oo  .  .  .  .  .  . oo -?  .  .  .  . oo  .  . -? -? oo
     H | oo  .  .  . -?  . -? oo oo -?  . oo -? oo -?  . oo -? oo
     C | oo -?  . -?  .  .  .  . oo oo oo -? oo oo -? -? oo -- oo
     S | oo  .  .  .  .  .  .  . oo -? oo oo oo -? -? -? oo oo oo
     Z | oo  .  .  .  .  .  .  . oo -?  .  .  . -?  .  . oo oo --
     2 | oo  .  .  .  .  . -?  . -? +? -? -? || --  . +? oo -? --
     L |  .  .  .  .  .  .  .  .  .  .  . -?  . -?  .  . -?  . -?
     6 | -?  .  .  .  .  .  .  .  . -?  .  . -? -?  . +?  . +? -?
     8 | oo  .  . -?  .  . -?  . oo -? -? || || oo -? || oo -? --
     O | -? -? -? -? oo -? -?  . oo ## oo oo oo -- oo || oo oo ||
     K |  .  .  .  .  .  .  .  .  . ## -? -? -? +?  . ## +? +? +?
     M | -?  .  .  .  .  . -?  . -? ## +? -? +? -- -? || -- -- --
     N | -?  .  .  .  .  .  .  . -? ## +? +? -? -- -? || || ++ --
     R | oo -?  .  .  .  . -?  . -? ## ## +? || -- -? || -- || --
     E | oo  . -?  .  .  .  .  . -- || || || -- -- oo || ++ ++ --
     G | oo -? -? -?  .  . -?  . +? || || -- || -- -- ++ || || --
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
          A  6  K  L  M  N  I  Z  C  2  P  E  H  S  R  4  G  8  O
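
  The compute-freqs and compare-freqs tools are not listed here either.
  For anyone who wants to rerun this comparison, here is a minimal sketch
  of those two steps.  It assumes that a .frq line is "count freq pattern"
  and that compare-freqs emits the " NT FT  NL FL  patt" layout described
  in the header comment of compute-count-ratio-new further on.  Both
  function names are hypothetical:

```shell
# compute_freqs: hypothetical stand-in for compute-freqs.
# Input: "count pattern" lines (as from uniq -c).
# Output: "count freq pattern", where freq = count / total of all counts.
compute_freqs() {
  awk '{ n[NR] = $1; p[NR] = $2; tot += $1 }
       END { for (i = 1; i <= NR; i++)
               printf "%7d %7.5f %s\n", n[i], n[i] / tot, p[i] }'
}

# compare_freqs FILE_T FILE_L: hypothetical stand-in for compare-freqs.
# Joins two compute_freqs outputs (both assumed non-empty) on the pattern
# and prints " NT FT  NL FL  patt", with 0 for a pattern missing from
# one of the files.
compare_freqs() {
  awk '
    FNR == 1 { f++ }                  # f = 1 in FILE_T, 2 in FILE_L
    f == 1 { nt[$3] = $1; ft[$3] = $2; seen[$3] = 1; next }
           { nl[$3] = $1; fl[$3] = $2; seen[$3] = 1 }
    END {
      for (p in seen)
        printf "  %5d %5.3f  %5d %5.3f  %s\n", nt[p], ft[p], nl[p], fl[p], p
    }' "$1" "$2" | LC_ALL=C sort -k5
}
```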

  Now, intra-line spaces against all occurrences:
  
    compare-freqs \
        .voyn-tt-1-1-r.frq \
        .voyn-sp-1-1-r.frq \
      | compute-count-ratio \
          -v nmin=10 -v mw=5 -v mc=5 \
      | sort +0.0 -0.2r +4 -5nr \
      > .voyn-tt-sp-1-1-r.cmp

      
    cat .voyn-tt-sp-1-1-r.cmp \
      | print-pattern-classes \
          -v rowchars='AI4FPDHCTSZ2L68OKMNREG' \
          -v colchars='A6KLMNIZFC2PEDHSTR4G8O'

          A  6  K  L  M  N  I  Z  C  2  P  E  H  S  R  4  G  8  O
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
     A |  .  . oo  . oo oo oo  .  .  . +? oo  .  . oo  .  .  .  .
     I |  .  . +? +?  .  . oo  .  .  .  . -?  .  . oo  .  .  .  .
     4 |  .  .  .  .  .  .  .  . oo +? +?  . || +?  . +?  .  . oo
     P | oo  .  .  .  .  .  . oo +?  .  .  .  . --  .  . -? +? --
     H | --  .  .  . +?  . +? oo -- +?  . oo +? -- +?  . -- +? --
     C | oo +?  . +?  .  .  .  . oo oo oo +? oo oo +? +? -- -- --
     S | oo  .  .  .  .  .  .  . oo -? oo oo oo +? +? +? oo -- --
     Z | oo  .  .  .  .  .  .  . oo +?  .  .  . +?  .  . oo oo ++
     2 | ++  .  .  .  .  . +?  . +? +? +? +? ## ||  . +? ++ +? ++
     L |  .  .  .  .  .  .  .  .  .  .  . +?  . +?  .  . +?  . +?
     6 | +?  .  .  .  .  .  .  .  . +?  .  . +? +?  . +?  . +? +?
     8 | --  .  . +?  .  . +?  . oo +? +? || || ++ +? ## -- +? ||
     O | +? -? -? +? oo -? +?  . oo ## oo -- -- || -- ## ++ -- ||
     K |  .  .  .  .  .  .  .  .  . ## +? +? +? +?  . ## +? +? +?
     M | +?  .  .  .  .  . +?  . +? ## +? +? +? ## +? ## ## ## ##
     N | +?  .  .  .  .  .  .  . +? ## +? +? +? ## +? ## ## ## ##
     R | || +?  .  .  .  . +?  . +? ## ## +? ## || +? ## || ## ||
     E | ||  . +?  .  .  .  .  . || || || ## ++ || || ## ++ || ||
     G | ## +? +? +?  .  . +?  . +? ## ## || || || || || ## || ##
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
          A  6  K  L  M  N  I  Z  C  2  P  E  H  S  R  4  G  8  O
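
  The grids above come from print-pattern-classes, which is not listed in
  the notebook.  A minimal sketch of such a printer follows.  It assumes
  the .cmp lines carry the class mark in field 6 and the x:y pattern in
  field 7 (the layout produced by compute-count-ratio-new further on).
  The name print_pattern_classes is hypothetical:

```shell
# print_pattern_classes ROWCHARS COLCHARS: hypothetical stand-in for
# print-pattern-classes.  Reads .cmp lines (NT FT NL FL ratio mark x:y)
# and prints the marks as a grid, rows and columns in the given order.
# Symbols that never occur on that side of any pattern are omitted;
# empty cells are shown as ".".
print_pattern_classes() {
  awk -v rowchars="$1" -v colchars="$2" '
    /^#/ { next }
    NF >= 7 {
      split($7, p, ":")
      mk[p[1], p[2]] = $6; isrow[p[1]] = 1; iscol[p[2]] = 1
    }
    END {
      nc = 0
      for (j = 1; j <= length(colchars); j++) {
        c = substr(colchars, j, 1)
        if (c in iscol) col[++nc] = c
      }
      hdr = "     "
      for (j = 1; j <= nc; j++) hdr = hdr sprintf("%3s", col[j])
      print hdr
      for (i = 1; i <= length(rowchars); i++) {
        r = substr(rowchars, i, 1)
        if (!(r in isrow)) continue
        line = "  " r " |"
        for (j = 1; j <= nc; j++)
          line = line sprintf("%3s", ((r, col[j]) in mk) ? mk[r, col[j]] : ".")
        print line
      }
    }'
}
```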


  There are some notable differences.
  
    Patterns that are strong contexts for spaces but weak
    or negligible contexts for line breaks:
    
      [28OMNREG]:[ST]
      
      [2O]:G [N8]:O M:[G8O] R:[GO] E:[CRO] G:[AERO]
      
  Just for the sake of completeness, here is the comparison of spaces
  with line breaks:

    compare-freqs \
        .voyn-sp-1-1-r.frq \
        .voyn-nl-1-1-r.frq \
      | compute-count-ratio \
          -v nmin=10 -v mw=2 -v mc=8 \
      | sort +0.0 -0.2r +4 -5nr \
      > .voyn-sp-nl-1-1-r.cmp

      
    cat .voyn-sp-nl-1-1-r.cmp \
      | print-pattern-classes \
          -v rowchars='AI4FPDHCTSZ2L68OKMNREG' \
          -v colchars='A6KLMNIZFC2PEDHSTR4G8O'

          A  6  L  I  C  2  P  E  H  S  R  4  G  8  O
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
     4 |  .  .  .  .  .  . -?  . -?  .  . -?  .  .  .
     P |  .  .  .  . -?  .  .  .  . -?  .  .  .  . -?
     H | -?  .  .  . -?  .  .  .  . -? -?  . -? -? -?
     C |  .  .  .  .  .  .  .  .  .  . -? -? -? -? -?
     S |  .  .  .  .  .  .  .  .  . -?  . -?  . -? -?
     Z |  .  .  .  .  .  .  .  .  . -?  .  .  .  . -?
     2 | oo  .  .  .  . +? -? -? +? oo  . -? -? -? oo
     L |  .  .  .  .  .  .  .  .  . -?  .  .  .  . -?
     6 |  .  .  .  .  . -?  .  . -? -?  . +?  . +? -?
     8 | -?  .  .  .  . -? -? -? +? -?  . -- -? -? oo
     O |  .  .  .  .  . +?  . -? -? -? -? -? -? -? -?
     K |  .  .  .  .  . ## -? -? -? +?  . ## +? +? +?
     M | -?  .  . -? -? ## +? -? +? oo -? || -? oo oo
     N | -?  .  .  . -? ## +? +? -? oo -? -- oo -- oo
     R | oo  .  .  . -? ## ## +? ## -- -? ++ -- ++ --
     E | oo  .  .  . oo || ## -- -- -- oo ++ ## -- --
     G | oo -? -?  . -? ++ || -- ++ -- oo -- || -- --

  Let's write a sed script to split words and syllables according to 
  the patterns that occur at line breaks.  
  
  I recomputed the ratio by the more generous formula
  
    gawk '\
      { printf "  %5d %5.3f  %5d %5.3f  %5.3f  %s  %s\n",\
          $1, $2, $3, $4, ($3)/($1+2), $6, $7 \
      }'
  
  Then I classified the patterns as follows (NT = total count,
  NL = count at line breaks):
  
    ++  very likely a word break       ratio >= 0.200 and NT >= 5
    +?  possibly a word break          ratio >= 0.200 and NT < 5
    ::  very likely a syllable break   0.200 > ratio >= 0.005 and NL >= 5
    :?  possible syllable break        0.200 > ratio >= 0.005 and NL < 5
    --  very likely unbreakable        0.005 > ratio and NT >= 80
    -?  possibly unbreakable           0.005 > ratio and NT < 80
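
  The hand classification above can be written as a small filter over the
  output of the gawk recomputation (fields NT FT NL FL ratio mark patt).
  This is a sketch under that assumption.  The name classify_by_hand is
  hypothetical, and the thresholds are the ones in the table:

```shell
# classify_by_hand: hypothetical filter applying the hand-classification
# thresholds.  Input lines: NT FT NL FL ratio mark patt.  The ratio is
# recomputed as NL/(NT+2), and the old mark is replaced by the new class.
classify_by_hand() {
  awk '
    function cls(nt, nl, r) {
      if (r >= 0.200) return (nt >= 5) ? "++" : "+?"    # word break
      if (r >= 0.005) return (nl >= 5) ? "::" : ":?"    # syllable break
      return (nt >= 80) ? "--" : "-?"                   # unbreakable
    }
    /^#/ { next }
    NF >= 7 {
      r = $3 / ($1 + 2)
      printf "  %5d %5.3f  %5d %5.3f  %5.3f  %s  %s\n",
        $1, $2, $3, $4, r, cls($1, $3, r), $7
    }'
}
```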
    
  Result is in .voyn-tt-nl-1-1-r-hand.cmp
  
    cat .voyn-tt-nl-1-1-r-hand.cmp \
      | print-pattern-classes \
          -v rowchars='AI4FPDHCTSZ2L68OKMNREG' \
          -v colchars='A6KLMNIZFC2PEDHSTR4G8O'
    
          A  6  K  L  M  N  I  Z  C  2  P  E  H  S  R  4  G  8  O
         -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
     A |  .  . -?  . -- -- -?  .  .  . -? --  .  . --  .  .  .  .
     I |  .  . -? -?  .  . -?  .  .  .  . -?  .  . -?  .  .  .  .
     4 |  .  .  .  .  .  .  .  . -? -? -?  . -? -?  . -?  .  . --
     P | -?  .  .  .  .  .  . -? -?  .  .  .  . --  .  . -? -? -?
     H | --  .  .  . -?  . -? -- -- -?  . -? -? -- -?  . -- -? --
     C | -? -?  . -?  .  .  .  . -- -? -? -? -- -? :? +? -- -- --
     S | -?  .  .  .  .  .  .  . -- -? -? -? -- -? -? -? -- -- --
     Z | -?  .  .  .  .  .  .  . -- -?  .  .  . -?  .  . -- -? :?
     2 | --  .  .  .  .  . -?  . -? ++ +? -? :? :?  . ++ -? -? :?
     L |  .  .  .  .  .  .  .  .  .  .  . -?  . -?  .  . -?  . -?
     6 | -?  .  .  .  .  .  .  .  . +?  .  . +? +?  . +?  . +? -?
     8 | --  .  . -?  .  . -?  . -? +? -? :? :? -- -? :? -- :? :?
     O | -? -? -? -? -? -? -?  . -? ++ -- -- -- :? -- :? -? -? :?
     K |  .  .  .  .  .  .  .  .  . ++ +? -? +? +?  . ++ +? ++ ++
     M | -?  .  .  .  .  . -?  . -? ++ +? +? ++ :? -? ++ :? :? :?
     N | -?  .  .  .  .  .  .  . -? ++ ++ +? :? :? -? :: ++ :? :?
     R | -- -?  .  .  .  . -?  . -? ++ ++ ++ ++ :? +? ++ :? ++ ::
     E | -?  . -?  .  .  .  .  . :? ++ ++ :: :: :? -? ++ :: :: ::
     G | -? -? -? -?  .  . -?  . ++ ++ ++ :: :: :: :? :: ++ :: ::

  Here are the rules ("+" means word break, ":" means syllable break,
  "-" means no break, and "." stands for any symbol):
  
    .-[A6KLMNIZ]
    [AI4FPDHCSTZ]-.
    [2MNRE]-G
    [L8O]-[CFPEDHSTR4G8O]
    
    [26KMNREG]+[FP]
    [2L68OKMNREG]+2
    [G]+[CFPG]
    [MNR]+[EDHG]
    [MER2]+[4]
    [R]+[8R]
    
    [2]:[FPEDHSTRG8O]
    [MNR]:[ST]
    [E]:[EDHSTG8]
    [G]:[EDHSTR8]
    [MN]:[8]
    [MNREG]:[O]

    cat .voyn.fsg \
      | tr -d '/= ' \
      | sed -e 's/\(.\)/\1 /g' \
      | split-by-nl-patterns \
      | split-by-nl-patterns \
      | tr -d ' \-' \
      | tr '+' ' ' \
      > .voyn-nl-split.fsg
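
  The sed script split-by-nl-patterns is not reproduced in the notebook.
  Below is a hypothetical reconstruction from the rules listed above,
  applied in the listed order ("-" rules first, then "+", then ":"),
  assuming one space after every symbol, as produced by the
  sed -e 's/\(.\)/\1 /g' step.  Running it twice, as the pipeline does,
  presumably handles overlapping pairs: with the g flag, sed resumes after
  the end of each match, so in "R R R " the second pair shares its R with
  the first and is only rewritten on the second pass.

```shell
# split_by_nl_patterns: hypothetical reconstruction of the notebook's
# split-by-nl-patterns, built from the listed rules.  Input: one space
# after every symbol.  The space between two symbols becomes "-" (no
# break), "+" (word break) or ":" (syllable break); pairs matching no
# rule keep their space, which the later tr -d turns into "no break".
split_by_nl_patterns() {
  sed \
    -e 's/\(.\) \([A6KLMNIZ]\)/\1-\2/g' \
    -e 's/\([AI4FPDHCSTZ]\) \(.\)/\1-\2/g' \
    -e 's/\([2MNRE]\) \(G\)/\1-\2/g' \
    -e 's/\([L8O]\) \([CFPEDHSTR4G8O]\)/\1-\2/g' \
    -e 's/\([26KMNREG]\) \([FP]\)/\1+\2/g' \
    -e 's/\([2L68OKMNREG]\) \(2\)/\1+\2/g' \
    -e 's/\(G\) \([CFPG]\)/\1+\2/g' \
    -e 's/\([MNR]\) \([EDHG]\)/\1+\2/g' \
    -e 's/\([MER2]\) \(4\)/\1+\2/g' \
    -e 's/\(R\) \([8R]\)/\1+\2/g' \
    -e 's/\(2\) \([FPEDHSTRG8O]\)/\1:\2/g' \
    -e 's/\([MNR]\) \([ST]\)/\1:\2/g' \
    -e 's/\(E\) \([EDHSTG8]\)/\1:\2/g' \
    -e 's/\(G\) \([EDHSTR8]\)/\1:\2/g' \
    -e 's/\([MN]\) \(8\)/\1:\2/g' \
    -e 's/\([MNREG]\) \(O\)/\1:\2/g'
}
```

  For example, "R R R " becomes "R+R R " after one pass and "R+R+R "
  after two; the pipeline's tr steps then delete the spaces and "-" and
  turn "+" into a word space, leaving ":" as the syllable mark.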

  Global tetragram frequencies, ignoring line breaks and word spaces:

    cat .voyn.fsg \
      | tr -d ' /=\012' \
      | enum-ngraphs -v n=4 \
      | egrep -v '\*' \
      > .voyn-tt-4.grm
      
    cat .voyn-tt-4.grm \
      | sed -e 's/^\(..\)\(..\)$/\1:\2/g' \
      > .voyn-tt-2-2.grm

    cat .voyn-tt-2-2.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-2-2.frq

97-10-09 stolfi
===============
  
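  Decided to create another error-tolerant encoding, even more 
  "lossy" than HOP.  This one collapses FSG A with O, R with 2,
  and S with T; as the gsub chain below is written, N and M also
  both end up as "m", and G ends up as "o".  Spaces (periods) are
  ignored as well: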
  
    --- fsg2ecc ------------------------
    #! /n/gnu/bin/gawk -f

    # Recoding an interlinear file from the FSG alphabet to 
    # my Super-Lossy Fault-Tolerant encoding

    BEGIN {
      print "# Output of fsg2ecc - Stolfi's Semi-Analytic Fault-Tolerant alphabet"
    }

    /^ *$/ { print; next }
    /^ *#/ { print; next }
    /^<[^>.;]*>/ { print; next }

    /^<[^>]*\.[^>]*;[A-Z]> / {
      curtxt = substr($0,20)

      # We discard  "%" and "!" since the conversion
      # will destroy synchronism anyway.
      gsub(/[%!]/, "", curtxt);

      # We also discard spaces ("." in the evt format),
      # since they are not reliable
      gsub(/[.]/, "", curtxt);

      # First, the conversion from FSG to JSA (Stolfi's super-analytic)
      gsub(/IIIK/, "iiiij",  curtxt);
      gsub(/IIIL/, "iiiiu",  curtxt);
      gsub(/IIIR/, "iiiis",  curtxt);
      gsub(/IIIE/, "iiiix",  curtxt);
      gsub(/IIE/,  "iiix",   curtxt);
      gsub(/IIR/,  "iiis",   curtxt);
      gsub(/IIK/,  "iiij",   curtxt);
      gsub(/HZ/,   "cqjc",   curtxt);
      gsub(/PZ/,   "cqgc",   curtxt);
      gsub(/DZ/,   "cljc",   curtxt);
      gsub(/FZ/,   "clgc",   curtxt);
      gsub(/IE/,   "iix",    curtxt);
      gsub(/IR/,   "iis",    curtxt);
      gsub(/IK/,   "iij",    curtxt);
      gsub(/2/,    "cs",     curtxt);
      gsub(/4/,    "q",      curtxt);
      gsub(/6/,    "cj",     curtxt);
      gsub(/7/,    "ig",     curtxt);
      gsub(/8/,    "cg",     curtxt);
      gsub(/A/,    "ci",     curtxt);
      gsub(/C/,    "c",      curtxt);
      gsub(/D/,    "lj",     curtxt);
      gsub(/E/,    "ix",     curtxt);
      gsub(/F/,    "lg",     curtxt);
      gsub(/G/,    "cy",     curtxt);
      gsub(/H/,    "qj",     curtxt);
      gsub(/I/,    "i",      curtxt);
      gsub(/K/,    "ij",     curtxt);
      gsub(/L/,    "iu",     curtxt);
      gsub(/M/,    "iiiu",   curtxt);
      gsub(/N/,    "iiu",    curtxt);
      gsub(/O/,    "o",      curtxt);
      gsub(/P/,    "qg",     curtxt);
      gsub(/R/,    "is",     curtxt);
      gsub(/S/,    "cc",     curtxt);  # Was "csc" in JSA
      gsub(/T/,    "cc",     curtxt);
      gsub(/V/,    "?",      curtxt);
      gsub(/Y/,    "?",      curtxt);

      # Now, the conversion from JSA to ECC:

      gsub(/[ql]j/, "H",     curtxt);
      gsub(/[ql]g/, "P",     curtxt);
      gsub(/ij/,    "k",     curtxt);
      gsub(/ii*x/,  "e",     curtxt);
      gsub(/is/,    "r",     curtxt);
      gsub(/iiu/,   "n",     curtxt);
      gsub(/y/,     "i",     curtxt);
      gsub(/ci/,    "a",     curtxt);
      gsub(/cg/,    "8",     curtxt);
      gsub(/cs/,    "r",     curtxt);
      gsub(/ii*r/,  "w",     curtxt);
      gsub(/i*n/,   "m",     curtxt);
      gsub(/a/,     "o",     curtxt);

      print (substr($0,1,19) curtxt);
      next
    }
    ------------------------------------
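
  Because the gsub calls in fsg2ecc run in sequence, later rules act on the
  output of earlier ones, so the encoding merges more letters than the
  pairs named at the top of this entry.  A quick way to check this is to
  run only the rules that can fire on text made of the letters 2, A, G, M,
  N, O, R, S and T, in the same order as in the script.  The function name
  ecc_trace is made up for this check:

```shell
# ecc_trace: hypothetical check.  Applies, in fsg2ecc's order, every
# gsub rule that can fire on text made only of 2 A G M N O R S T
# (the other rules need letters or fragments these never produce).
ecc_trace() {
  awk '{
    # FSG -> JSA
    gsub(/2/, "cs"); gsub(/A/, "ci"); gsub(/G/, "cy")
    gsub(/M/, "iiiu"); gsub(/N/, "iiu"); gsub(/O/, "o")
    gsub(/R/, "is"); gsub(/S/, "cc"); gsub(/T/, "cc")
    # JSA -> ECC
    gsub(/is/, "r"); gsub(/iiu/, "n"); gsub(/y/, "i"); gsub(/ci/, "a")
    gsub(/cs/, "r"); gsub(/ii*r/, "w"); gsub(/i*n/, "m"); gsub(/a/, "o")
    print
  }'
}
```

  With it, echo 'A O G' | ecc_trace prints "o o o", and
  echo 'M N' | ecc_trace prints "m m".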
  
    cat bio-m-evt.evt \
      | fsg2ecc \
      > bio-m-ecc.evt
      
    cat bio-m-ecc.evt \
      | make-consensus-interlin \
      > bio-x-ecc.evt
  
    cat bio-x-ecc.evt \
      | egrep '^<.*;J> ' \
      | sed \
          -e 's/{[^}]*}//g' \
      > bio-j-ecc.evt

    extract-words-from-interlin \
        -chars "8coqHPemrwk" \
        bio-j-ecc.evt \
        bio-j-ecc

     lines   words     bytes file        
    ------ ------- --------- ------------
      1605    1605     35644 bio-j-ecc.wds
       767     767     33204 bio-j-ecc.dic
       333     333     13811 bio-j-ecc-gut.wds
       333     333     13811 bio-j-ecc-gut.dic
       840     840      2445 bio-j-ecc-fun.wds
         2       2         5 bio-j-ecc-fun.dic
       432     432     19388 bio-j-ecc-bad.wds
       432     432     19388 bio-j-ecc-bad.dic
       
  Here are the statistics.  Keep in mind that spaces were deleted,
  so the blank row/column label (" ") stands for a line break.

    Digraph counts:

           TT           8     c     o     q     H     P     e     m     r     w     k
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
          333     .    39    15    51    89    24    38    11     .    66     .     .
      8  1166     4     2    92  1052     9     2     .     4     .     1     .     .
      c  4351     1   909  2389   585     1   183    18   232     3    30     .     .
      o  3864   189   113   211   261   576   972    41   683   402   384    10    22
      q   728     .     .    10   718     .     .     .     .     .     .     .     .
      H  1347     .     2   853   484     .     .     .     5     1     2     .     .
      P   109     .     1    75    33     .     .     .     .     .     .     .     .
      e   958    64    67   360   224    29   162    10    18     .    24     .     .
      m   406    24    24   188   148    13     1     .     2     .     6     .     .
      r   517    31     9   153   302    11     3     2     3     .     3     .     .
      w    10     .     .     5     5     .     .     .     .     .     .     .     .
      k    22    20     .     .     1     .     .     .     .     .     1     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 13811   333  1166  4351  3864   728  1347   109   958   406   517    10    22

    Next-symbol probability (× 99):

             8  c  o  q  H  P  e  m  r  w  k
         -- -- -- -- -- -- -- -- -- -- -- --
          . 12  4 15 26  7 11  3  . 20  .  .
      c   . 21 54 13  .  4  .  5  .  1  .  .
      o   5  3  5  7 15 25  1 17 10 10  .  1
      8   .  .  8 89  1  .  .  .  .  .  .  .
      q   .  .  1 98  .  .  .  .  .  .  .  .
      H   .  . 63 36  .  .  .  .  .  .  .  .
      P   .  1 68 30  .  .  .  .  .  .  .  .
      w   .  . 50 50  .  .  .  .  .  .  .  .
      e   7  7 37 23  3 17  1  2  .  2  .  .
      m   6  6 46 36  3  .  .  .  .  1  .  .
      r   6  2 29 58  2  1  .  1  .  1  .  .
      k  90  .  .  5  .  .  .  .  .  5  .  .
         -- -- -- -- -- -- -- -- -- -- -- --
    TOT   2  8 31 28  5 10  1  7  3  4  0  0
    
  Note that "e", "m", and "r" have become more similar.
  It is curious that "8" and "q" have very similar 
  next-symbol statistics (both are followed by "o" 89-98%
  of the time).  Also curious that P and H become nearly
  identical (about two-thirds "c", one-third "o")...
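
  To put a number on how close the next-symbol rows of P and H are, one
  can compute the total variation distance between them: half the sum of
  the absolute differences of the two conditional distributions, 0 for
  identical rows and 1 for disjoint ones.  This measure is my addition,
  not something the notebook computes, and tvd is a hypothetical helper:

```shell
# tvd: hypothetical helper.  Arguments: two rows of digraph counts with
# the same column order (quoted, space-separated).  Prints half the sum
# of |p_i - q_i|, where p and q are the rows normalized to sum to 1.
tvd() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) { sx += x[i]; sy += y[i] }
    for (i = 1; i <= n; i++) {
      d = x[i] / sx - y[i] / sy
      s += (d < 0) ? -d : d
    }
    printf "%.3f\n", s / 2
  }'
}

# Next-symbol counts of H and P from the digraph table above, columns
# (line break) 8 c o q H P e m r w k, with "." written as 0:
tvd "0 2 853 484 0 0 0 5 1 2 0 0" "0 1 75 33 0 0 0 0 0 0 0 0"
```

  This prints 0.063, i.e. the two rows differ in only about 6% of their
  probability mass.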

    Previous-symbol probability (× 99):

        TT     w  k  m  e  H  P  q  r  8  c  o
        -- -- -- -- -- -- -- -- -- -- -- -- --
         2  .  .  .  .  1  2 35 12 13  3  .  1
      o 28 56 99 99 98 71 71 37 78 74 10  5  7
      c 31  .  .  .  1 24 13 16  .  6 77 54 15
      8  8  1  .  .  .  .  .  .  1  .  .  2 27
      q  5  .  .  .  .  .  .  .  .  .  .  . 18
      H 10  .  .  .  .  1  .  .  .  .  . 19 12
      P  1  .  .  .  .  .  .  .  .  .  .  2  1
      e  7 19  .  .  .  2 12  9  4  5  6  8  6
      m  3  7  .  .  .  .  .  .  2  1  2  4  4
      r  4  9  .  .  .  .  .  2  1  1  1  3  8
      w  0  .  .  .  .  .  .  .  .  .  .  .  .
      k  0  6  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- --

    Symbol entropy: 2.693
    
  An encouraging sign: with this encoding, all labels in f77v can be found in
  the text of the bio section, hand B.

  Let's try to discern word/syllable boundaries from the 
  line breaks, in this reduced encoding:

    cat bio-j-ecc-gut.wds \
      | tr -d '\012' \
      | enum-ngraphs -v n=2 \
      | egrep -v '\*' \
      > .bio-j-ecc-tt-2.grm
      
    cat .bio-j-ecc-tt-2.grm \
      | sed -e 's/^\(.\)\(.\)$/\1:\2/g' \
      > .bio-j-ecc-tt-1-1.grm

    cat .bio-j-ecc-tt-1-1.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .bio-j-ecc-tt-1-1.frq
     
  Digraph frequencies around line breaks, ignoring spaces:

    cat bio-j-ecc-gut.wds \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=3 \
      | egrep -v '\*' \
      | egrep '^.:.$' \
      > .bio-j-ecc-nl-1-1.grm
  
    cat .bio-j-ecc-nl-1-1.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .bio-j-ecc-nl-1-1.frq

    compare-freqs \
        .bio-j-ecc-tt-1-1.frq \
        .bio-j-ecc-nl-1-1.frq \
      | compute-count-ratio \
          -v nmin=10 -v mw=10 -v mc=40 \
      | sort +0.0 -0.2r +4 -5nr \
      > .bio-j-ecc-tt-nl-1-1.cmp
     
    cat .bio-j-ecc-tt-nl-1-1.cmp \
      | print-pattern-classes \
          -v rowchars='co8qHPwemrk' \
          -v colchars='co8qHPwemrk'
          
    Pattern classes:

          c  o  8  q  H  P  w  e  m  r  k
         -- -- -- -- -- -- -- -- -- -- --
     c | -- -- -- -? -- --  . -- -? --  .
     q | -- --  .  .  .  .  .  .  .  .  .
     H | -- -- -?  .  .  .  . -? -? -?  .
     P | -- -- -?  .  .  .  .  .  .  .  .
     w | -? -?  .  .  .  .  .  .  .  .  .
     8 | -- -- -? || -? -?  . -?  . -?  .
     o | -- || || ++ -- || -- -- -- ++ --
     e | -- -- || || -- ||  . ||  . ||  .
     m | -- -- || || -? +?  . +?  . ##  .
     r | -- -- || || +? +?  . -?  . +?  .
     k | -? +? +? +?  . +?  . -?  . +?  .

  Fixing the count ratio and the classification along the lines of the
  previous manual classification experiment:
  
    --- compute-count-ratio-new ------------------------
    #! /n/gnu/bin/gawk -f
    # 
    # Usage: "$0 -v nmin=NNN -v mw=N.NNN -v mc=N.NNN"
    #
    # Computes the ratio of two counts for a list of patterns.
    # The input must be the output of compare-freqs, in the 
    # format " NT FT  NL FL  patt", where "NT","NL" are
    # two counts, and "FT","FL" the corresponding relative 
    # frequencies.  The output will have the format
    # " NT FT  NL FL  rat mk patt" where "rat=(NL)/(NT+2)".
    #
    # The "mk" field is a class code, assigned based on the 
    # ratio and its certainty, and the parameters "mw", "mc",
    # and "nmin", as follows:

    function classify(NT, NL, ratio, nmin, mw, mc)
    {
      if (ratio >= 1.0/mw) 
        { if (NT >= nmin) 
            { return "++" }  # Probably word break
          else
            { return "+?" }  # unimportant but looks more like a word break
        }
      else if (ratio >= 0.005)
        { if (NL >= nmin)
            { return "::" }  # possible syllable break
          else
            { return ":?" }  # uncertain but looks more like syllable break
        }
      else 
        { if (2*NT < mc) 
            { return "??" }  # too rare, can't tell
          else if (NT < 2*mc) 
            { return "-?" }  # uncertain but looks more like non-break
          else 
            { return "--" }  # non-break
        }
    }

    /^##/ { 
      $0 = substr($0, 3);
      printf "##%11.11s  %11.11s  RelFr  MK  %s\n", $1, $2, $3; next
    }

    /^# / { 
      $0 = substr($0, 3);
      printf "# %11.11s  %11.11s  -----  --  %s\n", $1, $2, $3; next
    }

    /[0-9]\.[0-9]/ { 
      if (mw == 0)   { print "must define mw" > "/dev/stderr"; exit 1; }
      if (mc == 0)   { print "must define mc" > "/dev/stderr"; exit 1; }
      if (nmin == 0) { print "must define nmin" > "/dev/stderr"; exit 1; }
      NT = $1
      NL = $3
      rat = (NL/(NT+2));
      mark = classify(NT, NL, rat, nmin, mw, mc)
      printf "  %5d %5.3f  %5d %5.3f %6.3f  %s  %s\n", $1, $2, $3, $4, rat, mark, $5;
      next
    }
    ----------------------------------------------------
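  A quick way to see how the thresholds act is to run the same decision
  rule on a single pair of counts.  The shell sketch below wraps a copy of
  classify() in an awk BEGIN block; the function name classify_counts is
  hypothetical (it is not one of the scripts used here), and the counts in
  the usage note are made up, not taken from the manuscript.

```shell
#!/bin/sh
# Hypothetical helper (not one of the original scripts): applies the
# same rule as classify() in compute-count-ratio-new to one pair of
# counts.  Arguments: NT NL nmin mw mc.
# Prints the smoothed ratio NL/(NT+2) and the resulting class mark.
classify_counts() {
  awk -v NT="$1" -v NL="$2" -v nmin="$3" -v mw="$4" -v mc="$5" 'BEGIN {
    NT += 0; NL += 0; nmin += 0; mw += 0; mc += 0   # force numeric values
    ratio = NL / (NT + 2)
    if (ratio >= 1.0 / mw)   mk = (NT >= nmin) ? "++" : "+?"
    else if (ratio >= 0.005) mk = (NL >= nmin) ? "::" : ":?"
    else if (2 * NT < mc)    mk = "??"
    else if (NT < 2 * mc)    mk = "-?"
    else                     mk = "--"
    printf "%.3f %s\n", ratio, mk
  }'
}
```

  For example, "classify_counts 100 20 5 8 40" should print "0.196 ++",
  while the same counts with mw=5 give "0.196 ::", because 20/102 lies
  between 1/8 and 1/5.  This is how an entry such as o:8 can drop from
  "++" to "::" between the mw=8 and mw=5 tables.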
      
    compare-freqs \
        .bio-j-ecc-tt-1-1.frq \
        .bio-j-ecc-nl-1-1.frq \
      | compute-count-ratio-new \
          -v nmin=5 -v mw=8 -v mc=40 \
      | sort +0.0 -0.2r +4 -5nr \
      > .bio-j-ecc-tt-nl-1-1-new.cmp
     
    cat .bio-j-ecc-tt-nl-1-1-new.cmp \
      | print-pattern-classes \
          -v rowchars='qHPwco8rekm' \
          -v colchars='mwkco8eHPqr'

          m  w  k  c  o  8  e  H  P  q  r
         -- -- -- -- -- -- -- -- -- -- --
     q |  .  .  . ?? --  .  .  .  .  .  .
     H | ??  .  . -- -- ?? ??  .  .  . ??
     P |  .  .  . -? -? ??  .  .  .  .  .
     w |  .  .  . ?? ??  .  .  .  .  .  .
     c | ??  .  . -- -- -- -- -- ?? +? -?
     o | -- ?? -? :: :: ++ :: :: ++ :: ::
     8 |  .  .  . -- -- ?? ?? ?? +? ++ +?
     r |  .  .  . :? :? ++ ?? ++ ++ ++ ++
     e |  .  .  . :? :: :: ++ :? ++ ++ ++
     k |  .  .  . +? +? +? +?  . +? ++ ++
     m |  .  .  . -- :? :? +? ?? +? ++ ++

  Non-breaks:
  
    [qHPw]:.
    .:[mwk]
    [c]:[co8eHPr]
    [8]:[co]
    [m]:[c]
    
  "Word" breaks: 
  
    [8rk]:[8]
    [8erkm]:[eHPqr]
    [o]:[8P]
    [k]:[co]
    
  Possible "Syllable" breaks:
  
    all else.
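  The three lists above amount to a lookup on the letter pair around a
  candidate break.  A minimal sketch of that lookup, with a hypothetical
  function name; the non-break and "word" lists do not overlap, so the
  order in which they are tested does not matter here, and anything not
  matched falls into the "syllable" class.

```shell
#!/bin/sh
# Hypothetical helper: classify a pair "x:y" (x = letter before the
# candidate break, y = letter after it) with the mw=8 rule lists.
classify_pair() {
  if printf '%s\n' "$1" | grep -Eq '^([qHPw]:.|.:[mwk]|c:[co8eHPr]|8:[co]|m:c)$'; then
    echo nonbreak
  elif printf '%s\n' "$1" | grep -Eq '^([8rk]:8|[8erkm]:[eHPqr]|o:[8P]|k:[co])$'; then
    echo word
  else
    echo syllable        # "all else"
  fi
}
```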
    
  Recomputing with mw=5 instead of 8:
  
    compare-freqs \
        .bio-j-ecc-tt-1-1.frq \
        .bio-j-ecc-nl-1-1.frq \
      | compute-count-ratio-new \
          -v nmin=5 -v mw=5 -v mc=40 \
      | sort +0.0 -0.2r +4 -5nr \
      > .bio-j-ecc-tt-nl-1-1-new.cmp
     
    cat .bio-j-ecc-tt-nl-1-1-new.cmp \
      | print-pattern-classes \
          -v rowchars='qHPwco8rekm' \
          -v colchars='mwkco8eHPqr'


          m  w  k  c  o  8  e  H  P  q  r
         -- -- -- -- -- -- -- -- -- -- --
     q |  .  .  . ?? --  .  .  .  .  .  .
     H | ??  .  . -- -- ?? ??  .  .  . ??
     P |  .  .  . -? -? ??  .  .  .  .  .
     w |  .  .  . ?? ??  .  .  .  .  .  .
     c | ??  .  . -- -- -- -- -- ?? +? -?
     o | -- ?? -? :: :: :: :: :: ++ :: ::
     8 |  .  .  . -- -- ?? ?? ?? +? :? +?
     e |  .  .  . :? :: :: :? :? ++ ++ ++
     r |  .  .  . :? :? ++ ?? ++ ++ ++ ++
     k |  .  .  . +? +? +? +?  . +? ++ ++
     m |  .  .  . -- :? :? +? ?? +? ++ ++


  Non-breaks:
  
    [qHPw]:.
    .:[mwk]
    [c]:[co8eHPr]
    [8]:[co]
    [m]:[c]
    
  "Word" breaks: 
  
    [8erkm]:[eHPqr]
    [8]:[8]
    [rkm]:[o8]
    [k]:[c]
    
  Possible "Syllable" breaks:
  
    all else (should check digraphs).
    
  Overall tetragram frequencies: 

    cat bio-j-ecc-gut.wds \
      | tr -d ' \012' \
      | enum-ngraphs -v n=4 \
      | egrep -v '\*' \
      | sed \
          -e 's/^\(..\)\(..\)$/\1:\2/g'  \
      > .bio-j-ecc-gut-tt-2-2.grm

    cat .bio-j-ecc-gut-tt-2-2.grm \
      | egrep -v '[qHPw]:.|.:[mwk]|[c]:[co8eHPr]|[8]:[co]|[m]:[c]' \
      | egrep -v '[8rk]:[8]|[8erkm]:[eHPqr]|[o]:[8P]|[k]:[co]' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .bio-j-ecc-gut-tt-2-2.frq
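  The tetragram step above can be imitated without enum-ngraphs, assuming
  that "enum-ngraphs -v n=4" lists every window of four consecutive
  characters.  Its "*" edge markers, which the egrep removes, are not
  modeled, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: delete spaces and newlines, slide a window of
# 4 letters over the text, and print each window as "ab:cd".
tetragrams_2_2() {
  tr -d ' \n' | awk '{
    s = $0
    for (i = 1; i + 3 <= length(s); i++)
      print substr(s, i, 2) ":" substr(s, i + 2, 2)
  }'
}
```

  For instance, the two lines "qoHc" and "8o" should yield qo:Hc, oH:c8,
  and Hc:8o; the line break itself is invisible at this stage.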

  Tetragram frequencies around line breaks, ignoring spaces:

    cat bio-j-ecc-gut.wds \
      | tr -d ' ' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=5 \
      | egrep -v '\*' \
      | egrep '^..:..$' \
      > .bio-j-ecc-gut-nl-2-2.grm
  
    cat .bio-j-ecc-gut-nl-2-2.grm \
      | egrep -v '[qHPw]:.|.:[mwk]|[c]:[co8eHPr]|[8]:[co]|[m]:[c]' \
      | egrep -v '[8rk]:[8]|[8erkm]:[eHPqr]|[o]:[8P]|[k]:[co]' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .bio-j-ecc-gut-nl-2-2.frq
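  In effect, the line-break pipeline pairs the last two letters of each
  line with the first two letters of the next.  A sketch under my reading
  of it: blank lines disappear because "tr -s" squeezes consecutive
  newlines, and a one-letter line breaks the chain because no window of
  the form "..:.." can span it.  The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: print "ab:cd" for each line break, where ab are
# the last two letters of a line and cd the first two of the next one.
nl_tetragrams_2_2() {
  tr -d ' ' | awk '
    length($0) == 0 { next }               # blank lines vanish (tr -s)
    length($0) == 1 { prev = ""; next }    # too short: breaks the chain
    {
      if (prev != "") print prev ":" substr($0, 1, 2)
      prev = substr($0, length($0) - 1)    # last two letters
    }'
}
```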

  Comparisons:

    compare-freqs \
        .bio-j-ecc-gut-tt-2-2.frq \
        .bio-j-ecc-gut-nl-2-2.frq \
      | compute-count-ratio-new \
          -v nmin=5 -v mw=8 -v mc=40 \
      | sort +0.0 -0.2r +4 -5nr \
      > .bio-j-ecc-gut-tt-nl-2-2-new.cmp

    cat .bio-j-ecc-gut-tt-nl-2-2-new.cmp \
      | print-pattern-classes

         oc  cc  8o 8c oH oP oe or om o8 oq oo ok ow qo qc ro rq Ho Hc eo ec rc e8 eq er ee eH eP r8 rH rP re rr ce cH cP cm co He H8 Hm 8P 8e 8r
         --  --  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    oo |  .  ??   .  . ??  . ??  .  .  .  .  .  .  .  .  . ??  . -? -- ?? -? ?? ?? ?? ?? ?? -? ??  .  . ??  .  .  . ??  .  .  .  . ??  .  .  .  .
    qo |  .  ??   .  . ??  . ?? ??  .  .  .  .  .  . ??  . ?? ?? -- -- ?? -? ?? ?? ?? ?? ?? ?? ??  .  .  .  .  .  . ??  .  .  . ??  .  .  .  .  .
    ko |  .  ??   .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    wo |  .   .   .  .  .  .  .  .  .  .  .  .  .  .  .  . ??  . ?? ??  .  .  .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
                                                                                                                                                 
    Ho | +?  ++   .  . :? ?? ?? +?  .  .  .  .  .  . ++ +? :? ?? ?? +? ?? ?? -? ??  .  .  . ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    Po |  .  ??   .  .  .  .  .  .  .  .  .  .  .  .  .  . ++  . ?? ?? ?? ?? ??  .  .  .  . ?? ??  . ??  .  .  .  . ??  .  .  .  .  .  .  .  .  .
    eo | +?  ++   .  . +?  . +?  . ??  .  .  .  .  . ++  . ++ ?? :? -? ?? -? ?? ?? ?? ?? ?? ??  . ??  . ??  . ??  .  .  .  .  .  .  .  .  .  .  .
    mo |  .  ??   .  .  .  .  .  .  .  .  .  .  .  . +?  . +? +? -? -? ?? ?? ?? ?? ?? ?? ?? ?? ??  .  .  .  .  .  . ??  .  .  .  .  . ??  .  .  .
    ro | +?  ??   .  . +?  . +?  . ?? +?  .  .  .  . +?  . :? ?? ?? :? :? -? ?? ?? ?? ?? ?? -? ?? ?? ?? ?? ?? ??  . ??  .  .  . ??  .  .  .  .  .
    8o | ++  :?   .  . :: ?? :? ?? ?? +?  . ??  .  . :: ?? ++ ?? ++ ++ ?? :? :? ?? ?? ??  . ?? ?? ?? ?? ??  . ?? ?? ?? +?  .  .  .  .  .  .  .  .
    co | +?  :?   .  . :? ?? :? ??  . ??  . ?? ??  . :: ?? :? ?? :? :? -? :? :? ?? ?? ?? ?? -? ?? ?? ?? ??  . ??  . ?? ??  .  .  .  .  .  .  .  .
                                                                                                                                                 
    oe | ++  --  :? ++ :? ?? :? -? ?? ?? ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . ?? ?? ??  .  .  .  .  .  .  .  .
    om | ++   .  :? :? :? ?? -? ??  . ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . ?? ??
    or | ++  :?   .  . -? ?? :? -? -? ?? ?? ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . -? ?? ?? ??  .  .  .  .  .  .  .
    ce | +?  :?  :? :? ?? ?? :? ?? ?? ?? ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . ?? ??  .  .  .  .  .  . ??  .  .
                                                                                                                                                 
    Hc |  .   .   .  .  .  .  .  .  .  .  .  .  .  . +?  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
                                                                                                                                                 
    cr |  .  ??   .  . ??  . ?? ?? ??  . ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . ??  .  .  . ??  .  .  .  .  .  .
    er |  .  ??   .  .  .  . ?? ?? ??  .  . +?  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .
    kr |  .   .   .  .  .  . ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    mr |  .   .   .  .  .  . ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .
    rr |  .  ??   .  .  .  . ?? ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    8e |  .  ??   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    He |  .  ??   .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    cc |  .   .   .  .  .  .  .  .  .  .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    ee |  .  ??   .  .  .  . ?? ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    ke |  .   .   .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    me |  .  ??   .  .  .  . ??  .  . ??  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    re |  .  ??   .  .  .  .  .  .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    Hm |  .   .   .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    cm |  .   .   .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    8r |  .   .   .  .  .  . ??  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
    Hr |  .   .   .  .  .  .  .  . ??  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
         --  --  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
         oc  cc  8o 8c oH oP oe or om o8 oq oo ok ow qo qc ro rq Ho Hc eo ec rc e8 eq er ee eH eP r8 rH rP re rr ce cH cP cm co He H8 Hm 8P 8e 8r
    
    
  Note that :oH resembles :cH; could it be a misreading?
  
  From this table, the only reasonably certain entries are
  
    "Word" boundary:     
      eo:cc eo:ro eo:qo
      Ho:cc Ho:qo 
      Po:ro
      8o:oc 8o:ro 8o:Ho 8o:Hc 
      oe:oc oe:8c 
      om:oc 
      or:oc
    
    Non-boundary:  
      oo:Hc
      qo:Ho qo:Hc
      oe:cc
    
    "Syllable" boundary:  
      8o:qo  8o:oH
      co:qo 
  
  We could extend these to "don't care" cases as follows:
  
    "Word" boundary:                  
      [HPerm8c]o:o[crm8]
      [HPemr]o:(cc|qo|qc)
      [emr]o:o[HPeqokw]
      [Pem8o]o:r[oq]
      8o:H[oc]
      Ho:Hc
      (oe|om|or|ce):oc
      oe:8c
      Hc:qo

    "Syllable" boundary:
      [HP8c]o:o[HPeqokw]
      [8c]o:(cc|qo|qc|ec|rc)
      Ho:ro
      eo:Ho
      ro:(ro|Hc|eo)
      co:(r[oq]|H[oc])
      (om|or|ce):(cc|8o|8c)
      oe:8o
      o[em]:oH
      (oe|or|ce):oe
          
    Non-break:
      ([cekmr8H]r|oo|qo|ko|wo):..
      ..:(e[8qreHP]|r[8HPer]|c[eHPmo]|8[Per])
      ([HPem]o|oe|or|om|ce):(eo|ec|rc|8o|8c)
      [r8c]o:8[oc]
      (oe|om|or|ce):(o[Pr8mqokw]|q[oc]|r[oq]|H[oc])
      (ro|Ho):(rq|Ho)
      (mo|Po):(Ho|Hc)
      ro:(ec|rc)
      co:eo
      eo:Hc
      om:oe
      or:oH
      ce:oH
      oe:cc
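  The extended rules can be collected into three regular expressions and
  tested against a tetragram "ab:cd".  The lists overlap in places (oe:8c,
  for example, is both a "word" entry and a non-break entry), so some
  precedence is needed.  The sketch below tries the "word" list, then the
  "syllable" list, then the non-break list; that order is my assumption,
  chosen because it agrees with the table for oe:8c, oe:8o, and om:8o.
  The variable and function names are hypothetical.

```shell
#!/bin/sh
# Rule lists copied from the notes; the precedence is an assumption.
WORD='^([HPerm8c]o:o[crm8]|[HPemr]o:(cc|qo|qc)|[emr]o:o[HPeqokw]'
WORD="$WORD"'|[Pem8o]o:r[oq]|8o:H[oc]|Ho:Hc|(oe|om|or|ce):oc|oe:8c|Hc:qo)$'
SYLL='^([HP8c]o:o[HPeqokw]|[8c]o:(cc|qo|qc|ec|rc)|Ho:ro|eo:Ho|ro:(ro|Hc|eo)'
SYLL="$SYLL"'|co:(r[oq]|H[oc])|(om|or|ce):(cc|8o|8c)|oe:8o|o[em]:oH|(oe|or|ce):oe)$'
NONB='^(([cekmr8H]r|oo|qo|ko|wo):..|..:(e[8qreHP]|r[8HPer]|c[eHPmo]|8[Per])'
NONB="$NONB"'|([HPem]o|oe|or|om|ce):(eo|ec|rc|8o|8c)|[r8c]o:8[oc]'
NONB="$NONB"'|(oe|om|or|ce):(o[Pr8mqokw]|q[oc]|r[oq]|H[oc])|(ro|Ho):(rq|Ho)'
NONB="$NONB"'|(mo|Po):(Ho|Hc)|ro:(ec|rc)|co:eo|eo:Hc|om:oe|or:oH|ce:oH|oe:cc)$'

# Hypothetical helper: label the boundary in the tetragram given as $1.
boundary_class() {
  if   printf '%s\n' "$1" | grep -Eq "$WORD"; then echo word
  elif printf '%s\n' "$1" | grep -Eq "$SYLL"; then echo syllable
  elif printf '%s\n' "$1" | grep -Eq "$NONB"; then echo nonbreak
  else echo unknown    # not covered by any of the lists
  fi
}
```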

    cat bio-j-ecc-gut.wds \
      | sed -e 's/\(.\)/\1 /g' -e 's/ $//g' \
      | split-ecc-by-nl-patterns \
      | split-ecc-by-nl-patterns \
      | tr -d ' \-' | tr '+:' ' \-' \
      > .bio-j-ecc-gut-split.ecc
    
  Here is a sample of the result:
  
    8ocHcoe Hok ooHcco-eccco-Hce-8o-ccco-oHccco-qoHcc8o
    Pccc8o-qoHcc8o-oHomccc8o-qoHor-ccoe-oeccc8o-qoHo
    Pccc8o Hcc8o-qoHc8o-qoHc8o-qoHc8o-qoHc8o-qoHomoeccc8o
    rom qoHom qoe Hccoeo romccc8o r-o-eor-ccc8o-oHcc8o-qoHo
    Pccc8o-r-cccPcco-eccc8o ro 8ce-ccce-cco-Hoeccc8o-qoHok
    roecccc8o-qoeccc8o-qoe-o-Homccor ro-r-o-eo
    qoHccc8o-qoeccco-qoHo cccocHcco-qoHomor
    qoHomoe Hcco-qoe Ho-ro-romccccHcoeo r-oe
    8omoecccoe-8omoe qoeo 8o ro 8o
    Hccc8o Pccc8o-qoHcco-r-o-e-oe-8owccccHco-qoe ecccc8o-qoHcc8oe-oeccc8o
    qo 8omccccHo qoHco-qoHomcccHo qoHce-8omccc8o-oHce-oeccc8o-oHo-r-o-eok
    roe Hc8o-oHce-8o roHo-oHo-roHo-r-oe Homoe Hc8o
    qoHc8o 8o-ccccHo qoHc8o-qoHcc8o-qoHccc8oe-oe
    qoHcc8o-qoHcc8o-qoHc8o-qoHc8o-qoHcc8oe-8o
    occc8o-qoHcc8o-qoHcc8o-oe Hcc8o-oHco-Hoe-8o
    8ccc8o-qoHc8o-qoHcc8o-qoHcco-qoHcc8o 8or
    occc8o-cccHo-r-oe-8o-qoHomccHo-roHo-r-oe-8o
  
  Ditto, without "-"s:
  
    8ocHcoe Hok ooHccoecccoHce8occcooHcccoqoHcc8o
    Pccc8oqoHcc8ooHomccc8oqoHorccoeoeccc8oqoHo
    Pccc8o Hcc8oqoHc8oqoHc8oqoHc8oqoHc8oqoHomoeccc8o
    rom qoHom qoe Hccoeo romccc8o roeorccc8ooHcc8oqoHo
    Pccc8orcccPccoeccc8o ro 8ceccceccoHoeccc8oqoHok
    roecccc8oqoeccc8oqoeoHomccor roroeo
    qoHccc8oqoecccoqoHo cccocHccoqoHomor
    qoHomoe Hccoqoe HororomccccHcoeo roe
    8omoecccoe8omoe qoeo 8o ro 8o
    Hccc8o Pccc8oqoHccoroeoe8owccccHcoqoe ecccc8oqoHcc8oeoeccc8o
    qo 8omccccHo qoHcoqoHomcccHo qoHce8omccc8ooHceoeccc8ooHoroeok
    roe Hc8ooHce8o roHooHoroHoroe Homoe Hc8o
    qoHc8o 8occccHo qoHc8oqoHcc8oqoHccc8oeoe
    qoHcc8oqoHcc8oqoHc8oqoHc8oqoHcc8oe8o
    occc8oqoHcc8oqoHcc8ooe Hcc8ooHcoHoe8o
    8ccc8oqoHc8oqoHcc8oqoHccoqoHcc8o 8or
    occc8occcHoroe8oqoHomccHoroHoroe8o
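  The script split-ecc-by-nl-patterns is not listed in these notes.
  Judging from the tr commands in the pipeline above, it marks "word"
  breaks with "+" and "syllable" breaks with ":", which end up as " " and
  "-" in the output.  Here is a hypothetical splitter that writes those
  final marks directly, using only the "word" and "syllable" lists (word
  checked first, an assumption).  It drops any spaces already in the
  input, which the real script seems to keep (compare "qoe Hccoeo" in the
  sample, where oe:Hc is listed as a non-break), so it will not reproduce
  the sample exactly.

```shell
#!/bin/sh
# Rule lists copied from the notes (non-break contexts get no mark).
WORD='^([HPerm8c]o:o[crm8]|[HPemr]o:(cc|qo|qc)|[emr]o:o[HPeqokw]'
WORD="$WORD"'|[Pem8o]o:r[oq]|8o:H[oc]|Ho:Hc|(oe|om|or|ce):oc|oe:8c|Hc:qo)$'
SYLL='^([HP8c]o:o[HPeqokw]|[8c]o:(cc|qo|qc|ec|rc)|Ho:ro|eo:Ho|ro:(ro|Hc|eo)'
SYLL="$SYLL"'|co:(r[oq]|H[oc])|(om|or|ce):(cc|8o|8c)|oe:8o|o[em]:oH|(oe|or|ce):oe)$'

# Hypothetical splitter: for each gap with two letters on each side,
# build the tetragram "ab:cd" and insert " " (word) or "-" (syllable).
split_line() {
  tr -d ' ' | awk -v W="$WORD" -v S="$SYLL" '{
    n = length($0); out = substr($0, 1, 1)
    for (i = 2; i <= n; i++) {
      mark = ""
      if (i >= 3 && i < n) {               # need 2 letters on each side
        t = substr($0, i - 2, 2) ":" substr($0, i, 2)
        if (t ~ W)      mark = " "         # "word" break
        else if (t ~ S) mark = "-"         # "syllable" break
      }
      out = out mark substr($0, i, 1)
    }
    print out
  }'
}
```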