Let's confirm the split rules based on glyphs by comparing the
  global distribution of letter groups (spaces omitted) with the
  distribution of letter groups that straddle or are adjacent to the
  line breaks.
  
    cat bio-m-evt.evt \
      | grep ';C>' \
      | sed \
          -e 's/{[^}]*}//g' \
          -e 's/[\!%]//g' \
      > .tmp-c-fsg.evt
      
    extract-words-from-interlin \
        -chars 'COG8EDA4TSHRNM2ZPIKLF6' \
        .tmp-c-fsg.evt \
        .tmp-c-fsg
      
     lines   words     bytes file        
    ------ ------- --------- ------------
      7227    7227     38289 .tmp-c-fsg.wds
      1699    1699     10717 .tmp-c-fsg.dic
      6420    6420     35789 .tmp-c-fsg-gut.wds
      1665    1665     10530 .tmp-c-fsg-gut.dic
       807     807      2500 .tmp-c-fsg-fun.wds
        34      34       187 .tmp-c-fsg-fun.dic
         0       0         0 .tmp-c-fsg-bad.wds
         0       0         0 .tmp-c-fsg-bad.dic


    Digraph counts:

           TT           C     O     G     8     E     D     A     4     T     S     H     R     N     M     2     Z     P     I     K     L     F     6
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
         6420     .    24  1363   138   514   365   108   133  1646   761   692   153   134     .     .   278     .    99     1     .     2     8     1
      C  4278     7   951   172   838  1895     4   155    55     1    15     9    80     8     .     8    45     .    17    11     .     2     3     2
      O  3895    34    19     4    13    31  1342  1430     3     8     7     9   567   300     7    14     7     .    68     9     7     1    13     2
      G  3764  3510     1     7     .    17    21    71     2    10    20    25    55    14     .     .     6     .     1     1     1     .     2     .
      8  2728    73    19    72  2045     2    10     8   418     1    36    38     1     2     .     .     1     .     .     1     .     1     .     .
      E  2349  1085     9   159   106    84     7   270    55     2   306   181    37    13     .     .    16     .    11     .     2     .     6     .
      D  2185    14   871    79   169     2    11     .   740     .    69    28     .     .     .     1     .   198     .     3     .     .     .     .
      A  1969     6     .     5     4     8   551     4     1     .     .     1     4   394   471   399     7     .     2    51    43    12     .     6
      4  1668     5    19  1622     3     .     .     4     4     .     .     1     5     .     .     .     2     .     2     .     .     .     1     .
      T  1447     1  1050    49    62    96    13    83    26     .     1     2    39     4     .     .     6     .    12     .     .     .     3     .
      S  1073     4   864    37    27    40     5    45    21     .     3     .    25     1     .     .     1     .     .     .     .     .     .     .
      H   969     4   343    58    88     3     3     1   258     .    60    25     .     .     .     .     1   121     .     4     .     .     .     .
      R   913   619     4    82    44     5     1     1    93     .    37    22     1     .     .     .     .     .     2     1     .     .     .     1
      N   478   462     .     7     2     3     .     .     2     .     1     .     .     .     .     .     1     .     .     .     .     .     .     .
      M   422   412     .     2     5     1     .     .     1     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .
      2   372    73     4   114    10     3     1     5   131     .    14    13     2     .     .     .     .     .     1     1     .     .     .     .
      Z   344     2    96    10   203    21     .     .     9     .     2     .     .     .     .     .     1     .     .     .     .     .     .     .
      P   215     4     3    48     6     3     .     .    14     .    91    25     .     .     .     .     .    21     .     .     .     .     .     .
      I   152     .     .     .     .     .    11     .     .     .     .     .     .    43     .     .     .     .     .    69     4    25     .     .
      K    57    55     .     1     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      L    43    38     .     1     1     .     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      F    36     1     1     3     .     .     .     .     2     .    23     2     .     .     .     .     .     4     .     .     .     .     .     .
      6    12    11     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 35789  6420  4278  3895  3764  2728  2349  2185  1969  1668  1447  1073   969   913   478   422   372   344   215   152    57    43    36    12

    Next-symbol probability (× 99):

        TT     C  O  G  8  E  D  A  4  T  S  H  R  N  M  2  Z  P  I  K  L  F  6
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
        99  .  . 21  2  8  6  2  2 25 12 11  2  2  .  .  4  .  2  .  .  .  .  .
      C 99  . 22  4 19 44  .  4  1  .  .  .  2  .  .  .  1  .  .  .  .  .  .  .
      O 99  1  .  .  .  1 34 36  .  .  .  . 14  8  .  .  .  .  2  .  .  .  .  .
      G 99 92  .  .  .  .  1  2  .  .  1  1  1  .  .  .  .  .  .  .  .  .  .  .
      8 99  3  1  3 74  .  .  . 15  .  1  1  .  .  .  .  .  .  .  .  .  .  .  .
      E 99 46  .  7  4  4  . 11  2  . 13  8  2  1  .  .  1  .  .  .  .  .  .  .
      D 99  1 39  4  8  .  .  . 34  .  3  1  .  .  .  .  .  9  .  .  .  .  .  .
      A 99  .  .  .  .  . 28  .  .  .  .  .  . 20 24 20  .  .  .  3  2  1  .  .
      4 99  .  1 96  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      T 99  . 72  3  4  7  1  6  2  .  .  .  3  .  .  .  .  .  1  .  .  .  .  .
      S 99  . 80  3  2  4  .  4  2  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
      H 99  . 35  6  9  .  .  . 26  .  6  3  .  .  .  .  . 12  .  .  .  .  .  .
      R 99 67  .  9  5  1  .  . 10  .  4  2  .  .  .  .  .  .  .  .  .  .  .  .
      N 99 96  .  1  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      M 99 97  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      2 99 19  1 30  3  1  .  1 35  .  4  3  1  .  .  .  .  .  .  .  .  .  .  .
      Z 99  1 28  3 58  6  .  .  3  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .
      P 99  2  1 22  3  1  .  .  6  . 42 12  .  .  .  .  . 10  .  .  .  .  .  .
      I 99  .  .  .  .  .  7  .  .  .  .  .  . 28  .  .  .  .  . 45  3 16  .  .
      K 99 96  .  2  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      L 99 87  .  2  2  .  7  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      F 99  3  3  8  .  .  .  .  6  . 63  6  .  .  .  .  . 11  .  .  .  .  .  .
      6 99 91  .  .  .  .  .  .  8  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 18 12 11 10  8  6  6  5  5  4  3  3  3  1  1  1  1  1  0  0  0  0  0

    Previous-symbol probability (× 99):

        TT     C  O  G  8  E  D  A  4  T  S  H  R  N  M  2  Z  P  I  K  L  F  6
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
        18  .  1 35  4 19 15  5  7 98 52 64 16 15  .  . 74  . 46  1  .  5 22  8
      C 12  . 22  4 22 69  .  7  3  .  1  1  8  1  .  2 12  .  8  7  .  5  8 17
      O 11  1  .  .  .  1 57 65  .  .  .  1 58 33  1  3  2  . 31  6 12  2 36 17
      G 10 54  .  .  .  1  1  3  .  1  1  2  6  2  .  .  2  .  .  1  2  .  6  .
      8  8  1  .  2 54  .  .  . 21  .  2  4  .  .  .  .  .  .  .  1  .  2  .  .
      E  6 17  .  4  3  3  . 12  3  . 21 17  4  1  .  .  4  .  5  .  3  . 17  .
      D  6  . 20  2  4  .  .  . 37  .  5  3  .  .  .  .  . 57  .  2  .  .  .  .
      A  5  .  .  .  .  . 23  .  .  .  .  .  . 43 98 94  2  .  1 33 75 28  . 50
      4  5  .  . 41  .  .  .  .  .  .  .  .  1  .  .  .  1  .  1  .  .  .  3  .
      T  4  . 24  1  2  3  1  4  1  .  .  .  4  .  .  .  2  .  6  .  .  .  8  .
      S  3  . 20  1  1  1  .  2  1  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .
      H  3  .  8  1  2  .  .  . 13  .  4  2  .  .  .  .  . 35  .  3  .  .  .  .
      R  3 10  .  2  1  .  .  .  5  .  3  2  .  .  .  .  .  .  1  1  .  .  .  8
      N  1  7  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      M  1  6  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      2  1  1  .  3  .  .  .  .  7  .  1  1  .  .  .  .  .  .  .  1  .  .  .  .
      Z  1  .  2  .  5  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      P  1  .  .  1  .  .  .  .  1  .  6  2  .  .  .  .  .  6  .  .  .  .  .  .
      I  0  .  .  .  .  .  .  .  .  .  .  .  .  5  .  .  .  .  . 45  7 58  .  .
      K  0  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      L  0  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      F  0  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  1  .  .  .  .  .  .
      6  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

    Symbol entropy: 3.749

    Next-symbol entropy: 1.975
    
  Let's generate a working text, correcting a few obvious mistakes,
  such as the "I" and "L" mistakes.

    cat .tmp-c-fsg.txt \
      | /n/gnu/bin/sed \
          -e 's/^ *//g' -e 's/ *$//g' -e 's/   */ /g' \
      | correct-fsg \
      > .voyn.fsg

    --- correct-fsg ------------------------
    #! /n/gnu/bin/sed -f
    # Corrects "transcription errors" in FSG notation
    #
    s/$/ /g
    s/^/ /g
    s/CI/A/g
    s/IIIL/M/g
    s/CM/AN/g
    s/AL/AN/g
    s/A2/AR/g
    s/4A/4O/g
    s/A /G /g
    s/4CD/4OD/g
    s/4CH/4OH/g
    s/4G/4O/g
    s/A\([^KMNRIEFP]\)/O\1/g
    s/^  *//g
    s/  *$//g
    ----------------------------------------

  First, single characters at end-of-line: 
  
    cat .voyn.fsg \
      | tr -d ' /=\012' \
      | enum-ngraphs -v n=1 \
      | egrep -v '\*' \
      > .voyn-tt-1.grm
      
    cat .voyn-tt-1.grm \
      | sed -e 's/^\(.\)$/\1:/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-1-0.frq
      
    cat .voyn.fsg \
      | tr -d ' /=' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=2 \
      | egrep -v '\*' \
      | egrep '^.:$' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-1-0.frq
      
    compare-freqs \
        .voyn-tt-1-0.frq \
        .voyn-nl-1-0.frq \
      | compute-count-ratio \
          -v nmin=25 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-1-0.cmp
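
  The trick in the second pipeline above is to turn every line break
  into a ":" character, so that a digraph of the form "X:" is a
  character X sitting just before a line break.  The sketch below
  replays that idea on a made-up three-line sample.  The function
  "ngraphs" is only a stand-in for enum-ngraphs, whose exact behaviour
  is not shown in these notes; it is assumed to print every
  n-character substring of its input.

```shell
# Stand-in for the enum-ngraphs tool (assumed behaviour: print every
# n-character substring of the input, one per line).
ngraphs() {  # usage: ngraphs N < text
  { cat; echo; } | awk -v n="$1" '{
    for (i = 1; i + n - 1 <= length($0); i++) print substr($0, i, n)
  }'
}

# Count the characters that immediately precede a line break.
line_final_counts() {  # usage: line_final_counts < text
  tr -d ' /=' |
    sed -e 's/^\(..\).*\(..\)$/\1\2/' |
    tr -s '\012' ':' |
    ngraphs 2 |
    grep '^.:$' |
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Made-up sample, not taken from the transcription.
printf '%s\n' '4ODCG' '8AM OE' 'TCG' | line_final_counts
```

  On this sample the lines end in "G", "E" and "G", so the sketch
  reports one "E:" and two "G:".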

  The output is shown below. The first two columns are the total
  occurrences of the pattern (NT), and the ratio NT/(total
  characters).  The next two columns are the count of occurrences centered
  on line breaks (NL) and the ratio NL/(total line breaks).  The
  "ratio" column is the ratio NL/NT.  The "mk" field is a
  classification of the pattern based on the two counts.
  
  First, if the total occurrence count NT of the pattern is less than
  a certain minimum count "nmin", it doesn't really matter how we
  classify it.  Such patterns are marked "-?" or "+?", meaning "don't
  care, but low" and "don't care, but high", respectively.  The choice
  between the two is the same as for the "--" and "++" marks,
  explained below.  Here we have taken nmin=25.
  
  Now suppose NT is greater than or equal to "nmin".
  
  If NL is too small, it means that the pattern never occurs at line
  breaks, and hence is probably not a valid word boundary. Such
  patterns are marked "oo".

  If NL is close to NT (i.e. the ratio is close to 1.000), it means
  that the group ONLY occurs around LINE breaks; presumably because it
  only occurs around paragraph breaks, or is affected by abbreviations
  and calligraphic ornament.  Such patterns are marked "##".
  
  Let the average number of "true" words per line be "mw".  If line
  breaks are a random subset of the "true" word breaks, a pattern that
  ONLY occurs at word breaks would have ratio NL/NT equal to 1/mw.
  Conversely, if the ratio NL/NT is 1/mw or more, it means that the
  pattern is an almost certain indicator of WORD break (unless its
  line-break statistics are biased for some other reason).  Such
  patterns are marked "||".  For starters, we have guessed mw=5.
  
  Finally, the probability that an occurrence of a pattern P marks
  a true word boundary is NB/NT, where NB is the number of occurrences
  of P at a true word boundary.  If we use mw*NL as an estimate for 
  NB, then P is most probably a word break when mw*NL/NT >= 1/2.
  In that case, we mark the pattern with "++". Conversely, 
  if mw*NL/NT < 1/2, the pattern is most likely *not* a word break;
  we mark such a pattern with "--". 
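
  As a rough numeric check of this estimate, the sketch below computes
  mw*NL/NT (capped at 1) for two of the end-of-line counts reported
  later in these notes, "G:" (NT=3781, NL=402) and "O:" (NT=3964,
  NL=9), with the working guess mw=5.  The helper name "break_prob" is
  hypothetical and is not one of the tools used here.

```shell
# Estimated probability that a pattern marks a true word break,
# using mw*NL as the estimate of NB (capped at 1).
# Hypothetical helper; not part of the original tool set.
break_prob() {  # usage: break_prob NT NL MW
  awk -v NT="$1" -v NL="$2" -v mw="$3" 'BEGIN {
    p = mw * NL / NT
    if (p > 1) p = 1
    printf "%.3f\n", p
  }'
}

break_prob 3781 402 5   # G:
break_prob 3964 9 5     # O:
```

  With these inputs "G:" comes out just above 1/2 and "O:" far below
  it, which is the unsmoothed version of the "++" versus "--" test.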
  
  The counts can be contaminated by sampling error, transcription
  mistakes, non-text usage, etc.  To allow for these perturbations,
  the ratio is computed as (NL+1)/(NT+mc), and the conditions have
  actually been modified as shown below.  The basic idea is to assign
  an "##", "||", or "oo" mark only if we are reasonably sure that by
  taking those marks at face value we would not make more than "nmin"
  mistakes for each pattern.
  

    function classify(NT, NL, nmin, mw, mc)
    {
      if      ((NT < nmin) && (2*mw*(NL+1) < (NT+mc))) 
        { return "-?" }  # unimportant but NL low
      else if ((NT < nmin) && (2*mw*(NL+1) >= (NT+mc))) 
        { return "+?" }  # unimportant but NL high
      else if (mw*(NL+1) < nmin)
        { return "oo" }  # NL practically zero  
      else if ((NL-1) > NT - nmin)
        { return "##" }  # NL practically NT
      else if (mw*(NL-1) > NT - nmin)
        { return "||" }  # NL practically maximum expected
      else if (2*mw*(NL+1) < (NT+mc))
        { return "--" }  # NL on the low side
      else if (2*mw*(NL+1) >= (NT+mc))
        { return "++" }  # NL on the high side
      else
        { return "!!" }  # program error
    }
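
  To experiment with the thresholds, the function above can be wrapped
  in a small stand-alone filter.  The wrapper below is only a sketch:
  the real compute-count-ratio is not listed in these notes, so the
  input format ("NT NL label" per line) and the name "classify_counts"
  are assumptions.  The body of classify is the same decision chain,
  written more compactly, and the ratio is the smoothed
  (NL+1)/(NT+mc).

```shell
# Hypothetical harness: reads "NT NL label" lines and prints
# the smoothed ratio, the mark, and the label.
classify_counts() {  # usage: classify_counts NMIN MW MC < counts
  awk -v NMIN="$1" -v MW="$2" -v MC="$3" '
    function classify(NT, NL, nmin, mw, mc) {
      if (NT < nmin) return (2*mw*(NL+1) < NT+mc) ? "-?" : "+?"
      if (mw*(NL+1) < nmin)      return "oo"   # NL practically zero
      if ((NL-1) > NT - nmin)    return "##"   # NL practically NT
      if (mw*(NL-1) > NT - nmin) return "||"   # NL practically maximum expected
      return (2*mw*(NL+1) < NT+mc) ? "--" : "++"
    }
    { printf "%.3f %s %s\n", ($2+1)/($1+MC), classify($1, $2, NMIN, MW, MC), $3 }
  '
}

printf '%s\n' '57 50 K:' '12 8 6:' '3781 402 G:' '3964 9 O:' '4268 3 C:' |
  classify_counts 25 5 40
```

  Fed with counts from the end-of-line table that follows, this should
  give the same ratios and marks as that table.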
    
  Here is the output:
  
       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         57 0.002     50 0.066  0.526  ##  K:

         12 0.000      8 0.011  0.173  +?  6:

       3781 0.129    402 0.529  0.105  ++  G:

        438 0.015     33 0.043  0.071  --  M:
        922 0.031     64 0.084  0.068  --  R:
       2353 0.080    138 0.182  0.058  --  E:
        503 0.017     29 0.038  0.055  --  N:
        365 0.012     11 0.014  0.030  --  2:
       2740 0.093     12 0.016  0.005  --  8:
       3964 0.135      9 0.012  0.002  --  O:

          8 0.000      0 0.000  0.021  -?  L:

         36 0.001      0 0.000  0.013  oo  F:
         72 0.002      0 0.000  0.009  oo  I:
        345 0.012      1 0.001  0.005  oo  Z:
        216 0.007      0 0.000  0.004  oo  P:
       4268 0.145      3 0.004  0.001  oo  C:
       1952 0.066      0 0.000  0.001  oo  A:
       1676 0.057      0 0.000  0.001  oo  4:
       1453 0.049      0 0.000  0.001  oo  T:
       1078 0.037      0 0.000  0.001  oo  S:
        973 0.033      0 0.000  0.001  oo  H:
       2192 0.075      0 0.000  0.000  oo  D:

  Interpretation: we can safely insert a word break after every
  occurrence of "K", and suppress a word break after "F", "I", "Z", "A",
  "P", "H", "S", "T", "4", "C", and "D".  By doing so we
  will probably not make more than 25 mistakes for each pattern.
  
  Also, we can either break or not after "6" and "L".  Given the ratios,
  it seems safer to break after "6" but not break after "L".
  
  Note that, by our stated criteria, we cannot safely break after FSG
  "G". If there are only 5 true words per line (our working guess),
  then there should be about 5*402 = 2010 true words ending in "G",
  and therefore another 3781 - 2010 = 1771 "G"s that are not
  word-final.  In fact, the many lines that *start* with "G" already
  told us that.
  
  Moreover, since we know that "G" is not always word-final, we can
  (tentatively) conclude that mw is less than 3781/402 = 9.41.  But we
  should not bet the house on this...
  
  By the same reasoning, it is not safe to always suppress word breaks
  after "O".  Given that there are 9 occurrences of "O" at end-of-line,
  we expect 5*9 = 45 "O"s at the end of "true" words. Suppressing those 
  word breaks would mean 45 mistakes, which is more than our specified
  limit "nmin".  (Of course that may also mean that our choice of 
  "nmin" is too low...)
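
  The arithmetic in the last three paragraphs can be replayed with a
  few lines of awk.  The helper name "word_final_estimate" is
  hypothetical; it just prints mw*NL (estimated word-final
  occurrences), NT - mw*NL (estimated non-final occurrences), and
  NT/NL (the value of mw at which every occurrence would have to be
  word-final).

```shell
# Hypothetical helper: from total count NT and line-final count NL, print
#   mw*NL      estimated number of true word-final occurrences
#   NT-mw*NL   estimated number of non-final occurrences
#   NT/NL      the value of mw at which every occurrence would be word-final
word_final_estimate() {  # usage: word_final_estimate NT NL MW
  awk -v NT="$1" -v NL="$2" -v mw="$3" 'BEGIN {
    printf "%d %d %.2f\n", mw * NL, NT - mw * NL, NT / NL
  }'
}

word_final_estimate 3781 402 5   # G: prints 2010 1771 9.41
word_final_estimate 3964 9 5     # O: prints 45 3919 440.44
```

  For "O" the first number (45) is the expected count of mistakes if
  breaks after "O" were always suppressed, to be compared with nmin=25.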
  
  Summarizing, we get the following word-splitting rules:
  
    always break:       K: 
    
    likely break:       6: G:
    
    unlikely break:     2: 8: E: M: N: O: R:
    
    never break:        4: A: C: D: F: H: I: L: P: S: T: Z:
  
  
  Now let's do the same for single characters at line-start:
  
    cat .voyn-tt-1.grm \
      | sed -e 's/^\(.\)$/:\1/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-0-1.frq
      
    cat .voyn.fsg \
      | tr -d ' /=' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=2 \
      | egrep -v '\*' \
      | egrep '^:.$' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-0-1.frq
      
    compare-freqs \
        .voyn-tt-0-1.frq \
        .voyn-nl-0-1.frq \
      | compute-count-ratio \
          -v nmin=25 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-0-1.cmp

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
        365 0.012    138 0.181  0.343  ||  :2
        216 0.007     72 0.094  0.285  ||  :P

       1676 0.057    198 0.260  0.116  ++  :4

        973 0.033     52 0.068  0.052  --  :H
       2740 0.093    104 0.136  0.038  --  :8
       1078 0.037     23 0.030  0.021  --  :S
       3781 0.129     59 0.077  0.016  --  :G
       3964 0.135     58 0.076  0.015  --  :O
       1453 0.049     17 0.022  0.012  --  :T
       2353 0.080     26 0.034  0.011  --  :E
        922 0.031      4 0.005  0.005  --  :R
       2192 0.075      7 0.009  0.004  --  :D

          8 0.000      0 0.000  0.021  -?  :L
         12 0.000      0 0.000  0.019  -?  :6

         36 0.001      1 0.001  0.026  oo  :F
         57 0.002      0 0.000  0.010  oo  :K
         72 0.002      0 0.000  0.009  oo  :I
        345 0.012      0 0.000  0.003  oo  :Z
        503 0.017      0 0.000  0.002  oo  :N
        438 0.015      0 0.000  0.002  oo  :M
       4268 0.145      3 0.004  0.001  oo  :C
       1952 0.066      0 0.000  0.001  oo  :A

  In words, "2" and "P" can be considered word-start indicators.  Note
  that character "4" would only be a sure word-start indicator if the
  average number of true words per line was 8.45; and that is probably
  an upper bound for "mw".
  
  We can also safely assume that no words begin with "F", "C",
  "K", "I", "Z", "M", "N", and "A", and suppress word breaks before those
  characters.
  
  Characters "6" and "L" are now so rare that we can either break or
  suppress breaks before them.  Since "6" and "L" do not occur at
  line-start, but do occur at line-end, it is safer not to 
  break before them (unless we have other reasons to).

  Note the strong difference between the "word-startiness" of "H" and
  "D".  Since those characters seem equivalent by many other criteria,
  we conjecture the difference is a calligraphic effect, namely that
  "D" is almost always written as "H" when it is line-initial.

  In tabular form:
  
    always break:     :2 :P 
    
    likely break:     :4
    
    unlikely break:   :8 :D :E :G :H :O :R :S :T
    
    never break:      :6 :A :C :F :I :K :L :M :N :Z
  

  Let's recompute these statistics, excluding the "sure" break and 
  non-break patterns that we have already identified, namely 
  K: 4: A: C: D: F: H: I: P: S: T: Z: and :2 :P :6 :A :C :F :I :K :L :M :N :Z
 
    cat .voyn.fsg \
      | tr -d ' /=\012' \
      | enum-ngraphs -v n=2 \
      | egrep -v '\*' \
      > .voyn-tt-2.grm

    cat .voyn-tt-2.grm \
      | sed -e 's/^\(.\)\(.\)$/\1:\2/g' \
      | egrep -v '[K4ACDFHIPSTZ]:' \
      | egrep -v ':[2P6ACFIKLMNZ]' \
      > .voyn-tt-1-1-x.grm
      
    cat .voyn.fsg \
      | tr -d ' /=' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=3 \
      | egrep -v '\*' \
      | egrep '^.:.$' \
      | egrep -v '[K4ACDFHIPSTZ]:' \
      | egrep -v ':[2P6ACFIKLMNZ]' \
      > .voyn-nl-1-1-x.grm
 
  Let's now have a look at the line-final characters in this 
  reduced sample:
      
    cat .voyn-tt-1-1-x.grm \
      | sed -e 's/^\(.\):.$/\1:/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-1-0-x.frq
     
    cat .voyn-nl-1-1-x.grm \
      | sed -e 's/^\(.\):.$/\1:/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-1-0-x.frq

    compare-freqs \
        .voyn-tt-1-0-x.frq \
        .voyn-nl-1-0-x.frq \
      | compute-count-ratio \
          -v nmin=25 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-1-0-x.cmp

  Here are the results:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
          9 0.001      7 0.014  0.163  +?  6:
       3514 0.258    291 0.572  0.082  --  G:
        732 0.054     47 0.092  0.062  --  R:
        413 0.030     22 0.043  0.051  --  M:
       2155 0.158     98 0.193  0.045  --  E:
        475 0.035     20 0.039  0.041  --  N:
        205 0.015      7 0.014  0.033  --  2:
       2302 0.169     11 0.022  0.005  --  8:
       3797 0.279      6 0.012  0.002  --  O:
          8 0.001      0 0.000  0.021  -?  L:

  Note that "G:" changed from "++" to "--" when we excluded the "sure"
  patterns.  Otherwise there was no change.
  
  Now let's look again at the line-initial characters:

    cat .voyn-tt-1-1-x.grm \
      | sed -e 's/^.:\(.\)$/:\1/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-0-1-x.frq
     
    cat .voyn-nl-1-1-x.grm \
      | sed -e 's/^.:\(.\)$/:\1/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-0-1-x.frq

    compare-freqs \
        .voyn-tt-0-1-x.frq \
        .voyn-nl-0-1-x.frq \
      | compute-count-ratio \
          -v nmin=25 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-0-1-x.cmp

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
        660 0.048     94 0.185  0.136  ++  :8
       1657 0.122    185 0.363  0.110  ++  :4
        825 0.061     51 0.100  0.060  --  :H
       1819 0.134     52 0.102  0.029  --  :O
       2373 0.174     56 0.110  0.024  --  :G
        976 0.072     21 0.041  0.022  --  :S
       1754 0.129     26 0.051  0.015  --  :E
       1177 0.086     14 0.028  0.012  --  :T
       1907 0.140      7 0.014  0.004  --  :D
        462 0.034      3 0.006  0.008  oo  :R

  The last line says this:
  if we have already decided to suppress word breaks after
  [4ACDFHIPSTZ], and break after [K], then we might as well 
  suppress breaks before "R", since we would be making fewer than
  25 errors because of that decision.

  Now let's look at 2-char patterns, with one character on either side
  of the line break.  First, let's omit the patterns already fixed:
      
    cat .voyn-tt-1-1-x.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-1-1-x.frq
     
    cat .voyn-nl-1-1-x.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-1-1-x.frq

    compare-freqs \
        .voyn-tt-1-1-x.frq \
        .voyn-nl-1-1-x.frq \
      | compute-count-ratio \
          -v nmin=10 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-1-1-x.cmp

  Note that we have lowered nmin to 10, because
  there are many more patterns.
  
  The results are better examined in tabular form:
  
    cat .voyn-tt-nl-1-1-x.cmp \
      | print-pattern-classes \
          -v rowchars='O28ERMNGL6' \
          -v colchars='4E8RDHGSTO'

         4  E  8  R  D  H  G  S  T  O
        -- -- -- -- -- -- -- -- -- --
    O | -- oo oo oo oo oo oo -- oo ||
    2 | -? -? -?  . -? -? oo -- oo --
    8 | -- -- -? -? -? -? oo oo oo --
    E | || -- ++ oo -- -- -- -- -- --
    R | || +? || -? -? +? -- -- -- --
    M | || -? -- -? -? -? -- -- -- --
    N | || -? -- -?  . -? || -- -- --
    G | -- -- ++ -- -- || || -- -- --
    L |  . -?  .  .  .  . -?  . -? -?
    6 | -?  . -?  .  . -?  . -?  . -?

  The table only shows pairs that still occur at least once.
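
  For readers without the print-pattern-classes script, the following
  is a minimal sketch of such a tabulation.  It is an assumption about
  what that script does, not its actual source: it reads lines in the
  seven-column ".cmp" layout used in these notes (mark in field 6,
  pattern "X:Y" in field 7) and prints one row per character of the
  first argument and one column per character of the second, with "."
  for pairs that do not occur.

```shell
# Hypothetical stand-in for print-pattern-classes.
pattern_table() {  # usage: pattern_table ROWCHARS COLCHARS < cmp-file
  awk -v rows="$1" -v cols="$2" '
    # Remember the mark of every two-sided pattern "X:Y".
    NF == 7 && $7 ~ /^.:.$/ { mark[$7] = $6 }
    END {
      line = "   "
      for (j = 1; j <= length(cols); j++)
        line = line sprintf(" %2s", substr(cols, j, 1))
      print line
      for (i = 1; i <= length(rows); i++) {
        r = substr(rows, i, 1)
        line = r " |"
        for (j = 1; j <= length(cols); j++) {
          key = r ":" substr(cols, j, 1)
          line = line sprintf(" %2s", (key in mark) ? mark[key] : ".")
        }
        print line
      }
    }'
}

# Three entries copied from the listing of particular entries in these notes.
printf '%s\n' \
  '1377 0.101 100 0.196 0.071 -- G:4' \
  '150 0.011 34 0.067 0.184 || G:H' \
  '10 0.001 0 0.000 0.020 oo O:T' | pattern_table OG 4HT
```

  The row and column orders are taken verbatim from the two arguments,
  which is how the tables in these notes group related characters.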
  
  The table says that, after we have decided to split
  at  K:.  .:[2P],  and not to split at  [4ACDFHIPSTZ]:.
  .:[6ACFIKLMNZ],  we can also break at
  
     [ERMN]:[4]
     [NG]:[HG]
     R:8
     O:O
     
  and suppress breaks at 
   
     [O82]:[8RDHGT]
     8:S
     E:R
     O:E
     
  If we have to choose, it is best to suppress breaks at 
     [L6]:.
     2:[4E]
     [RMN]:[RD]
     [MN]:[HE]
     
  and perhaps to break at
  
     R:[EH]
     
  Here are some particular entries:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
          5 0.000      2 0.004  0.067  -?  2:4
        135 0.010      2 0.004  0.017  --  2:O
         20 0.001      1 0.002  0.033  --  2:S
         18 0.001      0 0.000  0.017  oo  2:T

         25 0.002      4 0.008  0.077  --  8:4
         16 0.001      2 0.004  0.054  --  8:E
        100 0.007      2 0.004  0.021  --  8:O

        312 0.023      1 0.002  0.006  --  E:D
         47 0.003      6 0.012  0.080  --  E:E
        126 0.009     12 0.024  0.078  --  E:G
         71 0.005      5 0.010  0.054  --  E:H

       1377 0.101    100 0.196  0.071  --  G:4
        123 0.009      5 0.010  0.037  --  G:D
        323 0.024     11 0.022  0.033  --  G:E
        150 0.011     34 0.067  0.184  ||  G:H
        123 0.009      2 0.004  0.018  --  G:R

         15 0.001      2 0.004  0.055  --  O:4
         10 0.001      0 0.000  0.020  oo  O:T
         11 0.001      1 0.002  0.039  --  O:S

  Pattern 2:S is actually very similar to 2:T.  

  Pattern 2:O is also almost a non-break.
  
  Note that E:D is almost a non-break, whereas E:H is only moderately
  unlikely.  As we remarked, the difference is probably a calligraphic
  effect.

  Pattern O:S is actually very close to O:T; they are just above the
  "nmin" threshold.

  Let's reanalyze the 1:1 patterns without eliminating the 1:0 and 0:1
  extremal cases.
  
    cat .voyn-tt-2.grm \
      | sed -e 's/^\(.\)\(.\)$/\1:\2/g' \
      > .voyn-tt-1-1.grm
      
    cat .voyn.fsg \
      | tr -d ' /=' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=3 \
      | egrep -v '\*' \
      | egrep '^.:.$' \
      > .voyn-nl-1-1.grm
 

    cat .voyn-tt-1-1.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-1-1.frq
     
    cat .voyn-nl-1-1.grm \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-1-1.frq

    compare-freqs \
        .voyn-tt-1-1.frq \
        .voyn-nl-1-1.frq \
      | compute-count-ratio \
          -v nmin=10 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-1-1.cmp

  Summarizing:
  
    cat .voyn-tt-nl-1-1.cmp \
      | print-pattern-classes \
          -v rowchars='AI4FPDHCTSZ2L68OKMNREG' \
          -v colchars='A6KLMNIZFC2PEDHSTR4G8O'

         A  6  K  L  M  N  I  Z  F  C   2  P   4  E  8  R  D  H  G  S  T  O
        -- -- -- -- -- -- -- -- -- --  -- --  -- -- -- -- -- -- -- -- -- --
    A |  .  . oo  . oo oo oo  .  .  .   . -?   . oo  . oo  .  .  .  .  .  .
    I |  .  . -? -?  .  . oo  .  .  .   .  .   . -?  . oo  .  .  .  .  .  .
    4 |  .  .  .  .  .  .  .  . -? oo  -? -?  -?  .  .  . -? -?  . -?  . oo
    D | oo  .  .  . -?  . -? oo  . oo   .  .   . oo -? -?  .  . oo oo oo oo
    H | oo  .  .  . -?  . -? oo  . oo  -?  .   . -? -?  . -?  . oo oo oo oo
    P | oo  .  .  .  .  .  . oo  . -?   .  .   .  . -?  .  .  . -? oo oo oo
    F | -?  .  .  .  .  .  . -?  . -?   .  .   .  .  .  .  .  .  . -? oo -?
    S | oo  .  .  .  .  .  .  .  . oo  -?  .  -? -? oo -? oo oo oo  . -? oo
    T | oo  .  .  .  .  .  .  . -? oo  -? oo   . oo oo -? oo oo oo -? -? oo
    C | oo -?  . -?  .  .  .  . -? oo  oo oo  -? -? -- -? oo oo oo -? oo oo
    Z | oo  .  .  .  .  .  .  .  . oo  -?  .   .  . oo  .  .  . oo -? -? --
    L |  .  .  .  .  .  .  .  .  .  .   .  .   . -?  .  .  .  . -?  . -? -?
    6 | -?  .  .  .  .  .  .  .  .  .  -?  .  -?  . -?  .  . -?  . -?  . -?
                                     
    K |  .  .  .  .  .  .  .  .  .  .  ## -?  ## -? +?  .  . -? -? -? -? +?
                                     
    2 | oo  .  .  .  .  . -?  .  . -?  -? -?  -? -? -?  . -? -? oo -- oo --
    8 | oo  .  . -?  .  . -?  . -? oo  -?  .  -- -- -? -? -? -? oo oo oo --
    O | -? -? -? -? oo -? -?  . oo oo  ## oo  -- oo oo oo oo oo oo -- oo ||
    E | oo  . -?  .  .  .  .  . -? --  || ||  || -- ++ oo -- -- -- -- -- --
    R | oo -?  .  .  .  . -?  . -? -?  ## ##  || +? || -? -? +? -- -- -- --
    M | -?  .  .  .  .  . -?  . -? -?  ## -?  || -? -- -? -? -? -- -- -- --
    N | -?  .  .  .  .  .  .  .  . -?  ## -?  || -? -- -?  . -? || -- -- --
    G | oo -? -? -?  .  . -?  . -? -?  || ||  -- -- ++ -- -- || || -- -- --
        
  Note that this table says that some single-character break-suppressing patterns
  that we had found before, like C:8 Z:O E:C, are actually weakly possible:
  
       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         11 0.000      1 0.001  0.039  --  Z:O
       1899 0.065      1 0.001  0.001  --  C:8
         20 0.001      1 0.001  0.033  --  E:C

  Pattern C:8 should obviously have been "oo"; the classifier
  needs more work.  Also Z:O and E:C are practically "-?".
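
  One way to see why C:8 escaped the "oo" mark, assuming this table
  was produced by the classify function listed earlier with nmin=10
  and mw=5: the "oo" test is the strict inequality mw*(NL+1) < nmin,
  and with NL=1 the left side is exactly 10.  The helper "oo_test"
  below is hypothetical and only evaluates that one condition.

```shell
# Hypothetical check of the "oo" condition from classify: mw*(NL+1) < nmin.
oo_test() {  # usage: oo_test NL MW NMIN
  awk -v NL="$1" -v mw="$2" -v nmin="$3" 'BEGIN {
    print ((mw * (NL + 1) < nmin) ? "oo" : "not oo")
  }'
}

oo_test 1 5 10   # C:8 in the 1:1 analysis (nmin=10)
oo_test 1 5 25   # the same NL with nmin=25
```

  So a pattern seen once at a line break sits right on the boundary
  when nmin=10, whereas with nmin=25 the same count is marked "oo"
  (as happened to "Z:" in the first end-of-line table).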
  
  Also, the breaking pattern K:. is not optimal: we can probably do better
  by breaking at K:[248O] and suppressing at other combinations.
  
  Similarly, the breaking pattern .:[2P] is not optimal; it seems safer to break
  at [KOERMNG]:2, [ERG]:P but suppress breaks at [OMN]:P, [28]:[2P].
  
  Let's look closely at the ++ and -- entries:
  
       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
        197 0.007     23 0.030  0.101  ++  E:8
        321 0.011     55 0.072  0.155  ++  G:8

  These seem legitimately ambiguous.
  
       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         20 0.001      1 0.001  0.033  --  2:S
       1899 0.065      1 0.001  0.001  --  C:8
         20 0.001      1 0.001  0.033  --  E:C
        312 0.011      1 0.001  0.006  --  E:D
         14 0.000      1 0.001  0.037  --  M:G
        115 0.004      1 0.001  0.013  --  M:T
        105 0.004      1 0.001  0.014  --  N:S
        118 0.004      1 0.001  0.013  --  N:T
         11 0.000      1 0.001  0.039  --  O:S
         11 0.000      1 0.001  0.039  --  Z:O

        135 0.005      2 0.003  0.017  --  2:O
         16 0.001      2 0.003  0.054  --  8:E
        100 0.003      2 0.003  0.021  --  8:O
        348 0.012      2 0.003  0.008  --  E:S
        503 0.017      2 0.003  0.006  --  E:T
        123 0.004      2 0.003  0.018  --  G:R
         91 0.003      2 0.003  0.023  --  M:S
         15 0.001      2 0.003  0.055  --  O:4
        155 0.005      2 0.003  0.015  --  R:S
        130 0.004      2 0.003  0.018  --  R:T

         25 0.001      4 0.005  0.077  --  8:4
         47 0.002      6 0.008  0.080  --  E:E
        126 0.004     12 0.016  0.078  --  E:G
         71 0.002      5 0.007  0.054  --  E:H
        384 0.013      5 0.007  0.014  --  E:O
       1377 0.047    100 0.132  0.071  --  G:4
        123 0.004      5 0.007  0.037  --  G:D
        323 0.011     11 0.014  0.033  --  G:E
        585 0.020     29 0.038  0.048  --  G:O
        201 0.007     11 0.014  0.050  --  G:S
        241 0.008      8 0.011  0.032  --  G:T
         27 0.001      2 0.003  0.045  --  M:8
        130 0.004      3 0.004  0.024  --  M:O
         34 0.001      4 0.005  0.068  --  N:8
        172 0.006      3 0.004  0.019  --  N:O
         61 0.002      4 0.005  0.050  --  R:G
        291 0.010      5 0.007  0.018  --  R:O
  
  It seems that the "--" of the first group above could be changed to
  "oo" without much harm; each is likely to generate about 5 mistakes
  (omissions of a true break).
  
  Each pattern in the second group would generate about 10 mistakes,
  perhaps fewer in some cases like E:T, since we aren't taking into
  account the possibility of accidental line breaking inside words.
  
  The patterns in the third group seem legitimate ambiguities.
  
  Here is the table again, with the 1st and 2nd groups above manually
  changed to "-?":

         A  6  K  L  M  N  I  Z  F  C   2  P   E  D  H  S  T  R  4  G  8  O
        -- -- -- -- -- -- -- -- -- --  -- --  -- -- -- -- -- -- -- -- -- --
    A |  .  . oo  . oo oo oo  .  .  .   . -?  oo  .  .  .  . oo  .  .  .  .
    I |  .  . -? -?  .  . oo  .  .  .   .  .  -?  .  .  .  . oo  .  .  .  .
    4 |  .  .  .  .  .  .  .  . -? oo  -? -?   . -? -? -?  .  . -?  .  . oo
    F | -?  .  .  .  .  .  . -?  . -?   .  .   .  .  . -? oo  .  .  .  . -?
    P | oo  .  .  .  .  .  . oo  . -?   .  .   .  .  . oo oo  .  . -? -? oo
    D | oo  .  .  . -?  . -? oo  . oo   .  .  oo  .  . oo oo -?  . oo -? oo
    H | oo  .  .  . -?  . -? oo  . oo  -?  .  -? -?  . oo oo  .  . oo -? oo
    C | oo -?  . -?  .  .  .  . -? oo  oo oo  -? oo oo -? oo -? -? oo -? oo
    T | oo  .  .  .  .  .  .  . -? oo  -? oo  oo oo oo -? -? -?  . oo oo oo
    S | oo  .  .  .  .  .  .  .  . oo  -?  .  -? oo oo  . -? -? -? oo oo oo
    Z | oo  .  .  .  .  .  .  .  . oo  -?  .   .  .  . -? -?  .  . oo oo -?
    2 | oo  .  .  .  .  . -?  .  . -?  -? -?  -? -? -? -? oo  . -? oo -? -?
    L |  .  .  .  .  .  .  .  .  .  .   .  .  -?  .  .  . -?  .  . -?  . -?
    6 | -?  .  .  .  .  .  .  .  .  .  -?  .   .  . -? -?  .  . -?  . -? -?
                                                                        
    8 | oo  .  . -?  .  . -?  . -? oo  -?  .  -? -? -? oo oo -? -- oo -? -?
    O | -? -? -? -? oo -? -?  . oo oo  ## oo  oo oo oo -? oo oo -? oo oo ||
    K |  .  .  .  .  .  .  .  .  .  .  ## -?  -?  . -? -? -?  . ## -? +? +?
    M | -?  .  .  .  .  . -?  . -? -?  ## -?  -? -? -? -? -? -? || -? -- --
    N | -?  .  .  .  .  .  .  .  . -?  ## -?  -?  . -? -? -? -? || || -- --
    R | oo -?  .  .  .  . -?  . -? -?  ## ##  +? -? +? -? -? -? || -- || --
    E | oo  . -?  .  .  .  .  . -? -?  || ||  -- -? -- -? -? oo || -- ++ --
    G | oo -? -? -?  .  . -?  . -? -?  || ||  -- -- || -- -- -? -- || ++ --

  Here are the tentative word-breaking rules, derived from this table:
  
    9. Suppress breaks at
        
        [24AIDHPFSTCZ6L]:. 
        .:[6AKLMNIZFCR] 
        [8O]:[8G] 
        O:4
        8:2
        [8KMNO]:[PEDHSTR]
         
    8. Insert break at
        
        K:[8O] 
        [KERMN]:[24]
        [OG]:2
        [ERG]:P
        R:[E8]
        G:H
        [NG]:G
        O:O
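
  As a cross-check, the two rule lists above can be applied
  mechanically to a single pair X:Y.  The sketch below is only an
  illustration: the function name classify_pair and the label
  "undecided" are invented here, and the patterns are copied from
  rules 9 and 8 as written.  The two lists never match the same pair,
  so the order of the tests does not matter.

```shell
# classify_pair X:Y
# Prints "suppress", "break", or "undecided" for one glyph pair.
# Patterns are copied from rules 9 and 8; names and labels are illustrative.
classify_pair() {
  suppress='^([24AIDHPFSTCZ6L]:.|.:[6AKLMNIZFCR]|[8O]:[8G]|O:4|8:2|[8KMNO]:[PEDHSTR])$'
  insert='^(K:[8O]|[KERMN]:[24]|[OG]:2|[ERG]:P|R:[E8]|G:H|[NG]:G|O:O)$'
  if printf '%s\n' "$1" | grep -Eq "$suppress"; then
    echo suppress
  elif printf '%s\n' "$1" | grep -Eq "$insert"; then
    echo break
  else
    echo undecided
  fi
}

# Try a few pairs from the tables.
for pair in C:8 K:O O:4 G:8; do
  printf '%s %s\n' "$pair" "$(classify_pair "$pair")"
done
```

  Pairs such as G:8 and E:8 come out "undecided", which is why the
  longer contexts are needed for them.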
    
  For the remaining cases, we must look at digraphs on either side of
  line breaks.  First, let's prepare files of tetragrams with 2 chars on
  each side of the "cursor", discarding the unambiguous
  single-character patterns that we have found already.
  
    cat .voyn.fsg \
      | tr -d ' /=\012' \
      | enum-ngraphs -v n=4 \
      | egrep -v '\*' \
      > .voyn-tt-4.grm

    cat .voyn-tt-4.grm \
      | sed -e 's/^\(..\)\(..\)$/\1:\2/g' \
      | egrep -v '[24AIDHPFSTCZ6L]:' \
      | egrep -v ':[6AKLMNIZFCR]' \
      | egrep -v '[8O]:[8G]|O:4|8:2|[8KMNO]:[PEDHSTR]' \
      | egrep -v 'K:[8O]|[KERMN]:[24]|[OG]:2|[ERG]:P|R:[E8]|G:H|[NG]:G|O:O' \
      > .voyn-tt-2-2-x.grm 
      
    cat .voyn.fsg \
      | tr -d ' /=' \
      | sed -e 's/^\(..\).*\(..\)$/\1\2/g' \
      | tr -s '\012' ':' \
      | enum-ngraphs -v n=5 \
      | egrep -v '\*' \
      | egrep '^..:..$' \
      | egrep -v '[24AIDHPFSTCZ6L]:' \
      | egrep -v ':[6AKLMNIZFCR]' \
      | egrep -v '[8O]:[8G]|O:4|8:2|[8KMNO]:[PEDHSTR]' \
      | egrep -v 'K:[8O]|[KERMN]:[24]|[OG]:2|[ERG]:P|R:[E8]|G:H|[NG]:G|O:O' \
      > .voyn-nl-2-2-x.grm
      
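
  The helper enum-ngraphs is not reproduced in these notes.  The
  sketch below assumes that it simply lists every n-character window
  of its input; the real tool apparently marks some windows with "*"
  (the pipelines above discard them), which the sketch does not try
  to imitate.  All function names are invented for this illustration.

```shell
# Hypothetical stand-in for enum-ngraphs: print every n-character
# window of each input line (no padding at the ends).
enum_ngraphs_sketch() {
  awk -v n="$1" '{
    for (i = 1; i + n - 1 <= length($0); i++) print substr($0, i, n)
  }'
}

# Tetragrams of the running text, split 2:2 around the "cursor".
tetragrams_2_2() {
  tr -d ' /=\012' \
    | enum_ngraphs_sketch 4 \
    | sed -e 's/^\(..\)\(..\)$/\1:\2/'
}

# Pairs straddling a line break: last 2 chars of one line,
# first 2 chars of the next, joined by ":".
newline_2_2() {
  tr -d ' /=' \
    | sed -e 's/^\(..\).*\(..\)$/\1\2/' \
    | tr -s '\012' ':' \
    | enum_ngraphs_sketch 5 \
    | grep -E '^..:..$'
}

printf 'OEDAM\n4ODCG\n' | newline_2_2     # prints AM:4O
```

  In the second function each line is first cut down to its first and
  last two characters, so the only 5-character windows of the form
  ..:.. are the ones centered on a line break.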
  Now let's first look at 2-character word-end patterns:

    cat .voyn-tt-2-2-x.grm \
      | sed -e 's/\(..\):..$/\1:/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-2-0-x.frq
    
    cat .voyn-nl-2-2-x.grm \
      | sed -e 's/^\(..\):..$/\1:/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-2-0-x.frq
     
    compare-freqs \
        .voyn-tt-2-0-x.frq \
        .voyn-nl-2-0-x.frq \
      | compute-count-ratio \
          -v nmin=7 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-2-0-x.cmp

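  The helper compute-freqs is not shown in these notes.  Judging from
  the "tot occurs" and "at newline" columns, each count appears to be
  paired with its fraction of the column total.  The sketch below does
  just that, under an invented name and with a guessed output layout.

```shell
# Hypothetical stand-in for "sort | uniq -c | expand | compute-freqs":
# prints count, fraction of the total, and pattern, one pattern per line.
count_freqs_sketch() {
  sort | uniq -c | awk '
    { count[$2] = $1; total += $1 }
    END {
      for (p in count)
        printf "%7d %5.3f %s\n", count[p], count[p] / total, p
    }' | sort -k 3
}

# Six sample word-end patterns: 8G: three times, EG: twice, OE: once.
printf 'EG:\n8G:\n8G:\nEG:\n8G:\nOE:\n' | count_freqs_sketch
```
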
  Here are the results. These seem to be sure word-ends:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         90 0.014     37 0.118  0.292  ||  EG:
         44 0.007     19 0.061  0.238  ||  RG:
         44 0.007      9 0.029  0.119  ||  TG:
         10 0.002      3 0.010  0.080  ||  MG:
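
  The ratio column is not simply the newline count over the total
  count.  The printed values are reproduced by
  (newline + 1) / (total + 40), for example
  (37 + 1) / (90 + 40) = 0.292 for EG:.  This formula is inferred from
  the tables, not taken from compute-count-ratio itself; the 40 may
  correspond to the mc=40 option, but that is a guess.  The function
  name below is invented.

```shell
# smoothed_ratio TOT NL
# Prints (NL + 1) / (TOT + 40) to three decimals.  The constants 1 and
# 40 are inferred from the printed tables (an assumption).
smoothed_ratio() {
  awk -v tot="$1" -v nl="$2" \
    'BEGIN { printf "%.3f\n", (nl + 1) / (tot + 40) }'
}

smoothed_ratio 90 37     # EG:  0.292
smoothed_ratio 44 9      # TG:  0.119
smoothed_ratio 10 3      # MG:  0.080
```

  The added constants pull low-count patterns toward 1/40 = 0.025, so
  a pattern seen only once or twice cannot get a high ratio by chance.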

  The ones below seem possible but not certain. Note that
  DG: and HG: are actually quite similar.

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
        138 0.022     19 0.061  0.112  ++  DG:

         75 0.012     10 0.032  0.096  --  HG:
       1764 0.280     97 0.310  0.054  --  8G:
        214 0.034     12 0.038  0.051  --  OR:
        160 0.025      6 0.019  0.035  --  AM:
         45 0.007      2 0.006  0.035  --  C8:
        438 0.069     15 0.048  0.033  --  AE:
        203 0.032      7 0.022  0.033  --  AN:
         26 0.004      1 0.003  0.030  --  SG:
       1149 0.182     32 0.102  0.028  --  OE:
         40 0.006      1 0.003  0.025  --  G8:
         44 0.007      1 0.003  0.024  --  EE:
        181 0.029      4 0.013  0.023  --  ZG:
        713 0.113     15 0.048  0.021  --  CG:
         79 0.013      1 0.003  0.017  --  GR:
        296 0.047      4 0.013  0.015  --  GE:
        303 0.048      3 0.010  0.012  --  AR:

  The classifier seems to be wrong about these; they actually look like "+?":

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
          3 0.000      3 0.010  0.093  -?  AK:
          2 0.000      2 0.006  0.071  -?  O8:
          1 0.000      1 0.003  0.049  -?  KE:
          3 0.000      1 0.003  0.047  -?  LE:
          4 0.001      1 0.003  0.045  -?  2G:
          5 0.001      1 0.003  0.044  -?  PG:
          6 0.001      1 0.003  0.043  -?  TE:

  These truly deserve "-?":

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
          8 0.001      1 0.003  0.042  --  NG:
          9 0.001      1 0.003  0.041  --  DE:
         11 0.002      1 0.003  0.039  --  ER:
         16 0.003      1 0.003  0.036  --  OG:
         15 0.002      1 0.003  0.036  --  E8:

          2 0.000      0 0.000  0.024  -?  8R:
          2 0.000      0 0.000  0.024  -?  CE:
          2 0.000      0 0.000  0.024  -?  KG:
          2 0.000      0 0.000  0.024  -?  RR:
          2 0.000      0 0.000  0.024  -?  TR:
          1 0.000      0 0.000  0.024  -?  2E:
          1 0.000      0 0.000  0.024  -?  68:
          1 0.000      0 0.000  0.024  -?  DM:
          1 0.000      0 0.000  0.024  -?  DR:
          1 0.000      0 0.000  0.024  -?  K8:
          1 0.000      0 0.000  0.024  -?  P8:
          1 0.000      0 0.000  0.024  -?  S8:
          4 0.001      0 0.000  0.023  -?  IE:
          4 0.001      0 0.000  0.023  -?  NE:
          4 0.001      0 0.000  0.023  -?  R8:
          3 0.000      0 0.000  0.023  -?  HE:
          3 0.000      0 0.000  0.023  -?  M8:
          3 0.000      0 0.000  0.023  -?  ME:
          3 0.000      0 0.000  0.023  -?  N8:
          3 0.000      0 0.000  0.023  -?  ON:
          3 0.000      0 0.000  0.023  -?  T8:
          3 0.000      0 0.000  0.023  -?  Z8:
          6 0.001      0 0.000  0.022  -?  CR:
          6 0.001      0 0.000  0.022  -?  RE:
          5 0.001      0 0.000  0.022  -?  SE:

  Finally, these are non-enders:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         10 0.002      0 0.000  0.020  oo  OM:
         12 0.002      0 0.000  0.019  oo  8E:
         30 0.005      0 0.000  0.014  oo  IR:
         49 0.008      0 0.000  0.011  oo  GG:

  In tabular form:
  
    always break:       [ERTM]G:.. AK:.. O8:.. [KLT]E:..
    
    likely break:       [DH2P]G:..
    
    unlikely break:     [CG]8:.. [EGO]E:.. [8CSZ]G:.. A[EMNRGO]:..

    never break:        OM:.. 8E:.. IR:.. GG:..

  Now the 2 characters after line-start:
  
    cat .voyn-tt-2-2-x.grm \
      | sed -e 's/^..:\(..\)$/:\1/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-tt-0-2-x.frq
     
    cat .voyn-nl-2-2-x.grm \
      | sed -e 's/^..:\(..\)$/:\1/g' \
      | sort | uniq -c | expand \
      | compute-freqs \
      > .voyn-nl-0-2-x.frq

    compare-freqs \
        .voyn-tt-0-2-x.frq \
        .voyn-nl-0-2-x.frq \
      | compute-count-ratio \
          -v nmin=7 -v mw=5 -v mc=40 \
      | sort -b +0.0 -0.2r +5 -6 +4 -5nr +0 -1nr \
      > .voyn-tt-nl-0-2-x.cmp

  The ones below seem sure word-starts: 

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
          7 0.001      3 0.010  0.085  ##  :HS
         34 0.005     23 0.073  0.324  ||  :8S
         22 0.003     14 0.045  0.242  ||  :8T
         17 0.003      7 0.022  0.140  ||  :GS
         16 0.003      6 0.019  0.125  ||  :GD
         13 0.002      5 0.016  0.113  ||  :GT
          9 0.001      3 0.010  0.082  ||  :HT
          9 0.001      2 0.006  0.061  ||  :8E

  The ones below seem probable word-starts:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
        302 0.048     35 0.112  0.105  ++  :8A
         60 0.010      8 0.026  0.090  --  :8O
          5 0.001      2 0.006  0.067  -?  :HO
          6 0.001      2 0.006  0.065  -?  :OS

  The ones below seem possible word-starts:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
       1376 0.218     99 0.316  0.071  --  :4O
         26 0.004      3 0.010  0.061  --  :SO
        128 0.020      9 0.029  0.060  --  :ET
         61 0.010      5 0.016  0.059  --  :EO
         15 0.002      2 0.006  0.055  --  :TA
         18 0.003      2 0.006  0.052  --  :SA
         27 0.004      2 0.006  0.045  --  :DO
         27 0.004      2 0.006  0.045  --  :TO
         59 0.009      3 0.010  0.040  --  :ES
         13 0.002      1 0.003  0.038  --  :O8
         15 0.002      1 0.003  0.036  --  :GH
         21 0.003      1 0.003  0.033  --  :DT
         25 0.004      1 0.003  0.031  --  :G2
        323 0.051     10 0.032  0.030  --  :OD
        721 0.114     21 0.067  0.029  --  :OE
         31 0.005      1 0.003  0.028  --  :HC
        290 0.046      7 0.022  0.024  --  :OH
         45 0.007      1 0.003  0.024  --  :TD
        572 0.091     10 0.032  0.018  --  :SC
        187 0.030      2 0.006  0.013  --  :OR
        131 0.021      1 0.003  0.012  --  :8G
        670 0.106      7 0.022  0.011  --  :TC
        206 0.033      1 0.003  0.008  --  :DC
          
  The ones below are uncertain:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         36 0.006      0 0.000  0.013  oo  :G4
         32 0.005      0 0.000  0.014  oo  :G8
         29 0.005      0 0.000  0.014  oo  :DG
         25 0.004      0 0.000  0.015  oo  :SD
         25 0.004      0 0.000  0.015  oo  :TG
         23 0.004      0 0.000  0.016  oo  :E8
         22 0.003      0 0.000  0.016  oo  :S8
         21 0.003      0 0.000  0.016  oo  :TH
         19 0.003      0 0.000  0.017  oo  :GO
         19 0.003      0 0.000  0.017  oo  :HA
         17 0.003      0 0.000  0.018  oo  :SH
         16 0.003      0 0.000  0.018  oo  :SG
         13 0.002      0 0.000  0.019  oo  :EA
         13 0.002      0 0.000  0.019  oo  :GE
         11 0.002      0 0.000  0.020  oo  :DS
         11 0.002      0 0.000  0.020  oo  :GG
         11 0.002      0 0.000  0.020  oo  :OM
          9 0.001      0 0.000  0.020  oo  :EG
          9 0.001      1 0.003  0.041  --  :O4
          8 0.001      0 0.000  0.021  oo  :4C
          8 0.001      0 0.000  0.021  oo  :EH
          8 0.001      0 0.000  0.021  oo  :TP
          7 0.001      0 0.000  0.021  oo  :8C
          7 0.001      0 0.000  0.021  oo  :OC
          7 0.001      0 0.000  0.021  oo  :OF
          7 0.001      0 0.000  0.021  oo  :OO
          7 0.001      0 0.000  0.021  oo  :TE
          7 0.001      1 0.003  0.043  --  :O2
          6 0.001      0 0.000  0.022  -?  :OK
          6 0.001      1 0.003  0.043  -?  :DZ
          6 0.001      2 0.006  0.065  -?  :OS
          5 0.001      0 0.000  0.022  -?  :4H
          5 0.001      0 0.000  0.022  -?  :8D
          5 0.001      0 0.000  0.022  -?  :ON
          5 0.001      0 0.000  0.022  -?  :T2
          5 0.001      1 0.003  0.044  -?  :OT
          5 0.001      3 0.010  0.089  -?  :4D
          4 0.001      0 0.000  0.023  -?  :DE
          4 0.001      0 0.000  0.023  -?  :E2
          4 0.001      0 0.000  0.023  -?  :ER
          4 0.001      0 0.000  0.023  -?  :OG
          4 0.001      0 0.000  0.023  -?  :SE
          4 0.001      1 0.003  0.045  -?  :84
          3 0.000      0 0.000  0.023  -?  :EE
          3 0.000      0 0.000  0.023  -?  :EP
          3 0.000      0 0.000  0.023  -?  :HZ
          3 0.000      0 0.000  0.023  -?  :O6
          3 0.000      0 0.000  0.023  -?  :OI
          3 0.000      0 0.000  0.023  -?  :TR
          2 0.000      0 0.000  0.024  -?  :42
          2 0.000      0 0.000  0.024  -?  :4P
          2 0.000      0 0.000  0.024  -?  :EC
          2 0.000      0 0.000  0.024  -?  :EF
          2 0.000      0 0.000  0.024  -?  :GC
          2 0.000      0 0.000  0.024  -?  :GP
          2 0.000      0 0.000  0.024  -?  :GR
          2 0.000      0 0.000  0.024  -?  :ST
          2 0.000      0 0.000  0.024  -?  :TS
          1 0.000      0 0.000  0.024  -?  :4F
          1 0.000      0 0.000  0.024  -?  :82
          1 0.000      0 0.000  0.024  -?  :88
          1 0.000      0 0.000  0.024  -?  :8H
          1 0.000      0 0.000  0.024  -?  :8L
          1 0.000      0 0.000  0.024  -?  :8R
          1 0.000      0 0.000  0.024  -?  :DI
          1 0.000      0 0.000  0.024  -?  :E4
          1 0.000      0 0.000  0.024  -?  :EK
          1 0.000      0 0.000  0.024  -?  :GA
          1 0.000      0 0.000  0.024  -?  :H8
          1 0.000      0 0.000  0.024  -?  :HD
          1 0.000      0 0.000  0.024  -?  :HE
          1 0.000      0 0.000  0.024  -?  :HG
          1 0.000      0 0.000  0.024  -?  :OA
          1 0.000      0 0.000  0.024  -?  :S2
          1 0.000      0 0.000  0.024  -?  :SR
          1 0.000      0 0.000  0.024  -?  :TF
          1 0.000      0 0.000  0.024  -?  :TT
          1 0.000      1 0.003  0.049  -?  :44
          1 0.000      1 0.003  0.049  -?  :4S
          1 0.000      1 0.003  0.049  -?  :DR

  The ones below seem invalid as word-starts:

       tot occurs   at newline  ratio  mk  group
      -----------  -----------  -----  --  -----------
         44 0.007      0 0.000  0.012  oo  :T8
         42 0.007      0 0.000  0.012  oo  :OP
         49 0.008      0 0.000  0.011  oo  :ED
        134 0.021      0 0.000  0.006  oo  :DA


  In tabular form:
  
    always break:       :H[ST] :8[STE] :G[STD]

    likely break:       :8[AO] :HO :OS

    unlikely break:     :4O :8G :D[COT] :E[OST] :G[2H] :HC
                        :O[8DHER] :S[ACO] :T[ACDO]
    
    never break:        :T8 :OP :ED :DA
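
  The two summaries (2-character word ends and 2-character word
  starts) can be written as lookup functions.  The sketch below is a
  plain transcription of those lists into shell case patterns; the
  function names and the "unclassified" label are invented for
  illustration, and pairs not listed in either summary fall through to
  that label.

```shell
# end_class XY: class of the 2 characters just before a candidate break.
end_class() {
  case "$1" in
    [ERTM]G|AK|O8|[KLT]E)            echo always ;;
    [DH2P]G)                         echo likely ;;
    [CG]8|[EGO]E|[8CSZ]G|A[EMNRGO])  echo unlikely ;;
    OM|8E|IR|GG)                     echo never ;;
    *)                               echo unclassified ;;
  esac
}

# start_class XY: class of the 2 characters just after a candidate break.
start_class() {
  case "$1" in
    H[ST]|8[STE]|G[STD])             echo always ;;
    8[AO]|HO|OS)                     echo likely ;;
    4O|8G|D[COT]|E[OST]|G[2H]|HC|O[8DHER]|S[ACO]|T[ACDO])
                                     echo unlikely ;;
    T8|OP|ED|DA)                     echo never ;;
    *)                               echo unclassified ;;
  esac
}

end_class EG       # always
start_class 4O     # unlikely
```

  Within each function the four lists do not overlap, so the order of
  the case branches does not affect the result.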