Hacking at the Voynich manuscript
Notebook - volume 12

Warning: these notebooks aren't strictly chronological logs.
  Sometimes I go back and redo things, clarify comments,
  delete garbage, etc.

97-11-04 stolfi
===============

  I decided to unfold the "[|]" groups into separate lines.
  This unfolding should make the consensus more consistent.  

  Note that a group like "[A.|.A]" or "[A|O]" may be considered a
  consensus, whereas "[A|P]" may not be, depending on the definition
  of consensus.  Hence it seems sensible to do the unfolding of
  alternatives before computing the consensus. (In previous attempts
  at computing the consensus, I would just take the first choice out
  of every alternation).
  
  (I had tried doing the unfolding after mapping to EVA, but there are
  some half-character choices like P[Z|] which would then require
  manual editing. Besides, it seems better to have an unfolded version
  of the FSG encoding, preserving the "%" and "!" alignment markers.)
  
  For the unfolding, I wrote a filter "unfold-alternatives"
  to be used with "filter-files".  New transcriber codes 
  were introduced for the variant lines; see "f0.U"
  for details.
  
  Note also that a line with n groups, like "A[P|F]ETR[II|O]G[A|P]E"
  (n = 3), need only generate two lines, "APETRIIGAE" and "AFETROGPE",
  and not 2^n = 8, since the consensus should not be affected too much
  by crossovers.  (Perhaps this is true only if the alternations are
  well-separated?)  Besides, this interpretation is generally closer
  to the way the '[|]' constructs are used in the file: each branch
  represents one specific version of the transcription.
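
  The rule above (one output line per branch position, not one per
  combination) can be sketched in Python as follows. This is only an
  illustration, not the actual "unfold-alternatives" filter; the name
  unfold_line is made up, and the handling of groups with unequal
  numbers of branches is my assumption.

```python
import re

# One "[x|y|...]" alternation group; branches may be empty, as in "P[Z|]".
GROUP = re.compile(r"\[([^\[\]]*)\]")

def unfold_line(text):
    """Return one variant per branch position, not one per combination."""
    widths = [len(m.group(1).split("|")) for m in GROUP.finditer(text)]
    if not widths:
        return [text]
    variants = []
    for i in range(max(widths)):
        def pick(match, i=i):
            branches = match.group(1).split("|")
            # Assumption: a group with fewer branches than the widest
            # group repeats its last branch.
            return branches[min(i, len(branches) - 1)]
        variants.append(GROUP.sub(pick, text))
    return variants

print(unfold_line("A[P|F]ETR[II|O]G[A|P]E"))  # ['APETRIIGAE', 'AFETROGPE']
```

  With k branches per group this yields k lines regardless of the
  number of groups, which matches the reading that each branch is one
  transcriber's version of the whole line.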
  
    cat L16/INDEX \
      | sed -e 's/:.*$//g' \
      > .units.dir
    mkdir L16-unf
    foreach f ( `cat .units.dir ` )
      echo $f
      cat L16/$f \
        | unfold-alternatives \
        > L16-unf/$f
    end
  
    /bin/rm -f .diff
    foreach f ( `cat .units.dir ` )
      echo $f
      echo ' ' >>  .diff
      echo '=== '$f' ===' >>  .diff
      echo ' ' >>  .diff
      diff L16/$f L16-unf/$f \
        | prettify-diff-output \
        >> .diff
    end
    
  Expanded and complemented Landini's initial comments, producing
  L16/f0.{A,I,J,E,S,U}.  Included comments about my unfoldings and
  edits.
  
    cp L16/INDEX L16-unf/
    tar cvf - L16 | gzip > L16.tgz
    
      -rw-r--r--   1 stolfi   staff      170606 Nov  5 19:08 L16.tgz
      
    rm -rf L16
    
  Also added a new unit L16/f77v.L (and L16-eva/f77v.L), with the
  labels on the figures of page f77v. (I should ask the folks on the
  mailing list to check the labels...)
  
  Then I converted these files to the new EVA encoding.  I plan to
  work as much as possible with that encoding, since it is "the way of
  the future".
  
    mkdir L16-eva
    
    foreach f ( L16-unf/f[0-9]* )
      echo "$f -> L16-eva/${f:t}"
      cat ${f} \
        | fsg2eva \
        > L16-eva/${f:t}
    end
    
    /bin/rm -f .bugs
    foreach f ( L16-eva/f[0-9]* )
      echo "checking $f"
      cat ${f} \
        | validate-new-evt-format \
           -v chars='aoeilmnrchtpkfsqgjdvxy' \
        >>& .bugs
    end
    
  Manually edited some occurrences of FSG and Currier codes within
  '{}' comments.  Also fixed a few dozen bugs (bad letters, leading
  ".", missing lines). The file "f0.V" describes the recoding
  and the fixes.
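
  The kind of character check done in the validation loop above can be
  sketched as below. This is not the real "validate-new-evt-format"
  script. Assumptions: data lines start with a "<...>" locator, "{}"
  comments are skipped, and the filler set ".%!" is a guess at the
  non-letter characters allowed in the text. The function name
  bad_chars and the sample line are hypothetical.

```python
import re

# The chars= list passed to the validator in the loop above.
EVA_CHARS = set("aoeilmnrchtpkfsqgjdvxy")

def bad_chars(line, allowed=EVA_CHARS, fillers=".%!"):
    """Return the set of unexpected characters in one transcription line."""
    if not line.startswith("<"):
        return set()                       # not a data line: nothing to check
    text = line.split(">", 1)[1] if ">" in line else ""
    text = re.sub(r"\{[^}]*\}", "", text)  # drop inline {} comments
    return {c for c in text if not c.isspace()} - allowed - set(fillers)

# Made-up sample line with a leftover FSG letter 'T' in the text.
print(bad_chars("<f1r.1;U> daiin.chol{plant}Tor"))  # {'T'}
```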

  Oops, made a mistake in fsg2eva (mapped 'T' to 'th' instead of 'ch').
  So now I am trying to redo the mapping without losing the manual edits:
    
    mv L16-eva L16-eva-th
    
    (recreate L16-eva mechanically as above)
    
    mkdir L16-eva-xx
    
    foreach f ( L16-eva/f[0-9]* )
      set fxx = "L16-eva-xx/${f:t}"
      echo "$f -> $fxx"
      cat ${f} \
        | sed -e '/^</s/ch/th/g' \
        > ${fxx}
    end    
    
    diff -r L16-eva-th L16-eva-xx \
      | prettify-diff-output \
      > .diff
    
    (check differences and edit as appropriate)
    cp -p L16-eva{-th,}/INDEX
    cp -p L16-eva{-th,}/f0.A
    cp -p L16-eva{-th,}/f0.V
    (fix fsg2eva code in f0.V)
    
  OK, let's redo everything we were doing...
  
97-11-08 stolfi
===============

  Let's compute again the digraph frequencies for English:
  
    cat engl-poi.txt | head -685 > .foo
    dicio-wc .foo

     lines   words     bytes file        
    ------ ------- --------- ------------
       685    6929     36813 .foo

    cat .foo \
      | tr ' ' '\012' \
      | sed -e 's/$/./g' \
      | count-digraph-freqs \
          -v pad='.' \
          -v chars='.abcdefghijklmnopqrstuvwxyz0123456789' \
          -v showentropy=1
          
    Digraph counts:

           TT     .     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o     p     q     r     s     t     u     v     w     x     y     z     0     1     2     3     5     6     7
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      .  6929     .   799   249   298   235   145   235   104   516   555    57    37   200   419   137   388   173    15   168   531   868    93    64   492     .   148     .     .     1     .     1     1     .     .
      a  2413   280     3    40    73   114     .    35    40     .   105     .    25   168    52   380     3    34     .   255   266   311    33    78    44     .    71     3     .     .     .     .     .     .     .
      b   359     2    39     2     .     .   115     .     .     .    12     3     .    34     .     .    47     .     .    22    10     2    45     1     .     .    25     .     .     .     .     .     .     .     .
      c   694     5   107     .    12     .   118     .     .    92    40     .    35    19     .     .   112     .     1    34     .    44    26     .     .     .    49     .     .     .     .     .     .     .     .
      d  1405   873    36     .     .    21   146     .     4     .   105     .     .    14     8     9    97     .     .    37    27     .    10     .     2     .    16     .     .     .     .     .     .     .     .
      e  3710  1281   157     3    58   401   116    39     6     4    27     .     3   124    79   316    10    26     1   536   215   138     3    74    26    41    25     1     .     .     .     .     .     .     .
      f   652   219    39     .     .     .    79    39     .     .    49     .     .    15     .     .    88     .     .    58     .    35    31     .     .     .     .     .     .     .     .     .     .     .     .
      g   577   198    24     .     .     .    70     .     6    87    24     .     .    52     3     5    37     .     .    37    24     .    10     .     .     .     .     .     .     .     .     .     .     .     .
      h  1813   184   327     1     1     1   767     1     .     .   216     .     .     1     .    33   181     .     .    15     8    53    17     .     .     .     7     .     .     .     .     .     .     .     .
      i  2061   171    63    16    73    90    79    53    56     2     .     .    14   100    96   562    81    10     .    69   228   245     3    46     .     1     .     3     .     .     .     .     .     .     .
      j    61     .     2     .     .     .     4     .     .     .     .     .     .     .     .     .    39     .     .     .     .     .    16     .     .     .     .     .     .     .     .     .     .     .     .
      k   213    63     1     .     .     1    85     1     .     .    25     .     .     .     .    27     2     .     .     .     8     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      l  1279   170   125     .     2    71   230    33     4     .   118     .     9   193     9     2    98     2     .     .     2    14    14     1    11     .   168     .     .     .     .     .     .     2     1
      m   865   122   121    13     .     .   220     1     .     .    98     .     .     3     7     3    83    26     .    58     8     .    35     .     .     .    67     .     .     .     .     .     .     .     .
      n  2001   509    29     1    80   344   145     6   286     1    60     1    15    22     2    26   130     1     3     2    66   227    15    12     .     1    17     .     .     .     .     .     .     .     .
      o  2249   291     9     8    24    33     5   195    10    44    50     .    38    66   137   258   115    25     .   211    63   115   369    31   149     .     3     .     .     .     .     .     .     .     .
      p   482    86    42     3     .     .    87     .     .     5    16     .     .    57     .     .    57    33     .    57    20     9    10     .     .     .     .     .     .     .     .     .     .     .     .
      q    23     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    23     .     .     .     .     .     .     .     .     .     .     .     .
      r  1806   444    92     4    10    67   388     5    14     5   119     .    19    27    24    38   144    44     .    34   146    57    25     8     4     .    88     .     .     .     .     .     .     .     .
      s  1933   741    78     8    21     .   212     .     .   184    93     .    18    16    14     2    85    41     2     1   101   248    56     .     5     .     7     .     .     .     .     .     .     .     .
      t  2547   657    90     .    10     .   274     4     .   772   143     .     .    61     2     4   254     .     .    56    53    65    43     .    13     .    46     .     .     .     .     .     .     .     .
      u   878    77    19    11    20    23    26     1    47     .    19     .     .    90    13   107     1    54     .   134   130   101     .     3     .     2     .     .     .     .     .     .     .     .     .
      v   319     2    13     .     .     .   242     .     .     .    45     .     .     .     .     .    17     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      w   746    87   186     .     .     2   116     3     .   100   126     .     .     4     .    40    58     .     .    21     1     .     1     .     .     .     1     .     .     .     .     .     .     .     .
      x    45     4     7     .    11     .     1     .     .     1     2     .     .     .     .     .     .    10     1     .     .     8     .     .     .     .     .     .     .     .     .     .     .     .     .
      y   738   460     2     .     1     2    38     1     .     .    13     .     .    13     .    52   122     3     .     1    26     3     .     1     .     .     .     .     .     .     .     .     .     .     .
      z     7     1     3     .     .     .     2     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      0     1     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      1     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .
      2     1     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      3     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .
      5     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .
      6     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     2     .     .     .     .     .     .     .     .     .     .     .     .     .
      7     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 36813  6929  2413   359   694  1405  3710   652   577  1813  2061    61   213  1279   865  2001  2249   482    23  1806  1933  2547   878   319   746    45   738     7     1     1     1     1     1     2     1

    Next-symbol probability (× 99):

        TT  .  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  0  1  2  3  5  6  7
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 99  . 11  4  4  3  2  3  1  7  8  1  1  3  6  2  6  2  .  2  8 12  1  1  7  .  2  .  .  .  .  .  .  .  .
      a 99 11  .  2  3  5  .  1  2  .  4  .  1  7  2 16  .  1  . 10 11 13  1  3  2  .  3  .  .  .  .  .  .  .  .
      b 99  1 11  1  .  . 32  .  .  .  3  1  .  9  .  . 13  .  .  6  3  1 12  .  .  .  7  .  .  .  .  .  .  .  .
      c 99  1 15  .  2  . 17  .  . 13  6  .  5  3  .  . 16  .  .  5  .  6  4  .  .  .  7  .  .  .  .  .  .  .  .
      d 99 62  3  .  .  1 10  .  .  .  7  .  .  1  1  1  7  .  .  3  2  .  1  .  .  .  1  .  .  .  .  .  .  .  .
      e 99 34  4  .  2 11  3  1  .  .  1  .  .  3  2  8  .  1  . 14  6  4  .  2  1  1  1  .  .  .  .  .  .  .  .
      f 99 33  6  .  .  . 12  6  .  .  7  .  .  2  .  . 13  .  .  9  .  5  5  .  .  .  .  .  .  .  .  .  .  .  .
      g 99 34  4  .  .  . 12  .  1 15  4  .  .  9  1  1  6  .  .  6  4  .  2  .  .  .  .  .  .  .  .  .  .  .  .
      h 99 10 18  .  .  . 42  .  .  . 12  .  .  .  .  2 10  .  .  1  .  3  1  .  .  .  .  .  .  .  .  .  .  .  .
      i 99  8  3  1  4  4  4  3  3  .  .  .  1  5  5 27  4  .  .  3 11 12  .  2  .  .  .  .  .  .  .  .  .  .  .
      j 99  .  3  .  .  .  6  .  .  .  .  .  .  .  .  . 63  .  .  .  .  . 26  .  .  .  .  .  .  .  .  .  .  .  .
      k 99 29  .  .  .  . 40  .  .  . 12  .  .  .  . 13  1  .  .  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      l 99 13 10  .  .  5 18  3  .  .  9  .  1 15  1  .  8  .  .  .  .  1  1  .  1  . 13  .  .  .  .  .  .  .  .
      m 99 14 14  1  .  . 25  .  .  . 11  .  .  .  1  .  9  3  .  7  1  .  4  .  .  .  8  .  .  .  .  .  .  .  .
      n 99 25  1  .  4 17  7  . 14  .  3  .  1  1  .  1  6  .  .  .  3 11  1  1  .  .  1  .  .  .  .  .  .  .  .
      o 99 13  .  .  1  1  .  9  .  2  2  .  2  3  6 11  5  1  .  9  3  5 16  1  7  .  .  .  .  .  .  .  .  .  .
      p 99 18  9  1  .  . 18  .  .  1  3  .  . 12  .  . 12  7  . 12  4  2  2  .  .  .  .  .  .  .  .  .  .  .  .
      q 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .
      r 99 24  5  .  1  4 21  .  1  .  7  .  1  1  1  2  8  2  .  2  8  3  1  .  .  .  5  .  .  .  .  .  .  .  .
      s 99 38  4  .  1  . 11  .  .  9  5  .  1  1  1  .  4  2  .  .  5 13  3  .  .  .  .  .  .  .  .  .  .  .  .
      t 99 26  3  .  .  . 11  .  . 30  6  .  .  2  .  . 10  .  .  2  2  3  2  .  1  .  2  .  .  .  .  .  .  .  .
      u 99  9  2  1  2  3  3  .  5  .  2  .  . 10  1 12  .  6  . 15 15 11  .  .  .  .  .  .  .  .  .  .  .  .  .
      v 99  1  4  .  .  . 75  .  .  . 14  .  .  .  .  .  5  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      w 99 12 25  .  .  . 15  .  . 13 17  .  .  1  .  5  8  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      x 99  9 15  . 24  .  2  .  .  2  4  .  .  .  .  .  . 22  2  .  . 18  .  .  .  .  .  .  .  .  .  .  .  .  .
      y 99 62  .  .  .  .  5  .  .  .  2  .  .  2  .  7 16  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      z 99 14 42  .  .  . 28  .  .  . 14  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      0 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      1 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .
      2 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      3 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .
      5 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .  .
      6 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .  .
      7 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 19  6  1  2  4 10  2  2  5  6  0  1  3  2  5  6  1  0  5  5  7  2  1  2  0  2  0  0  0  0  0  0  0  0

    Previous-symbol probability (× 99):

        TT  .  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  0  1  2  3  5  6  7
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 19  . 33 69 43 17  4 36 18 28 27 93 17 15 48  7 17 36 65  9 27 34 10 20 65  . 20  .  . 99  . 99 99  .  .
      a  6  4  . 11 10  8  .  5  7  .  5  . 12 13  6 19  .  7  . 14 14 12  4 24  6  . 10 42  .  .  .  .  .  .  .
      b  1  .  2  1  .  .  3  .  .  .  1  5  .  3  .  .  2  .  .  1  1  .  5  .  .  .  3  .  .  .  .  .  .  .  .
      c  2  .  4  .  2  .  3  .  .  5  2  . 16  1  .  .  5  .  4  2  .  2  3  .  .  .  7  .  .  .  .  .  .  .  .
      d  4 12  1  .  .  1  4  .  1  .  5  .  .  1  1  .  4  .  .  2  1  .  1  .  .  .  2  .  .  .  .  .  .  .  .
      e 10 18  6  1  8 28  3  6  1  .  1  .  1 10  9 16  .  5  4 29 11  5  . 23  3 90  3 14  .  .  .  .  .  .  .
      f  2  3  2  .  .  .  2  6  .  .  2  .  .  1  .  .  4  .  .  3  .  1  3  .  .  .  .  .  .  .  .  .  .  .  .
      g  2  3  1  .  .  .  2  .  1  5  1  .  .  4  .  .  2  .  .  2  1  .  1  .  .  .  .  .  .  .  .  .  .  .  .
      h  5  3 13  .  .  . 20  .  .  . 10  .  .  .  .  2  8  .  .  1  .  2  2  .  .  .  1  .  .  .  .  .  .  .  .
      i  6  2  3  4 10  6  2  8 10  .  .  .  7  8 11 28  4  2  .  4 12 10  . 14  .  2  . 42  .  .  .  .  .  .  .
      j  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .
      k  1  1  .  .  .  .  2  .  .  .  1  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      l  3  2  5  .  .  5  6  5  1  .  6  .  4 15  1  .  4  .  .  .  .  1  2  .  1  . 23  .  .  .  .  .  . 99 99
      m  2  2  5  4  .  .  6  .  .  .  5  .  .  .  1  .  4  5  .  3  .  .  4  .  .  .  9  .  .  .  .  .  .  .  .
      n  5  7  1  . 11 24  4  1 49  .  3  2  7  2  .  1  6  . 13  .  3  9  2  4  .  2  2  .  .  .  .  .  .  .  .
      o  6  4  .  2  3  2  . 30  2  2  2  . 18  5 16 13  5  5  . 12  3  4 42 10 20  .  .  .  .  .  .  .  .  .  .
      p  1  1  2  1  .  .  2  .  .  .  1  .  .  4  .  .  3  7  .  3  1  .  1  .  .  .  .  .  .  .  .  .  .  .  .
      q  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .
      r  5  6  4  1  1  5 10  1  2  .  6  .  9  2  3  2  6  9  .  2  7  2  3  2  1  . 12  .  .  .  .  .  .  .  .
      s  5 11  3  2  3  .  6  .  . 10  4  .  8  1  2  .  4  8  9  .  5 10  6  .  1  .  1  .  .  .  .  .  .  .  .
      t  7  9  4  .  1  .  7  1  . 42  7  .  .  5  .  . 11  .  .  3  3  3  5  .  2  .  6  .  .  .  .  .  .  .  .
      u  2  1  1  3  3  2  1  .  8  .  1  .  .  7  1  5  . 11  .  7  7  4  .  1  .  4  .  .  .  .  .  .  .  .  .
      v  1  .  1  .  .  .  6  .  .  .  2  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      w  2  1  8  .  .  .  3  .  .  5  6  .  .  .  .  2  3  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      x  0  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  2  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      y  2  7  .  .  .  .  1  .  .  .  1  .  .  1  .  3  5  1  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      z  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      0  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      1  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .
      2  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      3  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .
      5  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      6  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      7  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

    Symbol entropy: 4.094

    Next-symbol entropy: 3.334
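
  For reference, a sketch of how such counts and entropies can be
  computed. This is not the "count-digraph-freqs" script itself. It
  assumes that each word is delimited by the pad symbol on both sides
  (so a word of length k contributes k + 1 digraphs), that entropies
  are in bits, and that "next-symbol entropy" means the conditional
  entropy of a symbol given the previous one. The name digraph_stats
  is made up.

```python
from collections import Counter
from math import log2

def digraph_stats(words, pad="."):
    """Count digraphs over pad-delimited words; return counts and two entropies."""
    pairs = Counter()
    for w in words:
        padded = pad + w + pad
        pairs.update(zip(padded, padded[1:]))   # len(w) + 1 pairs per word
    total = sum(pairs.values())
    first = Counter()                           # row totals (the TT column)
    for (a, _), n in pairs.items():
        first[a] += n
    # Entropy of a single symbol, from the row totals.
    h_sym = -sum(n / total * log2(n / total) for n in first.values())
    # Conditional entropy of the next symbol given the current one.
    h_next = -sum(n / total * log2(n / first[a]) for (a, _), n in pairs.items())
    return pairs, h_sym, h_next

pairs, h_sym, h_next = digraph_stats(["a", "b"])
print(h_sym, h_next)  # 1.5 0.5
```

  The entries of the "× 99" tables appear to be 99 times the count
  divided by the row (or column) total, rounded; for instance, ". a"
  gives 99 * 799 / 6929, about 11.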

  And now for Portuguese:
  
    cat port.txt \
      | tr ' -' '\012\012' \
      | tr -d '~' \
      | egrep -v '^[bcçdfghijklmnpqrstuvwxyz]$' \
      | head -6035 > .bar
    dicio-wc .bar

     lines   words     bytes file        
    ------ ------- --------- ------------
      6035    6035     36915 .bar

    cat .bar \
      | sed -e 's/$/./g' \
      | count-digraph-freqs \
          -v pad='.' \
          -v chars='.aàáâãbcçdeéêfghiíjklmnoóôõöpqrstuúüvwxyz0123456789' \
          -v showentropy=1
          
    Digraph counts:

           TT     .     a     à     á     â     ã     b     c     ç     d     e     é     ê     f     g     h     i     í     j     k     l     m     n     o     ó     ô     õ     ö     p     q     r     s     t     u     ú     ü     v     w     x     y     z
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      .  6035     .   662    15     2     3     .    44   518     .   903   542    99     .   226    65    17   129     .    11     4   141   182   205   304     3     .     .     .   452   223   125   405   304   253    11     .   186     1     .     .     .
      a  3604  1209     .     .     .     .     .    30   105   123   424     .     .     .     8    53     .    76     2     1     1   200   166   183    39     .     .     .     .    44     6   464   382    60    12     .     .    12     .     1     .     3
      à    15     6     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     9     .     .     .     .     .     .     .     .     .
      á    65    22     .     .     .     .     .     .     5     .     .     .     .     .     5     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    13     1     6     .     .     .    11     .     2     .     .
      â    42     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     3    39     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      ã   225     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   225     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      b   192     .    34     .     .     .     .     .     1     .     6    15    19     .     .     .     .    25     .    18     .     9     .     .    17     .     .     .     .     .     .    23    11     7     5     .     .     2     .     .     .     .
      c  1148     5   191     .     2     1     .     .     .     1     .   241     .     9     .     .    11   159     4     .     .     8     .     6   436     .     .     .     .     .     .    21     .    16    36     1     .     .     .     .     .     .
      ç   216     .    17     .     .     .   129     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    14     .     .    56     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      d  1781     8   396     .     1     .     .     .     .     .     .   723     2     2     .     8     .   129     .    13     .     .     1     .   373     .     .     .     .     .     .    75     .     .    50     .     .     .     .     .     .     .
      e  3859  1236     8     .     .     .     .     1    51    13    44     .     .     .    43    46     .    75     1    50     .   157   278   438    27     .     .     .     .    58    16   324   740    87    10     .     .    26     .   116     .    14
      é   250   101     .     .     .     .     .     1     8     .     1     .     .     .     .     .     .     3     .     .     .     .    26     .     .     .     .     .     .     .     .    90     1    18     .     .     .     .     .     .     .     1
      ê    35     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     3    23     .     .     .     .     .     .     .     .     9     .     .     .     .     .     .     .     .     .
      f   436     4    78     .     4     .     .     .     .     .     .    33     1     .     .     .     .   156    56     .     .     .     .     .    57     .     .     .     .     .     .    13     .     1    32     .     .     .     .     .     1     .
      g   431     6    12     .     .     .     .     .     .     .     .   108     1     .     .     .     .    90     .     .     .     1     4     3    32     .     .     .     .     .     .    39     .     .   134     .     1     .     .     .     .     .
      h   168     1    44     .     .     .     .     .     .     .     .     5     .     .     .     .     .     1     .     .     .     .     .     .   113     .     .     .     .     .     .     .     .     .     4     .     .     .     .     .     .     .
      i  1911    24   131     .     3    35     7     6   232    25   136   106     1     1    43    88     .     .     .     4     .    98   117   262    64     .     .     .     .    29     .    96   224    94     2     1     .    40     .     6     .    36
      í   101     1     .     .     .     .     .     .    56     .     3     .     .     .     1     7     .     .     .     .     .     .     3     8     .     .     .     .     .     4     .     .     4     1     .     .     .    13     .     .     .     .
      j   118     .    64     .     9     .     .     .     .     .     .    26     .     .     .     .     .     .     .     .     .     .     .     .     3     .     .     .     .     .     .     .     .     .    16     .     .     .     .     .     .     .
      k     6     .     2     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     1     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      l  1008   105   292     .     3     .     .     .     3     1     5   131     6     2     2    15   103    83     9     .     .     1    20     .   123    12     .     .     .     .     8     .     1    16    57     .     .    10     .     .     .     .
      m  1435   460   257     .     5     .     .    27     .     .     .   222    19     .     .     .     .    65     3     .     .     .     .     .   232     .     .     .     1   130     .     .     .     .    11     3     .     .     .     .     .     .
      n  1448     3   146     .     .     .    28     .    70    42    87    90     1     .    20    79    37    91     1    10     .     .     .     .   125     7     9     .     .     .     .     .    50   503    24    10     .    15     .     .     .     .
      o  3054  1200     2     .     2     .     .    59    21     .   125     2     .     .     1    30     .    42     .     5     .   131   248   198     5     .     .     .     .    48     .   289   540    36    56     .     .    14     .     .     .     .
      ó    47     7     .     .     .     .     .     2     .     .     .     .     .     .     .    12     .     .     .     .     .     2     .     .     .     .     .     .     .     4     .     3     9     2     .     .     .     .     .     6     .     .
      ô     9     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     7     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      õ    59     .     .     .     .     .     .     .     .     .     .    59     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      ö     1     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      p   878     1   161     .     .     .     .     .     .     .     .   133     2     1     .     .     .     6     1     .     .   128     .     .   291     2     .     .     .     .     .   143     .     .     9     .     .     .     .     .     .     .
      q   267     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   259     .     8     .     .     .     .     .
      r  2089   307   340     .    20     2     1     6    42     3    33   450     .    12    69    25     .   321     2     .     .     .    58    13   121    12     .     2     .     1     1    25    22   149    37     .     .    15     .     .     .     .
      s  2538  1271    83     .     2     .    48     .    25     .     1   286     1     .    14     .     .    67    12     .     .     .    53     .    74     9     .     1     .    42    13     .    87   316   122     .     .    11     .     .     .     .
      t  1673     .   434     .    12     1    12     .     6     .     .   351    13     3     .     .     .   249     5     .     .     .     7     .   271     2     .     .     .     .     .   259     .     .    47     .     .     .     1     .     .     .
      u  1176    45   138     .     .     .     .    15     1     8    13   201     .     .     3     3     .    45     5     6     .   124   251    61     .     .     .     .     .    65     .    86    39    50     .     .     .     .     .     2     .    15
      ú    26     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     5     7     7     .     .     .     .     .     .     .     1     4     2     .     .     .     .     .     .     .     .
      ü     9     .     .     .     .     .     .     .     .     .     .     2     .     5     .     .     .     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      v   355     .    68     .     .     .     .     .     .     .     .   116    85     .     1     .     .    62     .     .     .     1     .     .    22     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      w     2     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      x   133     .    12     .     .     .     .     .     4     .     .     5     .     .     .     .     .    21     .     .     .     .     .     .    85     .     .     .     .     1     .     .     .     5     .     .     .     .     .     .     .     .
      y     1     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      z    69    12    31     .     .     .     .     .     .     .     .    11     .     .     .     .     .    13     .     .     .     .     1     .     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 36915  6035  3604    15    65    42   225   192  1148   216  1781  3859   250    35   436   431   168  1911   101   118     6  1008  1435  1448  3054    47     9    59     1   878   267  2089  2538  1673  1176    26     9   355     2   133     1    69

    Next-symbol probability (× 99):

        TT  .  a  à  á  â  ã  b  c  ç  d  e  é  ê  f  g  h  i  í  j  k  l  m  n  o  ó  ô  õ  ö  p  q  r  s  t  u  ú  ü  v  w  x  y  z
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 99  . 11  .  .  .  .  1  8  . 15  9  2  .  4  1  .  2  .  .  .  2  3  3  5  .  .  .  .  7  4  2  7  5  4  .  .  3  .  .  .  .
      a 99 33  .  .  .  .  .  1  3  3 12  .  .  .  .  1  .  2  .  .  .  5  5  5  1  .  .  .  .  1  . 13 10  2  .  .  .  .  .  .  .  .
      à 99 40  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 59  .  .  .  .  .  .  .  .  .
      á 99 34  .  .  .  .  .  .  8  .  .  .  .  .  8  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 20  2  9  .  .  . 17  .  3  .  .
      â 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  7 92  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      ã 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      b 99  . 18  .  .  .  .  .  1  .  3  8 10  .  .  .  . 13  .  9  .  5  .  .  9  .  .  .  .  .  . 12  6  4  3  .  .  1  .  .  .  .
      c 99  . 16  .  .  .  .  .  .  .  . 21  .  1  .  .  1 14  .  .  .  1  .  1 38  .  .  .  .  .  .  2  .  1  3  .  .  .  .  .  .  .
      ç 99  .  8  .  .  . 59  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  6  .  . 26  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      d 99  . 22  .  .  .  .  .  .  .  . 40  .  .  .  .  .  7  .  1  .  .  .  . 21  .  .  .  .  .  .  4  .  .  3  .  .  .  .  .  .  .
      e 99 32  .  .  .  .  .  .  1  .  1  .  .  .  1  1  .  2  .  1  .  4  7 11  1  .  .  .  .  1  .  8 19  2  .  .  .  1  .  3  .  .
      é 99 40  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  1  .  .  .  . 10  .  .  .  .  .  .  .  . 36  .  7  .  .  .  .  .  .  .  .
      ê 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  8 65  .  .  .  .  .  .  .  . 25  .  .  .  .  .  .  .  .  .
      f 99  1 18  .  1  .  .  .  .  .  .  7  .  .  .  .  . 35 13  .  .  .  .  . 13  .  .  .  .  .  .  3  .  .  7  .  .  .  .  .  .  .
      g 99  1  3  .  .  .  .  .  .  .  . 25  .  .  .  .  . 21  .  .  .  .  1  1  7  .  .  .  .  .  .  9  .  . 31  .  .  .  .  .  .  .
      h 99  1 26  .  .  .  .  .  .  .  .  3  .  .  .  .  .  1  .  .  .  .  .  . 67  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .
      i 99  1  7  .  .  2  .  . 12  1  7  5  .  .  2  5  .  .  .  .  .  5  6 14  3  .  .  .  .  2  .  5 12  5  .  .  .  2  .  .  .  2
      í 99  1  .  .  .  .  .  . 55  .  3  .  .  .  1  7  .  .  .  .  .  .  3  8  .  .  .  .  .  4  .  .  4  1  .  .  . 13  .  .  .  .
      j 99  . 54  .  8  .  .  .  .  .  . 22  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  . 13  .  .  .  .  .  .  .
      k 99  . 33  .  .  .  .  .  .  .  . 17  .  .  .  .  .  .  .  . 17 33  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      l 99 10 29  .  .  .  .  .  .  .  . 13  1  .  .  1 10  8  1  .  .  .  2  . 12  1  .  .  .  .  1  .  .  2  6  .  .  1  .  .  .  .
      m 99 32 18  .  .  .  .  2  .  .  . 15  1  .  .  .  .  4  .  .  .  .  .  . 16  .  .  .  .  9  .  .  .  .  1  .  .  .  .  .  .  .
      n 99  . 10  .  .  .  2  .  5  3  6  6  .  .  1  5  3  6  .  1  .  .  .  .  9  .  1  .  .  .  .  .  3 34  2  1  .  1  .  .  .  .
      o 99 39  .  .  .  .  .  2  1  .  4  .  .  .  .  1  .  1  .  .  .  4  8  6  .  .  .  .  .  2  .  9 18  1  2  .  .  .  .  .  .  .
      ó 99 15  .  .  .  .  .  4  .  .  .  .  .  .  . 25  .  .  .  .  .  4  .  .  .  .  .  .  .  8  .  6 19  4  .  .  .  .  . 13  .  .
      ô 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 77 22  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      õ 99  .  .  .  .  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      ö 99  .  .  .  .  .  . 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      p 99  . 18  .  .  .  .  .  .  .  . 15  .  .  .  .  .  1  .  .  . 14  .  . 33  .  .  .  .  .  . 16  .  .  1  .  .  .  .  .  .  .
      q 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 96  .  3  .  .  .  .  .
      r 99 15 16  .  1  .  .  .  2  .  2 21  .  1  3  1  . 15  .  .  .  .  3  1  6  1  .  .  .  .  .  1  1  7  2  .  .  1  .  .  .  .
      s 99 50  3  .  .  .  2  .  1  .  . 11  .  .  1  .  .  3  .  .  .  .  2  .  3  .  .  .  .  2  1  .  3 12  5  .  .  .  .  .  .  .
      t 99  . 26  .  1  .  1  .  .  .  . 21  1  .  .  .  . 15  .  .  .  .  .  . 16  .  .  .  .  .  . 15  .  .  3  .  .  .  .  .  .  .
      u 99  4 12  .  .  .  .  1  .  1  1 17  .  .  .  .  .  4  .  1  . 10 21  5  .  .  .  .  .  5  .  7  3  4  .  .  .  .  .  .  .  1
      ú 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 19 27 27  .  .  .  .  .  .  .  4 15  8  .  .  .  .  .  .  .  .
      ü 99  .  .  .  .  .  .  .  .  .  . 22  . 55  .  .  . 22  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      v 99  . 19  .  .  .  .  .  .  .  . 32 24  .  .  .  . 17  .  .  .  .  .  .  6  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      w 99  . 50  .  .  .  .  .  .  .  .  .  .  .  .  .  . 50  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      x 99  .  9  .  .  .  .  .  3  .  .  4  .  .  .  .  . 16  .  .  .  .  .  . 63  .  .  .  .  1  .  .  .  4  .  .  .  .  .  .  .  .
      y 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      z 99 17 44  .  .  .  .  .  .  .  . 16  .  .  .  .  . 19  .  .  .  .  1  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 16 10  0  0  0  1  1  3  1  5 10  1  0  1  1  0  5  0  0  0  3  4  4  8  0  0  0  0  2  1  6  7  4  3  0  0  1  0  0  0  0

    Previous-symbol probability (× 99):

        TT  .  a  à  á  â  ã  b  c  ç  d  e  é  ê  f  g  h  i  í  j  k  l  m  n  o  ó  ô  õ  ö  p  q  r  s  t  u  ú  ü  v  w  x  y  z
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 16  . 18 99  3  7  . 23 45  . 50 14 39  . 51 15 10  7  .  9 66 14 13 14 10  6  .  .  . 51 83  6 16 18 21 42  . 52 50  .  .  .
      a 10 20  .  .  .  .  . 15  9 56 24  .  .  .  2 12  .  4  2  1 17 20 11 13  1  .  .  .  .  5  2 22 15  4  1  .  .  3  .  1  .  4
      à  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      á  0  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  3  .  1  .  .
      â  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      ã  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  7  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      b  1  .  1  .  .  .  .  .  .  .  .  .  8  .  .  .  .  1  . 15  .  1  .  .  1  .  .  .  .  .  .  1  .  .  .  .  .  1  .  .  .  .
      c  3  .  5  .  3  2  .  .  .  .  .  6  . 25  .  .  6  8  4  .  .  1  .  . 14  .  .  .  .  .  .  1  .  1  3  4  .  .  .  .  .  .
      ç  1  .  .  .  .  . 57  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 94  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      d  5  . 11  .  2  .  .  .  .  .  . 19  1  6  .  2  .  7  . 11  .  .  .  . 12  .  .  .  .  .  .  4  .  .  4  .  .  .  .  .  .  .
      e 10 20  .  .  .  .  .  1  4  6  2  .  .  . 10 11  .  4  1 42  . 15 19 30  1  .  .  .  .  7  6 15 29  5  1  .  .  7  . 86  . 20
      é  1  2  .  .  .  .  .  1  1  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  4  .  1  .  .  .  .  .  .  .  1
      ê  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      f  1  .  2  .  6  .  .  .  .  .  .  1  .  .  .  .  .  8 55  .  .  .  .  .  2  .  .  .  .  .  .  1  .  .  3  .  .  .  .  . 99  .
      g  1  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  5  .  .  .  .  .  .  1  .  .  .  .  .  .  2  .  . 11  . 11  .  .  .  .  .
      h  0  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      i  5  .  4  .  5 83  3  3 20 11  8  3  .  3 10 20  .  .  .  3  . 10  8 18  2  .  .  .  .  3  .  5  9  6  .  4  . 11  .  4  . 52
      í  0  .  .  .  .  .  .  .  5  .  .  .  .  .  .  2  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  .
      j  0  .  2  . 14  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .
      k  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 17  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      l  3  2  8  .  5  .  .  .  .  .  .  3  2  6  .  3 61  4  9  .  .  .  1  .  4 25  .  .  .  .  3  .  .  1  5  .  .  3  .  .  .  .
      m  4  8  7  .  8  .  . 14  .  .  .  6  8  .  .  .  .  3  3  .  .  .  .  .  8  .  .  . 99 15  .  .  .  .  1 11  .  .  .  .  .  .
      n  4  .  4  .  .  . 12  .  6 19  5  2  .  .  5 18 22  5  1  8  .  .  .  .  4 15 99  .  .  .  .  .  2 30  2 38  .  4  .  .  .  .
      o  8 20  .  .  3  .  . 30  2  .  7  .  .  .  .  7  .  2  .  4  . 13 17 14  .  .  .  .  .  5  . 14 21  2  5  .  .  4  .  .  .  .
      ó  0  .  .  .  .  .  .  1  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .
      ô  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      õ  0  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      ö  0  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      p  2  .  4  .  .  .  .  .  .  .  .  3  1  3  .  .  .  .  1  .  . 13  .  .  9  4  .  .  .  .  .  7  .  .  1  .  .  .  .  .  .  .
      q  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 22  . 88  .  .  .  .  .
      r  6  5  9  . 30  5  .  3  4  1  2 12  . 34 16  6  . 17  2  .  .  .  4  1  4 25  .  3  .  .  .  1  1  9  3  .  .  4  .  .  .  .
      s  7 21  2  .  3  . 21  .  2  .  .  7  .  .  3  .  .  3 12  .  .  .  4  .  2 19  .  2  .  5  5  .  3 19 10  .  .  3  .  .  .  .
      t  4  . 12  . 18  2  5  .  1  .  .  9  5  8  .  .  . 13  5  .  .  .  .  .  9  4  .  .  .  .  . 12  .  .  4  .  .  . 50  .  .  .
      u  3  1  4  .  .  .  .  8  .  4  1  5  .  .  1  1  .  2  5  5  . 12 17  4  .  .  .  .  .  7  .  4  2  3  .  .  .  .  .  1  . 22
      ú  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      ü  0  .  .  .  .  .  .  .  .  .  .  .  . 14  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      v  1  .  2  .  .  .  .  .  .  .  .  3 34  .  .  .  .  3  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      w  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      x  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      y  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      z  0  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

    Symbol entropy: 4.137

    Next-symbol entropy: 3.094
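
  The two entropy figures can be read as follows. This is only a minimal
  sketch: it assumes that "Symbol entropy" is the Shannon entropy (in bits)
  of the row totals, and that "Next-symbol entropy" is the conditional
  entropy of the next symbol given the current one. The internals of
  "count-digraph-freqs" are not shown in these notes, so the function
  name and the input layout below are illustrative only.

```python
import math
from collections import defaultdict

def entropies(digraphs):
    """digraphs maps (prev, next) -> count.

    Returns (symbol entropy, next-symbol entropy), both in bits.
    Assumed definitions, not taken from the original script.
    """
    # Row totals: how often each symbol occurs as the first of a pair.
    row_tot = defaultdict(int)
    for (a, _b), n in digraphs.items():
        row_tot[a] += n
    total = sum(row_tot.values())

    # Entropy of the single-symbol distribution.
    h_sym = -sum(n / total * math.log2(n / total)
                 for n in row_tot.values() if n > 0)

    # Conditional entropy: -sum over pairs of P(a,b) * log2 P(b|a).
    h_next = -sum(n / total * math.log2(n / row_tot[a])
                  for (a, _b), n in digraphs.items() if n > 0)
    return h_sym, h_next
```

  With this reading, a next-symbol entropy well below the symbol entropy
  means that the current symbol narrows down the next one considerably.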
          
  Finally, for Latin:
  
    cat latn-ock.txt \
      | sed \
         -e 's/^Discipulus://' \
         -e 's/^Magister://' \
      | tr '.,;:?\!()"-' '         ' \
      | tr ' ' '\012' \
      | egrep '.' \
      | head -5338 \
      | tr 'A-Z' 'a-z' \
      > .foo

    cat .foo \
      | tr ' ' '\012' \
      | sed -e 's/$/./g' \
      | count-digraph-freqs \
          -v pad='.' \
          -v chars='.abcdefghijklmnopqrstuvwxyz0123456789' \
          -v showentropy=1      

    Digraph counts:

           TT     .     a     b     c     d     e     f     g     h     i     l     m     n     o     p     q     r     s     t     u     v     x     y     0     1     2     3     4     5     6     7     8     9
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      .  5338     .   459    42   414   272   675   134    30   123   418    70   165   185   116   605   398   162   448   189   213   184     .     .     .    22     6     .     1     1     2     2     2     .
      a  2328   277     .   119    59    99   165     5    13     4    33   242   207   171     .    93     4   171   121   446    88    10     1     .     .     .     .     .     .     .     .     .     .     .
      b   475    32    23     .     .     3   116     .     .     .    81     1     .     .    35     .     .    11    20     .   151     1     .     1     .     .     .     .     .     .     .     .     .     .
      c  1291    83   162     .    26     .   127     .     .    29   300    24     .     .   247     .     1    14     .    98   179     .     .     1     .     .     .     .     .     .     .     .     .     .
      d  1025   202    45     .     1     5   302     .     .     8   267     .     5     .    99     .     .     .     .     .    91     .     .     .     .     .     .     .     .     .     .     .     .     .
      e  3455   511    46    32   183    97     5    12   130     1    21   226   258   262    22    30    27   498   486   468    16    10   114     .     .     .     .     .     .     .     .     .     .     .
      f   224     .    19     .     .     .    29    20     .     .   133     1     .     .     8     .     .     1     .     .    13     .     .     .     .     .     .     .     .     .     .     .     .     .
      g   306     .    65     .     .     .    46     .     1     .    54     .     .    43    37     .     .    28     .     .    32     .     .     .     .     .     .     .     .     .     .     .     .     .
      h   192     7    72     .     .     .     2     .     .     .    31     .     .     .    25     .     .    26     .     .    29     .     .     .     .     .     .     .     .     .     .     .     .     .
      i  3880   426   214   171   274   163    67    21    51     2    73   152   131   464   170   167    32    49   423   486   252    86     6     .     .     .     .     .     .     .     .     .     .     .
      l  1078    47    61     .     1     .   187     .     1     .   473    96     1     .    45     .     .     .     .    53   109     3     .     1     .     .     .     .     .     .     .     .     .     .
      m  1514   856    81     8     1     .    65     .     .     .    82     .    77    45    96    60     2     .     .     .   134     6     .     1     .     .     .     .     .     .     .     .     .     .
      n  1880   229   107     .   158   136   154    16    35     .   280     .     .    21   159     .    27     .    95   308   139    15     .     1     .     .     .     .     .     .     .     .     .     .
      o  1773   267     1    22    68   139    14     1     3     4     .    77   184   336     .    99     4   252   130   153     1    15     .     3     .     .     .     .     .     .     .     .     .     .
      p  1239     3   179     .     .     .   199     .     .    10    60    68     .     .   220    14     .   311    36    59    80     .     .     .     .     .     .     .     .     .     .     .     .     .
      q   518     5     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   513     .     .     .     .     .     .     .     .     .     .     .     .     .
      r  1909   350   209    14    18    24   374    10    31     .   422     .     3    25   117    20     .    27    34    70   128    33     .     .     .     .     .     .     .     .     .     .     .     .
      s  2331  1013    52     1    55    12   219     1     6     .   247     .    19     .    31    58    14     1   117   320   165     .     .     .     .     .     .     .     .     .     .     .     .     .
      t  2880   921   309     .     2     .   502     .     .    11   516     .     .     .   155     .     8    61     .     5   382     .     .     8     .     .     .     .     .     .     .     .     .     .
      u  2727    31   187    66    27    75    86     4     5     .   215   121   463   328   141    37     .   289   417   219     9     .     7     .     .     .     .     .     .     .     .     .     .     .
      v   363     .    35     .     .     .   117     .     .     .   166     .     .     .    41     .     .     .     .     .     3     .     1     .     .     .     .     .     .     .     .     .     .     .
      x   129    42     2     .     4     .     4     .     .     .     8     .     .     .     9    54     1     .     .     5     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      y    16     .     .     .     .     .     .     .     .     .     .     .     1     .     .     2     .     8     4     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      0     2     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      1    24     8     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     1     2     2     3     4     1     1     1     .
      2     8     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     1     .     1     .     .     .     .     1     1
      3     3     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      4     4     4     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      5     5     5     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      6     3     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      7     3     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      8     4     4     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      9     1     1     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 36928  5338  2328   475  1291  1025  3455   224   306   192  3880  1078  1514  1880  1773  1239   518  1909  2331  2880  2727   363   129    16     2    24     8     3     4     5     3     3     4     1

    Next-symbol probability (× 99):

        TT  .  a  b  c  d  e  f  g  h  i  l  m  n  o  p  q  r  s  t  u  v  x  y  0  1  2  3  4  5  6  7  8  9
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 99  .  9  1  8  5 13  2  1  2  8  1  3  3  2 11  7  3  8  4  4  3  .  .  .  .  .  .  .  .  .  .  .  .
      a 99 12  .  5  3  4  7  .  1  .  1 10  9  7  .  4  .  7  5 19  4  .  .  .  .  .  .  .  .  .  .  .  .  .
      b 99  7  5  .  .  1 24  .  .  . 17  .  .  .  7  .  .  2  4  . 31  .  .  .  .  .  .  .  .  .  .  .  .  .
      c 99  6 12  .  2  . 10  .  .  2 23  2  .  . 19  .  .  1  .  8 14  .  .  .  .  .  .  .  .  .  .  .  .  .
      d 99 20  4  .  .  . 29  .  .  1 26  .  .  . 10  .  .  .  .  .  9  .  .  .  .  .  .  .  .  .  .  .  .  .
      e 99 15  1  1  5  3  .  .  4  .  1  6  7  8  1  1  1 14 14 13  .  .  3  .  .  .  .  .  .  .  .  .  .  .
      f 99  .  8  .  .  . 13  9  .  . 59  .  .  .  4  .  .  .  .  .  6  .  .  .  .  .  .  .  .  .  .  .  .  .
      g 99  . 21  .  .  . 15  .  .  . 17  .  . 14 12  .  .  9  .  . 10  .  .  .  .  .  .  .  .  .  .  .  .  .
      h 99  4 37  .  .  .  1  .  .  . 16  .  .  . 13  .  . 13  .  . 15  .  .  .  .  .  .  .  .  .  .  .  .  .
      i 99 11  5  4  7  4  2  1  1  .  2  4  3 12  4  4  1  1 11 12  6  2  .  .  .  .  .  .  .  .  .  .  .  .
      l 99  4  6  .  .  . 17  .  .  . 43  9  .  .  4  .  .  .  .  5 10  .  .  .  .  .  .  .  .  .  .  .  .  .
      m 99 56  5  1  .  .  4  .  .  .  5  .  5  3  6  4  .  .  .  .  9  .  .  .  .  .  .  .  .  .  .  .  .  .
      n 99 12  6  .  8  7  8  1  2  . 15  .  .  1  8  .  1  .  5 16  7  1  .  .  .  .  .  .  .  .  .  .  .  .
      o 99 15  .  1  4  8  1  .  .  .  .  4 10 19  .  6  . 14  7  9  .  1  .  .  .  .  .  .  .  .  .  .  .  .
      p 99  . 14  .  .  . 16  .  .  1  5  5  .  . 18  1  . 25  3  5  6  .  .  .  .  .  .  .  .  .  .  .  .  .
      q 99  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 98  .  .  .  .  .  .  .  .  .  .  .  .  .
      r 99 18 11  1  1  1 19  1  2  . 22  .  .  1  6  1  .  1  2  4  7  2  .  .  .  .  .  .  .  .  .  .  .  .
      s 99 43  2  .  2  1  9  .  .  . 10  .  1  .  1  2  1  .  5 14  7  .  .  .  .  .  .  .  .  .  .  .  .  .
      t 99 32 11  .  .  . 17  .  .  . 18  .  .  .  5  .  .  2  .  . 13  .  .  .  .  .  .  .  .  .  .  .  .  .
      u 99  1  7  2  1  3  3  .  .  .  8  4 17 12  5  1  . 10 15  8  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      v 99  . 10  .  .  . 32  .  .  . 45  .  .  . 11  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .
      x 99 32  2  .  3  .  3  .  .  .  6  .  .  .  7 41  1  .  .  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      y 99  .  .  .  .  .  .  .  .  .  .  .  6  .  . 12  . 50 25  6  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      0 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      1 99 33  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  4  4  8  8 12 17  4  4  4  .
      2 99 37  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 12 12  . 12  .  .  .  . 12 12
      3 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      4 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      5 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      6 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      7 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      8 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      9 99 99  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 14  6  1  3  3  9  1  1  1 10  3  4  5  5  3  1  5  6  8  7  1  0  0  0  0  0  0  0  0  0  0  0  0

    Previous-symbol probability (× 99):

        TT  .  a  b  c  d  e  f  g  h  i  l  m  n  o  p  q  r  s  t  u  v  x  y  0  1  2  3  4  5  6  7  8  9
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 14  . 20  9 32 26 19 59 10 63 11  6 11 10  6 48 76  8 19  6  8 50  .  .  . 91 74  . 25 20 66 66 50  .
      a  6  5  . 25  5 10  5  2  4  2  1 22 14  9  .  7  1  9  5 15  3  3  1  .  .  .  .  .  .  .  .  .  .  .
      b  1  1  1  .  .  .  3  .  .  .  2  .  .  .  2  .  .  1  1  .  5  .  .  6  .  .  .  .  .  .  .  .  .  .
      c  3  2  7  .  2  .  4  .  . 15  8  2  .  . 14  .  .  1  .  3  6  .  .  6  .  .  .  .  .  .  .  .  .  .
      d  3  4  2  .  .  .  9  .  .  4  7  .  .  .  6  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .
      e  9  9  2  7 14  9  .  5 42  1  1 21 17 14  1  2  5 26 21 16  1  3 87  .  .  .  .  .  .  .  .  .  .  .
      f  1  .  1  .  .  .  1  9  .  .  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      g  1  .  3  .  .  .  1  .  .  .  1  .  .  2  2  .  .  1  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .
      h  1  .  3  .  .  .  .  .  .  .  1  .  .  .  1  .  .  1  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .
      i 10  8  9 36 21 16  2  9 17  1  2 14  9 24  9 13  6  3 18 17  9 23  5  .  .  .  .  .  .  .  .  .  .  .
      l  3  1  3  .  .  .  5  .  .  . 12  9  .  .  3  .  .  .  .  2  4  1  .  6  .  .  .  .  .  .  .  .  .  .
      m  4 16  3  2  .  .  2  .  .  .  2  .  5  2  5  5  .  .  .  .  5  2  .  6  .  .  .  .  .  .  .  .  .  .
      n  5  4  5  . 12 13  4  7 11  .  7  .  .  1  9  .  5  .  4 11  5  4  .  6  .  .  .  .  .  .  .  .  .  .
      o  5  5  .  5  5 13  .  .  1  2  .  7 12 18  .  8  1 13  6  5  .  4  . 19  .  .  .  .  .  .  .  .  .  .
      p  3  .  8  .  .  .  6  .  .  5  2  6  .  . 12  1  . 16  2  2  3  .  .  .  .  .  .  .  .  .  .  .  .  .
      q  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 19  .  .  .  .  .  .  .  .  .  .  .  .  .
      r  5  6  9  3  1  2 11  4 10  . 11  .  .  1  7  2  .  1  1  2  5  9  .  .  .  .  .  .  .  .  .  .  .  .
      s  6 19  2  .  4  1  6  .  2  .  6  .  1  .  2  5  3  .  5 11  6  .  .  .  .  .  .  .  .  .  .  .  .  .
      t  8 17 13  .  .  . 14  .  .  6 13  .  .  .  9  .  2  3  .  . 14  .  . 50  .  .  .  .  .  .  .  .  .  .
      u  7  1  8 14  2  7  2  2  2  .  5 11 30 17  8  3  . 15 18  8  .  .  5  .  .  .  .  .  .  .  .  .  .  .
      v  1  .  1  .  .  .  3  .  .  .  4  .  .  .  2  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .
      x  0  1  .  .  .  .  .  .  .  .  .  .  .  .  1  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      y  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      0  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      1  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 50  4 25 66 74 79 33 33 25  .
      2  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 50  4  . 33  .  .  .  . 25 99
      3  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      4  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      5  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      6  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      7  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      8  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      9  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

    Symbol entropy: 4.023

    Next-symbol entropy: 3.255
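
  For reference, the counts and the "× 99" tables above can be imitated
  as sketched below. The sketch assumes that the words are joined into one
  stream with "." as separator and padding, and that each row is scaled
  by 99 and rounded to the nearest integer, with entries that round to
  zero printed as ".". Both assumptions agree with the printed Latin
  rows but are not taken from the script itself; the function names are
  hypothetical.

```python
from collections import Counter

def digraph_counts(words, pad="."):
    """Count adjacent symbol pairs in the stream pad + w1 + pad + w2 + pad ..."""
    stream = pad + "".join(w + pad for w in words)
    return Counter(zip(stream, stream[1:]))

def scaled_row(row, scale=99):
    """Convert one row of counts into rounded probabilities times `scale`.

    Entries that round to zero are dropped (the tables print them as ".").
    """
    total = sum(row.values())
    scaled = {sym: int(scale * n / total + 0.5) for sym, n in row.items()}
    return {sym: v for sym, v in scaled.items() if v > 0}

# The Latin "q" row: 5 word-final, 513 followed by "u".
q_row = scaled_row({".": 5, "u": 513})
```

  Here `q_row` comes out as 1 for "." and 98 for "u", the same as the "q"
  row of the Latin next-symbol table.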

  Summarizing, here are the letter frequency tables for Voynichese (bio/EVA,
  in the Friedman and Currier versions), English (LC), and Latin (LC).
  The "." entry counts word breaks:
  
    Ra  Friedm.  Currier  English  Latin
    nk  L  freq  L  freq  L  freq  L  freq
    --  - -----  - -----  - -----  - -----
    01  .  5910  .  6141  .  6929  .  5338
    02  e  3974  e  3821  e  3710  i  3880
    03  o  3635  o  3619  t  2547  e  3455
    04  y  3481  y  3468  a  2413  t  2880
    05  h  2605  h  2625  o  2249  u  2727
    06  d  2534  d  2534  i  2061  s  2331
    07  l  2172  l  2157  n  2001  a  2328
    08  k  2026  k  2002  s  1933  r  1909
    09  a  1792  a  1865  h  1813  n  1880
    10  c  1626  c  1633  r  1806  o  1773
    11  q  1565  i  1634  d  1405  m  1514
    12  i  1361  q  1547  l  1279  c  1291
    13  s  1321  s  1352  u   878  p  1239
    14  t   928  t   928  m   865  l  1078
    15  n   861  n   860  w   746  d  1025
    16  r   837  r   845  y   738  q   518
    17  p   192  p   192  c   694  b   475
    18  m    56  m    72  f   652  v   363
    19  f    33  f    27  g   577  g   306
    20  g    11  -     -  p   482  f   224
    21  -     -  -     -  b   359  h   192
    22  -     -  -     -  v   319  x   129
    23  -     -  -     -  k   213  y    16
    24  -     -  -     -  j    61  -     -
    25  -     -  -     -  x    45  -     -
    26  -     -  -     -  q    23  -     -
    27  -     -  -     -  z     7  -     -
    --  - -----  - -----  - -----  - -----
    xx  T 36920  T 37322  T 36805  T 36871
    xx  h 3.826  h 3.827  h 4.092  h 4.008
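
  For reference, the "T" and "h" rows can be recomputed from a column of
  counts. The sketch below is not the tool used for the table; it assumes
  "h" is the Shannon entropy in bits per symbol, and "entropy_bits" is a
  name made up for this note.

```python
import math

def entropy_bits(counts):
    """Shannon entropy, in bits per symbol, of a list of symbol counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Friedman column of the table above; the "." row is the word separator.
friedman = [5910, 3974, 3635, 3481, 2605, 2534, 2172, 2026, 1792, 1626,
            1565, 1361, 1321, 928, 861, 837, 192, 56, 33, 11]

print(sum(friedman))                      # total count (the "T" row)
print(round(entropy_bits(friedman), 3))   # entropy, assuming "h" is in bits
```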

    gnuplot
    set term x11
    # set term pbm color medium
    # set output ".letfreqs.ppm"
    set xrange [-1:28]
    plot \
      ".letfreqs" using 01:03 title "Friedman" with boxes,\
      ".letfreqs" using 04:06 title "Currier"  with boxes,\
      ".letfreqs" using 07:09 title "English"  with boxes,\
      ".letfreqs" using 10:12 title "Latin"    with boxes
    pause 120
    plot \
      ".letfreqs" using 03 title "Friedman" with steps,\
      ".letfreqs" using 06 title "Currier"  with steps,\
      ".letfreqs" using 09 title "English"  with steps,\
      ".letfreqs" using 12 title "Latin"    with steps

97-11-11 stolfi
===============

  Perhaps the "d" should be included in the suffix set. Let's 
  see whether it can be preceded by suffixy letters:
  
    cat bio-f-eva-gut.wds \
      | sed \
          -e 's/^/ /' \
          -e 's/\(.d\)/ \1 /g' \
      | tr ' ' '\012' \
      | egrep -e 'd' \
      | sort | uniq -c | expand | sort +0 -1nr \
      > .foo

  Result:
  
     1733 ed
      507 d
      145 hd
       73 ld
       27 od
       13 yd
        7 ad
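
  The same tally can be sketched in Python, as a cross-check on the
  sed/tr pipeline. This is only an illustration: "d_contexts" is a
  made-up name, and unlike the sed version (whose '.d' matches cannot
  overlap) it also counts the second "d" of a "dd" pair as "dd".

```python
from collections import Counter

def d_contexts(words):
    """Count each 'd' together with the letter before it.

    A word-initial 'd' has no preceding letter and is tallied as plain 'd'.
    """
    counts = Counter()
    for word in words:
        padded = " " + word          # same trick as the leading-space sed rule
        for i in range(1, len(padded)):
            if padded[i] == "d":
                counts[(padded[i - 1] + "d").strip()] += 1
    return counts

# Tiny made-up word list, just to show the shape of the result.
print(d_contexts(["dy", "chedy", "oldd"]))
```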

  Let's look at the '[loya]d' words:
  
    foreach f ( l o y a )
      echo '=== '$f
      cat bio-f-eva-gut.dic \
        | egrep -e "$f"'d'
    end

    al.al.dy al.dy che.ol.dy ch.l.d.aiin d.air.ol.dy d.al.dy
    d.ol.dy k.al.dy k.ol.dy l.d l.d.aiin l.d.al ld.al.or l.d.ar
    ld.chey ld.dy l.d.ol l.dy l.ke.ol.dy l.l.d.ar l.ol.dy ok.al.dy
    okee.dy.l.dy ok.ol.dy ol.d ol.d.a ol.d.air ol.d.y ol.k.ol.d.y ot.al.d.y ote.ol.d.y
    otoldy p.ol.d.ak.y p.ol.d.she.dy pshe.al.dy pshe.ol.dy qokaldy qokoldy
    qok.y.l.d.d.y q.ol.d.y qot.al.d.y rche.al.d shed.al.d.y sheol.d.y sh.l.d.y sh.ol.d.y s.ol.d.y
    t.ol.d.al yshe.al.d.y

    chckh.od.y che.od.y d.ar.od.y lk.od.al l.od od.aiin od.al od.ar odched.y oddche.y
    od.y ot.ar.od.l otee.od.y p.ar.od.y qod.aiin qod.ar qod.che.d qod.ee.d.y qodee.y
    qod.y qod.yke.y sh.od.y s.od.ar

    chckh.yd che.d.che.yd.aiin d.air.yd.y dsh.ol.yd olt.yd.y qok.yd.y yd.aiin yd.air.ol
    yd.ar.al yd.ar.she.y yd.y

    ch.ad.y d.ar.ad.y ok.ad.y ot.ad.y qok.ad.y t.ad.y

  So it seems that -od -ad -ld -yd are letters, as well as qod.

  Let's look again at suffixes, ab initio:
  
    cat bio-f-eva-gut.wds \
      | sed \
          -e 's/\([oaydirslmn]*\)$/- -\1/' \
      | egrep -e '- -' \
      | gawk '/./ {print $2}' \
      | sort | uniq -c | expand | sort +0 -1nr \
      > .foo
      
  It seems that the following is the "suffix alphabet":
  
    -[aoydlsm] -i*n -i*r
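
  The splitting rule used by the sed command above (peel off the
  longest trailing run of letters from "oaydirslmn") can be sketched
  as follows. The function name "split_suffix" is hypothetical, and the
  sample words are only illustrations.

```python
import re

# Lazy stem, then the longest trailing run of candidate suffix letters.
SUFFIX_RE = re.compile(r"^(.*?)([oaydirslmn]*)$")

def split_suffix(word):
    """Return (stem, suffix); the suffix may be empty."""
    match = SUFFIX_RE.match(word)
    return match.group(1), match.group(2)

for w in ["qokeedy", "chol", "daiin", "okeey"]:
    stem, suffix = split_suffix(w)
    print(f"{stem}- -{suffix}")
```

  Note that a word made up entirely of suffix letters (like "daiin")
  ends up with an empty stem under this rule.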

97-11-16 stolfi
===============

  While waiting for a big compile, I began splitting Jim Reeds's 
  mail archives into separate messages:
  
    cd ../docs
    mkdir email-arch
    mv email-BIG[123]* email-arch
    cd email-arch
    
  I then split the mail folders to individual files using MH.
  Must clean them, convert to HTML, and add links.
  
  Adding keys for links:
  
    <<stats>>   statistics and structural analysis
    <<reads>>   transcribed text and corrections
    <<txorg>>   discussion on file organization, format, and logistics
    <<email>>   mailing list administration
    <<alpha>>   discussion about alphabet
    <<histo>>   historical and cultural issues
    <<bibli>>   bibliographic references
    <<folks>>   about people
    <<softw>>   software
    <<crypt>>   discussions related to crypto hypothesis
    <<physi>>   physical format and properties of the book
    <<picts>>   pictures and discussion thereabout
    <<jokes>>   humor (?)

97-11-20 stolfi
===============

  At Rene's suggestion, I have plotted Rayman's counts
  of distinct characters per page:
  
    cat rayman-char-counts.txt | tr ',' ' ' > .tmp
    
    gnuplot <<EOF
    set term pbm color
    set output "rayman-char-counts.ppm"
    set xrange [0:500]
    set xlabel "total chars"
    set yrange [0:40]
    set ylabel "distinct chars"
    plot ".tmp" \
      using 2:3 title "f1r--f51v" with points
    quit
    EOF
  
    ppmtogif < rayman-char-counts.ppm > rayman-char-counts.gif
    xv rayman-char-counts.{ppm,gif}
    
97-11-22 stolfi
===============

  Let's compute the number of EVA characters per page in each transcription:
  
    cat L16-eva/INDEX \
      | sed -e 's/:.*$//g' \
      > all.units

    set u = ( `cat all.units | sed -e 's/^/L16-eva\//g'` )
    
    cat ${u} \
      | count-bytes-per-scribe \
         -v scribes='BCDFGIJKLQRTUZ' \
      > .bytes-per-page-and-scribe
      
  On a different track, I wrote a distribution-sorting program
  (sort-distr.c) and used it to sort the new label location maps
  (Note-010.txt, in preparation).
      
97-11-23 stolfi
===============

  Let's have another quick look at the A/B differences in midfix
  frequencies of Note-009.txt.  Perhaps the difference
  will become sharper (or disappear) if we collapse k/t
  and replace ch=sh=ee, cth=ete, etc.
  
    foreach guy ( Friedman.f Currier.c )
      foreach lang ( A.a B.b )
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -v -e '- -' \
          | eva2erb \
          | sort | uniq -c | expand | sort +0 -1nr \
          > .he${lang:e}-${guy:e}-unifs-all.frq

        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print $1}' \
          | eva2erb \
          | sort | uniq -c | expand | sort +0 -1nr \
          > .he${lang:e}-${guy:e}-prefs-all.frq

        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print $2}' \
          | eva2erb \
          | sort | uniq -c | expand | sort +0 -1nr \
          > .he${lang:e}-${guy:e}-midfs-all.frq

        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print $3}' \
          | eva2erb \
          | sort | uniq -c | expand | sort +0 -1nr \
          > .he${lang:e}-${guy:e}-suffs-all.frq

        foreach elem ( pref midf suff unif )
          set file = "he${lang:e}-${guy:e}-${elem}s-all"
          echo "${file}.frq -> ${file}.fmt"
          cat .${file}.frq \
            | compute-freqs \
            | gawk '\
                  BEGIN {\
                    printf "by '"${guy:r}"'\nlanguage '"${lang:r}"'\n"; \
                    printf "freq pc '"${elem}"'ix\n---- -- ----------------\n";} \
                  /./   {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                  END   {printf "---- -- ----------------\n%4d 99 TOTAL\n",t;} \
                ' \
            > .${file}.fmt
        end
      end
    end  
    dicio-wc .he{a,b}-{f,c}-{pref,midf,suff,unif}s-all.fmt

    foreach elem ( pref midf suff unif )
      set tfiles = ( )
      foreach guy ( f c )
        foreach lang ( a b )
          set file = "he${lang}-${guy}-${elem}s-all"
          set tfiles = ( ${tfiles} .${file}.fmt )
        end
      end
      pr -m -t -i' '1 -w 88  ${tfiles} \
        | expand \
        > .herbal-${elem}-cmp.txt
    end
    dicio-wc .herbal-{pref,midf,suff,unif}-cmp.txt

  Looking at the suffix frequencies, it seems that the main difference 
  between A and B is that the latter uses "d" instead of some letter
  that should be in the midfix.  If we eliminate the "-do" suffix
  and renormalize, we get the third column below:
  
    by Friedman           by Friedman           by Friedman          
    language A            language B            language B minus -do          
    freq pc suffix        freq pc suffix        freq pc suffix       
    ---- -- ------------- ---- -- ------------- ---- -- -------------
    2200 37 -o             642 26 -do            583 32 -o    
    1008 17 -ol            583 24 -o             254 14 -or          
     960 16 -or            254 10 -or            183 10 -ol          
     365  6 -oiin          183  8 -ol            145  8 -oiin        
     239  4 -odo           145  6 -oiin          116  6 -odo         
     127  2 -om            116  5 -odo            63  4 -            
     124  2 -               63  3 -               41  2 -om          
      85  1 -odoiin         41  2 -om             34  2 -doiin
    ---- -- ------------- ---- -- ------------- ---- -- -------------                                             
    5967 99 TOTAL         2431 99 TOTAL         1789 99 TOTAL                                                     
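
  The renormalization in the third column amounts to dividing each
  count by the total minus the dropped "-do" count. A minimal sketch
  (the function name is made up for this note, and its rounding need
  not agree with the "pc" column of the table):

```python
def renormalized_percent(count, total, dropped):
    """Percent share of `count` once `dropped` occurrences leave `total`."""
    return 100.0 * count / (total - dropped)

# Language B, Friedman: 2431 tokens in all, 642 of them with suffix "-do".
total_b, do_count = 2431, 642
print(total_b - do_count)                                   # new total
print(round(renormalized_percent(583, total_b, do_count), 1))  # share of "-o"
```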
  
  I still can't see much resemblance in the midfixes.  Moreover the B
  midfixes seem longer on the average.  So it is not a matter of
  moving some suffix letter to the midfix.
  
  Basically the B language does not use the ckh/cth gallows.
  Perhaps ckh = ked, or something of the sort?
  
    by Friedman           by Friedman           by Currier            by Currier
    language A            language B            language A            language B
    freq pc midfix        freq pc midfix        freq pc midfix        freq pc midfix
    ---- -- ------------- ---- -- ------------- ---- -- ------------- ---- -- -------------
    1595 27 -ee-           590 24 -k-           1472 28 -ee-           404 21 -k-
     913 15 -k-            279 12 -eee-          865 16 -k-            240 13 -ke-
     856 14 -kee-          274 11 -ke-           705 13 -kee-          219 12 -eee-
     459  8 -eke-          269 11 -ee-           385  7 -eke-          202 11 -ee-
     418  7 -eee-          261 11 -kee-          316  6 -eee-          187 10 -kee-
     155  3 -ke-            76  3 -eeeke-        128  2 -eeok-          62  3 -eeeke-
     152  3 -keee-          72  3 -keee-         128  2 -keee-          48  3 -eeee-
     132  2 -eeok-          60  3 -eeek-         101  2 -ke-            48  3 -eeek-
     110  2 -pee-           49  2 -pee-          100  2 -pee-           44  2 -keee-
      99  2 -eeee-          48  2 -eeee-          75  1 -epe-           34  2 -pee-
      93  2 -ekee-          48  2 -p-             72  1 -eeee-          34  2 -peee-
      81  1 -epe-           39  2 -peee-          69  1 -eeeke-         33  2 -eke-
      73  1 -eeeke-         33  1 -eke-           66  1 -eeokee-        32  2 -p-
      60  1 -p-             25  1 -ekee-          56  1 -ekee-          23  1 -ekee-
      57  1 -eek-           24  1 -eek-           55  1 -p-             18  1 -eek-
      55  1 -eeokee-        20  1 -eeekee-        52  1 -eek-           16  1 -eeeeke-

  Well, since the prefixes seem OK, let's compare the midfix+suffix together:
  
    foreach guy ( Friedman.f )
      foreach lang ( A.a B.b )
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print ($2 $3)}' \
          | sed -e 's/--//g' \
          | eva2erb \
          | sort | uniq -c | expand | sort +0 -1nr \
          > .he${lang:e}-${guy:e}-tails-all.frq

        foreach elem ( tail )
          set file = "he${lang:e}-${guy:e}-${elem}s-all"
          echo "${file}.frq -> ${file}.fmt"
          cat .${file}.frq \
            | compute-freqs \
            | gawk '\
                  BEGIN {\
                    printf "by '"${guy:r}"'\nlanguage '"${lang:r}"'\n"; \
                    printf "freq pc '"${elem}"'ix\n---- -- ----------------\n";} \
                  /./   {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                  END   {printf "---- -- ----------------\n%4d 99 TOTAL\n",t;} \
                ' \
            > .${file}.fmt
        end
      end
    end  
    dicio-wc .he{a,b}-{f}-{tail}s-all.fmt

    foreach elem ( tail )
      set tfiles = ( )
      foreach guy ( f )
        foreach lang ( a b )
          set file = "he${lang}-${guy}-${elem}s-all"
          set tfiles = ( ${tfiles} .${file}.fmt )
        end
      end
      pr -m -t -i' '1 -w 88  ${tfiles} \
        | expand \
        > .herbal-${elem}-cmp.txt
    end
    dicio-wc .herbal-{tail}-cmp.txt
  
     lines   words     bytes file        
    ------ ------- --------- ------------
       774    3722     41887 .herbal-tail-cmp.txt

    by Friedman              by Friedman
    language A               language B
    freq pc tailix           freq pc tailix
    ---- -- ---------------- ---- -- ----------------
     404  7 -keeo             153  6 -kor
     395  7 -eeo              150  6 -kedo
     370  6 -eeol             129  5 -keedo
     337  6 -eeor             117  5 -eeedo
     197  3 -ko               108  4 -koiin
     189  3 -kol               88  4 -kol
     176  3 -ekeo              87  4 -eedo
     166  3 -eeeo              71  3 -keeo
     162  3 -koiin             65  3 -ko
     146  2 -keeor             55  2 -eeekeo
     132  2 -keeol             51  2 -eeeo
     119  2 -kor               31  1 -keeedo
      99  2 -keeeo             31  1 -keo
      91  2 -eeeor             30  1 -eeor
      81  1 -ekeor             30  1 -kom
      80  1 -ekeol             29  1 -eeeko
      78  1 -eeoiin            28  1 -eeo
      64  1 -eeodo             28  1 -keeeo
      61  1 -eeeeo             27  1 -eeol
      60  1 -eeoko             24  1 -eeodo
      57  1 -ekeeo             23  1 -keodo
      48  1 -eeeol             23  1 -koin
      47  1 -eeekeo            22  1 -eeeor
      44  1 -keol              21  1 -eeeodo
      42  1 -eeodoiin          21  1 -ekeo
      40  1 -eeoekeo           20  1 -kodo
      39  1 -eeokeeo           19  1 -eeeeo
      39  1 -keo               19  1 -peeedo
      37  1 -keeodo            18  1 -keol
      33  1 -eeom              18  1 -peedo
      30  1 -peeo              17  1 -eeeol
      29  1 -kom               16  1 -eeeekeo
      28  1 -keor              13  1 -eeko
      27  1 -peeor             12  1 -eeeedo
      24  0 -ekeodo            12  1 -eeekeeo
      24  0 -epeo              12  1 -ekeeo
      24  0 -kodo              12  1 -koldo
      23  0 -keeoiin           11  1 -eedoiin
      22  0 -eeoro             11  1 -eeekedo
      21  0 -eekeeo            11  1 -koir
      20  0 -ekeoiin           10  0 -ekeedo
      19  0 -eee               10  0 -k
      19  0 -k                 10  0 -keeodo
      19  0 -koldo             10  0 -kolo
      18  0 -eeoin              9  0 -keor
      17  0 -eeeko              9  0 -peeo
      17  0 -eeer               8  0 -eeed
      17  0 -eeod               8  0 -eeekoiin
      17  0 -ekeom              8  0 -por
      17  0 -kod                7  0 -eedol
      16  0 -eeekeeo            7  0 -eeedoiin
      16  0 -eeko               7  0 -eeee
      16  0 -eeolo              7  0 -eeek
      16  0 -eeon               7  0 -eeer
      16  0 -keeod              7  0 -eeoko
      16  0 -peeol              7  0 -ekedo
      15  0 -eeeodo             7  0 -koro
      15  0 -eeokol             7  0 -peeeo
      15  0 -eer                6  0 -eeekol
      15  0 -keeeeo             6  0 -ked
      14  0 -eekoiin            6  0 -kedoiin
      14  0 -eeoeeo             6  0 -keedor
      14  0 -eeokoiin           6  0 -keeol
      14  0 -epeol              6  0 -poiin
      14  0 -keeom              5  0 -eed
      13  0 -eeoldo             5  0 -eee
      13  0 -ekeeeo             5  0 -eeekeedo
      12  0 -ee                 5  0 -eeekeeeo
      12  0 -eeeer              5  0 -eeekor
      12  0 -eeeoiin            5  0 -kedor
      12  0 -eeoo               5  0 -keeor
      11  0 -ekeeor             5  0 -keer
      11  0 -keeeor             5  0 -kodoiin
      11  0 -keodo              5  0 -koror
      11  0 -kodoiin            5  0 -peeedor
      11  0 -koin               4  0 -eedom
      10  0 -eedo               4  0 -eedor
      10  0 -eekor              4  0 -eeedor
      10  0 -eeok               4  0 -eeeekeeo
      10  0 -eeokeo             4  0 -eeeer
      10  0 -ekeeol             4  0 -eeeoekeo
      10  0 -epeor              4  0 -eeepeedo
      10  0 -kee                4  0 -eeepo
      10  0 -keeeol             4  0 -eekoiin
      10  0 -koo                4  0 -eer
      10  0 -peeeo              4  0 -keed
       9  0 -eeeeor             4  0 -keeeeo
       9  0 -eeokor             4  0 -keeod
       9  0 -ekeeodo            4  0 -peedoiin
       8  0 -eeeeol             4  0 -peeol
       8  0 -eeeodoiin          3  0 -eedolo
       8  0 -eeeom              3  0 -eeedol
       8  0 -eeodol             3  0 -eeeeko
       8  0 -eeodor             3  0 -eeeked
       8  0 -eeokeeeo           3  0 -eeeod
       8  0 -eeokeeol           3  0 -eekedo
    ---- -- ---------------- ---- -- ---------------- 
    5967 99 TOTAL            2431 99 TOTAL            
    
  Inspired by Landini's paper, let me prepare a graph 
  of A-freq × B-freq for each segment:
  
    foreach guy ( Friedman.f )
      foreach elem ( pref midf suff unif tail )
        set pfile = "herbal-${guy:e}-${elem}s-all"
        set afile = "hea-${guy:e}-${elem}s-all"
        set bfile = "heb-${guy:e}-${elem}s-all"
        echo "${afile}.frq, ${bfile}.frq -> ${pfile}.plt"
        # NB: join expects both inputs sorted on the join field (field 2).
        /n/gnu/bin/join \
            -a 1 -a 2 -e 0 \
            -j1 2 -j2 2 \
            -o1.1,2.1,0 \
            .${afile}.frq .${bfile}.frq \
          > .${pfile}.plt
        plot-lang-diffs ${guy:r} ${elem} ${pfile}.plt
      end
    end  
    dicio-wc .herbal-f-{pref,midf,suff,unif,tail}s-all.plt
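
  What the join step is meant to produce is a full outer join of the
  two frequency tables, with 0 filled in for items seen in only one
  language. A sketch in Python, assuming each .frq line is "count item"
  (as written by "uniq -c"); the function names are hypothetical:

```python
def parse_frq(text):
    """Parse 'count item' lines into a dict mapping item -> count."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[fields[1]] = int(fields[0])
    return table

def outer_join(a, b):
    """Return (A-count, B-count, item) triples for every item in A or B."""
    return [(a.get(k, 0), b.get(k, 0), k) for k in sorted(set(a) | set(b))]

# A few tail counts taken from the A and B columns above.
lang_a = parse_frq("    404 -keeo\n    119 -kor\n")
lang_b = parse_frq("    153 -kor\n    150 -kedo\n")
for row in outer_join(lang_a, lang_b):
    print(*row)
```

  Working from dictionaries avoids the requirement that both inputs be
  sorted on the join key.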

97-11-24 stolfi
===============

  I stole some text in pinyin from http://www-personal.umich.edu/~wbaxter/,
  cleaned it up a bit, and saved it to chin-mch.txt.
  
  This is a bad sample: in the first statistics I ran, "zhong1 guo2"
  (China) came out near the top. That's because half the sample is a
  Voice of America semi-political speech...
  
  So I removed all (but one) occurrences of "zhong1 guo2" from the sample.

  Let's run some statistics.  First, words overall:
  
    cat chin-mch.txt \
      | tr ' ' '\012' \
      | grep '.' \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      | compute-freqs \
      > .chin.frq

    count freqy word 
    ----- ----- -----------
      244 0.065 de
      118 0.031 shi4
       78 0.021 ren2
       62 0.016 you3
       61 0.016 ta1
       55 0.015 xue2
       54 0.014 wen2
       50 0.013 shi2
       50 0.013 zai4
       42 0.011 guo2
       41 0.011 yi2
       40 0.011 yi4
       37 0.010 le
       35 0.009 ge
       34 0.009 shuo1
       33 0.009 bu4
      ... ..... .....

  Now, without tones:
  
    cat chin-mch.txt \
      | tr ' ' '\012' \
      | tr -d '0-9' \
      | grep '.' \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      | compute-freqs \
      > .chin-notone.frq  
      
    count freqy word 
    ----- ----- -----------
      245 0.065 de
      223 0.059 shi
      129 0.034 yi
       93 0.025 ren
       71 0.019 you
       65 0.017 bu
       61 0.016 ta
       60 0.016 guo
       58 0.015 wen
       55 0.015 xue
       55 0.015 zi
       50 0.013 zai
       47 0.012 ji
       44 0.012 yu
       44 0.012 zhi
       43 0.011 ge
       40 0.011 mei

  Now for the initial consonant:
  
    cat chin-mch.txt \
      | tr ' ' '\012' \
      | tr -d '0-9' \
      | sed -e 's/[aeiouü].*$//g' \
      | grep '.' \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      | compute-freqs \
      > .chin-initial.frq   
      
    count freqy word 
    ----- ----- -----------
      473 0.126 d
      402 0.107 y
      364 0.097 sh
      215 0.057 j
      209 0.056 x
      198 0.053 h
      197 0.053 zh
      178 0.047 g
      173 0.046 l
      166 0.044 z
      157 0.042 b
      138 0.037 w
      130 0.035 r
      130 0.035 t
       91 0.024 f
       90 0.024 m
       89 0.024 n
       89 0.024 q
       75 0.020 k
       74 0.020 ch

  Now for the final (vowels plus terminators):
  
    cat chin-mch.txt \
      | tr ' ' '\012' \
      | tr -d '0-9' \
      | sed -e 's/^[^aeiouü]*//g' \
      | grep '.' \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      | compute-freqs \
      > .chin-final.frq  
      
    count freqy word 
    ----- ----- -----------
      654 0.173 i
      434 0.115 e
      311 0.082 u
      238 0.063 en
      179 0.047 ai
      168 0.045 uo
      145 0.038 ou
      130 0.034 a
      126 0.033 an
      123 0.033 ing
      122 0.032 ong
      118 0.031 ei
      113 0.030 ian
      109 0.029 eng
      102 0.027 ao
       98 0.026 ang
       73 0.019 ui
       67 0.018 ue
       59 0.016 iao
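
  The three pipelines above all rest on the same decomposition of a
  pinyin syllable: initial consonant(s), final, and trailing tone digit.
  A sketch of that decomposition follows; "split_syllable" is a made-up
  name, and it assumes the tone digits come only at the end of the
  syllable, as in this sample.

```python
import re

# initial = leading non-vowels, final = the rest up to the tone digits.
SYLLABLE_RE = re.compile(r"^([^aeiouü0-9]*)([^0-9]*)([0-9]*)$")

def split_syllable(syllable):
    """Return (initial, final, tone); any of the three may be empty."""
    match = SYLLABLE_RE.match(syllable)
    if match is None:
        raise ValueError(f"unexpected syllable shape: {syllable!r}")
    return match.groups()

for s in ["zhong1", "guo2", "shi4", "de"]:
    print(s, split_syllable(s))
```

  As in the sed commands, "y" and "w" count as initials here, so "yi2"
  splits into "y" + "i" + "2".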

  Changing subject again, I have been looking at the differences between 
  languages A and B, particularly the tail (midfix+suffix) distribution.
  They really look like different languages.  Even taking into account
  possible letter confusion, there seems to be no simple correspondence 
  between the tails of one and those of the other.
  
  Just to be sure, let's try to recompute the tail distributions after collapsing 
  everything that could be equivalent:
  
    t,k ---------> t
    p,f ---------> p
    r,s ---------> e
    ei ----------> o
    o,a,y -------> o
    ch,sh -------> ee
    cth,ckh -----> tee
    cph,cfh -----> pee
    iiii,iii,ii -> i
    
    foreach lang ( a b )
      cat Note-009/he${lang}-f.factored \
        | sed \
            -e 's/sh/ee/g'   \
            -e 's/ch/ee/g'   \
            -e 's/s/e/g'     \
            -e 's/r/e/g'     \
            -e 's/k/t/g'     \
            -e 's/f/p/g'     \
            -e 's/cth/tee/g' \
            -e 's/ckh/tee/g' \
            -e 's/cph/pee/g' \
            -e 's/cfh/pee/g' \
            -e 's/ei/o/g'    \
            -e 's/a/o/g'     \
            -e 's/y/o/g'     \
            -e 's/iiii/i/g'  \
            -e 's/iii/i/g'   \
            -e 's/ii/i/g'    \
        > .he${lang}-f-ere.factored

      cat .he${lang}-f-ere.factored \
        | grep -e '- -' \
        | gawk '/./ {print $2}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-ere-midfs-all.frq

      cat .he${lang}-f-ere.factored \
        | grep -e '- -' \
        | gawk '/./ {print $3}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-ere-suffs-all.frq

      cat .he${lang}-f-ere.factored \
        | grep -e '- -' \
        | gawk '/./ {print ($2 $3)}' \
        | sed -e 's/--//g' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-ere-tails-all.frq
    end
    dicio-wc .he{a,b}-f-ere-{midf,suff,tail}s-all.frq
  
     lines   words     bytes file        
    ------ ------- --------- ------------
       179     358      2964 .hea-f-ere-midfs-all.frq
       131     262      1815 .hea-f-ere-suffs-all.frq
       655    1310     10600 .hea-f-ere-tails-all.frq
       133     266      2169 .heb-f-ere-midfs-all.frq
        82     164      1118 .heb-f-ere-suffs-all.frq
       420     840      6716 .heb-f-ere-tails-all.frq
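
  Since the sed substitutions are applied one after the other, their
  order matters. The sketch below replays the same rule list in Python
  on a few sample words (the names "COLLAPSE_RULES" and "collapse" are
  made up for this note). Note that by the time the "ckh" and "cfh"
  rules are reached, "k" and "f" have already become "t" and "p", so it
  is the "cth" and "cph" rules that catch those gallows.

```python
# Same rules, in the same order, as the sed command above.
COLLAPSE_RULES = [
    ("sh", "ee"), ("ch", "ee"), ("s", "e"), ("r", "e"),
    ("k", "t"), ("f", "p"),
    ("cth", "tee"), ("ckh", "tee"), ("cph", "pee"), ("cfh", "pee"),
    ("ei", "o"), ("a", "o"), ("y", "o"),
    ("iiii", "i"), ("iii", "i"), ("ii", "i"),
]

def collapse(word):
    """Apply every rule in turn; each one replaces all occurrences."""
    for old, new in COLLAPSE_RULES:
        word = word.replace(old, new)
    return word

for w in ["ckhy", "chor", "daiin", "qokeedy"]:
    print(w, "->", collapse(w))
```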
  
  
    foreach elem ( midf suff tail )
      foreach lang ( A.a B.b )
        set file = "he${lang:e}-f-ere-${elem}s-all"
        echo "${file}.frq -> ${file}.fmt"
        cat .${file}.frq \
          | compute-freqs \
          | gawk '\
                BEGIN {\
                  printf "by Friedman\nlanguage '"${lang:r}"'\n"; \
                  printf "freq pc '"${elem}"'ix\n---- -- ------------------\n";} \
                /./   {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                END   {printf "---- -- ------------------\n%4d 99 TOTAL\n",t;} \
              ' \
          > .${file}.fmt
      end
    end

    foreach elem ( midf suff tail )
      set tfiles = ( )
      foreach lang ( a b )
        set file = "he${lang}-f-ere-${elem}s-all"
        set tfiles = ( ${tfiles} .${file}.fmt )
      end
      pr -m -t -i' '1 -w 54  ${tfiles} \
        | expand \
        > .herbal-f-ere-${elem}-cmp.txt
    end
    dicio-wc .herbal-f-ere-{midf,suff,tail}-cmp.txt
    
  Here are the results:

    by Friedman                by Friedman
    language A                 language B
    freq pc midfix             freq pc midfix
    ---- -- ------------------ ---- -- ------------------
    1595 27 -ee-                590 24 -t-
    1313 22 -tee-               293 12 -tee-
     913 15 -t-                 279 12 -eee-
     419  7 -eee-               274 11 -te-
     241  4 -teee-              269 11 -ee-
     191  3 -pee-                95  4 -teee-
     155  3 -te-                 66  3 -eetee-
     132  2 -eeot-               60  3 -eeet-
     100  2 -eeee-               52  2 -pee-
     100  2 -eeotee-             49  2 -eeee-
      99  2 -eetee-              48  2 -p-
      60  1 -p-                  45  2 -peee-
      57  1 -eet-                25  1 -eeetee-
      46  1 -peee-               24  1 -eet-
      40  1 -teeee-              15  1 -eeete-
      24  0 -eeet-               14  1 -eeteee-
    .... .. .......            .... .. .....
    ---- -- ------------------ ---- -- ------------------
    5967 99 TOTAL              2431 99 TOTAL   
    
  Tails: 

    by Friedman                by Friedman
    language A                 language B
    freq pc tailix             freq pc tailix
    ---- -- ------------------ ---- -- ------------------
     579 10 -teeo               153  6 -toe
     395  7 -eeo                150  6 -tedo
     370  6 -eeol               135  6 -teedo
     337  6 -eeoe               131  5 -toin
     226  4 -teeoe              118  5 -eeedo
     212  4 -teeol               92  4 -teeo
     197  3 -to                  88  4 -tol
     189  3 -tol                 87  4 -eedo
     178  3 -toin                65  3 -to
     167  3 -eeeo                52  2 -eeteeo
     156  3 -teeeo               51  2 -eeeo
     119  2 -toe                 41  2 -teeedo
      96  2 -eeoin               39  2 -teeeo
      91  2 -eeeoe               31  1 -teo
    
  Hmm, it seems that scribe A does not use "d" in the suffixes
  very much.  Perhaps if we delete "d" we will get a better resemblance:
  
    foreach lang ( a b )
      cat Note-009/he${lang}-f.factored \
        | sed \
            -e 's/d//g'      \
            -e 's/sh/ee/g'   \
            -e 's/ch/ee/g'   \
            -e 's/s/e/g'     \
            -e 's/r/e/g'     \
            -e 's/k/t/g'     \
            -e 's/f/p/g'     \
            -e 's/cth/tee/g' \
            -e 's/ckh/tee/g' \
            -e 's/cph/pee/g' \
            -e 's/cfh/pee/g' \
            -e 's/ei/o/g'    \
            -e 's/a/o/g'     \
            -e 's/y/o/g'     \
            -e 's/iiii/i/g'  \
            -e 's/iii/i/g'   \
            -e 's/ii/i/g'    \
        > .he${lang}-f-erf.factored
    end

    foreach lang ( a b )
      cat .he${lang}-f-erf.factored \
        | grep -e '- -' \
        | gawk '/./ {print $2}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erf-midfs-all.frq

      cat .he${lang}-f-erf.factored \
        | grep -e '- -' \
        | gawk '/./ {print $3}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erf-suffs-all.frq

      cat .he${lang}-f-erf.factored \
        | grep -e '- -' \
        | gawk '/./ {print ($2 $3)}' \
        | sed -e 's/--//g' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erf-tails-all.frq
    end
    dicio-wc .he{a,b}-f-erf-{midf,suff,tail}s-all.frq
  
     lines   words     bytes file        
    ------ ------- --------- ------------
       162     324      2666 .hea-f-erf-midfs-all.frq
        85     170      1159 .hea-f-erf-suffs-all.frq
       535    1070      8572 .hea-f-erf-tails-all.frq
       125     250      2028 .heb-f-erf-midfs-all.frq
        54     108       722 .heb-f-erf-suffs-all.frq
       329     658      5186 .heb-f-erf-tails-all.frq
  
    foreach elem ( midf suff tail )
      foreach lang ( A.a B.b )
        set file = "he${lang:e}-f-erf-${elem}s-all"
        echo "${file}.frq -> ${file}.fmt"
        cat .${file}.frq \
          | compute-freqs \
          | gawk '\
                BEGIN {\
                  printf "by Friedman\nlanguage '"${lang:r}"'\n"; \
                  printf "freq pc '"${elem}"'ix\n---- -- ------------------\n";} \
                /./   {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                END   {printf "---- -- ------------------\n%4d 99 TOTAL\n",t;} \
              ' \
          > .${file}.fmt
      end
    end

    foreach elem ( midf suff tail )
      set tfiles = ( )
      foreach lang ( a b )
        set file = "he${lang}-f-erf-${elem}s-all"
        set tfiles = ( ${tfiles} .${file}.fmt )
      end
      pr -m -t -i' '1 -w 54  ${tfiles} \
        | expand \
        > .herbal-f-erf-${elem}-cmp.txt
    end
    dicio-wc .herbal-f-erf-{midf,suff,tail}-cmp.txt
  
     lines   words     bytes file        
    ------ ------- --------- ------------
       168     893      6707 .herbal-f-erf-midf-cmp.txt
        91     449      3316 .herbal-f-erf-suff-cmp.txt
       541    2624     20105 .herbal-f-erf-tail-cmp.txt

    by Friedman                by Friedman
    language A                 language B
    freq pc tailix             freq pc tailix
    ---- -- ------------------ ---- -- ------------------
     611 10 -teeo               231 10 -teeo
     422  7 -eeo                185  8 -teo
     374  6 -eeol               172  7 -eeeo
     338  6 -eeoe               153  6 -toe
     228  4 -teeoe              131  5 -toin
     218  4 -teeol              116  5 -eeo
     216  4 -to                  90  4 -tol
     195  3 -tol                 84  4 -teeeo
     180  3 -toin                69  3 -to
     172  3 -eeeo                59  2 -eeteeo
     161  3 -teeeo               36  2 -eeeeo
     120  2 -toe                 34  1 -eeoe
      98  2 -eeoin               34  1 -eeol
      91  2 -eeeoe               30  1 -peeeo
      79  1 -eeoteeo             30  1 -tom
      76  1 -eeoo                29  1 -eeeto
      70  1 -eeteeo              29  1 -eeoo
      69  1 -teeoo               29  1 -peeo
      64  1 -eeeeo               26  1 -eeeoe
      62  1 -eeoto               26  1 -teoo

  Good news, at least we got the top entry to match.
  Now what else can we do?  We could map "teeoe" and
  "teeol" to "teo", but that seems a bit ad-hoc...
  
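  Let's try again.  Let's compare the frequencies of 
  "k" and "t", "sh" and "ch" in each language:
  
  (Note that the 's/ch/C/' and 's/sh/S/' substitutions below have no
  "g" flag, so only the first occurrence on each line is replaced.
  This presumably explains the residual c-h and s-h pairs in the
  tables.)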
  
    foreach lang ( a b )
      cat Note-009/he${lang}-f.factored \
        | sed \
            -e 's/[- .:]//g' \
            -e 's/ch/C/' \
            -e 's/sh/S/' \
            -e 's/$/\./' \
        | count-digraph-freqs \
            -v pad="." \
            -v showentropy=0 \
            -v chars=".CSaoeilmnrchtpkfsqjdvxyg"
    end
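
  For readers without the count-digraph-freqs script, the tabulation
  can be sketched as below. This is an assumption about what the script
  does (count adjacent symbol pairs, with "." as the word boundary), not
  its actual code, and the function names are made up. Here every "ch"
  and "sh" is replaced, not just the first one per line.

```python
from collections import Counter

def tokenize(word):
    """Fold the EVA pairs 'ch' and 'sh' into single symbols C and S."""
    return word.replace("ch", "C").replace("sh", "S")

def digraph_counts(words, pad="."):
    """Count adjacent symbol pairs, padding each word with `pad`."""
    counts = Counter()
    for word in words:
        padded = pad + tokenize(word) + pad
        for first, second in zip(padded, padded[1:]):
            counts[(first, second)] += 1
    return counts

# Made-up sample; real input would be the words of one language.
pairs = digraph_counts(["chol", "shol", "dy"])
for (first, second), n in sorted(pairs.items()):
    print(first, second, n)
```

  Dividing each count by the total of its row gives the next-symbol
  probabilities.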
    
  Language A:

    Digraph counts:

           TT     .     C     S     a     o     e     i     l     m     n     r     c     h     t     p     k     f     s     q     d     y
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      n  1376  1341     1     .     1     7     .     .     3     2     .     1     1     .     1     .     .     .     2     .     8     8
      m   265   261     .     .     1     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      r  1569  1302    32     6    59    59     1     5     4     3     .     2     4     .     .     .     1     .     2     .    16    73
      l  1720  1310    35    14    11    88     7     .     .     3     .     2     6     .     6     1     7     1    40     1   120    68
      y  3189  2543    91    16     5    16     3     .     3     1     .     2    10     .   183    21   186     2    16     2    89     .
      s   669   316    51     5   104   107    16     .     .     2     .     .     4    12     .     1     3     .     1     .     .    47
      d  2234   160   145    51  1109   175    24     .    14     5     .     2    10     .     3     3     5     .    11     .     7   510
      k  1650    24   377    75   223   252   258     .     .     1     .     .    43   226     .     .     .     .     3     .     2   166
      t  1790    17   423    57   161   273   124     .     1     .     .     .    28   522     1     1     .     .     4     1     5   172
      p   324     7   117    11     9    35     .     .     .     .     .     .    16   101     1     .     .     .     .     .     3    24
      f   106     9    28     5     6    14     .     .     .     .     .     .     2    30     .     .     .     .     .     .     4     8
      .  7812     .  1507   745    79  1145    26    12    41    16     3    57   615     .   267    95   352    33   356   708  1266   489
      c  1001     .     .     .     .     .     .     .     .     .     .     .     .   122   522   101   226    30     .     .     .     .
      o  5711   410    59    24    74    18    60   101  1325    91     7   993   141     .   726    83   742    35   117     4   641    60
      a  2318    43     4     .     1     4     .  1305   311   131    54   412     3     .     4     2     7     2    10     .    19     6
      i  2601     2     1     2     .     1     3  1173     4     6  1300    83     3     .     1     .    14     .     3     .     5     .
      e  1958    32    11     1   118   529   475     1     1     3    12     4    12     .    34    11    37     1    79     .    10   587
      h  1013    10     4     1    93   335   177     1     2     .     .     2     .     .     1     .     1     .     9     .     4   373
      S  1016    15     5     .    47   525   233     .     3     .     .     .    21     .     6     .    13     1     4     .     6   137
      C  2892    10     .     3   217  1427   549     3     7     1     .     9    80     .    32     5    51     1    12     .    28   457
      q   716     .     1     .     .   698     2     .     1     .     .     .     2     .     2     .     5     .     .     .     1     4
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 41930  7812  2892  1016  2318  5711  1958  2601  1720   265  1376  1569  1001  1013  1790   324  1650   106   669   716  2234  3189

    Next-symbol probability (× 99):

        TT  .  C  S  a  o  e  i  l  m  n  r  c  h  t  p  k  f  s  q  d  y
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 99  . 19  9  1 15  .  .  1  .  .  1  8  .  3  1  4  .  5  9 16  6
      C 99  .  .  .  7 49 19  .  .  .  .  .  3  .  1  .  2  .  .  .  1 16
      S 99  1  .  .  5 51 23  .  .  .  .  .  2  .  1  .  1  .  .  .  1 13
      a 99  2  .  .  .  .  . 56 13  6  2 18  .  .  .  .  .  .  .  .  1  .
      o 99  7  1  .  1  .  1  2 23  2  . 17  2  . 13  1 13  1  2  . 11  1
      e 99  2  1  .  6 27 24  .  .  .  1  .  1  .  2  1  2  .  4  .  1 30
      i 99  .  .  .  .  .  . 45  .  . 49  3  .  .  .  .  1  .  .  .  .  .
      l 99 75  2  1  1  5  .  .  .  .  .  .  .  .  .  .  .  .  2  .  7  4
      m 99 98  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      n 99 96  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  1
      r 99 82  2  .  4  4  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  5
      c 99  .  .  .  .  .  .  .  .  .  .  .  . 12 52 10 22  3  .  .  .  .
      h 99  1  .  .  9 33 17  .  .  .  .  .  .  .  .  .  .  .  1  .  . 36
      t 99  1 23  3  9 15  7  .  .  .  .  .  2 29  .  .  .  .  .  .  . 10
      p 99  2 36  3  3 11  .  .  .  .  .  .  5 31  .  .  .  .  .  .  1  7
      k 99  1 23  5 13 15 15  .  .  .  .  .  3 14  .  .  .  .  .  .  . 10
      f 99  8 26  5  6 13  .  .  .  .  .  .  2 28  .  .  .  .  .  .  4  7
      s 99 47  8  1 15 16  2  .  .  .  .  .  1  2  .  .  .  .  .  .  .  7
      q 99  .  .  .  . 97  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  1
      d 99  7  6  2 49  8  1  .  1  .  .  .  .  .  .  .  .  .  .  .  . 23
      y 99 79  3  .  .  .  .  .  .  .  .  .  .  .  6  1  6  .  .  .  3  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 18  7  2  5 13  5  6  4  1  3  4  2  2  4  1  4  0  2  2  5  8

    Previous-symbol probability (× 99):

        TT  .  C  S  a  o  e  i  l  m  n  r  c  h  t  p  k  f  s  q  d  y
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 18  . 52 73  3 20  1  .  2  6  .  4 61  . 15 29 21 31 53 98 56 15
      C  7  .  .  .  9 25 28  .  .  .  .  1  8  .  2  2  3  1  2  .  1 14
      S  2  .  .  .  2  9 12  .  .  .  .  .  2  .  .  .  1  1  1  .  .  4
      a  5  1  .  .  .  .  . 50 18 49  4 26  .  .  .  1  .  2  1  .  1  .
      o 13  5  2  2  3  .  3  4 76 34  1 63 14  . 40 25 45 33 17  1 28  2
      e  5  .  .  .  5  9 24  .  .  1  1  .  1  .  2  3  2  1 12  .  . 18
      i  6  .  .  .  .  .  . 45  .  2 94  5  .  .  .  .  1  .  .  .  .  .
      l  4 17  1  1  .  2  .  .  .  1  .  .  1  .  .  .  .  1  6  .  5  2
      m  1  3  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      n  3 17  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .
      r  4 17  1  1  3  1  .  .  .  1  .  .  .  .  .  .  .  .  .  .  1  2
      c  2  .  .  .  .  .  .  .  .  .  .  .  . 12 29 31 14 28  .  .  .  .
      h  2  .  .  .  4  6  9  .  .  .  .  .  .  .  .  .  .  .  1  .  . 12
      t  4  . 14  6  7  5  6  .  .  .  .  .  3 51  .  .  .  .  1  .  .  5
      p  1  .  4  1  .  1  .  .  .  .  .  .  2 10  .  .  .  .  .  .  .  1
      k  4  . 13  7 10  4 13  .  .  .  .  .  4 22  .  .  .  .  .  .  .  5
      f  0  .  1  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .
      s  2  4  2  .  4  2  1  .  .  1  .  .  .  1  .  .  .  .  .  .  .  1
      q  2  .  .  .  . 12  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      d  5  2  5  5 47  3  1  .  1  2  .  .  1  .  .  1  .  .  2  .  . 16
      y  8 32  3  2  .  .  .  .  .  .  .  .  1  . 10  6 11  2  2  .  4  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

  Language B:

    Digraph counts:

           TT     .     C     S     a     o     e     i     l     m     n     r     c     h     t     p     k     f     s     q     d     x     y
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      n   513   500     1     .     .     1     .     .     .     .     .     4     .     .     .     .     .     .     1     1     1     .     4
      y  1754  1473    18     8     1     6     2     .     5     4     .     2     1     .    75    17   110     8     4     .    20     .     .
      m   113   107     .     .     2     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     3     .     .
      r   670   532     6     2    77    16     2     4     1     .     .     .     1     .     .     .     .     .     .     .     8     .    21
      l   612   315    35    15    36    34     2     .     1     .     .     4     3     .     8     .    55     4    10     1    52     .    37
      s   191    94     6     .    60     7     5     .     1     .     .     .     .     6     .     2     1     .     1     .     2     .     6
      d  1477    71    19    13   421    38    12     1     6     .     .     1     2     .     .     1     2     .     2     .     .     .   888
      f    86     5    33     1    20     7     2     .     .     .     .     .     2     9     .     .     .     .     1     .     1     .     5
      p   142     5    65     9    22    12     1     .     .     .     .     .     5    12     .     .     .     .     .     .     4     .     7
      .  3223     .   540   256   171   760    14     7    49     5     .    21    42     .   137    53   163    21    75   330   341     2   236
      x     4     1     .     .     .     3     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .
      c   219     .     .     .     .     .     .     .     .     .     .     .     .    23    63    12   112     9     .     .     .     .     .
      o  1695    51     5     2    16     8    21     4   297     3     1   174    35     .   216    35   517    28    40     .   230     1    11
      a  1368    17     .     1     .     1     .   569   245    99     4   398     4     .     1     1    10     1     7     .     8     .     2
      i  1051     .     .     .     .     2     .   464     1     .   508    64     3     .     .     1     6     .     .     .     2     .     .
      k  1106    20    94    21   374    49   330     .     1     .     .     .     4   112     .     .     .     .     4     .     3     .    94
      t   530     5    73    18   128    53   145     .     .     .     .     .     5    63     .     .     .     .     .     .     .     .    40
      h   225     3     1     1     5    10    65     1     .     .     .     .     1     .     .     .     .     1     .     .    26     1   110
      S   350     2     .     .     5    50   206     .     .     .     .     .    12     .     .     .     6     .     3     .    44     .    22
      C   909     2     .     1    19    93   406     1     2     1     .     1    71     .     9     3    25     1     6     .   212     .    56
      e  1497    20    13     2    11   219   279     .     3     .     .     1    27     .    21    17    99    13    37     .   520     .   215
      q   332     .     .     .     .   326     5     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT 18067  3223   909   350  1368  1695  1497  1051   612   113   513   670   219   225   530   142  1106    86   191   332  1477     4  1754

    Next-symbol probability (× 99):

        TT  .  C  S  a  o  e  i  l  m  n  r  c  h  t  p  k  f  s  q  d  x  y
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 99  . 17  8  5 23  .  .  2  .  .  1  1  .  4  2  5  1  2 10 10  .  7
      C 99  .  .  .  2 10 44  .  .  .  .  .  8  .  1  .  3  .  1  . 23  .  6
      S 99  1  .  .  1 14 58  .  .  .  .  .  3  .  .  .  2  .  1  . 12  .  6
      a 99  1  .  .  .  .  . 41 18  7  . 29  .  .  .  .  1  .  1  .  1  .  .
      o 99  3  .  .  1  .  1  . 17  .  . 10  2  . 13  2 30  2  2  . 13  .  1
      e 99  1  1  .  1 14 18  .  .  .  .  .  2  .  1  1  7  1  2  . 34  . 14
      i 99  .  .  .  .  .  . 44  .  . 48  6  .  .  .  .  1  .  .  .  .  .  .
      l 99 51  6  2  6  6  .  .  .  .  .  1  .  .  1  .  9  1  2  .  8  .  6
      m 99 94  .  .  2  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  3  .  .
      n 99 96  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  1
      r 99 79  1  . 11  2  .  1  .  .  .  .  .  .  .  .  .  .  .  .  1  .  3
      c 99  .  .  .  .  .  .  .  .  .  .  .  . 10 28  5 51  4  .  .  .  .  .
      h 99  1  .  .  2  4 29  .  .  .  .  .  .  .  .  .  .  .  .  . 11  . 48
      t 99  1 14  3 24 10 27  .  .  .  .  .  1 12  .  .  .  .  .  .  .  .  7
      p 99  3 45  6 15  8  1  .  .  .  .  .  3  8  .  .  .  .  .  .  3  .  5
      k 99  2  8  2 33  4 30  .  .  .  .  .  . 10  .  .  .  .  .  .  .  .  8
      f 99  6 38  1 23  8  2  .  .  .  .  .  2 10  .  .  .  .  1  .  1  .  6
      s 99 49  3  . 31  4  3  .  1  .  .  .  .  3  .  1  1  .  1  .  1  .  3
      q 99  .  .  .  . 97  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      d 99  5  1  1 28  3  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 60
      x 99 25  .  .  . 74  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      y 99 83  1  .  .  .  .  .  .  .  .  .  .  .  4  1  6  .  .  .  1  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 18  5  2  7  9  8  6  3  1  3  4  1  1  3  1  6  0  1  2  8  0 10

    Previous-symbol probability (× 99):

        TT  .  C  S  a  o  e  i  l  m  n  r  c  h  t  p  k  f  s  q  d  x  y
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 18  . 59 72 12 44  1  1  8  4  .  3 19  . 26 37 15 24 39 98 23 50 13
      C  5  .  .  .  1  5 27  .  .  1  .  . 32  .  2  2  2  1  3  . 14  .  3
      S  2  .  .  .  .  3 14  .  .  .  .  .  5  .  .  .  1  .  2  .  3  .  1
      a  7  1  .  .  .  .  . 54 40 87  1 59  2  .  .  1  1  1  4  .  1  .  .
      o  9  2  1  1  1  .  1  . 48  3  . 26 16  . 40 24 46 32 21  . 15 25  1
      e  8  1  1  1  1 13 18  .  .  .  .  . 12  .  4 12  9 15 19  . 35  . 12
      i  6  .  .  .  .  .  . 44  .  . 98  9  1  .  .  1  1  .  .  .  .  .  .
      l  3 10  4  4  3  2  .  .  .  .  .  1  1  .  1  .  5  5  5  .  3  .  2
      m  1  3  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .  .  .
      n  3 15  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  1  .  .  .  .
      r  4 16  1  1  6  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  1
      c  1  .  .  .  .  .  .  .  .  .  .  .  . 10 12  8 10 10  .  .  .  .  .
      h  1  .  .  .  .  1  4  .  .  .  .  .  .  .  .  .  .  1  .  .  2 25  6
      t  3  .  8  5  9  3 10  .  .  .  .  .  2 28  .  .  .  .  .  .  .  .  2
      p  1  .  7  3  2  1  .  .  .  .  .  .  2  5  .  .  .  .  .  .  .  .  .
      k  6  1 10  6 27  3 22  .  .  .  .  .  2 49  .  .  .  .  2  .  .  .  5
      f  0  .  4  .  1  .  .  .  .  .  .  .  1  4  .  .  .  .  1  .  .  .  .
      s  1  3  1  .  4  .  .  .  .  .  .  .  .  3  .  1  .  .  1  .  .  .  .
      q  2  .  .  .  . 19  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      d  8  2  2  4 30  2  1  .  1  .  .  .  1  .  .  1  .  .  1  .  .  . 50
      x  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
      y 10 45  2  2  .  .  .  .  1  4  .  .  .  . 14 12 10  9  2  .  1  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
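
  For reference, a minimal Python sketch of how tables like the ones
  above can be computed.  This is not the "count-digraph-freqs" script
  itself: the function names are made up, and it assumes one word per
  line, a "." pad on both sides of each word, and round-to-nearest
  scaling by 99.  Unlike the "sed" command above, it replaces every
  "ch" and "sh" in a word, not just the first one.

```python
from collections import Counter

def digraph_counts(words, pad="."):
    """Count adjacent symbol pairs, with each word padded by `pad` on both sides."""
    counts = Counter()
    for word in words:
        # Assumption: the digraphs "ch" and "sh" are treated as single symbols C and S.
        symbols = pad + word.replace("ch", "C").replace("sh", "S") + pad
        for first, second in zip(symbols, symbols[1:]):
            counts[(first, second)] += 1
    return counts

def next_symbol_scaled(counts, scale=99):
    """Scale P(second | first) to the range 0..scale, rounding to nearest."""
    row_totals = Counter()
    for (first, _), n in counts.items():
        row_totals[first] += n
    return {
        (first, second): int(scale * n / row_totals[first] + 0.5)
        for (first, second), n in counts.items()
    }

counts = digraph_counts(["chol", "shol", "daiin"])
table = next_symbol_scaled(counts)
print(counts[("o", "l")], table[("o", "l")], table[(".", "C")])
```

  The previous-symbol table is the same computation with the roles of
  "first" and "second" swapped (column totals instead of row totals).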

  
  The total counts of "t" and "k", and of "sh" (S) and "ch" (C),
  with their ratios, are as follows:

    Language A:  t = 1790 k = 1650  ratio t/k = 1.085
    Language B:  t =  530 k = 1106  ratio t/k = 0.479

    Language A:  S = 1016 C = 2892  ratio S/C = 0.351
    Language B:  S =  350 C =  909  ratio S/C = 0.385
    
  The t/k ratio differs by more than a factor of two between the
  languages, so it seems we must collapse t and k; otherwise it will
  be very hard to find a correspondence between the two languages.
  
  We could keep ch and sh distinct, but their next-symbol
  probabilities are so similar that it seems silly to distinguish
  them.
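
  The ratios can be rechecked directly from the TOT rows of the
  digraph tables.  A small Python sketch (the dictionary layout and
  the "ratio" helper are only for illustration):

```python
# Symbol totals copied from the TOT rows of the digraph tables.
totals = {
    "A": {"t": 1790, "k": 1650, "S": 1016, "C": 2892},
    "B": {"t": 530, "k": 1106, "S": 350, "C": 909},
}

def ratio(lang, num, den):
    """Ratio of two symbol totals in one language, to three decimals."""
    return round(totals[lang][num] / totals[lang][den], 3)

for lang in ("A", "B"):
    print(lang, "t/k =", ratio(lang, "t", "k"), "S/C =", ratio(lang, "S", "C"))
```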

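  Just to be sure, let's compare the contexts of sh and ch in the two
  languages.  In each word, the first occurrence of the digraph of
  interest is marked "@" and the first occurrence of the other one "~":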
  
    foreach lang ( a b )
      foreach f ( sh.ch ch.sh )
        cat Note-009/he${lang}-f.factored \
          | sed \
              -e 's/[- .:]//g' \
              -e 's/k/t/' \
              -e 's/p/f/' \
              -e 's/ckh/K/' \
              -e 's/cph/P/' \
              -e 's/'"${f:r}"'/@/' \
              -e 's/'"${f:e}"'/~/' \
          | grep '@' \
          | sort | uniq -c | expand \
          | sort +0 -1nr \
          | compute-freqs \
          > .tmp-he${lang}-${f:r}.frq
      end
    end
    dicio-wc .tmp-he{a,b}-{sh,ch}.frq
    
     lines   words     bytes file        
    ------ ------- --------- ------------
       322     966      6438 .tmp-hea-sh.frq
       860    2580     17698 .tmp-hea-ch.frq
       173     519      3466 .tmp-heb-sh.frq
       389    1167      7964 .tmp-heb-ch.frq

    Language A                          Language B                       
   ----------------------------------  ----------------------------------     
    contexts of sh    contexts of ch    contexts of sh   contexts of ch  
   ----------------- ----------------  ---------------- -----------------         
    98 0.096 @ol     221 0.076 @ol      35 0.100 @edy    60 0.066 @edy   
    92 0.091 @o      150 0.052 @or      16 0.046 @dy     51 0.056 @dy    
    63 0.062 @or      95 0.033 @y       11 0.031 @ol     43 0.047 @cthy  
    61 0.060 @y       88 0.030 qot@y    10 0.029 @ey     22 0.024 @ety   
    39 0.038 @ey      55 0.019 @ey      10 0.029 @y      22 0.024 t@dy   
    23 0.023 @ody     53 0.018 t@y       9 0.026 @ody    20 0.022 qot@dy 
    19 0.019 @eey     44 0.015 ot@y      8 0.023 @eedy   15 0.017 @ey    
    15 0.015 @eol     37 0.013 @oty      8 0.023 @eody   13 0.014 @ol    
    14 0.014 @aiin    36 0.012 t@or      7 0.020 @eo     12 0.013 @ecthy 
    14 0.014 @e       34 0.012 @aiin     6 0.017 @ety    12 0.013 @ody   
    14 0.014 @odaiin  33 0.011 @eor      6 0.017 @or     12 0.013 ot@dy  
    12 0.012 @eor     32 0.011 ot@ol     5 0.014 @ed     11 0.012 @eody  
    11 0.011 t@o      31 0.011 @ody      5 0.014 @eey    10 0.011 @y     
    10 0.010 @eo      30 0.010 t@ol      5 0.014 @eol     9 0.010 @ty    
    10 0.010 @octhy   30 0.010 yt@y      5 0.014 d@edy    9 0.010 t@edy  
    10 0.010 ot@y     29 0.010 @cthy     5 0.014 t@dy     7 0.008 @daiin 
     9 0.009 @cthy    29 0.010 @o        4 0.011 @cthey   7 0.008 @eol   
 
  Obviously "sh" and "ch" are very different.
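
  A minimal Python sketch of the context tabulation above, assuming
  one word per line and that "compute-freqs" divides each count by the
  total count.  The name "context_freqs" is made up, and the k/t and
  p/f merging steps are left out for brevity:

```python
from collections import Counter

def context_freqs(words, target, other):
    """Tabulate the words containing `target`, with its first occurrence
    marked "@" and the first occurrence of `other` marked "~"."""
    contexts = Counter()
    for word in words:
        marked = word.replace(target, "@", 1).replace(other, "~", 1)
        if "@" in marked:
            contexts[marked] += 1
    total = sum(contexts.values())
    # Most frequent contexts first; ties broken alphabetically.
    ranked = sorted(contexts.items(), key=lambda item: (-item[1], item[0]))
    return [(n, n / total, context) for context, n in ranked]

sample = ["shol", "shol", "chol", "qotshy", "shor"]
for n, frac, context in context_freqs(sample, "sh", "ch"):
    print(f"{n:4d} {frac:.3f} {context}")
```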
  
  Just to make double sure, we can play the same game with t and k:
  
    foreach lang ( a b )
      foreach f ( t.k k.t )
        cat Note-009/he${lang}-f.factored \
          | sed \
              -e 's/[- .:]//g' \
              -e 's/p/f/' \
              -e 's/'"${f:r}"'/@/' \
              -e 's/'"${f:e}"'/~/' \
          | grep '@' \
          | sort | uniq -c | expand \
          | sort +0 -1nr \
          | compute-freqs \
          > .tmp-he${lang}-${f:r}.frq
      end
    end
    dicio-wc .tmp-he{a,b}-{t,k}.frq
    pr -m -t -i' '1 -w 104 .tmp-he{a,b}-{t,k}.frq \
      | expand \
      > .tmp-t-k-cmp.txt

     lines   words     bytes file        
    ------ ------- --------- ------------
       642    1926     13627 .tmp-hea-t.frq
       683    2049     14438 .tmp-hea-k.frq
       271     813      5693 .tmp-heb-t.frq
       437    1311      9223 .tmp-heb-k.frq


     Language A                                          Language B                                
    ------------------------------------------------    ---------------------------------------------   
     contexts of t             contexts of k             contexts of t             contexts of k   
    ---------------------     ----------------------    --------------------      -------------------       
     96 0.054 c@hy             39 0.024 qo@chy           18 0.034 o@edy            41 0.037 qo@edy
     51 0.029 c@hol            33 0.020 o@y              16 0.030 o@ar             35 0.032 chc@hy
     49 0.028 qo@chy           29 0.018 @chy             14 0.027 o@al             35 0.032 o@aiin
     42 0.024 c@hor            28 0.017 c@hy             13 0.025 o@aiin           29 0.026 o@ar
     38 0.021 o@y              27 0.017 o@aiin           12 0.023 o@y              27 0.025 qo@ar
     34 0.019 c@hey            25 0.015 qo@y             12 0.023 qo@edy           25 0.023 o@edy
     29 0.016 o@chy            22 0.014 qo@ol            11 0.021 y@edy            23 0.021 o@al
     28 0.016 o@ol             21 0.013 o@ol             10 0.019 @edy             20 0.018 qo@aiin
     28 0.016 qo@y             20 0.012 @chor             9 0.017 @ar              19 0.017 che@y
     27 0.015 o@aiin           19 0.012 @chol             8 0.015 chc@hy           18 0.016 @ar
     24 0.014 @chy             18 0.011 @aiin             7 0.013 @chdy            17 0.015 o@y
     24 0.014 o@chol           18 0.011 @ol               7 0.013 o@am             16 0.015 o@eedy
     20 0.011 c@ho             18 0.011 y@chy             7 0.013 o@chdy           15 0.014 @chdy
     20 0.011 cho@y            17 0.010 cho@y             7 0.013 o@eol            15 0.014 qo@chdy
     18 0.010 qo@ol            16 0.010 qo@aiin           7 0.013 qo@ar            15 0.014 y@ar
     17 0.010 @ol              15 0.009 c@hol             7 0.013 y@eedy           14 0.013 @edy
    
  Hm, there is some resemblance, but not as much as I would like.
  Perhaps it will get better if I delete the [oqy] prefixes and
  replace cth, ckh by tch, kch:

    foreach lang ( a b )
      foreach f ( t.k k.t )
        cat Note-009/he${lang}-f.factored \
          | sed \
              -e 's/[- .:]//g' \
              -e 's/p/f/' \
              -e 's/^[qoy]*//' \
              -e 's/c\([tkpf]\)h/\1ch/' \
              -e 's/'"${f:r}"'/@/' \
              -e 's/'"${f:e}"'/~/' \
          | grep '@' \
          | sort | uniq -c | expand \
          | sort +0 -1nr \
          | compute-freqs \
          > .tmp-he${lang}-${f:r}.frq
      end
    end
    dicio-wc .tmp-he{a,b}-{t,k}.frq 
    pr -m -t -i' '1 -w 104 .tmp-he{a,b}-{t,k}.frq \
      | expand \
      > .tmp-t-k-cmp.txt
    
     lines   words     bytes file        
    ------ ------- --------- ------------
       446    1338      9376 .tmp-hea-t.frq
       453    1359      9489 .tmp-hea-k.frq
       205     615      4279 .tmp-heb-t.frq
       318     954      6630 .tmp-heb-k.frq

    
     Language A                                          Language B                                
    ------------------------------------------------    ---------------------------------------------   
     contexts of t             contexts of k             contexts of t             contexts of k   
    ---------------------     ----------------------    --------------------      -------------------       
     218 0.123 @chy            137 0.084 @chy             51 0.097 @edy             89 0.081 @ar
     109 0.061 @chol            80 0.049 @y               35 0.067 @ar              89 0.081 @edy
      99 0.056 @chor            75 0.046 @aiin            26 0.049 @chdy            77 0.070 @aiin
      82 0.046 @y               70 0.043 @ol              23 0.044 @y               44 0.040 @al
      73 0.041 @ol              62 0.038 @chol            20 0.038 @aiin            43 0.039 @eedy
      70 0.039 @chey            61 0.037 @chor            19 0.036 @al              41 0.037 @chdy
      63 0.036 @aiin            50 0.031 @chey            16 0.030 @chedy           35 0.032 ch@chy
      47 0.027 @or              49 0.030 @eey             13 0.025 @eedy            30 0.027 @y
      40 0.023 @cho             36 0.022 @or              12 0.023 @eey             25 0.023 @chy
      29 0.016 @chody           30 0.018 @cho             11 0.021 @am              25 0.023 @eey
      26 0.015 cho@chy          25 0.015 @eol             11 0.021 @chy             19 0.017 @eody
      23 0.013 @char            23 0.014 @al              10 0.019 @chey            19 0.017 che@y
      20 0.011 @eey             21 0.013 ch@chy           10 0.019 @eol             18 0.016 @ain
      20 0.011 cho@y            20 0.012 @ey              10 0.019 @ody             18 0.016 @am
      19 0.011 @chaiin          20 0.012 cho@chy           9 0.017 @or              14 0.013 @ol
      17 0.010 @al              19 0.012 @shy              8 0.015 @ol              13 0.012 @chedy
      17 0.010 ch@chy           18 0.011 @chody            8 0.015 ch@chy           11 0.010 @ey

  Not perfect, but convincing enough...
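
  The two extra "sed" steps used for this second comparison (dropping
  leading q/o/y and unfolding "cXh" into "Xch") can be sketched in
  Python as follows.  The function name is hypothetical, and "count=1"
  mirrors the fact that the "sed" rules have no "g" flag:

```python
import re

def strip_and_unfold(word):
    """Drop any leading run of q/o/y, then rewrite the first c<gallows>h as <gallows>ch."""
    word = re.sub(r"^[qoy]*", "", word)
    word = re.sub(r"c([tkpf])h", r"\1ch", word, count=1)
    return word

for w in ("qokchy", "octhy", "chckhy", "ycphol"):
    print(w, "->", strip_and_unfold(w))
```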
  
  OK, let's try again to equalize the tail distributions:
  
    foreach lang ( a b )
      cat Note-009/he${lang}-f.factored \
        | sed \
            -e 's/d//g'      \
            -e 's/k/t/g'     \
            -e 's/f/p/g'     \
            -e 's/cth/tch/g' \
            -e 's/ckh/tch/g' \
            -e 's/cph/pch/g' \
            -e 's/cfh/pch/g' \
            -e 's/ei/a/g'    \
            -e 's/a/o/g'    \
        > .he${lang}-f-erg.factored
    end

    foreach lang ( a b )
      cat .he${lang}-f-erg.factored \
        | gawk '/./ {print ($1 $2 $3)}' \
        | sed -e 's/--//g' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erg-words-all.frq

      cat .he${lang}-f-erg.factored \
        | grep -v -e '- -' \
        | gawk '/./ {print $1}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erg-unifs-all.frq

      cat .he${lang}-f-erg.factored \
        | grep -e '- -' \
        | gawk '/./ {print $1}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erg-prefs-all.frq

      cat .he${lang}-f-erg.factored \
        | grep -e '- -' \
        | gawk '/./ {print $2}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erg-midfs-all.frq

      cat .he${lang}-f-erg.factored \
        | grep -e '- -' \
        | gawk '/./ {print $3}' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erg-suffs-all.frq

      cat .he${lang}-f-erg.factored \
        | grep -e '- -' \
        | gawk '/./ {print ($2 $3)}' \
        | sed -e 's/--//g' \
        | sort | uniq -c | expand | sort +0 -1nr \
        > .he${lang}-f-erg-tails-all.frq
    end
    dicio-wc .he{a,b}-f-erg-{word,unif,pref,midf,suff,tail}s-all.frq

     lines   words     bytes file        
    ------ ------- --------- ------------
      1563    3126     23155 .hea-f-erg-words-all.frq
       193     386      2524 .hea-f-erg-unifs-all.frq
        47      94       600 .hea-f-erg-prefs-all.frq
       286     572      4647 .hea-f-erg-midfs-all.frq
       126     252      1732 .hea-f-erg-suffs-all.frq
       888    1776     14121 .hea-f-erg-tails-all.frq
       880    1760     12786 .heb-f-erg-words-all.frq
       132     264      1735 .heb-f-erg-unifs-all.frq
        28      56       345 .heb-f-erg-prefs-all.frq
       193     386      3105 .heb-f-erg-midfs-all.frq
        76     152      1018 .heb-f-erg-suffs-all.frq
       506    1012      7898 .heb-f-erg-tails-all.frq

    foreach elem ( word unif pref midf suff tail )
      foreach lang ( A.a B.b )
        set file = "he${lang:e}-f-erg-${elem}s-all"
        echo "${file}.frq -> ${file}.fmt"
        cat .${file}.frq \
          | compute-freqs \
          | gawk '\
                BEGIN {\
                  printf "by Friedman\nlanguage '"${lang:r}"'\n"; \
                  printf "freq pc '"${elem}"'ix\n---- -- ------------------\n";} \
                /./   {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                END   {printf "---- -- ------------------\n%4d 99 TOTAL\n",t;} \
              ' \
          > .${file}.fmt
      end
    end

    foreach elem ( word unif pref midf suff tail )
      set tfiles = ( )
      foreach lang ( a b )
        set file = "he${lang}-f-erg-${elem}s-all"
        set tfiles = ( ${tfiles} .${file}.fmt )
      end
      pr -m -t -i' '1 -w 54  ${tfiles} \
        | expand \
        > .herbal-f-erg-${elem}-cmp.txt
    end
    dicio-wc .herbal-f-erg-{word,unif,pref,midf,suff,tail}-cmp.txt

     lines   words     bytes file        
    ------ ------- --------- ------------
      1569    7361     55938 .herbal-f-erg-word-cmp.txt
       199    1007      7275 .herbal-f-erg-unif-cmp.txt
        53     257      1901 .herbal-f-erg-pref-cmp.txt
       292    1469     11188 .herbal-f-erg-midf-cmp.txt
       132     638      4738 .herbal-f-erg-suff-cmp.txt
       894    4214     32524 .herbal-f-erg-tail-cmp.txt
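
  The letter substitutions and the reassembly of whole words used
  above can be sketched in Python.  The names "ERG_RULES", "erg" and
  "join_elements" are made up, and the three-field layout
  "prefix- -midfix- -suffix" of the factored file is inferred from the
  "gawk" commands, not checked against the file itself:

```python
# Substitutions applied in order, each to every occurrence (like sed's "g" flag).
ERG_RULES = [
    ("d", ""),
    ("k", "t"),
    ("f", "p"),
    ("cth", "tch"),
    ("ckh", "tch"),  # kept for fidelity; cannot match once k -> t has run
    ("cph", "pch"),
    ("cfh", "pch"),  # kept for fidelity; cannot match once f -> p has run
    ("ei", "a"),
    ("a", "o"),
]

def erg(text):
    """Apply the substitution rules in sequence."""
    for old, new in ERG_RULES:
        text = text.replace(old, new)
    return text

def join_elements(line):
    """Glue the fields of a factored line back into one word."""
    return "".join(line.split()).replace("--", "")

print(erg("daiin"), erg("qokchdy"), join_elements("qo- -tch- -y"))
```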

  With these transformations, the prefixes are still essentially
  the same in both languages (same top four, in the same order):
  
    by Friedman                by Friedman
    language A                 language B
    freq pc prefix             freq pc prefix
    ---- -- ------------------ ---- -- ------------------
    3857 65 -                  1269 52 -
     825 14 o-                  504 21 o-
     607 10 qo-                 300 12 qo-
     440  7 y-                  227  9 y-
      56  1 s-                   66  3 ol-
      42  1 ol-                  26  1 l-
      21  0 so-                   6  0 s-
      16  0 l-                    5  0 o:i-
      13  0 or-                   4  0 or-
      12  0 r-                    3  0 lo-
      11  0 oy-                   2  0 olo-
       7  0 o:i-                  2  0 qol-
       6  0 yo-                   2  0 r-
       5  0 os-                   1  0 lol-
       4  0 ro-                   1  0 lqo-
       4  0 sol-                  1  0 o:ii-
       4  0 sy-                   1  0 o:n-
       3  0 lo-                   1  0 oo-
       2  0 ls-                   1  0 oro-
       2  0 o:in-                 1  0 orol-
       2  0 oo-                   1  0 oy-
       2  0 oro-                  1  0 so:i-
       2  0 qoo:i-                1  0 sol-
  
  The suffixes are close enough:
  
    by Friedman                by Friedman
    language A                 language B
    freq pc suffix             freq pc suffix
    ---- -- ------------------ ---- -- ------------------
    1853 31 -y                 1173 48 -y
    1028 17 -ol                 250 10 -or
     881 15 -or                 202  8 -ol
     456  8 -o                  179  7 -oiin
     370  6 -oiin               122  5 -oy
     266  5 -oy                  96  4 -
     136  2 -                    73  3 -o
     130  2 -om                  47  2 -om
      96  2 -ooiin               34  1 -os
      84  1 -oly                 33  1 -oin
      77  1 -s                   31  1 -oly
      76  1 -os                  31  1 -s
      53  1 -oin                 20  1 -oir
      44  1 -ory                 11  1 -ooiin
      40  1 -oor                 10  0 -oor
      35  1 -on                   9  0 -ory
      28  1 -ool                  6  0 -orom
      12  0 -n                    6  0 -oror
      12  0 -ols                  6  0 -yy
      12  0 -yy                   5  0 -ool

  The midfixes are still very different:
  
    by Friedman                by Friedman
    language A                 language B
    freq pc midfix             freq pc midfix
    ---- -- ------------------ ---- -- ------------------
    1090 18 -tch-               590 24 -t-
    1045 18 -ch-                274 11 -te-
     913 15 -t-                 172  7 -ch-
     526  9 -sh-                163  7 -che-
     251  4 -che-               141  6 -tch-
     191  3 -tche-              135  6 -tee-
     181  3 -pch-               110  5 -she-
     155  3 -te-                 79  3 -sh-
     142  2 -she-                64  3 -tche-
     131  2 -tee-                57  2 -chtch-
      96  2 -chot-               48  2 -chet-
      93  2 -tsh-                48  2 -p-
      69  1 -chtch-              46  2 -pch-
      61  1 -chotch-             39  2 -pche-
      60  1 -p-                  24  1 -shee-
      58  1 -chee-               23  1 -chee-
      50  1 -cht-                19  1 -cht-
      43  1 -pche-               18  1 -ee-
      36  1 -shee-               18  1 -tsh-
      30  1 -tchee-              18  1 -tshe-
      25  0 -eee-                16  1 -chetch-
  
  And the tails, oh my:
  
    by Friedman                by Friedman
    language A                 language B
    freq pc tailix             freq pc tailix
    ---- -- ------------------ ---- -- ------------------
     379  6 -tchy               171  7 -tey
     269  5 -chol               149  6 -tor
     232  4 -chor               113  5 -tchy
     195  3 -tol                108  4 -toiin
     192  3 -tchol               99  4 -teey
     191  3 -tchor               97  4 -chey
     182  3 -ty                  89  4 -tol
     163  3 -toiin               73  3 -chy
     154  3 -chy                 63  3 -ty
     121  2 -tchey               58  2 -shey
     116  2 -sho                 52  2 -tchey
     114  2 -tor                 49  2 -chtchy
     104  2 -shol                30  1 -shy
      95  2 -tcho                30  1 -tom
      88  2 -chey                28  1 -pchey
      83  1 -shy                 25  1 -pchy
      82  1 -shor                25  1 -teoy
      75  1 -teey                23  1 -chety
      61  1 -cho                 23  1 -toin
      58  1 -cheor               22  1 -toly
      58  1 -choiin              21  1 -teol
      53  1 -tchoy               20  1 -chol
      47  1 -chotchy             18  1 -toy
      45  1 -choy                17  1 -sheey
      45  1 -shey                16  1 -chetchy
      44  1 -teol                16  1 -chor
      43  1 -chtchy              16  1 -choy
      42  1 -pchy                14  1 -teo

  The unifixes are rather OK, I think, except for the 
  inversion between "oiin" and "or":
  
    by Friedman                by Friedman
    language A                 language B
    freq pc unifix             freq pc unifix
    ---- -- ------------------ ---- -- ------------------
     441 24 oiin                149 19 or
     175 10 or                  126 16 oiin
     145  8 ol                   75 10 ol
     107  6 y                    35  4 y
      88  5 s                    25  3 soiin
      77  4 oin                  22  3 om
      55  3 om                   18  2 oroiin
      40  2 soiin                17  2 oly
      31  2 ooiin                16  2 oy
      30  2 sor                  13  2 oloiin
      28  2 oir                  12  2 oin
      28  2 sol                  12  2 olor
      25  1 o                    12  2 s
      25  1 sy                   10  1 ory
      20  1 qooiin                9  1 ooiin
  
  The words as a whole are rather different:
  
    by Friedman                by Friedman
    language A                 language B
    freq pc wordix             freq pc wordix
    ---- -- ------------------ ---- -- ------------------
     441  6 oiin                149  5 or
     247  3 chol                126  4 oiin
     201  3 chor                 80  3 chey
     182  2 tchy                 75  2 ol
     175  2 or                   64  2 chy
     145  2 ol                   56  2 qotey
     126  2 chy                  54  2 otey
     108  1 sho                  54  2 otor
     107  1 tchol                53  2 shey
     107  1 y                    49  2 otoiin
     104  1 tchor                48  2 chtchy
     101  1 qotchy               44  1 otol
     100  1 shol                 43  1 tchy
      88  1 s                    35  1 qotor
      79  1 otol                 35  1 y
      79  1 shor                 33  1 tor
      77  1 oin                  31  1 oteey
      77  1 oty                  30  1 oty

97-11-25 stolfi
===============

  Checking the contexts of "daiin"
  
    cat hea-f-eva.wds \
      | sed -e '/[-\/]/d' \
      | enum-word-pairs \
      | grep -w daiin \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      > .foo
      
         30 chol daiin
         23 daiin =
         13 daiin daiin
         11 daiin cthy
         10 shol daiin
          9 chor daiin
          8 daiin cthor
          7 chy daiin
          7 daiin dain
          6 cthy daiin
          6 daiin chol
          6 daiin chor
          6 daiin cthol
          6 daiin sho
          6 dain daiin
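
  A minimal sketch of the pair counting done above, for readers who
  lack "enum-word-pairs".  It assumes that tool simply emits every
  pair of consecutive words from a one-word-per-line stream; the
  function name "count_pairs_with" is hypothetical, and POSIX sh is
  used here instead of csh.

```shell
#!/bin/sh
# Hypothetical stand-in for "enum-word-pairs | grep -w WORD | sort | uniq -c".
# Reads one word per line; counts adjacent pairs in which WORD occurs.
count_pairs_with() {
  awk -v w="$1" '
    # From the second word on, record the pair (previous word, this word).
    NR > 1 && (prev == w || $1 == w) { n[prev " " $1]++ }
    { prev = $1 }
    END { for (p in n) printf "%7d %s\n", n[p], p }
  ' | LC_ALL=C sort -k1,1nr -k2
}

# Toy input, not taken from the transcription files.
printf '%s\n' chol daiin daiin cthy chol daiin = | count_pairs_with daiin
```

  The "sort -k1,1nr" form is the current spelling of the older
  "sort +0 -1nr" used in the pipeline above.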

  Hmm, my guess that EVA "daiin" = Chinese "de" needs some improvement...
  

    cat chin-mch.txt \
      | tr ' ' '\012' \
      | egrep -e '.' \
      | enum-word-pairs \
      | grep -w de \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      > .foo  

  Denis Mardle posted counts of -n, -in, -iin, -iiin endings
  per page in the "stars" section.  Here NL = num lines,
  NP = num paragraphs, and i0..i3 = counts of -n, -in, -iin, -iiin.

    page  NL NP  i0  i1  i2  i3
    ----- -- -- --- --- --- ---
    f105v 38 10   0   4  83   5
    f105r 37 10   1   1  47   6
    f113v 49 15   0  20  83   5    
    f114r 45 13   0  22  90   7
    f113r 51 17   0  21  75   4    
    f104r 45 13   1  18  63   3
    f107r 51 15   1  31  93   4    
    f106v 47 15   0  24  65   1
    f106r 47 15   0  24  65   1
    f114v 41 12   0  24  65   3
    f104v 44 13   0  26  59   2
    f107v 49 15   0  43  85   1    
    f108r 50 16   1  22  39   0
    f112v 47 14   1  34  59   7
    f112r 45 12   0  19  31   3
    f108v 53 16   1  39  53   0
    f115r 45 13   1  26  34   2    
    f111r 54 17   0  50  45   1    
    f115v 45 13   0  38  28   2    
    f103v 46 14   4  40  31   1
    f103r 54 19   2  46  27   0
    f111v 51 19   5 113  41   1    
    f116r 30 10   2  54  13   1
    no st. 20  2   6  39   8   0

97-11-26 stolfi
===============

  Using data posted by John Grove,
  I split several of my textual units (L16-eva/f*) into smaller
  units, distinguishing real "parags" from his so-called
  "titles" (which are actually short lines placed at the
  *end* of a "parags" block).
  
  The files affected were
  
    f1r.P    -> f1r.P1 f1r.T1 f1r.P2 f1r.T2 f1r.P3 f1r.T3 f1r.P4 f1r.T4
    f8r.P    -> f8r.P1 f8r.T1 f8r.P2 f8r.T2 f8r.P3 f8r.T3
    f9r.P    -> f9r.P f9r.T
    f16r.P   -> f16r.P1 f16r.T1 f16r.P2
    f18r.P   -> f18r.P f18r.T
    f19v.P   -> f19v.P f19v.T
    f22v.P   -> f22v.P f22v.T
    f24r.P   -> f24r.P f24r.T
    f25r.P   -> f25r.P f25r.T
    f27r.P   -> f27r.P f27r.T
    f28v.P   -> f28v.P1 f28v.T1 f28v.P2 f28v.T2
    f31r.P   -> f31r.P f31r.T
    f39r.P   -> f39r.P f39r.T
    f40v.P   -> f40v.P f40v.T
    f41v.P   -> f41v.P f41v.T
    f42r.P   -> f42r.P1 f42r.T1 f42r.P2 f42r.T2 f42r.P3 f42r.T3 
    f42v.P   -> f42v.P f42v.T
    (new)    -> f57v.T
    (new)    -> f58v.T
    (new)    -> f65r.L
    (old)    -> f66r.W  {entered months ago}
    f82r.P   -> f82r.P1 f82r.T1 f82r.P2
    (new)    -> f85r2.T
    f85r1.P  -> f85r1.P f85r1.T
    f86v5.P  -> f86v5.P f86v5.T
    f94r.P   -> f94r.P f94r.T
    f101v1.P -> f101v1.P f101v1.T
    f101v2.P -> f101v2.P f101v2.T
    f105r.P  -> f105r.P1 f105r.T1 f105r.P2 f105r.T2
    f108v.P  -> f108v.P f108v.T
    f114r.P  -> f114r.P1 f114r.T1 f114r.P2 f114r.T2
    
  Validating it:
  
    pushd L16-eva
    rm -f .bugs
    foreach f ( f[0-9]* )
      echo '=== '$f' ===' >>& .bugs
      cat $f \
        | ../validate-new-evt-format \
            -v chars='aoeilmnrchtpkfsqgjdvxy' \
            -v location="$f" \
        >>& .bugs
    end
    popd 
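
  A rough sketch of the kind of character check the loop above asks
  for.  The real "validate-new-evt-format" script is not shown here,
  so this is only an assumption about one of its tests: it takes the
  same "chars" and "location" values and flags any word containing a
  character outside the allowed set.  The input is assumed to be bare
  words, with no locators or comments; "check_chars" is a hypothetical
  name.

```shell
#!/bin/sh
# check_chars CHARS LOCATION : report words with characters not in CHARS.
# Exits with status 1 if any offending word was seen, 0 otherwise.
check_chars() {
  awk -v chars="$1" -v location="$2" '
    { for (i = 1; i <= NF; i++) {
        w = $i
        for (j = 1; j <= length(w); j++)
          if (index(chars, substr(w, j, 1)) == 0) {
            printf "%s line %d: invalid char in \"%s\"\n", location, NR, w
            bad++
            break   # one report per word is enough
          }
      } }
    END { exit (bad > 0 ? 1 : 0) }
  '
}

# Toy input: the second line contains a character outside the set.
printf 'daiin chol\nda!in\n' \
  | check_chars 'aoeilmnrchtpkfsqgjdvxy' 'f1r.P1' \
  || echo 'problems found'
```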

  Must redo Note-010 from scratch.

  Rene Zandbergen sent me corrected -I*D statistics for the "stars"
  section. (Although Denis says his statistics were already 
  checked against the Yale copyflo).  The format is

    - page code. The first character (T) is the quire (T=20) and the
      second character is the 'page in quire' (A=f103r, ..., X=f116v)
    - nr of words (not sure how commas were counted)
    - nr of words containing iiin
    - nr of words containing iin
    - nr of words containing in
    - nr of words containing n
    
  His numbers were cumulative; I reduced them to exclusive counts
  by piping the table through
  
    gawk \
      ' /./ { \
          printf "    %s %s %5d %5d %5d %5d %5d\n", \
          $1, $2, $3, $4, $5-$4, $6-$5, $7-$6; \
        } \
      '

  Here is the result:

    page     words -iiin  -iin   -in    -n
    -------- ----- ----- ----- ----- -----
    f103r TA   526     0    33    41     2
    f103v TB   454     1    34    37     4
    f104r TC   448     1    66    17     1
    f104v TD   477     3    59    24     0
    f105r TE   379     6    48     1     1
    f105v TF   399     5    85     4     0
    f106r TG   432     1    65    24     0
    f106v TH   444     1    67    23     0
    f107r TI   487     4    93    30     1
    f107v TJ   462     1    84    43     0
    f108r TK   494     0    39    22     1
    f108v TL   581     0    52    39     1
    f111r TM   623     1    44    51     0
    f111v TN   568     1    41   113     6
    f112r TO   401     3    32    21     0
    f112v TP   420     7    60    33     1
    f113r TQ   528     4    79    21     0
    f113v TR   502     5    84    20     0
    f114r TS   460     5    91    23     0
    f114v TT   376     2    68    23     0
    f115r TU   461     1    40    21     2
    f115v TV   410     2    32    33     0
    f116r TW   554     1    25    90     8
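
  As a check on the cumulative-to-exclusive reduction, here is the
  same subtraction applied to a single row.  The input row is not
  copied from Rene's message: it is rebuilt by summing the exclusive
  counts of the f105v row above, so it only illustrates the
  arithmetic.  The function name "to_exclusive" is hypothetical.

```shell
#!/bin/sh
# Fields in: page, code, words, then cumulative counts for
# iiin, iin, in, n.  Each count includes the ones to its left.
to_exclusive() {
  awk '/./ { printf "%s %s %d %d %d %d %d\n", \
             $1, $2, $3, $4, $5-$4, $6-$5, $7-$6 }'
}

# 5 iiin; 90 words with iin (5 of them iiin); 94 with in; 94 with n.
echo 'f105v TF 399 5 90 94 94' | to_exclusive
# prints: f105v TF 399 5 85 4 0
```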

  Then I ran the table thrice through
    
    sort-distr -s 18 -n 4 -d 
    
  It converged to this stable order after the second iteration:
    
    page     words -iiin  -iin   -in    -n
    -------- ----- ----- ----- ----- -----
    f105v TF   399     5    85     4     0
    f105r TE   379     6    48     1     1
    f113v TR   502     5    84    20     0
    f114r TS   460     5    91    23     0
    f113r TQ   528     4    79    21     0
    f104r TC   448     1    66    17     1
    f107r TI   487     4    93    30     1
    f114v TT   376     2    68    23     0
    f106v TH   444     1    67    23     0
    f106r TG   432     1    65    24     0
    f104v TD   477     3    59    24     0
    f107v TJ   462     1    84    43     0
    f115r TU   461     1    40    21     2
    f108r TK   494     0    39    22     1
    f112v TP   420     7    60    33     1
    f112r TO   401     3    32    21     0
    f108v TL   581     0    52    39     1
    f115v TV   410     2    32    33     0
    f103v TB   454     1    34    37     4
    f111r TM   623     1    44    51     0
    f103r TA   526     0    33    41     2
    f111v TN   568     1    41   113     6
    f116r TW   554     1    25    90     8
  
  The corresponding distance matrix, with pages in the same order:

            f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f  f
            1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
            0  0  1  1  1  0  0  1  0  0  0  0  1  0  1  1  0  1  0  1  0  1  1
            5  5  3  4  3  4  7  4  6  6  4  7  5  8  2  2  8  5  3  1  3  1  6
            v  r  v  r  r  r  r  v  v  r  v  v  r  r  v  r  v  v  v  r  r  v  r
           -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    f105v   0  6 13 14 14 14 17 18 18 20 22 26 27 29 28 32 35 42 43 45 46 63 67
    f105r   6  0 13 13 14 14 17 18 18 20 21 25 26 28 26 30 34 40 42 43 45 61 65
    f113v  13 13  0  1  2  3  5  5  6  7  9 13 15 16 16 19 22 29 30 32 33 50 54
    f114r  14 13  1  0  1  2  4  5  5  6  8 12 14 15 15 18 21 28 30 31 33 49 53
    f113r  14 14  2  1  0  2  3  4  4  5  7 12 13 14 14 18 20 27 29 30 32 49 52
    f104r  14 14  3  2  2  0  4  4  4  6  8 12 14 15 15 19 21 28 29 31 32 49 53
    f107r  17 17  5  4  3  4  0  1  2  3  4  9 10 11 11 15 17 24 26 27 29 46 49
    f114v  18 18  5  5  4  4  1  0  1  2  4  8 10 10 11 14 16 24 25 27 28 45 49
    f106v  18 18  6  5  4  4  2  1  0  1  4  8 10 10 11 15 16 24 25 27 28 45 49
    f106r  20 20  7  6  5  6  3  2  1  0  3  6  8  9 10 13 15 22 24 25 27 43 47
    f104v  22 21  9  8  7  8  4  4  4  3  0  5  6  7  7 10 13 20 22 23 24 41 45
    f107v  26 25 13 12 12 12  9  8  8  6  5  0  4  3  6  8  9 16 18 19 21 37 41
    f115r  27 26 15 14 13 14 10 10 10  8  6  4  0  2  4  6  8 15 16 18 19 36 39
    f108r  29 28 16 15 14 15 11 10 10  9  7  3  2  0  5  6  6 14 15 16 18 35 38
    f112v  28 26 16 15 14 15 11 11 11 10  7  6  4  5  0  4  8 14 15 17 19 35 38
    f112r  32 30 19 18 18 19 15 14 15 13 10  8  6  6  4  0  5 10 12 13 15 31 35
    f108v  35 34 22 21 20 21 17 16 16 15 13  9  8  6  8  5  0  8 10 10 12 29 32
    f115v  42 40 29 28 27 28 24 24 24 22 20 16 15 14 14 10  8  0  4  3  5 21 25
    f103v  43 42 30 30 29 29 26 25 25 24 22 18 16 15 15 12 10  4  0  5  4 20 23
    f111r  45 43 32 31 30 31 27 27 27 25 23 19 18 16 17 13 10  3  5  0  3 18 22
    f103r  46 45 33 33 32 32 29 28 28 27 24 21 19 18 19 15 12  5  4  3  0 17 20
    f111v  63 61 50 49 49 49 46 45 45 43 41 37 36 35 35 31 29 21 20 18 17  0  4
    f116r  67 65 54 53 52 53 49 49 49 47 45 41 39 38 38 35 32 25 23 22 20  4  0
           
  I created a picture of the distance matrix by piping the data through
  
    sort-distr -s 18 -n 4 -d -p - | pnmscale 8 | ppmtogif > .stars-dist.gif
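
  The metric used by "sort-distr" is not stated in these notes.  As an
  illustration only, the sketch below compares two pages by turning
  their four counts into fractions and taking half the sum of absolute
  differences, scaled to 0..100.  This is an assumed stand-in, and it
  gives a different value than the matrix above for f105v versus f116r
  (which lists 67), so the real tool evidently computes something
  else.  "page_distance" is a hypothetical name.

```shell
#!/bin/sh
# page_distance "a1 a2 a3 a4" "b1 b2 b3 b4"
# Normalizes each count vector to fractions, then prints
# 100 * (half the L1 distance between the two fraction vectors).
page_distance() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { t = 0
      for (i = 1; i <= NF; i++) t += $i
      for (i = 1; i <= NF; i++) f[NR, i] = (t > 0 ? $i / t : 0)
      nf = NF }
    END { d = 0
          for (i = 1; i <= nf; i++) {
            x = f[1, i] - f[2, i]
            d += (x < 0 ? -x : x)
          }
          printf "%.0f\n", 50 * d }
  '
}

page_distance "5 85 4 0" "5 85 4 0"    # identical rows: prints 0
page_distance "5 85 4 0" "1 25 90 8"   # f105v versus f116r counts
```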

  The obvious groupings are:
  
    W = { f105v f105r }
    
    X = { f113v f114r f113r f104r f107r f114v f106v f106r f104v f107v
          f115r f108r f112v f112r f108v }
      
    Y = { f115v f103v f111r f103r }
    
    Z = { f111v f116r }
    
  The first row of group X is fairly homogeneous; the second row seems
  to be a gradient, in the order shown.

  The previous ordering was 
  
    w = { f105r f105v }

    x = { f113v f114r f113r f104r f107r f106r f106v
          f114v f104v f107v f108r f112v f112r f108v f115r }

    y = { f111r f115v f103v f103r }

    z = { f111v f116r }

  So indeed the corrections had no significant effect.

  Denis later posted the per-paragraph counts, which I labeled by
  piping them through

    gawk \
      ' \
        /^#/ {p=$2;n=0;next;} \
        /./{n++;printf "%s.P%02d %3d%3d%3d%3d%3d\n", p,n,$1,$2,$3,$4,$5;} \
      '
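
  A small usage sketch of the labeling script.  The input layout is an
  assumption read off the script itself: a line starting with "#"
  whose second field is the page name, followed by one line of five
  numbers per paragraph.  The sample rows are the first ones of the
  labeled table in these notes; "label_paragraphs" is a hypothetical
  name.

```shell
#!/bin/sh
# Prefix each data row with PAGE.Pnn, restarting nn at every "#" line.
label_paragraphs() {
  awk '
    /^#/ { p = $2; n = 0; next }
    /./  { n++
           printf "%s.P%02d %3d%3d%3d%3d%3d\n", p, n, $1, $2, $3, $4, $5 }
  '
}

printf '%s\n' '# f103r' '4 0 3 0 0' '3 1 3 0 0' '# f103v' '4 0 2 2 0' \
  | label_paragraphs
```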

  Here is Denis's data, transposed and fitted with paragraph labels.
  Each row is one paragraph.  The data columns are the number of lines in 
  the paragraph and the counts of -D, -ID, -IID, and -IIID endings.
  The data for page 108v had a missing column, so I filled it with '99's.

    page       ln i0 i1 i2 i3
    ---------  -- -- -- -- --
    f103r.P01   4  0  3  0  0
    f103r.P02   3  1  3  0  0
    f103r.P03   1  0  1  0  0
    f103r.P04   4  0  7  4  0
    f103r.P05   3  0  5  1  0
    f103r.P06   2  0  1  1  0
    f103r.P07   3  0  6  3  0
    f103r.P08   3  0  1  1  0
    f103r.P09   3  0  6  0  0
    f103r.P10   3  0  3  1  0
    f103r.P11   3  0  0  6  0
    f103r.P12   2  0  0  1  0
    f103r.P13   2  0  4  0  0
    f103r.P14   3  0  2  0  0
    f103r.P15   3  0  0  1  0
    f103r.P16   2  0  0  1  0
    f103r.P17   3  1  0  2  0
    f103r.P18   4  0  4  3  0
    f103r.P19   3  0  0  2  0
    f103v.P01   4  0  2  2  0
    f103v.P02   4  0  3  5  0
    f103v.P03   3  0  2  2  0
    f103v.P04   2  1  2  0  0
    f103v.P05   3  0  4  2  0
    f103v.P06   3  0  1  3  0
    f103v.P07   1  0  3  0  0
    f103v.P08   6  3  7  3  1
    f103v.P09   3  0  0  4  0
    f103v.P10   3  0  2  2  0
    f103v.P11   4  0  2  4  0
    f103v.P12   2  0  1  0  0
    f103v.P13   4  0  4  2  0
    f103v.P14   4  0  7  2  0
    f104r.P01   4  0  1  5  2
    f104r.P02   5  0  5  4  0
    f104r.P03   2  0  0  0  0
    f104r.P04   4  0  2  6  0
    f104r.P05   3  0  1  0  0
    f104r.P06   3  1  1  4  0
    f104r.P07   5  0  2  5  0
    f104r.P08   3  0  0  7  0
    f104r.P09   4  0  2  7  0
    f104r.P10   2  0  0  5  0
    f104r.P11   3  0  0  6  0
    f104r.P12   4  0  1  6  1
    f104r.P13   3  0  3  8  0
    f104v.P01   5  0  2  9  0
    f104v.P02   2  0  2  0  0
    f104v.P03   4  0  1  6  0
    f104v.P04   3  0  1  5  0
    f104v.P05   4  0  3  3  1
    f104v.P06   3  0  1  2  0
    f104v.P07   5  0  1 11  1
    f104v.P08   2  0  0  4  0
    f104v.P09   3  0  0  5  0
    f104v.P10   3  0  5  4  0
    f104v.P11   2  0  1  3  0
    f104v.P12   4  0  6  3  0
    f104v.P13   4  0  3  4  0
    f105r.P01   5  1  0  8  0
    f105r.P02   4  0  0  3  2
    f105r.P03   4  0  0  4  1
    f105r.P04   3  0  0  4  0
    f105r.P05   7  0  0 15  0
    f105r.P06   2  0  1  2  1
    f105r.P07   2  0  0  0  0
    f105r.P08   3  0  0  4  1
    f105r.P09   4  0  0  4  1
    f105r.P10   3  0  0  3  0
    f105v.P01   4  0  0  8  0
    f105v.P02   3  0  0  5  0
    f105v.P03   6  0  0  9  2
    f105v.P04   4  0  0 12  2
    f105v.P05   2  0  0  5  0
    f105v.P06   3  0  1  7  0
    f105v.P07   2  0  0  5  1
    f105v.P08   4  0  1  6  0
    f105v.P09   3  0  0  8  0
    f105v.P10   7  0  2 18  0
    f106r.P01   3  0  0  0  0
    f106r.P02   4  0  0  4  0
    f106r.P03   2  0  1  1  0
    f106r.P04   3  0  1  4  0
    f106r.P05   2  0  0  5  0
    f106r.P06   3  0  4  4  0
    f106r.P07   4  0  2  8  0
    f106r.P08   2  0  1  0  0
    f106r.P09   3  0  3  3  0
    f106r.P10   5  0  3 12  0
    f106r.P11   2  0  1  2  0
    f106r.P12   2  0  1  2  0
    f106r.P13   2  0  0  3  0
    f106r.P14   4  0  0  8  1
    f106r.P15   6  0  7  9  0
    f106v.P01   4  0  5  5  0
    f106v.P02   4  0  4  9  0
    f106v.P03   2  0  2  3  0
    f106v.P04   4  0  1  5  1
    f106v.P05   3  0  1  1  0
    f106v.P06   2  0  2  3  0
    f106v.P07   2  0  1  3  0
    f106v.P08   4  0  2  5  0
    f106v.P09   2  0  1  1  0
    f106v.P10   4  0  1  7  0
    f106v.P11   2  0  0  3  0
    f106v.P12   3  0  0  5  0
    f106v.P13   3  0  0  3  0
    f106v.P14   2  0  0  3  0
    f106v.P15   6  0  4  9  0
    f107r.P01   3  0  3  4  0
    f107r.P02   4  0  3  5  0
    f107r.P03   5  1  3  9  0
    f107r.P04   3  0  2  8  0
    f107r.P05   2  0  0  3  0
    f107r.P06   3  0  0  5  0
    f107r.P07   3  0  1  8  2
    f107r.P08   3  0  2  3  1
    f107r.P09   3  0  0  6  0
    f107r.P10   4  0  2  9  1
    f107r.P11   4  0  2  8  0
    f107r.P12   4  0  4  6  0
    f107r.P13   3  0  2  6  0
    f107r.P14   3  0  1  8  0
    f107r.P15   4  0  6  5  0
    f107v.P01   4  0  5  8  1
    f107v.P02   3  0  2  7  0
    f107v.P03   3  0  3  6  0
    f107v.P04   5  0  5  9  0
    f107v.P05   4  0  2 11  0
    f107v.P06   3  0  1  5  0
    f107v.P07   2  0  3  2  0
    f107v.P08   4  0  3  4  0
    f107v.P09   3  0  1  5  0
    f107v.P10   3  0  3  9  0
    f107v.P11   2  0  2  2  0
    f107v.P12   4  0  4 10  0
    f107v.P13   2  0  3  0  0
    f107v.P14   2  0  0  1  0
    f107v.P15   5  0  6  6  0
    f108r.P01   4  0  0  2  0
    f108r.P02   3  0  0  2  0
    f108r.P03   3  0  1  4  0
    f108r.P04   3  0  3  8  0
    f108r.P05   3  0  3  4  0
    f108r.P06   4  0  2  4  0
    f108r.P07   3  0  2  2  0
    f108r.P08   5  0  3  6  0
    f108r.P09   2  1  0  2  0
    f108r.P10   4  0  1  0  0
    f108r.P11   2  0  0  0  0
    f108r.P12   2  0  0  0  0
    f108r.P13   2  0  1  2  0
    f108r.P14   4  0  3  1  0
    f108r.P15   3  0  2  1  0
    f108r.P16   3  0  1  1  0
    f108v.P01   4  0 99  2  0
    f108v.P02   2  0 99  2  0
    f108v.P03   5  0 99  4  0
    f108v.P04   3  0 99  1  0
    f108v.P05   5  0 99  1  0
    f108v.P06   3  0 99  3  0
    f108v.P07   3  0 99  1  0
    f108v.P08   4  1 99  2  0
    f108v.P09   3  0 99  4  0
    f108v.P10   3  0 99  2  0
    f108v.P11   3  0 99  5  0
    f108v.P12   3  0 99  4  0
    f108v.P13   3  0 99  2  0
    f108v.P14   3  0 99  3  0
    f108v.P15   2  0 99  1  0
    f108v.P16   4  0 99  2  0
    f111r.P01   5  0  1  4  0
    f111r.P02   2  0  1  2  0
    f111r.P03   2  0  3  2  0
    f111r.P04   3  0  2  4  1
    f111r.P05   4  0  3  4  0
    f111r.P06   3  0  0  3  0
    f111r.P07   3  0  4  2  0
    f111r.P08   3  0  4  2  0
    f111r.P09   3  0  2  4  0
    f111r.P10   3  0  2  2  0
    f111r.P11   2  0  2  0  0
    f111r.P12   2  0  3  1  0
    f111r.P13   3  0  4  1  0
    f111r.P14   5  0  5  4  0
    f111r.P15   4  0  4  1  0
    f111r.P16   3  0  4  1  0
    f111r.P17   4  0  6  8  0
    f111v.P01   3  0  5  0  0
    f111v.P02   2  0  0  2  0
    f111v.P03   2  0  3  4  0
    f111v.P04   2  0  3  0  0
    f111v.P05   3  0  6  4  0
    f111v.P06   3  0  7  5  0
    f111v.P07   2  0  6  3  0
    f111v.P08   2  0  7  4  0
    f111v.P09   4  1 14  4  0
    f111v.P10   2  0  3  1  0
    f111v.P11   3  0  5  0  0
    f111v.P12   2  0  8  2  0
    f111v.P13   1  0  5  0  0
    f111v.P14   2  4  4  0  0
    f111v.P15   3  0  7  2  0
    f111v.P16   2  0  3  1  0
    f111v.P17   6  0  9  4  0
    f111v.P18   3  0  8  1  0
    f111v.P19   4  0 10  4  1
    f112r.P01   4  0  2  5  0
    f112r.P02   6  0  2  2  0
    f112r.P03   4  0  1  3  0
    f112r.P04   4  0  2  2  0
    f112r.P05   4  0  1  5  0
    f112r.P06   4  0  2  2  0
    f112r.P07   4  0  1  2  1
    f112r.P08   3  0  2  1  1
    f112r.P09   3  0  2  1  0
    f112r.P10   2  0  1  2  0
    f112r.P11   3  0  2  2  1
    f112r.P12   4  0  1  4  0
    f112v.P01   6  0  2  5  2
    f112v.P02   4  0  2  6  2
    f112v.P03   4  0  4  7  0
    f112v.P04   5  0  5  7  1
    f112v.P05   3  0  4  4  0
    f112v.P06   2  0  3  3  0
    f112v.P07   2  0  0  2  0
    f112v.P08   3  0  3  3  0
    f112v.P09   2  0  0  2  0
    f112v.P10   4  0  1  6  0
    f112v.P11   3  0  3  5  1
    f112v.P12   3  1  3  4  0
    f112v.P13   3  0  2  1  0
    f112v.P14   3  0  2  4  1
    f113r.P01   3  0  1  2  0
    f113r.P02   4  0  2  8  0
    f113r.P03   2  0  0  5  0
    f113r.P04   3  0  1  6  0
    f113r.P05   3  0  0  1  2
    f113r.P06   4  0  1  6  1
    f113r.P07   2  0  2  1  1
    f113r.P08   3  0  0  4  0
    f113r.P09   2  0  0  2  0
    f113r.P10   3  0  2  6  0
    f113r.P11   4  0  4  6  0
    f113r.P12   3  0  2  4  0
    f113r.P13   2  0  0  4  0
    f113r.P14   3  0  2  4  0
    f113r.P15   3  0  0  3  0
    f113r.P16   3  0  1  6  0
    f113r.P17   4  0  3  7  0
    f113v.P01   3  0  2  3  0
    f113v.P02   3  0  1  4  0
    f113v.P03   3  0  0  4  0
    f113v.P04   3  0  0  2  0
    f113v.P05   5  0  2 11  0
    f113v.P06   3  0  0  5  0
    f113v.P07   4  0  2 10  0
    f113v.P08   3  0  0  4  0
    f113v.P09   5  0  1  8  4
    f113v.P10   3  0  2  6  0
    f113v.P11   2  0  0  4  0
    f113v.P12   4  0  4  6  1
    f113v.P13   3  0  2  8  0
    f113v.P14   2  0  1  2  0
    f113v.P15   3  0  3  6  0
    f114r.P01   3  0  0  3  0
    f114r.P02   4  0  1 11  1
    f114r.P03   3  0  0  7  0
    f114r.P04   3  0  0  3  1
    f114r.P05   5  0  3  7  0
    f114r.P06   3  0  0  5  1
    f114r.P07   2  0  2  4  0
    f114r.P08   4  0  0  9  0
    f114r.P09   4  0  5  7  1
    f114r.P10   3  0  2 11  1
    f114r.P11   4  0  3  8  1
    f114r.P12   3  0  2  7  1
    f114r.P13   4  0  4  8  0
    f114v.P01   5  0  8  2  0
    f114v.P02   2  0  0  3  1
    f114v.P03   3  0  3  7  0
    f114v.P04   3  0  3  5  1
    f114v.P05   4  0  1  5  0
    f114v.P06   2  0  1  3  0
    f114v.P07   3  0  2  4  0
    f114v.P08   3  0  1  6  0
    f114v.P09   3  0  1  4  0
    f114v.P10   4  0  2  7  0
    f114v.P11   3  0  0  6  0
    f114v.P12   6  0  2 13  1
    f115r.P01   3  0  1  0  0
    f115r.P02   4  0  4  0  0
    f115r.P03   2  0  1  2  0
    f115r.P04   3  0  3  4  0
    f115r.P05   6  0  6  4  0
    f115r.P06   3  0  1  2  0
    f115r.P07   4  0  1  2  0
    f115r.P08   3  0  1  1  1
    f115r.P09   6  0  0 10  1
    f115r.P10   2  1  2  2  0
    f115r.P11   3  0  1  4  0
    f115r.P12   2  0  1  1  0
    f115r.P13   4  0  4  2  0
    f115v.P01   5  0  4  4  0
    f115v.P02   2  0  2  3  0
    f115v.P03   3  0  2  3  0
    f115v.P04   2  0  0  2  0
    f115v.P05   5  0  7  1  0
    f115v.P06   3  0  5  2  0
    f115v.P07   3  0  3  2  0
    f115v.P08   5  0  5  2  0
    f115v.P09   2  0  0  1  0
    f115v.P10   3  0  2  1  0
    f115v.P11   3  0  2  2  0
    f115v.P12   4  0  5  2  2
    f115v.P13   5  0  1  3  0
    f116r.P01   3  0  6  0  1
    f116r.P02   3  0  4  2  0
    f116r.P03   3  0  5  0  0
    f116r.P04   3  0  7  3  0
    f116r.P05   2  0  5  0  0
    f116r.P06   3  1  2  1  0
    f116r.P07   3  0  7  2  0
    f116r.P08   3  0  8  0  0
    f116r.P09   3  0  5  1  0
    f116r.P10   4  1  5  4  0
    f116r.P11  15  3 31  8  0
    f116r.P12   5  3  8  0  0

  Here is the same data, minus the line counts and the "f1" prefix,
  with identical entries fused together:
  
    page     i0 i1 i2 i3 pages with same counts
    -------  -- -- -- -- -------------------------
    03r.P01   0  3  0  0 03r.P01 03v.P07 07v.P13 11v.P04
    03r.P02   1  3  0  0 03r.P02
    03r.P03   0  1  0  0 03r.P03 03v.P12 04r.P05 06r.P08 08r.P10 15r.P01
    03r.P04   0  7  4  0 03r.P04 11v.P08
    03r.P05   0  5  1  0 03r.P05 08v.P04 16r.P09
    03r.P06   0  1  1  0 03r.P06,08 06r.P03 06v.P05,09 08r.P16 15r.P12
    03r.P07   0  6  3  0 03r.P07 04v.P12 11v.P07
    03r.P09   0  6  0  0 03r.P09
    03r.P10   0  3  1  0 03r.P10 08r.P14 11r.P12 11v.P10 11v.P16
    03r.P11   0  0  6  0 03r.P11 04r.P11 07r.P09 14v.P11
    03r.P12   0  0  1  0 03r.P12,15,16 07v.P14 15v.P09
    03r.P13   0  4  0  0 03r.P13 15r.P02 
    03r.P14   0  2  0  0 03r.P14 04v.P02 11r.P11
    03r.P17   1  0  2  0 03r.P17 08r.P09
    03r.P18   0  4  3  0 03r.P18
    03r.P19   0  0  2  0 03r.P19 08r.P01,02 08v.P13 11v.P02 12v.P07,09 13r.P09 13v.P04 15v.P04
    03v.P01   0  2  2  0 03v.P01,03,10 07v.P11 08r.P07 08v.P02 11r.P10 12r.P02,04,06 15v.P11
    03v.P02   0  3  5  0 03v.P02 07r.P02
    03v.P04   1  2  0  0 03v.P04
    03v.P05   0  4  2  0 03v.P05 03v.P13 08v.P01 11r.P07 11r.P08 15r.P13 16r.P02
    03v.P06   0  1  3  0 03v.P06 04v.P11 06v.P07 08v.P14 12r.P03 14v.P06 15v.P13
    03v.P08   3  7  3  1 03v.P08
    03v.P09   0  0  4  0 03v.P09 04v.P08 05r.P04 06r.P02 13r.P08,13 13v.P03,08,11
    03v.P11   0  2  4  0 03v.P11 08r.P06 08v.P09 11r.P09 13r.P12,14 14r.P07 14v.P07
    03v.P14   0  7  2  0 03v.P14 11v.P15 16r.P07
    04r.P01   0  1  5  2 04r.P01
    04r.P02   0  5  4  0 04r.P02 04v.P10 11r.P14
    04r.P03   0  0  0  0 04r.P03 05r.P07 06r.P01 08r.P11,12
    04r.P04   0  2  6  0 04r.P04 07r.P13 13r.P10 13v.P10
    04r.P06   1  1  4  0 04r.P06
    04r.P07   0  2  5  0 04r.P07 06v.P08 12r.P01
    04r.P08   0  0  7  0 04r.P08 14r.P03
    04r.P09   0  2  7  0 04r.P09 07v.P02 14v.P10
    04r.P10   0  0  5  0 04r.P10 04v.P09 05v.P02,05 06r.P05 06v.P12 07r.P06 13r.P03 13v.P06
    04r.P12   0  1  6  1 04r.P12 13r.P06
    04r.P13   0  3  8  0 04r.P13 08r.P04
    04v.P01   0  2  9  0 04v.P01
    04v.P03   0  1  6  0 04v.P03 05v.P08 12v.P10 13r.P04,16 14v.P08
    04v.P04   0  1  5  0 04v.P04 07v.P06 07v.P09 08v.P11 12r.P05 14v.P05
    04v.P05   0  3  3  1 04v.P05
    04v.P06   0  1  2  0 04v.P06 06r.P11,12 08r.P13 08v.P10 11r.P02 12r.P10 13r.P01 13v.P14 15r.P03,06,07
    04v.P07   0  1 11  1 04v.P07 14r.P02
    04v.P13   0  3  4  0 04v.P13 07r.P01 07v.P08 08r.P05 11r.P05 11v.P03 15r.P04
    05r.P01   1  0  8  0 05r.P01
    05r.P02   0  0  3  2 05r.P02
    05r.P03   0  0  4  1 05r.P03 05r.P08 05r.P09
    05r.P05   0  0 15  0 05r.P05
    05r.P06   0  1  2  1 05r.P06 12r.P07
    05r.P10   0  0  3  0 05r.P10 06r.P13 06v.P11,13,14 07r.P05 13r.P15 14r.P01
    05v.P01   0  0  8  0 05v.P01 05v.P09
    05v.P03   0  0  9  2 05v.P03
    05v.P04   0  0 12  2 05v.P04
    05v.P06   0  1  7  0 05v.P06 06v.P10
    05v.P07   0  0  5  1 05v.P07 14r.P06
    05v.P10   0  2 18  0 05v.P10
    06r.P04   0  1  4  0 06r.P04 08r.P03 11r.P01 12r.P12 13v.P02 14v.P09 15r.P11
    06r.P06   0  4  4  0 06r.P06 08v.P03,12 12v.P05 15v.P01
    06r.P07   0  2  8  0 06r.P07 07r.P04,11 13r.P02 13v.P13
    06r.P09   0  3  3  0 06r.P09 12v.P06 12v.P08
    06r.P10   0  3 12  0 06r.P10
    06r.P14   0  0  8  1 06r.P14
    06r.P15   0  7  9  0 06r.P15
    06v.P01   0  5  5  0 06v.P01
    06v.P02   0  4  9  0 06v.P02 06v.P15
    06v.P03   0  2  3  0 06v.P03,06 13v.P01 15v.P02,03
    06v.P04   0  1  5  1 06v.P04
    07r.P03   1  3  9  0 07r.P03
    07r.P07   0  1  8  2 07r.P07
    07r.P08   0  2  3  1 07r.P08
    07r.P10   0  2  9  1 07r.P10
    07r.P12   0  4  6  0 07r.P12 13r.P11
    07r.P14   0  1  8  0 07r.P14
    07r.P15   0  6  5  0 07r.P15
    07v.P01   0  5  8  1 07v.P01
    07v.P03   0  3  6  0 07v.P03 08r.P08 13v.P15
    07v.P04   0  5  9  0 07v.P04
    07v.P05   0  2 11  0 07v.P05 13v.P05
    07v.P07   0  3  2  0 07v.P07 08v.P16 11r.P03 15v.P07
    07v.P10   0  3  9  0 07v.P10
    07v.P12   0  4 10  0 07v.P12
    07v.P15   0  6  6  0 07v.P15
    08r.P15   0  2  1  0 08r.P15 08v.P05,15 12r.P09 12v.P13 15v.P10
    08v.P07   0  9  1  0 08v.P07
    08v.P08   1  6  2  0 08v.P08
    11r.P04   0  2  4  1 11r.P04 12v.P14
    11r.P13   0  4  1  0 11r.P13 11r.P15 11r.P16
    11r.P17   0  6  8  0 11r.P17
    11v.P01   0  5  0  0 11v.P01 11v.P11 11v.P13 16r.P03 16r.P05
    11v.P05   0  6  4  0 11v.P05 15r.P05
    11v.P06   0  7  5  0 11v.P06
    11v.P09   1 14  4  0 11v.P09
    11v.P12   0  8  2  0 11v.P12 14v.P01
    11v.P14   4  4  0  0 11v.P14
    11v.P17   0  9  4  0 11v.P17
    11v.P18   0  8  1  0 11v.P18
    11v.P19   0 10  4  1 11v.P19
    12r.P08   0  2  1  1 12r.P08 13r.P07
    12r.P11   0  2  2  1 12r.P11
    12v.P01   0  2  5  2 12v.P01
    12v.P02   0  2  6  2 12v.P02
    12v.P03   0  4  7  0 12v.P03
    12v.P04   0  5  7  1 12v.P04 14r.P09
    12v.P11   0  3  5  1 12v.P11 14v.P04
    12v.P12   1  3  4  0 12v.P12
    13r.P05   0  0  1  2 13r.P05
    13r.P17   0  3  7  0 13r.P17 14r.P05 14v.P03
    13v.P07   0  2 10  0 13v.P07
    13v.P09   0  1  8  4 13v.P09
    13v.P12   0  4  6  1 13v.P12
    14r.P04   0  0  3  1 14r.P04 14v.P02
    14r.P08   0  0  9  0 14r.P08
    14r.P10   0  2 11  1 14r.P10
    14r.P11   0  3  8  1 14r.P11
    14r.P12   0  2  7  1 14r.P12
    14r.P13   0  4  8  0 14r.P13
    14v.P12   0  2 13  1 14v.P12
    15r.P08   0  1  1  1 15r.P08
    15r.P09   0  0 10  1 15r.P09
    15r.P10   1  2  2  0 15r.P10
    15v.P05   0  7  1  0 15v.P05
    15v.P06   0  5  2  0 15v.P06 15v.P08 15v.P12
    16r.P01   0  6  0  1 16r.P01
    16r.P04   0  7  3  0 16r.P04 08v.P06
    16r.P06   1  2  1  0 16r.P06
    16r.P08   0  8  0  0 16r.P08
    16r.P10   1  5  4  0 16r.P10
    16r.P11   3 31  8  0 16r.P11
    16r.P12   3  8  0  0 16r.P12
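
  The fusing of identical entries can be sketched as a group-by on
  the four counts.  This is not the procedure actually used for the
  table above (which also compacts labels such as "03r.P06,08"); it
  only shows the grouping step, assuming rows of the form
  "label i0 i1 i2 i3" with the line counts already removed.
  "fuse_identical" is a hypothetical name.

```shell
#!/bin/sh
# Group rows by their four counts, keeping first-seen order.
# Output: first label, the counts, then all labels sharing them.
fuse_identical() {
  awk '
    { key = $2 " " $3 " " $4 " " $5
      if (key in ids) ids[key] = ids[key] " " $1
      else { order[++k] = key; first[key] = $1; ids[key] = $1 } }
    END { for (j = 1; j <= k; j++) {
            key = order[j]
            printf "%s %s %s\n", first[key], key, ids[key]
          } }
  '
}

printf '%s\n' '03r.P01 0 3 0 0' '03r.P02 1 3 0 0' '03v.P07 0 3 0 0' \
  | fuse_identical
```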
  
  Here is the fused table again, piped through
  
    sort-distr -s 12 -n 4 -d -g -fs -bs -r 1 -p .stars-para-dist.ppm 
    
  The justification for "-g" is that there may be confusion between
  -i^nd and -i^(n+1)d, so the suffixes are in a sense arranged in a line.
  
    page     i0 i1 i2 i3 pages with same counts
    -------  -- -- -- -- -------------------------
    03r.P01   0  3  0  0 03r.P01 03v.P07 07v.P13 11v.P04
    03r.P02   1  3  0  0 03r.P02
    03r.P03   0  1  0  0 03r.P03 03v.P12 04r.P05 06r.P08 08r.P10 15r.P01
    03r.P04   0  7  4  0 03r.P04 11v.P08
    03r.P05   0  5  1  0 03r.P05 08v.P04 16r.P09
    03r.P06   0  1  1  0 03r.P06,08 06r.P03 06v.P05,09 08r.P16 15r.P12
    03r.P07   0  6  3  0 03r.P07 04v.P12 11v.P07
    03r.P09   0  6  0  0 03r.P09
    03r.P10   0  3  1  0 03r.P10 08r.P14 11r.P12 11v.P10 11v.P16
    03r.P11   0  0  6  0 03r.P11 04r.P11 07r.P09 14v.P11
    03r.P12   0  0  1  0 03r.P12,15,16 07v.P14 15v.P09
    03r.P13   0  4  0  0 03r.P13 15r.P02 
    03r.P14   0  2  0  0 03r.P14 04v.P02 11r.P11
    03r.P17   1  0  2  0 03r.P17 08r.P09
    03r.P18   0  4  3  0 03r.P18
    03r.P19   0  0  2  0 03r.P19 08r.P01,02 08v.P13 11v.P02 12v.P07,09 13r.P09 13v.P04 15v.P04
    03v.P01   0  2  2  0 03v.P01,03,10 07v.P11 08r.P07 08v.P02 11r.P10 12r.P02,04,06 15v.P11
    03v.P02   0  3  5  0 03v.P02 07r.P02
    03v.P04   1  2  0  0 03v.P04
    03v.P05   0  4  2  0 03v.P05 03v.P13 08v.P01 11r.P07 11r.P08 15r.P13 16r.P02
    03v.P06   0  1  3  0 03v.P06 04v.P11 06v.P07 08v.P14 12r.P03 14v.P06 15v.P13
    03v.P08   3  7  3  1 03v.P08
    03v.P09   0  0  4  0 03v.P09 04v.P08 05r.P04 06r.P02 13r.P08,13 13v.P03,08,11
    03v.P11   0  2  4  0 03v.P11 08r.P06 08v.P09 11r.P09 13r.P12,14 14r.P07 14v.P07
    03v.P14   0  7  2  0 03v.P14 11v.P15 16r.P07
    04r.P01   0  1  5  2 04r.P01
    04r.P02   0  5  4  0 04r.P02 04v.P10 11r.P14
    04r.P03   0  0  0  0 04r.P03 05r.P07 06r.P01 08r.P11,12
    04r.P04   0  2  6  0 04r.P04 07r.P13 13r.P10 13v.P10
    04r.P06   1  1  4  0 04r.P06
    04r.P07   0  2  5  0 04r.P07 06v.P08 12r.P01
    04r.P08   0  0  7  0 04r.P08 14r.P03
    04r.P09   0  2  7  0 04r.P09 07v.P02 14v.P10
    04r.P10   0  0  5  0 04r.P10 04v.P09 05v.P02,05 06r.P05 06v.P12 07r.P06 13r.P03 13v.P06
    04r.P12   0  1  6  1 04r.P12 13r.P06
    04r.P13   0  3  8  0 04r.P13 08r.P04
    04v.P01   0  2  9  0 04v.P01
    04v.P03   0  1  6  0 04v.P03 05v.P08 12v.P10 13r.P04,16 14v.P08
    04v.P04   0  1  5  0 04v.P04 07v.P06 07v.P09 08v.P11 12r.P05 14v.P05
    04v.P05   0  3  3  1 04v.P05
    04v.P06   0  1  2  0 04v.P06 06r.P11,12 08r.P13 08v.P10 11r.P02 12r.P10 13r.P01 13v.P14 15r.P03,06,07
    04v.P07   0  1 11  1 04v.P07 14r.P02
    04v.P13   0  3  4  0 04v.P13 07r.P01 07v.P08 08r.P05 11r.P05 11v.P03 15r.P04
    05r.P01   1  0  8  0 05r.P01
    05r.P02   0  0  3  2 05r.P02
    05r.P03   0  0  4  1 05r.P03 05r.P08 05r.P09
    05r.P05   0  0 15  0 05r.P05
    05r.P06   0  1  2  1 05r.P06 12r.P07
    05r.P10   0  0  3  0 05r.P10 06r.P13 06v.P11,13,14 07r.P05 13r.P15 14r.P01
    05v.P01   0  0  8  0 05v.P01 05v.P09
    05v.P03   0  0  9  2 05v.P03
    05v.P04   0  0 12  2 05v.P04
    05v.P06   0  1  7  0 05v.P06 06v.P10
    05v.P07   0  0  5  1 05v.P07 14r.P06
    05v.P10   0  2 18  0 05v.P10
    06r.P04   0  1  4  0 06r.P04 08r.P03 11r.P01 12r.P12 13v.P02 14v.P09 15r.P11
    06r.P06   0  4  4  0 06r.P06 08v.P03,12 12v.P05 15v.P01
    06r.P07   0  2  8  0 06r.P07 07r.P04,11 13r.P02 13v.P13
    06r.P09   0  3  3  0 06r.P09 12v.P06 12v.P08
    06r.P10   0  3 12  0 06r.P10
    06r.P14   0  0  8  1 06r.P14
    06r.P15   0  7  9  0 06r.P15
    06v.P01   0  5  5  0 06v.P01
    06v.P02   0  4  9  0 06v.P02 06v.P15
    06v.P03   0  2  3  0 06v.P03,06 13v.P01 15v.P02,03
    06v.P04   0  1  5  1 06v.P04
    07r.P03   1  3  9  0 07r.P03
    07r.P07   0  1  8  2 07r.P07
    07r.P08   0  2  3  1 07r.P08
    07r.P10   0  2  9  1 07r.P10
    07r.P12   0  4  6  0 07r.P12 13r.P11
    07r.P14   0  1  8  0 07r.P14
    07r.P15   0  6  5  0 07r.P15
    07v.P01   0  5  8  1 07v.P01
    07v.P03   0  3  6  0 07v.P03 08r.P08 13v.P15
    07v.P04   0  5  9  0 07v.P04
    07v.P05   0  2 11  0 07v.P05 13v.P05
    07v.P07   0  3  2  0 07v.P07 08v.P16 11r.P03 15v.P07
    07v.P10   0  3  9  0 07v.P10
    07v.P12   0  4 10  0 07v.P12
    07v.P15   0  6  6  0 07v.P15
    08r.P15   0  2  1  0 08r.P15 08v.P05,15 12r.P09 12v.P13 15v.P10
    08v.P07   0  9  1  0 08v.P07
    08v.P08   1  6  2  0 08v.P08
    11r.P04   0  2  4  1 11r.P04 12v.P14
    11r.P13   0  4  1  0 11r.P13 11r.P15 11r.P16
    11r.P17   0  6  8  0 11r.P17
    11v.P01   0  5  0  0 11v.P01 11v.P11 11v.P13 16r.P03 16r.P05
    11v.P05   0  6  4  0 11v.P05 15r.P05
    11v.P06   0  7  5  0 11v.P06
    11v.P09   1 14  4  0 11v.P09
    11v.P12   0  8  2  0 11v.P12 14v.P01
    11v.P14   4  4  0  0 11v.P14
    11v.P17   0  9  4  0 11v.P17
    11v.P18   0  8  1  0 11v.P18
    11v.P19   0 10  4  1 11v.P19
    12r.P08   0  2  1  1 12r.P08 13r.P07
    12r.P11   0  2  2  1 12r.P11
    12v.P01   0  2  5  2 12v.P01
    12v.P02   0  2  6  2 12v.P02
    12v.P03   0  4  7  0 12v.P03
    12v.P04   0  5  7  1 12v.P04 14r.P09
    12v.P11   0  3  5  1 12v.P11 14v.P04
    12v.P12   1  3  4  0 12v.P12
    13r.P05   0  0  1  2 13r.P05
    13r.P17   0  3  7  0 13r.P17 14r.P05 14v.P03
    13v.P07   0  2 10  0 13v.P07
    13v.P09   0  1  8  4 13v.P09
    13v.P12   0  4  6  1 13v.P12
    14r.P04   0  0  3  1 14r.P04 14v.P02
    14r.P08   0  0  9  0 14r.P08
    14r.P10   0  2 11  1 14r.P10
    14r.P11   0  3  8  1 14r.P11
    14r.P12   0  2  7  1 14r.P12
    14r.P13   0  4  8  0 14r.P13
    14v.P12   0  2 13  1 14v.P12
    15r.P08   0  1  1  1 15r.P08
    15r.P09   0  0 10  1 15r.P09
    15r.P10   1  2  2  0 15r.P10
    15v.P05   0  7  1  0 15v.P05
    15v.P06   0  5  2  0 15v.P06 15v.P08 15v.P12
    16r.P01   0  6  0  1 16r.P01
    16r.P04   0  7  3  0 16r.P04 08v.P06
    16r.P06   1  2  1  0 16r.P06
    16r.P08   0  8  0  0 16r.P08
    16r.P10   1  5  4  0 16r.P10
    16r.P11   3 31  8  0 16r.P11
    16r.P12   3  8  0  0 16r.P12

    totals
     54  2 46 27  0
     46  4 40 31  1
     45  1 18 63  3
     44  0 26 59  2
     37  1  1 47  6
     38  0  4 83  5
     47  0 24 65  1
     47  0 24 65  1
     51  1 31 93  4
     49  0 43 85  1
     50  1 22 39  0
     53  1  ? 39  0
     54  0 50 45  1
     51  5 b3 41  1
     45  0 19 31  3
     47  1 34 59  7
     51  0 21 75  4
     49  0 20 83  5
     45  0 22 90  7
     41  0 24 65  3
     45  1 26 34  2
     45  0 38 28  2
     30  2 54 13  1
     20  6 39  8  0

97-11-29 stolfi
===============

  Created a sed script "fnum-to-pnum" that maps "f" page numbers
  (like f66r2) to sequential numbers 001-266.  Note that missing
  pages are included too.

    gawk '/@/{n++; printf "%s p%03d\n", $1, n; next} /-/{print; next}' 
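
  A rough Python equivalent of the gawk command above, as a minimal
  sketch only. Like the gawk patterns, it assumes that every page line
  contains an "@" and begins with the f-number, and that lines
  containing "-" are passed through unchanged; the sample input lines
  are made up for illustration and are not taken from the real page list.

```python
def number_pages(lines):
    """Mimic the gawk command: number every line that contains '@'."""
    out = []
    n = 0
    for line in lines:
        if "@" in line:
            n += 1
            fnum = line.split()[0]            # same as awk's $1
            out.append("%s p%03d" % (fnum, n))
        elif "-" in line:
            out.append(line)                  # printed as-is, like /-/{print}
        # lines matching neither pattern are dropped, as in the gawk version
    return out

# Hypothetical input, for illustration only.
sample = ["f1r @ x", "f1v @ y", "# -----", "f2r @ z"]
result = number_pages(sample)
print("\n".join(result))
```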
    
97-11-30 stolfi
===============

  Discovered that the smooth gradient in Denis's page counts
  is not surprising: two of the counts dominate, and my routine
  normalizes each page's counts to unit sum, so the data
  is inherently one-dimensional.
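
  A small sketch of that argument, using three rows of (-iiin, -iin,
  -in, -n) counts copied from the stars-section table in this entry.
  The normalization shown (divide each row by its sum) is my assumption
  about what "unit sum" means here; it is not the actual routine.

```python
# (-iiin, -iin, -in, -n) counts for three stars pages, from this entry's table.
counts = {
    "f103r": (0, 33, 41, 2),
    "f105r": (6, 48, 1, 1),
    "f116r": (1, 25, 90, 8),
}

def normalize(row):
    """Scale a row of counts so that it sums to 1 (assumed normalization)."""
    total = float(sum(row))
    return [c / total for c in row]

for page, row in sorted(counts.items()):
    iiin, iin, in_, n = normalize(row)
    # iin + in_ is close to 1, so in_ is nearly determined by iin alone.
    print("%s  iin=%.3f  in=%.3f  iin+in=%.3f" % (page, iin, in_, iin + in_))
```

  Since the two dominant fractions nearly sum to one, each page is
  described almost entirely by a single number.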
  
  Here is an attempt to reorder the stars pages by hand so as 
  to make the ratio count(-iin)/(count(-iin)+count(-in)), shown
  in the last column, more uniform:
  
    page     words -iiin  -iin   -in    -n   ratio
    -------- ----- ----- ----- ----- -----   -----
    f103r TA   526     0    33    41     2   0.446
    f103v TB   454     1    34    37     4   0.479
    f108r TK   494     0    39    22     1   0.639
    f108v TL   581     0    52    39     1   0.571
    f104r TC   448     1    66    17     1   0.795
    f104v TD   477     3    59    24     0   0.711
    f107r TI   487     4    93    30     1   0.756
    f107v TJ   462     1    84    43     0   0.661
    f114r TS   460     5    91    23     0   0.798
    f114v TT   376     2    68    23     0   0.747
    f106r TG   432     1    65    24     0   0.730
    f106v TH   444     1    67    23     0   0.744
    f113r TQ   528     4    79    21     0   0.790
    f113v TR   502     5    84    20     0   0.808
    f105r TE   379     6    48     1     1   0.980
    f105v TF   399     5    85     4     0   0.955
    f112r TO   401     3    32    21     0   0.604
    f112v TP   420     7    60    33     1   0.645
    f115r TU   461     1    40    21     2   0.656
    f115v TV   410     2    32    33     0   0.492
    f111r TM   623     1    44    51     0   0.463
    f111v TN   568     1    41   113     6   0.266
    f116r TW   554     1    25    90     8   0.217
    f116v TW     0     0     0     0     0   0.000
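
  The tabulated ratio column appears to be count(-iin) divided by
  count(-iin) + count(-in). A minimal sketch that reproduces a few of
  the values above; the zero result for an empty page is an assumption
  chosen to match the f116v row.

```python
def iin_ratio(iin, in_):
    """count(-iin) / (count(-iin) + count(-in)); 0.0 when both are zero."""
    total = iin + in_
    return iin / float(total) if total else 0.0

# (page, -iin count, -in count), copied from the table above.
rows = [("f103r", 33, 41), ("f105r", 48, 1), ("f116r", 25, 90), ("f116v", 0, 0)]
ratios = [(page, round(iin_ratio(a, b), 3)) for page, a, b in rows]
for page, r in ratios:
    print("%s %.3f" % (page, r))
```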

  Creating a picture of this sorted data:
  
    sort-distr -s 18 -n 4 -d -p - -r 0 | pnmscale 8 | ppmtogif > .stars-bh-dist.gif  
    xv .stars-bh-dist.gif 

  Another attempt:
  
    page     words -iiin  -iin   -in    -n   ratio
    -------- ----- ----- ----- ----- -----   -----
    f103r TA   526     0    33    41     2   0.446
    f103v TB   454     1    34    37     4   0.479
    f108r TK   494     0    39    22     1   0.639
    f108v TL   581     0    52    39     1   0.571
    f104r TC   448     1    66    17     1   0.795
    f104v TD   477     3    59    24     0   0.711
    f107r TI   487     4    93    30     1   0.756
    f107v TJ   462     1    84    43     0   0.661
    f113r TQ   528     4    79    21     0   0.790
    f113v TR   502     5    84    20     0   0.808
    f105r TE   379     6    48     1     1   0.980
    f105v TF   399     5    85     4     0   0.955
    f114r TS   460     5    91    23     0   0.798
    f114v TT   376     2    68    23     0   0.747
    f106r TG   432     1    65    24     0   0.730
    f106v TH   444     1    67    23     0   0.744
    f112r TO   401     3    32    21     0   0.604
    f112v TP   420     7    60    33     1   0.645
    f115r TU   461     1    40    21     2   0.656
    f115v TV   410     2    32    33     0   0.492
    f111r TM   623     1    44    51     0   0.463
    f111v TN   568     1    41   113     6   0.266
    f116r TW   554     1    25    90     8   0.217
    f116v TW     0     0     0     0     0   0.000
  
    sort-distr -s 18 -n 4 -d -p - -r 0 | pnmscale 8 | ppmtogif > .stars-h2-dist.gif  
    xv .stars-h2-dist.gif 

  Yet another attempt:
  
    page     words -iiin  -iin   -in    -n   ratio
    -------- ----- ----- ----- ----- -----   -----
    f103r TA   526     0    33    41     2   0.446
    f103v TB   454     1    34    37     4   0.479
    f108r TK   494     0    39    22     1   0.639
    f108v TL   581     0    52    39     1   0.571
    f104r TC   448     1    66    17     1   0.795
    f104v TD   477     3    59    24     0   0.711
    f112r TO   401     3    32    21     0   0.604
    f112v TP   420     7    60    33     1   0.645
    f113r TQ   528     4    79    21     0   0.790
    f113v TR   502     5    84    20     0   0.808
    f105r TE   379     6    48     1     1   0.980
    f105v TF   399     5    85     4     0   0.955
    f114r TS   460     5    91    23     0   0.798
    f114v TT   376     2    68    23     0   0.747
    f106r TG   432     1    65    24     0   0.730
    f106v TH   444     1    67    23     0   0.744
    f107r TI   487     4    93    30     1   0.756
    f107v TJ   462     1    84    43     0   0.661
    f115r TU   461     1    40    21     2   0.656
    f115v TV   410     2    32    33     0   0.492
    f111r TM   623     1    44    51     0   0.463
    f111v TN   568     1    41   113     6   0.266
    f116r TW   554     1    25    90     8   0.217
    f116v TW     0     0     0     0     0   0.000
    
    sort-distr -s 18 -n 4 -d -p - -r 0 | pnmscale 8 | ppmtogif > .stars-h3-dist.gif  
    xv .stars-h3-dist.gif 

  Let's look at f58r/f58v too:
  
    foreach s ( n in iin iiin )
      cat L16-eva/f58r.P | egrep '[^i]'"$s"'[-,. =]' > .f58r-$s.evt
    end
  
    page     words -iiin  -iin   -in    -n   ratio
    -------- ----- ----- ----- ----- -----   -----
    f058r HB   362     0    29     1     0   0.967
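
  For reference, a Python sketch of the same pattern used in the egrep
  loop above: one non-"i" character, the suffix, then a separator.
  Unlike egrep, which selects whole lines, this counts individual
  occurrences. The sample string is made up, not real transcription text.

```python
import re

SUFFIXES = ["iiin", "iin", "in", "n"]
SEPARATORS = "[-,. =]"    # same separator class as in the egrep pattern

def count_endings(text):
    """Count words ending in each suffix, not preceded by a further 'i'."""
    counts = {}
    for s in SUFFIXES:
        pattern = "[^i]" + s + SEPARATORS
        counts[s] = len(re.findall(pattern, text))
    return counts

# Hypothetical sample; every word must be followed by a separator to match.
sample = "daiin.okain.qokaiin,dan-daiiin="
result = count_endings(sample)
print(result)
```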
  
  Let's have a closer look at the occurrences of "daiin" in the stars section:

    rm -f .daiin-stars.occs
    foreach f ( L16-eva/f{103,104,105,106,107,108,111,112,113,114,115,116}{r,v}.P* )
      echo $f
      echo '# '$f >> .daiin-stars.occs
      cat $f | egrep '[-= ,.]daiin|^#' >> .daiin-stars.occs
    end
    
  Edited .daiin-stars.occs by hand, removing/adding adjacent words until
  each occurrence of "daiin" is on a separate line with 2 words on either side.
  Result: 208 occurrences of "daiin" in the stars section.
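
  The hand editing could in principle be automated. A minimal sketch,
  assuming the text has already been split into a flat list of words
  (which ignores line and paragraph breaks) and that only exact matches
  of "daiin" are wanted; the sample word sequence is made up.

```python
def contexts(words, target="daiin", width=2):
    """Return one window per occurrence of target, padded with '' at the ends."""
    pad = [""] * width
    padded = pad + list(words) + pad
    out = []
    for i, w in enumerate(words):
        if w == target:
            # padded index of words[i] is i + width, so the window starts at i
            out.append(padded[i:i + 2 * width + 1])
    return out

# Hypothetical word sequence, for illustration only.
sample = "qokeey cheo daiin chedy okeey daiin shey".split()
windows = contexts(sample)
for w in windows:
    print(" ".join(x if x else "-" for x in w))
```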

  Let's look also at "saiin":

    rm -f .saiin-stars.occs
    foreach f ( L16-eva/f{103,104,105,106,107,108,111,112,113,114,115,116}{r,v}.P* )
      echo $f
      echo '# '$f >> .saiin-stars.occs
      cat $f | egrep '[-= ,.]saiin|^#' >> .saiin-stars.occs
    end

  Many of the "daiin" and "saiin" occurrences are at the beginning of
  a line (but not the first line of the paragraph).
  
  Some of them are at the end of a paragraph.

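  These are the words that occur near "daiin". The three column
  groups are, left to right, the first word before, the first word
  after, and the second word after "daiin" (they match the "shape"
  summaries further down); "ct" is the count, "rfreq" the relative
  frequency, and "cfreq" the cumulative frequency: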

    ct rfreq cfreq word           ct rfreq cfreq word            ct rfreq cfreq word       
    -- ----- ----- -----------    -- ----- ----- -----------     -- ----- ----- -----------
     7 0.034 0.034 cheo            8 0.040 0.040 chedy            6 0.029 0.029 chedy
     5 0.024 0.058 chedy           7 0.035 0.075 chey             5 0.024 0.053 okeey
     5 0.024 0.082 qokeey          5 0.025 0.100 cheey            5 0.024 0.077 qokeey
     3 0.014 0.096 oteo            5 0.025 0.125 shey             4 0.019 0.096 daiin
     3 0.014 0.111 sheeo           3 0.015 0.140 al               3 0.014 0.111 ar
     2 0.010 0.120 chckhy          3 0.015 0.155 ar               3 0.014 0.125 chol
     2 0.010 0.130 chdy            3 0.015 0.170 daiin            3 0.014 0.139 lchedy
     2 0.010 0.139 cheeo           3 0.015 0.185 sheol            3 0.014 0.154 lshey
     2 0.010 0.149 chey            2 0.010 0.195 char             3 0.014 0.168 okaiin
     2 0.010 0.159 daiin           2 0.010 0.205 chedar           3 0.014 0.183 qokaiin

     2 0.010 0.168 dal             2 0.010 0.215 cheol            3 0.014 0.197 qokal         
     2 0.010 0.178 dalam           2 0.010 0.225 chl              3 0.014 0.212 qokeedy       
     2 0.010 0.188 keeo            2 0.010 0.235 okar             3 0.014 0.226 qotchedy      
     2 0.010 0.197 llchey          2 0.010 0.245 okeey            2 0.010 0.236 aiin          
     2 0.010 0.207 okal            2 0.010 0.255 ol               2 0.010 0.245 chedal        
     2 0.010 0.216 ol              2 0.010 0.265 otal             2 0.010 0.255 chodaiin      
     2 0.010 0.226 otal            2 0.010 0.275 otaral           2 0.010 0.264 dal           
     2 0.010 0.236 qokchedy        2 0.010 0.285 otedy            2 0.010 0.274 lkeey         
     2 0.010 0.245 qokeeal         2 0.010 0.295 oteey            2 0.010 0.284 lshedy        
     2 0.010 0.255 qokeeo          2 0.010 0.305 qokchdy          2 0.010 0.293 oky           

     2 0.010 0.264 qopchedy        2 0.010 0.315 qotalal          2 0.010 0.303 otar          
     2 0.010 0.274 sheey           2 0.010 0.325 shaiin           2 0.010 0.312 otedy         
     2 0.010 0.284 shockhy         2 0.010 0.335 shedy            2 0.010 0.322 oteey         
     2 0.010 0.293 ycheo           2 0.010 0.345 sheed            2 0.010 0.332 oteody        
     1 0.005 0.298 acthy           2 0.010 0.355 sheey            2 0.010 0.341 qodaiin       
     1 0.005 0.303 aiin            2 0.010 0.365 shek             2 0.010 0.351 qokar         
     1 0.005 0.308 ainkam          2 0.010 0.375 sheody           2 0.010 0.361 qokedy        
     1 0.005 0.312 al              2 0.010 0.385 shody            2 0.010 0.370 qokeol        
     1 0.005 0.317 alky            2 0.010 0.395 shol             2 0.010 0.380 qoky          
     1 0.005 0.322 alol            1 0.005 0.400 aiin             2 0.010 0.389 qoty          
     1 0.005 0.327 am              1 0.005 0.405 airols           2 0.010 0.399 saiin         
     1 0.005 0.332 ar              1 0.005 0.410 aky              2 0.010 0.409 sheey         
     1 0.005 0.337 aralary         1 0.005 0.415 alaiin           2 0.010 0.418 tchedy        
     1 0.005 0.341 archcthy        1 0.005 0.420 alal             2 0.010 0.428 teeedy        
     1 0.005 0.346 chcphydy        1 0.005 0.425 aldair           1 0.005 0.433 *asor         
     1 0.005 0.351 chdaly          1 0.005 0.430 alsar            1 0.005 0.438 akaiin        
     1 0.005 0.356 chea            1 0.005 0.435 aral             1 0.005 0.442 chckhaiin     
     1 0.005 0.361 chedaiin        1 0.005 0.440 aroteey          1 0.005 0.447 chcphedy      
     1 0.005 0.365 chedal          1 0.005 0.445 chckhy           1 0.005 0.452 chdar         
     1 0.005 0.370 chedyrl         1 0.005 0.450 chcthar          1 0.005 0.457 chdor         
     1 0.005 0.375 cheeey          1 0.005 0.455 chcthdy          1 0.005 0.462 chdy          
     1 0.005 0.380 cheey           1 0.005 0.460 chcthed          1 0.005 0.466 cheal         
     1 0.005 0.385 cheky           1 0.005 0.465 chcthy           1 0.005 0.471 chear         
     1 0.005 0.389 cheoda*         1 0.005 0.470 cheaiin          1 0.005 0.476 checkhey      
     1 0.005 0.394 cheody          1 0.005 0.475 cheal            1 0.005 0.481 checkhy       
     1 0.005 0.399 cheol           1 0.005 0.480 checkhy          1 0.005 0.486 cheeal        
     1 0.005 0.404 cheot           1 0.005 0.485 checthal         1 0.005 0.490 cheedy        
     1 0.005 0.409 chllkeey        1 0.005 0.490 ched             1 0.005 0.495 cheeky        
     1 0.005 0.413 cho             1 0.005 0.495 chedaiin         1 0.005 0.500 cheocthy      
     1 0.005 0.418 chockhey        1 0.005 0.500 chedal           1 0.005 0.505 cheodaiin     
     1 0.005 0.423 chodeeal        1 0.005 0.505 cheedy           1 0.005 0.510 chey          
     1 0.005 0.428 chody           1 0.005 0.510 cheeeo           1 0.005 0.514 chocthy       
     1 0.005 0.433 chotam          1 0.005 0.515 cheeir           1 0.005 0.519 chody         
     1 0.005 0.438 chotchedy       1 0.005 0.520 cheeteey         1 0.005 0.524 chokedair     
     1 0.005 0.442 chy             1 0.005 0.525 chekeek          1 0.005 0.529 choty         
     1 0.005 0.447 cphaiin         1 0.005 0.530 cheo             1 0.005 0.534 dchedy        
     1 0.005 0.452 dail            1 0.005 0.535 cheocthy         1 0.005 0.538 deeedy        
     1 0.005 0.457 dala            1 0.005 0.540 cheodaiin        1 0.005 0.543 dol           
     1 0.005 0.462 dched           1 0.005 0.545 cheodar          1 0.005 0.548 dsheeo        
     1 0.005 0.466 dcheo           1 0.005 0.550 cheolor          1 0.005 0.553 eedol         
     1 0.005 0.471 dchol           1 0.005 0.555 chkaiin          1 0.005 0.558 eeykeody      
     1 0.005 0.476 decthdy         1 0.005 0.560 choaiin          1 0.005 0.562 kair          
     1 0.005 0.481 eedy            1 0.005 0.565 chocfhdy         1 0.005 0.567 kal           
     1 0.005 0.486 kar             1 0.005 0.570 chody            1 0.005 0.572 kchdy         
     1 0.005 0.490 kchedy          1 0.005 0.575 chol             1 0.005 0.577 kchedy        
     1 0.005 0.495 kcheo           1 0.005 0.580 cholchey         1 0.005 0.582 keedal        
     1 0.005 0.500 keesho          1 0.005 0.585 chopchy          1 0.005 0.587 keeo          
     1 0.005 0.505 keol            1 0.005 0.590 chotaiin         1 0.005 0.591 kolkair       
     1 0.005 0.510 ky              1 0.005 0.595 chsd             1 0.005 0.596 lcheeol       
     1 0.005 0.514 l               1 0.005 0.600 ckheol           1 0.005 0.601 lechody       
     1 0.005 0.519 larorol         1 0.005 0.605 dal              1 0.005 0.606 lkar          
     1 0.005 0.524 lchedam         1 0.005 0.610 dam              1 0.005 0.611 lkeeol        
     1 0.005 0.529 lkaiiir         1 0.005 0.615 dar              1 0.005 0.615 lkol          
     1 0.005 0.534 lkal            1 0.005 0.620 daram            1 0.005 0.620 lky           
     1 0.005 0.538 lkam            1 0.005 0.625 daryom           1 0.005 0.625 oain          
     1 0.005 0.543 lkeeeady        1 0.005 0.630 dchdos           1 0.005 0.630 oar           
     1 0.005 0.548 lkeo            1 0.005 0.635 dchedar          1 0.005 0.635 ocheey        
     1 0.005 0.553 lkeol           1 0.005 0.640 dckhy            1 0.005 0.639 octhd         
     1 0.005 0.558 lklor           1 0.005 0.645 dshedal          1 0.005 0.644 odair         
     1 0.005 0.562 llod            1 0.005 0.650 lkchedy          1 0.005 0.649 okchey        
     1 0.005 0.567 lm              1 0.005 0.655 lor              1 0.005 0.654 okechey       
     1 0.005 0.572 lteedy          1 0.005 0.660 ochedaiin        1 0.005 0.659 okedy         
     1 0.005 0.577 ochedaiin       1 0.005 0.665 ockhedy          1 0.005 0.663 okeedaiin     
     1 0.005 0.582 ochedal         1 0.005 0.670 octhd            1 0.005 0.668 okeedy        
     1 0.005 0.587 ocheey          1 0.005 0.675 octhdy           1 0.005 0.673 okeeedy       
     1 0.005 0.591 ofam            1 0.005 0.680 octhy            1 0.005 0.678 okeeshy       
     1 0.005 0.596 ofar            1 0.005 0.685 ofchedaiin       1 0.005 0.683 okol          
     1 0.005 0.601 okaiin          1 0.005 0.690 okaiin           1 0.005 0.688 ol            
     1 0.005 0.606 okchedy         1 0.005 0.695 okairdy          1 0.005 0.692 oldaiin       
     1 0.005 0.611 okchey          1 0.005 0.700 okal             1 0.005 0.697 olkchey       
     1 0.005 0.615 okchy           1 0.005 0.705 okchey           1 0.005 0.702 olkeedaiin    
     1 0.005 0.620 okeedy          1 0.005 0.710 okedal           1 0.005 0.707 olkeeey       
     1 0.005 0.625 okey            1 0.005 0.715 okedy            1 0.005 0.712 olshy         
     1 0.005 0.630 oleedy          1 0.005 0.720 okeedaky         1 0.005 0.716 opailo        
     1 0.005 0.635 olkaey          1 0.005 0.725 okeedy           1 0.005 0.721 opchedaiin    
     1 0.005 0.639 olky            1 0.005 0.730 okey             1 0.005 0.726 opcheed       
     1 0.005 0.644 olr             1 0.005 0.735 olaiin           1 0.005 0.731 oraiin        
     1 0.005 0.649 oly             1 0.005 0.740 olam             1 0.005 0.736 otaiin        
     1 0.005 0.654 om              1 0.005 0.745 olkaiin          1 0.005 0.740 otair         
     1 0.005 0.659 opaiin          1 0.005 0.750 olkaiir          1 0.005 0.745 otarar        
     1 0.005 0.663 opaik           1 0.005 0.755 oly              1 0.005 0.750 otchedy       
     1 0.005 0.668 opalam          1 0.005 0.760 opairam          1 0.005 0.755 otchod        
     1 0.005 0.673 opam            1 0.005 0.765 opal             1 0.005 0.760 otechdy       
     1 0.005 0.678 opchdy          1 0.005 0.770 or               1 0.005 0.764 otedal        
     1 0.005 0.683 opchy           1 0.005 0.775 oraiin           1 0.005 0.769 oteor         
     1 0.005 0.688 or              1 0.005 0.780 otar             1 0.005 0.774 oteoy         
     1 0.005 0.692 oram            1 0.005 0.785 oteedaiin        1 0.005 0.779 pcheol        
     1 0.005 0.697 ore             1 0.005 0.790 oteedo           1 0.005 0.784 pchor         
     1 0.005 0.702 orkchdy         1 0.005 0.795 oteol            1 0.005 0.788 pdal          
     1 0.005 0.707 os              1 0.005 0.800 por              1 0.005 0.793 pdaro         
     1 0.005 0.712 osh*o           1 0.005 0.805 qkair            1 0.005 0.798 qckheey       
     1 0.005 0.716 oshey           1 0.005 0.810 qkeodaiin        1 0.005 0.803 qlky          
     1 0.005 0.721 otaiin          1 0.005 0.815 qoair            1 0.005 0.808 qoeedaiin     
     1 0.005 0.726 otaiinodaly     1 0.005 0.820 qoeedaiin        1 0.005 0.812 qoek          
     1 0.005 0.731 otaik           1 0.005 0.825 qoek             1 0.005 0.817 qokairar      
     1 0.005 0.736 otam            1 0.005 0.830 qofchdar         1 0.005 0.822 qokchdy       
     1 0.005 0.740 otar            1 0.005 0.835 qokaiin          1 0.005 0.827 qokchey       
     1 0.005 0.745 otary           1 0.005 0.840 qokchedy         1 0.005 0.832 qokechy       
     1 0.005 0.750 otaryly         1 0.005 0.845 qokchey          1 0.005 0.837 qokedar       
     1 0.005 0.755 otcham          1 0.005 0.850 qokeeo           1 0.005 0.841 qokeeey       
     1 0.005 0.760 otcheo          1 0.005 0.855 qokeeody         1 0.005 0.846 qokeeo        
     1 0.005 0.764 otchey          1 0.005 0.860 qokeey           1 0.005 0.851 qotaiin       
     1 0.005 0.769 oteeey          1 0.005 0.865 qopol            1 0.005 0.856 qotal         
     1 0.005 0.774 oteey           1 0.005 0.870 saiin            1 0.005 0.861 qotar         
     1 0.005 0.779 oteol           1 0.005 0.875 shal             1 0.005 0.865 qotchy        
     1 0.005 0.784 otey            1 0.005 0.880 shechy           1 0.005 0.870 qotear        
     1 0.005 0.788 oto             1 0.005 0.885 sheckhy          1 0.005 0.875 qoteey        
     1 0.005 0.793 pcha            1 0.005 0.890 shecthey         1 0.005 0.880 qoteody       
     1 0.005 0.798 pchal           1 0.005 0.895 shecthy          1 0.005 0.885 qoteol        
     1 0.005 0.803 pcheo           1 0.005 0.900 shedaiin         1 0.005 0.889 r             
     1 0.005 0.808 qckhey          1 0.005 0.905 sheeal           1 0.005 0.894 raiin         
     1 0.005 0.812 qekor           1 0.005 0.910 sheedy           1 0.005 0.899 rain          
     1 0.005 0.817 qodaiin         1 0.005 0.915 sheekchy         1 0.005 0.904 ralom         
     1 0.005 0.822 qokairy         1 0.005 0.920 sheeky           1 0.005 0.909 sair          
     1 0.005 0.827 qokam           1 0.005 0.925 sheet            1 0.005 0.913 sar           
     1 0.005 0.832 qokaram         1 0.005 0.930 sheor            1 0.005 0.918 saraiin       
     1 0.005 0.837 qokchy          1 0.005 0.935 shl              1 0.005 0.923 sheckhy       
     1 0.005 0.841 qokeedaram      1 0.005 0.940 tair             1 0.005 0.928 shedar        
     1 0.005 0.846 qokol           1 0.005 0.945 tchar            1 0.005 0.933 sheed         
     1 0.005 0.851 qopchdy         1 0.005 0.950 teodaiin         1 0.005 0.938 sheedy        
     1 0.005 0.856 qotaiin         1 0.005 0.955 ychedal          1 0.005 0.942 sheeky        
     1 0.005 0.861 qotam           1 0.005 0.960 ycheeo           1 0.005 0.947 sheeodar      
     1 0.005 0.865 qotar           1 0.005 0.965 ydaiin           1 0.005 0.952 sheeol        
     1 0.005 0.870 qotchy          1 0.005 0.970 ykchedy          1 0.005 0.957 solpchd       
     1 0.005 0.875 qotedar         1 0.005 0.975 ykeedan          1 0.005 0.962 tchar         
     1 0.005 0.880 qoteody         1 0.005 0.980 ykeedy           1 0.005 0.966 teeoar        
     1 0.005 0.885 qotey           1 0.005 0.985 yokoey           1 0.005 0.971 teody         
     1 0.005 0.889 qoty            1 0.005 0.990 ytam             1 0.005 0.976 ty            
     1 0.005 0.894 r               1 0.005 0.995 ytar             1 0.005 0.981 ykchedy       
     1 0.005 0.899 raiin           1 0.005 1.000 yteedy           1 0.005 0.986 ykeey         
     1 0.005 0.904 rodam                                          1 0.005 0.990 yteedy        
     1 0.005 0.909 rol                                            1 0.005 0.995 yteeody       
     1 0.005 0.913 ry                                             1 0.005 1.000 yteody        
     1 0.005 0.918 sham           
     1 0.005 0.923 shchy          
     1 0.005 0.928 sheal
     1 0.005 0.933 sheoked
     1 0.005 0.938 shey
     1 0.005 0.942 shod
     1 0.005 0.947 ssheo
     1 0.005 0.952 tchedaiin
     1 0.005 0.957 tedam
     1 0.005 0.962 teeo
     1 0.005 0.966 tolpchy
     1 0.005 0.971 tsho
     1 0.005 0.976 ycheeo
     1 0.005 0.981 yka*om
     1 0.005 0.986 ykcheo
     1 0.005 0.990 ykeeo
     1 0.005 0.995 ykeo
     1 0.005 1.000 ysheo
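
  A sketch of how a table with these columns can be built from a list
  of neighbouring words: sort by decreasing count (ties alphabetically,
  as the table above seems to be), then accumulate. The function names
  and the sample list are hypothetical.

```python
from collections import Counter

def freq_table(words):
    """Rows of (ct, rfreq, cfreq, word), most frequent first, ties alphabetical."""
    counts = Counter(words)
    total = float(sum(counts.values()))
    rows, cum = [], 0
    for word, ct in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cum += ct
        rows.append((ct, round(ct / total, 3), round(cum / total, 3), word))
    return rows

def coverage(rows, k):
    """Cumulative frequency covered by the k most frequent words."""
    return rows[min(k, len(rows)) - 1][2] if rows else 0.0

# Hypothetical sample, for illustration only.
sample = ["cheo"] * 3 + ["chedy"] * 2 + ["oteo", "dal", "ol"]
table = freq_table(sample)
for ct, rfreq, cfreq, word in table:
    print("%2d %.3f %.3f %s" % (ct, rfreq, cfreq, word))
```

  The "first 10 words account for ..." figures correspond to reading
  the cfreq column at row 10 or row 20, which is what coverage() does.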


    Second word before "daiin", sorted by shape:
    -- ----------------------------------------------------------------
     7 aiin 6 okaiin 6 qokaiin 2 lkaiin
     6 chedy 5 lchedy 3 cheey 2 lfchedy
     4 otar 3 qotar
     4 oteedy 3 qokey 2 okeey 
     3 daiin 2 dair
     3 dar
     3 otchedy
     
    first 10 words account for 23% of all "daiin"s
    first 20 words account for 34% of all "daiin"s

    First word before "daiin", sorted by shape:
    -- ----------------------------------------------------------------
    23 cheo sheeo cheeo chedy chdy chey llchey 
     9 qokeey qokeeo keeo
     3 oteo
     2 daiin  
     2 qokeeal 2 okal 2 otal 
     2 qokchedy   

    first 10 words account for 15% of all "daiin"s
    first 20 words account for 25% of all "daiin"s

    First word after "daiin", sorted by shape:
    -- ----------------------------------------------------------------
    40 chedy chey cheey shey shedy sheed sheey sheody shody sheol cheol
     8 al ar ol
     3 daiin

    first 10 words account for 20% of all "daiin"s
    first 20 words account for 30% of all "daiin"s

    Second word after "daiin", sorted by shape:
    -- ----------------------------------------------------------------
    30 okeey qokeey qokeedy qotchedy qokedy qokeol lkeey otedy oteey oteody teeedy
    16 chedy chedal lchedy lshey lshedy
     7 qokal qokar otar
     6 daiin qodaiin
     6 okaiin qokaiin
     6 oky qoky qoty
     3 ar
     3 chol
     2 saiin

    first 10 words account for 18% of all "daiin"s
    first 20 words account for 30% of all "daiin"s

  These words are tentative members of the "daiin" constellation:
  
    chedy chey shey cheo cheey  
    okeey oteey qokeey qokeedy otedy
    chedar sheol chol 
    okaiin qokaiin

  And these may be associates:

    al ar ol 
 
97-12-01 stolfi
===============

  While trying to redo the label occurrence maps, I noticed this
  strong correlation between "shedy" and "okal/qokal":
  
    shedy     B p150   427    .   .   .   4   7   6  10   4   2  15  44  46  25  32  44  59   5   1   3   1  31   7   4  14  12  17   9   3   9  13 f77v.L.4;U
    [q]okal   Z p138   314    2   2   4   1   5   5   4   7   9  15  21  22  11  27  28  21   9   4  15   7  13   7   4  17  20   7  10   8   6   3 f72r2.S.5;K

  Now "shedy" is part of a label on f77v (the bottom left tube), and
  "okal" is Rene/Robert's conjecture for the name of the Sun.
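
  One way to put a number on that correlation is a plain Pearson
  coefficient over the 30 per-bin counts of the two rows. This is only
  a sketch: it assumes that "." means zero and that the fourth field is
  the row total, and Pearson is my choice of measure, not something
  the maps themselves use.

```python
import math

# The two rows quoted above, copied verbatim ("." is taken to mean zero).
ROWS = [
    "shedy     B p150   427    .   .   .   4   7   6  10   4   2  15  44  46  25  32  44  59   5   1   3   1  31   7   4  14  12  17   9   3   9  13 f77v.L.4;U",
    "[q]okal   Z p138   314    2   2   4   1   5   5   4   7   9  15  21  22  11  27  28  21   9   4  15   7  13   7   4  17  20   7  10   8   6   3 f72r2.S.5;K",
]

def parse_row(line):
    """Split a map row into (word, total, counts); other fields are ignored."""
    fields = line.split()
    word, total = fields[0], int(fields[3])
    counts = [0 if f == "." else int(f) for f in fields[4:-1]]
    return word, total, counts

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

(_, t1, c1), (_, t2, c2) = [parse_row(row) for row in ROWS]
r = pearson(c1, c2)
print("totals check:", sum(c1) == t1, sum(c2) == t2)
print("Pearson r = %.2f" % r)
```

  The totals check is a cheap guard against mis-copied columns: the
  per-bin counts of each row should add up to its stated total.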
  
  Here are some families of labels with clearly similar patterns of reference:

    1 otaiin           Z p137   346    7   4   8   5   3   8   1   7   1   1   4  10  19  23  10   7  18   4   6   5  11  22  15  32  10  11  28  27  28  11 f72r1.S.13;K
    1 aiin             A p127   378    1   3   1   3  11  13   5   4  12   6   4   7  12   6   5   5  37  16   6   9   7  16  32  25  20  22  22  23  19  26 f68v2.R.9;C
    1 oteey            A p119   133    1   2   3   .   1   2   2   1   2   2   3   2   3   4   5   3   1   .   6   6  13   6   3   9  11  11  11   7   5   8 f67r1.S.6;C

    3 okaiin           P p181   758    4   2   9   7  12  24   5   6  12   9  28  33  41  69  42  31  24   6  17   4  28  34  35  44  50  23  54  48  21  36 f89r1.m.3;K
    3 otar             Z p136   184    .   2   .   1   4   3   3   6   5   4   8   5   7   7   5   8  19   2   9   .   6  11  10   9   4   3   5  15  13  10 f71v.S2.4;K
    3 otal             Z p138   198    1   2   .   3   2   4   3   6   4   9   3   9   9  12  14   6  19   2   5   2   6   9   5  15  15   1   9   7   7   9 f72r2.S.1;K

    9 tar              Z p134    47    1   .   .   .   1   2   .   1   3   2   .   .   3   1   4   1   7   1   5   .   1   2   .   2   2   3   3   1   1   . f70v1.S.6;K
    9 okar             T p114   255    .   1   1   1   9  16   6   9  11   9  19   1  10  12   9   6  29   5  17   2   2  11   3   7  14   2  11  13   7  12 f58v.T.1;U
    9 saiin            T p042   238    4   3   5   4   7   8   5   3   5   6  17  15  12   2  11  10   5  19   9   7   8   6   6   4   5  15  20   4   7   6 f22v.T.16;F
    9 orar             P p206    13    .   .   .   .   .   .   .   .   1   .   1   .   .   2   .   1   .   .   .   3   .   1   .   1   .   .   1   .   .   2 f101v1.R1.2;C
    9 otchdy           A p127    47    .   .   .   1   1   .   2   5   1   2   2   .   .   .   1   6   3   .   3   .   .   6   3   2   .   1   .   2   4   2 f68v2.R.8;C

    2 cheody           P p182    76    2   .   2   2   2   .   3   1   3   4   1   .   .   .   .   .   2  11   4   2   4   7   4   2   6   1   .   4   6   3 f89r2.m2.4;L
    2 arody            T p078     9    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   1   .   .   1   .   .   3   .   1   .   1   1   1   . f40v.T.19;F
    2 okair            Z p138    36    .   .   .   .   .   3   .   1   2   1   .   .   .   2   2   .   .   1   2   .   .   3   5   4   4   1   .   1   3   1 f72r2.S.18;K

    4 okam             Z p138    48    1   .   1   2   3   5   2   3   2   1   1   1   .   .   .   .   1   1   2   1   2   2   3   .   4   1   1   1   1   6 f72r2.S.3;K
    4 oaiin            Z p137    65    3   2   1   2   2   3   3   4   4   2   1   .   1   .   .   .   1   5   .   1   4   2   1   1   2   3   6   4   5   2 f72r1.S.8;K

    6 otarar           P p206     6    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   1   .   .   .   .   1   .   1   1   .   .   1   .   1 f101v1.R1.3;C
    6 oram             P p179     9    .   .   .   .   .   .   .   .   .   1   .   .   1   .   .   1   .   .   .   .   .   1   1   .   .   1   2   1   .   . f88r.m.2;L
    6 otam             Z p138    53    1   .   1   2   3   .   1   4   2   2   2   .   .   1   .   1   6   1   2   .   2   3   2   3   2   2   3   5   1   1 f72r2.S.7;K
    6 opchey           Z p134    35    .   .   3   1   .   .   1   2   1   1   2   .   .   1   .   2   4   .   .   1   1   2   3   .   3   1   2   3   1   . f70v1.S.7;K

    7 otody            Z p133    19    .   .   1   .   2   .   1   1   .   .   .   .   .   .   .   .   1   2   3   1   .   1   .   .   .   .   1   2   2   1 f70v2.S1.4;K
    7 chos             P p202    27    .   .   1   2   2   .   1   .   1   2   .   .   .   .   .   .   .   3   2   3   2   .   .   1   .   .   .   2   4   1 f100v.B.13;K
    7 oteol            T p075    42    3   .   1   .   2   1   5   2   .   .   .   1   .   1   .   1   .   2   2   6   3   1   1   1   1   2   .   5   .   1 f39r.T.16;F
    7 chodar           S p124    11    1   2   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   3   1   .   1   1   1   .   .   .   .   1   . f68r2.S.22;R
    7 odaiin           P p184   113    7   4   4   5   1   .   2   6   9   1   1   .   1   1   .   1   7   5   6   3   2  11   9   6   5   1   1   .   9   5 f89v1.b.3;L

    a tol              A p121    53    .   1   2   3   1   .   3   4   3   2   3   3   .   5   2   1   2   2   2   3   2   1   .   2   2   1   .   1   .   2 f67v2.C1.1;C
    a or               A p121   296    9   6  12  10  14  17   5   8  17   5  12   8  20   9   2  13  26   9  13  15   8   6  13   1   5   4  10   4   8   7 f67v2.C1.2;C
    a qor              I p117   296    9   6  12  10  14  17   5   8  17   5  12   8  20   9   2  13  26   9  13  15   8   6  13   1   5   4  10   4   8   7 f66r.L.3;F

    d otain            Z p137    21    1   .   1   .   1   1   1   .   .   1   5   2   1   .   3   .   1   .   .   .   .   .   .   .   .   .   2   .   1   . f72r1.S.13;K
    d chety            A p127    20    1   .   .   .   .   1   2   2   2   2   1   .   1   2   1   .   .   1   .   .   1   1   .   .   .   .   .   .   .   2 f68v2.R.4;C

    e daly             I p117    20    .   1   .   .   1   1   3   1   .   4   1   2   .   4   .   .   .   1   .   .   .   .   .   1   .   .   .   .   .   . f66r.L.6;C
    e oky              Z p133   227    3  10   8  11   7  13   9   7   6  13  12   9   7  10  21   8   8   4  13   3  12   2   2   4   4   2   4   4   4   7 f70v2.S1.1;K
    e dar              T p166   291    8  10  13  17  13  14  15   8   7  10  17   7  11   4  11  18  22  12  14   7   7   2   9   7   1   5   .   4  10   8 f85r2.T.1;U
    e oty              Z p133   180    5  13  10  10   3   6   6  10   8   5   7   5   8  15   7   3   5   3   7   2   7   9   1   6   1   2   4   3   3   6 f70v2.S1.2;K
    e otol             T p165   104    2   6   9   6   5   3   7   5   5   2   1   3   2   7   1   3   7   4   2   4   1   5   2   1   2   .   .   3   5   1 f85r1.T.34;F

    g ody              Z p133    56    2   2   3   .   3   4   5   3   3   .   .   .   3   .   2   .   6   3   4   2   2   .   2   3   3   .   1   .   .   . f70v2.S1.1;K
    g okol             P p182   137    8   5   6  10   .   4   7   6   3   4   .   8   7   4   2   5   5  11   9  12   2   5   2   1   6   .   .   .   4   1 f89r2.m1.2;Q

    h dal              Z p133   252    8   3   7  12   7   6   5   3  15  20  14  18   9   6  15  14  10  27   9   8   6   4   4   2   8   3   1   1   5   2 f70v2.S2.10;K
    h okain            T p159    99    .   .   1   3   2   3   .   7   2   9  16  16  11   4   6   2   1   .   1   1   2   2   1   4   1   .   .   3   .   1 f82r.T1.18;F

    i y                T p229    41    4   .   3   3   3   2   3   2   4   .   3   1   6   2   1   .   .   2   1   .   .   .   .   .   .   .   .   1   .   . f114r.T1.34;G
    i cham             Z p138    20    4   1   .   1   1   .   2   1   3   2   .   1   1   .   .   .   .   .   .   .   .   1   .   .   .   .   1   .   .   1 f72r2.S.18;K
    i cphy             A p121    13    2   2   2   .   .   .   1   2   1   .   .   .   .   .   .   .   1   .   1   .   .   .   .   .   .   .   .   .   1   . f67v2.C2.2;C
    i dan              P p181    10    4   .   1   1   .   2   1   .   .   .   .   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .   .   .   . f89r1.m.3;K
    i daiin            T p117   980   41  69  54  60  51  70  34  45  46  20  17  19  14  10  24  31  17  63  29  52  22  15  24  13  21  31  11  17  39  21 f66r.W.1;U
    i dchol            S p124    21    2   3   2   2   1   1   .   4   .   1   .   .   1   1   .   .   1   1   .   .   .   .   .   .   .   .   .   .   .   1 f68r2.S.5;R
    i otor             P p184    50    2   7   7   3   1   6   2   3   .   2   .   .   1   .   .   .   1   .   3   .   .   1   3   3   .   .   .   2   1   2 f89v1.b.2;L
    i chol             T p206   390   40  30  42  20  12  11  20  21  27   7   2   2   4   2   3   1   8  15  14  33   6  12   3   8   9   6   6   9  11   6 f101v1.T.10;F
    i okchor           S p124    26    2   7   6   .   2   2   2   .   1   1   .   .   .   .   .   .   .   .   1   .   .   .   .   .   .   1   .   .   .   1 f68r2.S.8;R
    i shol             T p081   180   17  18  12   9   1   7  25  11   6   9   2   1   6   3   5   2   2   4   7   7   4   4   3   3   5   1   1   4   1   . f42r.T3.23;F
    i shy              A p127   105    6  10  11  14   7   5   8   3   3   5   3   .   3   4   2   1   2   5   3   1   .   .   .   4   1   .   1   .   .   3 f68v2.R.3;C
    i shor             Z p136    93   10  15   6   6   8   5   7   7   2   2   .   1   .   1   .   1   6   1   3   4   .   2   2   .   1   .   .   1   .   2 f71v.S2.4;C
    i dy               Z p138   207    6  32  12  15   9  11  13  10  10   7  10  10   6   6   9   6   7   5   3   9   4   .   .   3   1   .   2   1   .   . f72r2.S.5;K

  I decided to improve my find-occurrences script so that it reports
  the actual string matched, as well as the pattern.  Then we can
  capture all variants of interesting labels, such as "otolor"...
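
  A minimal sketch of that idea, in Python rather than the actual
  script.  The function name, the (location, text) input layout and
  the sample lines are all invented for illustration; only the idea of
  reporting the matched string next to the pattern comes from the note
  above.

```python
import re

def find_occurrences(pattern, lines):
    """Yield (location, pattern, matched_string) for every match in `lines`.

    `lines` is an iterable of (location, text) pairs; this layout is an
    assumption, not the input format of the real script.
    """
    rx = re.compile(pattern)
    for location, text in lines:
        for m in rx.finditer(text):
            # Report the actual string matched, not just the pattern.
            yield location, pattern, m.group(0)

# Invented sample lines (hypothetical locators and text).
sample = [
    ("X.1", "otolor.okal"),
    ("X.2", "otol.otalor"),
]
hits = list(find_occurrences(r"ot[ao]lor", sample))
for location, pattern, matched in hits:
    print(location, pattern, matched)
```

  With a pattern such as "ot[ao]lor" the third field distinguishes the
  variants "otolor" and "otalor", which the pattern alone would lump
  together.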

97-12-05 stolfi
===============

  Still working on the new label reference maps.
  
  Rene sent me his VTX text-extraction tool, and
  a set of page-header lines of the form
  
    <f1r>          {$I=T $Q=A $P=A $L=A $H=1}
    <f1v>          {$I=H $Q=A $P=B $L=A $H=1}
    <f2r>          {$I=H $Q=A $P=C $L=A $H=1}
    <f2v>          {$I=H $Q=A $P=D $L=A $H=1}
  
  that are used by VTX to find the requested pages.  I added those
  lines in front of all the relevant files in L16-eva (the "page
  comments" files such as "f1r", not the text unit files such as
  "f1r.T").
  
  I also compared his data against my own index (L16-eva/INDEX), fixed
  some errors in the latter, and noted some discrepancies in the
  section codes.  (Basically, some of his sectins were assigned on the
  basis of the page's location in the bound book, rather than its
  contents.).
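
  Any comparison of the header data against an index has to start by
  parsing the page-header lines.  Here is a rough sketch in Python,
  assuming only the syntax visible in the four sample lines above: a
  page name in angle brackets, followed by "$NAME=VALUE" items inside
  braces.  The function name and regular expression are mine, not part
  of VTX, and the meaning of the variables is not interpreted.

```python
import re

# Matches lines like "<f1r>   {$I=T $Q=A $P=A $L=A $H=1}".
HEADER_RE = re.compile(r"^<(?P<page>[^>]+)>\s*\{(?P<vars>[^}]*)\}\s*$")

def parse_page_header(line):
    """Return (page, {name: value}) for a page-header line, or None."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    fields = {}
    for item in m.group("vars").split():
        # Each item looks like "$I=T"; drop the "$" and split at "=".
        name, _, value = item.lstrip("$").partition("=")
        fields[name] = value
    return m.group("page"), fields

page, fields = parse_page_header("<f1v>          {$I=H $Q=A $P=B $L=A $H=1}")
print(page, fields)
```

  Two sources can then be compared page by page by looking up the
  same variable name in both dictionaries.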

97-12-21 stolfi
===============

  While preparing the new label location maps (Note-010.html), I got
  curious about the collocates of some words.
  
  Let's start with "daiin", which is very common and about equally
  frequent in both languages:
  
    compare-word-colocates \
      '\bdaiin\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
       84 daiin /                    19 / daiin                    19 / daiin
       46 / daiin                     5 daiin /                     9 daiin /
       30 chol daiin                  4 - daiin                     7 daiin chey
       23 daiin =                     4 daiin chedy                 6 daiin ol
       12 - daiin                     3 chckhy daiin                5 daiin chedy
       11 daiin cthy                  3 chedy daiin                 5 shey daiin
       10 daiin daiin                 3 daiin or                    4 daiin daiin
       10 shol daiin                  3 daiin otal                  4 daiin shedy
        9 chor daiin                  2 ar daiin                    4 daiin shey
        8 daiin -                     2 daiin chcthy                4 qokal daiin

  It is reassuring that in both languages "daiin" likes line-start and
  line-end positions.  However, it is curious that in language B
  "daiin" favors line-starts (19 against 5 line-ends in heb), while in
  language A it prefers line-ends (84 against 46 line-starts in hea).
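
  For reference, here is a rough Python sketch of the kind of count
  shown in the table above.  It is not the compare-word-colocates
  script itself: it assumes that the input is one token stream per
  file, with "/" standing for a line break, and that each adjacent
  pair is counted once if either member matches the pattern.  The
  function name and the sample stream are invented.

```python
import re
from collections import Counter

def colocate_counts(words, pattern):
    """Count adjacent token pairs in which at least one member matches.

    `words` is the token stream of one file; `pattern` must match a
    whole token (this replaces the \\b anchors of the shell command).
    """
    rx = re.compile(pattern)
    counts = Counter()
    for left, right in zip(words, words[1:]):
        if rx.fullmatch(left) or rx.fullmatch(right):
            counts[left + " " + right] += 1
    return counts

# Invented token stream, for illustration only.
words = "/ daiin chol daiin / daiin daiin shol /".split()
counts = colocate_counts(words, r"daiin")
for pair, n in counts.most_common():
    print(f"{n:5d} {pair}")
```

  Pairs such as "/ daiin" and "daiin /" then measure the line-start
  and line-end preferences directly.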
  
  Let's modify the code so that it ignores line breaks.
  Let's also map 't' to 'k', final [ao] to y,
  initial y or qy to o or qo:

    compare-word-colocates \
      '\bd[ao]iin\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
       30 chol daiin                  4 chckhy daiin                7 daiin chey
       23 daiin =                     4 daiin chedy                 6 daiin ol
       15 daiin ckhy                  4 daiin okal                  5 daiin chedy
       13 daiin daiin                 3 ar daiin                    5 daiin okedy
       11 daiin qokchy                3 chedy daiin                 5 qokal daiin
       10 shol daiin                  3 daiin okaiin                5 qoky daiin
        9 chor daiin                  3 daiin or                    5 shey daiin
        9 okol daiin                  3 daiin shody                 4 daiin daiin
        8 chy daiin                   3 okaiin daiin                4 daiin okaiin
        8 ckhy daiin                  2 daiin chckhy                4 daiin shedy
  
  Perhaps some of A's chol, shol, ckhy correspond to B's chey, shey,
  chedy, shedy.
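
  The normalization applied before the last table can be sketched in
  Python as follows.  This is only my reading of the three rules
  stated above ('t' to 'k', final a or o to y, initial y or qy to o or
  qo); the exact expressions used in the real code may differ.  The
  set of markers to drop is also an assumption: "/" and "-" vanish
  from the table while "=" is still listed, so only the first two are
  removed here.

```python
import re

BREAKS = {"/", "-"}  # assumed line-break markers to ignore; "=" is kept

def normalize_word(w):
    """Apply the three letter mappings to a single word."""
    w = w.replace("t", "k")            # map 't' to 'k'
    w = re.sub(r"[ao]$", "y", w)       # final a or o becomes y
    w = re.sub(r"^(q?)y", r"\1o", w)   # initial y or qy becomes o or qo
    return w

def normalize_stream(words):
    """Drop line-break markers and normalize the remaining tokens."""
    return [normalize_word(w) for w in words if w not in BREAKS]

print(normalize_stream(["daiin", "/", "otal", "=", "cthy", "ykal"]))
```

  Under these rules "cthy" and "ckhy" are pooled, as are "otal",
  "okal" and "ykal", which is why some counts in the second table are
  higher than in the first.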
  
  Let's try with "okaiin":
  
    compare-word-colocates \
      '\b[q]*[oy]k[ao]iin\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds
  
    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        7 daiin okaiin                5 okaiin okaiin              22 shedy qokaiin
        5 chol okaiin                 4 okaiin =                   17 chedy qokaiin
        4 okaiin =                    3 daiin okaiin               17 qokaiin chedy
        4 okaiin ckhy                 3 okaiin chckhy              13 qokaiin shedy
        4 okaiin okaiin               3 okaiin daiin               12 qokaiin ol
        4 or okaiin                   3 okaiin okar                10 shey qokaiin
        3 okaiin daiin                2 aiin okaiin                 9 chey qokaiin
        3 okaiin s                    2 chckhy okaiin               9 okaiin shedy
        2 ckhor okaiin                2 chdy qokaiin                9 qokaiin checkhy
        2 ckhy qokaiin                2 dain okaiin                 8 qokaiin chckhy

  It may be that A's chol is B's chedy/shedy.

  Another word that is common in both languages is "okal":

    compare-word-colocates \
      '\b[q]*[oy][tk][oa][l]\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        9 okol daiin                  4 daiin okal                 15 qokal chedy
        8 qokol daiin                 3 chdy okal                  12 qokal shedy
        7 okol chol                   3 okal dar                    9 chedy qokal
        5 daiin qokol                 2 aiin okal                   9 qokeedy qokal
        3 ckhor okol                  2 chckhy okal                 9 shedy qokal
        3 dain okol                   2 okaiin okal                 7 qokal dar
        3 okal chol                   2 okal chedy                  7 qokedy qokal
        3 okol dol                    2 okal chody                  6 okal chedy
        3 shor okol                   2 okal dam                    5 qokal daiin
        2 chody okol                  2 okal okair                  5 qokal dy

  Here are the counts without the k/t and o/y fixes:

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        6 otol chol                   3 daiin otal                 11 qokal chedy
        5 qokol daiin                 3 okal dar                    9 qokal shedy
        4 daiin qotol                 2 aiin okal                   8 shedy qokal
        4 otol daiin                  2 chckhy okal                 6 chedy qokal
        3 okol daiin                  2 chdy ykal                   6 qokeedy qokal
        3 qotol daiin                 2 okal chedy                  5 qokal daiin
        2 cho qokol                   2 okal okair                  4 okal chedy
        2 cthor otol                  2 okal shdy                   4 qokal dar
        2 daiin otal                  2 qokol chedy                 4 qotal chedy
        2 odaiin okal                 1 chcfhol okal                3 chedy qotal

  So it seems that A uses otol/okol where B uses okal/otal.
  It is tempting to identify A's chol with B's chedy/shedy.
  
  Let's try with "okar", which is also distributed fairly uniformly:
  
    compare-word-colocates \
      '\b[q]*[oy][tk][oa]r\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds
  
    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        3 daiin qokor                 6 qokar okar                  6 qokar shedy
        3 okor chor                   5 okar chdy                   6 qokeedy qokar
        3 qokor chor                  4 okar ar                     5 chedy qokar
        2 dain qokor                  4 okar or                     5 qokar ol
        2 dy qokor                    3 ar okar                     4 chckhy okar
        2 okor chey                   3 okaiin okar                 4 okar okedy
        2 oky okor                    3 okar chedy                  4 okar ol
        2 qokchy qokor                3 okar okedy                  4 okar shedy
        2 qokor chol                  3 okar ol                     4 shey qokar
        2 qokor daiin                 3 qokar chckhy                3 okar chedy
  
  Again, where A uses "or", B uses "ar".
  Perhaps A's qokchy is B's qokeedy?

  Another fairly uniform word is "qokeey":
  
    compare-word-colocates \
      '\b[q]*[oy][kt][cse][eh][yo]\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds
  
    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
       11 daiin qokchy                3 chedy okeey                 8 qokeey qokedy
        8 okchy kchy                  2 keedy okeey                 6 qokeedy qokeey
        7 daiin okchy                 2 okchy okar                  6 shedy qokeey
        7 qokchy qokchy               2 okeey daiin                 4 chedy qokeey
        5 ckhy okchy                  2 okeey dar                   4 qokeey okeey
        5 okchy daiin                 2 r okeey                     4 qokeey raiin
        5 okeey daiin                 1 alfshe? okshy               3 dar qokeey
        5 qokchy daiin                1 arar okeey                  3 okeey qol
        5 qokchy kchy                 1 chees okeey                 3 qokeey daiin
        4 qokchy qoky                 1 chek qokchy                 3 qokeey qokaiin

  Here are the counts without the k/t and y/o fixes:

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        4 daiin qokchy                2 chedy okeey                 6 qokeey qokedy
        4 daiin qotchy                2 r yteey                     6 shedy qokeey
        4 qotchy qokchy               1 alfshe? okshy               4 chedy qokeey
        3 cthy otchy                  1 arar oteey                  3 dar qokeey
        3 okeey daiin                 1 chedy ykeey                 3 qokeey daiin
        3 qotchy daiin                1 chees oteey                 3 qokeey qokaiin
        3 qoteey daiin                1 chek qokchy                 3 qokeey raiin
        2 aiin qotchy                 1 cheody okeey                2 oteey qol
        2 choty qokchy                1 chfalas qokeey              2 pchedy qokeey
        2 daiin otchy                 1 cthy qokeey                 2 qokedy qokeey

  Note the near-repetition "qotchy qokchy" in A, and "qokeey qokedy" 
  or "qokedy qokeey" in B.

  Now "otam", also fairly uniform:
  
    compare-word-colocates \
      '\b[q]*[oy][tk][ao][mjg]\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        1 char okam                   2 chdam qokam                 1 chcphey qokam
        1 chol qokom                  2 daiin okam                  1 chedy qokam
        1 ckham okom                  2 qokar okam                  1 lchey qokam
        1 ckhor okam                  1 aiin okam                   1 okam olaiin
        1 dar okom                    1 akedy okam                  1 okar okam
        1 kal okam                    1 ar okam                     1 qokam chedy
        1 kchody qokam                1 chdar okam                  1 qokam okal
        1 okam =                      1 chdy okam                   1 qokam qokaiin
        1 okam chckh                  1 checkhy okam                1 qokam s
        1 okam chol                   1 chekeedy okam               1 qokam sol

  Can't say much...
  
  Next is "chey" also very uniform:
  
    compare-word-colocates \
      '\b[q]*[cse][eh]ey\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        5 chey kchy                   3 chey =                     10 shey qokaiin
        3 cheor chey                  2 dar chey                    9 chey qokaiin
        3 chey keey                   2 qoky chey                   7 daiin chey
        3 dar shey                    2 shey daiin                  7 qol chey
        3 dy shey                     2 shey qokaiin                6 qokaiin shey
        2 chey dam                    1 ar shey                     5 chey qokeedy
        2 chey kchol                  1 chdain shey                 5 ol shey
        2 chey keor                   1 chdar shey                  5 shey daiin
        2 chey kor                    1 che?dy chey                 5 shey qokedy
        2 chey kshey                  1 chedy chey                  5 shey qoky

  Not clear...
  
  The word "chckhey" is also fairly uniform:
  
    compare-word-colocates \
      '\b[cse][eh][ce][kt][he]ey\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        1 chain chckhey               1 ???in shckhey               2 daiin chckhey
        1 chckhey chor                1 chckhey =                   2 qokaiin chckhey
        1 chckhey daiin               1 chckhey choky               1 chckhey cheor
        1 chckhey okaiin              1 chckhey okchdy              1 chckhey dar
        1 chckhey okshy               1 chckhey or                  1 chckhey kedy
        1 chckhey ol                  1 dair shckhey                1 chckhey lchey
        1 chckhey qod                 1 kodaiin shckhey             1 chckhey ldy
        1 chkaiin shckhey             1 odain chckhey               1 chckhey qokeedy
        1 ckhol chckhey               1 okain chckhey               1 chckhey qokeeol
        1 ckhy chckhey                1 okam chckhey                1 chckhey saiin

  The uses of this word are too scattered for us to say anything useful.

  Another uniform word is "yshey":
  
    compare-word-colocates \
      '\b[q]*[oy][cse][eh]ey\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        1 chey qoeeey                 1 chdy ochey                  1 chealy oshey
        1 chydaiin ochey              1 dy ochey                    1 cheedy oshey
        1 ckhar ochey                 1 kchodain oeeey              1 dy ochey
        1 ckhy ochey                  1 lor ochey                   1 lcheey qochey
        1 daiin ochey                 1 ochey dar                   1 lor oshey
        1 dy ochey                    1 ochey kamar                 1 ochey kal
        1 ochey chol                  1 ochey oly                   1 ochey qokain
        1 ochey ckhos                 1 oeeey okaiin                1 okar oshey
        1 ochey kchokchy                                            1 ols oshey
        1 ochey kchos                                               1 oroly ochey

  Finally, let's try "or":
  
    compare-word-colocates \
      '\b[oya]r\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        5 ckhy or                     6 or aiin                     8 or shedy
        4 or okaiin                   4 okar ar                     4 or aiin
        3 ar al                       4 okar or                     3 or al
        3 chol or                     3 ar daiin                    2 chedy or
        3 or chol                     3 ar okar                     2 chekar or
        3 or chor                     3 daiin or                    2 dal or
        2 daiin or                    3 dar ar                      2 dar ar
        2 dol or                      3 kor or                      2 or chey
        2 okaiin or                   3 or ar                       2 or sheey
        2 or aiin                     2 ar aiin                     2 or shey
  
  Again, it seems that A's chol is B's chedy.
  Also A's okaiin seems to be B's aiin.
  
  Now a few random words:

    compare-word-colocates \
      '\b[cs]ho[rl]\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
       30 chol daiin                  3 chol kar                    2 qokaiin chol
       20 chol chol                   2 or chol                     2 qokol chol
       10 shol daiin                  1 arakaiin shol               2 shol kedy
        9 chor daiin                  1 chdaiin chol                1 ?chor or
        8 chol ckhol                  1 ches chol                   1 chcphey chol
        8 chol shol                   1 chkaiin chol                1 chey chol
        8 chor chol                   1 chol alaiin                 1 chol ar
        7 chol ckhy                   1 chol chckhy                 1 chol chedcheydaiin
        7 okol chol                   1 chol chky                   1 chol cheky
        6 chol chor                   1 chol dar                    1 chol chy

  Note the curious numeric coincidence in the first file: the top
  three counts are exactly 30, 20, and 10.

    compare-word-colocates \
      '\b[cse][he]edy\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
                                      4 daiin chedy                22 shedy qokaiin
                                      3 chedy daiin                18 qol chedy
                                      3 chedy okedy                17 chedy qokaiin
                                      3 chedy okeey                17 qokaiin chedy
                                      3 okar chedy                 17 shedy qokedy
                                      3 shedy qokedy               15 chedy qol
                                      2 chedy chckhy               15 ol shedy
                                      2 chedy dal                  15 qokal chedy
                                      2 chedy dar                  15 shedy qokeedy
                                      2 chedy kedy                 13 ol chedy

    compare-word-colocates \
      '\b[ce][tk][eh][ao][rl]\b' \
      hea-f-eva.wds heb-f-eva.wds bio-f-eva.wds

    count hea-f-eva.wds           count heb-f-eva.wds           count bio-f-eva.wds         
    ----- ----------------------  ----- ----------------------  ----- ----------------------
        8 chol ckhol                  1 aiin ckhar                  1 ckhal saiin
        8 daiin ckhor                 1 ckhar od                    1 ckhol chedy
        7 daiin ckhol                 1 ckhol ol                    1 ckhol skar
        6 ckhol chol                  1 okaiin ckhol                1 ckhor chey
        5 ckhol daiin                                               1 ckhor olchdy
        3 chor ckhol                                                1 daiin ckhal
        3 chor ckhor                                                1 iin ckhor
        3 ckhol dy                                                  1 olshey ckhor
        3 ckhor chol                                                1 qokal ckhol
        3 ckhor okol                                                1 rkaiin ckhol

97-12-22 stolfi
===============

  It occurred to me that the labels should tell us a lot 
  about valid word prefixes and suffixes, since their
  word boundaries should be more reliable than
  those defined by spaces in the manuscript.
  
  Another possible source for that information is the 
  words in line-initial and line-final position.
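
  A possible way to collect those line-initial and line-final words,
  sketched in Python.  It assumes a token stream in which "/" marks a
  line break and "=" the end of a paragraph or label, as in the
  collocate tables; the function name and the sample stream are
  invented.

```python
from collections import Counter

BREAKS = {"/", "="}  # assumed markers for line break and paragraph end

def edge_words(words):
    """Return Counters of the words that start and that end a line."""
    initial, final = Counter(), Counter()
    prev = "/"  # treat the start of the stream as a line start
    for w in words:
        if w not in BREAKS:
            if prev in BREAKS:
                initial[w] += 1      # first word after a break
        elif prev not in BREAKS:
            final[prev] += 1         # last word before a break
        prev = w
    if prev not in BREAKS:
        final[prev] += 1             # stream ended without a marker
    return initial, final

# Invented token stream, for illustration only.
words = "daiin chol shol / qokal chedy daiin = otol".split()
initial, final = edge_words(words)
print(sorted(initial), sorted(final))
```

  A one-word line counts both as line-initial and as line-final, which
  seems the right behavior for single-word labels.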
  
  Quick test:
  
    cat Note-010/labtit.evt \
      | egrep -v '\.T[0-9]*\.[0-9].*>' \
      > .labels-eva.evt
  
  Edited .labels-eva.evt, removing some non-labels 
  and garbage, producing .labels-s-eva.evt.
      
    extract-words-from-interlin \
      -chars 'aoeilmnrchtpkfsqjdvxyg' \
      .labels-s-eva.evt \
      .labels-s-eva
      
     lines   words     bytes file        
    ------ ------- --------- ------------
       327     745      3485 .labels-s-eva.txt
       754     754      3503 .labels-s-eva.wds
       357     357      2500 .labels-s-eva.dic
       410     410      2760 .labels-s-eva-gut.wds
       344     344      2419 .labels-s-eva-gut.dic
       334     334       668 .labels-s-eva-fun.wds
         3       3         6 .labels-s-eva-fun.dic
        10      10        75 .labels-s-eva-bad.wds
        10      10        75 .labels-s-eva-bad.dic

    Sample from .labels-s-eva.txt:

      otaik dak alak =
      otaldy =
      otoky =
      seeyar =
      ykas asy =
      sosainr =
      oteey dar =
      ytodaiir =

    Digraph counts:

           TT     .     /     =     a     o     e     i     l     m     n     r     c     h     t     p     k     f     s     q     j     d     y     g     ?     -
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      .    86     .     .     .    32    13     .     .     .     .     .     .    11     .     2     .     .     .     9     .     .    12     4     2     1     .
      /     3     .     .     .     .     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .
      =   324     .     .     .     4   217     2     1     .     .     .     1    26     .     1     .     2     .    27     2     .    22    18     .     1     .
      a   327     .     .     4     .     1     1    51   105    28     5   107     .     .     1     .     6     .     5     .     4     4     2     1     2     .
      o   424     2     .     5    10     1     8     4    74     2     1    46    13     .   105    14    69    11    21     .     2    32     2     .     2     .
      e   130     .     .     .     8    39    32     1     .     1     .     2     1     .     5     3     4     2     6     .     .     5    19     1     1     .
      i   107     .     .     .     .     .     .    39     1     .    42    20     .     .     1     .     2     1     .     .     .     .     .     .     1     .
      l   184    14     .    45    31     9     6     3     .     .     .     1    10     .     .     .     5     .    14     .     .    19    22     4     .     1
      m    32     2     .    28     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     1     .     .
      n    48    10     1    27     3     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     1     1     3     .     .     1
      r   180    27     .    56    47    11     2     2     .     .     .     .     7     .     .     .     .     .     2     .     .     1    21     2     1     1
      c   109     .     .     .     .     .     .     .     .     .     .     .     .    95     4     4     4     2     .     .     .     .     .     .     .     .
      h   138     .     .     1    17    41    36     .     .     .     .     1     2     .     1     .     .     1     3     .     1    18    15     .     1     .
      t   129     1     .     .    51    31    21     .     .     .     .     1     9     4     .     .     .     .     1     .     .     1     8     .     1     .
      p    23     .     .     .     5     4     1     .     .     .     .     .     8     4     .     .     .     .     1     .     .     .     .     .     .     .
      k   104     2     .     1    37    22    17     .     2     .     .     .     8     4     .     .     .     .     1     .     .     .     9     .     1     .
      f    18     1     .     1     8     2     .     .     .     .     .     .     3     2     .     .     .     .     .     .     .     .     1     .     .     .
      s   102     8     .    12    21    14     3     3     .     .     .     .     1    29     .     .     1     .     .     .     .     1     7     .     .     2
      q     2     .     .     .     .     1     .     .     .     .     .     .     .     .     .     .     1     .     .     .     .     .     .     .     .     .
      j     8     1     .     6     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     1     .     .     .
      d   130     1     .     7    49    13     .     3     .     1     .     .     7     .     .     .     .     .     1     .     .     .    48     .     .     .
      y   192    17     1   127     1     1     .     .     .     .     .     .     1     .     8     2    10     1     7     .     .    13     1     .     .     2
      g    11     .     .     3     2     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     6     .     .     .
      ?    17     .     .     2     1     1     1     .     2     .     .     .     .     .     1     .     .     .     1     .     .     .     3     .     5     .
      -     7     .     .     .     .     1     .     .     .     .     .     .     1     .     .     .     .     .     3     .     .     1     1     .     .     .
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT  2835    86     3   324   327   424   130   107   184    32    48   180   109   138   129    23   104    18   102     2     8   130   192    11    17     7

    Next-symbol probability (× 99):

        TT  .  /  =  a  o  e  i  l  m  n  r  c  h  t  p  k  f  s  q  j  d  y  g  ?  -
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      . 99  .  .  . 37 15  .  .  .  .  .  . 13  .  2  .  .  . 10  .  . 14  5  2  1  .
      / 99  .  .  .  . 66  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 33  .  .  .
      = 99  .  .  .  1 66  1  .  .  .  .  .  8  .  .  .  1  .  8  1  .  7  6  .  .  .
      a 99  .  .  1  .  .  . 15 32  8  2 32  .  .  .  .  2  .  2  .  1  1  1  .  1  .
      o 99  .  .  1  2  .  2  1 17  .  . 11  3  . 25  3 16  3  5  .  .  7  .  .  .  .
      e 99  .  .  .  6 30 24  1  .  1  .  2  1  .  4  2  3  2  5  .  .  4 14  1  1  .
      i 99  .  .  .  .  .  . 36  1  . 39 19  .  .  1  .  2  1  .  .  .  .  .  .  1  .
      l 99  8  . 24 17  5  3  2  .  .  .  1  5  .  .  .  3  .  8  .  . 10 12  2  .  1
      m 99  6  . 87  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .  3  .  .
      n 99 21  2 56  6  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  2  2  6  .  .  2
      r 99 15  . 31 26  6  1  1  .  .  .  .  4  .  .  .  .  .  1  .  .  1 12  1  1  1
      c 99  .  .  .  .  .  .  .  .  .  .  .  . 86  4  4  4  2  .  .  .  .  .  .  .  .
      h 99  .  .  1 12 29 26  .  .  .  .  1  1  .  1  .  .  1  2  .  1 13 11  .  1  .
      t 99  1  .  . 39 24 16  .  .  .  .  1  7  3  .  .  .  .  1  .  .  1  6  .  1  .
      p 99  .  .  . 22 17  4  .  .  .  .  . 34 17  .  .  .  .  4  .  .  .  .  .  .  .
      k 99  2  .  1 35 21 16  .  2  .  .  .  8  4  .  .  .  .  1  .  .  .  9  .  1  .
      f 99  6  .  6 44 11  .  .  .  .  .  . 17 11  .  .  .  .  .  .  .  .  6  .  .  .
      s 99  8  . 12 20 14  3  3  .  .  .  .  1 28  .  .  1  .  .  .  .  1  7  .  .  2
      q 99  .  .  .  . 50  .  .  .  .  .  .  .  .  .  . 50  .  .  .  .  .  .  .  .  .
      j 99 12  . 74  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 12  .  .  .
      d 99  1  .  5 37 10  .  2  .  1  .  .  5  .  .  .  .  .  1  .  .  . 37  .  .  .
      y 99  9  1 65  1  1  .  .  .  .  .  .  1  .  4  1  5  1  4  .  .  7  1  .  .  1
      g 99  .  . 27 18  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 54  .  .  .
      ? 99  .  . 12  6  6  6  . 12  .  .  .  .  .  6  .  .  .  6  .  .  . 17  . 29  .
      - 99  .  .  .  . 14  .  .  .  .  .  . 14  .  .  .  .  . 42  .  . 14 14  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99  3  0 11 11 15  5  4  6  1  2  6  4  5  5  1  4  1  4  0  0  5  7  0  1  0

    Previous-symbol probability (× 99):

        TT  .  /  =  a  o  e  i  l  m  n  r  c  h  t  p  k  f  s  q  j  d  y  g  ?  -
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      .  3  .  .  . 10  3  .  .  .  .  .  . 10  .  2  .  .  .  9  .  .  9  2 18  6  .
      /  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .
      = 11  .  .  .  1 51  2  1  .  .  .  1 24  .  1  .  2  . 26 99  . 17  9  .  6  .
      a 11  .  .  1  .  .  1 47 56 87 10 59  .  .  1  .  6  .  5  . 50  3  1  9 12  .
      o 15  2  .  2  3  .  6  4 40  6  2 25 12  . 81 60 66 61 20  . 25 24  1  . 12  .
      e  5  .  .  .  2  9 24  1  .  3  .  1  1  .  4 13  4 11  6  .  .  4 10  9  6  .
      i  4  .  .  .  .  .  . 36  1  . 87 11  .  .  1  .  2  6  .  .  .  .  .  .  6  .
      l  6 16  . 14  9  2  5  3  .  .  .  1  9  .  .  .  5  . 14  .  . 14 11 36  . 14
      m  1  2  .  9  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  9  .  .
      n  2 12 33  8  1  .  .  .  .  .  .  1  .  .  .  .  .  .  .  . 12  1  2  .  . 14
      r  6 31  . 17 14  3  2  2  .  .  .  .  6  .  .  .  .  .  2  .  .  1 11 18  6 14
      c  4  .  .  .  .  .  .  .  .  .  .  .  . 68  3 17  4 11  .  .  .  .  .  .  .  .
      h  5  .  .  .  5 10 27  .  .  .  .  1  2  .  1  .  .  6  3  . 12 14  8  .  6  .
      t  5  1  .  . 15  7 16  .  .  .  .  1  8  3  .  .  .  .  1  .  .  1  4  .  6  .
      p  1  .  .  .  2  1  1  .  .  .  .  .  7  3  .  .  .  .  1  .  .  .  .  .  .  .
      k  4  2  .  . 11  5 13  .  1  .  .  .  7  3  .  .  .  .  1  .  .  .  5  .  6  .
      f  1  1  .  .  2  .  .  .  .  .  .  .  3  1  .  .  .  .  .  .  .  .  1  .  .  .
      s  4  9  .  4  6  3  2  3  .  .  .  .  1 21  .  .  1  .  .  .  .  1  4  .  . 28
      q  0  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .
      j  0  1  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .
      d  5  1  .  2 15  3  .  3  .  3  .  .  6  .  .  .  .  .  1  .  .  . 25  .  .  .
      y  7 20 33 39  .  .  .  .  .  .  .  .  1  .  6  9 10  6  7  .  . 10  1  .  . 28
      g  0  .  .  1  1  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .
      ?  1  .  .  1  .  .  1  .  1  .  .  .  .  .  1  .  .  .  1  .  .  .  2  . 29  .
      -  0  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  3  .  .  1  1  .  .  .
        -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    TOT 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

    Symbol entropy: 3.995

    Next-symbol entropy: 2.452
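
  For reference, the two entropy figures can be recomputed from a raw
  symbol stream roughly as follows.  This is only a sketch: it assumes
  that "symbol entropy" is the Shannon entropy of the single-symbol
  distribution and that "next-symbol entropy" is the conditional
  entropy H(next | current), both in bits.  The function names and the
  toy input are illustrative, not the script that produced the tables.

```python
import math
from collections import Counter

def symbol_entropy(text):
    """Shannon entropy (bits) of the single-symbol distribution."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def next_symbol_entropy(text):
    """Conditional entropy H(next | current) in bits, from adjacent pairs."""
    pairs = Counter(zip(text, text[1:]))   # (current, next) pair counts
    firsts = Counter(text[:-1])            # how often each symbol has a successor
    total = sum(pairs.values())
    h = 0.0
    for (cur, _nxt), c in pairs.items():
        # weight by P(cur, next), score by P(next | cur)
        h -= c / total * math.log2(c / firsts[cur])
    return h

sample = "okchey.daiin.otol.qokeedy.chol"  # toy input, not the real label list
print(round(symbol_entropy(sample), 3), round(next_symbol_entropy(sample), 3))
```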

  Splitting prefix/midfix/suffix:
  
    cat .labels-s-eva-gut.wds \
      | sed \
          -e 's/sh/X/g' \
          -e 's/$/}/' \
          -e 's/^/{/' \
          -e 's/{\([qoaydirslmngj][qoaydirslmngj]*\)/\1{/' \
          -e 's/\([qoaydirslmngj][qoaydirslmngj]*\)}/}\1/' \
          -e 's/X/sh/g' \
          -e 's/{}/\./' \
          -e 's/\.//g' \
          -e 's/{/- -/' \
          -e 's/}/- -/' \
      > .labels-s-eva.fwd
      
    cat .labels-s-eva.fwd \
      | grep -v -e '- -' \
      > .labels-s-unifs-all.wds

    cat .labels-s-eva.fwd \
      | grep -e '- -' \
      | gawk '/./ {print $1}' \
      > .labels-s-prefs-all.wds

    cat .labels-s-eva.fwd \
      | grep -e '- -' \
      | gawk '/./ {print $2}' \
      > .labels-s-midfs-all.wds

    cat .labels-s-eva.fwd \
      | grep -e '- -' \
      | gawk '/./ {print $3}' \
      > .labels-s-suffs-all.wds
      
    dicio-wc .labels-s-{prefs,midfs,suffs,unifs}-all.wds
    
     lines   words     bytes file        
    ------ ------- --------- ------------
       312     312       940 .labels-s-prefs-all.wds
       312     312      1673 .labels-s-midfs-all.wds
       312     312      1488 .labels-s-suffs-all.wds
        98      98       531 .labels-s-unifs-all.wds
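
  The sed pipeline above can be read as the following rule: protect
  "sh", peel off the longest leading and trailing runs of the letters
  q o a y d i r s l m n g j, and call whatever remains the midfix;
  words made only of those letters stay unsplit ("unifixes").  A
  minimal Python sketch of that reading, handy for checking single
  words; the names SOFT and split_word are mine and not part of the
  original scripts.

```python
# Letters that may form a prefix or suffix (same class as in the sed script).
SOFT = "qoaydirslmngj"

def split_word(word):
    """Return (prefix, midfix, suffix), or None for an unsplit 'unifix' word."""
    w = word.replace("sh", "X")        # protect "sh" so its "s" is not peeled off
    core = w.lstrip(SOFT)              # drop the leading run of soft letters
    if not core:
        return None                    # the whole word is soft letters
    prefix = w[:len(w) - len(core)]
    midfix = core.rstrip(SOFT)         # drop the trailing run of soft letters
    suffix = core[len(midfix):]
    return (prefix, midfix.replace("X", "sh"), suffix)

def as_fwd(word):
    """Format a word the way the .fwd file does, e.g. 'o- -kche- -y'."""
    parts = split_word(word)
    if parts is None:
        return word
    return "{}- -{}- -{}".format(*parts)

for w in ["okchey", "chol", "qkol", "sheol", "daiin"]:
    print(w, "->", as_fwd(w))
```

  An empty prefix or suffix shows up as a bare "-" in the first or
  third field, which is how the "-" entries in the frequency lists
  arise.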

    foreach f ( prefs midfs suffs unifs )
      cat .labels-s-${f}-all.wds \
        | sort | uniq -c | expand | sort -k1,1nr \
        > .labels-s-${f}-all.frq
    end
      
    dicio-wc .labels-s-{prefs,midfs,suffs,unifs}-all.frq

     lines   words     bytes file        
    ------ ------- --------- ------------
        32      64       396 .labels-s-prefs-all.frq
        87     174      1317 .labels-s-midfs-all.frq
       118     236      1621 .labels-s-suffs-all.frq
        79     158      1095 .labels-s-unifs-all.frq
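
  The "sort | uniq -c | sort" step is an ordinary frequency count.  A
  Python equivalent, as a sketch; it assumes ties are broken
  alphabetically, which is what the listings appear to do, and the
  helper name freq_table is hypothetical.

```python
from collections import Counter

def freq_table(words):
    """Return (item, count) pairs, most frequent first, ties alphabetical."""
    counts = Counter(words)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy input standing in for one of the *-all.wds files.
suffixes = ["-y", "-ol", "-y", "-al", "-ol", "-y"]
for item, n in freq_table(suffixes):
    print(f"{n:7d} {item}")
```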

    pr -m -w 64 -e -t \
        .labels-s-{prefs,midfs,suffs,unifs}-all.frq \
      | expand \
      > .labels-s-joint-all.frq

    freq prefix     freq midfix     freq suffix     freq unifix
    ---- --------   ---- --------   ---- --------   ---- --------
     194 o-           66 -t-          41 -y            6 am
      54 -            55 -k-          19 -ol           6 ar
      19 y-           25 -ch-         17 -ar           3 ary
       8 ol-          13 -che-        16 -al           2 dy
       4 d-           13 -te-         14 -or           2 gy
       3 dy-           8 -sh-         11 -ody          2 odor
       2 a-            7 -f-          10 -dy           2 sal
       2 da-           7 -tch-         9 -aly          2 sar
       2 dar-          6 -ke-          7 -             2 sary
       2 so-           6 -kee-         6 -os           2 siiir
       1 adair-        6 -pch-         5 -alar         1 aiin
       1 al-           5 -p-           4 -aiin         1 ainaly
       1 ala-          5 -she-         4 -ain          1 ainam
       1 alam-         4 -kch-         4 -air          1 airar
       1 ali-          3 -tok-         4 -aram         1 al
       1 arar-         3 -tolch-       4 -ary          1 alols
       1 aro-          2 -chet-        4 -dal          1 aly
       1 do-           2 -cph-         4 -oldy         1 araly
       1 dol-          2 -cth-         4 -orain        1 arar
       1 il-           2 -ee-          3 -am           1 araydy
       1 oal-          2 -fch-         3 -o            1 arody
       1 oar-          2 -pche-        3 -odar         1 asy
       1 or-           2 -talsh-       3 -oly          1 daiin
       1 oyd-          2 -tare-        3 -r            1 daiindy
       1 q-            2 -tee-         3 -s            1 dainy
       1 qo-           1 -cfh-         2 -alaiin       1 dal
       1 s-            1 -chckhe       2 -aldy         1 dalary
       1 siiir-        1 -chee-        2 -alody        1 daliir
       1 soi-          1 -cheeee       2 -an           1 dalsy
       1 sol-          1 -chek-        2 -araiin       1 dan
       1 yd-           1 -cheoct       2 -aral         1 dar
       1 yy-           1 -chep-        2 -as           1 daramga
                       1 -chete-       2 -d            1 dararai
                       1 -chf-         2 -oaiin        1 dariiir
                       1 -choee-       2 -oaly         1 dary
                       1 -chof-        2 -olar         1 diin
                       1 -chok-        2 -ols          1 dolaj
                       1 -cholsh       2 -om           1 dolaram
                       1 -chosar       2 -yd           1 dolary
                       1 -chotee       1 -aday         1 dolory
                       1 -ckhe-        1 -ainy         1 doly
                       1 -cphe-        1 -airdy        1 dydarii
                       1 -e-           1 -airy         1 oaiin
                       1 -eep-         1 -aj           1 odaiin
                       1 -ekeee-       1 -ala          1 odaiir
                       1 -eoe-         1 -alain        1 odiiir
                       1 -eolale       1 -alal         1 odory
                       1 -et-          1 -alalg        1 ody
                       1 -faef-        1 -alaly        1 oin
                       1 -fche-        1 -alam         1 olaran
                       1 -fysk-        1 -ald          1 olaras
                       1 -karch-       1 -aldar        1 oldam
                       1 -kche-        1 -aldm         1 oldar
                       1 -kchoch       1 -aldo         1 onary
                       1 -kchsh-       1 -algar        1 oral
                       1 -keech-       1 -aloiir       1 orald
                       1 -keee-        1 -alrar        1 oram
                       1 -keeep-       1 -alsain       1 orar
                       1 -kocfh-       1 -alsy         1 oraraly
                       1 -kocth-       1 -alyd         1 oroj
                       1 -koee-        1 -any          1 orol
                       1 -kolsh-       1 -ao           1 osaro
                       1 -kshdch       1 -aralar       1 salal
                       1 -kydse-       1 -araldy       1 saldam
                       1 -pee-         1 -aralgy       1 saloiin
                       1 -pocph-       1 -arar         1 salols
                       1 -psh-         1 -aro          1 soaiin
                       1 -shch-        1 -dagy         1 sodar
                       1 -shockh       1 -daiir        1 solsy
                       1 -sholsh       1 -dajy         1 soly
                       1 -taik-        1 -dar          1 sorala
                       1 -tak-         1 -din          1 sororal
                       1 -takaik       1 -dorgy        1 sorory
                       1 -talch-       1 -g            1 sosainr
                       1 -talef-       1 -iir          1 sydarar
                       1 -talek-       1 -lairgy       1 sysam
                       1 -tche-        1 -ldam         1 y
                       1 -tchosh       1 -m            1 yorain
                       1 -teee-        1 -oaldy        1 ys
                       1 -tockh-       1 -odady  
                       1 -toee-        1 -odaiin 
                       1 -tolcht       1 -odaiir 
                       1 -tooee-       1 -odals  
                       1 -torche       1 -odol   
                       1 -tose-        1 -oj     
                       1 -tosh-        1 -olaiin 
                       1 -tshsh-       1 -olam   
                                       1 -olarol 
                                       1 -oldain 
                                       1 -olg    
                                       1 -olinj  
                                       1 -oloara 
                                       1 -olor   
                                       1 -ora    
                                       1 -orad   
                                       1 -oraj   
                                       1 -oraldy 
                                       1 -oram   
                                       1 -orol   
                                       1 -ory    
                                       1 -osal   
                                       1 -osam   
                                       1 -osar   
                                       1 -osarar 
                                       1 -osdy   
                                       1 -oys    
                                       1 -ral    
                                       1 -sas    
                                       1 -sody   
                                       1 -sos    
                                       1 -sy     
                                       1 -yar    
                                       1 -yda    
                                       1 -ydal   
                                       1 -ydary  
                                       1 -ydy    
                                       1 -ys     
                                       1 -ysam   

               
                    labels      herbal-A    herbal-B
                    ----------- ----------- -----------
    {o-,y-,a-}       215 (69%)  1234 (21%)   715 (29%)
    {-}               54 (17%)  3656 (61%)  1234 (50%)
    {qo-}              1 (0.3%)  603 (10%)   300 (12%)
    {ol-,al-}          9 (2.8%)   35 (0.6%)   62 (2.5%)
    {dy-,da-,do-}      6 (1.9%)   22 (0.4%)   13 (0.5%)
    {d-}               4 (1.3%)  201 (3.3%)   35 (1.4%)

  There are also a few "micro-complex" prefixes with 1-2 occurrences each.
  
  The frequencies are roughly similar to the text, except that
  
    * the empty prefix is supplanted by {o-,y-,a-}: the respective
      frequencies are 50% and 29% in B text, 61% and 21% in A text,
      but 17% and 69% in labels.
      
    * On the other hand, the qo- prefix is practically non-existent
      in labels: 1 occurrence. (There is also an occurrence of "q-" alone,
      perhaps a transcription error?)
         <f67v2.L3.1;C>     qokoaiin.ockhey={Label on line West from central square}
         <f89r1.t.2;L>      qkol={pharma label}
      In contrast, qo- occurs in 10% of herbal-A words, and 12% of herbal-B
      words.  This is all the more remarkable given the increased frequency 
      of the o- prefix in labels.
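
  The label percentages in the summary table follow directly from the
  prefix counts listed earlier.  A small sketch of the arithmetic,
  using the counts from the "prefix" column for labels and the 312
  split labels as the denominator (the helper class_share is only for
  illustration):

```python
# Prefix counts copied from the labels column of the frequency listing.
label_prefix_counts = {
    "o-": 194, "-": 54, "y-": 19, "ol-": 8, "d-": 4, "a-": 2, "qo-": 1,
}
SPLIT_LABELS = 312   # lines in .labels-s-prefs-all.wds

def class_share(prefixes, counts, total):
    """Percentage of words whose prefix belongs to the given class."""
    return 100.0 * sum(counts.get(p, 0) for p in prefixes) / total

classes = [
    ("{o-,y-,a-}", ["o-", "y-", "a-"]),
    ("{-}", ["-"]),
    ("{qo-}", ["qo-"]),
]
for name, members in classes:
    share = class_share(members, label_prefix_counts, SPLIT_LABELS)
    print(f"{name:12s} {share:5.1f}%")
```

  This gives about 69%, 17% and 0.3% for the three classes.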
      
  The low frequency of "qo-"s in labels confirms the thesis that "q"
  is not part of the word, but a prefixed particle (article,
  conjunction, preposition).

  The enhanced frequency of {o-,y-,a-} suggests that words with that
  prefix are nouns, or that the prefix is an article.  We should
  compare the occurrences of a tailfix with and without the q-, o-,
  and qo- prefixes...
  
  The midfixes are dominated by {-t-,-k-}, which again is
  characteristic of herbal-B.  (In herbal-A the midfixes
  {-ch-,-sh-} are almost twice as common as {-k-,-t-}.)  Compare:

    by many      by Friedman  by Currier     by Friedman  by Currier          
    Labels       language B   language B     language A   language A          
    freq midfix  freq midfix  freq midfix    freq midfix  freq pc midfix
    ---- ------  ---- ------  ---- --------- ---- ------  ---- -- ------
      66 -t-      407 -k-      288 -k-       1045 -ch-     985 19 -ch-  
      55 -k-      183 -t-      155 -ke-       526 -sh-     470  9 -sh-  
      25 -ch-     179 -ke-     138 -che-      469 -k-      438  8 -k-   
      13 -che-    172 -ch-     127 -ch-       444 -t-      427  8 -t-   
      13 -te-     163 -che-    116 -t-        353 -cth-    298  6 -cth- 
       8 -sh-     110 -she-     88 -kee-      335 -tch-    280  5 -tch- 
       7 -f-      101 -kee-     85 -te-       297 -kch-    260  5 -kch- 
       7 -tch-     95 -te-      75 -she-      251 -che-    201  4 -che- 
     ... ...      ... ...      ... ...        ... ...       ... ...

  Note however the difference in the relative frequencies of -k-
  versus -t-: 10:12 in labels, 10:5 in B text, 10:9 in A text.
  Moreover, -te- replaces -ke- as the most common "e"-modified midfix.
  These numbers support the thesis that -t- and -k- are merely
  variant shapes of the same letter, with -t- being more formal and
  -k- more cursive.
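
  The -k-:-t- ratios quoted above are just the two counts rescaled so
  that -k- reads as 10.  A sketch of that arithmetic, assuming the
  Friedman columns are the ones meant by "B text" and "A text"; the
  helper name scaled_ratio is hypothetical.

```python
def scaled_ratio(k_count, t_count, base=10):
    """Express k:t as base:x and return x."""
    return base * t_count / k_count

samples = {
    "labels":            (55, 66),    # -k-, -t- counts in labels
    "B text (Friedman)": (407, 183),
    "A text (Friedman)": (469, 444),
}
for name, (k, t) in samples.items():
    print(f"{name:18s} 10:{scaled_ratio(k, t):.1f}")
```

  To one decimal the values are 12.0, 4.5 and 9.5, close to the
  rounded figures in the text.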
                                
  As for suffixes, here is a summary:
  
                        labels      A-text      B-text
                        ----------- ----------- -----------
     {-y,-o}              44 (14%)  2200 (37%)  583 (23%)
     {-ol,-al,-or,-ar}    67 (21%)  1886 (32%)  400 (16%)
     -                     7 (2.2%)  124 (2.1%)  63 (2.5%)
     -ody                 11 (3.5%)  218 (3.7%) 111 (4.5%)
     -dy                  10 (3.2%)   23 (0.4%) 639 (11%)
     -aiin                 4 (1.2%)  316 (5.3%) 143 (5.8%)
     
  The frequencies of {-ol,-al,-or,-ar}, {-}, and {-dy} seem roughly
  consistent with a mixture of A and B text.  In particular the -y:-dy
  ratio is 90:21, which lies between the ratios for A text (90:1) and
  B text (90:105).  However, the frequencies of {-y,-o} and {-ody} are
  a bit too low, and that of {-aiin} is significantly lower.  In fact
  the suffix distribution in labels has a longer tail than the midfix
  distribution (118 vs. 87 distinct types), whereas in the text the
  midfixes have a much longer tail.
  
  Perhaps these observations can be explained by selective omission or
  insertion of spaces in the text vs. labels.
  
97-12-23 stolfi
===============

  Now extract line-initial and line-final words:
  
    foreach lang ( a b )
      cat he${lang}-f-eva.wds \
        | gawk 'BEGIN {s=1} /[\/=]/ {s=1;next}; /./ {if(s)print; s=0}' \
        > he${lang}-f-bol.wds
      cat he${lang}-f-eva.wds \
        | gawk 'BEGIN {w=""} /[\/=]/ {if(w!=""){print w;w=""};next}; /./ {w=$0}' \
        > he${lang}-f-eol.wds
    end
        
    dicio-wc he{a,b}-f-{bol,eol}.wds
    
     lines   words     bytes file        
    ------ ------- --------- ------------
      1216    1216      7610 hea-f-bol.wds
      1216    1216      7010 hea-f-eol.wds
       362     362      2353 heb-f-bol.wds
       362     362      2039 heb-f-eol.wds
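
  The two gawk one-liners can be paraphrased in Python as below.  The
  sketch assumes, as the gawk patterns suggest, that the .wds files
  hold one token per line and that any token containing "/" or "="
  marks a line or paragraph break.  The function name is mine.

```python
import re

BREAK = re.compile(r"[/=]")   # tokens that mark a line or paragraph break

def line_initial_and_final(tokens):
    """Return (bol, eol): the first and the last word of every text line."""
    bol, eol = [], []
    at_start = True   # True until the first word of the current line is seen
    last = ""         # most recent word on the current line
    for tok in tokens:
        if BREAK.search(tok):
            if last:
                eol.append(last)
                last = ""
            at_start = True
            continue
        if not tok:
            continue          # skip empty lines, like the /./ pattern
        if at_start:
            bol.append(tok)
            at_start = False
        last = tok
    return bol, eol

tokens = ["daiin", "chol", "/", "okchey", "qokeedy", "="]
print(line_initial_and_final(tokens))
```

  As in the gawk version, a final line that is not followed by a break
  marker contributes a line-initial word but no line-final word.  The
  equal bol and eol counts above indicate that this does not happen in
  these files.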

    foreach lang ( a b )
      foreach ext ( bol eol )
        cat he${lang}-f-${ext}.wds \
          | sed \
              -e 's/sh/X/g' \
              -e 's/$/}/' \
              -e 's/^/{/' \
              -e 's/{\([qoaydirslmngj][qoaydirslmngj]*\)/\1{/' \
              -e 's/\([qoaydirslmngj][qoaydirslmngj]*\)}/}\1/' \
              -e 's/X/sh/g' \
              -e 's/{}/\./' \
              -e 's/\.//g' \
              -e 's/{/- -/' \
              -e 's/}/- -/' \
          > .he${lang}-f-${ext}.fwd
      end
    end
      
    dicio-wc .he{a,b}-f-{bol,eol}.fwd

     lines   words     bytes file        
    ------ ------- --------- ------------
      1216    3152     13418 .hea-f-bol.fwd
      1216    2668     11366 .hea-f-eol.fwd
       362     926      4045 .heb-f-bol.fwd
       362     792      3329 .heb-f-eol.fwd

    foreach lang ( a b )
      foreach ext ( bol eol )
        cat .he${lang}-f-${ext}.fwd \
          | grep -v -e '- -' \
          > .he${lang}-f-${ext}-unifs-all.wds
        cat .he${lang}-f-${ext}.fwd \
          | grep -e '- -' \
          | gawk '/./ {print $2}' \
          > .he${lang}-f-${ext}-midfs-all.wds
      end

      cat .he${lang}-f-bol.fwd \
        | grep -e '- -' \
        | gawk '/./ {print $1}' \
        > .he${lang}-f-bol-prefs-all.wds

      cat .he${lang}-f-eol.fwd \
        | grep -e '- -' \
        | gawk '/./ {print $3}' \
        > .he${lang}-f-eol-suffs-all.wds
    end
      
    dicio-wc .he{a,b}-f-[be]ol-{prefs,midfs,suffs,unifs}-all.wds
  
     lines   words     bytes file        
    ------ ------- --------- ------------
       968     968      2737 .hea-f-bol-prefs-all.wds
       282     282       754 .heb-f-bol-prefs-all.wds

       968     968      5645 .hea-f-bol-midfs-all.wds
       282     282      1673 .heb-f-bol-midfs-all.wds
       726     726      4085 .hea-f-eol-midfs-all.wds
       215     215      1166 .heb-f-eol-midfs-all.wds

       726     726      3128 .hea-f-eol-suffs-all.wds
       215     215       909 .heb-f-eol-suffs-all.wds

       248     248      1176 .hea-f-bol-unifs-all.wds
       490     490      2302 .hea-f-eol-unifs-all.wds
        80      80       400 .heb-f-bol-unifs-all.wds
       147     147       665 .heb-f-eol-unifs-all.wds
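
  These counts can be cross-checked against the .fwd files: every
  split word contributes three fields and every unifix one, so
  split + unifix should equal the line count and 3*split + unifix the
  word count.  A quick sketch of the check, with the numbers copied
  from the two dicio-wc listings (variable names are illustrative):

```python
samples = {
    # name: (lines in .fwd, words in .fwd, split words, unifix words)
    "hea-bol": (1216, 3152, 968, 248),
    "hea-eol": (1216, 2668, 726, 490),
    "heb-bol": (362, 926, 282, 80),
    "heb-eol": (362, 792, 215, 147),
}

def consistent(lines, words, split, unifix):
    """True if the split/unifix counts account for all lines and fields."""
    return split + unifix == lines and 3 * split + unifix == words

for name, row in samples.items():
    print(name, "ok" if consistent(*row) else "MISMATCH")
```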

    foreach f ( .he[ab]-f-[eb]ol-{prefs,midfs,suffs,unifs}-all.wds )
      cat ${f} \
        | sort | uniq -c | expand | sort -k1,1nr \
        > ${f:r}.frq
    end
      
    dicio-wc .he[ab]-f-[eb]ol-{prefs,midfs,suffs,unifs}-all.frq

     lines   words     bytes file        
    ------ ------- --------- ------------
       169     338      2661 .hea-f-bol-midfs-all.frq
        34      68       419 .hea-f-bol-prefs-all.frq
       119     238      1852 .hea-f-eol-midfs-all.frq
       110     220      1486 .hea-f-eol-suffs-all.frq
        77     154      1180 .heb-f-bol-midfs-all.frq
        22      44       259 .heb-f-bol-prefs-all.frq
        59     118       880 .heb-f-eol-midfs-all.frq
        54     108       722 .heb-f-eol-suffs-all.frq
        93     186      1203 .hea-f-bol-unifs-all.frq
       149     298      1986 .hea-f-eol-unifs-all.frq
        35      70       464 .heb-f-bol-unifs-all.frq
        82     164      1061 .heb-f-eol-unifs-all.frq

    pr -m -w 80 -e -t \
        .labels-s-prefs-all.frq \
        .he{a,b}-f-bol-prefs-all.frq  \
        Note-009/he{a,b}-f-prefs-all.frq \
      | expand \
      > .prefs-joint.frq

    all labels      herbal-A bol    herbal-B bol    herbal-A all    herbal-B all
    freq prefix     freq prefix     freq prefix     freq prefix     freq prefix     
    ---- ---------  ---- ---------  ---- ---------  ---- ---------  ---- ---------  
     194 o-          384 -           136 -          3656 -          1234 -
      54 -           154 o-           58 y-          807 o-          490 o-
      19 y-          141 qo-          22 qo-         603 qo-         300 qo-
       8 ol-         138 y-           19 d-          424 y-          216 y-
       4 d-           71 d-           17 o-          201 d-           57 ol-
       3 dy-          17 s-            9 l-           55 s-           35 d-
       2 a-           11 so-           6 ol-          33 ol-          26 l-
       2 da-           7 oy-           1 a-           20 so-          10 dy-
       2 dar-          6 ol-           1 al-          15 l-            9 a-
       2 so-           4 l-            1 ara-         13 dy-           6 s-
       1 adair-        4 yo-           1 dy-          12 r-            5 al-
       1 al-           3 dy-           1 lo-          10 oy-           3 a:i-
       1 ala-          3 or-           1 lqo-          9 or-           3 dal-
       1 alam-         2 od-           1 q-            6 da:i-         3 lo-
       1 ali-          2 os-           1 qol-          6 do-           2 da:i-
       1 arar-         2 q-            1 r-            6 od-           2 do-
       1 aro-          2 yd-           1 s-            5 os-           2 or-
       1 do-           1 dain-         1 sol-          5 yo-           2 qol-
       1 dol-          1 dls-          1 ss-           4 qod-          2 r-
       1 il-           1 dor-          1 yd-           4 ro-           1 a:ii-
       1 oal-          1 i-            1 yo-           4 sol-          1 ad-
       1 oar-          1 lor-          1 yol-          3 a-            1 ao-
       1 or-           1 oldai-                        3 da-           1 ar-
       1 oyd-          1 ols-                          3 dol-          1 ara-
       1 q-            1 oor-                          3 lo-           1 da-
       1 qo-           1 oqo-                          3 sy-           1 dalo-
       1 s-            1 oso-                          3 yd-           1 dol-
       1 siiir-        1 qod-                          2 al-           1 dor-
       1 soi-          1 qoo-                          2 da:in-        1 lol-
       1 sol-          1 qy-                           2 dal-          1 lqo-
       1 yd-           1 ro-                           2 dor-          1 o:n-
       1 yy-           1 syd-                          2 old-          1 od-
                       1 ydarai-                       2 qoda:i-       1 olo-
                       1 yol-                          1 :i-           1 orol-
                                                       1 :iiin-        1 oy-
                                                       1 a:i-          1 sa:i-
                                                       1 ar-           1 say-
                                                       1 da:iinr       1 sol-
                                                       1 dao-          1 ss-
                                                       1 dar-          1 sy-
                                                       1 darod-        1 yd-
                                                       1 day-          1 yo-
                                                       1 dl-           1 yol-
                                                       1 dls-    
                                                       1 ds-     
                                                       1 dyo-    
                                                       1 lol-    
                                                       1 lor-    
                                                       1 ls-     
                                                       1 oda:i-  
                                                       1 olda:i- 
                                                       1 ols-    
                                                       1 oly-    
                                                       1 oo-     
                                                       1 oor-    
                                                       1 oqo-    
                                                       1 ora-    
                                                       1 ory-    
                                                       1 oso-    
                                                       1 qol-    
                                                       1 qoo-    
                                                       1 qor-    
                                                       1 qos-    
                                                       1 qoy-    
                                                       1 rolo-   
                                                       1 sa-     
                                                       1 sa:i-   
                                                       1 soo-    
                                                       1 syd-    
                                                       1 ydara:i 
                                                       1 yol-    
                                                       1 yr-     

  Comparing the beginning-of-line statistics with those of all words, we can see
  that:
  
    * In language A, the ratio y-/o- is 0.90 at beginning of line
      but 0.53 over all words; in language B it is 3.41 at
      beginning of line but 0.44 over all words.  This is yet
      another argument for the thesis that y- is merely a more
      ornate form of o-.
      
    * Otherwise, the major prefix frequencies seem roughly the same,
      which is encouraging: it suggests that line breaks and word
      spaces delimit similar units.  If line breaks are true word
      boundaries, then the same should hold for most spaces.
      
    * The bol sample has a smaller set of prefixes, but that 
      seems to be just about the expected number given
      the ratio of sample sizes (1:6).
      
    * The prefix frequencies in labels are significantly different
      from those in any of the four word samples.
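
  The y-/o- ratios in the first item come straight from the table.  A
  sketch of the arithmetic, with counts copied from the "herbal-A/B
  bol" and "herbal-A/B all" columns (the dictionary layout and helper
  name are mine):

```python
counts = {
    ("A", "bol"): {"y-": 138, "o-": 154},
    ("A", "all"): {"y-": 424, "o-": 807},
    ("B", "bol"): {"y-": 58,  "o-": 17},
    ("B", "all"): {"y-": 216, "o-": 490},
}

def y_over_o(c):
    """Ratio of y- to o- prefix counts in one sample."""
    return c["y-"] / c["o-"]

for (lang, sample), c in counts.items():
    print(f"language {lang}, {sample}: y-/o- = {y_over_o(c):.2f}")
```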
    
  Now for the suffixes:

    pr -m -w 80 -e -t \
        .labels-s-suffs-all.frq \
        .he{a,b}-f-eol-suffs-all.frq  \
        Note-009/he{a,b}-f-suffs-all.frq \
      | expand \
      > .suffs-joint.frq

  
    all labels      herbal-A eol    herbal-B eol    herbal-A all    herbal-B all
    freq suffix     freq suffix     freq suffix     freq suffix     freq suffix     
    ---- ---------  ---- ---------  ---- ---------  ---- ---------  ---- ---------  
      41 -y          190 -y           53 -y         1816 -y          639 -dy
      19 -ol          58 -aiin        31 -am         903 -ol         533 -y
      17 -ar          41 -ody         30 -dy         705 -or         168 -ar
      16 -al          29 -am           9 -aiin       360 -o          143 -aiin
      14 -or          29 -ol           8 -           316 -aiin       111 -ody
      11 -ody         28 -om           8 -ar         218 -ody         97 -ol
      10 -dy          28 -or           6 -ain        174 -ar          84 -al
       9 -aly         20 -             6 -al         124 -            63 -
       7 -            20 -al           6 -ody        104 -al          51 -or
       6 -os          18 -ar           5 -ary         76 -odaiin      44 -o
       5 -alar        14 -oldy         3 -daiin       76 -s           40 -am
       4 -aiin        13 -ory          3 -dam         71 -os          34 -daiin
       4 -ain         12 -dy           2 -ald         69 -od          33 -d
       4 -air         11 -ain          2 -ardam       66 -om          31 -os
       4 -aram        11 -an           2 -d           61 -am          30 -s
       4 -ary         11 -oly          2 -ol          49 -oiin        28 -dar
       4 -dal          9 -od           2 -or          48 -ain         26 -ain
       4 -oldy         8 -odaiin       1 -a           40 -oldy        19 -od
       4 -orain        8 -os           1 -aiily       35 -oy          15 -air
       3 -am           8 -s            1 -ainqod      29 -an          14 -dal
       3 -o            6 -a            1 -alaiin      29 -oly         11 -aly
       3 -odar         6 -o            1 -alam        27 -ory         10 -aldy
       3 -oly          5 -aldy         1 -alas        26 -odar         9 -odaiin
       3 -r            5 -old          1 -aldy        24 -a            7 -dain
       3 -s            5 -ordy         1 -aly         23 -dy           7 -dam
       2 -alaiin       5 -yd           1 -amdy        15 -odal         6 -a
       2 -aldy         5 -ydy          1 -ara         14 -oaiin        6 -ary
       2 -alody        4 -ald          1 -aram        14 -yd           6 -odar
       2 -an           4 -ary          1 -arar        12 -d            6 -oy
       2 -araiin       4 -d            1 -ardy        12 -n            5 -dair
       2 -aral         4 -m            1 -aro         12 -ydy          4 -dol
       2 -as           4 -oaiin        1 -aros        10 -air          4 -oar
       2 -d            4 -oy           1 -da          10 -l            4 -odal
       2 -oaiin        4 -ys           1 -daly        10 -odain        4 -oldy
       2 -oaly         3 -odain        1 -dol         10 -ols          4 -sy
       2 -olar         3 -odal         1 -dolaii       9 -ordy         3 -araiin
       2 -ols          3 -odar         1 -dydy         9 -sy           3 -aral
       2 -om           3 -oiin         1 -dym          8 -aldy         3 -arar
       2 -yd           3 -olo          1 -m            8 -odol         3 -ardy
       1 -aday         3 -ols          1 -o            8 -olol         3 -as
       1 -ainy         3 -yds          1 -oam          8 -r            3 -daly
       1 -airdy        2 -dm           1 -odydy        7 -ady          3 -dor
       1 -airy         2 -n            1 -odys         7 -ald          3 -dydy
       1 -aj           2 -odody        1 -old          7 -old          3 -oly
       1 -ala          2 -olm          1 -olody        7 -olo          3 -ydy
       1 -alain        2 -olol         1 -ols          6 -aiir         2 -adaiin
       1 -alal         1 -adaiin       1 -oram         6 -aly          2 -ady
       1 -alalg        1 -aiiin        1 -orar         6 -ary          2 -ainr
       1 -alaly        1 -aiim         1 -rodal        6 -oar          2 -ald
       1 -alam         1 -aiind        1 -saiin        6 -odam         2 -amdy
       1 -ald          1 -aiiny        1 -san          6 -olor         2 -an
       1 -aldar        1 -air          1 -ym           6 -ydaiin       2 -aram
       1 -aldm         1 -alod         1 -yom          5 -as           2 -ardam
       1 -aldo         1 -alody        1 -yoram        5 -m            2 -da
       1 -algar        1 -arar                         5 -odaly        2 -dody
       1 -aloiir       1 -ardl                         5 -odan         2 -l
       1 -alrar        1 -ariin                        5 -on           2 -oiin
       1 -alsain       1 -arm                          5 -oror         2 -ols
       1 -alsy         1 -aro                          5 -ys           2 -oody
       1 -alyd         1 -aroiin                       4 -ay           2 -oram
       1 -any          1 -arom                         4 -daiin        2 -orar
       1 -ao           1 -aryd                         4 -dal          2 -so
       1 -aralar       1 -as                           4 -oal          2 -yl
       1 -araldy       1 -da                           4 -oiiin        1 -ad
       1 -aralgy       1 -daiin                        4 -oraiin       1 -ai:dy
       1 -arar         1 -dal                          4 -osy          1 -aiiin
       1 -aro          1 -dam                          3 -adaiin       1 -aiily
       1 -dagy         1 -ds                           3 -dain         1 -aiiny
       1 -daiir        1 -l                            3 -iin          1 -aiir
       1 -dajy         1 -ld                           3 -odl          1 -airaii
       1 -dar          1 -oas                          3 -olody        1 -airy
       1 -din          1 -odaiin                       3 -ooiin        1 -alaiin
       1 -dorgy        1 -odam                         3 -oor          1 -alaiin
       1 -g            1 -odan                         3 -orody        1 -alam
       1 -iir          1 -odary                        3 -orory        1 -alas
       1 -lairgy       1 -odd                          3 -osaiin       1 -aldaii
       1 -ldam         1 -oddal                        3 -ydal         1 -aldar
       1 -m            1 -odoldy                       3 -yds          1 -alody
       1 -oaldy        1 -oiiin                        2 -aiiin        1 -als
       1 -odady        1 -olal                         2 -alod         1 -amar
       1 -odaiin       1 -oldam                        2 -als          1 -ara
       1 -odaiir       1 -oldar                        2 -dam          1 -aro
       1 -odals        1 -oloaii                       2 -dm           1 -arodai
       1 -odol         1 -olodal                       2 -oary         1 -aror
       1 -oj           1 -olom                         2 -odody        1 -aros
       1 -olaiin       1 -olsy                         2 -odor         1 -asal
       1 -olam         1 -on                           2 -odys         1 -ay
       1 -olarol       1 -ora                          2 -olaiin       1 -daiiin
       1 -oldain       1 -oraiin                       2 -oldam        1 -dalo
       1 -olg          1 -orar                         2 -oldar        1 -dalor
       1 -olinj        1 -ord                          2 -olm          1 -dly
       1 -oloara       1 -ordm                         2 -olom         1 -do
       1 -olor         1 -orly                         2 -orar         1 -dolaii
       1 -ora          1 -orm                          2 -orol         1 -ds
       1 -orad         1 -orods                        2 -orom         1 -dsairy
       1 -oraj         1 -oroiin                       2 -osar         1 -dyd
       1 -oraldy       1 -orom                         1 -aar          1 -dyldy
       1 -oram         1 -oror                         1 -ad           1 -dym
       1 -orol         1 -orory                        1 -adam         1 -i:dy
       1 -ory          1 -oross                        1 -adar         1 -lain
       1 -osal         1 -oryd                         1 -aii:dy       1 -lal
       1 -osam         1 -osory                        1 -aii:m        1 -ls
       1 -osar         1 -osy                          1 -aii:od       1 -ly
       1 -osarar       1 -oyd                          1 -aii:s        1 -m
       1 -osdy         1 -r                            1 -aiilm        1 -oal
       1 -oys          1 -sm                           1 -aiind        1 -oam
       1 -ral          1 -sordy                        1 -aiinda       1 -odain
       1 -sas          1 -sy                           1 -aiiny        1 -odair
       1 -sody         1 -yddor                        1 -aind         1 -odalai
       1 -sos          1 -yqoldy                       1 -ainos        1 -odaly
       1 -sy                                           1 -airin        1 -odam
       1 -yar                                          1 -alody        1 -odody
       1 -yda                                          1 -aor          1 -odydy
       1 -ydal                                         1 -arar         1 -odys
       1 -ydary                                        1 -arasy        1 -olal
       1 -ydy                                          1 -ardl         1 -olar
       1 -ys                                           1 -ariin        1 -old
       1 -ysam                                         1 -arm          1 -olody
                                                       1 -aro          1 -olol
                                                       1 -aroiin       1 -olor
                                                       1 -arom         1 -oraiin
                                                       1 -aryd         1 -oyl
                                                       1 -da           1 -riin
                                                       1 -dan          1 -rodal
                                                       1 -doly         1 -saiin
                                                       1 -dom          1 -san
                                                       1 -draird       1 -sar
                                                       1 -ds           1 -sdy
                                                       1 -i:s          1 -ym
                                                       1 -ir           1 -yom
                                                       1 -ld           1 -yoram
                                                       1 -lol          1 -yr
                                                       1 -lor    
                                                       1 -lsy    
                                                       1 -ly     
                                                       1 -oain   
                                                       1 -oair   
                                                       1 -oan    
                                                       1 -oarom  
                                                       1 -oas    
                                                       1 -oda    
                                                       1 -odaiin 
                                                       1 -odaiir 
                                                       1 -odair  
                                                       1 -odairo 
                                                       1 -odals  
                                                       1 -odary  
                                                       1 -odd    
                                                       1 -oddal  
                                                       1 -oddy   
                                                       1 -odo    
                                                       1 -odoaly 
                                                       1 -odoldy 
                                                       1 -odoral 
                                                       1 -odr    
                                                       1 -oii:s  
                                                       1 -oiir   
                                                       1 -oin    
                                                       1 -olal   
                                                       1 -olda   
                                                       1 -oldain 
                                                       1 -oldal  
                                                       1 -oldm   
                                                       1 -oldom  
                                                       1 -oloaii 
                                                       1 -olodal 
                                                       1 -oloiin 
                                                       1 -ololor 
                                                       1 -olols  
                                                       1 -ololy  
                                                       1 -olr    
                                                       1 -olraii 
                                                       1 -olsy   
                                                       1 -oo     
                                                       1 -ooaiin 
                                                       1 -ora    
                                                       1 -orain  
                                                       1 -oral   
                                                       1 -oraly  
                                                       1 -orari: 
                                                       1 -orary  
                                                       1 -ord    
                                                       1 -ordaii 
                                                       1 -ordm   
                                                       1 -orl    
                                                       1 -orly   
                                                       1 -orm    
                                                       1 -orodo  
                                                       1 -orods  
                                                       1 -oroiin 
                                                       1 -oross  
                                                       1 -ors    
                                                       1 -oryd   
                                                       1 -osory  
                                                       1 -oyd    
                                                       1 -ra     
                                                       1 -raiin  
                                                       1 -rrr    
                                                       1 -ry     
                                                       1 -saiin  
                                                       1 -sal    
                                                       1 -sm     
                                                       1 -so     
                                                       1 -sody   
                                                       1 -sor    
                                                       1 -sordy  
                                                       1 -soy    
                                                       1 -yaiin  
                                                       1 -yays   
                                                       1 -ydain  
                                                       1 -ydainy 
                                                       1 -yddor  
                                                       1 -ydlo   
                                                       1 -ydm    
                                                       1 -yl     
                                                       1 -yly    
                                                       1 -yoar   
                                                       1 -yol    
                                                       1 -ysaiin 
                                                       1 -ysaiin 

  This is intriguing: the -dy/-y ratio, which is the clearest indicator
  of language B, is more marked in general words than in line-final
  words.  The ratios are 0.012 (A) and 1.199 (B) for all text, versus
  0.063 (A) and 0.566 (B) for line-final words.
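
  A minimal sketch of how such a ratio could be recomputed from a
  frequency table, assuming each input line has the form "COUNT -SUFFIX"
  as in the listings above.  It is written for a POSIX shell with awk
  rather than csh; the function name "suffix_ratio" and the sample
  counts are hypothetical, and this is not the script actually used.

```shell
# suffix_ratio NUM DEN: read "COUNT -SUFFIX" lines (assumed format) on
# standard input and print count(NUM) / count(DEN) with three decimals.
suffix_ratio() {
  awk -v num="-$1" -v den="-$2" '
    $2 == num { n += $1 }        # total count of the numerator suffix
    $2 == den { d += $1 }        # total count of the denominator suffix
    END {
      if (d > 0) printf "%.3f\n", n / d
      else print "undefined"     # denominator suffix never seen
    }
  '
}

# Made-up sample counts, for illustration only.
printf '%s\n' '  40 -y' '  10 -dy' '   5 -ol' | suffix_ratio dy y
# prints 0.250
```

  Only exact suffix matches are counted here; whether the figures quoted
  above were computed on exact suffixes or on broader classes is not
  stated, so the sketch may not reproduce them.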
  
  The frequencies of {-al,-ol,-ar,-or} at end-of-line are about half
  of their overall frequencies, for both languages.  The frequency of
  these suffixes in labels is intermediate between the general
  frequencies of the two languages, and higher than either
  end-of-line frequency.
  
  Of the occurrences of -am in language A, 48% are at end of line; in
  language B, 78% are at end-of-line.  Presumably -am is an
  abbreviation (used at e-o-l to avoid a line break), or something
  that occurs mostly at end-of-sentence.
  
  The {-o,-a}/-y ratio is 0.073 for labels; 0.063 and 0.037 for end-of-line
  (A and B, respectively); and 0.211 and 0.093 for all herbal words (A and B).
  Yet one more argument that -y is a fancy version of -o or -a.
  
  Here is a summary of the most important suffix classes:
  
    suffix        labels        herbal-A eol  herbal-B eol  herbal-A all  herbal-B all
    ------------  ------------  ------------  ------------  ------------  ------------
    -[yoa]          44 (14%)     202 (28%)      55 (26%)    2200 (37%)     583 (24%)
    -[oa][lr]       66 (21%)      95 (13%)      18 (8.4%)   1886 (32%)     400 (16%)
    -[aoy]d[yoa]    12 (3.8%)     41 (5.6%)      6 (2.8%)    232 (3.9%)    114 (4.7%)
    -d[yoa]         10 (3.2%)     13 (1.8%)     31 (14%)      24 (0.4%)    642 (26%)
    -                7 (2.2%)     20 (2.8%)      8 (3.7%)    124 (2.1%)     63 (2.6%)
    -aiin            4 (1.2%)     58 (8.0%)      9 (4.2%)    316 (5.3%)    143 (5.9%)
  
  Labels seem to use generally less -aiin, -dy, -y, -o than the herbal text.
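
  The class totals in the summary could be tallied along these lines.
  This is only a sketch for a POSIX shell with awk: it assumes input
  lines of the form "COUNT -SUFFIX", takes the class as a regular
  expression matched against the whole suffix, and reports the share
  relative to the sum of all counts in the input.  The name
  "class_share" and the sample data are hypothetical; the exact class
  definitions and denominators behind the table are not recorded here.

```shell
# class_share REGEX: read "COUNT -SUFFIX" lines (assumed format) on
# standard input; print the summed count of suffixes matching REGEX
# and its percentage of the grand total.
class_share() {
  awk -v re="$1" '
    { total += $1 }              # grand total over all suffixes
    $2 ~ re { hits += $1 }       # total for the requested class
    END {
      if (total > 0) printf "%d (%.1f%%)\n", hits, 100 * hits / total
    }
  '
}

# Made-up sample counts, for illustration only.
printf '%s\n' '  6 -ol' '  3 -ar' '  1 -dy' \
  | class_share '^-[oa][lr]$'
# prints 9 (90.0%)
```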

  Now let's look at unifixes at either extremity:
  
    pr -m -w 64 -e -t \
        .labels-s-unifs-all.frq \
        .hea-f-bol-unifs-all.frq  \
        .hea-f-eol-unifs-all.frq  \
        Note-009/hea-f-unifs-all.frq \
      | expand \
      > .unifs-a-joint.frq
      
    all labels      herbal-A bol    herbal-A eol    herbal-A all  
    freq unifix     freq unifix     freq unifix     freq unifix     
    ---- ---------  ---- ---------  ---- ---------  ---- ---------  
       6 am           46 daiin       107 daiin       412 daiin
       6 ar           16 m            30 dy           88 dy
       3 ary          12 or           20 dam          87 s
       2 dy           11 dain         19 dar          74 dain
       2 gy            7 dar          18 dal          71 dar
       2 odor          7 dor          18 s            52 or
       2 sal           7 sor          12 d            47 dal
       2 sar           6 oaiin        11 dain         45 ol
       2 sary          6 saiin        11 sy           40 dol
       2 siiir         5 dol           8 saiin        34 dam
       1 aiin          5 iin           7 am           34 dor
       1 ainaly        5 ol            6 da           33 saiin
       1 ainam         5 sol           6 dan          26 dair
       1 airar         4 doiin         6 dom          24 sy
       1 al            4 soiin         6 or           19 odaiin
       1 alols         3 in            6 sal          18 ar
       1 aly           3 l             5 aiin         17 d
       1 araly         3 odaiin        5 ar           16 m
       1 arar          3 olor          5 dary         16 sor
       1 araydy        3 sar           5 ody          16 y
       1 arody         3 y             5 ol           15 aiin
       1 asy           3 ydaiin        4 dair         15 r
       1 daiin         2 dair          4 r            15 sol
       1 daiindy       2 lor           4 raiin        14 sar
       1 dainy         2 oain          4 sos          13 qodaiin
       1 dal           2 odar          4 y            12 sal
       1 dalary        2 qo            3 daiiin       10 al
       1 daliir        2 qoaiin        3 dol          10 oaiin
       1 dalsy         2 qody          3 n            10 ody
       1 dan           2 qor           3 sar           9 am
       1 dar           2 soy           3 sol           9 dan
       1 daramgal      2 yol           3 ydaiin        9 do

  Label unifixes and text unifixes seem largely disjoint, except for
  {am,al,ary,dy,sal,sar}.  The longer label unifixes occur
  once each; the exceptions are "sary" and "siiir".
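
  The overlap between two unifix lists can be checked mechanically.
  A minimal sketch for a POSIX shell with awk, assuming both files hold
  "COUNT UNIFIX" lines like the columns above; the function name
  "shared_unifixes" and the sample files are hypothetical.

```shell
# shared_unifixes FILE1 FILE2: print each unifix that occurs in both
# "COUNT UNIFIX" files (assumed format), in the order found in FILE2.
shared_unifixes() {
  awk '
    NF < 2 { next }                           # skip blank or short lines
    FILENAME == ARGV[1] { seen[$2] = 1; next } # remember FILE1 unifixes
    ($2 in seen) && !done[$2]++ { print $2 }   # report each match once
  ' "$1" "$2"
}

# Made-up sample counts, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' '6 am' '6 ar' '2 siiir' > "$tmp/labels.frq"
printf '%s\n' '412 daiin' '18 ar' '9 am' > "$tmp/text.frq"
shared_unifixes "$tmp/labels.frq" "$tmp/text.frq"
rm -r "$tmp"
```

  On the sample data this prints "ar" and then "am", one per line.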
  
  The unifix "iin" occurs with nonzero frequency at b-o-l but is almost
  absent elsewhere.  Checking the text, one can see that almost all
  occurrences are due to a common word (usually "daiin") being split
  across a line break.
  
  The "m" unifix which is common at beginning of line is a Friedman
  transcription bug: in the Currier transcription most of those "m"
  are actually "g"s attached to the end of the previous line.

    pr -m -w 64 -e -t \
        .labels-s-unifs-all.frq \
        .heb-f-bol-unifs-all.frq  \
        .heb-f-eol-unifs-all.frq  \
        Note-009/heb-f-unifs-all.frq \
      | expand \
      > .unifs-b-joint.frq
      
    all labels      herbal-B bol    herbal-B eol    herbal-B all  
    freq unifix     freq unifix     freq unifix     freq unifix     
    ---- ---------  ---- ---------  ---- ---------  ---- ---------  
      6 am            19 daiin        11 dam          86 daiin
      6 ar             7 saiin         7 dy           55 or
      3 ary            6 dar           6 daiin        50 dar
      2 dy             4 iin           6 dal          40 aiin
      2 gy             4 m             4 am           39 ar
      2 odor           3 dair          4 dar          31 ol
      2 sal            3 ol            3 aiin         28 dy
      2 sar            3 or            3 ain          25 dal
      2 sary           3 r             3 ar           25 saiin
      2 siiir          2 dor           3 daly         17 dam
      1 aiin           2 sor           3 ldy          14 ody
      1 ainaly         1 aiin          3 ody          12 oraiin
      1 ainam          1 ain           3 ol           12 s
      1 airar          1 ar            3 or            9 al
      1 al             1 daiir         3 s             8 olaiin
      1 alols          1 darar         3 saiin         8 r
      1 aly            1 iir           2 al            7 ain
      1 araly          1 laiin         2 aram          7 dair
      1 arar           1 lodaiin       2 d             7 odaiin
      1 araydy         1 oaiin         2 da            7 y
      1 arody          1 odair         2 daram         6 dol
      1 asy            1 olair         2 dary          6 raiin
      1 daiin          1 qoaiin        2 od            5 am
      1 daiindy        1 rodaiin       2 oldam         5 dain
      1 dainy          1 saiir         2 oraiin        5 dor
      1 dal            1 sair          2 r             5 m
      1 dalary         1 sairain       2 raiin         5 sair
      1 daliir         1 sar           2 sa            5 sar
      1 dalsy          1 saraiir       2 y             4 araiin
      1 dan            1 sol           1 a             4 daly
      1 dar            1 solaiin       1 aldy          4 iin
      1 daramgal       1 y             1 alod          4 ldy

  The disjointness of label and text unifixes apparently holds
  for language B too.