Hacking at the Voynich manuscript - Side notes
077 Further comparisons of Recipes to the Shennong Bencao

Last edited on 2025-06-26 11:59:05 by stolfi


INTRODUCTION

This note presents further analyses comparing the Starred Paragraphs
(Recipes) section of the VMS to the Chinese medical classic Shennong
Bencao Jing (SBJ).


SETUP

  ln -s ../work
  ln -s work/error_funcs.gawk
  ln -s work/compute-freqs.sh
  ln -s work/make_histogram.gawk
  ln -s work/insert_blank_lines.gawk
  ln -s ../076/compute_parag_stats.sh
  ln -s ${HOME}/ttf


DATA FILES

Created "bencao.pin" with one version of the Shennong Bencao Jing,
reformatted to look sort of like an EVT file.

Using the "starps.eva" file created in note 076 as the source for the
Starred Parags (Recipes) section of the VMS.


COUNTING ENTRIES AND WORDS

  ./count_recipes_and_words.sh bencao.pin
  ./count_recipes_and_words.sh starps.eva


HISTOGRAMS OF PARAG SIZES

Computing the number of words per parag, and the histogram of those numbers:

  ./make_hist_of_words_per_recipe.sh bencao.pin
  ./make_hist_of_words_per_recipe.sh starps.eva

  plot_parag_size_histograms.sh bencao starps

While analyzing the number of words per paragraph in the SBJ file
("bencao.pin") posted earlier, I noticed that there were several parags
with only 3--5 Chinese words.  It turns out that those are subsection
headers.  Here they are.  The locus 1.X.YYY means that it is subsection X
of section 1 《中卷》 starting at line YYY.  The notation 2.X.YYY is
analogous but for section 2 《下卷》.

  1.1.001 玉石部上品   yùshí ù shàngpǐn         Top grade jade
  1.2.019 玉石部中品   yùshí bù zhōng pǐn       Jade department middle grade
  1.3.033 玉石部下品   yùshí bùxià pǐn          jade subordinate product
  1.4.044 草部上品     cǎo bù shàngpǐn          Top grade grass
  1.5.102 草部中品     cǎo bù zhōng pǐn         Kusanabe middle grade
  1.6.162 草部下品     cǎo bùxià pǐn            The lowest grade of grass
  1.7.219 木部上品     mù bù shàngpǐn           Top grade wood
  1.8.234 木部中品     mù bù zhōng pǐn          Kibe middle grade
  1.9.253 木部下品     mù bùxià pǐn             Kibe inferior grade
  2.1.001 蟲獸部上品   chóng shòu bù shàngpǐn   Top quality insects and beasts
  2.2.017 蟲獸部中品   chóng shòu bù zhōng pǐn  Insect and animal department medium quality
  2.3.042 蟲獸部下品   chóng shòu bùxià pǐn     Insect Beast Subordinates
  2.4.069 果菜部上品   guǒcài bù shàngpǐn       Top quality fruits and vegetables department
  2.5.080 果菜部中品   guǒcài bù zhōng pǐn      Medium range of fruits and vegetables
  2.6.087 果菜部下品   guǒcài bùxià pǐn         Fruit and vegetable products
  2.7.091 米穀部上品   mǐgǔ bù shàngpǐn         Top grade rice cereals
  2.8.094 米穀部中品   mǐgǔ bù zhōng pǐn        Mid-grade rice
  2.9.098 米穀部下品   mǐgǔ bùxià pǐn           The inferior product of Rice Valley

The pinyin readings and translations are from Google Translate.  I left
them unedited for the lulz.

After commenting those lines out, the shortest remaining entry seemed to
be normal:

  2.3.044 鼯鼠 主墮胎,令易產。
          wú shǔ zhǔ duòtāi, lìng yì chǎn
          Flying squirrel: causes abortion and makes childbirth easier.

(If that can be called "normal"...)

And then I noticed that the Starred Parags file ("starps.eva") too had a
few anomalously short parags of ~4 Voynichese words.  Those were the
so-called "titles", short lines with anomalous justification:

  =sairy.ore.daiindy.ytam=
  =otoiis.chedaiin.otair.otaly=
  =olchar.olchedy.lshy.otedy=
  =ytain.olkaiin.ykar.chdar.alkam=

The title is a right-justified line after a parag that ends with a full
line.  It had been assumed to be the trailer of the previous parag that
the Scribe skipped and then inserted in that non-standard position.
However, the first line of the next parag bends down to avoid that
title.  Thus, if that conjecture is true, the Scribe must have realized
the omission after writing the first 4 lines of .
I have now re-interpreted it as a title.  It is possible that other
section headers were not recognized as such and were joined with
adjacent parags.

After commenting out the subsection titles in both files, I counted
again the number of words and parags, and basic statistics (min, max,
average, and standard deviation) of the number of words per paragraph
(nwp):

  statistic   | bencao | starps
  ------------+--------+--------
  parags      |    354 |    330
  words       |  10874 |  10457
  min nwp     |      7 |     11
  max nwp     |     76 |     72
  avg nwp     |   30.8 |   31.7
  dev nwp     |    8.5 |   11.2


LOOKING FOR REPEATED WORD PATTERNS

Challenged by the internet, I will try to look for repeated patterns in
the two files.

  ./count_repeated_tuples.sh 5 bencao.pin
  ./count_repeated_tuples.sh 3 starps.eva

======================================================================

Here are some advances in the comparison between the Starred Parags
(SPS) section and the Shennong Bencao Jing (SBJ).  Recall that the
files are:

[quote="Jorge_Stolfi" pid='67750' dateline='1750041874']

  starps.eva
    The Starred Paragraphs section (SPS) from Takeshi's transcription
    in the 1.6e6 interlinear file, from page f102r to line 30 of f116r.
    With one parag per line, in the EVA encoding, with all alignment
    fillers and comments removed, all weirdos and missing chars mapped
    to '*', and one "=" at the start and end of each line (= parag).

  bencao.pin
    The SBJ from the webpage posted by @oshfdk, minus the introduction
    《上卷》 and the section headers (see below), converted to pinyin
    by Google Translate and mapped to lowercase.

Both files are in UTF-8 encoding.  Again, if you just click on those
links you will see gibberish, because the server at my Univ expects
plain text files to be in ISO-Latin-1 and thus messes up the formatted
HTML that it sends to your browser.  You will have to download the
files and look at them with any text editor or viewer that understands
UTF-8.
[/quote]

Here is the histogram of the word counts nwp:

At first sight the histograms are different, but there are some
intriguing similarities.  Note that both files have 23 entries with 27
words (the most common entry length in both files), 6 entries with 23
words, 8 entries with 37 words, 2 entries with 47 words, one entry with
53 words, one entry with 59 words, and one entry with 62 words.  In
both files there are anomalously few entries with 23, 37, and 43 words.

Considering the missing bifolio in the SPS quire, we have 6 surprising
near coincidences: the number of entries, and the mode, min, max,
average, and deviation of the number of words per paragraph.  (The
total number of words is not an extra coincidence, since it is the
average nwp times the number of entries.)

Compared to the SBJ, the SPS has a somewhat broader nwp histogram, as
implied by the standard deviation.  It has more entries with 10-20
words and 35-70 words, and fewer with 21-34 words.  In particular, the
SBJ has a second mode, 23 parags of 34 words, whereas the SPS has only
11.

These discrepancies could be the result of some word spaces being
incorrectly inserted or omitted in the SPS as it was digitized, more or
less at random and with roughly equal probability.  Alternatively, some
parag breaks in the SPS may be wrong, causing, for example, two
consecutive parags that should have 22 and 32 words to become parags of
16 and 38 words, or two parags that should have 7 and 76 words to
become parags with 13 and 70 words.  Both kinds of errors would have
little effect on the average nwp, but would increase its standard
deviation, as observed.
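To get a rough feel for how much such random spacing errors alone could
broaden the nwp histogram, one can perturb the word counts directly.
Below is a minimal gawk sketch of that idea; it is hypothetical (the
name perturb_nwp.gawk and the parameter perr are not among the scripts
above) and assumes one parag per line with "." as the word separator,
as in both files:

  # perturb_nwp.gawk - hypothetical sketch, not one of the scripts used in this note.
  # For each parag (one per line, words separated by "."), drop each true word
  # space with probability perr, split each word with probability perr, and
  # print the resulting words-per-parag count.
  BEGIN { srand(); if (perr == "") perr = 0.05; }
  /./ {
    n = split($0, w, ".");                              # crude word count of the parag
    nwp = n;
    for (i = 1; i < n; i++) if (rand() < perr) nwp--;   # a true space omitted: two words merge
    for (i = 1; i <= n; i++) if (rand() < perr) nwp++;  # a spurious space inserted: a word splits
    print nwp;
  }

For example,

  gawk -v perr=0.05 -f perturb_nwp.gawk starps.eva | sort -n | uniq -c

gives a quick idea of how much the nwp histogram spreads out for a
given error rate.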
There is also the bonus coincidence of both files originally having
subsection titles with ~4 words each, although the number of such
titles is vastly different.  More on that later.

Now for the bad news.  As @oshfdk observed, there are hundreds of
multiword sequences that occur many times in the SBJ.  In particular,
there is a 10-word phrase that occurs six times, on six consecutive
lines:

[code]
久食輕身不老,延年神仙。一名

jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng
jiǔ.shí.qīng.shēn.bùlǎo.yán.nián.shénxiān.yī.míng

Eating it for a long time will make you light and immortal. It is also called
[/code]

On the other hand, in the SPS the longest phrases that occur more than
once have only 3 words, and the most common one occurs only three
times:

[code]
chedy.qokeey.qokeey
chedy.qokeey.qokeey
chedy.qokeey.qokeey
[/code]

I will discuss the implications of this difference in another post.

======================================================================

>> REDO THIS >>

LOOKING FOR REPEATED WORD PATTERNS

Challenged by the internet, I will try to look for repeated patterns in
the two files.

Extracting word tuples from the two files:

  cat ShenNongBenCaoJing_pyin.txt \
    | extract_word_tuples.gawk \
    | sort \
    | list_repeated_patterns.gawk \
    > .shen-reps

>> REDO THIS >>

THE DATA

Raw data files, with comments prefixed with "#", recipe numbers in the
form S-NNN prefixed with "##", and each kanji surrounded by ASCII
spaces:

  ln -s ~/IMPORT/texts/chinese/ShennongBencao/text.big5 bencao-raw.big5
  ln -s ~/IMPORT/texts/chinese/ShennongBencao/text.jis  bencao-raw.jis

Data files without punctuation:

  cat bencao-raw.big5 \
    | gawk \
        ' /^#/ {print; next;} \
          // { \
            gsub(/[ ]+[{}][ ]+/, " ", $0); \
            gsub(/[ ]+[\241][\264]/, "", $0); \
            gsub(/[ ]+[\241][D]/, "", $0); \
            print; \
          } ' \
    > bencao.big5

  cat bencao-raw.jis \
    | gawk \
        ' /^#/ {print; next;} \
          // { \
            gsub(/[ ]+[{}][ ]+/, " ", $0); \
            gsub(/[ ]+[\201][\234]/, "", $0); \
            gsub(/[ ]+[\201][D]/, "", $0); \
            print; \
          } ' \
    > bencao.jis

  dicio-wc bencao{-raw,}.{big5,jis} vstars.eva

      lines     words     bytes  file
    ------- --------- --------- ------------
       2008     17532     57611  bencao-raw.big5
       1510     19003     70177  bencao-raw.jis
       2008     13705     46183  bencao.big5
       1510     15229     57427  bencao.jis
       1742     13642     86751  vstars.eva *
       1734     13354     85052  vstars.eva **

    *  = as of sometime before 2004-05-30.
    ** = as of 2004-05-30.  It is not known why the counts changed
         since the original run.

Extracted the Voynichese "stars" section from the Majority version,
reformatted to be comparable to the Bencao (line numbers as NNNV-U-LL,
recipe numbers as "## S-NNN", all words surrounded by ASCII spaces).
Fixed many errors by hand, against KHE's images (also in the
interlinear file).


BASIC STATISTICS

Checking whether each VMS page has been split into the correct number
of recipes:

  cat vstars.eva \
    | count-recipes-per-page \
    > vstars.rpp

  diff true.rpp vstars.rpp

  total 328 recipes

Note that the total has changed.  This is because, during the 05/2004
round of edits, some long recipes were split at paragraph breaks even
though there were no stars there.  This is not too unreasonable,
because the stars seem to have been placed without much care, as if the
scribe did not understand that they were associated with the
paragraphs.
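For reference, here is a minimal gawk sketch of the kind of per-page
counting that count-recipes-per-page presumably performs; the real
script is not reproduced in these notes, and the exact locus format
("103v-U-12", say) is an assumption based on the NNNV-U-LL description
above:

  # count_rpp.gawk - hypothetical sketch; not the actual count-recipes-per-page.
  # A "## S-NNN" header marks the start of a recipe; the recipe is credited to
  # the page of the first data line that follows it.
  /^##/ { pending = 1; next; }            # recipe header "## S-NNN"
  /^#/  { next; }                         # other comments
  NF > 0 {
    page = $1; sub(/-.*/, "", page);      # assumed locus format: "103v-U-12" -> "103v"
    if (pending) { rpp[page]++; pending = 0; }
  }
  END { for (p in rpp) printf "%-8s %5d\n", p, rpp[p]; }

The output (one page per line, unsorted) would then be sorted and
diffed against true.rpp as above.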
Basic statistics - total tokens, words, recipes:

  foreach f ( bencao.big5 vstars.eva )
    printf "\n%-10s" "${f:r}:"
    cat $f \
      | print-tk-wd-counts \
      > ${f:r}.twct
    cat ${f:r}.twct \
      | sort -b -k3nr -k1n \
      | egrep -v '^000 ' \
      > ${f:r}.twsr
  end

  bencao: total 357 recipes, 12826 tokens (   0 bad), 35.93 tokens/recipe, 1113 good words
  vstars: total 328 recipes, 10491 tokens (  38 bad), 31.98 tokens/recipe, 2996 good words

Note that these counts have changed since 02/2002.  They used to be

  vstars: total 323 recipes, 10542 tokens ( 595 bad), 32.64 tokens/recipe, 2767 good words

During the 05/2004 round of edits, many tokens became joined with their
neighbors, because the spaces were entered as faithfully as possible.
However, if we believe the word structure paradigm, then many of those
joined words should have been kept separate.  Also note that over 550
"bad" tokens were fixed by those edits.


RECIPE LENGTH HISTOGRAMS

Plotting the recipe length histograms:

  foreach tw ( tk.3 wd.4 )
    foreach f ( bencao vstars )
      printf "\n%s (%s): " "${f}" "${tw:r}"
      cat ${f}.twct \
        | gawk -v fld="${tw:e}" '/./{ print $(fld); }' \
        | compute-tk-wd-histogram -v quantum=5 \
        > ${f}.${tw:r}h
    end
    foreach fmt ( png )
      plot-twhi -format ${fmt} \
          bencao.${tw:r}h Bencao 1 \
          vstars.${tw:r}h Voynich 2 \
        > recipe-${tw:r}-hist.${fmt}
    end
  end


RECIPE LENGTH PLOTS

Plotting the recipe lengths as a function of position in the text:

  foreach fmt ( png )
    foreach f ( bencao.Bencao vstars.Voynich )
      plot-recipe-attr \
          -format ${fmt} \
          ${f:r}.twct "${f:e} (tk)" 3 1 1.0 \
        > ${f:r}-tk-counts.${fmt}
    end
  end

Ditto, smoothed:

  foreach width ( 09 )
    foreach fmt ( png )
      foreach f ( bencao.Bencao vstars.Voynich )
        foreach type ( avg dif )
          cat ${f:r}.twct \
            | gawk '/./{ print $1, $2, $3; }' \
            | filter-recipe-data -v ${type}=1 -v width=${width} \
            > ${f:r}-${type}${width}.tct
        end
        plot-recipe-attr \
            -format ${fmt} \
            ${f:r}-avg${width}.tct "${f:e} avg${width}" 3 1 1.0 \
            ${f:r}-dif${width}.tct "${f:e} dif${width}" 3 2 60.0 \
          > ${f:r}-tk-counts-dif${width}.${fmt}
      end
    end
  end


COINCIDENCE IMAGES

Computing the coincidence image:

  foreach width ( 09 )
    foreach et ( 0.5/0.05/avg 0.01/0.01/dif )
      set err = "${et:h}"
      set type = "${et:t}"
      compute-coincidence-image \
          -v absErr=${err:h} -v relErr=${err:t} \
          -v xFile=bencao-${type}${width}.tct -v xField=3 \
          -v yFile=vstars-${type}${width}.tct -v yField=3 \
        | pgmnorm | pnmdepth 255 \
        > recipe-tk-counts-${type}${width}.pgm
      display recipe-tk-counts-${type}${width}.pgm
    end
  end


INTERESTING WORDS

Word frequency tables:

  foreach f ( bencao.big5 vstars.eva )
    echo " "; echo "=== ${f:r} ==="
    cat $f \
      | gawk \
          ' /^ *([#]|$)/{ next; } \
            //{ \
              gsub(/^[-.0-9a-zA-Z]*/, " ", $0); \
              gsub(/[ ][-={}]/, " ", $0); \
              print; \
            } ' \
      | tr ' ' '\012' \
      | egrep '.' \
      | sort | uniq -c | expand \
      | map-field \
          -v table=big5-to-html.tbl \
          -v inField=2 -v outField=3 -v forgiving=1 \
      | map-field \
          -v table=html-to-py.tbl \
          -v inField=3 -v outField=4 -v forgiving=1 \
      | map-field \
          -v table=html-to-meaning.tbl \
          -v inField=3 -v outField=5 -v forgiving=1 \
      | gawk '//{ print $1, ($3 ($3==$4 ? "" : ("=" $4)) ($5==$3 ? "" : ("=" $5))); }' \
      | sort -b -k1nr -k2 \
      | compute-freqs \
      > ${f:r}.wfr
    head -100 ${f:r}.wfr
  end

  === bencao ===

    362 0.02823 生=(sheng1,5:sheng5)
    358 0.02791 味=(wei4)
    352 0.02745 治=(zhi4)
    313 0.02441 名=(ming2)
    308 0.02402 一=(yi1)
    299 0.02331 氣=(qi4)
    293 0.02285 寒=(han2)
    245 0.01910 谷=(gu3,yu4)
    198 0.01544 熱=(re4)
    168 0.01310 平=(ping2)
    161 0.01255 身=(shen1,juan1)
    154 0.01201 不=(bu4,5:bu5,bu2)
    149 0.01162 久=(jiu3)
    144 0.01123 中=(zhong1,zhong4)
    144 0.01123 川=(chuan1)
    143 0.01115 服=(fu2,fu4,5:fu5)
    136 0.01060 苦=(ku3)
    136 0.01060 輕=(qing1)
    132 0.01029 山=(shan1)
    129 0.01006 溫=(wen1)

  === vstars ===

    189 0.01802 aiin=aiin
    189 0.01802 chedy=chedy
    155 0.01477 qokeey=qokeey
    146 0.01392 ar=ar
    134 0.01277 qokeedy=qokeedy
    131 0.01249 al=al
    127 0.01211 daiin=daiin
    121 0.01153 chey=chey
    119 0.01134 qokaiin=qokaiin
    115 0.01096 shedy=shedy
     96 0.00915 okeey=okeey
     96 0.00915 ol=ol
     95 0.00906 okaiin=okaiin
     89 0.00848 qokain=qokain
     76 0.00724 otaiin=otaiin
     75 0.00715 cheey=cheey
     70 0.00667 shey=shey
     69 0.00658 okain=okain
     63 0.00601 chol=chol
     63 0.00601 oteey=oteey

Extract the list of the kth word from each recipe, and their
distributions:

  foreach k ( 1 2 3 4 )
    foreach f ( bencao.big5 vstars.eva )
      printf "\n\n=== %s[%s] ===\n\n" "${f:r}" "$k"
      cat $f \
        | gawk -v which=${k} \
            ' /^[#][#]/{ fst = 1; next; } \
              /^ *([#]|$)/{ next; } \
              (fst){ \
                gsub(/^[-.0-9a-zA-Z]*/, " ", $0); \
                gsub(/[ ][-={}]/, " ", $0); \
                print $(which); fst = 0; \
              } ' \
        | tr ' ' '\012' \
        | egrep '.' \
        > ${f:r}-${k}.tks
      cat ${f:r}-${k}.tks \
        | sort | uniq -c | expand \
        | map-field \
            -v table=big5-to-html.tbl \
            -v inField=2 -v outField=3 \
            -v forgiving=1 \
        | map-field \
            -v table=html-to-py.tbl \
            -v inField=3 -v outField=4 \
            -v forgiving=1 \
        | map-field \
            -v table=html-to-meaning.tbl \
            -v inField=3 -v outField=5 -v forgiving=1 \
        | gawk '//{ print $1, ($3 ($3==$4 ? "" : ("=" $4)) ($5==$4 ? "" : ("=" $5))); }' \
        | sort -b -k1nr -k2 \
        | compute-freqs \
        > ${f:r}-${k}.wfr
      head -5 ${f:r}-${k}.wfr
    end
  end

  === bencao[1] ===

     19 0.05322 白=(bai2,5:bai5)
     15 0.04202 石=(shi2,dan4)
      6 0.01681 紫=(zi3)
      5 0.01401 大=(da4,dai4)
      5 0.01401 水=(shui3)

  === vstars[1] ===

      6 0.01829 daiin=daiin
      5 0.01524 polaiin=polaiin
      5 0.01524 tchedy=tchedy
      4 0.01220 pchedal=pchedal
      4 0.01220 pcheor=pcheor

  === bencao[2] ===

     15 0.04202 實=(shi2)
     11 0.03081 石=(shi2,dan4)
      7 0.01961 草=(cao3)
      6 0.01681 參=(can1,cen1,shen1,san1)
      6 0.01681 芝=(zhi1)

  === vstars[2] ===

      7 0.02134 ar=ar
      6 0.01829 shedy=shedy
      5 0.01524 chey=chey
      5 0.01524 qokaiin=qokaiin
      4 0.01220 cheo=cheo

  === bencao[3] ===

    169 0.47339 一=(yi1)
    111 0.31092 味=(wei4)
     13 0.03641 子=(zi5,zi3,zi2)
      3 0.00840 實=(shi2)
      3 0.00840 草=(cao3)

  === vstars[3] ===

      9 0.02744 shedy=shedy
      7 0.02134 qokain=qokain
      5 0.01524 chedy=chedy
      5 0.01524 okain=okain
      5 0.01524 qokaiin=qokaiin

  === bencao[4] ===

    169 0.47339 名=(ming2)
     44 0.12325 苦=(ku3)
     36 0.10084 一=(yi1)
     32 0.08964 辛=(xin1)
     26 0.07283 味=(wei4)

  === vstars[4] ===

      9 0.02744 qokeey=qokeey
      7 0.02134 shedy=shedy
      6 0.01829 qokeedy=qokeedy
      5 0.01524 oteedy=oteedy
      4 0.01220 okeey=okeey


REPEATED WORDS

Checking for repeats:

  foreach f ( bencao.big5 vstars.eva )
    printf "\n%s: " "${f:r}"
    cat ${f} \
      | list-repeats \
      > ${f:r}.reps
    cat ${f:r}.reps | wc -l
    cat ${f:r}.reps \
      | gawk '/./{ print $2; }' \
      | sort | uniq -c | expand \
      | map-field \
          -v table=big5-to-html.tbl \
          -v inField=2 -v outField=3 \
          -v forgiving=1 \
      | map-field \
          -v table=html-to-py.tbl \
          -v inField=3 -v outField=4 \
          -v forgiving=1 \
      | gawk '//{ print $1, ($3 "=" $4); }' \
      | sort -b -k1nr -k2 \
      > ${f:r}.rtop
    head -3 ${f:r}.rtop
  end

  bencao: 41
      8 洗=(xi3,xian3)
      6 血=(xue4,xie3)
      5 寒=(han2)

  vstars: 81
     10 qokeedy=qokeedy
     10 qokeey=qokeey
      7 ar=ar

Build the word-paragraph occurrence map:

  foreach f ( bencao.big5 vstars.eva )
    cat ${f} \
      | sed \
          -e 's/^[#][#] */@/' \
          -e 's/[#].*$//' \
          -e 's/^[0-9][-A-Za-z0-9]*[ ]/ /' \
          -e '/^[ ]*$/d' \
      | tr ' ' '\012' \
      | gawk \
          ' BEGIN{ \
              split("", map); \
              split("", wd); nwd=0; split("", wdct); \
              split("", pg); npg = 0; p = "???"; \
            } \
            /^[@]/ { \
              p = $1; gsub(/[@]/, "", p); \
              pg[npg] = p; npg++; next; \
            } \
            /./ { \
              w = $1; \
              if (! (w in wdct)) \
                { wd[nwd] = w; nwd++; wdct[w] = 0; } \
              wdct[w]++; map[p,w]++; \
            } \
            END { \
              for (w in wdct) \
                { printf "%-20s %5d ", w, wdct[w]; \
                  for (i = 0; i < npg; i++) \
                    { p = pg[i]; \
                      if ((p,w) in map) \
                        { printf "%d", map[p,w]; } \
                      else \
                        { printf "."; } \
                    } \
                  printf "\n"; \
                } \
            } \
          ' \
      | sort -b -k2nr -k1 \
      > ${f:r}.wpm
  end