Hacking at the Voynich manuscript - Side notes
042 Label occurrences in the text

Last edited on 1998-10-28 02:01:59 by stolfi

GOAL

Looking for occurrences of selected labels in the text.

PHARMA LABELS

For starters, let's take the labels on page f101v2+f101v1:

  <f101v2.R1.2;V>    otaldy={Grove's X.6}
  <f101v2.R1.3;U>    otol=
  <f101v2.R1.4;V>    yty={Grove's X.9}
  <f101v2.R1.5;V>    dokor={was <f101v1.R1.1;V>}
  <f101v2.R1.6;V>    orar={was <f101v1.R1.2;V>}
  <f101v2.R1.7;V>    otarar={was <f101v1.R1.3;V>}
  <f101v2.R1.8;V>    otoly={was <f101v1.R1.4;V>}
  <f101v2.R1.9;V>    soraly={was <f101v2.R1.5;V>}
  <f101v2.R2.2;U>    arom=
  <f101v2.R2.3;U>    orar,am=
  <f101v2.R2.5;U>    dytoly=
  <f101v2.R2.6;U>    olkor=
  <f101v2.R2.7;V>    dolary={was <f101v1.R2.3;V>}
  <f101v2.R2.8;U>    odor=
  <f101v2.R2.9;U>    olaran=

From the VMS concordance (see Notes/037) I manually extracted all
occurrences of these labels in the text. In most cases I took only
exact matches, but in some cases I tolerated a/o variations. The
result is in the file f101v2-labels-xref.txt.  The entries in that
file that matter begin with "+" and have the following fields:

  1 2   3    4     5     6     7
  + SEC FNUM LOCUS TRANS LABEL CONTEXT

Here LOCUS has the form UNIT.LINE, TRANS is a transcriber code (uppercase
letter), LABEL is the original label, and CONTEXT is its occurrence in the
text, with some surrounding words.
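
The a/o tolerance mentioned above can be sketched as follows. This is
only an illustration, not the procedure actually used (the extraction
was done by hand). The function name match_ao is hypothetical, plain
awk is assumed instead of gawk, and words are assumed to be delimited
by ".", "-" and "=" as in the CONTEXT strings.

```shell
# Hypothetical helper: print the words of each input line that equal
# LABEL up to a<->o substitutions.  Usage: match_ao LABEL < contexts
match_ao() {
  awk -v label="$1" '
    BEGIN {
      re = label
      gsub(/[ao]/, "[ao]", re)      # e.g. "dolary" becomes "d[ao]l[ao]ry"
      re = "^" re "$"               # require a whole-word match
    }
    {
      # Assumed word delimiters: ".", "-" and "=".
      n = split($0, word, /[-.=]+/)
      for (i = 1; i <= n; i++)
        if (word[i] ~ re) print word[i]
    }
  '
}
```

For example, echo '....dar.ykain-ykyka.dalory=' | match_ao dolary
prints "dalory", the variant listed for f23r further down.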

To get here I had to do some cleanup. I deleted the bogus references
in f100v.T and f100v.M (which were misplaced copies of those in
f101v2). I manually inserted the label as field 5, with a terminating
"|", then filtered the result through

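  # Pass lines starting with "#" or "<" and blank lines through unchanged;
  # in every other line, turn the blanks of "|"-separated field 2 into "."
  # and print field 1 left-aligned in 32 columns, followed by field 2.
  gawk -v FS='|' \
    ' /^[#<]/{print;next;} \
      /^ *$/{print;next;} \
      //{gsub(/[ ]/, ".", $2); printf "%-32s %s\n", $1, $2;} \
    '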

Sorting that file by page:

  cat f101v2-labels-xref.txt \
    | grep -E '^[+]' \
    | map-field \
        -v inField=3 -v outField=3 \
        -v table=fnum-to-pnum.tbl \
    | sort -k3,4 -k6,6 \
    | gawk '//{print $2, $3, $4, $5, $6, $7, $8;}' \
    > f101v2-labels-xref-sort.txt
    
This file has fields 

  1   2    3    4     5     6     7
  SEC PNUM FNUM LOCUS TRANS LABEL CONTEXT
  
Counting the number of occurrences of each label:

  cat f101v2-labels-xref-sort.txt \
    | gawk '//{print $6;}' \
    | sort | uniq -c | expand \
    | sort -k1,1nr

     76 otol
     23 yty
     22 orar
     17 otarar
     10 otaldy
      7 odor
      7 otoly
      5 olkor
      4 dokor
      4 dolary
      3 olaran
      3 orar,am
      3 soraly
      2 arom
      2 dytoly

Again, herbal-only:

  cat f101v2-labels-xref-sort.txt \
    | gawk '(($1=="hea")||($1=="heb")){print $6;}' \
    | sort | uniq -c | expand \
    | sort -k1,1nr

     32 otol
     15 yty
      4 olkor
      4 orar
      4 otarar
      3 odor
      2 dokor
      2 otaldy
      2 otoly
      1 dolary
      1 dytoly
      1 olaran
      1 orar,am
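
Comparing the two tallies label by label shows how much of each
label's usage falls in the herbal sections: for instance 32 of the 76
"otol" occurrences (about 42%), 15 of the 23 "yty" (about 65%), and
4 of the 5 "olkor" (80%). A minimal sketch that computes this directly
from the sorted file, assuming the 7-field layout described above. The
function name herbal_share is made up for this note, and plain awk is
used instead of gawk.

```shell
# Hypothetical helper: for each label print
#   LABEL TOTAL HERBAL FRACTION
# where HERBAL counts the occurrences in sections "hea" and "heb".
# Usage: herbal_share [FILE...]   (reads standard input if no FILE)
herbal_share() {
  awk '
    { total[$6]++ }                                  # field 6 = LABEL
    ($1 == "hea" || $1 == "heb") { herbal[$6]++ }    # field 1 = SEC
    END {
      for (lab in total)
        printf "%s %d %d %.2f\n", lab, total[lab], herbal[lab], \
               herbal[lab] / total[lab]
    }
  ' "$@" | sort -k2,2nr -k1,1      # most frequent labels first
}
```

Usage: herbal_share f101v2-labels-xref-sort.txt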

Tabulating it:

  cat f101v2-labels-xref-sort.txt \
    | format-label-xref \
    > f101v2-labels-xref-sort.plt

Let's look at the herbal pages that have the low-frequency labels:

  cat f101v2-labels-xref-sort.txt \
    | gawk \
        ' /^[<]/{print;next;} \
          /^ *$/{print; next;} \
          ($1 !~ /hea|heb/) { next; } \
          ($6 ~ /^(olkor|orar|otarar|odor|dokor|otaldy|otoly|dolary|dytoly|olaran|orar[,]*am)$/) {print;} \
        ' \
    | sort -k6,6 -k2,2
    
    heb 097 f50r P.2 F dokor ..ockhody.shos.alol.dy.kar.oky.daiin.okar.
    heb 118 f66v P.10 F dokor .....kal.daiin.otal.dakar.otam-yteeod.aiin.
    
      No obvious resemblance.

    hea 043 f23r P.11 F dolary ....dar.ykain-ykyka.dalory=
    
      No obvious resemblance.

    heb 090 f46v P.5 F dytoly ..qokedy.chdy.okedy.dykaly.daiin.chedy.okeedy.
    
      No obvious resemblance.

    hea 046 f24v P.13 U odor ..-oeeey.cheol.chol.odor.sho.do.otolodal-
    heb 089 f46r P.6 F odor .....chdalor.sheedy.odor.aiin.opchedy.dykedy.
    hea 103 f53r P.6 F odor -ykeodar.oqoor.ockh.odor.chain.qokod-ykchdy.
    
      The leaf of f101v2[2,8] = odor has some resemblance to that 
      of f53r[1,1].  Otherwise, there is no obvious resemblance.

    heb 094 f48v P.6 F olaran .okar.otar.or.otees.ol.orain-otal.okytar.chedy.
    
      No obvious resemblance.

    heb 063 f33r P.5 F olkor -pair.oraiin.otaiin.olkor.aiin.okal.otal.
    heb 076 f39v P.5 F olkor ..aiin.okaiin.ckhol.ol.kor.otor.opchy-lkedy.
    heb 193 f95r1 P.8 F olkor chetchdy.chdy.chkam-olkor.chdaiin.chol.kaiin.
    heb 194 f95r2 P.6 U olkor .....qopchdy.kary-y.olkor.ol.shol.qotar.chdy.

      No obvious resemblance.

    heb 065 f34r P.15 F orar .chor.ar.aiiin.daly-or.ar.ykar.ol.al.oky-
    heb 077 f40r P.6 F orar ...ar.ar.or.dam-tor.or.ar.shokoram.olshedy.
    heb 097 f50r P.4 F orar ....qokchdy.qokaiin.or.ar.alol.keodaiin.olr.
    hea 178 f87v P.12 F orar ....-yksho.qos.arol.or.ar.al.daraiinm-saiin.

      No obvious resemblance.
    
    heb 107 f55r P.1 F orar,am ...chepaiin.qokchdy.or.arod-okair.or.aiin.chody.

      No obvious resemblance.
    
    hea 104 f53v P.13 F otaldy ....adam-ycthadaiin.otaldy=
    heb 196 f95v1 P.4 F otaldy .qokal.oty.shekshey.otaldy.okshey.ytshedy.

      No obvious resemblance.
    
    heb 066 f34v P.8 F otarar chkain.otain-ysheos.otar.ar.cho.raiin.cheky.
    heb 076 f39v P.5 F otarar ..kor.or.sheky.kain.otar.or.aiin.okaiin.ckhol.
    heb 090 f46v P.2 F otarar .okaly.daiin.qokedy.otar.ar.oldy.otedy.saim
    heb 094 f48v P.6 F otarar ..-shdy.qokain.okar.otar.or.otees.ol.orain-

      No obvious resemblance.
    
    hea 071 f37r P.5 F otoly okchy.qotchor.chkol.otoly-shor.shol.qokchy.
    heb 094 f48v P.8 F otoly chckhedy.ykedy.oldy-otoly.chey.taly.tokar.

      The leaves of f37r[1,1] resemble those of f101v2[1,8].
      Otherwise, there is no obvious resemblance.

So the coincidences between this Pharma page and the herbal pages seem to be 
unrelated to the shape of the plants.  

The following pages have an unusual number of occurrences of labels from 
f101v2 (not counting the very popular "otol" and "yty"): 

                  o    s    o    o    o    d    o    o    d    o    d    o    a    o    y    
                  r    o    t    d    l    y    t    l    o    t    o    r    r    t    t    
                  a    r    a    o    k    t    o    a    k    a    l    a    o    o    y    
                  r    a    r    r    o    o    l    r    o    l    a    r    m    l         
                       l    a         r    l    y    a    r    d    r    ,                   
                       y    r              y         n         y    y    a                   
                                                                         m                   
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    heb f39v   |    |    |*   |    |*   |    |    |    |    |    |    |    |    |    |    |
    heb f46v   |    |    |*   |    |    |*   |    |    |    |    |    |    |    |    |    |
    heb f48v   |    |    |*   |    |    |    |*   |*   |    |    |    |    |    |    |    |
    heb f50r   |*   |    |    |    |    |    |    |    |*   |    |    |    |    |    |    |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    unk f76r   |**  |*   |    |    |    |    |    |    |    |    |    |    |    |    |    |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    bio f79v   |    |    |    |    |    |    |    |    |    |*   |*   |    |    |**  |*   |
    bio f81r   |*   |    |    |    |    |    |    |*   |    |    |    |    |    |    |    |
    bio f84r   |*   |    |    |    |    |    |*   |    |    |    |    |    |    |    |    |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    ast f67r1  |    |    |    |    |    |    |    |    |    |*   |*   |    |    |    |    |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    pha f89v2  |    |    |    |**  |    |    |    |    |    |    |    |    |    |**  |    |
    pha f99r   |*   |    |    |    |    |    |    |    |    |*   |    |    |    |    |    |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    unk f85r1  |    |    |*   |*   |    |    |    |    |    |    |    |    |    |*   |    |
    unk f86v6  |    |    |*   |    |    |    |    |    |    |    |    |*   |    |*** |*   |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    str f58r   |*   |    |    |    |    |    |    |    |    |*   |    |    |    |    |    |
    str f104v  |*   |    |*   |    |    |    |    |    |    |    |    |    |    |    |    |
    str f106r  |*   |    |*   |    |    |    |    |    |    |    |    |    |    |*   |    |
    str f113v  |    |    |****|    |    |    |    |    |    |    |    |    |    |*   |    |
    str f115r  |**  |    |*   |    |    |    |    |    |    |    |    |    |    |    |    |
                ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
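
A list of candidate pages for a table like this one can be produced
mechanically. The sketch below is an assumption about the selection
rule, not a record of how the table was made: it takes "unusual" to
mean at least MIN occurrences (2, say) on the same page, ignoring
"otol" and "yty". The name busy_pages is invented here, and the input
is assumed to have the 7-field layout of f101v2-labels-xref-sort.txt.

```shell
# Hypothetical helper: list pages with at least MIN label occurrences,
# not counting the very common labels "otol" and "yty".
# Usage: busy_pages MIN [FILE...]   (reads standard input if no FILE)
busy_pages() {
  min="$1"; shift
  awk -v min="$min" '
    ($6 == "otol" || $6 == "yty") { next }     # skip the popular labels
    {
      page = $1 " " $3                         # SEC and FNUM
      count[page]++
      labels[page] = labels[page] " " $6       # remember which labels hit
    }
    END {
      for (p in count)
        if (count[p] >= min) print p, count[p] labels[p]
    }
  ' "$@" | sort -k3,3nr -k2,2      # busiest pages first
}
```

Usage: busy_pages 2 f101v2-labels-xref-sort.txt
Each output line has the form SEC FNUM COUNT LABEL LABEL ...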

A notable coincidence: "otaldy" and "dolary", which occur together on page bio f79v (the
Mermaid) also occur as consecutive radial labels in the "astro" diagram f67r1: at 
00:00 (first) and 11:00 (last). 
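
Coincidences of this kind can also be searched for mechanically. A
rough sketch, assuming the sorted 7-field file described earlier;
pages_with_both is a name invented here, and plain awk is used instead
of gawk.

```shell
# Hypothetical helper: list the pages (SEC FNUM) on which both labels
# A and B occur at least once.
# Usage: pages_with_both A B [FILE...]   (reads standard input if no FILE)
pages_with_both() {
  a="$1"; b="$2"; shift 2
  awk -v a="$a" -v b="$b" '
    $6 == a { hasA[$1 " " $3] = 1 }
    $6 == b { hasB[$1 " " $3] = 1 }
    END { for (p in hasA) if (p in hasB) print p }
  ' "$@" | sort
}
```

Usage: pages_with_both otaldy dolary f101v2-labels-xref-sort.txt
Judging from the table above, this should report f79v and f67r1.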

Perhaps f101v2 is a poor test case because its labels were copied from
the wrong page altogether.  Note that there are three rows of plants
but only two of them have labels.

Let's do the same search with the labels on f100v, which seem to 
have been written with more care.  Here they are:

  <f100v.T.1;V>      tolchd={was <f100v.B.1;V>}
  <f100v.T.2;V>      chols={was <f100v.B.2;V>}
  <f100v.T.3;V>      opchor={was <f100v.B.3;V>}
  <f100v.T.4;V>      solsy={was <f100v.B.4;V>}
  <f100v.M.1;V>      soleesos={was <f100v.B.5;V>}
  <f100v.M.2;V>      ykchochdy={was <f100v.B.6;V>}
  <f100v.M.3;V>      ykchdy={was <f100v.B.7;V>}
  <f100v.M.4;V>      dchdy={was <f100v.B.8;V>}
  <f100v.M.5;V>      dalsy={was <f100v.B.9;V>}
  <f100v.B.1;V>      okcheor={was <f100v.B.10;V>}
  <f100v.B.2;V>      ytchol={was <f100v.B.11;V>}
  <f100v.B.3;V>      dykchal={was <f100v.B.12;V>}
  <f100v.B.4;V>      chos.cthoral={was <f100v.B.13;V>}

The references (extracted manually from the concordance) are
in f100v-labels-xref.txt.  Processing them:

  cat f100v-labels-xref.txt \
    | grep -E '^[+]' \
    | map-field \
        -v inField=3 -v outField=3 \
        -v table=fnum-to-pnum.tbl \
    | sort -k3,4 -k6,6 \
    | gawk '//{print $2, $3, $4, $5, $6, $7, $8;}' \
    > f100v-labels-xref-sort.txt
    
This file has fields 

  1   2    3    4     5     6     7
  SEC PNUM FNUM LOCUS TRANS LABEL CONTEXT
  
Counting the number of occurrences of each label:

  cat f100v-labels-xref-sort.txt \
    | gawk '//{print $6;}' \
    | sort | uniq -c | expand \
    | sort -k1,1nr

       39 otchol
       34 ykchdy
       30 opchor
       11 chols
        7 dchdy
        4 okcheor
        3 tolchd
        2 dalsy
        1 chos.cthoral

Again, herbal-only:

  cat f100v-labels-xref-sort.txt \
    | gawk '(($1=="hea")||($1=="heb")){print $6;}' \
    | sort | uniq -c | expand \
    | sort -k1,1nr

       32 otchol
       23 opchor
       13 ykchdy
        7 chols
        3 dchdy
        2 okcheor

Let's look at the herbal pages that have the low-frequency labels:

  cat f100v-labels-xref-sort.txt \
    | gawk \
        ' /^[<]/{print;next;} \
          /^ *$/{print; next;} \
          ($1 !~ /hea|heb/) { next; } \
          ($6 ~ /^(chols|dchdy|okcheor)$/) {print;} \
        ' \
    | sort -k6,6 -k2,2
    
    hea 005 f3r P.16 F chols .otchom.oporar-oteol.chol.s.cheol.ekshy.qokeom
    hea 010 f5v P.1 U chols ...char.ytchey.pshod.chols.chodaiin.ytoiiin
    hea 045 f24r P.18 F chols ..-ycheol.chol.daiin.chol.s-yol.tol.chol.shom
    hea 051 f27r P.7 F chols .cheol.pchy.schey.ly-chals.cham-ytchy.chy
    hea 082 f42v P.1 F chols ...sheey.qocho.taiin.shols-chol.chor.dain-
    hea 189 f93r P.19 U chols .hodaiin.shody-tchor.shol.s.sheoky-ychockhy
    heb 195 f95v2 P.5 F chols ......ar-daiin.ykaly.chals.shedaiin.olaiiny
    
      No obvious resemblance.
    
    heb 075 f39r P.5 U dchdy ..chees.aly.okalchem-dchdy.chdy.ykaiin=
    heb 079 f41r P.8 F dchdy ..chedy.chckhy.qokey.dchdy-qokedy.qokyl.cheked
    heb 196 f95v1 P.6 F dchdy ......=tshdal.qokshy.dchdy.shedy.dkshey.chefar
    
      No obvious resemblance.
      
    hea 042 f22v P.5 U okcheor ..-odaiin.ytaiin-dor.ykcheor.daii**=
    hea 088 f45v P.8 F okcheor ...dshy.otyol-ytchom-ykcheor.odal.sho.dy.pchom-

      No obvious resemblance.