Hacking at the Voynich manuscript - Side notes
076 Getting a clean transcription of the Recipes section

Last edited on 2026-01-20 21:26:51 by stolfi

INTRODUCTION

  This note is about creating a transcription of the Recipes or Starred
  Parags (SPS) section of the VMS, with re-checked text and carefully
  marked paragraph breaks.

  For this note, the SPS is defined as all the prose text between page
  f103r line 01 and page f116r line 30. That is all of quire 20, minus
  the last 19 lines of page f116r (which do not seem to consist of
  "recipes" like all previous lines) and page f116v (which has only some
  extraneous writing and a few words of Voynichese).
  
  Note that the central bifolio of the quire (f109+f110) was removed
  after the folios were numbered, probably after the book was bound.
  Presumably it had four pages (f109r, f109v, f110r, f110v).
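
  The page range defined above can be enumerated mechanically. The
  sketch below is only an illustration of that definition; the function
  names "sps_pages" and "in_sps" are hypothetical and are not part of
  the scripts used in this note.

```python
def sps_pages():
    """Return the page names of the SPS in order, f103r ... f116r."""
    missing = {109, 110}  # central bifolio, removed from the quire
    pages = []
    for folio in range(103, 117):
        if folio in missing:
            continue
        for side in "rv":
            if folio == 116 and side == "v":
                continue  # f116v is not part of the SPS
            pages.append(f"f{folio}{side}")
    return pages


def in_sps(page, line):
    """True if (page, line) lies inside the SPS as defined in this note."""
    if page not in sps_pages():
        return False
    if page == "f116r":
        return 1 <= line <= 30  # the last lines of f116r are excluded
    return line >= 1


if __name__ == "__main__":
    print(len(sps_pages()), sps_pages()[0], sps_pages()[-1])
```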
    
SETUP

    ln -s ../work

    ln -s work/error_funcs.gawk
    ln -s work/compute_freqs.gawk

    ln -s work/read_table.gawk

    ln -s work/error_funcs.py
    ln -s work/ivtff_format.py

    ln -s ${HOME}/ttf

TERMINOLOGY
  
  See "report/report_076.html" for the glossary of terms used here,
  including "parag head" and "tail", "starlet", "long" and "short line",
  etc.

PREPARING THE TRANSCRIPTION FILE
  
  The main transcription file for this section is "starps-U.eva", which
  was eventually moved to the main transcription Note ("../074/"). It
  contains only the SPS part of the VMS, from page f103r line 1 to
  page f116r line 30. All lines have transcriber code ";U".
  
  The format of that file is a variant of the EVT or IVTFF formats.
  In particular:
  
    * The locus indicator is <{PAGE}.{LINE};{TRANS}> where {PAGE} is
      "f103r", "f103v", ... "f116r", and {LINE} is the line number,
      starting from 1.
  
    * The text of each line is prefixed with one of [=»]. The '=' means
      that the line starts at or before the left rail, and '»' means
      that it starts after it. This comes after the "<%><!S{nn}>" or
      "<%><!NoS>" marks, if any.
    
    * Each line is suffixed with one of [=«]. The "=" means that the
      line ends at or past the right rail, and "«" means that it ends
      before that rail. This comes before the "<$>" mark, if any.
      
    * Parags, as found in this note, are marked with a "<%>" before the
      text of the head line and with "<$>" after the text of the tail line.
  
    * An inline comment "<!S{nn}>" was inserted at the start of each
      inferred parag start line to mean that
      star "S{nn}" is the starlet assigned to this parag head; or
      "<!NoS>" to mean that this parag has no assigned starlet.
      
    * For every linegap that was wider than the other linegaps nearby,
      at least on part of the line, the marker "<!WLP>" was inserted at
      the start of the next line, and "<!WLN>" at the end of the
      previous line.
      
    * Word breaks are indicated with "." (definite word space), ","
      (dubious word space), or "-" (break due to intruding drawing or fold).
      A word break never occurs at the start of a text line or
      immediately after another word break.
      
    * There is no circular text in this section, so the text never ends 
      with a word break character.
      
    * There are no drawing intrusions or folds interrupting the text in
      the SPS, so the inline comments "<!®>" and "<!¥>" are not used.
      
    * Each inline comment "<!...>" other than the above was either
      deleted, if not essential, or replaced by a #-comment before the
      line.
      
    * The order of these annotations at line start is: "<%>", "<!S{nn}>"
      or "<!NoS>", "<!WLP>", and [»=].
      
    * The order of these annotations at line end is: [=«], "<!WLN>", and "<$>".
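
  The conventions listed above can be checked with a small parser. The
  following is a minimal sketch, not the code actually used for this
  note. It assumes that each data line consists of the locus indicator,
  optional blanks, and then the annotated text with nothing else on the
  line, and that the text itself contains none of the characters
  [<>=»«]. The names "LINE_RE", "BAD_BREAKS_RE" and "parse_line" are
  hypothetical, and the sample line is invented, not copied from the
  transcription.

```python
import re

# One data line: locus, then start annotations, text, end annotations.
LINE_RE = re.compile(
    r"^<(?P<page>f\d+[rv])\.(?P<line>\d+);(?P<trans>\w+)>\s*"
    r"(?P<head><%>)?"                 # parag head marker
    r"(?P<star><!S\d+>|<!NoS>)?"      # assigned starlet, or none
    r"(?P<wlp><!WLP>)?"               # wide linegap before this line
    r"(?P<lmark>[=»])"                # '=' at/before left rail, '»' after it
    r"(?P<text>[^<>=»«]+)"            # assumed: no markers inside the text
    r"(?P<rmark>[=«])"                # '=' at/past right rail, '«' before it
    r"(?P<wln><!WLN>)?"               # wide linegap after this line
    r"(?P<tail><\$>)?\s*$"            # parag tail marker
)

# A word break at either end of the text, or two breaks in a row.
BAD_BREAKS_RE = re.compile(r"^[.,-]|[.,-]{2}|[.,-]$")


def parse_line(raw):
    """Parse one transcription line; return a dict, or None if no match."""
    m = LINE_RE.match(raw)
    if m is None:
        return None  # e.g. a #-comment or a blank line
    text = m.group("text")
    star = m.group("star")
    return {
        "page": m.group("page"),
        "line": int(m.group("line")),
        "trans": m.group("trans"),
        "is_head": m.group("head") is not None,
        "is_tail": m.group("tail") is not None,
        "star": star[2:-1] if star else None,  # "S01", "NoS", or None
        "wide_gap_above": m.group("wlp") is not None,
        "wide_gap_below": m.group("wln") is not None,
        "starts_at_rail": m.group("lmark") == "=",
        "ends_at_rail": m.group("rmark") == "=",
        "text": text,
        "breaks_ok": BAD_BREAKS_RE.search(text) is None,
        "words": re.split(r"[.,-]", text),
    }


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    print(parse_line("<f103r.5;U>  <%><!S01><!WLP>=daiin.chedy,qokeedy«<!WLN><$>"))
```

  Lines that do not match (comments, blank lines, or lines whose
  annotations are out of order) yield None, so the same function can
  double as a format checker.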

STAR PROPERTY TABLE

  Created a star property table "star-props.txt". See comments in the
  file for the format.  Used it to insert star numbers "<!S{nn}>"
  and "<!NoS>" in parag head lines.
  
PARAGRAPH STATISTICS

    compute_parag_stats.py starps-U.ivt > out/parag-stats-U.txt
 
PARAG BREAKING RULES

  Revised the text and all parag breaks.

  See "report/report_076.html" for nomenclature ("parag", "head", "tail",
  "starlet", "puff", "right rail", "left rail", "short line", "long
  line", "linegap", etc.) and for the parag breaking rules.

  The title <f114r.T1.34> is a right-justified line after a parag that
  ends with a full line. It had been assumed to be the tail of the
  previous parag that the Scribe skipped and then inserted in that
  non-standard position. However, the first line of the next parag
  <f114r.P1.35> bends down to avoid that title. Thus, if that conjecture
  is true, the Scribe must have realized the omission after writing the
  first 4 lines of <f114r.P1.35>. I have now re-interpreted
  <f114r.T1.34> as a title.

  It is possible that other section headers were not recognized as such
  and were joined with adjacent parags.

  After commenting out the subsection titles in both files, I counted
  again the number of words and parags, and computed basic statistics
  (min, max, average, and standard deviation) of the number of words
  per paragraph (nwp):
  
    ./count_recipes_and_words.sh starps.ivt

    !!! OLD: !!!
    statistic   |  bencao |  starps
    ------------+---------+--------
    parags      |     354 |     330
    words       |   10874 |   10457
    min nwp     |       7 |      11
    max nwp     |      76 |      72
    avg nwp     |    30.8 |    31.7
    dev nwp     |     8.5 |    11.2
    !!! !!!
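
  For reference, the statistics in the table can be computed from a
  list of per-parag word counts as in the sketch below. This is not the
  code of "count_recipes_and_words.sh"; the function name "nwp_stats"
  is hypothetical, and the use of the population standard deviation
  (rather than the sample one) is an assumption, since the note does
  not say which was used.

```python
import statistics


def nwp_stats(word_counts):
    """Basic statistics of the number of words per parag (nwp)."""
    return {
        "parags": len(word_counts),
        "words": sum(word_counts),
        "min nwp": min(word_counts),
        "max nwp": max(word_counts),
        "avg nwp": statistics.mean(word_counts),
        # Assumption: population deviation; use statistics.stdev
        # instead if the sample deviation is wanted.
        "dev nwp": statistics.pstdev(word_counts),
    }


if __name__ == "__main__":
    # Toy word counts, not taken from the transcription.
    for name, value in nwp_stats([10, 20, 30]).items():
        print(f"{name:<12}| {value}")
```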

CREATING THE PARAG SPLITTING REPORT

  Creating the file "report/report.html" with a description of how the 
  parags were chosen.
  
  First let's create the raw page images:

    ln -s ../../../FromBeinecke
    mkdir -p report/images/raw
    for bf in `cat beinecke_SPS_images.txt` ; do
      fnum="${bf/*-/}"
      convert FromBeinecke/${bf}.jpg -resize 'x800' report/images/raw/${fnum}.png
    done
    eom report/images/raw/*.png

IMAGES

  ??? Make the images showing parag splitting.
  ??? Remove comments that are in the description file.
  ??? Write script to automatically assign parag breaks.