Hacking at the Voynich manuscript - Side notes
076 Getting a clean transcription of the Recipes section

Last edited on 2026-01-20 21:26:51 by stolfi

INTRODUCTION

  This note is about creating a transcription of the Recipes or
  Starred Parags (SPS) section of the VMS, with re-checked text and
  carefully marked paragraph breaks.

  For this note, the SPS is defined as all the prose text between page
  f103r line 01 and page f116r line 30. That is all of quire 20, minus
  the last 19 lines of page f116r (which do not seem to consist of
  "recipes" like all previous lines) and page f116v (which has only
  some extraneous writing and a few words of Voynichese).

  Note that the central bifolio of the quire (f109+f110) was removed
  after the folios were numbered, probably after the book was bound.
  Presumably it had four pages (f109r, f109v, f110r, f110v).

SETUP

  ln -s ../work
  ln -s work/error_funcs.gawk
  ln -s work/compute_freqs.gawk
  ln -s work/read_table.gawk
  ln -s work/error_funcs.py
  ln -s work/ivtff_format.py
  ln -s ${HOME}/ttf

TERMINOLOGY

  See "report/report_076.html" for the glossary of terms used here,
  including "parag head" and "tail", "starlet", "long" and "short
  line", etc.

PREPARING THE TRANSCRIPTION FILE

  The main transcription file for this section is "starps-U.eva",
  which was eventually moved to the main transcription Note
  ("../074/"). It contains only the SPS part of the VMS, from page
  f103r line 1 to page f116r line 30. All lines have transcriber code
  ";U".

  The format of that file is a variant of the EVT or IVTFF formats. In
  particular:

    * The locus indicator is <{PAGE}.{LINE};{TRANS}> where {PAGE} is
      "f103r", "f103v", ... "f116r", and {LINE} is the line number,
      starting from 1.

    * The text of each line is prefixed with one of [=»]. The '='
      means that the line starts at or before the left rail, and '»'
      means that it starts after it. This comes after the "<%>" or
      "<%>" marks, if any.

    * Each line is suffixed with one of [=«].
      The "=" means that the line ends at or past the right rail, and
      "«" means that it ends before that rail. This comes before the
      "<$>" mark, if any.

    * Parags, as found in this note, are marked with a "<%>" before
      the text of the head line and with "<$>" after the text of the
      tail line.

    * An inline comment "" was inserted at the start of each inferred
      parag start line to mean that star "S{nn}" is the starlet
      assigned to this parag head; or "" to mean that this parag has
      no assigned starlet.

    * For every linegap that was wider than the other linegaps nearby,
      at least on part of the line, the marker "" was inserted at the
      start of the next line, and "" at the end of the previous line.

    * Word breaks are indicated with "." (definite word space), ","
      (dubious word space), or "-" (break due to intruding drawing or
      fold). A word break never occurs at the start of a text line or
      immediately after another word break.

    * There is no circular text in this section, so the text never
      ends with a word break character.

    * There are no drawing intrusions or folds interrupting the text
      in the SPS, so the inline comments "" and "" are not used.

    * Each inline comment "" other than the above was either deleted,
      if not essential, or replaced by a #-comment before the line.

    * The order of these annotations at line start is: "<%>", "" or
      "", "", and [»=].

    * The order of these annotations at line end is: [=«], "", and
      "<$>".

STAR PROPERTY TABLE

  Created a star property table "star-props.txt". See comments in the
  file for the format.

  Used it to insert star numbers "" and "" in parag head lines.

PARAGRAPH STATISTICS

  compute_parag_stats.py starps-U.ivt > out/parag-stats-U.txt

PARAG BREAKING RULES

  Revised the text and all parag breaks. See "report/report_076.html"
  for nomenclature ("parag", "head", "tail", "starlet", "puff", "right
  rail", "left rail", "short line", "long line", "linegap", etc.) and
  for the parag breaking rules.
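
  As an illustration of the line format described under PREPARING THE
  TRANSCRIPTION FILE, here is a minimal Python sketch that splits one
  data line into its parts. It is not one of the scripts used in this
  note. It assumes that the locus indicator is followed by optional
  blanks and then the text, and that inline comments have already been
  stripped. The names "parse_line" and "LOCUS_RE" are made up for this
  example.

```python
import re

# Assumed layout of a data line: "<f103r.1;U>", optional blanks, then the text.
LOCUS_RE = re.compile(r"^<(f\d+[rv])\.(\d+);(\w)>\s*(.*)$")
BREAKS = ".,-"  # definite space, dubious space, break due to drawing or fold


def parse_line(line):
    """Split one transcription line into locus fields, markers, and words.

    Returns None for lines without a locus indicator (e.g. #-comments).
    Assumes inline comments were removed beforehand.
    """
    m = LOCUS_RE.match(line.strip())
    if m is None:
        return None
    page, lnum, trans, text = m.group(1), int(m.group(2)), m.group(3), m.group(4)

    # "<%>" before the text marks a parag head; "<$>" after it marks a tail.
    head = text.startswith("<%>")
    if head:
        text = text[3:]
    tail = text.endswith("<$>")
    if tail:
        text = text[:-3]

    # Rail markers: one of [=»] at the start, one of [=«] at the end.
    if len(text) < 3 or text[0] not in "=»" or text[-1] not in "=«":
        raise ValueError(f"bad rail markers in {page}.{lnum}")
    left, right, body = text[0], text[-1], text[1:-1]

    # A word break may not start or end the text, nor follow another break.
    if body[0] in BREAKS or body[-1] in BREAKS or re.search(r"[.,-]{2}", body):
        raise ValueError(f"misplaced word break in {page}.{lnum}")

    return {
        "page": page, "line": lnum, "trans": trans,
        "head": head, "tail": tail,
        "left": left, "right": right,
        "words": re.split(r"[.,-]", body),
    }
```

  For a made-up line such as "<f103r.1;U>  <%>=daiin.chedy,qokeedy«<$>",
  this reports a one-line parag (both head and tail) that starts at the
  left rail, ends before the right rail, and has three words.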

  The title is a right-justified line after a parag that ends with a
  full line. It had been assumed to be the tail of the previous parag
  that the Scribe skipped and then inserted in that non-standard
  position. However, the first line of the next parag bends down to
  avoid that title. Thus, if that conjecture is true, the Scribe must
  have realized the omission after writing the first 4 lines of . I
  have now re-interpreted as a title.

  It is possible that other section headers were not recognized as
  such and were joined with adjacent parags.

  After commenting out the subsection titles on both files, I counted
  again the number of words and parags, and computed basic statistics
  (min, max, average, and standard deviation) of the number of words
  per paragraph (nwp):

    ./count_recipes_and_words.sh starps.ivt

  !!! OLD:
  !!!
     statistic  ! bencao  ! starps
    ------------+---------+--------
     parags     |     354 |    330
     words      |   10874 |  10457
     min nwp    |       7 |     11
     max nwp    |      76 |     72
     avg nwp    |    30.8 |   31.7
     dev nwp    |     8.5 |   11.2
  !!!
  !!!

CREATING THE PARAG SPLITTING REPORT

  Creating the file "report/report.html" with a description of how the
  parags were chosen.

  First let's create the raw page images:

    ln -s ../../../FromBeinecke
    mkdir -p report/images/raw
    for bf in `cat beinecke_SPS_images.txt` ; do
      fnum="${bf/*-/}"
      convert FromBeinecke/${bf}.jpg -resize 'x800' report/images/raw/${fnum}.png
    done
    eom report/images/raw/*.png

IMAGES

  ??? Make the images showing parag splitting.

  ??? Remove comments that are in the description file.

  ??? Write script to automatically assign parag breaks.
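
  The nwp statistics tabulated above (parags, words, min, max, avg,
  dev) could be obtained from a list of per-parag word counts roughly
  as in the Python sketch below. This is not the actual
  "count_recipes_and_words.sh" script. The function name "nwp_stats"
  is hypothetical, and the deviation is assumed here to be the
  population standard deviation, since the note does not say which
  variant was used.

```python
import statistics


def nwp_stats(word_counts):
    """Summarize the number of words per parag (nwp).

    word_counts: non-empty list with one word count per parag.
    The deviation is the population standard deviation (an assumption).
    """
    if not word_counts:
        raise ValueError("need at least one parag")
    return {
        "parags": len(word_counts),
        "words": sum(word_counts),
        "min nwp": min(word_counts),
        "max nwp": max(word_counts),
        "avg nwp": statistics.fmean(word_counts),
        "dev nwp": statistics.pstdev(word_counts),
    }


def format_stats(stats):
    """Render the summary as 'name | value' rows, one decimal for avg and dev."""
    rows = []
    for name, value in stats.items():
        shown = f"{value:.1f}" if isinstance(value, float) else str(value)
        rows.append(f"{name:<10} | {shown:>7}")
    return "\n".join(rows)
```

  The word counts themselves would come from splitting each parag's
  text at the word break characters "." "," and "-".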