Hacking at the Voynich manuscript - Side notes
076 Getting a clean transcription of the Recipes section

Last edited on 2025-07-17 17:17:20 by stolfi

INTRODUCTION

This note is about creating a transcription of the Recipes or Starred Parags (SPS) section of the VMS, with re-checked text and carefully marked paragraph breaks.

For this note, the SPS is defined as all the prose text between page f103r line 01 and page f116r line 30. That is all of quire 20, minus the last 19 lines of page f116r (which do not seem to consist of "recipes" like all previous lines) and page f116v (which has only some extraneous writing and a few words of Voynichese).

Note that the central bifolio of the quire (f109+f110) was removed after the folios were numbered, probably after the book was bound. Presumably it had four pages (f109r, f109v, f110r, f110v).

SETUP

  ln -s ../work
  ln -s work/error_funcs.gawk
  ln -s work/compute-freqs.sh
  ln -s work/insert_blank_lines.gawk
  ln -s work/read_table.gawk
  ln -s work/error_funcs.py
  ln -s work/ivtff_align.py
  ln -s work/ivtff_format.py
  ln -s work/compare_ivtff_files.py
  ln -s ../074/map_locators.sh
  ln -s ../074/loci-evmt16e6-ivtff.tbl
  ln -s ${HOME}/lib/read_table.gawk
  ln -s ${HOME}/ttf

PREPARING THE TRANSCRIPTION FILE

Created "starps-H.eva" with the SPS part of the VMS, containing only the lines of the Takeshi Takahashi transcription (";H>") from page f103r to page f116r line 30.

Created a similar file "starps-U.eva" with only the lines of the Stolfi transcription (";U>").

Created a similar file "starps-Z.eva" with Rene Zandbergen's transcription (";Z>"). Checked and revised it thoroughly, as of 2025-06, against the Beinecke 2014 online scans. (Thus it is no longer "Rene's"!)

See "report/report.html" for the glossary of terms used here, including "parag head" and "tail", "starlet", "long" and "short line", etc.

The format of these transcription files was initially similar to that of the EVMT interlinear.
In particular:

  * The locus indicators were <{PAGE}.{LINE};{TRANS}>, where {PAGE} is "f103r", "f103v", ... "f116r", and {LINE} is the line number, starting from 01. The numbering skipped the "titles".

The files were then converted to the new IVTFF format (mostly). In particular:

  * Assumed parags were marked with a "<%>" before the text of the head line and with "<$>" after the text of the tail line.

  * After the <%> of a parag head, was inserted to mean that star "S{nn}" is the starlet assigned to this parag head; or "" to mean that this parag has no assigned starlet.

  * The text of each line is prefixed with one of [=»]. The "=" means that the line starts at or before the left rail, and "»" means that it starts after it. This comes after the "<%>" or "<%>" marks, if any.

  * Each line is suffixed with one of [=«]. The "=" means that the line ends at or past the right rail, and "«" means that it ends before that rail. This comes before the "<$>" mark, if any.

For this conversion, the following line numbers had to be changed:

  f103v: 28a,29-36,36a,37-44 to 29-46.
  f108v: 24a,25-51 to 25-52.
  f112v: 44a,45-47 to 45-48.
  f115r: 36a,37-44 to 37-45.

Also, the existing transcriptions of the SPS have four "titles", short lines with anomalous justification:

  =sairy.ore.daiindy.ytam=
  =otoiis.chedaiin.otair.otaly=
  =olchar.olchedy.lshy.otedy=
  =ytain.olkaiin.ykar.chdar.alkam=

The "title" was assumed to be part of the following line that the Scribe skipped at first and then added above that line. In the conversion, it was appended to line . The other three "titles" were kept as such. They must be excluded by special tests when analyzing paragraphs.
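The selection of the SPS range and the line conventions described above (locator, "<%>" and "<$>" parag marks, rail flags) can be illustrated with a small Python 3 sketch. It is not the author's tooling. It assumes locators of the form <{PAGE}.{LINE};{TRANS}> with a page name such as "f103r", and it skips, without interpreting, any tag that follows "<%>" (such as the starlet tag, whose exact syntax is not given here). The hypothetical `select_sps` keeps one transcriber's lines from f103r.01 to f116r.30; the hypothetical `parse_line` decodes one line of the converted format.

```python
import re
from dataclasses import dataclass

# Assumed locator layout <{PAGE}.{LINE};{TRANS}>, e.g. "<f103r.01;H>".
LOC_RE = re.compile(r"^<f(\d+)([rv])\.(\d+);([^>]+)>\s*")
TAG_RE = re.compile(r"^<[^>]*>")


@dataclass
class Line:
    folio: int
    side: str             # "r" or "v"
    num: int              # line number within the page
    trans: str            # transcriber code, e.g. "H"
    is_head: bool         # "<%>": first line of a parag
    is_tail: bool         # "<$>": last line of a parag
    starts_at_rail: bool  # "=" prefix (vs "»")
    ends_at_rail: bool    # "=" suffix (vs "«")
    text: str


def locus(raw):
    """Return (folio, side, num, trans) from a leading locator, or None."""
    m = LOC_RE.match(raw)
    if m is None:
        return None
    return int(m[1]), m[2], int(m[3]), m[4]


def select_sps(lines, trans):
    """Yield the raw lines of transcriber `trans` from f103r.01 to f116r.30."""
    for raw in lines:
        loc = locus(raw)
        if loc is None or loc[3] != trans:
            continue  # comments, page headers, other transcribers
        folio, side, num, _ = loc
        key = (folio, 0 if side == "r" else 1, num)  # recto before verso
        if (103, 0, 1) <= key <= (116, 0, 30):
            yield raw


def parse_line(raw):
    """Decode one data line of the converted format; None if not a data line."""
    m = LOC_RE.match(raw)
    if m is None:
        return None
    body = raw[m.end():].strip()
    is_head = body.startswith("<%>")
    if is_head:
        body = body[3:].lstrip()
        # Skip any further tags right after "<%>" (e.g. the starlet tag);
        # their exact syntax is not assumed here.
        while (t := TAG_RE.match(body)) is not None:
            body = body[t.end():].lstrip()
    is_tail = body.endswith("<$>")
    if is_tail:
        body = body[:-3].rstrip()
    if len(body) < 2 or body[0] not in "=»" or body[-1] not in "=«":
        raise ValueError(f"missing rail flags in {raw!r}")
    return Line(int(m[1]), m[2], int(m[3]), m[4], is_head, is_tail,
                body[0] == "=", body[-1] == "=", body[1:-1])
```

Sorting loci by (folio, side, line) puts each recto before its verso, so the missing bifolio f109+f110 needs no special case. The selection looks only at the locator, so it also works on files that do not yet carry the rail flags.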
UPDATING THE LOCATORS

A major step in converting the various files to the IVTFF format was replacing the old-style EVMT 1.6e6 locators <{PAGE}.{UNIT}.{OSEQ};{TRANS}> with the new-style IVTFF locators <{PAGE}.{NSEQ};{TRANS}>:

  now=`yyyy-mm-dd-hhmmss`
  mkdir -p SAVE/${now}
  for ifile in starps-{U,H,Z} ; do
    chmod a-w ${ifile}.eva
    mv -vi ${ifile}.eva SAVE/${now}/
  done

The "-Z" version had already been upgraded, so:

  cp -av SAVE/${now}/starps-Z.eva ./

As for the other two:

  for ifile in starps-{U,H} ; do
    cat SAVE/${now}/${ifile}.eva | map_locators.sh > ${ifile}.eva
  done

STAR PROPERTY TABLE

Created a star property table "star-pros.txt". See comments in the file for the format.

Initial stab at inserting star numbers "" and " in parag head lines:

  ./replace_star_ids.sh < starps-Z.eva > .temp-Z.eva

Edited the file "starps-Z.eva", checking and reassigning all parag breaks.

  make -f parag-stats-Z.make

See output in "out/

COMPARING VERSIONS

Wrote a python3 program "compare_ivtff_files.py" to compare two files, line by line, using an optimal alignment algorithm:

  make -f compare_ivtff_files.make

  tag0="Z"
  for tag1 in U H ; do
    file0="starps-Z.eva"
    file1="starps-${tag1}.eva"
    ofile=".cmp-${tag0}${tag1}.edf"
    ./compare_ivtff_files.py ${file0} ${file1} > ${ofile}
  done

First run:

  read 2414 lines from file 0 = starps-Z.eva
  read 1313 lines from file 1 = starps-U.eva
  there were 587 loci from file0 missing in file1

  read 2414 lines from file 0 = starps-Z.eva
  read 1655 lines from file 1 = starps-H.eva
  there were 1 loci from file0 missing in file1

Edited the files starps-Z.eva and starps-U.eva until the latter became a subset of the former (apart from comments).
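The internals of "compare_ivtff_files.py" are not described in this note. As a stand-in, the following hypothetical Python sketch aligns the two files' locator sequences with the classic longest-common-subsequence dynamic program, then counts the loci of file 0 that have no partner in file 1 and splits the aligned pairs into perfect and imperfect matches, similar to the summary printed by the real program. Each input is assumed to be a list of (locus, text) pairs; the function names are made up for illustration.

```python
def align_loci(a, b):
    """Return index pairs (i, j) of an LCS alignment of locator lists a, b."""
    n, m = len(a), len(b)
    # L[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table forward to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def compare(file0, file1):
    """file0, file1: lists of (locus, text). Return summary counts."""
    pairs = align_loci([loc for loc, _ in file0], [loc for loc, _ in file1])
    perfect = sum(1 for i, j in pairs if file0[i][1] == file1[j][1])
    return {
        "missing_in_file1": len(file0) - len(pairs),
        "perfect": perfect,
        "imperfect": len(pairs) - perfect,
    }
```

The table costs O(n*m) time and memory, which is modest for files of about a thousand data lines each. Aligning by locator alone means that a renumbered line shows up as missing rather than as an imperfect match.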
Final run:

  # read 2421 lines ( 1064 data, 23 pages) from file 0 = starps-Z.eva
  # read 921 lines ( 476 data, 23 pages) from file 1 = starps-U.eva
  # 588 loci from file0 missing in file1
  # 476 perfectly matching line pairs
  # 0 imperfectly matching line pairs

Saving the current files:

  now="`yyyy-mm-dd-hhmmss`"; echo "now = ${now}"
  mkdir -p SAVE/${now}
  mv -vi starps-U.eva SAVE/${now}
  cp -av starps-Z.eva SAVE/${now}/starps-Z-actually-U.eva
  chmod a-w SAVE/${now}/*.eva

  now = 2025-07-15-200047
  renamed 'starps-U.eva' -> 'SAVE/2025-07-15-200047/starps-U.eva'
  'starps-Z.eva' -> 'SAVE/2025-07-15-200047/starps-Z-actually-U.eva'

Renaming "starps-Z.eva" to "starps-U.evt", to reflect the true culprit and for uniformity with note 074. Renaming "starps-H.eva" to "starps-H.evt" for the same reason:

  mv -vi starps-Z.eva starps-U.evt
  mv -vi starps-H.eva starps-H.evt
  chmod u+w starps-{U,H}.evt

Replacing ";Z" by ";U" with emacs.

Moving starps-U.evt to note 074, since further editing will take place there:

  mv -vi starps-U.evt ../074/star25e1.evt
  ln -s ../074/star25e1.evt starps-U.evt

TO DO

??? Move the '#'-comments to the page description file whenever possible.

COUNTING LINES AND PARAGS

  join starps-H.evt, starps-U.evt

            lines      parags   stars
  page      H    Z      H    Z      Z
  -----    --   --     --   --     --
  f103r    54   54     19   21     19
  f103v    46   46     14   16     14
  f104r    45   45     13   13     13
  f104v    44   44     13   13     13
  f105r    35   35     10   16     10
  f105v    38   38     10   20     10
  f106r    47   47     15   17     16
  f106v    47   47     15   16     14
  f107r    51   51     15   16     15
  f107v    49   49     16   16     15
  f108r    50   50     16   17     16
  f108v    52   52     17   16     16
  f111r    54   54     17   19     17
  f111v    51   51     18   20     19
  f112r    45   45     12   15     12
  f112v    47   47     14   17     13
  f113r    51   51     17   19     16
  f113v    49   49     15   15     15
  f114r    44   44     13   14     13
  f114v    41   41     12   13     12
  f115r    45   45     13   14     13
  f115v    45   45     13   14     13
  f116r    30   30     10   11     10

PARAG BREAKING RULES

??? Revised the text (STILL DOING) and all parag breaks.

See "report/report.html" for nomenclature ("parag", "head", "tail", "starlet", "puff", "right rail", "left rail", "short line", "long line", "linegap", etc.) and for the parag breaking rules.
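Whatever rules are used to choose the breaks, the resulting "<%>" and "<$>" marks should at least be properly nested: every head closed by a tail before the next head, and no text line outside a parag. A minimal Python sketch of such a check follows; the function name is hypothetical, and it assumes it is given the line bodies (the text after the locator) in file order. Any "title" that has not been commented out will be reported as a line outside any parag, which is one way to catch the titles that, as noted earlier, must be excluded when analyzing parags.

```python
def check_parag_marks(bodies):
    """Return (index, message) pairs for badly nested parag marks."""
    problems = []
    open_at = None  # index of the head line of the parag still open, if any
    for k, body in enumerate(bodies):
        if body.startswith("<%>"):
            if open_at is not None:
                problems.append((k, f"head while parag at {open_at} is still open"))
            open_at = k
        elif open_at is None:
            problems.append((k, "line outside any parag"))
        if body.endswith("<$>"):
            open_at = None  # the tail closes the current parag
    if open_at is not None:
        problems.append((open_at, "parag is never closed"))
    return problems
```

The check says nothing about whether a break is in the right place; that still needs the rules and the page images.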
The title is a right-justified line after a parag that ends with a full line. It had been assumed to be the tail of the previous parag, skipped by the Scribe and then inserted in that non-standard position. However, the first line of the next parag bends down to avoid that title. Thus, if that conjecture is true, the Scribe must have realized the omission after writing the first 4 lines of . I have now re-interpreted as a title. It is possible that other section headers were not recognized as such and were joined with adjacent parags.

After commenting out the subsection titles in both files, I counted again the words and parags, and computed basic statistics (min, max, average, and standard deviation) of the number of words per paragraph (nwp):

  ./count_recipes_and_words.sh starps.evt

!!! OLD: !!!

  statistic | bencao | starps
  ----------+--------+--------
  parags    |    354 |    330
  words     |  10874 |  10457
  min nwp   |      7 |     11
  max nwp   |     76 |     72
  avg nwp   |   30.8 |   31.7
  dev nwp   |    8.5 |   11.2

!!! !!!

CREATING THE PARAG SPLITTING REPORT

Creating the file "report/report.html" with a description of how the parags were chosen.

First, let's create the raw page images:

  ln -s ../../../FromBeinecke
  mkdir -p report/images/raw
  for bf in `cat beinecke_SPS_images.txt` ; do
    fnum="${bf/*-/}"
    convert FromBeinecke/${bf}.jpg -resize 'x800' report/images/raw/${fnum}.png
  done
  eom report/images/raw/*.png

IMAGES

??? Make the images showing parag splitting.
??? Remove comments that are in the description file.
??? Write script to automatically assign parag breaks.
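The words-per-parag (nwp) statistics reported earlier in this note can be reproduced along the following lines. This is a hypothetical Python sketch, not "count_recipes_and_words.sh". It assumes line bodies (the text after the locator) in file order, joins the lines from each "<%>" head through the following "<$>" tail into one parag, treats "." as the only word separator (uncertain "," spaces are not split), and uses the population standard deviation, since the note does not say which estimator the script uses. Lines outside any parag, such as unmarked titles, are ignored.

```python
import math
import re

TAG_RE = re.compile(r"<[^>]*>")


def join_parags(bodies):
    """Join line bodies from each "<%>" head through the next "<$>" tail."""
    parags, cur = [], None
    for body in bodies:
        if body.startswith("<%>"):
            cur = []
        if cur is None:
            continue  # line outside any parag (e.g. an unmarked title)
        # Drop inline tags, then the rail flags at both ends.
        cur.append(TAG_RE.sub("", body).strip().strip("=»«"))
        if body.endswith("<$>"):
            parags.append(".".join(cur))
            cur = None
    return parags


def words_in(text):
    """Count EVA words, taking '.' as the only word separator."""
    return len([w for w in text.split(".") if w])


def nwp_stats(parags):
    """Min, max, mean and population std. dev. of words per parag (non-empty input)."""
    counts = [words_in(p) for p in parags]
    n = len(counts)
    avg = sum(counts) / n
    dev = math.sqrt(sum((c - avg) ** 2 for c in counts) / n)
    return {"parags": n, "words": sum(counts), "min": min(counts),
            "max": max(counts), "avg": avg, "dev": dev}
```

With a sample standard deviation (dividing by n-1) the "dev" figures would come out slightly larger, so that choice should match whatever the original script does before comparing numbers.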