Hacking at the Voynich manuscript - Side notes
074 Revising the "U" (Stolfi) transcription

Last edited on 2025-07-30 04:09:20 by stolfi

INTRO

  This note was about an attempt at preparing a new version of the
  EVMT ("European Voynich Multi-Transcription") file, to be version
  25e1 (meaning 2.5 release 1). It would pick up from the abandoned
  attempt to build version 20e1 described in Notes/072 and Notes/073.

  However, Rene Zandbergen has created a much improved version (IVTFF)
  of the transcription format and software. So this note is now
  redirected at preparing my own transcriptions (";U" in the
  interlinear) in a format compatible with that new standard. It
  requires, among other things, changing the format of all locators,
  and mapping the encoding I have been using to his chosen one.

  One reason for making this effort is to investigate the theory that
  one-leg gallows with hooks are distinct from those without hooks.

LINKS

    ln -s ../.. work
    ln -s work/Notes
    ln -s work/ivtff_frac_word_counts.py
    ln -s work/ivtff_format.py
    ln -s work/process_frac_words.py

OLD EVT INPUT FILES

  We start from the file Notes/072/text20e1-03.evt, which is the
  version of EVMT 20e1 that was prepared on 2005-02-03 but never
  released. It probably needs extensive checking.

    chmod a-w Notes/072/text20e1-03.evt
    ln -s Notes/072/text20e1-03.evt

  We must massage the ".evt" file a bit before automatic conversion.
  For one thing, it has too many groups like "(cht)", and it is not
  clear what they should really be like. They are probably not
  consistent.

  The plan is to inspect the groups /[(][ci][ktfpzw]*h+[)]/ and
  /[(]sh+[)]/. If they ARE ligated in the usual way, just remove the
  parens, as in the EVMT 16e6 format, since they will be converted to
  /CI[KTFPZW]*H*h/ and /SH*h/ by the automatic EVA-XEVA conversion
  scripts. If they are NOT ligated, convert them to symbolic weirdos
  "*{&...}" where "..."
  is the XEVA code for the ligated combination.

    cp text20e1-03.evt text20e1-30.evt

  Inspected and edited text20e1-30.evt by hand, replacing weirdos and
  weird ligatures by codes like &{OPr} or &{310}.

  !! NOTE: my weirdo codes differ from Rene's. Mapping will be
  necessary at some point.

EXTRACTING MY TRANSCRIPTION

  Removing from "text20e1-30.evt" all transcriptions except ";U":

    cat text20e1-30.evt \
      | remove_non_stolfi_transcriptions.gawk \
      > text25e1-50.evt

  Saved the version "text20e1-30.evt" just in case:

    now=2025-05-31-095000
    mkdir SAVE/${now}
    chmod a-w text20e1-30.evt
    mv -vi text20e1-30.evt SAVE/${now}/

  (It seems that a tentative IVTFF version "text25e1-01.xev" was
  created in 2025-05 from text20e1-30.evt but then editing continued
  in "text25e1-51.evt". Saved "text25e1-01.xev" just in case.)

  Proceeded to manual edit of "text25e1-51.evt". Split off

    "text25e1-weirdos.txt" - weirdo code definitions.
    "text25e1-intro.txt" - the introductory comments.
    "../076/starps-U.eva" - the Starred Parags section (SPS, f103r to f116r line 30).

  Also extracted from "text20e1-30.evt" the versions by Takeshi (';H')
  and Rene (';Z') of the SPS:

    "../076/starps-H.eva"
    "../076/starps-Z.eva"

  The files "../076/starps-U.eva" and "../076/starps-Z.eva" were
  uniformized until the TEXT of the former was a subset of the latter.
  But the #-COMMENTS were not. Then "../076/starps-U.eva" was saved to
  "SAVE/2025-07-15-200047/starps-U.eva" and "../076/starps-Z.eva" was
  renamed "../076/starps-U.evt", with ";U" codes replacing the ";Z"
  codes. See "../076/Note-076.txt"

    ln -s ../076/starps-U.evt

  See also:

    "../073/desc25e1-51.txt" - per-page verbal descriptions.

  Continued editing "text25e1-51.evt"

  Replaced all weirdo code notations "&\{{NNN}\}" by "&{NNN};".

  Replaced all EVA font notations in comments, from "<{CHAR}>" to
  "@{CHAR}", and from "<{TEXT}>" to "@'{TEXT}'"

  Replaced all inline comments "{...}" by "".
OBTAINING THE LAST IVTFF FILE

  Downloaded the latest version of Rene's "reference" transcription,
  "RF1b-e.txt", from "https://voynich.nu/data/RF1b-e.txt". Renamed it
  "text25rz-40.txt" and made it read-only.

  Needed to change the page name of the rosettes page from "fRos" to
  "f85v2"; otherwise all my scripts would break. So made a copy

    cp text25rz-40.txt text25rz-41.txt
    chmod u+w text25rz-41.txt

  Edited it, replacing "fRos" by "f85v2". Also replaced page "f101v" by
  "f101v2" for consistency.

MAPPING EVMT LOCATORS TO IVTFF LOCATORS

  Rene sent a table "rene-loci-table.xlsx" with the mapping from old
  EVMT 1.6e6 locators to his new IVTFF locators. Extracted it as
  "rene-loci-table-orig.csv", then edited it by hand to
  "loci-evmp16e6-ivtff.tbl". Applied the page number fixes above.

  To check, extracted all locators from the two files:

    ./match_locators_25e1_25ez.sh

  There were lots of old locators that did not match, concentrated on
  several pages, such as the "nine rosettes" page (f85v2) and the
  "ages of man" page (f67v2).

  Rene's table mapping old locators to new ones was based on the real
  1.6e6 version of the interlinear. However, "text25e1-51.evt" was
  based on the aborted update of that interlinear (Notes/072). That
  upgrade implied adding many units not previously transcribed, and
  renaming dozens of units.

  Edited the files and re-ran the script above until all old locators
  and all new locators were in 1:1 correspondence.

      lines   words     bytes file
    ------- ------- --------- ------------
       3513    3513     45249 .locators_old
       3513    7026     80634 .locators_old_mapped_to_new
          0       0         0 .old_unmapped
       5388    5388     54207 .locators_new
       5388   10776    123832 .locators_new_mapped_to_old
          0       0         0 .new_unmapped

  Wrote a script that reads {stdin}, maps all old locators (with two
  '.') on data lines to new locators (with a single '.'), and writes
  the result to stdout. It does not touch locators in comments; those
  will have to be fixed by hand, since sometimes they should remain
  old-style.
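  The locator mapping just described can be illustrated with a minimal
  Python sketch. It is not the actual script used here; the function
  name "map_locator_line", the in-memory dictionary "table", and the
  sample locators are all hypothetical. It assumes that an old-style
  locator sits at the start of a data line in the form
  "<page.unit.line;T>" and that comment lines start with "#".

```python
import re

# Assumed shape of an old-style locator at the start of a data line:
# exactly two '.' inside the brackets, optionally followed by ";T"
# where T is a one-letter transcriber code.
OLD_LOCATOR = re.compile(r'^<([^<>;.]+\.[^<>;.]+\.[^<>;.]+)(;[A-Za-z])?>')

def map_locator_line(line, table):
    """Map the leading old-style locator of `line` through `table`.

    `table` is a hypothetical dict from old locators to new locators.
    Comment lines and lines without an old-style leading locator are
    returned unchanged. An unmapped old locator raises KeyError, so
    that gaps in the table are noticed rather than silently skipped.
    """
    if line.startswith('#'):
        return line                      # never touch comments
    m = OLD_LOCATOR.match(line)
    if m is None:
        return line                      # no old-style locator here
    old, tag = m.group(1), m.group(2) or ''
    return '<' + table[old] + tag + '>' + line[m.end():]

if __name__ == '__main__':
    table = {'f1r.P.1': 'f1r.1,@P0'}     # made-up sample entry
    print(map_locator_line('<f1r.P.1;U>  fachys.ykal-', table))
```

  Only the locator at the very start of the line is rewritten, so
  locators quoted inside comments or later in the line are left for
  manual review, in line with the behaviour described above.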
    cat text25e1-51.evt | map_locators.sh > text25e1-52.evt

    now="`yyyy-mm-dd-hhmmss`"
    mkdir -p SAVE/${now}
    chmod a-w text25e1-51.evt
    mv -vi text25e1-51.evt SAVE/${now}/

    ln -s ../073/desc25e1-52.txt

CHANGING THE PARAGRAPH MARKERS

  In the old EVMT format, parags were marked only by "=" at the end of
  the tail line and "-" at the end of every other line.

  As preparation for fixing the paragraph markers:

    Changing temporarily the line-final "-" to <|>.

    Prefixing every text line after a <|> with <:>.

    Replacing every final "=" with "<$>".

  Looking for lines that do not end with "<$>" or "<|>" and fixing
  them:

    cat text25e1-52.evt \
      | sed \
          -e 's:^[ ]*\([#]\|[@][@]\|$\).*::g' \
          -e 's:^.*::g' \
          -e 's:^.*<[|$]> *$::g' \
      | egrep --color=auto -nH --null -e '.' \
      | sed -e 's:[(]standard input[)]:text25e1-52.evt:g' \
      > .bugs

  Adding start-of-parag markers:

    ./add_parag_markers.gawk text25e1-52.evt \
      > .tmp
    prdiff -Bb text25e1-52.evt .tmp | head -n 200 > .diff

    # now="`yyyy-mm-dd-hhmmss`"
    now="2025-07-15-171634"
    mkdir -p SAVE/${now}
    mv -vi text25e1-52.evt SAVE/${now}/
    chmod a-w SAVE/${now}/text25e1-52.evt
    mv -vi add_parag_markers.gawk SAVE/${now}/
    mv .tmp text25e1-53.evt

SIMPLIFYING THE NAMES

    mv -vi text25e1-53.evt text25e1.evt
    mv -vi desc25e1-53.evt desc25e1.evt
    mv -vi star25e1-53.evt star25e1.evt

REPLACING IMPLICIT LIGATURES BY EXPLICIT ONES

  We want to replace implicit ligatures like @'ycthhey' by explicit
  ones like @'y{CTHh}ey'.

  First let's save the current state of things:

    now="`yyyy-mm-dd-hhmmss`"; echo "now = ${now}"
    # now=2025-07-17-140500
    mkdir -p SAVE/${now}
    cp -av \
      text25e1.evt star25e1.evt \
      text25e1-weirdos.txt text25e1-intro.txt \
      SAVE/${now}
    cp -av ../073/desc25e1.txt SAVE/${now}
    chmod a-w SAVE/${now}/*.{evt,txt}

  Piped all four files through "convert_ligatures.sed"

  >>> STOPPED HERE 2025-05-27

TO DO

  ???
  REPLACE <|> <:> BY [=»«]

LISTING ALL LOCATORS WITH ONE-LEG GALLOWS

  Making a list of all line locators in the full EVT that contain [fp]
  gallows (that may need to be converted to [zw]):

    ./list_one_gallows_loci.sh text20e1-30.evt text25e1-51.evt 0

CONVERTING MORE

  First stab at converting the old EVMT to Rene's IVTFF format:

    ./convert_evmt_20e1_to_evmt_25e1.sh

  After many ad-hoc tweaks in the input "text20e1-50.evt", we got the
  above script to run without errors.

  * Each /glyph/ occurrence in the text is defined as a maximal set of
    strokes that are (or presumably were intended to be) connected by
    contact or ligatures. In the XEVT format, each glyph is encoded as
    a pair of parens '()' enclosing a string of one or more XEVA
    /simple glyph/ codes, like "(v)" or "(Sh)" or "(AKPIHO)"

  * The XEVA simple glyph codes include the basic lowercase EVA
    letters [adefik-ty] and the two combinations @{Ch}, @{Sh}, and the
    platform gallows @{CKh}, @{CTh}, @{CFh}, and @{CPh}. They also
    include new lowercase codes @b, @g, @u, @j (@e, @a, or @i with
    plumes or tails), @v (the caret), @x (the picnic table), and @z
    and @w (versions of @f and @p with an @e-hook at the end of the
    horizontal arm); thus completing all lowercase letters. They also
    include @c and capital letters [ACHIOQRSY] denoting the same
    simple glyphs as the lowercase versions with a ligature line added
    at top right; @E, which is an @e that can connect to the bottom of
    the next glyph; and [KTFPZW], which are the gallows [ktfpzw] with
    a stroke forming the floor of the platform.

  * The XEVA simple codes also include weirdo codes like &NNN; where
    NNN is a 3-digit number. The previous EVMT notation like
    "s{&123}" or "*{&o'}" is replaced by codes "&123" in XEVA that
    function as simple letters. Thus the line

      " ol*{&ol}.ofaiin="

    from EVMT 16e6 would become

      " (o)(ll)(&312).(o)(f)(a)(i)(i)(n)="

    in the new EVMT file.

LISTING WEIRDO USES AND DEFINITIONS

  The weirdos and non-basic glyphs are encoded as "&{...}".
  Listing weirdo uses in both files and definitions in the text file:

    ./list_weirdo_uses_and_defs.sh

TO DO
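  As a rough illustration of the weirdo listing step above, here is a
  minimal Python sketch that tallies weirdo uses in the text lines of
  a file. It is not the "list_weirdo_uses_and_defs.sh" script itself.
  The function name "count_weirdo_uses" is hypothetical, and the
  sketch assumes that codes appear literally as "&{code}" and that
  comment lines start with "#" and should not count as uses.

```python
import re
from collections import Counter

# Assumed literal form of a weirdo code: "&{" + code + "}".
WEIRDO = re.compile(r'&\{([^{}]+)\}')

def count_weirdo_uses(lines):
    """Count weirdo codes on non-comment lines.

    Returns a Counter mapping each code (without the "&{" and "}")
    to the number of times it occurs on lines not starting with '#'.
    """
    uses = Counter()
    for line in lines:
        if line.lstrip().startswith('#'):
            continue                     # comments are not uses
        uses.update(WEIRDO.findall(line))
    return uses

if __name__ == '__main__':
    sample = [                           # made-up sample lines
        '<f1r.1;U> ol&{OPr}.daiin&{310}',
        '# &{310} mentioned in a comment',
        '<f1r.2;U> &{310}.chol',
    ]
    for code, n in sorted(count_weirdo_uses(sample).items()):
        print(code, n)
```

  Comparing the keys of this tally against the codes defined in
  "text25e1-weirdos.txt" would then show which codes are used but
  undefined, or defined but unused.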