Hacking at the Voynich manuscript - Side notes
074 Revising the "U" (Stolfi) transcription (archived steps)

Last edited on 2026-01-17 23:01:28 by stolfi

INTRO

  This note was about an attempt at preparing a new version of the EVMT
  ("European Voynich Multi-Transcription") file, to be version 25e1
  (meaning 2.5 release 1). It would take off from the abandoned attempt
  to build version 20e1 described in Notes/672 and Notes/073.

  However, Rene Zandbergen has created a much improved version (IVTFF)
  of the transcription format and software. So this note is now
  redirected at preparing my own transcriptions (";U" in the
  interlinear) in a format compatible with that new standard. It
  requires, among other things, changing the format of all locators,
  and mapping the encoding I have been using to his chosen one.

  One reason for this effort is to investigate the theory that one-leg
  gallows with hooks are distinct from those without hooks.

OLD EVT INPUT FILES

  We start from the file Notes/672/text20e1-03.evt, which is the version
  of EVMT 20e1 that was prepared on 2005-02-03 but never released. It
  probably needs extensive checking.

    chmod a-w Notes/672/text20e1-03.evt
    ln -s Notes/672/text20e1-03.evt

  We must massage the ".evt" file a bit before automatic conversion.
  For one thing, it has too many groups like "(cht)", and it is not
  clear what they should really be like. They are probably not
  consistent.

  The plan is to inspect the groups /[(][ci][ktfpzw]*h+[)]/ and
  /[(]sh+[)]/. If they ARE ligated in the usual way, just remove the
  parens, as in the EVMT 16e6 format, since they will be converted to
  /CI[KTFPZW]*H*h/ and /SH*h/ by the automatic EVA-XEVE conversion
  scripts. If they are NOT ligated, convert them to symbolic weirdos
  "*{&...}" where "..." is the XEVA code for the ligated combination.

    cp text20e1-03.evt text20e1-30.evt

  Inspected and edited text20e1-30.evt by hand, replacing weirdos and
  weird ligatures by codes like &{OPr} or &{310}.

  !! NOTE: my weirdo codes differ from Rene's.
  Mapping will be necessary at some point.

EXTRACTING MY TRANSCRIPTION

  Removing from "text20e1-30.evt" all transcriptions except ";U":

    cat text20e1-30.evt \
      | remove_non_stolfi_transcriptions.gawk \
      > text25e1-50.evt

  Saved the version "text20e1-30.evt" just in case:

    now=2025-05-31-095000
    mkdir SAVE/${now}
    chmod a-w text20e1-30.evt
    mv -vi text20e1-30.evt SAVE/${now}/

  (It seems that a tentative IVTFF version "text25e1-01.xev" was
  created in 2025-05 from text20e1-30.evt but then editing continued in
  "text25e1-51.evt". Saved "text25e1-01.xev" just in case.)

  Proceeded to manual edit of "text25e1-51.evt". Split off

    "text25e1-weirdos.txt" - weirdo code definitions.
    "text25e1-intro.txt" - the introductory comments.
    "../076/starps-U.eva" - the Starred Parags section (SPS, f103r to f116r line 30).

  Also extracted from "text20e1-30.evt" the versions by Takeshi (';H')
  and Rene (';Z') of the SPS:

    "../076/starps-H.eva"
    "../076/starps-Z.eva"

  The files "../076/starps-U.eva" and "../076/starps-Z.eva" were
  uniformized until the TEXT of the former was a subset of the latter.
  But the #-COMMENTS were not. Then "../076/starps-U.eva" was saved to
  "SAVE/2025-07-15-200047/starps-U.eva" and "../076/starps-Z.eva" was
  renamed "../076/starps-U.evt", with ";U" codes replacing the ";Z"
  codes. See "../076/Note-076.txt".

    ln -s ../076/starps-U.evt

  See also:

    "../073/desc25e1-51.txt" - per-page verbal descriptions.

  Continued editing "text25e1-51.evt".

  Replaced all weirdo code notations "&\{{NNN}\}" by "&{NNN};".

  Replaced all EVA font notations in comments, from "<{CHAR}>" to
  "@{CHAR}", and from "<{TEXT}>" to "@'{TEXT}'".

  Replaced all inline comments "{...}" by "".

OBTAINING THE LAST IVTFF FILE

  Downloaded the latest version of Rene's "reference" transcription,
  "RF1b-e.txt", from "https://voynich.nu/data/RF1b-e.txt". Renamed it
  "text25rz-40.txt", made it readonly.

  Needed to fix the page of the rosette from "fRos" to "f85v2",
  otherwise all my scripts would break.
  So made a copy

    cp text25rz-40.txt text25rz-41.txt
    chmod u+w text25rz-41.txt

  Edited replacing "fRos" by "f85v2". Also replaced page "f101v" by
  "f101v2" for consistency.

  Also split off the @'okeeey.qokeeey..okeey;.okeey' from line as a
  separate line , assuming it is a "title", for compatibility with
  "star25e1.evt".

  Also removed " or?r.m" which seems to be a duplicate of " osaram".

  Renamed it again to "full25rz.ivt":

    mv -vi text25rz-41.txt full25rz.ivt

CHANGING THE PARAGRAPH MARKERS

  In the old EVMT format, parags were marked only by "=" at the end of
  the tail line and "-" at the end of every other line.

  As preparation to fix the paragraph markers:

    Changing temporarily the line-final "-" to "<|>".

    Prefixing every text line after a "<|>" with "<:>".

    Replacing every final "=" with "<$>".

  Looking for lines that do not end with "<$>" or "<|>" and fixing them:

    cat text25e1-52.evt \
      | sed \
          -e 's:^[ ]*\([#]\|[@][@]\|$\).*::g' \
          -e 's:^.*::g' \
          -e 's:^.*<[|$]> *$::g' \
      | egrep --color=auto -nH --null -e '.' \
      | sed -e 's:[(]standard input[)]:text25e1-52.evt:g' \
      > .bugs

  Adding start-of-parag markers:

    ./add_parag_markers.gawk text25e1-52.evt \
      > .tmp
    prdiff -Bb text25e1-52.evt .tmp | head -n 200 > .diff

    # now="`yyyy-mm-dd-hhmmss`"
    now="2025-07-15-171634"
    mkdir -p SAVE/${now}
    mv -vi text25e1-52.evt SAVE/${now}/
    chmod a-w SAVE/${now}/text25e1-52.evt
    mv -vi add_parag_markers.gawk SAVE/${now}/
    mv .tmp text25e1-53.evt

SIMPLIFYING THE NAMES

    mv -vi text25e1-53.evt text25e1.ivt
    mv -vi star25e1-53.evt star25e1.ivt
    mv -vi desc25e1-53.txt desc25e1.txt
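
APPENDIX: SKETCH OF THE TEMPORARY MARKER CONVERSION

  The three preparatory steps listed under CHANGING THE PARAGRAPH
  MARKERS were done by editing, and no script for them is recorded in
  this note. A minimal sketch of how they might be automated follows.
  It is NOT the add_parag_markers.gawk script (which adds the
  start-of-parag markers afterwards). The function name
  "convert_parag_markers" is hypothetical, and the sketch assumes that
  each text line has the form "<locator>" followed by blanks and the
  text, and that comment lines start with "#".

```shell
#!/bin/sh
# Hypothetical helper: rewrites old EVMT line terminators into the
# temporary markers. Reads stdin, writes stdout.
convert_parag_markers() {
  awk '
    # Comment lines and blank lines pass through unchanged and
    # do not affect the continuation state.
    /^[ ]*#/ || /^[ ]*$/ { print; next }
    {
      line = $0
      # A text line that follows a "<|>" line continues the same
      # paragraph: put "<:>" right after the locator, if there is one
      # (assumed layout), else at the start of the line.
      if (cont) {
        if (match(line, /^<[^<>]*>[ ]*/)) {
          line = substr(line, 1, RLENGTH) "<:>" substr(line, RLENGTH + 1)
        } else {
          line = "<:>" line
        }
      }
      if (line ~ /-[ ]*$/) {
        # Line-final "-": the paragraph goes on.
        sub(/-[ ]*$/, "<|>", line); cont = 1
      } else if (line ~ /=[ ]*$/) {
        # Line-final "=": tail line of the paragraph.
        sub(/=[ ]*$/, "<$>", line); cont = 0
      } else {
        # No terminator: leave the line as is, for manual fixing.
        cont = 0
      }
      print line
    }
  '
}
```

  Possible usage, with a hypothetical output file name:

    convert_parag_markers < text25e1-51.evt > .tmp

  Lines left without a terminator are exactly those that the ".bugs"
  check above is meant to catch.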