The source text used to prepare this map is Gabriel Landini's interlinear transcription of the VMs, version 1.6.
The file was slightly edited to simplify my processing. The main change was replacing the "anonymous" text locations with specific locations (usually ".P", but sometimes ".P1", "P2", or "R").
From this file, I manually extracted two sub-files: the labels and the paragraphic text (or parags for short).
For the purposes of this map, a label is an isolated short bit of Voynich text (a couple of words at most). This definition includes the "day names" from the zodiac diagrams, the labels of stars in the "astro" section, the labels of plants in the "pharma" section, and so forth. It also includes the "words" in the left column of page f66r.
Unfortunately, the interlinear file contains only a small subset of all the labels in the VMs. The bulk comes from these pages:
location  section L & H text type       comments
--------- ------- ----- --------------- --------------------
f66r      ?       B ?   words           left col of a table
f67r1     astro   ? ?   labels          on sectors
f67v2     cosmo   ? ?   labels          on diagram
f68r2     astro   ? ?   labels          on stars
f68v3     astro   ? ?   labels          on diagram
f68v2     astro   ? ?   labels          radial
f70v2     zodiac  ? ?   labels          on stars
f70v1     zodiac  ? ?   labels          on stars
f71r      zodiac  ? ?   labels          on stars
f71v      zodiac  ? ?   labels          on stars
f72r1     zodiac  ? ?   labels          on stars
f72r2     zodiac  ? ?   labels          on stars
f88r      pharma  A 4   labels          under plants
f88v      pharma  A 4   labels          under plants
f89r1     pharma  A ?   labels          under plants
f89r2     pharma  A ?   labels          under plants
f89v2     pharma  A ?   labels          under plants
f89v1     pharma  A ?   labels          under plants
f100r     pharma  A? 4? labels          under plants
f100v     pharma  A? 4? labels          under plants
f101v1    pharma  A? 4? labels          under plants
The parags sub-file consists of all multi-line text blocks that seem to be continuous paragraphs of "prose" text, broken into lines in the usual way. In particular, it includes the right-hand text on page f66r.
(The balance of the interlinear file comprises text in circles, isolated lines, titles, and the columns of isolated letters in f49v and f66r.)
As part of the ECC recoding, all the inter-word spaces (denoted by "." in the interlinear file) were discarded. Line breaks were preserved, however.
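The space-stripping step can be sketched as follows. This is only an illustration, not the script actually used: the function name and the sample strings are made up, and the ECC recoding itself is not shown.

```python
def strip_word_spaces(lines):
    """Remove the '.' inter-word separators from each line.

    Each line is processed on its own, so line breaks are preserved.
    """
    return [line.replace(".", "") for line in lines]


# Made-up sample lines; real input would be lines of the interlinear file.
sample = ["ab.cd.e", "fg.h"]
stripped = strip_word_spaces(sample)
```

Here two input lines produce two output lines, so the line structure of the text survives even though the word boundaries do not.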
After the ECC recoding, all the transcriptions for each line were mechanically combined into a "consensus" transcription, producing the encoded labels file and the encoded parags file, respectively.
The labels file was then sorted and cleaned, discarding duplicates and labels that had conflicting transcriptions or unreadable characters. Any label with an embedded "-" (signifying an intruding star, plant stem, or other element of a drawing) was entered twice: once as a single label, with the "-" removed, and once as two distinct labels, as if the "-" were a line break. After these transformations, there remained 231 distinct labels, ranging between 2 and 18 ECC characters.
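The exact rule used to combine the transcriptions is not spelled out here. One plausible reading, given purely as an assumption, is a position-by-position comparison that keeps a character where all transcriptions agree and writes a conflict marker otherwise. The function name, the "?" marker, and the requirement that the readings already have equal length are all hypothetical.

```python
def consensus(readings, conflict="?"):
    """Combine several readings of one line, position by position.

    Assumption: the readings are already aligned and of equal length.
    Where they all agree the shared character is kept; elsewhere the
    hypothetical `conflict` marker is written.
    """
    if not readings:
        raise ValueError("need at least one reading")
    if len({len(r) for r in readings}) > 1:
        raise ValueError("readings must be aligned to the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else conflict
        for column in zip(*readings)
    )


# Made-up readings of the same line by three transcribers.
merged = consensus(["abcd", "abxd", "abcd"])
```

Under this reading, a label whose consensus contains the conflict marker would be one of those with "conflicting transcriptions".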
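The cleanup rules above can be sketched like this. It is a minimal illustration under stated assumptions: the function names are invented, and the set of characters treated as "conflicting or unreadable" (here "?" and "*") is a guess, not taken from the actual files.

```python
def expand_label(label):
    """Return the entries contributed by one label.

    A label without '-' yields itself. A label with an embedded '-'
    yields the joined form (the '-' removed) plus each piece as a
    separate label, as if the '-' were a line break.
    """
    if "-" not in label:
        return [label]
    pieces = [p for p in label.split("-") if p]
    return ["".join(pieces)] + pieces


def clean_labels(labels, bad_chars="?*"):
    """Drop unusable labels, expand '-' labels, dedupe, and sort.

    `bad_chars` is an assumed set of markers for conflicting or
    unreadable characters.
    """
    kept = set()
    for label in labels:
        if any(c in label for c in bad_chars):
            continue
        kept.update(expand_label(label))
    return sorted(kept)


# Made-up labels: one with an intruding drawing element, one duplicate,
# one with an unreadable character.
cleaned = clean_labels(["ab-cd", "ef", "ef", "g?h"])
```

Using a set makes the duplicate removal automatic, including duplicates created by the "-" expansion itself.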
The encoded parags file was not subjected to any further cleanup. In particular, embedded "-" codes were left there. It comprised 3918 lines and 164571 ECC characters of Voynich text.
Each encoded label was then searched for in the encoded parags file. The position of each match was recorded in an index file, both as a (line-num, char-offset) pair and as a total count of Voynich characters since the beginning of the encoded parags file. The location where the label was first defined was also listed there, in curly braces.
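The search can be sketched as below. This is a rough illustration rather than the actual indexing program, and it makes several assumptions: line numbers start at 1, character offsets start at 0, matches do not span line breaks, overlapping matches are all reported, and every character of a line (including any embedded "-") counts toward the running total.

```python
def index_label(label, parag_lines):
    """Find every occurrence of `label` inside the lines of the parags text.

    Returns a list of (line_num, char_offset, total_offset) tuples, where
    total_offset counts the characters from the start of the text.
    """
    matches = []
    chars_before = 0  # characters in all earlier lines
    for line_num, line in enumerate(parag_lines, start=1):
        start = line.find(label)
        while start != -1:
            matches.append((line_num, start, chars_before + start))
            start = line.find(label, start + 1)  # allow overlapping matches
        chars_before += len(line)
    return matches


# Made-up two-line text and label.
hits = index_label("ab", ["xabab", "ab"])
```

The running total gives each match a single linear coordinate, which is convenient for plotting matches along the whole text, while the (line-num, char-offset) pair makes it easy to look the match up in the file.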
This index was then used to build the label reference maps and tables.
Another version of the index file, with the ECC labels replaced by their FSG originals, was used to produce a printout of selected pages showing the matches and near matches for selected labels.