Locating parag breaks in the Starred Paragraphs Section

This webpage reports an attempt at locating the paragraph breaks in the Starred Paragraphs section (SPS) of the Voynich Manuscript (VMS), also known as the Recipes section.

Paragraphs and why they matter

The text of the SPS clearly consists of multiple paragraphs, each comprising one or more lines. Each line normally starts at the left margin of the text and continues up to the right margin or to the end of the paragraph, whichever comes first. Considering the cost of parchment and the difficulty of erasing it, we assume that the Author (the person who decided to create the book, devised the script, created or obtained the information, etc.) first wrote a draft of the SPS on paper, and then recruited a Scribe to transpose the final draft to parchment. We assume that both the Author and the Scribe understood that the parags were significant, while the line breaks within each parag were not; so the Scribe disregarded the line breaks in the draft and introduced new ones whenever he/she reached the right margin. (While the Scribe may have been the Author him/herself, there is evidence that they were different persons, and that the Scribe did not know much about the meaning of the text, probably not even its language.)

Identifying the paragraphs as intended by the Author is necessary for certain analyses, such as comparing the frequencies of words at the start, middle, and end of paragraphs, or trying to identify words that preferably occur at the end of sentences (like the verbs in German and other SOV languages). Several features are believed to indicate those breaks, but they often disagree or are absent, and there are several blocks of consecutive lines that, based on internal and external clues, can be presumed to consist of two or more parags run together. Locating the paragraph breaks within those blocks requires some more or less arbitrary choices, guided by the available clues. This article describes a set of criteria for guiding these choices and shows the results of applying them to each page of the SPS.
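As a rough illustration of the first kind of analysis, here is a minimal Python sketch that tallies word frequencies by position once the parags are known. It assumes each parag is given as a list of EVA words in reading order, and it takes "start" to mean the first word of a parag, "end" its last word, and "middle" everything in between. These definitions, like the function name `positional_counts`, are illustrative assumptions, not criteria used in this article.

```python
from collections import Counter

def positional_counts(parags: list[list[str]]) -> dict[str, Counter]:
    """Count word occurrences at the start, middle, and end of parags.

    Assumption (for illustration only): 'start' is the first word of a
    parag, 'end' is its last word, and 'middle' is every other word.
    A one-word parag contributes only to 'start'.
    """
    counts = {"start": Counter(), "middle": Counter(), "end": Counter()}
    for words in parags:
        if not words:
            continue  # skip empty parags
        counts["start"][words[0]] += 1
        if len(words) > 1:
            counts["end"][words[-1]] += 1
        for w in words[1:-1]:
            counts["middle"][w] += 1
    return counts

# Usage with made-up EVA words (not actual SPS data):
example = [["pchedy", "qokain", "dar"], ["tshedy", "dar"]]
c = positional_counts(example)
print(c["end"]["dar"], c["middle"]["qokain"])  # prints: 2 1
```

Comparing, say, the relative frequency of a word in `counts["end"]` against `counts["middle"]` is then a simple ratio; which statistical test to apply is a separate question.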

The transcription file

For this note, I will use a transcription file that was obtained from the old EVMT interlinear file, release 1.6e6, by extracting the transcriptions of the SPS by Takeshi Takahashi and Rene Zandbergen, comparing and merging them, and checking the result, word for word, against the Beinecke Library scans. The result can be considered a new transcription of my own.

I considered the SPS to comprise the lines of Voynichese text from the start of page f103r to line 30 of page f116r. The rest of page f116 -- from line 31 onwards, starting with {pchallarar} -- seems to be of a different nature; for one thing, there is an extra-wide line gap above it, and there are no stars on its left margin.

I also exclude from the SPS, for the purpose of this note, a few lines in the above range that seem likely to be of a different nature, such as subsection titles. These are

It is known that folios f109 and f110 are missing from the physical VMS, so the SPS file covers only 22.6 pages instead of the presumed 26.6 originally present. The surviving pages have ??? lines of "prose" (non-title) text, an average of ??? lines per page.

The number of words in that file is hard to tell, because the spacing of words and glyphs is highly irregular, and modern transcribers often disagree on whether a gap between two glyphs is a word boundary or not, or are themselves uncertain about it. Some of these disputed or uncertain spaces are marked in the computer transcriptions with a comma ',' instead of a dot '.'. If one disregards them, the consensual ones determine ??? words, or ??? per line on average. If we treat all of the commas as word spaces too, we get
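The two counts can be obtained with something like the following Python sketch. It assumes that each line of the transcription has already been reduced to plain EVA text in which '.' is a consensual word space and ',' a disputed or uncertain one; locus identifiers, comments, and other markup of the interlinear file would have to be stripped first. The function names are hypothetical.

```python
def count_words(line: str, commas_are_spaces: bool) -> int:
    """Count the words in one line of plain EVA text.

    '.' is always treated as a word space. ',' (a disputed or uncertain
    space) is treated as a word space only if commas_are_spaces is True;
    otherwise it is deleted, joining its two neighbors into one word.
    """
    if commas_are_spaces:
        line = line.replace(",", ".")
    else:
        line = line.replace(",", "")
    return sum(1 for w in line.split(".") if w)

def word_stats(lines: list[str]) -> tuple[int, int, float, float]:
    """Return (strict count, loose count, strict per line, loose per line)."""
    strict = sum(count_words(ln, False) for ln in lines)
    loose = sum(count_words(ln, True) for ln in lines)
    n = len(lines)
    return strict, loose, strict / n, loose / n

# Made-up example lines, not actual SPS text:
print(word_stats(["qokeedy.chedy,qokain.dar", "pchedy.shedy"]))
# prints: (5, 6, 2.5, 3.0)
```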

??? stains on f112 match those on f103

The number of glyphs in the SPS is also a bit uncertain, since the transcribers sometimes disagree on how many glyphs they see in each ink squiggle. Some will read {ee} where others read {ch} or even {a}, and so on. Considering each of {ch}, {sh}, {ih}, {cth}, {ith}, {cthh}, etc. as a single glyph, the transcription file above has ??? glyphs, or an average of ??? per line. There is a table with the statistics of those counts per page.
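A minimal sketch of such a glyph count, in Python: each word is split greedily into glyphs, trying the longest multi-letter EVA glyphs first. The set `MULTI_GLYPHS` below holds only the combinations named above; the "etc." is not filled in, so the set is an assumption that would have to be completed before reproducing the actual counts. The input is assumed to be plain EVA text in which '.' and ',' are the only separators.

```python
import re

# Multi-letter EVA sequences counted as a single glyph, longest first.
# Only the ones named in the text are listed; the "etc." would add more.
MULTI_GLYPHS = sorted({"ch", "sh", "ih", "cth", "ith", "cthh"},
                      key=len, reverse=True)

def split_glyphs(word: str) -> list[str]:
    """Split an EVA word into glyphs by greedy longest match."""
    glyphs: list[str] = []
    i = 0
    while i < len(word):
        for g in MULTI_GLYPHS:
            if word.startswith(g, i):
                glyphs.append(g)
                i += len(g)
                break
        else:  # no multi-letter glyph starts here: take a single letter
            glyphs.append(word[i])
            i += 1
    return glyphs

def count_glyphs(line: str) -> int:
    """Count glyphs in one line, ignoring the '.' and ',' separators."""
    return sum(len(split_glyphs(w)) for w in re.split(r"[.,]", line) if w)

print(split_glyphs("qokcheedy"))         # ['q', 'o', 'k', 'ch', 'e', 'e', 'd', 'y']
print(count_glyphs("chedy,qokain.dar"))  # 13
```

Splitting on the separators before tokenizing ensures that a multi-letter glyph is never formed across a word space.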

Nomenclature and notation

Text in braces is either a math formula or Voynichese in EVA (as of the 1.6e6 version of the EVT interlinear).

General terms

While the left and right rails are generally smooth curves, they occasionally have abrupt discontinuities. For example, on page f112r, the right rail is indented by ~3 cm in the span from line 2 to line 22, with abrupt transitions at both ends. In that range, it wanders more than usual and it may be uncertain whether a line ends on it or before it. ??? Left rail???

Stars in the SPS

The left margin of every page of the SPS contains a column of stars. It is believed that there should be one star for each parag of the SPS, aligned vertically with the parag head line, like the bullet of an item in an itemized list. However, the reality is somewhat far from this ideal. A star that is presumably associated with an obvious parag is often located at some distance, up or down, from the head, or is missing entirely. Part of the task of identifying the paragraph breaks is to assign each star to exactly one parag, and to decide which parags will be left without stars.

The stars look basically the same as those in the Cosmo and Zodiac sections. Each star is typically ~6 mm across. Its outline is drawn with the same ink as the text and consists of 6 to 9 rays, typically ~1.5 mm wide at the base and ~2 mm long, with two straight or slightly curved sides that form a sharp or slightly rounded point. The bases of the rays define a round body ~2 mm across. While the finished outline is a single continuous line, it may be drawn in two or more separate strokes.

A few stars are clear -- just outlined, not painted. All the others are partially or totally painted with one of two colors: yel, a partly transparent watercolor-like golden yellow paint (apparently the same paint/ink used on the hair of the nymphs of the Zodiac section), or red, an opaque tempera-like dark red paint (presumably the same used for the lips of nymphs in the Zodiac, flowers in the Herbal section, etc.)

The following terms are used when describing stars:

The unit for the vpos is the spacing between baselines. Thus "vpos +1.5" means that the starlet is aligned with the center of the gap between the baselines of the two text lines before its assigned line, and "vpos -0.2" means that the starlet is located about 1/5 of the way between the baselines of its assigned line and of the next text line.
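To make the definition concrete, the following Python sketch computes the vpos of a starlet from vertical coordinates measured on a page scan. It assumes (hypothetically) that y grows downward, as in image pixel coordinates, that `baselines[j]` is the y coordinate of the baseline of text line j, and that the unit is the spacing between the assigned line and the next one (or the previous one, for the last line on the page).

```python
def vpos(y_star: float, baselines: list[float], k: int) -> float:
    """vpos of a starlet centered at y_star, assigned to text line k.

    Positive values mean the starlet is above the baseline of line k,
    negative values that it is below. Requires at least two baselines.
    """
    if len(baselines) < 2:
        raise ValueError("need at least two baselines to define the unit")
    if k + 1 < len(baselines):
        spacing = baselines[k + 1] - baselines[k]
    else:
        spacing = baselines[k] - baselines[k - 1]
    return (baselines[k] - y_star) / spacing

base = [100.0, 140.0, 180.0, 220.0]  # made-up baseline positions, in pixels
print(vpos(160.0, base, 3))  # 1.5: midway between the baselines of lines 1 and 2
print(vpos(188.0, base, 2))  # -0.2: 1/5 of the way from line 2 down to line 3
```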

The cores of stars painted red are usually invisible, while those of stars painted yel are normally visible through the paint.

The two colors may have been applied at different times by different people (the Light Painter and the Dark Painter, respectively), who may have had different levels of knowledge about the VMS. Thus it is possible that the yel color on stars carries some information, for instance about parag breaks, while the red color does not.

Most stars in the SPS have a tail, which is a thin curvy line extending down from the star's outline, usually from one of the rays. In this case the ray is often sharper, longer, and curved. (This indicates that the tail was drawn by the same Scribe who drew the stars.) On some stars the ray extends along most of the length of the tail, thus creating a fat tail. The fat tails may carry information too. For instance, a fat tail may signify that the star is associated with two parag heads that are too close together to receive individual stars.

The number of rays may also carry information.

Parsing the SPS into parags

Parag properties and perfect parags

My definition of "perfect parag" for the present is a set of one or more consecutive left-justified lines that satisfy all of the following criteria:

Note that P4 does not *require* the existence of puffs in the head line of a perfect parag. It only prohibits them in the other lines.

Two other conditions that are relevant for parag splitting are:

The first line of surviving SPS pages (except f107v, see below) usually has one or more puffs. On those pages, Rule P4 effectively requires that a perfect parag be entirely contained within one page.

The alignment of starlets and parag heads is rarely precise; the starlets are often displaced up or down from the head's midline and may lie closer to the midlines of adjacent lines, or even beyond them. Therefore, when assigning starlets to text lines and checking P5 and P6, I chose to allow for offsets (vpos) of plus or minus ??? line spacings between the candidate starlet and the head of the candidate parag, provided that in the end no two starlets were assigned to the same line. Thus, in particular, I consider P5 satisfied for a candidate parag even if the star in question lies a bit above the third line from the end of the previous parag, or a bit below the fourth line of the parag in question.
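One way to carry out this assignment is a greedy, top-to-bottom pass, sketched below in Python. This is only an illustration under stated assumptions: each starlet's position is expressed in line units measured downward (so text line j sits at position j, and a starlet with vpos v relative to line j sits at j - v); the tolerance `tol` is left as a parameter because its value is still to be fixed; and conflicts are resolved greedily rather than by an optimal matching, so a different procedure might sometimes find a better assignment.

```python
from typing import Optional

def assign_starlets(star_pos: list[float], n_lines: int,
                    tol: float) -> list[Optional[int]]:
    """Greedy, order-preserving assignment of starlets to text lines.

    star_pos[i] is the position of starlet i in line units, measured
    downward, so that text line j sits at position j. Starlets are
    processed from top to bottom; each one gets the nearest line within
    +/- tol that lies below every line already taken, so that no line
    receives two starlets. Returns, for each starlet, its line index,
    or None if no free line lies within the tolerance.
    """
    result: list[Optional[int]] = [None] * len(star_pos)
    last = -1  # lowest line assigned so far
    for i in sorted(range(len(star_pos)), key=lambda s: star_pos[s]):
        pos = star_pos[i]
        options = [j for j in range(last + 1, n_lines) if abs(j - pos) <= tol]
        if options:
            best = min(options, key=lambda j: abs(j - pos))
            result[i] = best
            last = best
    return result

print(assign_starlets([0.1, 0.6, 3.2], n_lines=5, tol=1.0))  # [0, 1, 3]
print(assign_starlets([0.4, 0.45], n_lines=3, tol=0.5))      # [0, None]
```

A starlet left as `None` corresponds to a star that cannot be matched to any line under the chosen tolerance, which is the situation that flags a block as imperfect.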

By the above criteria, one can identify ??? perfect parags in the SPS, covering ??? of the ??? text lines. When those are excluded, the lines that remain ???

Fix line numbers to match the IVTFF

Criterion P4 is not necessary; if it fails, we should have a "quasi-perfect parag".

If a parag is perfect and criterion P7 is satisfied, call it "pluperfect".

Determining the "imperfect" parags

While most of the text can be parsed into perfect parags, there are several "imperfect blocks" of consecutive lines such that, in any block, any candidate parag fails at least one of the criteria P1--P6. Thus, within each of those imperfect blocks we had to choose where to break parags by less objective criteria.

Stars on the right margin do not seem to be reliable parag markers. They are often more than one line off from the head, or missing entirely, ???

for candidates to perfect parags that satisfy P1--P4 ???

Looking at. (I can think of a few scenarios for the creation and final scribing of the SPS that would have led to starlets being omitted or misaligned by mistake.) ???

I suspect that the Scribe may sometimes have started a parag on the same line as the tail of the previous parag, when the latter was a short line. For example, on the parag break should perhaps be after the first word {dcheo}. Likewise, on , the parag break should perhaps be after the first word {saiin}.

We put a "definitive" parag break after every short line, even if the next line cannot get a starlet assigned to it. That line will be the start of an "unstarred" parag.

We put a "tentative" parag break before any line that has at least one puff, even if the previous line is not short and no starlet can be assigned to it.

Those two decisions divide each imperfect block into "tentative parags". Each tentative parag is a set of consecutive lines such that no line except perhaps the first has any puffs or an assigned starlet, and no line except perhaps the last one is short.
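The splitting of an imperfect block can be sketched in Python as follows. The `Line` record and its flags are hypothetical: the sketch assumes that, for each line, one has already decided whether it is short, whether it contains puffs, and whether a starlet has been assigned to it. Besides the two rules above, it also breaks before any line with an assigned starlet, so that every resulting piece has the property just stated.

```python
from dataclasses import dataclass

@dataclass
class Line:
    label: str         # line locus, e.g. "f108v.P1.33" (illustrative)
    short: bool        # the line ends well before the right rail
    has_puff: bool     # the line contains at least one puff
    has_starlet: bool  # a starlet has been assigned to the line

def tentative_parags(block: list[Line]) -> list[list[Line]]:
    """Split an imperfect block into tentative parags."""
    parags: list[list[Line]] = []
    current: list[Line] = []
    for line in block:
        # Tentative break before a line with puffs or an assigned starlet.
        if current and (line.has_puff or line.has_starlet):
            parags.append(current)
            current = []
        current.append(line)
        # Definitive break after every short line.
        if line.short:
            parags.append(current)
            current = []
    if current:
        parags.append(current)
    return parags

# Made-up block: a (puff), b, c (short), d, e (puff), f
block = [Line("a", False, True, False), Line("b", False, False, False),
         Line("c", True, False, False), Line("d", False, False, False),
         Line("e", False, True, False), Line("f", False, False, False)]
print([[ln.label for ln in p] for p in tentative_parags(block)])
# [['a', 'b', 'c'], ['d'], ['e', 'f']]
```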

On page f103r there are two big stains. One is light greenish, triangular, ~30 mm wide and ~20 mm tall, and is located along the middle of the top edge of the page; it affected some glyphs on lines 01 and 02. The other is brown, round, ~30 mm in diameter, and is located in the upper right corner, ~7 mm from the top edge and ~15 mm from the right edge; it affected part of the text on lines 01--03, 05, 06, 08, and 09. (Lines 04 and 07 were short and thus escaped the stain.) A smaller round brown stain, ~5 mm wide, is located between the big one and the right margin; it affected the last few glyphs of lines 5 and 6. Some of the affected glyphs were completely erased; others became very faint but are still legible. These stains seeped through to the other side (f103v) and affected the text there too, but to a smaller extent. Some of the affected glyphs appear to have been retraced in the dark Retracer ink, which flared out slightly in the areas covered by the stains. The colors suggest that the Dark Painter dropped paint and then washed it off, with water or whatever thinner the paint used. This in turn suggests that (1) the ink is not iron-gall, and (2) the Retracer acted well after the Dark Painter and hence well after the Scribe.

Page f107v is the only one whose first line has no puffs. But the last line of f107r is short, so the last parag of f107r cannot continue on page f107v, not even as an imperfect parag.

On page f108r, there seems to be something wrong with stars 2--5 and 7--9. Stars 1 and 2 are both yellow, as are stars 4, 5, and 6, breaking the pattern of alternating yellow and red stars. Line 2 of the page, <f108r.P1.2>, has a one-leg gallows and thus is probably the start of a paragraph; however, there is no nearby star that could be assigned to it. Presumably a star, probably red, was omitted between stars 1 and 2. On the other hand, the line of star 4 (<f108r.P1.11>) seems to be in the middle of a parag. Presumably that "extra" star was added in order to get the count of stars right, compensating for the missing star on line <f108r.P1.2>. Thus we will assume that line <f108r.P1.8> is the start of a 6-line parag, which would be perfect if that "extra" star were not there. Likewise, there seems to be one star missing on line <f108r.P1.25>, which has two one-leg gallows; and star 7 is yellow, star 9 is red, and star 8 is unpainted. Presumably the missing star was to be red, and then star 8 should be yellow.
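The color check used above can be automated with a small Python sketch. It assumes each star's color is recorded as "yel", "red", or "clear" (unpainted), skips clear stars, and reports the stars whose color repeats that of the previous painted star; each repeat points to a possibly missing or extra star nearby. Treating unpainted stars as uninformative is a simplification, the color codes and function name are illustrative, and the example sequence is made up rather than read from any page.

```python
def alternation_breaks(colors: list[str]) -> list[int]:
    """Return the 1-based numbers of painted stars that have the same
    color as the previous painted star. Unpainted ('clear') stars are
    skipped, since they carry no color information."""
    breaks: list[int] = []
    prev = None  # color of the previous painted star
    for num, color in enumerate(colors, start=1):
        if color not in ("yel", "red"):
            continue
        if color == prev:
            breaks.append(num)
        prev = color
    return breaks

print(alternation_breaks(["red", "yel", "yel", "clear", "red", "red"]))  # [3, 6]
```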

On f108v, the lines in the lower 2/3 of the page seem more crammed than usual, and some lines overflow the right rail. Stars 10--16 do not have obvious parag heads. Star 10 is next to line <f108v.P1.33> which starts with an extra-wide {t}, hinting that it is a parag head. The lines from that point on have no one-leg gallows or short lines. A parag break was inferred before <f108v.P1.49> because of a slightly wider interlinear space, which was then assigned to star 16. The other breaks were guessed from the positions of stars 10--15. Stars 10--14 are not aligned with the lines but with gaps between the lines, so in each of these cases a choice had to be made between the two nearest lines. (There are faint short lines between the star and the left rail that may be suggestions for the parag breaks. However, those lines may have been added by some later owner and may be just his guesses, rather than informed hints. They were ignored.)

On pages f111r and f111v there is a triangular stain along the middle of the top edge of the panel, similar in shape and size to that of f103r but light brown instead of light green. There are three smaller stains below it, at ~40 mm from the top edge, with diameters of ~8, ~4, and ~10 mm. The text included in the stains (on lines 01--03 of both pages, 11--14 of f111r, and 10--13 of f111v) became fainter but is still quite readable. Maybe a couple of glyphs on f111r were fully or partly retraced, but it is far from certain.


Last edited on 2025-08-17 22:00:09 by stolfi