This webpage reports an attempt at locating the paragraph breaks in the Starred Paragraphs section (SPS) of the Voynich Manuscript (VMS), also known as the Recipes section.
The text of the SPS clearly consists of multiple paragraphs, each comprising one or more lines. Each line normally starts at the left margin of the text and continues up to the right margin, or to the end of the paragraph, whichever comes first. Considering the cost of parchment and the difficulty of erasing it, we assume that the Author (the person who decided to create the book, devised the script, created or obtained the information, etc.) first wrote a draft of the SPS on paper, and then recruited a Scribe to transpose the final draft to parchment. We assume that both Author and Scribe understood that the parags were significant, while the line breaks within each parag were not; so that the Scribe disregarded the line breaks in the draft, and introduced new ones whenever he/she reached the right margin. (While the Scribe may have been the Author him/herself, there is evidence that they were different persons, and that the Scribe did not know much about the meaning of the text, probably not even its language.)
Identifying the paragraphs as intended by the Author is necessary for certain analyses, such as comparing the frequencies of words at the start, middle, and end of paragraphs, or trying to identify words that occur preferably at the end of sentences (like the verbs in German and other SOV languages). There are several features that are believed to indicate those breaks; but they often disagree or are absent, and there are several blocks of consecutive lines that, based on internal and external clues, can be presumed to consist of two or more parags run together. Locating the paragraph breaks within those blocks requires some more or less arbitrary choices, guided by the available clues. This article describes a set of criteria for guiding these choices, and shows the results of applying them to each page of the SPS.
For this note, I will use a transcription file of my own that is described elsewhere.
I considered the SPS to comprise the lines of Voynichese text from the start of page f103r to line 30 of page f116r. The rest of page f116 -- from line 31 onwards, starting with {pchallarar} -- seems to be of a different nature; for one thing, there is an extra wide linegap above it, and there are no stars on its left margin. (Here, as in the rest of this report, Voynichese text is encoded in the EVA script, with braces {} used to indicate groups of glyphs connected by ligatures.)
The range above includes three lines that seem likely to be of a different nature, such as subsection titles. These are:
<f105r.36> {otoiis.chedaiin.otair.otaly} (centered, bottom of page)
<f108v.52> {olchar.olchedy.lshy.otedy} (partly centered, bottom of page)
<f114r.34> {ytain.o,l,kaiin.y,kar.chdar.alkam} (right-justified, mid-page)
It is known that folios f109 and f110 are missing in the physical VMS, so the SPS file covers only 22.6 pages, instead of the presumed 26.6 originally present. The surviving pages have 1065 lines of 'prose' (non-title) text, or ~47 lines per page on average.
The number of words in that file is hard to tell because the spacing of words and glyphs is highly irregular, and modern transcribers often disagree on whether a gap between two glyphs is a word boundary or not, or are themselves uncertain about that. Some of these disputed or uncertain spaces are marked in the computer transcriptions with comma ',' instead of dot '.'. If one disregards them, the consensual ones determine ~10'200 words, or ~9.6 per line on average. If we treat all the commas as word spaces too, we get ~11'200 words, or an average of ~10.5 per line.
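The two ways of counting words can be illustrated with a minimal Python sketch. It is not the program used to obtain the totals above; the function name count_words and its parameter are hypothetical, and it assumes only the convention stated here: '.' is a consensual word space and ',' a disputed or uncertain one.

```python
def count_words(eva_line: str, commas_are_spaces: bool) -> int:
    """Count the words in one line of EVA text.

    '.' is a consensual word space; ',' is a disputed or uncertain one.
    """
    text = eva_line.strip().strip("{}")
    if commas_are_spaces:
        text = text.replace(",", ".")   # treat every disputed space as a real one
    else:
        text = text.replace(",", "")    # disregard the disputed spaces
    return len([w for w in text.split(".") if w])


line = "{ytain.o,l,kaiin.y,kar.chdar.alkam}"   # <f114r.34> as quoted in this note
print(count_words(line, commas_are_spaces=False))  # 5
print(count_words(line, commas_are_spaces=True))   # 8
```

On this one line the two conventions give 5 and 8 words, which shows how much the total can depend on the treatment of the commas.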
??? stains on f112 match those on f103
The number of glyphs in the SPS is also a bit uncertain, since the transcribers sometimes disagree on how many glyphs they see in each ink squiggle. Some will read {ee} where others read {ch} or even {a}, and so on. Considering each of {ch}, {sh}, {ih}, {cth}, {ith}, {cthh}, etc. as single glyphs, the transcription file above has ??? glyphs, or an average of ??? per line. There is a table with the statistics of those counts per page.
The left margin of every page of the SPS contains a column of stars. It is believed that there should be one star for each parag of the SPS, aligned vertically with the parag head line, like the bullet in an item of an itemized list. However, the reality is somewhat far from this ideal. A star that is presumably associated with an obvious parag is often located at some distance, up or down, from the head, or is missing entirely. Part of the task of identifying the paragraph breaks is to assign each star to exactly one parag, and to decide which parags will be left without stars.
The stars look basically the same as those in the Cosmo and Zodiac sections. Each star is typically ~6 mm across. Its outline is drawn with the same ink as the text, and consists of 6 to 9 rays, typically ~1.5 mm wide at the base and ~2 mm long, with two straight or slightly curved sides that form a sharp or slightly rounded point. The bases of the rays define a round body ~2 mm across. While the finished outline is a single continuous line, it may be drawn in two or more separate strokes.
A few stars are clear -- just outlined, not painted. All the others are partially or totally painted with one of two colors: yel, a partly transparent watercolor-like golden yellow paint (apparently the same paint/ink used on the hair of the nymphs of the Zodiac section), or red, an opaque tempera-like dark red paint (presumably the same used for the lips of nymphs in the Zodiac, flowers in the Herbal section, etc.).
The core of a star that is painted red is usually invisible, while that of a star painted yel is normally visible through the paint.
The two colors may have been applied at different times by different people (the Light Painter and the Dark Painter, respectively), who may have had different levels of knowledge about the VMS. Thus it is possible that the yel color on stars carries some information, for instance about parag breaks, while the red color does not.
Most stars in the SPS have a tail, which is a thin curvy line extending down from the star's outline, usually from one of the rays. In this case the ray is often sharper, longer, and curved. (This indicates that the tail was drawn by the same Scribe who drew the stars.) On some stars the ray is extended to most of the length of the tail, thus creating a fat tail. The fat tails may have some information too. For instance, a fat tail may signify that the star is associated with two parag heads that are too close together to receive individual stars.
The number of rays may also carry information.
My definition of 'perfect parag' for the present is a set of one or more consecutive left-justified lines that satisfy all of the following criteria:
P1. The first of these lines follows a short line (or is the first line in the SPS, or follows a 'title').
P2. The last of these lines is short (or is the last line of the SPS, or precedes a 'title').
P3. All lines other than the last one are long lines.
P4. There are no puffs in any of these lines except possibly in the first of them.
P5. The first of those lines has a starlet.
P6. None of these lines, except the first one, has an assigned starlet.
Note that P4 does not *require* the existence of puffs in the head line of a perfect parag. It only prohibits them in the other lines.
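Criteria P1--P6 can be expressed as a small Python check. This is a minimal sketch, not the procedure actually used for this note: the record type Line, its fields, and the function is_perfect_parag are hypothetical, and it assumes that every line has already been classified as short or long, that its puffs have been counted, and that starlets have already been assigned to lines. A 'long' line is taken to mean simply 'not short'.

```python
from dataclasses import dataclass


@dataclass
class Line:
    is_short: bool          # ends well before the right margin
    puffs: int              # number of puffs in the line
    has_starlet: bool       # a starlet has been assigned to this line
    is_title: bool = False  # 'title' line, not prose


def is_perfect_parag(lines: list[Line], first: int, last: int) -> bool:
    """Check P1-P6 for the candidate parag lines[first..last], inclusive.

    `lines` is the whole SPS in reading order.
    """
    body = lines[first:last + 1]
    if not body or any(ln.is_title for ln in body):
        return False
    prev = lines[first - 1] if first > 0 else None
    nxt = lines[last + 1] if last + 1 < len(lines) else None
    p1 = prev is None or prev.is_short or prev.is_title
    p2 = nxt is None or body[-1].is_short or nxt.is_title
    p3 = all(not ln.is_short for ln in body[:-1])
    p4 = all(ln.puffs == 0 for ln in body[1:])   # puffs allowed only in the head
    p5 = body[0].has_starlet
    p6 = all(not ln.has_starlet for ln in body[1:])
    return p1 and p2 and p3 and p4 and p5 and p6
```

For example, a three-line candidate whose head has a puff and a starlet, whose second line is long, and whose third line is short passes all six tests, provided the line before it is short or is a title.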
Two other conditions that are relevant for parag splitting are:
P7. The head line has a pufftial: namely, its first glyph is either a puff or a {t}.
P8. The linegap above the line is wider than normal, at least over some part of the line.
The first line of surviving SPS pages (except f107v, see below) usually has one or more puffs. On those pages, Rule P4 effectively requires that a perfect parag be entirely contained within one page.
The alignment of starlets and parag heads is rarely precise; the starlets often are displaced up or down from the head's midline and may lie closer to the midlines of adjacent lines, or even beyond them. Therefore, when assigning starlets to text lines and checking P5 and P6, I chose to allow for offsets (vpos) of plus or minus ??? line spacings between the candidate starlet and the head of the candidate parag, provided that in the end no two starlets were assigned to the same line. Thus, in particular, I consider P5 satisfied for a candidate parag even if the star in question lies a bit above the third line from the end of the previous parag, or a bit below the fourth line of the parag in question.
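One way to mechanize such an assignment is sketched below in Python. It is a simple greedy nearest-line matching, which is an assumption of this sketch and not the method used for this note, where the assignments were made by inspection. The function assign_starlets and all its parameters are hypothetical; in particular the tolerance max_offset is left as a parameter, since its value is not fixed above.

```python
def assign_starlets(star_y, line_y, spacing, max_offset):
    """Greedily assign each starlet to a distinct text line.

    star_y, line_y: vertical positions of starlet centers and line
        midlines, in the same unit (e.g. mm from the top of the page).
    spacing: nominal distance between consecutive midlines.
    max_offset: largest allowed |vpos|, in units of line spacings.
    Returns {star index: line index}; starlets with no free line
    within the tolerance are left out.
    """
    # All admissible (offset, star, line) triples, smallest offset first.
    pairs = sorted(
        (abs(sy - ly) / spacing, s, l)
        for s, sy in enumerate(star_y)
        for l, ly in enumerate(line_y)
        if abs(sy - ly) / spacing <= max_offset
    )
    assigned = {}
    used_lines = set()
    for _, s, l in pairs:
        if s not in assigned and l not in used_lines:
            assigned[s] = l          # no two starlets share a line
            used_lines.add(l)
    return assigned


# Purely illustrative positions and tolerance, not measurements from the VMS.
print(assign_starlets([19, 21], [10, 20, 30], spacing=10, max_offset=1.0))
# {0: 1, 1: 2}
```

In the illustrative call both starlets are nearest to the same line, so the second one is pushed to the next free line within the tolerance.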
By the above criteria, one can identify ??? perfect parags in the SPS, covering ??? of the ??? text lines. When those are excluded, the lines that remain ???
???Fix line numbers to match the IVTFF
???Criterion P4 is not necessary; if it fails we should have a 'quasi-perfect parag'
???If parag is perfect and criterion P7 is satisfied, call it 'pluperfect'
While most of the text can be parsed as perfect parags, there are several 'imperfect blocks' of consecutive lines such that, in any block, any candidate parag fails at least one of the criteria P1-P6. Thus, within each of those imperfect blocks we had to choose where to break parags by less objective criteria.
Stars on the right margin do not seem to be reliable parag markers. They are often more than one line off from the head, or missing entirely, ???
for candidates to perfect parags that satisfy P1--P4 ???
Looking at. (I can think of a few scenarios for the creation and final scribing of the SPS that would have led to starlets being omitted or misaligned by mistake.) ???
I suspect that the Scribe may sometimes have started a parag on the same line as the tail of the previous parag, when the latter was a short line. For example, on <f105v.P1.6> the parag break should perhaps be after the first word {dcheo}. Likewise, on <f105v.P1.13>, the parag break should perhaps be after the first word {saiin}.
We put a 'definitive' parag break after every short line, even if the next line cannot get a starlet assigned to it. That line will be the start of an 'unstarred' parag.
We put a 'tentative' parag break before any line that has at least one puff, even if the previous line is not short and no starlet can be assigned to it.
Those two decisions divide each imperfect block into 'tentative parags'. Each tentative parag is a set of consecutive lines such that no line except perhaps the first has any puffs or assigned starlet, and no line except perhaps the last one is short-length.
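The two break rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function tentative_parags and its input format are hypothetical, each line of an imperfect block is reduced to a pair (is_short, has_puff), and breaks induced by assigned starlets are not modeled.

```python
def tentative_parags(lines):
    """Split one imperfect block into tentative parags.

    lines: list of (is_short, has_puff) pairs, one per text line.
    Returns a list of (first, last, kind), where kind tells how the
    break that opens the parag was decided: 'block start',
    'definitive' (previous line is short) or 'tentative' (puff).
    """
    parags = []
    start, kind = 0, "block start"
    for i in range(1, len(lines)):
        prev_short = lines[i - 1][0]
        has_puff = lines[i][1]
        if prev_short or has_puff:
            parags.append((start, i - 1, kind))
            start = i
            # A short previous line takes precedence over a puff.
            kind = "definitive" if prev_short else "tentative"
    if lines:
        parags.append((start, len(lines) - 1, kind))
    return parags


# Hypothetical block of six lines, not taken from the VMS.
block = [(False, True), (False, False), (True, False),
         (False, False), (False, True), (True, False)]
print(tentative_parags(block))
# [(0, 2, 'block start'), (3, 3, 'definitive'), (4, 5, 'tentative')]
```

In the hypothetical block, the break before line 3 is definitive because line 2 is short, while the break before line 4 is only tentative because it rests on a puff alone.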
On page f103r there are two big stains. One is light greenish, triangular, ~30 mm wide and ~20 mm tall, and is located along the middle of the top edge of the page; it affected some glyphs on lines 01 and 02 of the page. The other is brown, round, ~30 mm in diameter, and is located in the upper right corner, ~7 mm from the top edge and ~15 mm from the right edge; it affected part of the text on lines 01--03, 05, 06, 08, and 09. (Lines 04 and 07 were short and thus escaped the stain.) A smaller round brown stain, ~5 mm wide, is located between the big one and the right margin; it affected the last few glyphs of lines 05 and 06. Some of the affected glyphs were completely erased, others became very faint but are still legible. These stains seeped through to the other side (f103v) and affected the text there too, but to a smaller extent. Some of the affected glyphs appear to have been retraced in the dark Retracer ink, which flared out slightly in the areas covered by the stains. The colors suggest that the Dark Painter dropped paint and then washed it off, with water or whatever thinner the paint used; this in turn suggests that (1) the ink is not iron-gall, and (2) the Retracer acted well after the Dark Painter and hence well after the Scribe.
Page f107v is the only one whose first line has no puffs. But the last line of f107r is short, so the last parag of f107r cannot continue on page f107v, not even as an imperfect parag.
On page f108r, there seems to be something wrong with stars 2--5 and 7--9. Stars 1 and 2 are both yellow, as are stars 4, 5, and 6, breaking the pattern of alternating yellow and red stars. Line 2 of the page, <f108r.P1.2>, has a one-leg gallows and thus is probably the start of a paragraph; however there is no nearby star that could be assigned to it. Presumably a star, presumably red, was omitted between stars 1 and 2. On the other hand, the line of star 4 (<f108r.P1.11>) seems to be in the middle of a parag. Presumably that 'extra' star was added in order to get the count of stars right, compensating for the missing star on line <f108r.P1.2>. Thus we will assume that line <f108r.P1.8> is the start of a 6-line parag, which would be perfect if that 'extra' star were not there. Likewise, there seems to be one star missing on line <f108r.P1.25>, which has 2 one-leg gallows; and star 7 is yellow, star 9 is red, and star 8 is unpainted. Presumably the missing star was to be red, and then star 8 should be yellow.
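Breaks in the alternation of colors, like those discussed for f108r, can be listed mechanically. The Python sketch below is only an illustration: the function alternation_breaks is hypothetical, the color codes 'yel', 'red' and 'clear' follow the names used in this note, and the sample sequence is invented rather than read from any page.

```python
def alternation_breaks(colors):
    """Find consecutive stars that break the yel/red alternation.

    colors: list of 'yel', 'red' or 'clear', in top-to-bottom order.
    Returns 1-based pairs (k, k+1) of adjacent stars that are painted
    with the same color. Pairs involving a 'clear' star are skipped,
    since an unpainted star neither confirms nor breaks the pattern.
    """
    breaks = []
    for k in range(len(colors) - 1):
        a, b = colors[k], colors[k + 1]
        if a != "clear" and a == b:
            breaks.append((k + 1, k + 2))
    return breaks


# Invented sequence, for illustration only.
print(alternation_breaks(["yel", "yel", "red", "yel", "clear", "red"]))
# [(1, 2)]
```

Each reported pair marks a place where a star of the other color may have been omitted, or where one of the two stars may be an 'extra'; deciding between these still requires looking at the text lines.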
On f108v, the lines in the lower 2/3 of the page seem more crammed than usual, and some lines overflow the right rail. Stars 10--16 do not have obvious parag heads. Star 10 is next to line <f108v.P1.33>, which starts with an extra-wide {t}, hinting that it is a parag head. The lines from that point on have no one-leg gallows or short lines. A parag break was inferred before <f108v.P1.49> because of a slightly wider interlinear space, and that line was then assigned to star 16. The other breaks were guessed from the positions of stars 10--15. Stars 10--14 are not aligned with the lines but with gaps between the lines, so in each of these cases a choice had to be made between the two nearest lines. (There are faint short lines between the star and the left rail that may be suggestions for the parag breaks. However, those lines may have been added by some later owner and may be just his guesses, rather than informed hints. They were ignored.)
On pages f111r and f111v there is a triangular stain along the middle of the top edge of the panel, similar in shape and size to that of f103r but light brown instead of light green. There are three smaller stains below it, at ~40 mm from the top edge, with diameters ~8, ~4, and ~10 mm. The text included in the stains (on lines 01--03 of both pages, 11--14 of f111r, 10--13 of f111v) became fainter but is still quite readable. Maybe a couple of glyphs on f111r were fully or partly retraced, but this is far from certain.
Last edited on 2026-01-02 16:41:05 by stolfi