Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts and word frequency files are:

Voynichese, the language of the ''[[Voynich Manuscript]]''. Prose-like parts from the Majority Vote version of the text, excluding 'labels', extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.

* 'Herbal' section, language A, part 2 (pages f87r,f87v,f90r1-f90v2,f93r,f93v,f96r,f96v). Sample: ''<nowiki>poal shsal shocphor ypcho cpheo saiin oteodal saiin dchee ckhos chety</nowiki>'' [...] ''<nowiki>checkhey sosar cheekeo soy sar cheor</nowiki>''. File voyn/prs/hea.2/gud.wfr (original 826 words, truncated/filtered to 823 words, ''N'' = 509 distinct).


* 'Herbal' section, language B, part 2 (pages f94r-f95v2). Sample: ''<nowiki>chedaiin dsheedy qopchedal keo daiin otal aiin oar dor cheody okaiin</nowiki>'' [...] ''<nowiki>chcthy</nowiki>''. File voyn/prs/heb.2/gud.wfr (510 words, ''N'' = 288 distinct).


* 'Zodiac' section (pages f70v1,f70v2,f71r-f73v). Sample: ''<nowiki>okcheo dar otey ykeey tchy otsheo oteotey shey sheckh opcheol dair</nowiki>'' [...] ''<nowiki>chodaiin chey ar daly alar oto lam chory ytaly</nowiki>''. File voyn/prs/zod.1/gud.wfr (original 702 words, truncated/filtered to 701 words, ''N'' = 379 distinct).


* 'Stars' section, part 1 (folio f58r). Sample: ''<nowiki>kor cholfy shopchy otoralchy chofchol sholy otaly dal m dshodal or ckhy</nowiki>'' [...] ''<nowiki>dy o shor qokain okam shear sarols</nowiki>''. File voyn/prs/str.1/gud.wfr (original 673 words, truncated/filtered to 670 words, ''N'' = 402 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in the files '*/*/*/gud.tlw'.
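
As a rough illustration of how rank/frequency data of the kind plotted here could be derived from a truncated/filtered text, the following Python sketch counts words in the one-word-per-line layout described for the gud.tlw files. The function names <code>read_tlw</code> and <code>rank_frequency</code> are hypothetical and are not part of the original tool chain; the layout of the gud.wfr files is not described here, so the sketch does not try to read or reproduce it. The demonstration input is the opening of the 'Herbal' language A sample quoted above.

```python
from collections import Counter


def read_tlw(path):
    """Read a one-word-per-line text file (assumed gud.tlw layout), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, count, word) tuples, most frequent word first.

    Ties in count are broken alphabetically so the result is deterministic.
    """
    counts = Counter(w for w in words if w)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, count, word)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Demonstration on the first words of the 'Herbal' language A sample.
sample = ("poal shsal shocphor ypcho cpheo saiin oteodal "
          "saiin dchee ckhos chety").split()
table = rank_frequency(sample)
for rank, count, word in table[:3]:
    print(rank, count, word)
```

Plotting the count against the rank on logarithmic axes would then give a Zipf-style plot; the number of rows in the table corresponds to the number of distinct words ''N'' quoted for each file.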