Zipf's law plot (word frequency as a function of frequency rank) for various texts.
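
Zipf's law says that the frequency of the word with rank ''r'' falls off roughly as ''C'' / ''r''<sup>''s''</sup>, with ''s'' close to 1. On a log-log plot this is a straight line with slope −''s''. The sketch below shows one way to turn a list of words into the (rank, frequency) points of such a plot and to estimate ''s'' with a least-squares line. It assumes that "frequency" means a word's count divided by the total number of words and that ties are ranked alphabetically. The function names are hypothetical and are not taken from the scripts that produced this plot.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return (rank, relative_frequency, word) triples, most frequent first.

    Assumption: "frequency" is a word's count divided by the total number
    of tokens; ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(words)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, count / total, word)
            for rank, (word, count) in enumerate(ordered, start=1)]


def zipf_exponent(table):
    """Estimate s in f(r) ~ C / r**s from a rank_frequency() table.

    Fits a least-squares line to log(frequency) against log(rank) and returns
    the negated slope. Needs at least two distinct ranks.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, freq, _ in table]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Toy input: the two Rugg sample fragments quoted in the list, joined together.
tokens = ("olkshedy otedy qocheol ochecthdy aiin qochekdy rchey qol ol okdy "
          "okeey yky olchedy ky cheol kd oshey ol").split()
table = rank_frequency(tokens)
print(table[0][0], table[0][2])  # → 1 ol
```

On such a tiny sample almost every word occurs once, so the fitted exponent means little. The estimate only becomes informative on full word lists like the ones described below.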

The languages, texts and the word frequency files are:

Synthetic language imitating Voynichese, the language of the ''[[Voynich Manuscript]]''. Text generated manually by Gordon Rugg using his proposed 'table-and-grille' method.

* Whole text. Sample: ''<nowiki>olkshedy otedy qocheol ochecthdy aiin qochekdy rchey qol ol okdy</nowiki>'' [...] ''<nowiki>okeey yky olchedy ky cheol kd oshey ol</nowiki>''. File voyp/grm/tot.1/gud.wfr (708 words, ''N'' = 307 distinct).

Voynichese, the language of the ''[[Voynich Manuscript]]''. Prose-like parts of the Majority Vote version of the text, excluding 'labels', extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.

* 'Biology' section. Sample: ''<nowiki>kary okeey qokar shy kchedy qotar shedy dain shey ly ssheol qolchedy</nowiki>'' [...] ''<nowiki>daiin olkedy ykaiin sor otes dol kedy otol chedy</nowiki>''. File voyn/prs/bio.1/gud.wfr (original 6559 words, truncated/filtered to 6555 words, ''N'' = 1325 distinct).

* Whole prose text. Sample: ''<nowiki>fachys ykal ar ataiin shol shory cthres y kor sholdy sory ckhar or y</nowiki>'' [...] ''<nowiki>sodal chal chcthy chckhy qol ary</nowiki>''. File voyn/prs/tot.1/gud.wfr (original 35128 words, truncated/filtered to 35027 words, ''N'' = 6525 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
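
Because the '*/*/*/gud.tlw' files hold one word per line, the word count and the number ''N'' of distinct words given for each text can be recomputed directly from them. The sketch below does this. It assumes plain-text files in which every non-blank line is exactly one word; the actual files have not been checked for comment or header lines, and the script name in the usage comment is hypothetical. If that assumption holds, the output should agree with the truncated/filtered counts listed for each text.

```python
import sys


def load_tlw(path):
    """Read a truncated/filtered text file with one word per line.

    Assumption: every non-blank line is a single word; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def summarize(words):
    """Return (total number of words, number N of distinct words)."""
    return len(words), len(set(words))


if __name__ == "__main__":
    # Hypothetical usage: python tlw_stats.py voyn/prs/bio.1/gud.tlw
    for path in sys.argv[1:]:
        total, distinct = summarize(load_tlw(path))
        print(f"{path}: {total} words, N = {distinct} distinct")
```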