Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts and the word frequency files are:

Voynichese, the language of the ''[[Voynich Manuscript]]''. Prose-like parts from the Majority Vote version of the text, excluding 'labels', extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.

* Page f1r, unknown text type. Sample: ''<nowiki>fachys ykal ar ataiin shol shory cthres y kor sholdy sory ckhar or y</nowiki>'' [...] ''<nowiki>chol chok choty chotey dchaiin</nowiki>''. File voyn/prs/unk.1/gud.wfr (202 words, ''N'' = 153 distinct).


* Page f49v, unknown text type. Sample: ''<nowiki>kshor shol cphokchol qokchy qokchod sho chotchy chcthy cthy koddy okeod</nowiki>'' [...] ''<nowiki>ykchokeo r cheey daiin</nowiki>''. File voyn/prs/unk.2/gud.wfr (original 136 words, truncated/filtered to 134 words, ''N'' = 97 distinct).


* Page f66r, unknown text type. Sample: ''<nowiki>pdaiin oteedy opchedy chefchy shddy ypcher cholpchd okedals rair shekey</nowiki>'' [...] ''<nowiki>daiin chty</nowiki>''. File voyn/prs/unk.4/gud.wfr (original 296 words, truncated/filtered to 292 words, ''N'' = 216 distinct).


* Page f85r1, unknown text type. Sample: ''<nowiki>pdsheody shdol shey otchdy dshedy soeeedy dchefoey sair shedy sodair</nowiki>'' [...] ''<nowiki>otol otchedy</nowiki>''. File voyn/prs/unk.5/gud.wfr (309 words, ''N'' = 214 distinct).


* Page f86v6, unknown text type. Sample: ''<nowiki>pchey pchdar cphy aiin ofy chedy otedalol orairody ochody chol chey</nowiki>'' [...] ''<nowiki>otalky chear</nowiki>''. File voyn/prs/unk.6/gud.wfr (original 432 words, truncated/filtered to 431 words, ''N'' = 247 distinct).


* Page f86v5, unknown text type. Sample: ''<nowiki>pshdal sheody lfchy fyshey qoky opy ypar oraiin ytor aiin opy losair</nowiki>'' [...] ''<nowiki>yteody chedy qoteey octhy dy</nowiki>''. File voyn/prs/unk.7/gud.wfr (357 words, ''N'' = 208 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
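
As a rough illustration of how the rank/frequency pairs behind such a plot can be derived from a one-word-per-line text, here is a minimal Python sketch. It is not the script used to produce this figure. The function names <code>read_tlw</code> and <code>rank_frequency</code> are hypothetical, and the only assumption about the 'gud.tlw' files is the layout stated above (one word per line, no punctuation). The internal layout of the 'gud.wfr' files is not described here, so the sketch does not attempt to read them.

```python
from collections import Counter


def read_tlw(path):
    """Read a one-word-per-line file (the layout assumed for gud.tlw).

    Blank lines are skipped; no other filtering is applied.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, frequency) pairs, most frequent word first.

    Rank 1 is the most frequent word. Ties get consecutive ranks
    in arbitrary order, which is enough for a Zipf plot.
    """
    counts = Counter(words)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


# Small demonstration on the opening words of page f1r quoted above.
sample = ("fachys ykal ar ataiin shol shory cthres y kor "
          "sholdy sory ckhar or y").split()
pairs = rank_frequency(sample)
```

In this 14-word sample only ''y'' occurs twice, so the first pair is (1, 2) and the remaining twelve distinct words each have frequency 1. Plotting the pairs on log-log axes gives the usual Zipf view; for a full page one would pass the result of <code>read_tlw</code> to <code>rank_frequency</code> instead of the inline sample.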