Zipf's law plot (word frequency as a function of frequency rank) for various texts.
The languages, texts, and word frequency files are:
Voynichese, the language of the ''[[Voynich Manuscript]]''. Prose-like parts from the Majority Vote version of the text, excluding 'labels', extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.
* 'Cosmological' section, part 2 (pages f67r1-f70r2). Sample: ''teeodaiin shey epairody osaiin yteeoey shey epaiin o aiin daiir okeody'' [...] ''chcthey s or ary''. File voyn/prs/cos.2/gud.wfr (original 1364 words, truncated/filtered to 1353 words, ''N'' = 733 distinct).
* 'Cosmological' section, part 3 (pages f85r2, f85v2, f86v3, f86v4). Sample: ''otedy ar chcthy otar chepaiin otodaiin otaiin otchedy olkaiin odar'' [...] ''dar shol or alor''. File voyn/prs/cos.3/gud.wfr (original 717 words, truncated/filtered to 713 words, ''N'' = 380 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts (one word per line, without punctuation) are in */*/*/gud.tlw.
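
A rank/frequency table of the kind plotted here can be rebuilt from a one-word-per-line text such as the gud.tlw files. The following is a minimal Python sketch, not the script used to produce the gud.wfr files or the plot. The names <code>read_words</code> and <code>rank_frequency</code> are hypothetical, and the sketch assumes the input holds exactly one word per line with no comment or header lines. Ties in frequency are broken alphabetically, which is an arbitrary choice made for this illustration.

```python
from collections import Counter


def read_words(path):
    """Read a one-word-per-line file (assumed layout of gud.tlw), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent word first.

    Words with equal counts are ordered alphabetically (an arbitrary
    tie-breaking rule chosen for this sketch).
    """
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Demonstration on the first words of the 'Cosmological' part 2 sample quoted above.
sample = ("teeodaiin shey epairody osaiin yteeoey shey "
          "epaiin o aiin daiir okeody").split()
table = rank_frequency(sample)
for rank, word, count in table[:3]:
    print(rank, word, count)
```

Plotting the count against the rank on doubly logarithmic axes gives a Zipf plot; the number of rows in the table corresponds to the number of distinct words ''N'' quoted for each text.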