Zipf's law plot (word frequency as a function of frequency rank) for various texts.
The languages, texts and the word frequency files are:
[[Latin]]. The first five books (the ''Pentateuch'') from the Latin version (''Vulgate'') of the ''Old Testament'', edited by [[St. Jerome]] around 400 CE. Converted to lowercase.
* All five books. Sample: ''in principio creavit deus caelum et terram terra autem erat inanis et'' [...] ''chananei''. File latn/ptt/tot.1/gud.wfr (original 96870 words, truncated/filtered to 35027 words, ''N'' = 6633 distinct).
[[Greek language|Greek]]. The ''[[Byzantine text-type]]'' or ''Majority Text'' version of the ''[[New Testament]]'' in vulgar Byzantine Greek (''koinē''), from 300 CE or earlier, in an ad hoc encoding of the Greek alphabet into ISO Latin-1.
* Whole text (27 books). Sample: ''biblos geneseōs iėsou qristou uiou dauid uiou abraam abraam egennėsen'' [...] ''maršas tės''. File grek/nwt/tot.1/gud.wfr (original 66183 words, truncated/filtered to 35027 words, ''N'' = 5436 distinct).
[[Russian language|Russian]]. The first five books (the ''Pentateuch'') from the [[Synodal Russian Bible]] (1876). Translated from Old Slavonic, with many archaic words. Romanized, all lowercase.
* All five books. Sample: ''v nachale sotvoril bog nebo i zemlyu zemlya zhe byla bezvidna i pusta i'' [...] ''v den' sobraniya i otdal ikh gospod' mne i''. File russ/ptr/tot.1/gud.wfr (original 111824 words, truncated/filtered to 35027 words, ''N'' = 5520 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
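As a rough illustration of how a rank/frequency table like the one plotted could be derived from a one-word-per-line file such as gud.tlw, here is a minimal Python sketch. It is not the script used to produce the plot or the gud.wfr files. The function names <code>read_words</code> and <code>rank_frequency</code> are hypothetical, and the ISO Latin-1 file encoding is an assumption.

```python
from collections import Counter


def read_words(path, encoding="latin-1"):
    """Read a one-word-per-line text file, skipping blank lines.

    The encoding default is an assumption, not a documented property
    of the data files.
    """
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return a list of (rank, word, count), most frequent word first.

    Ties in count are broken alphabetically so the result is deterministic.
    """
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Opening words of the Latin sample quoted above.
    sample = ("in principio creavit deus caelum et terram "
              "terra autem erat inanis et").split()
    for rank, word, count in rank_frequency(sample):
        print(rank, word, count)
```

Plotting the count against the rank on log-log axes gives a curve of the kind shown here. The number of rows in the table corresponds to ''N'', the number of distinct words.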