Zipf's law plot (frequency as a function of frequency rank) for various texts. The languages, texts and word frequency files are:
* [[Ge'ez]] (Classical Ethiopian). Text of the ''[[Glory of the Kings]]'' (''Kebra Nagast''), a 14th-century chronicle of Ethiopian kings, part of the Coptic Bible. Published by Michal Jerabek. In the SERA encoding, with numerals excluded.
** Whole text. Sample: ''be'akWetEtu le'Igzi'AbHEr 'ab 'a`hazE kWulu webeweldu 'iyesus krstos'' [...] ''Syon baHr seged Hzbe 'ar`ad qdme seged Zan seged wdm 'ar`ad `amde Syon''. File geez/gok/tot.1/gud.wfr (34291 words, ''N'' = 12272 distinct).
* [[Hebrew language|Hebrew]]. The first five books (''[[Torah]]'', ''Pentateuch'') of the Hebrew Bible (''Tanak''). From the 10th-century version (the [[Masoretic text]]) of the original, which was probably composed mainly around 500 BCE from earlier texts. From the ''Sacred Texts'' site, maintained by John B. Hare. In an ad-hoc single-byte encoding designed to look vaguely phonetic under an ISO Latin-1 font. '''With''' vowel points but '''without''' cantillation marks.
** Whole text. Sample: ''b¤°rë¡s¹ïy± b¤ârâ¡ ¡°êlöhïym ¡ë± häs¤¹âmäyïm w°¡ë± hâ¡ârêþ w°hâ¡ârêþ'' [...] ''k¤âlhäy¤âmïym''. File hebr/tav/tot.1/gud.wfr (original 66311 words, truncated/filtered to 35027 words, ''N'' = 12487 distinct).
* [[Arabic language|Arabic]]. The ''[[Quran]]'' (~650 CE). Based on the ''Unicode Quran'' document from the ''Sacred Texts'' site, maintained by John B. Hare, with several corrections. Arabic Unicode characters were mapped into [[ISO 8859-1|ISO Latin-1]] characters in a vaguely phonetic way. '''With''' vowel marks, hamza, madda but '''without''' sukuns.
** Whole text. Sample: ''bîsmî alllâhî alrrâµmânî alrrâµîymî alµâmdû lîllâhî râbbî al¿âlâmîynâ'' [...] ''tâttâ©î£ûwnâ mînhû sâkâräa wârîzqäa µâsânäa a¡înnâ fîy''. File arab/quv/tot.1/gud.wfr (original 77411 words, truncated/filtered to 35027 words, ''N'' = 10762 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
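
The rank/frequency pairs behind such a plot can be recomputed from a one-word-per-line file. The following is only a minimal sketch, not the procedure actually used to produce the 'gud.wfr' files: the function names <code>read_tlw</code> and <code>rank_frequency</code> are hypothetical, the ISO 8859-1 file encoding is an assumption based on the encodings described above, and the ordering of words with equal counts (alphabetical) is an arbitrary choice.

```python
from collections import Counter


def read_tlw(path, encoding="iso-8859-1"):
    """Read a one-word-per-line file and return its non-empty words.

    The encoding default is an assumption, not a documented property
    of the files.
    """
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, count, word) tuples, most frequent word first.

    Ties in count are broken alphabetically, so equal-frequency words
    still receive distinct consecutive ranks.
    """
    counts = Counter(w for w in words if w)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, count, word)
            for rank, (word, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Tiny illustrative input using words from the Ge'ez sample above.
    sample = ["seged", "Syon", "seged", "'ar`ad", "seged", "'ar`ad"]
    for rank, count, word in rank_frequency(sample):
        print(rank, count, word)
```

Plotting count against rank on logarithmic axes for each text gives a curve of the kind shown here; the number of tuples returned corresponds to the number of distinct words ''N''.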