Zipf's law plot (word frequency as a function of frequency rank) for various texts.
The languages, texts, and word frequency files are:
[[Latin]]. The first four books (the ''Gospels'') from the Latin version (''Vulgate'') of the ''New Testament'', edited by [[St. Jerome]] around 400 CE. Converted to lowercase.
* Book 1 - ''Gospel of Matthew''. Sample: ''liber generationis iesu christi filii david filii abraham abraham genuit'' [...] ''ecce ego vobiscum sum omnibus diebus usque ad consummationem saeculi''. File latn/nwt/mat.1/gud.wfr (16431 words, ''N'' = 3911 distinct).
* Book 2 - ''Gospel of Mark''. Sample: ''initium evangelii iesu christi filii dei sicut scriptum est in esaia'' [...] ''confirmante sequentibus signis''. File latn/nwt/mrk.1/gud.wfr (10280 words, ''N'' = 2913 distinct).
* Book 3 - ''Gospel of Luke''. Sample: ''quoniam quidem multi conati sunt ordinare narrationem quae in nobis'' [...] ''erant semper in templo laudantes et benedicentes deum amen''. File latn/nwt/luk.1/gud.wfr (18004 words, ''N'' = 4406 distinct).
* Book 4 - ''Gospel of John''. Sample: ''in principio erat verbum et verbum erat apud deum et deus erat verbum'' [...] ''eos qui scribendi sunt libros amen''. File latn/nwt/joh.1/gud.wfr (14026 words, ''N'' = 2523 distinct).
The word frequency files */*/*/gud.wfr are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts are in the companion files */*/org/main.src. The extracted texts (one word per line, without punctuation) are in */*/*/gud.tlw.
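The following minimal Python sketch shows one way to turn a one-word-per-line text like gud.tlw into the rank-frequency data shown in a Zipf plot. It is not the procedure that produced this plot or the .wfr files. It assumes the .tlw file is UTF-8 text with exactly one word per line. The helper names (read_tlw, rank_frequency, zipf_slope) are hypothetical, and breaking frequency ties alphabetically is an arbitrary choice. The least-squares slope of log(frequency) against log(rank) is a common rough check: under Zipf's law it should be close to -1.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def read_tlw(path: str) -> List[str]:
    """Read a one-word-per-line file (assumed layout of gud.tlw), skipping blank lines."""
    with open(path, encoding="utf-8") as f:  # UTF-8 encoding is an assumption
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first (rank 1).

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def zipf_slope(table: List[Tuple[int, str, int]]) -> float:
    """Least-squares slope of log(count) versus log(rank); near -1 under Zipf's law."""
    if len(table) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic word list with counts exactly proportional to 1/rank (12, 6, 4, 3).
    # For a real text one might call read_tlw() on a gud.tlw file (path hypothetical).
    words = ["et"] * 12 + ["in"] * 6 + ["est"] * 4 + ["deus"] * 3
    table = rank_frequency(words)
    print(table)  # [(1, 'et', 12), (2, 'in', 6), (3, 'est', 4), (4, 'deus', 3)]
    print("distinct words N =", len(table), "; total words =", sum(c for _, _, c in table))
    print("log-log slope:", zipf_slope(table))  # close to -1 for this synthetic list
```

In the per-book figures above, ''N'' corresponds to len(table) and the total word count to the sum of the counts. Plotting count against rank on log-log axes gives a curve of the kind shown in this file.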