Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts and the word frequency files are:

[[Latin]]. The first four books (the ''Gospels'') from the Latin version (''Vulgate'') of the ''New Testament'', edited by [[St. Jerome]] around 400 CE. The text was converted to lowercase.

* Book 1 - ''Gospel of Matthew''. Sample: ''<nowiki>liber generationis iesu christi filii david filii abraham abraham genuit</nowiki>'' [...] ''<nowiki>ecce ego vobiscum sum omnibus diebus usque ad consummationem saeculi</nowiki>''. File latn/nwt/mat.1/gud.wfr (16431 words, ''N'' = 3911 distinct).
* Book 2 - ''Gospel of Mark''. Sample: ''<nowiki>initium evangelii iesu christi filii dei sicut scriptum est in esaia</nowiki>'' [...] ''<nowiki>confirmante sequentibus signis</nowiki>''. File latn/nwt/mrk.1/gud.wfr (10280 words, ''N'' = 2913 distinct).
* Book 3 - ''Gospel of Luke''. Sample: ''<nowiki>quoniam quidem multi conati sunt ordinare narrationem quae in nobis</nowiki>'' [...] ''<nowiki>erant semper in templo laudantes et benedicentes deum amen</nowiki>''. File latn/nwt/luk.1/gud.wfr (18004 words, ''N'' = 4406 distinct).
* Book 4 - ''Gospel of John''. Sample: ''<nowiki>in principio erat verbum et verbum erat apud deum et deus erat verbum</nowiki>'' [...] ''<nowiki>eos qui scribendi sunt libros amen</nowiki>''. File latn/nwt/joh.1/gud.wfr (14026 words, ''N'' = 2523 distinct).

The word frequency files <code>*/*/*/gud.wfr</code> are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts are in the companion files <code>*/*/org/main.src</code>. The extracted texts (one word per line, without punctuation) are in <code>*/*/*/gud.tlw</code>.
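
As a rough illustration of how a rank-frequency table like the one plotted here could be derived from a one-word-per-line text, the following Python sketch counts words and ranks them by decreasing frequency. It is not the script used to produce the files above. The function names <code>rank_frequency</code> and <code>load_tlw</code> are hypothetical, the UTF-8 encoding is an assumption, and the internal layout of the <code>gud.wfr</code> files is not described here, so the sketch does not try to reproduce it.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent word first.

    `words` is any iterable of word strings; blank entries are skipped.
    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter(w.strip().lower() for w in words if w.strip())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def load_tlw(path):
    """Read a one-word-per-line file (hypothetical helper; UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Small demonstration on the opening sample of the Gospel of John quoted above.
sample = ("in principio erat verbum et verbum erat apud deum "
          "et deus erat verbum").split()
table = rank_frequency(sample)
for rank, word, count in table[:3]:
    print(rank, word, count)
```

For a full text, <code>rank_frequency(load_tlw(path))</code> would be called with the path of a local copy of one of the <code>gud.tlw</code> files. Plotting the count against the rank on logarithmic axes then gives a Zipf plot of the kind shown here.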