Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts and the word frequency files are:

[[Vietnamese language|Vietnamese]]. The first five books (the ''Pentateuch'') from the [[Cadman Vietnamese Bible]] (1934), probably translated from the English [[King James Bible]]. The text is in the ASCII VIQR encoding, mapped to lowercase, without hyphens.

* All five books. Sample: ''<nowiki>ban dda^`u ddu+'c chu'a tro+`i du+.ng ne^n tro+`i dda^'t va? dda^'t la`</nowiki>'' [...] ''<nowiki>da.y la.i cho dde^? ca'c ngu+o+i la`m theo no' trong xu+' ma` ca'c</nowiki>''. File viet/ptt/tot.1/gud.wfr (original 169480 words, truncated/filtered to 35027 words, ''N'' = 1631 distinct).

Synthetic text imitating [[Vietnamese language|Vietnamese]]. The text was generated by a [[Markov chain]] of order 3 trained on the Cadman Vietnamese Pentateuch.

* Whole generated text. Sample: ''<nowiki>ddo' no+i cha(`ng ra(`ng mi`nh xo^'p da^~ng ddi dde^` ca'ch tro+`i</nowiki>'' [...] ''<nowiki>dda(.c</nowiki>''. File viep/mky/tot.1/gud.wfr (original 39293 words, truncated/filtered to 35027 words, ''N'' = 3341 distinct).
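
As a rough illustration of how an order-3 Markov chain can produce such text, the sketch below trains on a string and emits new text one symbol at a time. It assumes the chain operates on characters, which the description above does not state; the function names <code>train_markov</code> and <code>generate</code> are hypothetical and are not taken from the original tooling.

```python
import random
from collections import defaultdict


def train_markov(text, order=3):
    """Map each `order`-character context to the characters that follow it.

    Assumption: the chain is character-level; the source does not say so.
    """
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length, rng=None):
    """Extend `seed` until `length` characters, sampling each next character
    in proportion to how often it followed the current context in training."""
    rng = rng or random.Random(0)
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            # Context never seen in training (or only at the very end): stop.
            break
        out += rng.choice(followers)
    return out
```

Trained on the full VIQR text and seeded with its first three characters, a generator of this kind yields strings whose every four-character window occurs somewhere in the training text, which is why the output resembles Vietnamese while containing words that are not in the original.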

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
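
The rank and frequency pairs shown in a plot of this kind can be recomputed from a one-word-per-line file such as the 'gud.tlw' files described above. The following is a minimal sketch, not the procedure actually used to make the plot; the helper names <code>read_words</code> and <code>rank_frequency</code> are hypothetical, and the layout of the '.wfr' files is not assumed.

```python
from collections import Counter


def read_words(path):
    """Read a one-word-per-line file, skipping blank lines.

    Assumption: the file is plain ASCII, as the VIQR encoding suggests.
    """
    with open(path, encoding="ascii") as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = Counter(words)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))
```

Calling <code>rank_frequency(read_words(path))</code> on a locally saved copy of one of these files gives one pair per distinct word, so the length of the result corresponds to ''N''. Plotting frequency against rank on logarithmic axes gives the usual Zipf diagram.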