Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts and the word frequency files are:

[[Chinese language|Chinese (Mandarin)]]. The classical Chinese novel ''[[Dream of the Red Chamber]]'' or ''Dream of the Red Mansion'' (''Hong2 Lou2 Meng4'') by Cao2 Xue3 Qin2 and Gao E (~1750); with some errors and omissions.  Chinese characters were mapped 1:1 from GB (Guo Biao) to pinyin with tone marks and disambiguating suffixes, e.g. 'zuo4', 'zuo4.1', 'zuo4.2', so as to distinguish characters that share the same pinyin.  Each character is treated as a separate word.

* Whole text. Sample: ''<nowiki>ci3 kai1 juan3 di4.2 yi1 hui2 ye3 zuo4.2 zhe3 zi4 yun2 yin1 ceng2 li4.4</nowiki>'' [...] ''<nowiki>dong1 bian1 wu1 nei4.1 guo4 lai2 dai4.1 le5 liu2.1</nowiki>''. File chin/red/tot.1/gud.wfr (original 706889 words, truncated/filtered to 35027 words, ''N'' = 2420 distinct).
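
The suffix scheme described above can be illustrated with a minimal Python sketch. It assumes that the first character seen with a given pinyin keeps the bare syllable and later ones receive '.1', '.2', and so on; the actual assignment order used for the files is not stated here, and the function name <code>disambiguate</code> and the example characters are hypothetical.

```python
from collections import defaultdict

def disambiguate(char_to_pinyin):
    """Map each character to a unique token.

    ASSUMPTION: the first character seen with a given pinyin keeps the
    bare syllable; later ones get '.1', '.2', ... in insertion order.
    The order actually used for the data files is not documented here.
    """
    seen = defaultdict(int)   # pinyin -> number of characters seen so far
    tokens = {}
    for char, pinyin in char_to_pinyin.items():
        n = seen[pinyin]
        tokens[char] = pinyin if n == 0 else f"{pinyin}.{n}"
        seen[pinyin] += 1
    return tokens

# Illustrative input only: three characters read 'zuo4', one read 'you4'.
example = {"做": "zuo4", "坐": "zuo4", "作": "zuo4", "又": "you4"}
tokens = disambiguate(example)
```

Because every character ends up with its own token, the mapping stays 1:1 and the word counts are the same as they would be for the original characters.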

[[Chinese language|Chinese (Mandarin)]]. The classical Chinese novel ''[[Dream of the Red Chamber]]'' (''Hong2 Lou2 Meng4'') by Cao2 Xue3 Qin2 and Gao E, ~1750; with some errors and omissions.  Chinese characters were mapped 1:1 from GB (Guo Biao) to an invented numeral scheme, similar to Roman numerals but with the structure of Voynichese words, e.g. 'ÓÖ' (= 'you4') ⟶ 'yrkso'.  Each Chinese character is treated as a separate word.

* Whole text. Sample: ''<nowiki>dkelsy kerdy adt yckrdo aked kry ykrso kelo adlker ydkro adkersy aske</nowiki>'' [...] ''<nowiki>alckidy kdo dcksy</nowiki>''. File chrc/red/tot.1/gud.wfr (original 706889 words, truncated/filtered to 35027 words, ''N'' = 2420 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website].  The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src.  The truncated/filtered texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
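
As a rough sketch of how a rank/frequency table like the one plotted could be rebuilt from a one-word-per-line file such as gud.tlw, the Python below counts the words and ranks them by decreasing count. The helper names <code>read_words</code> and <code>rank_frequency</code>, the UTF-8 default encoding, and the alphabetical tie-breaking rule are assumptions for illustration; they are not taken from the scripts that produced the files, and the internal layout of the gud.wfr files is not assumed here.

```python
from collections import Counter

def read_words(path, encoding="utf-8"):
    """Read a one-word-per-line file, skipping blank lines.
    ASSUMPTION: the encoding of the real files may differ."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]

def rank_frequency(words):
    """Return (rank, word, count) triples, most frequent word first.
    Ties are broken alphabetically (an arbitrary choice)."""
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]

# Tiny in-memory example instead of a real file.
sample = "le5 de5 le5 yi1 le5 de5 bu4".split()
table = rank_frequency(sample)
```

Plotting the count against the rank on logarithmic axes gives the kind of curve shown in the figure; the number of rows in the table corresponds to ''N'', the number of distinct words.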