Zipf's law plot (word frequency as a function of frequency rank) for various texts.
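For reference, Zipf's law is usually stated as a power law in the rank. Here ''f''(''r'') is the frequency of the word of rank ''r'' (rank 1 being the most frequent word), ''C'' is a normalizing constant, and the exponent α is close to 1 for many natural-language texts. These symbols are notation introduced here, not taken from the data files. Taking logarithms shows why such plots are drawn on log-log axes: the law predicts points lying near a straight line of slope -α.

:<math>f(r) \approx \frac{C}{r^{\alpha}}, \qquad \log f(r) \approx \log C - \alpha \log r</math>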
The languages, texts and the word frequency files are:
[[Chinese language|Chinese (Mandarin)]]. The classical Chinese novel ''[[Dream of the Red Chamber]]'' or ''Dream of the Red Mansion'' (''Hong2 Lou2 Meng4'') by Cao2 Xue3 Qin2 and Gao E (~1750); the digital text used has some errors and omissions. Chinese characters were mapped 1:1 from their GB (Guo Biao) codes to pinyin with numeric tone marks, plus a disambiguating suffix where needed (e.g. 'zuo4', 'zuo4.1', 'zuo4.2'), so that different characters with the same pinyin remain distinct. Each character is treated as a separate word.
* Whole text. Sample: ''ci3 kai1 juan3 di4.2 yi1 hui2 ye3 zuo4.2 zhe3 zi4 yun2 yin1 ceng2 li4.4'' [...] ''dong1 bian1 wu1 nei4.1 guo4 lai2 dai4.1 le5 liu2.1''. File chin/red/tot.1/gud.wfr (original 706889 words, truncated/filtered to 35027 words, ''N'' = 2420 distinct).
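The suffix scheme above can be illustrated with a small sketch. It assumes, hypothetically, that suffixes are handed out in order of first appearance in the text (the description does not say how they were chosen), and that each character's pinyin reading comes from some external table, represented here by the input pairs. The function names `assign_tokens` and `transliterate` are illustrative only; they are not the tools used to build the files, and the suffixes they produce need not match the ones in the actual data.

```python
from collections import defaultdict


def assign_tokens(chars_with_pinyin):
    """Map each distinct character to a unique pinyin token (hypothetical scheme).

    chars_with_pinyin: iterable of (character, pinyin_with_tone_digit) pairs.
    The first character seen with a given pinyin keeps the bare syllable;
    later characters with the same pinyin get ".1", ".2", ... in order of
    first appearance (an ASSUMED ordering). The mapping is 1:1: once a
    character has a token, later readings of it are ignored.
    """
    token_of = {}                    # character -> unique token
    used_per_pinyin = defaultdict(int)  # pinyin -> characters assigned so far
    for char, pinyin in chars_with_pinyin:
        if char in token_of:
            continue
        k = used_per_pinyin[pinyin]
        token_of[char] = pinyin if k == 0 else f"{pinyin}.{k}"
        used_per_pinyin[pinyin] = k + 1
    return token_of


def transliterate(text, token_of):
    """Treat each character as a word; characters without a token are skipped."""
    return [token_of[c] for c in text if c in token_of]
```

For example, feeding the pairs ("坐", "zuo4"), ("做", "zuo4"), ("作", "zuo4") gives the tokens 'zuo4', 'zuo4.1' and 'zuo4.2' respectively, and a 1:1 mapping of this kind is what keeps homophones apart in the word counts.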
[[Chinese language|Chinese (Mandarin)]]. The classical Chinese novel ''[[Dream of the Red Chamber]]'' (''Hong2 Lou2 Meng4'') by Cao2 Xue3 Qin2 and Gao E (~1750); the digital text used has some errors and omissions. Chinese characters were mapped 1:1 from their GB (Guo Biao) codes to an invented numeral notation, similar to Roman numerals but with the structure of Voynichese words; for example, 'ÓÖ' (the two GB bytes of 又 'you4', displayed as Latin-1) ⟶ 'yrkso'. Each Chinese character is treated as a separate word.
* Whole text. Sample: ''dkelsy kerdy adt yckrdo aked kry ykrso kelo adlker ydkro adkersy aske'' [...] ''alckidy kdo dcksy''. File chrc/red/tot.1/gud.wfr (original 706889 words, truncated/filtered to 35027 words, ''N'' = 2420 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation and filtering, are in the companion files */*/org/main.src. The truncated and filtered texts (one word per line, without punctuation) are in */*/*/gud.tlw.
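A rank-frequency table of the kind plotted can be computed from a one-word-per-line file such as gud.tlw. This is a minimal sketch under stated assumptions: that the file is plain ASCII or UTF-8 text, that ties in frequency are broken alphabetically, and that "frequency" means relative frequency (count divided by the total number of words). It does not read the .wfr files, whose exact layout is not described here. The names `parse_tlw` and `rank_frequency` are hypothetical.

```python
from collections import Counter


def parse_tlw(lines):
    """Return the words of a one-word-per-line text (e.g. an open gud.tlw file).

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    return [line.strip() for line in lines if line.strip()]


def rank_frequency(words):
    """Build a Zipf rank-frequency table.

    Returns (total_words, distinct_words, table), where table is a list of
    (rank, word, count, relative_frequency) tuples and rank 1 is the most
    frequent word. Ties are broken alphabetically (an arbitrary choice).
    """
    counts = Counter(words)
    total = sum(counts.values())
    # Sort by decreasing count, then alphabetically for equal counts.
    ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    table = [
        (rank, word, count, count / total)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]
    return total, len(counts), table
```

To use it, open a gud.tlw file with `open(path, encoding="utf-8")` and pass the file object to `parse_tlw`, then the result to `rank_frequency`; the first two return values are the total word count and the number of distinct words ''N''. Plotting the relative frequency against the rank on log-log axes (for instance with matplotlib's `loglog`) gives a curve of the kind shown in this figure.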