Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts, and word frequency files are:

[[Spanish language|Spanish]]. Text of [[Miguel de Cervantes]]'s novel ''[[Don Quixote]]''. In the original spelling of the early 1600s, including the variable use of 'v', 'u', and 'b' for the same sound. Mapped to lowercase, excluding foreign-language insertions and poems.

* Part I (1605). Sample: ''<nowiki>en vn lugar de la mancha de cuyo nombre no quiero acordarme no ha mucho</nowiki>'' [...] ''<nowiki>pariente suyo fuera de que</nowiki>''. File span/qvi/one.1/gud.wfr (original 177061 words, truncated/filtered to 35027 words, ''N'' = 5452 distinct).

[[Portuguese language|Portuguese]]. Text of the novel ''[[Dom Casmurro]]'' by [[Machado de Assis]] (1899). The spelling was updated to Brazilian usage as of ~2000, including the umlaut (trema) on 'u' after 'q', the accent in 'éia' endings, the differential accents 'tem'/'têm', etc. Mapped to lowercase, with numerals excluded.

* Whole text. Sample: ''<nowiki>uma noite destas vindo da cidade para o engenho novo encontrei no trem</nowiki>'' [...] ''<nowiki>josé dias gostaram do moço o agregado disse~lhe que vira uma vez</nowiki>''. File port/csm/tot.1/gud.wfr (original 64602 words, truncated/filtered to 35027 words, ''N'' = 6267 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
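
The rank-frequency data behind a plot of this kind can be recomputed from a one-word-per-line file such as 'gud.tlw'. The following is only a minimal sketch, not the procedure actually used to produce the plot. The function names <code>read_tlw</code> and <code>rank_frequency</code> are hypothetical, and the UTF-8 file encoding is an assumption that may need adjusting for the actual files.

```python
from collections import Counter


def read_tlw(path, encoding="utf-8"):
    """Read a one-word-per-line text file and return its words as a list.

    Assumption: the encoding is a guess and blank lines carry no data.
    """
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = Counter(words)
    # Sort the per-word counts in decreasing order; the position in this
    # sorted list is the frequency rank used on the horizontal axis.
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


# Tiny illustration using the opening words of the Spanish sample above.
sample = ("en vn lugar de la mancha de cuyo nombre "
          "no quiero acordarme no ha mucho").split()
pairs = rank_frequency(sample)
```

Plotting the resulting pairs with both axes on a logarithmic scale gives a curve of the kind described here. For a real text one would call <code>rank_frequency(read_tlw(path))</code> with the path of a downloaded file.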