Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts, and word frequency files are:

[[Spanish language|Spanish]]. Text of [[Miguel de Cervantes]]'s novel ''[[Don Quixote]]'', in the original spelling of the early 1600s, including the variable use of 'v', 'u', and 'b' for the same sound. Mapped to lowercase; foreign-language insertions and poems are excluded.

* Part I (1605). Sample: ''<nowiki>en vn lugar de la mancha de cuyo nombre no quiero acordarme no ha mucho</nowiki>'' [...] ''<nowiki>pariente suyo fuera de que</nowiki>''. File span/qvi/one.1/gud.wfr (original 177061 words, truncated/filtered to 35027 words, ''N'' = 5452 distinct).


* Part II (1615). Sample: ''<nowiki>cuenta zide hamete benengeli en la segunda parte desta historia y</nowiki>'' [...] ''<nowiki>bachiller sanson carrasco nuestro compatrioto en esto boluio</nowiki>''. File span/qvi/two.1/gud.wfr (original 187776 words, truncated/filtered to 35027 words, ''N'' = 5698 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation and filtering, are in the companion files */*/org/main.src. The truncated/filtered texts, one word per line and without punctuation, are in */*/*/gud.tlw.
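The rank/frequency data behind a plot like this can be recomputed from a one-word-per-line file such as gud.tlw. The following is a minimal sketch, not the procedure actually used to make the figure. It assumes each non-blank line holds exactly one word and that the file is UTF-8 encoded; the function names rank_frequency and read_tlw are hypothetical. The format of the gud.wfr files is not described here, so the sketch does not try to read them.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_frequency(words: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (rank, count, word) tuples, most frequent word first (rank 1)."""
    counts = Counter(w.strip() for w in words if w.strip())
    # Sort by descending count; ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, count, word) for rank, (word, count) in enumerate(ordered, start=1)]


def read_tlw(path: str) -> List[str]:
    # Hypothetical reader: assumes one word per line (as described for gud.tlw)
    # and UTF-8 encoding; adjust the encoding if the real files differ.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Uses the Part I sample quoted above instead of a real file.
    sample = "en vn lugar de la mancha de cuyo nombre no quiero acordarme no ha mucho".split()
    table = rank_frequency(sample)
    print("tokens:", len(sample), "distinct N:", len(table))
    for rank, count, word in table[:3]:
        print(rank, count, word)
```

On the 15-word sample this reports 13 distinct words, with "de" and "no" (two occurrences each) at ranks 1 and 2. Plotting count against rank on logarithmic axes gives the kind of curve shown in the figure. Dividing each count by the total number of tokens would give relative frequencies instead, if that is preferred.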