Zipf's law plot (word frequency as a function of frequency rank) for various texts.
The languages, texts and the word frequency files are:
[[Spanish language|Spanish]]. Text of [[Miguel de Cervantes]]'s novel ''[[Don Quixote]]'', in the original spelling of the early 1600s, including the variable use of 'v', 'u', and 'b' for the same sound. Mapped to lowercase, excluding foreign-language insertions and poems.
* Part I (1605). Sample: ''en vn lugar de la mancha de cuyo nombre no quiero acordarme no ha mucho'' [...] ''pariente suyo fuera de que''. File span/qvi/one.1/gud.wfr (original 177061 words, truncated/filtered to 35027 words, ''N'' = 5452 distinct).
* Part II (1615). Sample: ''cuenta zide hamete benengeli en la segunda parte desta historia y'' [...] ''bachiller sanson carrasco nuestro compatrioto en esto boluio''. File span/qvi/two.1/gud.wfr (original 187776 words, truncated/filtered to 35027 words, ''N'' = 5698 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
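
As a rough illustration of how the rank/frequency pairs behind such a plot could be derived from a one-word-per-line file like the 'gud.tlw' files, here is a minimal Python sketch. It is not the script used to produce the figure. The function names <code>rank_frequency</code> and <code>read_tlw</code> are hypothetical, and the UTF-8 encoding and the alphabetical tie-breaking among equally frequent words are assumptions; the layout of the 'gud.wfr' files is not described here, so the sketch does not read them.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, count, word) tuples, most frequent word first.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    counts = Counter(w for w in words if w)  # skip empty strings
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, count, word)
            for rank, (word, count) in enumerate(ordered, start=1)]


def read_tlw(path, encoding="utf-8"):
    """Read a one-word-per-line text file (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f]


if __name__ == "__main__":
    # Opening words of the Part I sample quoted above.
    sample = ("en vn lugar de la mancha de cuyo nombre no quiero "
              "acordarme no ha mucho").split()
    for rank, count, word in rank_frequency(sample):
        print(rank, count, word)
```

Plotting the count against the rank on log-log axes then gives a curve of the kind shown; the number of tuples returned corresponds to the number of distinct words ''N''.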