Zipf's law plot (word frequency as a function of frequency rank) for various texts.
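The rank-frequency relation shown in the plot can be computed from any list of words. Below is a minimal Python sketch, assuming the text has already been split into words: count each word, sort by descending count, and give rank 1 to the most frequent word. The function name `rank_frequency`, the toy input, and the tie-breaking rule (alphabetical among equal counts) are illustrative choices, not taken from the original frequency tables.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) triples; rank 1 is the most frequent word."""
    counts = Counter(words)
    # Sort by descending count; ties are broken alphabetically so the order is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


# Toy input built from words that appear in the samples below (not real data).
sample = "saiin daiin saiin chey daiin saiin ar".split()
for rank, word, count in rank_frequency(sample):
    print(rank, word, count)
# 1 saiin 3
# 2 daiin 2
# 3 ar 1
# 4 chey 1
```

Plotting count against rank, usually on logarithmic axes, gives a Zipf plot of the kind shown here.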
The frequency tables are available on the [ UNICAMP website]. The languages, texts, and frequency files are:
Voynichese, the language of the ''[[Voynich Manuscript]]''. Prose-like parts from the Majority Vote version of the text, excluding 'labels', extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.
* 'Herbal' section, language A, part 2 (pages f87r, f87v, f90r1-f90v2, f93r, f93v, f96r, f96v). Sample: poal shsal shocphor ypcho cpheo saiin oteodal saiin dchee ckhos chety [...] checkhey sosar cheekeo soy sar cheor. File voyn/prs/hea.2/gud.wfr (original 826 words, truncated/filtered to 823 words, ''N'' = 509 distinct).
* 'Herbal' section, language B, part 2 (pages f94r-f95v2). Sample: chedaiin dsheedy qopchedal keo daiin otal aiin oar dor cheody okaiin [...] chcthy. File voyn/prs/heb.2/gud.wfr (510 words, ''N'' = 288 distinct).
* 'Zodiac' section (pages f70v1, f70v2, f71r-f73v). Sample: okcheo dar otey ykeey tchy otsheo oteotey shey sheckh opcheol dair [...] chodaiin chey ar daly alar oto lam chory ytaly. File voyn/prs/zod.1/gud.wfr (original 702 words, truncated/filtered to 701 words, ''N'' = 379 distinct).
* 'Stars' section, part 1 (folio f58r). Sample: kor cholfy shopchy otoralchy chofchol sholy otaly dal m dshodal or ckhy [...] dy o shor qokain okam shear sarols. File voyn/prs/str.1/gud.wfr (original 673 words, truncated/filtered to 670 words, ''N'' = 402 distinct).
* Whole prose text. Sample: fachys ykal ar ataiin shol shory cthres y kor sholdy sory ckhar or y [...] sodal chal chcthy chckhy qol ary. File voyn/prs/tot.1/gud.wfr (original 35128 words, truncated/filtered to 35027 words, ''N'' = 6525 distinct).
Labels, titles, word lists, and other isolated words from the Majority Vote version, extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.
* Whole list. Sample: ytoain dairol olkchdal oparairdly otardaly otodaram aralarar ocfhor [...] okeody daiisaly ypary opchytch ypcholdy loralody opchdard oror sheey. File voyn/lab/tot.1/gud.wfr (original 1021 words, truncated/filtered to 1003 words, ''N'' = 721 distinct).
Prose-like parts from the Majority Vote version of the text, excluding 'labels', extracted from the Landini/Zandbergen Interlinear Transcription 1.6e6.
* 'Cosmological' section, part 1 (page f57v). Sample: sa l y saeos ar okees o d soefchees sos okey defo f o rkedam sh ofol sar [...] d f s y l k l r ar o r t l s d y dar teodar otadal sheky otchody r l. File voyn/prs/cos.1/gud.wfr (original 168 words, truncated/filtered to 146 words, ''N'' = 63 distinct).
* Page f1r, unknown text type. Sample: fachys ykal ar ataiin shol shory cthres y kor sholdy sory ckhar or y [...] chol chok choty chotey dchaiin. File voyn/prs/unk.1/gud.wfr (202 words, ''N'' = 153 distinct).
* Page f49v, unknown text type. Sample: kshor shol cphokchol qokchy qokchod sho chotchy chcthy cthy koddy okeod [...] ykchokeo r cheey daiin. File voyn/prs/unk.2/gud.wfr (original 136 words, truncated/filtered to 134 words, ''N'' = 97 distinct).
* Page f66r, unknown text type. Sample: pdaiin oteedy opchedy chefchy shddy ypcher cholpchd okedals rair shekey [...] daiin chty. File voyn/prs/unk.4/gud.wfr (original 296 words, truncated/filtered to 292 words, ''N'' = 216 distinct).
The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts (one word per line, without punctuation) are in */*/gud.tlw.
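Because each gud.tlw file holds one word per line, the word totals and distinct-word counts (''N'') listed above can in principle be recomputed from it. Below is a minimal Python sketch, assuming the file is plain ASCII/UTF-8 text with exactly one word per line. The helper names `count_words` and `summarize_tlw` are hypothetical, and the sketch does not guarantee that its counts match the listed figures exactly.

```python
from collections import Counter
from pathlib import Path


def count_words(lines):
    """Count one-word-per-line input; return (total_words, distinct_words, counts)."""
    # Strip surrounding whitespace (including newlines) and skip blank lines.
    words = [line.strip() for line in lines if line.strip()]
    counts = Counter(words)
    return len(words), len(counts), counts


def summarize_tlw(path):
    """Apply count_words to a gud.tlw file (UTF-8/ASCII encoding assumed)."""
    with Path(path).open(encoding="utf-8") as handle:
        return count_words(handle)


# Hypothetical usage; substitute the actual location of a gud.tlw file:
# total, n_distinct, counts = summarize_tlw("path/to/gud.tlw")
# print(total, n_distinct, counts.most_common(10))
```

Separating `count_words` from the file handling lets the same counting logic run on any iterable of lines, such as an in-memory list.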