Folder: MAIL/folders-splitted/vm-folders/voynich-98
From jguy@alphalink.com.au Tue Oct 6 08:58 EST 1998
Message-ID: <361B0233.5A20@alphalink.com.au>
Reply-To: jguy@alphalink.com.au
Date: Tue, 06 Oct 1998 22:54:59 -0700
From: jguy <jguy@alphalink.com.au>
To: stolfi@dcc.unicamp.br
Subject: Jorge's Chinese theory (well, I bear some responsibility too)

Jorge Stolfi wrote, in answer to Glen Claston:

> To a first approximation, the Voynichese words can be decomposed into
> the following "elements":

[fascinating stuff deleted]

Having read that, I am almost convinced that it is Chinese, or a
language with phonological properties similar to those of Chinese.
I am also convinced that it is not Mandarin, and that, if it is
Chinese, it is an archaic form, which still had initial consonant
clusters, like kl for instance. Mind you, I guess Thai or Burmese
could also fit the bill, but I know too little Thai and Burmese
to tell.

> Yet they still fit 97% of all words in Rene's list (counting
> multiplicities), and 94% of all words in the interlinear file. The
> remaining 6% includes the words containing "whackos" (3% of all
> words), and long words that look like the concatenation
> of two ordinary ones.

That convinces me. Those are astonishingly good figures. Jorge has
the phonotactics of the language (or the structure of the cipher)
cracked!

> I can't imagine ...

[snip]

I agree with all that, but I'm only a linguist, with a bit of
statistician thrown in. No cryptologist at all.

> But many labels have only one Voynichese "word"

I tend to think that most are the equivalent of our Fig.1, Fig.2,
etc. in a very different. I hate to mention it, but Chinese has a
set of characters, each with its own pronunciation, the purpose of
which is ... how could I say? Referencing. Like A, B, C, D. Or the
symbols you sometimes find for footnotes: <asterisk>, <cross>,
<a double cross>, <the "paragraph" symbol>

> 4. Why aren't there recognizable European grammatical structures
> (gender/number agreement, noun and verb inflections, etc.)
> A: Because Chinese doesn't have such things.

True. There are no inflections at all. There is reliable evidence
that Chinese once had inflections, but even then, they were very
reduced, even more so than in modern English, which is well on its
way to losing all inflections.

> 5. Why are there so many repeated and similar words?
> A: Partly because Chinese often uses repeated words;
> partly because Chinese words sound similar, and distinguished
> by features (such as tone) which Westerners have a hard time
> perceiving.

Both true.

> 6. Why did Sukhotin's algorithm (for vowel/consonant identification)
> fail for Voynichese?
> A: Because the algorithm depends on the fact that, in a Western
> language with well-tuned alphabet, vowels generally alternate with
> consonants; so that the counts of CV and VC digraphs (the "signal"
> on which the identification is based) dominate over the CC and VV
> counts (which are useless and potentially misleading "noise").
> Now, when Chinese is encoded with the Voynichese alphabet and then
> transcribed to EVA, we get only *two* CV/VC digraphs in each word
> (5-6 characters); or only *one* CV pair, and zero VC's, if space
> is treated as a character. So the data given to Sukhotin's
> algorithm was 15%-30% signal and 70-85% noise.

No on two counts. Firstly, we do not know that it failed, because
we still can't speak Voynichese. Secondly, the algorithm assumes
that the letters have been correctly identified. We are not even
sure what constitutes a letter! Is <in> one or two letters? Is <iin>
one or two or three letters? We suspect, but we don't know. And
*that*, in my view, is why Sukhotin's algorithm cannot give us a
reliable answer. We're feeding it probable garbage, so we're
probably getting garbage back.

> 7. Why does Voynichese word and letter stats resemble those
> of natural languages (e.g. Zipf's law), rather than those
> of cipher text?
> A: Because Voynichese is unencrypted Chinese.

I prefer the stronger case: Because Voynichese is an unencrypted
natural language.

> 8. Why are there so few long words, when compared
> to Latin or English?
> A: Because Chinese words are monosyllabic, and syllables
> are obviously limited in length.

There are many languages with short, mostly monosyllabic words.
Not only in Asia. Many in Africa too.

> 9. Why are there two "languages", with radically different
> vocabularies but with similar word structure?
> A: Because the book is written in two Chinese dialects, (e.g.
> Cantonese and Mandarin), and the differences between Chinese
> dialects happen to be of that sort.

That is true of the dialects of many, many languages.

> 10. Why is there no punctuation?
> A: Traditional Chinese writing didn't use punctuation, so why use
> it in the new writing system? Besides, the author surely did not
> know the language well enough to make up good punctuation rules.

Plus, punctuation is rare in Medieval European manuscripts.
One reinforces the other.

> 11. Why do we find elements like <y>, <r>, <s>, <d> alone, but
> not <k>, <ke>, <sh> etc.?
> A: Because the latter are consonants, while former are vowels.

Not *quite* necessarily. Some Chinese words/syllables consist of a
single consonant, e.g. sì "four" is just "sss". The vowel is just
an orthographic device. Others: cí "word" (t'sss), zì "character"
(dzzz), shì "to be" (sshh) etc. Some other Asian languages have
other one-consonant words like v, f, pf, bv (those count as one
phoneme). Cantonese has m, and ng. Lots of Chinese people from
Canton or Hong Kong are called Ng. But there are no words/syllables
consisting of just p, t, k, b, d, g. So your remark is still valid.

> 12. Why did the author write the VMs?
> A: for any or all of these reasons: (1) to test and debug the
> new writing system,

somewhat likely

> (2) to convince the Chinese of the advantages
> of alphabetic writing,

most likely

>
> How does PTTTTH!!!! translate into Voynich, anyway?

Isn't that the Piraha consonant that consists of a t co-articulated
with a bilabial trill, a t with a raspberry in layman's words?
Was the VMS written by Piraha Indians? You know, Piraha, the
language with tones, 3 vowels and 7 consonants, spoken right next
door to where Jorge lives?

(I've been looking for a Piraha grammar or wordbook. No luck so far)
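
P.S. To make the point under item 6 concrete, here is a minimal sketch in Python of Sukhotin's algorithm as it is usually described: count adjacent pairs of distinct symbols, treat every symbol as a consonant, then keep promoting the symbol with the largest remaining sum to vowel. It is an illustration only, not the program anyone actually ran on the Voynich transcriptions. The function name `sukhotin` and the toy inputs are assumptions of mine. Each word may be a plain string (one character = one letter) or a list of tokens, which is the whole question: is <iin> one letter or three?

```python
from collections import defaultdict


def sukhotin(words):
    """Return the set of symbols that Sukhotin's algorithm calls vowels.

    Each word is a sequence of symbols: a string (one character per
    letter) or a list of tokens (e.g. ["d", "a", "iin"]).
    """
    # Symmetric adjacency counts; pairs of identical symbols are skipped,
    # which is the usual "zero diagonal" convention.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    sums = {s: sum(adj[s].values()) for s in symbols}
    vowels = set()
    while True:
        consonants = symbols - vowels
        if not consonants:
            break
        # Sorting first makes ties resolve the same way on every run.
        best = max(sorted(consonants), key=lambda s: sums[s])
        if sums[best] <= 0:
            break
        vowels.add(best)
        # Contacts with the new vowel no longer count towards vowelhood.
        for s in consonants - {best}:
            sums[s] -= 2 * adj[s][best]
    return vowels


if __name__ == "__main__":
    # The same toy "word", segmented two different ways.
    print(sukhotin(["daiin"]))              # every character is a letter
    print(sukhotin([["d", "a", "iin"]]))    # <iin> taken as a single letter
```

On that one toy word the two segmentations already give different vowel sets, which is all the sketch is meant to show: the answer depends on what we decide a letter is before the algorithm ever runs.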