From jguy@alphalink.com.au Tue Oct  6 08:58 EST 1998
Message-ID: <361B0233.5A20@alphalink.com.au>
Reply-To: jguy@alphalink.com.au
Date: Tue, 06 Oct 1998 22:54:59 -0700
From: jguy <jguy@alphalink.com.au>
To: stolfi@dcc.unicamp.br
Subject: Jorge's Chinese theory (well, I bear some responsibility too)

Jorge Stolfi wrote, in answer to Glen Claston:

>   To a first approximation, the Voynichese words can be decomposed into
>   the following "elements":

[fascinating stuff deleted]

Having read that, I am almost convinced that it is Chinese, or 
a language with phonological properties similar to those of Chinese.
I am also convinced that it is not Mandarin, and that, if it is
Chinese, it is an archaic form, which still had initial consonant
clusters, like kl for instance. Mind you, I guess Thai or Burmese
could also fit the bill, but I know too little Thai and Burmese to
tell.
>   Yet they still fit 97% of all words in Rene's list (counting
>   multiplicities), and 94% of all words in the interlinear file. The
>   remaining 6% includes the words containing "whackos" (3% of all
>   words), and long words that look like the concatenation
>   of two ordinary ones.
That convinces me. Those are astonishingly good figures. Jorge
has the phonotactics of the language (or the structure of the cipher) 
cracked!
 
> I can't imagine ...
[snip]
I agree with all that, but I'm only a linguist, with a bit of
statistician thrown in. No cryptologist at all.
 
> But many labels have only one Voynichese "word"

I tend to think that most are the equivalent of our Fig.1, Fig.2, etc.
in a very different form. I hate to mention it, but Chinese has a
set of characters, each with its own pronunciation, the purpose
of which is ... how could I say? Referencing. Like A, B, C, D. Or
the symbols you sometimes find for footnotes: <asterisk>, <cross>,
<a double cross>, <the "paragraph" symbol>.

 
>   4. Why aren't there recognizable European grammatical structures
>   (gender/number agreement, noun and verb inflections, etc.)
 
>     A: Because Chinese doesn't have such things.
True. There are no inflections at all. There is reliable evidence that
Chinese once had inflections, but even then they were very reduced,
even more so than in modern English, which is well on its way to losing
all inflections.
 
>   5. Why are there so many repeated and similar words?
 
>     A: Partly because Chinese often uses repeated words;
>     partly because Chinese words sound similar, and distinguished
>     by features (such as tone) which Westerners have a hard time
>     perceiving.

Both true.
 
>   6. Why did Sukhotin's algorithm (for vowel/consonant identification)
>   fail for Voynichese?
 
>     A: Because the algorithm depends on the fact that, in a Western
>     language with well-tuned alphabet, vowels generally alternate with
>     consonants; so that the counts of CV and VC digraphs (the "signal"
>     on which the identification is based) dominate over the CC and VV
>     counts (which are useless and potentially misleading "noise").
>     Now, when Chinese is encoded with the Voynichese alphabet and then
>     transcribed to EVA, we get only *two* CV/VC digraphs in each word
>     (5-6 characters); or only *one* CV pair, and zero VC's, if space
>     is treated as a character. So the data given to Sukhotin's
>     algorithm was 15%-30% signal and 70-85% noise.

No, on two counts. Firstly, we do not know that it failed, because we
still can't speak Voynichese. Secondly, the algorithm assumes that the
letters have been correctly identified. We are not even sure what
constitutes a letter! Is <in> one or two letters? Is <iin> one or
two or three letters? We suspect, but we don't know. And *that*, in
my view, is why Sukhotin's algorithm cannot give us a reliable answer.
We're probably feeding it garbage, so we're probably getting garbage back.
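
To make the dependence on segmentation concrete, here is a minimal
Python sketch of Sukhotin's procedure as it is usually described: count
adjacent pairs of different symbols, start with everything a consonant,
and repeatedly promote the symbol with the largest positive row sum to
vowel. The function name sukhotin and the toy word lists are my own
assumptions for illustration, not anything used on the actual
transcriptions. Words may be given as strings (one character = one
letter) or as tuples of pre-segmented units, so <iin> can be tried as
one letter or as three.

```python
from collections import defaultdict


def sukhotin(words):
    """Return the set of symbols classified as vowels.

    `words` is an iterable of sequences of symbols: plain strings, or
    tuples such as ("d", "a", "iin") when a group is treated as one letter.
    """
    # Symmetric adjacency counts; pairs of identical symbols are ignored
    # (the diagonal of the matrix is zero in the usual description).
    adj = defaultdict(lambda: defaultdict(int))
    for w in words:
        for a, b in zip(w, w[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Row sums; every symbol starts out as a consonant.
    sums = {s: sum(adj[s].values()) for s in adj}
    vowels = set()

    while True:
        consonants = {s: v for s, v in sums.items() if s not in vowels}
        if not consonants:
            break
        best = max(consonants, key=consonants.get)
        if consonants[best] <= 0:
            break  # no remaining symbol looks like a vowel
        vowels.add(best)
        # Contacts with the new vowel now count against being a vowel.
        for s in consonants:
            if s != best:
                sums[s] -= 2 * adj[s][best]
    return vowels


if __name__ == "__main__":
    print(sukhotin(["tomato", "potato", "banana"]))
    # Same toy word, two different guesses about what a "letter" is:
    print(sukhotin(["daiin"]))
    print(sukhotin([("d", "a", "iin")]))
```

The last two calls get the same word under two segmentations, and
nothing forces them to agree. That is the whole difficulty: the answer
is only as good as our guess about the alphabet.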

 
>   7. Why does Voynichese word and letter stats resemble those
>     of natural languages (e.g. Zipf's law), rather than those
>     of cipher text?
 
>      A: Because Voynichese is unencrypted Chinese.

I prefer the stronger case:
 
Because Voynichese is an unencrypted natural language.
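
For anyone who wants to check the Zipf claim on a transcription, a
rough sketch in Python: rank the word types by frequency and fit a
straight line to log(frequency) against log(rank). The function name
zipf_slope is hypothetical, and the plain least-squares fit is my own
simplification; a slope near -1 is what the textbook form of Zipf's law
predicts.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy data with frequencies 12, 6, 4, 3, i.e. exactly 12 / rank.
    toy = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
    print(zipf_slope(toy))
```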
 
>   8. Why are there so few long words, when compared
>   to Latin or English?
 
>      A: Because Chinese words are monosyllabic, and syllables
>      are obviously limited in length.

There are many languages with short, mostly monosyllabic words.
Not only in Asia. Many in Africa too.
 
>   9. Why are there two "languages", with radically different
>   vocabularies but with similar word structure?
 
>      A: Because the book is written in two Chinese dialects, (e.g.
>      Cantonese and Mandarin), and the differences between Chinese
>      dialects happen to be of that sort.

That is true of the dialects of many, many languages.
 
>  10. Why is there no punctuation?
 
>      A: Traditional Chinese writing didn't use punctuation, so why use
>      it in the new writing system? Besides, the author surely did not
>      know the language well enough to make up good punctuation rules.

Plus, punctuation is rare in medieval European manuscripts. One
reinforces the other.
 
>  11. Why do we find elements like <y>, <r>, <s>, <d> alone, but
>      not <k>, <ke>, <sh> etc.?
 
>      A: Because the latter are consonants, while former are vowels.


Not *quite* necessarily. Some Chinese words/syllables consist of a 
single consonant, e.g. sì "four" is just "sss". The vowel is just
an orthographic device. Others: cí "word" (t'sss), zì "character"
(dzzz), shì "to be" (sshh) etc. Some other Asian languages have
other one-consonant words like v, f, pf, bv (those count as one
phoneme). Cantonese has m, and ng. Lots of Chinese people from
Canton or Hong Kong are called Ng. But there are no words/syllables
consisting of just p, t, k, b, d, g. So your remark is still valid.
 
>   12. Why did the author write the VMs?
 
>      A: for any or all of these reasons: (1) to test and debug the
>      new writing system, 
Somewhat likely.

>      (2) to convince the Chinese of the advantages
>      of alphabetic writing,

Most likely.

 
>     > How does PTTTTH!!!! translate into Voynich, anyway?

Isn't that the Piraha consonant that consists of a
t co-articulated with a bilabial trill, a t with a raspberry
in layman's words? Was the VMS written by Piraha Indians? You
know, Piraha, the language with tones, 3 vowels and 7 consonants,
spoken right next door to where Jorge lives? (I've been looking
for a Piraha grammar or wordbook. No luck so far.)