From jim@mail.rand.org Fri Oct  2 01:32 EST 1998
Message-ID: <3614C69C.545F@alphalink.com.au>
Reply-To: jguy@alphalink.com.au
References: <361399B5.57FF3800@sprint.ca>
Content-Transfer-Encoding: 7bit
Date: Fri, 02 Oct 1998 05:27:08 -0700
From: jguy <jguy@alphalink.com.au>
To: John Grove <4groves@sprint.ca>
CC: voynich@rand.org
Subject: Re: The Nature of the Analyst

John Grove wrote:

> 
>          The forms all have counterparts starting with <i>: <ig>, <x>, <2>,
> etc. We also have <a> = <c>+<i>.

My view is that <c> and <i> are equivalent, each occurring in the
context of strokes similar to itself. Cryptologia published an article
of mine a long time ago where I showed that the two sets of letters,
the c-like and the i-like, occurred in almost completely mutually
exclusive variation. That can be due (to a linguist's eye) to two things:

1. they are allographs of the same grapheme (like the two forms
   of small beta in Greek)
2. extensive vowel or consonant harmony
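
A minimal sketch, in Python, of how such a mutual-exclusion check
might be run on a transcription. It is not the method of the
Cryptologia article. The two letter sets and the sample words are
hypothetical placeholders chosen only to make the example run.

```python
from collections import Counter

def adjacency_counts(words, set_a, set_b):
    """Count adjacent letter pairs drawn from the two sets.

    A pair is "same" when both letters belong to the same set and
    "mixed" when one letter comes from each set. Pairs involving a
    letter outside both sets are ignored.
    """
    counts = Counter()
    for word in words:
        for x, y in zip(word, word[1:]):
            x_a, x_b = x in set_a, x in set_b
            y_a, y_b = y in set_a, y in set_b
            if not (x_a or x_b) or not (y_a or y_b):
                continue  # at least one letter is in neither set
            if (x_a and y_a) or (x_b and y_b):
                counts["same"] += 1
            else:
                counts["mixed"] += 1
    return counts

# Hypothetical letter classes and toy words, for illustration only.
C_LIKE = set("ce")
I_LIKE = set("in")
SAMPLE = ["cce", "iin", "ceo", "ci"]

counts = adjacency_counts(SAMPLE, C_LIKE, I_LIKE)
print(counts["same"], counts["mixed"])
```

If the two sets really are in near-complementary distribution, the
"mixed" count should be close to zero compared with "same".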

Later, but still a long time ago, I argued on this list that <cc> and
<a> were two different ways of writing "a". It had not even occurred
to me that <a> = <c> + <i>.

> All the letters containing an initial
> "c"-curve are also the only letters that can be preceded in the same
> word by the little letter that looks
> like "c," e.g. <c89>, <ccc89>. On the other hand, the letters <x> and
> <2> (which have very high frequencies) can *never* be preceded by
> <c>, *ever*; they are instead
> preceded by <a>."

or <o>. 
 
>         Now the fact that he saw these things as 'two-stroke' characters
> seems promising to me -- as it supports my observations.  However, it
> may simply be that Currier was employed in roughly the same field as
> I work in - and thus analyzes things from the same perspective.  What
> was his job?  If he was a cryptanalyst

He was. But I am a linguist, and I reported the same phenomenon. I did
not know about Currier at the time, either. So that makes his
observation all the more credible. When results converge...
 
>         Jorge, on the other hand, has attacked the VMS from a linguistic
> point of view 

Jorge is a computer scientist. So now that's three viewpoints that
converge: cryptology, linguistics, computer science.

> - there are just not enough characters in just the
> right places to form a simple alphabetic language 

Yes there are! Look in the archives: before the invention of EVA,
when we were groping for a "pronounceable Voynich", I came up
with two. One looked like a sort of mock-Latin which sounded
grand and mysterious (good for ceremonial magic?); the other, more
serious, looked much like a sort of Indonesian.

Piraha, a South American Indian language, has only three vowels
and seven consonants. But it has tones. Not so Rotokas (in Papua
New Guinea), which has five vowels and six consonants, but no
tones.

Further, another thought. If you get your hands on a New
Testament in Bislama, the Pidgin English of Vanuatu, you'll see
words like "God", and "kot" (coat), and "gyaman" (to lie). But
Bislama has neither g nor d. "God" and "coat" are pronounced
exactly alike: kot. What gives? The spelling was devised by a
Rev. Camden, a Presbyterian. He got it right, but the native
Elders complained that it made the language look too "childish".
So, very stupidly, he caved in, and decided on a spelling partly
based on English. Hence "God" instead of "Kot". And "gyaman"
instead of "kyaman". He thought that "kyaman" was a distortion
of "gammon" but... it's a Chinese word! As for "mbusong", which
his mob pronounced "putsong", he wrote it like he heard it:
"pujong". Well, if I tell you that it means "cork" you may guess
the etymology: French bouchon. Perhaps a similar thing happened
with the VMS, which would explain its haphazard spelling.

At any rate the Voynich alphabet has enough letters to 
write Rotokas or Piraha, and enough to spare to write
the allophonic variations of their phonemes. E.g. 
in Rotokas, t is pronounced ts or s before i. In
Piraha t is pronounced either plain t, or t
accompanied by a bilabial trill (the choice is
apparently at the speaker's whim).


> There are three things
> about the lines that make me believe the line itself is a functional
> unit. The frequency counts of the beginnings and endings of lines are
> markedly different from the counts of the same characters internally.

That is normal. The frequency counts of the beginnings and endings of
lines in Italian are markedly different from the medial ones.  Why?
Because an Italian word almost always ends in a vowel, but usually
starts with a consonant. No, I haven't carried out any statistics on
that, but I learnt Italian reading Topolino when I was 15, and the
strange distribution of consonants and vowels had struck me: it did
not look like a "real" language. I mean... "tavola" to say "table"?
Are you joking? And this word: "popolo". Surely, Sir, you are
pulling my leg!


> There are, for instance, some characters that may not occur initially
> in a line. 

In Ancient Basque, no word could start with a "k", a "p", or a "t".
Ancient Basque had no "m". When Basque acquired an "m", it was from
"nb" becoming "mb", then "m", so no word could start (or end) with
an "m". Since line length varies in the VMS, we can infer that the
authors did not break words. Or, if they did, they did it at 
syllable boundaries. In all the languages I can think of right
now (French, English, Italian, Swahili, Spanish, Chinese, Japanese...)
the distribution of phonemes is quite different syllable-initially,
medially, and finally.

 
>         Okay, if the first character of a line is a line indicator of some
> sort... 

If the VMS is written in a natural human language, like the many I have
learnt and the many more I know about, then there is nothing there
to write home about. The scribes did not break words randomly, like th
is, that's  al
l fo
lks!


> Jorge?  You've got
> quite a computer system for crunching numbers -- Is this worth
> looking into?

Of course it is worth looking into. But you don't need a high-powered
number cruncher. If you search the archives, you'll find a Pascal 
program I wrote that computes letter frequencies in those different
positions. I must have written it almost 10 years ago when I had
only a state-of-the-art PC, a 386DX running at 33MHz, with a super
whopper 8M of RAM. I think that, somewhere in those archives,
there are tables of letter frequencies produced by that thing.
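
For readers who do not want to dig through the archives, here is a
minimal Python sketch of that kind of count. It is not the Pascal
program mentioned above, whose details are not given here. It assumes
a transcription with one manuscript line per text line and words
separated by spaces. The function name and the two Italian sample
lines are illustrative only.

```python
from collections import Counter, defaultdict

def positional_frequencies(lines):
    """Tally letters by position in the line and in the word.

    Returns a dict of Counters keyed by "line_initial", "line_final",
    "word_initial", "word_final" and "medial". A one-letter word is
    counted as both word-initial and word-final.
    """
    counts = defaultdict(Counter)
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        counts["line_initial"][words[0][0]] += 1
        counts["line_final"][words[-1][-1]] += 1
        for word in words:
            counts["word_initial"][word[0]] += 1
            counts["word_final"][word[-1]] += 1
            for ch in word[1:-1]:
                counts["medial"][ch] += 1
    return counts

# Toy input: two short Italian "lines", not Voynich text.
SAMPLE_LINES = ["il popolo", "la tavola"]
freqs = positional_frequencies(SAMPLE_LINES)

for position in ("line_initial", "line_final", "medial"):
    print(position, dict(freqs[position]))
```

Comparing the "line_initial" and "line_final" tables against
"word_initial" and "word_final" shows whether line edges behave
differently from ordinary word edges, which is the question at hand.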

No, if there is a cipher there I think it can only be a Bacon
cipher, but not binary. If gallows were instructions to switch
to a different coding wheel, we'd observe strikingly different
letter frequencies between gallows, and strikingly different
letter groups. We don't.
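
A rough Python sketch of how that comparison could be set up, under
stated assumptions: the gallows are represented by the placeholder
symbols "K" and "T", each gallows is taken to govern every letter up
to the next gallows, and the distance measure (total variation
distance) is one choice among several. None of this comes from an
existing program.

```python
from collections import Counter, defaultdict

def frequencies_by_gallows(text, gallows):
    """Pool the letters that follow each gallows symbol.

    Letters before the first gallows are ignored, as is whitespace.
    """
    pools = defaultdict(Counter)
    current = None  # no gallows seen yet
    for ch in text:
        if ch in gallows:
            current = ch
        elif current is not None and not ch.isspace():
            pools[current][ch] += 1
    return pools

def total_variation(c1, c2):
    """Total variation distance between two non-empty letter counts.

    0.0 means identical relative frequencies; 1.0 means no letter
    in common.
    """
    n1, n2 = sum(c1.values()), sum(c2.values())
    letters = set(c1) | set(c2)
    return 0.5 * sum(abs(c1[x] / n1 - c2[x] / n2) for x in letters)

# Toy strings with hypothetical gallows symbols "K" and "T".
same = frequencies_by_gallows("KabTab Kba Tab", "KT")
different = frequencies_by_gallows("KaaTbb", "KT")

print(total_variation(same["K"], same["T"]))
print(total_variation(different["K"], different["T"]))
```

If gallows switched coding wheels, the distance between pools should
be large, as in the second toy string; a distance near zero points
the other way.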