From jim@mail.rand.org  Sun May  7 23:49:31 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id XAA07293
	for <reeds@fry.research.att.com>; Sun, 7 May 2000 23:49:31 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id A18974CE0F; Sun,  7 May 2000 23:49:31 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id 23ABB4CE06
	for <reeds@research.att.com>; Sun,  7 May 2000 23:49:27 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id UAA08418; Sun, 7 May 2000 20:46:17 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id UAA12769; Sun, 7 May 2000 20:46:16 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id UAA03979 for <voynich@rand.org>; Sun, 7 May 2000 20:44:34 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id UAA12749 for <voynich@rand.org>; Sun, 7 May 2000 20:44:34 -0700 (PDT)
Received: from yarf.eecs.umich.edu (yarf.eecs.umich.edu [141.213.12.211]) by mail03-lax.pilot.net with ESMTP id UAA12086 for <voynich@rand.org>; Sun, 7 May 2000 20:44:33 -0700 (PDT)
Received: (from kckluge@localhost)
	by yarf.eecs.umich.edu (8.9.3/8.9.1) id XAA10480;
	Sun, 7 May 2000 23:44:28 -0400 (EDT)
Date: Sun, 7 May 2000 23:44:28 -0400 (EDT)
Message-Id: <200005080344.XAA10480@yarf.eecs.umich.edu>
From: Karl Kluge <kckluge@eecs.umich.edu>
To: voynich@rand.org
Subject: Concensus transcription?
Sender: jim@mail.rand.org
Status: O


Howdy,

I recall seeing someone refer to a transcription based on majority vote 
of the collated existing transcriptions, but can't seem to find it. I'm 
revisiting finite state automaton induction on the "words" in the Biological
B folios, and want to have the best transcript possible.

BTW, the initial results are interesting...in particular, I don't think
it can be what we've been calling a "verbose cipher" (single plaintext
letter maps to Voynich letter combinations of different lengths) unless
Currier C, S, and Z are nulls (at least in some/many cases). When I hit
a sufficiently frustrating wall, I'll write up what I've been doing and
post it so folks can see what the evidence and reasoning are.
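To make the term "verbose cipher" concrete, here is a toy version in Python.

- The letter-to-group table is invented purely for illustration. It is not Karl's key, Tiltman's, or anyone's proposed Voynich key.
- Decoding assumes the groups form a prefix-free code, so matching left to right is unambiguous.

```python
# Hypothetical toy table, invented for illustration only; NOT a proposed
# Voynich key. Groups have different lengths, and no group is a prefix of
# another, so decoding is unambiguous.
TOY_TABLE = {
    "a": "o",
    "e": "ar",
    "i": "aiin",
    "n": "ch",
    "s": "sh",
    "t": "qok",
}


def verbose_encrypt(plaintext, table):
    """Replace each plaintext letter with its variable-length cipher group."""
    return "".join(table[letter] for letter in plaintext)


def verbose_decrypt(ciphertext, table):
    """Decode by matching cipher groups left to right.

    Assumes the groups form a prefix-free code, so at most one group can
    match at any position."""
    inverse = {group: letter for letter, group in table.items()}
    plaintext = []
    pos = 0
    while pos < len(ciphertext):
        for group, letter in inverse.items():
            if ciphertext.startswith(group, pos):
                plaintext.append(letter)
                pos += len(group)
                break
        else:
            raise ValueError(f"no cipher group matches at position {pos}")
    return "".join(plaintext)


print(verbose_encrypt("saint", TOY_TABLE))  # shoaiinchqok
```

If some cipher symbols were nulls inserted at random, they would not belong to any group in such a table. That is the situation Karl's C/S/Z remark is about.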

Karl

From jim@mail.rand.org  Mon May  8 02:21:44 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id CAA88307
	for <reeds@fry.research.att.com>; Mon, 8 May 2000 02:21:44 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 560C01E00D; Mon,  8 May 2000 02:21:44 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (unknown [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id D8B8B1E008
	for <reeds@research.att.com>; Mon,  8 May 2000 02:21:43 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id XAA24373; Sun, 7 May 2000 23:20:25 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id XAA15657; Sun, 7 May 2000 23:20:24 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id XAA06397 for <voynich@rand.org>; Sun, 7 May 2000 23:18:26 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id XAA15605 for <voynich@rand.org>; Sun, 7 May 2000 23:18:25 -0700 (PDT)
Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.7.8]) by mail01-lax.pilot.net with ESMTP id XAA23990 for <voynich@rand.org>; Sun, 7 May 2000 23:18:24 -0700 (PDT)
Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.3/8.9.3) with ESMTP id DAA21516
	for <voynich@rand.org>; Mon, 8 May 2000 03:18:09 -0300 (EST)
Received: from coruja.dcc.unicamp.br (coruja.dcc.unicamp.br [143.106.24.80])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with ESMTP id DAA15222
	for <voynich@rand.org>; Mon, 8 May 2000 03:18:09 -0300 (EST)
Received: (from stolfi@localhost)
	by coruja.dcc.unicamp.br (8.8.5/8.8.5) id DAA13809;
	Mon, 8 May 2000 03:18:07 -0300 (EST)
Date: Mon, 8 May 2000 03:18:07 -0300 (EST)
Message-Id: <200005080618.DAA13809@coruja.dcc.unicamp.br>
From: Jorge Stolfi <stolfi@dcc.unicamp.br>
To: voynich@rand.org
Subject: Re: Concensus transcription?
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
In-Reply-To: <200005080344.XAA10480@yarf.eecs.umich.edu>
References: <200005080344.XAA10480@yarf.eecs.umich.edu>
Reply-To: stolfi@dcc.unicamp.br
Sender: jim@mail.rand.org
Status: OR


    > I recall seeing someone refer to a transcription based on majority vote 
    > of the collated existing transcriptions, but can't seem to find it. I'm 
    > revisiting finite state automaton induction on the "words" in the Biological
    > B folios, and want to have the best transcript possible.

I have a "majority vote" and a "consensus" version of the interlinear file.
I thought I had mentioned it here, but I notice that it is not listed 
in my Voynich pages.  Anyway, here it is:

http://www.dcc.unicamp.br/~stolfi/voynich/Notes/045/inter-cm.evt

It is in EVA encoding, basically in the EVMT format used by Gabriel
and Rene. The majority version is marked with transcriber code ";A>",
the consensus one with ";Y>". You should be able to use Rene's VTT
tool to extract the one you need.
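If VTT isn't at hand, a few lines of Python can pull one transcriber's text out of the interlinear file. This is only a sketch, and it rests on two assumptions:

- Each data line starts with a locus tag of the form <f1r.P1.1;A>, with the transcriber code after the semicolon.
- Comment lines begin with "#".

Check the file itself for its actual conventions. The function name is hypothetical.

```python
import re

# Assumed locus format: "<page.unit.line;X>" where X is the transcriber
# code (e.g. A = majority, Y = consensus, H = Takeshi).
LOCUS = re.compile(r"^<([^;>]+);([A-Za-z])>\s*(.*)$")


def extract_transcriber(lines, code):
    """Yield (locus, text) pairs for lines tagged with transcriber `code`."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # assumed comment line
        match = LOCUS.match(line)
        if match and match.group(2) == code:
            yield match.group(1), match.group(3)


# Made-up sample lines in the assumed format.
sample = [
    "# assumed comment line",
    "<f1r.P1.1;A>  fachys.ykal.ar",
    "<f1r.P1.1;H>  fachys.ykal.ar.ataiin",
]
print(list(extract_transcriber(sample, "A")))
```

To read the real file, pass an open file object instead of the sample list, e.g. open("inter-cm.evt").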

You should also consider using Takeshi's version (code ";H>"), which
is the only complete one so far.

I used simple majority for the "A" version. Perhaps I should have
given different weights to different transcribers. But that would
not have been easy: a transcriber's reliability seems to vary with
page and character. (For instance, Currier often disagrees with
Friedman/D'Imperio on "i" vs "ii".) That may be another reason to use
Takeshi's version...
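The simple-majority rule described above can be sketched one character position at a time. The sketch makes three simplifying assumptions:

- The transcribers' readings of a line have already been aligned to the same length. The interlinear file has its own alignment conventions, which this ignores.
- A tied position is marked with "*". This is an arbitrary choice made here.
- Every transcriber gets an equal vote. A weighted variant would replace each vote of 1 with a per-transcriber weight.

```python
from collections import Counter


def majority_vote(readings, tie="*"):
    """Combine position-aligned readings of one line by simple majority.

    `readings` is a list of equal-length strings, one per transcriber.
    Each transcriber gets one vote per position; if two or more symbols
    tie for first place, the `tie` marker is emitted instead."""
    if len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be non-empty and of equal length")
    result = []
    for column in zip(*readings):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie)
        else:
            result.append(ranked[0][0])
    return "".join(result)


print(majority_vote(["qokeedy", "qokeedy", "qokchdy"]))  # qokeedy
```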

All the best,

--stolfi

From jim@mail.rand.org  Mon May  8 03:40:41 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id DAA92913
	for <reeds@fry.research.att.com>; Mon, 8 May 2000 03:40:41 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 585DB4CE2A; Mon,  8 May 2000 03:40:41 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-blue.research.att.com (Postfix) with ESMTP id 9E3644CE20
	for <reeds@research.att.com>; Mon,  8 May 2000 03:40:40 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id AAA09891; Mon, 8 May 2000 00:39:56 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id AAA17311; Mon, 8 May 2000 00:39:55 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id AAA10394 for <voynich@rand.org>; Mon, 8 May 2000 00:38:28 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id AAA17269 for <voynich@rand.org>; Mon, 8 May 2000 00:38:27 -0700 (PDT)
Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.7.8]) by mail02-lax.pilot.net with ESMTP id AAA09570 for <voynich@rand.org>; Mon, 8 May 2000 00:38:25 -0700 (PDT)
Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.3/8.9.3) with ESMTP id EAA22438
	for <voynich@rand.org>; Mon, 8 May 2000 04:38:22 -0300 (EST)
Received: from coruja.dcc.unicamp.br (coruja.dcc.unicamp.br [143.106.24.80])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with ESMTP id EAA20211
	for <voynich@rand.org>; Mon, 8 May 2000 04:38:21 -0300 (EST)
Received: (from stolfi@localhost)
	by coruja.dcc.unicamp.br (8.8.5/8.8.5) id EAA13846;
	Mon, 8 May 2000 04:38:21 -0300 (EST)
Date: Mon, 8 May 2000 04:38:21 -0300 (EST)
Message-Id: <200005080738.EAA13846@coruja.dcc.unicamp.br>
From: Jorge Stolfi <stolfi@dcc.unicamp.br>
To: voynich@rand.org
Subject: Re: Concensus transcription?
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
In-Reply-To: <200005080618.DAA13809@coruja.dcc.unicamp.br>
References: <200005080344.XAA10480@yarf.eecs.umich.edu>
	<200005080618.DAA13809@coruja.dcc.unicamp.br>
Reply-To: stolfi@dcc.unicamp.br
Sender: jim@mail.rand.org
Status: OR

    
My public apologies for misspelling Jim Gillogly's last name in the
consensus/majority interlinear file. (I have now fixed the bug in the
master file, but the fix will not show up until I make a new release
of the interlinear.)
    
All the best,

--stolfi

From jim@mail.rand.org  Tue May  9 23:27:10 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id XAA09041
	for <reeds@fry.research.att.com>; Tue, 9 May 2000 23:27:10 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 362424CE25; Tue,  9 May 2000 23:27:05 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id AE0EF4CE18
	for <reeds@research.att.com>; Tue,  9 May 2000 23:26:54 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id UAA04276; Tue, 9 May 2000 20:26:10 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id UAA04829; Tue, 9 May 2000 20:26:09 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id UAA09068 for <voynich@rand.org>; Tue, 9 May 2000 20:25:53 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id UAA04819 for <voynich@rand.org>; Tue, 9 May 2000 20:25:53 -0700 (PDT)
Received: from yarf.eecs.umich.edu (yarf.eecs.umich.edu [141.213.12.211]) by mail02-lax.pilot.net with ESMTP id UAA23844 for <voynich@rand.org>; Tue, 9 May 2000 20:25:52 -0700 (PDT)
Received: (from kckluge@localhost)
	by yarf.eecs.umich.edu (8.9.3/8.9.1) id XAA13371;
	Tue, 9 May 2000 23:25:39 -0400 (EDT)
Date: Tue, 9 May 2000 23:25:39 -0400 (EDT)
Message-Id: <200005100325.XAA13371@yarf.eecs.umich.edu>
From: Karl Kluge <kckluge@eecs.umich.edu>
To: ixohoxi@micro-net.com
Cc: voynich@rand.org
In-reply-to: <39175DA3.114E426F@micro-net.com> (message from Dennis on Mon, 08
	May 2000 19:36:51 -0500)
Subject: Re: Concensus transcription?
References: <200005080344.XAA10480@yarf.eecs.umich.edu> <39175DA3.114E426F@micro-net.com>
Sender: jim@mail.rand.org
Status: OR


Dennis,

> This is probably from Stolfi: http://www.dcc.unicamp.br/~stolfi/voynich/

Thanks, he also sent a pointer to this.

> > BTW, the initial results are interesting...in particular, I don't think
> > it can be what we've been calling a "verbose cipher" (single plaintext
> > letter maps to Voynich letter combinations of different lengths) unless
> > Currier C, S, and Z are nulls (at least in some/many cases). When I hit
> > a sufficiently frustrating wall, I'll write up what I've been doing and
> > post it so folks can see what the evidence and reasoning is.
> 
>     This sounds very interesting, pretty close to what I'm planning.  I'll be
> looking forward to what you get

With respect to the issue of nulls, the reasoning goes roughly as follows.
Suppose you have a DFSA induced from some word list (in this case, words
that occur 3+ times in Biological B). If you enumerate the possible
substrings that can occur between final states in the DFSA, that should
give you the character combos of the verbose cipher, or fragments of them
(I verified this on what I called "Tiltman-encoded Genesis"). What you in
fact get with the Biol B vocab list is a list of substrings longer than the
input word list (!), and it is clear that Currier C, S, and Z are to blame.
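Karl doesn't describe his induction algorithm, so the sketch below only illustrates the idea under stated assumptions.

1. It builds the simplest DFSA consistent with a word list, the prefix-tree acceptor. A real induction algorithm would go further and generalize it by merging states.
2. It enumerates the labels of paths that run from one final state to another without passing through any final state in between.

Merged automata can contain cycles, so the enumeration caps path length with a max_len parameter chosen here for illustration. All function names are hypothetical.

```python
def prefix_tree_acceptor(words):
    """Build a DFSA accepting exactly `words` (a prefix-tree acceptor).

    Transitions are a dict {state: {symbol: next_state}}; state 0 is the
    start state, and `finals` holds the accepting states."""
    delta = {0: {}}
    finals = set()
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in delta[state]:
                new_state = len(delta)
                delta[state][symbol] = new_state
                delta[new_state] = {}
            state = delta[state][symbol]
        finals.add(state)
    return delta, finals


def between_final_substrings(delta, finals, max_len=6):
    """Labels of paths that leave a final state and reach another final
    state without passing through any final state in between.

    max_len bounds the search, since an automaton with merged states may
    contain cycles."""
    found = set()
    for start in finals:
        stack = [(start, "")]
        while stack:
            state, label = stack.pop()
            if len(label) >= max_len:
                continue
            for symbol, nxt in delta.get(state, {}).items():
                extended = label + symbol
                if nxt in finals:
                    found.add(extended)
                else:
                    stack.append((nxt, extended))
    return found


delta, finals = prefix_tree_acceptor(["or", "oraiin", "orar"])
print(sorted(between_final_substrings(delta, finals)))  # ['aiin', 'ar']
```

For Karl's setting, the input would be only the frequent words, e.g. [w for w, c in Counter(words).items() if c >= 3] with collections.Counter.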

Unfortunately, if this is a verbose cipher then it looks as if the construction
of the letter combinations was not terribly orthogonal (for want of a better
word). My current best guess involves [OA][EMRNJ] (i.e., OE, AE, OM, AM, etc.)
+ A[TD6] + A, B, E, F, M, O, R, U, 2, 4, 8 and 9. (This assumes V = B and P = F
as per D'Imperio's speculation based on the one repeated marginal sequence and
that Q, W, X, and Y map to P, B, F, and V when the surrounding S is removed.)
Unfortunately, F, 2, and 9 are too common with this mapping, and the entropy is
still "too low."
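Karl doesn't say which entropy measure he means. As an assumption, here are the two usual candidates for checking whether a decoded text looks like natural language, both measured in bits:

- First-order (single-letter) entropy.
- Conditional entropy of a letter given the previous one.

```python
import math
from collections import Counter


def unigram_entropy(text):
    """First-order entropy: bits per symbol from single-symbol frequencies."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(text):
    """Entropy of a symbol given the previous symbol, H(X_k | X_{k-1}), in bits."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (prev, _nxt), n in pairs.items():
        h -= (n / total) * math.log2(n / firsts[prev])
    return h


print(unigram_entropy("abab"), conditional_entropy("abab"))  # 1.0 0.0
```

The toy string shows why the second measure matters. The letters are evenly used (1 bit each), but each letter fully predicts the next (0 bits).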

Karl

From jim@mail.rand.org  Wed May 10 04:35:23 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id EAA37372
	for <reeds@fry.research.att.com>; Wed, 10 May 2000 04:35:23 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id A0D9F1E00C; Wed, 10 May 2000 04:35:18 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id 1AAE41E018
	for <reeds@research.att.com>; Wed, 10 May 2000 04:35:08 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id BAA17197; Wed, 10 May 2000 01:34:35 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id BAA13106; Wed, 10 May 2000 01:34:34 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id BAA20459 for <voynich@rand.org>; Wed, 10 May 2000 01:34:26 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id BAA13096 for <voynich@rand.org>; Wed, 10 May 2000 01:34:25 -0700 (PDT)
Received: from sun7.bham.ac.uk (sun7.bham.ac.uk [147.188.128.108]) by mail01-lax.pilot.net with ESMTP id BAA17170 for <voynich@rand.org>; Wed, 10 May 2000 01:34:24 -0700 (PDT)
Received: from ds13.bham.ac.uk ([147.188.72.20] helo=golem)
	by sun7.bham.ac.uk with esmtp (Exim 3.03 #1)
	id 12pRxX-0003kn-00
	for voynich@rand.org; Wed, 10 May 2000 09:35:11 +0100
From: "Gabriel Landini" <G.Landini@bham.ac.uk>
Organization: The University of Birmingham, UK.
To: voynich@rand.org
Date: Wed, 10 May 2000 09:33:02 +0100
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: Concensus transcription?
Reply-To: G.Landini@bham.ac.uk
In-reply-to: <200005100325.XAA13371@yarf.eecs.umich.edu>
References: <39175DA3.114E426F@micro-net.com> (message from Dennis on Mon, 08	May 2000 19:36:51 -0500)
X-mailer: Pegasus Mail for Win32 (v3.12a)
Message-Id: <E12pRxX-0003kn-00@sun7.bham.ac.uk>
Sender: jim@mail.rand.org
Status: OR

On 9 May 00, at 23:25, Karl Kluge wrote:
> What you in fact get with the Biol B vocab
> list is a list of substrings longer than the input word list (!), and
> it is clear that Currier C, S, and Z are to blame.

I still do not understand this fully, but it seems interesting. What 
happens if you analyse the "dain-daiin" encoding?

Cheers,

Gabriel



From jim@mail.rand.org  Thu May 11 20:34:57 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id UAA01168
	for <reeds@fry.research.att.com>; Thu, 11 May 2000 20:34:56 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 704C84CE2E; Thu, 11 May 2000 20:34:56 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id D10B74CE2C
	for <reeds@research.att.com>; Thu, 11 May 2000 20:34:55 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id RAA29121; Thu, 11 May 2000 17:25:51 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id RAA27947; Thu, 11 May 2000 17:25:50 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id RAA13657 for <voynich@rand.org>; Thu, 11 May 2000 17:25:44 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id RAA27930 for <voynich@rand.org>; Thu, 11 May 2000 17:25:43 -0700 (PDT)
Received: from hme0.mailrouter01.sprint.ca (hme0.mailrouter01.sprint.ca [207.107.250.16]) by mail03-lax.pilot.net with ESMTP id RAA19535 for <voynich@rand.org>; Thu, 11 May 2000 17:25:42 -0700 (PDT)
Received: from outlander (spc-isp-ott-uas-24-35.sprint.ca [209.148.160.236])
	by hme0.mailrouter01.sprint.ca (8.8.8/8.8.8) with SMTP id UAA02781
	for <voynich@rand.org>; Thu, 11 May 2000 20:25:16 -0400 (EDT)
Message-ID: <000501bfbba9$3d0a1000$eca094d1@outlander>
Reply-To: "John Grove" <4groves@sprint.ca>
From: "John Grove" <4groves@sprint.ca>
To: "Voynich List" <voynich@rand.org>
References: <391B9DBF.87F@alphalink.com.au>
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
Date: Thu, 11 May 2000 20:28:57 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: jim@mail.rand.org
Status: OR

Well, things are slow on the VMS front, so I'll bite, for what it's worth.

    I'm not a statistician, but it doesn't seem to me that one all-encompassing
formula can be applied to every undeciphered script. Both the Egyptian
hieroglyphic signs and a large corpus of texts survive, yet they weren't
deciphered until a short text (the Rosetta Stone) was found carrying three
versions of the same message.

    Secondly, consider applying a numerical cipher to an undeciphered script:
you produce something like 44 12 89 31 56 10 3 92 1 33 31 56 12 89 31 56 44 1 12
18 44 etc... So really, you only have the characters 0-9, but their frequency
of use will lead to the discovery of the text without having to have 100
characters. This assumes the analyst takes a stab at the underlying
language and knows the frequency of character usage. Additionally, if our
script is written with actual word separation, as an undeciphered 'natural'
language would be, then we have added information that the all-encompassing
formula hasn't taken into consideration: 441289315610392 133 315612
893156441121844 etc...
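The frequency argument can be illustrated with a few lines of Python. The code counts the code groups in the example sequence and the runs of groups that repeat.

Splitting on spaces is an assumption. The groups here are one or two digits long (note "3" and "1"), so once the spaces are removed the grouping becomes ambiguous.

```python
from collections import Counter

# The example sequence from the message, code groups separated by spaces.
SAMPLE = "44 12 89 31 56 10 3 92 1 33 31 56 12 89 31 56 44 1 12 18 44"


def group_frequencies(ciphertext):
    """Count how often each space-separated code group occurs."""
    return Counter(ciphertext.split())


def repeated_ngrams(ciphertext, n=2, min_count=2):
    """Return runs of n consecutive code groups seen at least min_count times."""
    groups = ciphertext.split()
    runs = Counter(tuple(groups[i:i + n]) for i in range(len(groups) - n + 1))
    return {run: count for run, count in runs.items() if count >= min_count}


print(group_frequencies(SAMPLE).most_common(4))
print(repeated_ngrams(SAMPLE))
```

On this sample, the run "31 56" turns up three times. Repeated runs like that are the kind of foothold an analyst guessing at the underlying language would start from.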

    This may be a poor example, but it seems to me that the Squaring of the
Character Set has nothing to do with the decipherability of any given text.
As an experiment, perhaps your friend might give a study group a short
simple-substitution text in a foreign language, written in an unfamiliar
script, and ask them to recover the text. For example, tell the students
that the language is a Romance language written in the Cyrillic script:
can they discover which Romance language it is and what the text means?
Try varying the length of the text, some above the Square and some below
it. See if anyone can solve the shorter ones, and offer bonus marks for
the shortest text deciphered!

    Just a thought!

    John.

----- Original Message -----
From: Jacques Guy <jguy@alphalink.com.au>
To: <voynich@rand.org>
Sent: Friday, May 12, 2000 1:59 AM
Subject: John Chadwick (Linear B) of corpus size. Comments invited.


> I have been corresponding with Andrew Robinson, who is
> literary editor for the (London) Times Higher Education
> Supplement and is working on a book on undeciphered
> scripts.
>
> In the introduction he writes this about the amount of
> corpus necessary for decipherment, comparing the Phaistos
> Disk (45 different signs) with Linear B and Linear
> A:
>
> "[John] Chadwick was not saying that if you had n
> squared characters, a script would definitely be
> decipherable. But he was suggesting that since 45
> squared equals 2025, and this number is almost ten
> times 250, a Phaistos disc decipherment is currently
> impossible. There were several tens of thousands of
> characters of Linear B available to Ventris--which is
> several times bigger than 7569, the square of 87, the
> number of basic Linear B signs. Linear A, with about
> 7500 characters and perhaps 100 signs (n squared equals
> 10,000), is correspondingly less likely to be
> deciphered."
>
>
> I answered:
>
>
> "No, that statistic is meaningless. Imagine a
> writing system with 32 symbols; n=32, and n squared
> is 32x32=1024; call that the minimum amount of text for
> decipherability. Now, encode that same script into a
> binary system (e.g. a => 00000, b => 00001...), and
> translate your corpus into it. The information is
> preserved. You now have 2 symbols. The minimum amount
> of text for decipherability is now... 4 symbols long!
> (Not enough to fit a single one of the 32.)"
>
>
> And he asked back:
>
> "I am going to try Chadwick's 'formula' on Elisabeth
> Barber ("Archaeological Decipherment") and Whitfield
> Diffie (cryptographer, whose wife is an Egyptologist).
> But can you just clarify for me--you are saying that
> Chadwick's idea is wrong, not just that my numbers are
> wrong--yes?"
>
> Yes, my point was that Chadwick's formula is dead
> wrong. However, I would like other opinions. I know, I
> know, over the years, we have thrashed this matter to
> the death. Robinson's book will be aimed at the general
> public (he previously wrote "The Story of Writing"
> (Thames and Hudson, 1995)), and I would hate to see
> nonsense like "Chadwick's formula" fed to a wide
> readership. IF it is nonsense. I think it is, but I
> prefer not to trust my judgment. Comments, everybody?
>
>
> Frogguy
>
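The binary re-encoding counter-example can be checked mechanically.

- The sketch assumes an arbitrary 32-symbol alphabet (26 letters plus six digits, chosen only for illustration).
- It re-encodes text as fixed-width binary codes and decodes it again.
- Since 2^5 = 32, each symbol needs 5 bits. The "n squared" threshold for the binary version (2^2 = 4 symbols) is therefore shorter than a single encoded character.
- The round trip shows that no information is lost.

```python
import string

# Hypothetical 32-symbol alphabet (26 letters + 6 digits), for illustration only.
ALPHABET = string.ascii_lowercase + "012345"
BITS = (len(ALPHABET) - 1).bit_length()  # 5 bits, since 2**5 == 32

ENCODE = {sym: format(i, f"0{BITS}b") for i, sym in enumerate(ALPHABET)}
DECODE = {code: sym for sym, code in ENCODE.items()}


def to_binary(text):
    """Re-encode text over the 32-symbol alphabet as a string of 0s and 1s."""
    return "".join(ENCODE[ch] for ch in text)


def from_binary(bits):
    """Invert to_binary by reading fixed-width BITS-long codes."""
    return "".join(DECODE[bits[i:i + BITS]] for i in range(0, len(bits), BITS))


def n_squared(n_signs):
    """The 'n squared' corpus size attributed to Chadwick."""
    return n_signs ** 2


text = "phaistos"
binary = to_binary(text)
assert from_binary(binary) == text           # no information lost
print(n_squared(len(ALPHABET)), len(text))   # 1024 8
print(n_squared(2), len(binary))             # 4 40
```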

From jim@mail.rand.org  Thu May 11 22:29:47 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id WAA32337
	for <reeds@fry.research.att.com>; Thu, 11 May 2000 22:29:47 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 078BE4CE27; Thu, 11 May 2000 22:29:47 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-blue.research.att.com (Postfix) with ESMTP id 65B9B4CE21
	for <reeds@research.att.com>; Thu, 11 May 2000 22:29:46 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id TAA16508; Thu, 11 May 2000 19:27:36 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id TAA01985; Thu, 11 May 2000 19:27:35 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id TAA20553 for <voynich@rand.org>; Thu, 11 May 2000 19:27:29 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id TAA01971 for <voynich@rand.org>; Thu, 11 May 2000 19:27:29 -0700 (PDT)
Received: from smtppop3.gte.net (smtppop3.gte.net [207.115.153.22]) by mail02-lax.pilot.net with ESMTP id TAA18666 for <voynich@rand.org>; Thu, 11 May 2000 19:27:28 -0700 (PDT)
Received: from gte.net (1Cust239.tnt1.honolulu.hi.da.uu.net [63.28.92.239])
	by smtppop3.gte.net  with ESMTP
	for <voynich@rand.org>; id VAA23118358
	Thu, 11 May 2000 21:26:30 -0500 (CDT)
Message-ID: <391B6D3A.9C33D2F0@gte.net>
Date: Thu, 11 May 2000 16:32:28 -1000
From: Brian Eric Farnell <bfarnell@gte.net>
X-Mailer: Mozilla 4.72 [en] (Win98; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Voynich List <voynich@rand.org>
Subject: Specialty words
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: jim@mail.rand.org
Status: OR

    I haven't had a chance to catch up on 8 years of e-mail, so my ideas
may be old hat, but here goes.  I'm a Chinese linguist with some basic
experience in cryptography.  I don't have the software or know-how to do
this stuff, but (from my Chinese experience) I would approach the
manuscript through its words instead of its symbols.  To do this we would
have to assume (on shaky ground) that the spaces correspond to word
breaks.

    Once we do this and have a baseline word frequency (already done, I
understand), an effort should be made to break the text as well as
possible into subjects.  Perhaps the pictures don't delineate changes in
subject; perhaps they do.  If they do, it should be possible to compare
each section's word frequencies with those of the whole and thus produce
a list of 'specialty words'.  In an astronomy text one can reasonably
expect to see words like 'star' and 'planet' far more often than in a
biology book.  Once a list of words with a high probability of being
subject-specific has been determined, then consider the cipher among
those words only.  Look for things like 'star' having the pattern ABCD,
which makes 'planet' EFCGHB, because the two share the letters 'a' and
't'.  Comparing these similarities among the specialty words with lists
of specialty words generated in candidate languages should help identify
the language.  Words in which a letter occurs twice, or which start and
end with the same letter, are a gold mine in this deciphering technique.

    Some sections would be more fruitful than others.  Some subjects lend
themselves to flowery descriptions and metaphysical allusions, but stuff
that's very hands-on should be written in ordinary language as a matter
of habit.  Recipes, for instance, are likely to contain a very high
frequency of 'measure words' that you won't find anywhere else.

    This method also has a good chance of correctly identifying the
language even if Voynich is written in an obscure regional dialect, or
even by someone poorly schooled in the language he was writing in.  This
is because of principles like those set forth in Grimm's law.  Jacob
Grimm studied the Germanic languages and found that sound changes follow
regular patterns.  Rules like his turn translation from one Germanic
language to another into a substitution cipher (to oversimplify things).
For instance, German 't' regularly corresponds to English 'd', so German
'Tag' matches English 'day', where the final 'g' has softened to 'y'.  I
saw a demonstration of this in my German class years ago: given a long
German passage that none of us had a clue about, and then a set of
rules, we were able to translate it easily into something akin to Old
English, which we then had no trouble understanding.

    This method could very possibly produce what seems to be a positive
'hit' on one language for the specialty words and then seem to fail on
the rest of the manuscript.  My personal assumption is that Voynich
looks statistically odd because it is written in two or more languages.
I'm not talking about the differences between Voynich A and B, but the
idea that a lot of 'scholarly words' in the text might be something like
Latin or Greek while the rest could be in a common language, similar to
any medical or legal textbook you find at modern universities.  The
differences between Voynich A and B may be due to a difference in
classical education.  Again, very practical sections of the manuscript
are the gold mine; they are likely to have far fewer words that aren't
in the common language.  'Old habits die hard' would be a saving grace
in this case.

    If this sort of analysis were attempted, I think the word lists made
from candidate languages should also be compared with their endings
dropped.  The words in Voynich seem too short, and the little I've seen
shows far too many common letter sequences at the beginnings of words;
these combinations look like verb or noun endings to me.  They may have
been chopped off and attached to nonsense letters or to nulls.  Any
thoughts?

Respectfully,
Brian Farnell
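The two steps above can be sketched in Python: flagging section-specific words, and comparing letter-repetition patterns. Both functions are hypothetical helpers.

- The ratio cutoff and minimum count are arbitrary assumptions.
- joint_patterns labels letters consistently across a list of words, so shared letters line up. This reproduces the 'star' = ABCD, 'planet' = EFCGHB example.

```python
from collections import Counter


def specialty_words(section_words, all_words, min_count=3, min_ratio=3.0):
    """Words whose relative frequency in one section is at least min_ratio
    times their relative frequency in the whole text.

    section_words must be drawn from all_words, so every section word has
    a nonzero count in the whole."""
    section, whole = Counter(section_words), Counter(all_words)
    n_section, n_all = len(section_words), len(all_words)
    ratios = {}
    for word, count in section.items():
        if count < min_count:
            continue  # too rare to judge
        ratio = (count / n_section) / (whole[word] / n_all)
        if ratio >= min_ratio:
            ratios[word] = ratio
    return ratios


def joint_patterns(words):
    """Label letters A, B, C, ... in order of first appearance, sharing the
    labels across all the words so common letters line up."""
    labels = {}
    patterns = []
    for word in words:
        out = []
        for letter in word:
            if letter not in labels:
                labels[letter] = chr(ord("A") + len(labels))
            out.append(labels[letter])
        patterns.append("".join(out))
    return patterns


print(joint_patterns(["star", "planet"]))  # ['ABCD', 'EFCGHB']
```

Running joint_patterns on a pair of Voynich specialty words and on a pair of candidate-language words gives two sets of patterns to compare. Matching patterns would be a hint worth checking, not proof.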


From jim@mail.rand.org  Thu May 11 23:03:58 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id XAA90322
	for <reeds@fry.research.att.com>; Thu, 11 May 2000 23:03:57 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id B22241E010; Thu, 11 May 2000 23:03:42 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-green.research.att.com (Postfix) with ESMTP id 141201E008
	for <reeds@research.att.com>; Thu, 11 May 2000 23:03:42 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id UAA25523; Thu, 11 May 2000 20:03:14 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id UAA02815; Thu, 11 May 2000 20:03:14 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id UAA23322 for <voynich@rand.org>; Thu, 11 May 2000 20:03:09 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id UAA02805 for <voynich@rand.org>; Thu, 11 May 2000 20:03:08 -0700 (PDT)
Received: from mtiwmhc23.worldnet.att.net (mtiwmhc23.worldnet.att.net [204.127.131.48]) by mail01-lax.pilot.net with ESMTP id UAA01476 for <voynich@rand.org>; Thu, 11 May 2000 20:03:07 -0700 (PDT)
Received: from worldnet.att.net ([12.79.22.175])
          by mtiwmhc23.worldnet.att.net
          (InterMail vM.4.01.02.39 201-229-119-122) with ESMTP
          id <20000512030236.NHBY3646.mtiwmhc23.worldnet.att.net@worldnet.att.net>;
          Fri, 12 May 2000 03:02:36 +0000
Message-ID: <391B75B8.70A3F00A@worldnet.att.net>
Date: Thu, 11 May 2000 23:08:40 -0400
From: John Stojko <oko@worldnet.att.net>
X-Mailer: Mozilla 4.5 [en]C-WNS5.0  (Win95; I)
X-Accept-Language: en,uk
MIME-Version: 1.0
To: Brian Eric Farnell <bfarnell@gte.net>, voynich@rand.org
Subject: Re: Specialty words
References: <391B6D3A.9C33D2F0@gte.net>
Content-Type: text/plain; charset=x-user-defined
Content-Transfer-Encoding: 7bit
Sender: jim@mail.rand.org
Status: OR

Hi Brian,

I like your approach to solving the VMS puzzle.
I arrived at my solution in almost the same way, which
I call brute force.
Visit my Home Page http://home.att.net/~oko/home.htm

John

Brian Eric Farnell wrote:
> 
>     I haven't had a chance to catch up on 8 years of e-mail, so my ideas
> may be old hat but here goes.  I'm a Chinese linguist with some basic
> experience in cryptography.  I don't have the software or know-how to do
> this stuff, but I would (from my Chinese experience) approach the
> manuscript's words instead of symbols.  To do this we would have to
> assume (on shaky ground) that the spaces correspond to word breaks.
> Once we do this and have a baseline word frequency (already done I
> understand) an effort should be made to break the text as well as
> possible into subjects.  Perhaps the pictures don't delineate changes in
> subjects; perhaps they do.  If they do, it should be possible to
> compare those sections' word frequencies to the whole and thus produce a
> list of 'specialty words'.  In an astronomy text one can reasonably
> expect to see words like 'star' and 'planet' far more often than in a
> biology book.  Once a list of words with a high probability of being
> subject-specialized has been determined, then consider the cipher
> amongst those words only.  Look for things like 'star' being in the
> pattern ABCD, which makes 'planet' EFCGHB, because of the sharing of the
> letters 'a' and 't' amongst the two.  Comparing these similarities in
> specialty words to lists of specialty words generated in languages
> considered possible 'hits' should help identify the language. Words where a
> letter occurs twice or start and end with the same letter are a gold
> mine in this deciphering technique. Some sections would be more fruitful
> than others.  Some subjects lend themselves to flowery descriptions and
> metaphysical allusions, but stuff that's very hands-on should be written
> in ordinary language as a matter of habit.  Recipes for instance are
> likely to contain a very high frequency of 'measure words' that you
> won't find anywhere else.  This method also has a high probability of
> correctly identifying the language even if Voynich is written in an
> obscure regional dialect, or even written by someone improperly
> schooled in the language he was writing in.  This is because of
> principles set forth in Grimm's law.  They (the Grimm brothers) studied
> Germanic languages and discovered that languages shift and change in
> regular patterns.  They set forth rules that turn translation of one
> Germanic language to another into a substitution cipher (to oversimplify
> things).  For instance English is much softer than German, the German
> 'tag' becomes the English 'day' as the less harsh tongue takes the 't'
> into a 'd' and gets lazy on the endings, dropping hard 'g's in favor of
> the less voiced 'y' modification.  I saw a demonstration of this in my
> German class years ago: we were given a long German passage that none of us had
> a clue about; then, given a set of rules, we were able to translate it
> easily into something akin to Old English and then had no trouble
> understanding.  This method could very possibly produce what seems to be
> a positive 'hit' on one language for the specialty words and then seems
> to fail the rest of the manuscript.  My personal assumption is that
> Voynich statistically looks funny because it is written in two or more
> languages.  I'm not talking about the differences in Voynich A and B,
> but the idea that a lot of 'scholarly words' in the text might be
> something like Latin or Greek while the rest could be in a common
> language, similar to any medical or legal textbook you find at modern
> universities.  The differences in Voynich A and B may be due to a
> difference in classical education.  Again, very practical sections of
> the manuscript are the gold mine, they are likely to have far fewer
> words that aren't in the common language.  'Old habits die hard' in this
> case would be a saving grace.  I think if this sort of analysis were to
> be attempted, dropping the endings from the words in the lists made
> from possible languages should also be done for comparison.  The words
> in Voynich seem too short and the little I've seen shows way too many
> common letter sequences at the beginning of words; these combinations
> look like verb or noun endings to me.  They may have been chopped off
> and added to non-sense letters or to nulls.  Any thoughts?
> 
> Respectfully,
> Brian Farnell
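The two mechanical steps in Brian's proposal, picking out 'specialty words' that are over-represented in one section and writing words as letter patterns that share symbols across words (his 'star' = ABCD, 'planet' = EFCGHB), can be sketched in Python. This is a minimal sketch, assuming the text is already transcribed and split into words on spaces; `specialty_words`, `joint_patterns`, and the thresholds `min_count` and `min_ratio` are hypothetical names and values chosen for illustration.

```python
from collections import Counter
from string import ascii_uppercase


def specialty_words(section, corpus, min_count=3, min_ratio=2.0):
    """Words whose relative frequency in `section` is at least `min_ratio`
    times their relative frequency in the whole `corpus`.

    Both arguments are lists of word tokens; `section` is assumed to be a
    slice of `corpus`, so every section word also occurs in the corpus.
    """
    sec_counts, all_counts = Counter(section), Counter(corpus)
    sec_total, all_total = len(section), len(corpus)
    result = []
    for word, count in sec_counts.items():
        if count < min_count:
            continue  # too rare in this section to say anything
        ratio = (count / sec_total) / (all_counts[word] / all_total)
        if ratio >= min_ratio:
            result.append((word, ratio))
    # Most over-represented words first
    return sorted(result, key=lambda pair: -pair[1])


def joint_patterns(words):
    """Write words as letter patterns with ONE mapping shared by all words,
    so a letter gets the same symbol wherever it recurs.
    Supports at most 26 distinct letters (A-Z)."""
    mapping = {}
    patterns = []
    for word in words:
        symbols = []
        for letter in word:
            if letter not in mapping:
                mapping[letter] = ascii_uppercase[len(mapping)]
            symbols.append(mapping[letter])
        patterns.append("".join(symbols))
    return patterns


# Brian's example: 'a' and 't' are shared, so they get the same symbols
print(joint_patterns(["star", "planet"]))  # ['ABCD', 'EFCGHB']
```

Pattern strings computed this way for Voynich specialty words and for candidate-language word lists can be compared directly; positional glyph variants, if the script has them, would break such matches.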

From jim@mail.rand.org  Thu May 11 19:05:13 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id TAA28635
	for <reeds@fry.research.att.com>; Thu, 11 May 2000 19:05:13 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id F261F4CE09; Thu, 11 May 2000 19:05:08 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id 61E834CE05
	for <reeds@research.att.com>; Thu, 11 May 2000 19:04:57 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id QAA25679; Thu, 11 May 2000 16:02:11 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id QAA20181; Thu, 11 May 2000 16:02:10 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id QAA04185 for <voynich@rand.org>; Thu, 11 May 2000 16:01:13 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id QAA20127 for <voynich@rand.org>; Thu, 11 May 2000 16:01:12 -0700 (PDT)
Received: from mail.alphalink.com.au (mail.alphalink.com.au [203.24.205.7]) by mail03-lax.pilot.net with ESMTP id QAA20402 for <voynich@rand.org>; Thu, 11 May 2000 16:01:10 -0700 (PDT)
Received: from LOCALNAME (d22-as7-mel.alphalink.com.au [202.161.96.213])
	by mail.alphalink.com.au (8.9.3/8.9.3) with SMTP id JAA00582
	for <voynich@rand.org>; Fri, 12 May 2000 09:01:00 +1000
Message-ID: <391B9DBF.87F@alphalink.com.au>
Date: Thu, 11 May 2000 22:59:27 -0700
From: Jacques Guy <jguy@alphalink.com.au>
Reply-To: jguy@alphalink.com.au
X-Mailer: Mozilla 3.01 (Win16; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: John Chadwick (Linear B) of corpus size. Comments invited.
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: jim@mail.rand.org
Status: OR

I have been corresponding with Andrew Robinson, who is
literary editor for the (London) Times Higher Education
Supplement and is working on a book on undeciphered
scripts.

In the introduction he writes this about the size of
corpus necessary for decipherment, comparing the Phaistos
Disc (45 different signs) with Linear B and Linear
A:

"[John] Chadwick was not saying that if you had n
squared characters, a script would definitely be
decipherable. But he was suggesting that since 45
squared equals 2025, and this number is almost ten
times 250, a Phaistos disc decipherment is currently
impossible. There were several tens of thousands of
characters of Linear B available to Ventris--which is
several times bigger than 7569, the square of 87, the
number of basic Linear B signs. Linear A, with about
7500 characters and perhaps 100 signs (n squared equals
10,000), is correspondingly less likely to be
deciphered."


I answered:


"No, that statistic is meaningless. Imagine a
writing system with 32 symbols; n=32, and n squared is
32x32=1024; call that the minimum amount of text for
decipherability. Now encode that same script into a
binary system (e.g. a => 00000, b => 00001...; five
bits are needed for 32 symbols), and translate your
corpus into it. The information is preserved, but you
now have only 2 symbols. The minimum amount of text for
decipherability is now... 4 symbols long! (Not enough
to encode even a single one of the 32.)"
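Jacques's counterexample can be checked mechanically. A minimal Python sketch, assuming a made-up 32-sign script (`ALPHABET` is hypothetical: 26 Latin letters plus six punctuation signs) and a fixed-width 5-bit code; `to_binary`, `from_binary`, and `chadwick_minimum` are illustrative names. It shows that the re-encoding loses nothing, while the n-squared figure drops from 1024 to 4, fewer than the 5 bits a single sign occupies.

```python
import math
from string import ascii_lowercase

ALPHABET = ascii_lowercase + "!?#$%&"        # 32 hypothetical signs
BITS = math.ceil(math.log2(len(ALPHABET)))   # 5 bits per sign


def to_binary(text):
    """Re-encode text over ALPHABET as a string over {0, 1}, BITS per sign."""
    return "".join(format(ALPHABET.index(ch), f"0{BITS}b") for ch in text)


def from_binary(bits):
    """Invert to_binary by reading BITS bits at a time."""
    return "".join(ALPHABET[int(bits[i:i + BITS], 2)]
                   for i in range(0, len(bits), BITS))


def chadwick_minimum(n_signs):
    """The 'n squared' corpus size attributed to Chadwick in the quote."""
    return n_signs ** 2


text = "hello!"
print(from_binary(to_binary(text)) == text)       # information preserved
print(chadwick_minimum(32), chadwick_minimum(2), BITS)
```

The round trip is the point: nothing is lost in the binary version, yet the rule's "minimum corpus" collapses below the length of one encoded sign.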


And he asked back:

"I am going to try Chadwick's 'formula' on Elisabeth
Barber ("Archaeological Decipherment") and Whitfield
Diffie (cryptographer, whose wife is an Egyptologist).
But can you just clarify for me--you are saying that
Chadwick's idea is wrong, not just that my numbers are
wrong--yes?"

Yes, my point was that Chadwick's formula is dead
wrong. However, I would like other opinions. I know, I
know, over the years, we have thrashed this matter to
the death. Robinson's book will be aimed at the general
public (he had previously written "The Story of Writing",
Thames and Hudson, 1995), and I would hate to see
nonsense like "Chadwick's formula" fed to a wide
readership. IF it is nonsense. I think it is, but I
prefer not to trust my judgment. Comments, everybody?


Frogguy

From jim@mail.rand.org  Fri May 12 05:26:26 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id FAA10561
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 05:26:25 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id C69A34CE09; Fri, 12 May 2000 05:26:25 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-blue.research.att.com (Postfix) with ESMTP id 5160D4CE22
	for <reeds@research.att.com>; Fri, 12 May 2000 05:26:25 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id CAA26549; Fri, 12 May 2000 02:25:57 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id CAA12069; Fri, 12 May 2000 02:25:57 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id CAA06133 for <voynich@rand.org>; Fri, 12 May 2000 02:25:36 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id CAA12039 for <voynich@rand.org>; Fri, 12 May 2000 02:25:35 -0700 (PDT)
Received: from sun7.bham.ac.uk (sun7.bham.ac.uk [147.188.128.108]) by mail03-lax.pilot.net with ESMTP id CAA18174 for <voynich@rand.org>; Fri, 12 May 2000 02:25:34 -0700 (PDT)
Received: from ds13.bham.ac.uk ([147.188.72.20] helo=golem)
	by sun7.bham.ac.uk with esmtp (Exim 3.03 #1)
	id 12qBhu-0003bf-00; Fri, 12 May 2000 10:26:07 +0100
From: "Gabriel Landini" <G.Landini@bham.ac.uk>
Organization: The University of Birmingham, UK.
To: jguy@alphalink.com.au, voynich@rand.org
Date: Fri, 12 May 2000 10:23:56 +0100
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
Reply-To: G.Landini@bham.ac.uk
In-reply-to: <391B9DBF.87F@alphalink.com.au>
X-mailer: Pegasus Mail for Win32 (v3.12a)
Message-Id: <E12qBhu-0003bf-00@sun7.bham.ac.uk>
Sender: jim@mail.rand.org
Status: OR

On 11 May 00, at 22:59, Jacques Guy wrote:
> I would hate to see
> nonsense like "Chadwick's formula" fed to a wide
> readership. IF it is nonsense. I think it is, but I
> prefer not to trust my judgment. Comments, everybody?

I think that your example of a binary representation is an excellent way to 
show that "a character" in an unknown script is a very non-intuitive 
notion. As a consequence, Chadwick's formula may not apply 
because of the decipherer-to-be's inability to retrieve the character 
set. Furthermore, this is similar to treating each stroke of the Roman 
alphabet as a character, or to Stolfi's superanalytical alphabet.

Where does Chadwick's formula come from? I have no idea.
If we imagine that we want to be sure to have read all alphabet 
characters at least once, and their distribution is flat (and they 
appear in random order), then this may be somewhat related to the 
"collector's dilemma", usually called the coupon collector's problem 
(I can't remember the formulation right now, but I am sure it includes 
Euler's constant): how many items do you have to collect before your 
collection is complete?

Of course this is not the case in languages, since the character 
distribution is not flat, etc., but I wonder whether the size of the 
corpus you need to make sure that everything has appeared at least 
once could be calculated by accounting for the shape of the 
distribution of the characters.
But! This would be known only *after* we know what a character is 
and the size of the alphabet. In turn, this will depend on what we call 
a character, and therefore the required size of the corpus may differ.
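The flat case of Gabriel's question has a standard answer: the expected number of characters read before all n equally likely signs have appeared is n times the n-th harmonic number, roughly n ln n + 0.577n, about 200 for 45 signs against 45^2 = 2025. A minimal Python sketch, assuming signs are drawn independently of one another (real text is not); the Monte Carlo function handles the non-flat case by weighting the draws. The function names are hypothetical.

```python
import bisect
import itertools
import random


def expected_draws_uniform(n):
    """Expected characters read until all n equally likely signs have
    appeared at least once: n * H_n (the coupon collector's problem)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def simulated_draws(weights, trials=2000, seed=1):
    """Monte Carlo estimate of the same quantity when sign i is drawn
    independently with probability proportional to weights[i].
    All weights must be positive, or the loop never finishes."""
    rng = random.Random(seed)
    cumulative = list(itertools.accumulate(weights))
    total_weight = cumulative[-1]
    n = len(weights)
    total_draws = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        while len(seen) < n:
            # Pick a sign index by inverting the cumulative distribution
            i = bisect.bisect_right(cumulative, rng.random() * total_weight)
            seen.add(min(i, n - 1))  # guard against floating-point round-off
            draws += 1
        total_draws += draws
    return total_draws / trials


# Flat case for the Phaistos sign count versus the n-squared rule
print(round(expected_draws_uniform(45)), 45 ** 2)
# Zipf-like (skewed) weights need more text than a flat distribution
zipf = [1.0 / rank for rank in range(1, 11)]
print(simulated_draws([1.0] * 10), simulated_draws(zipf))
```

As Gabriel notes, the weights themselves depend on what is counted as a character, so this only moves the question back one step.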

Does this mean that we will never be certain of what a character is 
in an unknown script?

I am more confused now... :-/
Gabriel

From jim@mail.rand.org  Fri May 12 06:47:15 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id GAA55259
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 06:47:15 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 17EE41E009; Fri, 12 May 2000 06:47:15 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id 9EAC81E008
	for <reeds@research.att.com>; Fri, 12 May 2000 06:47:14 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id DAA28399; Fri, 12 May 2000 03:46:50 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id DAA13492; Fri, 12 May 2000 03:46:49 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id DAA07578 for <voynich@rand.org>; Fri, 12 May 2000 03:46:45 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id DAA13482 for <voynich@rand.org>; Fri, 12 May 2000 03:46:44 -0700 (PDT)
Received: from flowers.sprint.ca ([207.107.250.66]) by mail03-lax.pilot.net with ESMTP id DAA28386 for <voynich@rand.org>; Fri, 12 May 2000 03:46:44 -0700 (PDT)
Received: from outlander (spc-isp-mtl-58-4-302.sprint.ca [149.99.138.49])
	by flowers.sprint.ca (8.8.8/8.8.8) with SMTP id GAA21315
	for <voynich@rand.org>; Fri, 12 May 2000 06:45:29 -0400 (EDT)
Message-ID: <001101bfbbff$e0a49400$318a6395@outlander>
Reply-To: "John Grove" <4groves@sprint.ca>
From: "John Grove" <4groves@sprint.ca>
To: "Voynich List" <voynich@rand.org>
References: <391B6D3A.9C33D2F0@gte.net>
Subject: Re: Specialty words
Date: Fri, 12 May 2000 06:49:47 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: jim@mail.rand.org
Status: OR


    One of the problems of trying to match word patterns is that the VMS
does not use double characters as frequently as would be expected in any
natural language that I can see. Whether syllabic or alphabetic, you would
expect a certain amount of 'doubling' (unless there is a character that
signifies 'double the preceding letter'). We also have a problem with the
consistent 'end forms', specifically the an, ain, aiin, aiiin types. This
may indicate that a character's shape depends on where it is in the word
(as in Arabic), which once again makes the comparison you suggest
difficult: if STAR is ABCD, but the T is written differently in PLANET,
then you don't really see it as the same character.
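Both of John Grove's observations can be measured on a transcription. A minimal Python sketch, assuming the text is already split into words in some transliteration (the sample words below are EVA-like strings chosen for illustration, not counts from the manuscript); `doubling_rate` and `final_position_share` are hypothetical helpers. Comparing the doubling rate against the same measure on candidate plaintext languages tests the first point; characters whose occurrences are almost all word-final are candidates for the 'end forms' in the second.

```python
from collections import Counter


def doubling_rate(words):
    """Fraction of adjacent character pairs inside words that are the same
    character repeated (e.g. the 'tt' in 'letter')."""
    pairs = doubled = 0
    for word in words:
        for a, b in zip(word, word[1:]):
            pairs += 1
            if a == b:
                doubled += 1
    return doubled / pairs if pairs else 0.0


def final_position_share(words):
    """For each character, the fraction of its occurrences that are
    word-final. Values near 1.0 flag candidate 'end forms'."""
    total, final = Counter(), Counter()
    for word in words:
        total.update(word)
        if word:
            final[word[-1]] += 1
    return {ch: final[ch] / total[ch] for ch in total}


sample = ["daiin", "chol", "shedy", "qokaiin"]
print(doubling_rate(sample))
print(final_position_share(sample))
```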

    John Grove


From jim@mail.rand.org  Fri May 12 07:11:11 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id HAA62549
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 07:11:11 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 417C21E015; Fri, 12 May 2000 07:11:11 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-green.research.att.com (Postfix) with ESMTP id B80091E009
	for <reeds@research.att.com>; Fri, 12 May 2000 07:11:10 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id EAA09431; Fri, 12 May 2000 04:09:22 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id EAA13845; Fri, 12 May 2000 04:09:21 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id EAA07925 for <voynich@rand.org>; Fri, 12 May 2000 04:09:17 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id EAA13835 for <voynich@rand.org>; Fri, 12 May 2000 04:09:16 -0700 (PDT)
Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.7.8]) by mail03-lax.pilot.net with ESMTP id EAA01082 for <voynich@rand.org>; Fri, 12 May 2000 04:09:15 -0700 (PDT)
Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.3/8.9.3) with ESMTP id IAA08573
	for <voynich@rand.org>; Fri, 12 May 2000 08:09:10 -0300 (EST)
Received: from coruja.dcc.unicamp.br (coruja.dcc.unicamp.br [143.106.24.80])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with ESMTP id IAA21636
	for <voynich@rand.org>; Fri, 12 May 2000 08:09:07 -0300 (EST)
Received: (from stolfi@localhost)
	by coruja.dcc.unicamp.br (8.8.5/8.8.5) id IAA24450;
	Fri, 12 May 2000 08:09:07 -0300 (EST)
Date: Fri, 12 May 2000 08:09:07 -0300 (EST)
Message-Id: <200005121109.IAA24450@coruja.dcc.unicamp.br>
From: Jorge Stolfi <stolfi@dcc.unicamp.br>
To: voynich@rand.org
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
In-Reply-To: <391B9DBF.87F@alphalink.com.au>
References: <391B9DBF.87F@alphalink.com.au>
Reply-To: stolfi@dcc.unicamp.br
Sender: jim@mail.rand.org
Status: OR


    > Yes, my point was that Chadwick's formula is dead wrong.
    > However, I would like other opinions. I know, I know, over the
    > years, we have thrashed this matter to the death. .... I would
    > hate to see nonsense like "Chadwick's formula" fed to a wide
    > readership. IF it is nonsense. I think it is, but I prefer not
    > to trust my judgment. Comments, everybody?

I had never heard of "Chadwick's formula", and I can't imagine how it
could be derived. Your binary re-encoding argument is a good point:
at best, the formula needs some special assumptions.

One can define 6 "limiting" types of undeciphered languages, depending
on whether (1) the script, (2) the language, and (3) the meaning of
the corpus texts are known or unknown. Thus Rongorongo, which is
almost surely in the local language, would be of type NYN (letters in
the order script, language, meaning; Y=known, N=not known). Phaistos
and Voynich are NNN; Etruscan would lie somewhere between types YNY
and YNN, etc.

Clearly, the amount of text one needs for successful decipherment
strongly depends on the language's type.  Roughly, in
order of increasing difficulty:

  scr lng mng  example                   decipherment needs:
  --- --- ---  ------------------------  -----------------------------------
   N   Y   Y   Egyptian after Rosetta,   A fairly small corpus, basically 
               almost.                   large enough for each glyph to 
                                         occur at least once.
      
   N   Y   N   Linear B after Ventris's  A somewhat larger corpus, basically
               breakthrough, almost.     large enough for a few dozen function
                                         words and inflections to occur
                                         and be recognized.
      
   Y   N   Y   Etruscan after Pyrgi,     A fairly large corpus, large
               perhaps?                  enough to pinpoint the meaning
                                         of individual words (rather than
                                         whole sentences) and extract a 
                                         basic vocabulary.
      
   N   N   Y   (no idea)                 Basically the same as the previous
                                         case.
                                         
   Y   N   N   Elamite, perhaps?         A very large corpus, large enough
                                         to spot syntactic structures
                                         and reliably guess their meaning.
                                         
   N   N   N   Voynich, Phaistos         Ditto, only harder.
   

(The type YYY means there is no problem to solve, and YYN is nonsensical.)

The last two entries of the table include cases where the language is
actually known but is still unidentified. Thus Linear B and Egyptian
used to be NNN, but once the language was identified they became NYN
and NYY, respectively, and decipherment soon followed. (Hopefully the
same will happen to Voynichese.)

Chadwick's formula does not make sense if it ignores the "little
details" of language and text meaning. Your binary encoding trick may
be said to simplify the script, but it makes the language much more
obscure.

All the best, 

From jim@mail.rand.org  Fri May 12 08:21:42 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id IAA11590
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 08:21:42 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id E75441E010; Fri, 12 May 2000 08:21:41 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id 597831E008
	for <reeds@research.att.com>; Fri, 12 May 2000 08:21:41 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id FAA10658; Fri, 12 May 2000 05:21:11 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id FAA15553; Fri, 12 May 2000 05:21:10 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id FAA09834 for <voynich@rand.org>; Fri, 12 May 2000 05:21:04 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id FAA15533 for <voynich@rand.org>; Fri, 12 May 2000 05:21:03 -0700 (PDT)
Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.7.8]) by mail01-lax.pilot.net with ESMTP id FAA10848 for <voynich@rand.org>; Fri, 12 May 2000 05:20:57 -0700 (PDT)
Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.3/8.9.3) with ESMTP id JAA10731
	for <voynich@rand.org>; Fri, 12 May 2000 09:20:50 -0300 (EST)
Received: from coruja.dcc.unicamp.br (coruja.dcc.unicamp.br [143.106.24.80])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with ESMTP id JAA00725
	for <voynich@rand.org>; Fri, 12 May 2000 09:20:50 -0300 (EST)
Received: (from stolfi@localhost)
	by coruja.dcc.unicamp.br (8.8.5/8.8.5) id JAA24469;
	Fri, 12 May 2000 09:20:50 -0300 (EST)
Date: Fri, 12 May 2000 09:20:50 -0300 (EST)
Message-Id: <200005121220.JAA24469@coruja.dcc.unicamp.br>
From: Jorge Stolfi <stolfi@dcc.unicamp.br>
To: voynich@rand.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
In-Reply-To: <391B9DBF.87F@alphalink.com.au>
References: <391B9DBF.87F@alphalink.com.au>
Reply-To: stolfi@dcc.unicamp.br
Sender: jim@mail.rand.org
Status: OR


PS. By coincidence, two weeks ago I got a copy of Robinson's "The
Story of Writing" from a friend. Jacques's message prompted me to
write down my impressions of the book. If you are interested:

http://www.dcc.unicamp.br/~stolfi/TheStoryOfWriting.html

From jim@mail.rand.org  Fri May 12 11:40:04 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id LAA60791
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 11:40:04 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 2C2E74CE31; Fri, 12 May 2000 11:40:04 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id 53A4D4CE2A
	for <reeds@research.att.com>; Fri, 12 May 2000 11:40:03 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id IAA03368; Fri, 12 May 2000 08:36:41 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id IAA26920; Fri, 12 May 2000 08:36:40 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id IAA25789 for <voynich@rand.org>; Fri, 12 May 2000 08:35:59 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id IAA26870 for <voynich@rand.org>; Fri, 12 May 2000 08:35:58 -0700 (PDT)
Received: from shell18.ba.best.com (root@shell18.ba.best.com [206.184.139.150]) by mail03-lax.pilot.net with ESMTP id IAA06689 for <voynich@rand.org>; Fri, 12 May 2000 08:35:57 -0700 (PDT)
Received: (from kornai@localhost)
	by shell18.ba.best.com (8.9.3/8.9.2/best.sh) id IAA28291;
	Fri, 12 May 2000 08:35:02 -0700 (PDT)
Message-Id: <200005121535.IAA28291@shell18.ba.best.com>
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
In-Reply-To: <391B9DBF.87F@alphalink.com.au> from Jacques Guy at "May 11, 0 10:59:27 pm"
To: jguy@alphalink.com.au
Date: Fri, 12 May 2000 08:35:02 -0700 (PDT)
Cc: voynich@rand.org
Reply-To: andras@kornai.com
From: andras@kornai.com
X-Mailer: ELM [version 2.4ME+ PL38 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: jim@mail.rand.org
Status: OR

Jacques Guy writes:
> Yes, my point was that Chadwick's formula is dead
> wrong. However, I would like other opinions. 

OK, here is my $0.02. The formula may not be as wrong as everybody here
seems to suppose, at least if we approach it with a little goodwill. First,
it should be taken to apply only to "NYN" type languages. Robinson must be
made aware of the incredible importance of getting the language right, but 
once we have that much, it's not that hard to justify the formula. 

The 0th step of the analysis is to arrange the symbols in frequency order.
The 1st step is to construct a grid of what can follow what. As we try to 
grok the pattern, it is this grid that gets rearranged over and over again:
something that is extremely hard to do with higher order statistics because 
we don't really have the visual means to deal with higher dimensional grids. 
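Steps 0 and 1 are easy to mechanize; the laborious part Andras describes, rearranging the grid until a pattern appears, is left to the eye. A minimal Python sketch, assuming the text is a plain string in which every character (including the space) counts as a symbol; `frequency_order`, `bigram_grid`, `reorder`, and `show` are hypothetical helpers, not anyone's published tool.

```python
from collections import Counter


def frequency_order(text):
    """Step 0: symbols from most to least frequent (ties broken alphabetically)."""
    counts = Counter(text)
    return sorted(counts, key=lambda s: (-counts[s], s))


def bigram_grid(text, order):
    """Step 1: grid[i][j] = how often order[j] immediately follows order[i]."""
    index = {s: i for i, s in enumerate(order)}
    grid = [[0] * len(order) for _ in order]
    for a, b in zip(text, text[1:]):
        grid[index[a]][index[b]] += 1
    return grid


def reorder(grid, order, new_order):
    """Rearrange rows and columns together; new_order is a permutation of order."""
    pos = [order.index(s) for s in new_order]
    return [[grid[i][j] for j in pos] for i in pos]


def show(grid, order):
    """Print the grid with row and column labels for inspection by eye."""
    print("    " + " ".join(f"{s!r:>3}" for s in order))
    for s, row in zip(order, grid):
        print(f"{s!r:>3} " + " ".join(f"{c:>3}" for c in row))


sample = "the cat sat on the mat"
order = frequency_order(sample)
show(bigram_grid(sample, order), order)
```

Each trial rearrangement is one call to `reorder` followed by `show`; grouping symbols with similar rows and columns (for instance vowel-like versus consonant-like) is the kind of pattern the eye is looking for.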

Given our human limitations in constructing, displaying, and comprehending 
higher order data, it is likely that 1st order statistics will continue to
play a very significant role in solving the puzzle, even if computers can 
store (and selectively display) the higher order material with ease. 

Now, to fill in a bigram grid with any chance of random fluctuations not
totally overwhelming the true pattern, we need n^2 data points (actually, some
constant times n^2 is better), so there is a fair bit of engineering wisdom 
in the formula.
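One rough way to see where "some constant times n^2" comes from, assuming (unrealistically) that the N bigram tokens of a corpus spread evenly over the n^2 cells of the grid and that each cell count behaves like a Poisson variable with mean lambda, so that its standard deviation is the square root of lambda. Requiring the relative fluctuation to stay below a tolerance epsilon gives:

```latex
\lambda = \frac{N}{n^2}, \qquad
\frac{\sqrt{\lambda}}{\lambda} = \frac{1}{\sqrt{\lambda}} = \frac{n}{\sqrt{N}} \le \varepsilon
\quad\Longrightarrow\quad
N \ge \frac{n^2}{\varepsilon^2}.
```

With epsilon = 1 (noise as large as the typical count) this is exactly N = n^2, a bare minimum; a relative fluctuation of 1/3 already calls for about 9n^2. Real bigram grids are far from uniform, so their rare cells need more text than this estimate suggests.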

This of course applies only to ordinary phoneme- mora- or syllable-based
scripts, where the usual goal (if systems created by a long evolutionary
process can be said to have a goal) is to map sounds to symbols in a simple
fashion. For the VMS, we don't even know whether there is a spoken system
behind it (though I personally strongly suspect there is) and the goal of the
script seems to be to deliberately obscure, rather than plainly present, the
relationship between sounds and symbols (so a corpus larger than n^2 should be
required). 

I think this goes a long way towards explaining why the kind of binary
encoding suggested as a counterexample renders the formula meaningless: once 
such an encoding is performed the relationship between the symbols and the 
sounds is anything but straightforward. 

Andras Kornai

From jim@mail.rand.org  Fri May 12 12:00:24 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id MAA79840
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 12:00:23 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 779B91E018; Fri, 12 May 2000 12:00:22 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id BF8631E008
	for <reeds@research.att.com>; Fri, 12 May 2000 12:00:21 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id IAA16448; Fri, 12 May 2000 08:58:24 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id IAA28835; Fri, 12 May 2000 08:58:23 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id IAA28739 for <voynich@rand.org>; Fri, 12 May 2000 08:58:18 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id IAA28825 for <voynich@rand.org>; Fri, 12 May 2000 08:58:18 -0700 (PDT)
Received: from sun7.bham.ac.uk (sun7.bham.ac.uk [147.188.128.108]) by mail02-lax.pilot.net with ESMTP id IAA24851 for <voynich@rand.org>; Fri, 12 May 2000 08:58:17 -0700 (PDT)
Received: from ds13.bham.ac.uk ([147.188.72.20] helo=golem)
	by sun7.bham.ac.uk with esmtp (Exim 3.03 #1)
	id 12qHqC-0005tk-00
	for voynich@rand.org; Fri, 12 May 2000 16:59:04 +0100
From: "Gabriel Landini" <G.Landini@bham.ac.uk>
Organization: The University of Birmingham, UK.
To: voynich@rand.org
Date: Fri, 12 May 2000 16:56:54 +0100
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
Reply-To: G.Landini@bham.ac.uk
In-reply-to: <200005121535.IAA28291@shell18.ba.best.com>
References: <391B9DBF.87F@alphalink.com.au> from Jacques Guy at "May 11, 0 10:59:27 pm"
X-mailer: Pegasus Mail for Win32 (v3.12a)
Message-Id: <E12qHqC-0005tk-00@sun7.bham.ac.uk>
Sender: jim@mail.rand.org
Status: OR

On 12 May 00, at 8:35, andras@kornai.com wrote:
> The 0th step of the analysis is to arrange the symbols in frequency
> order.

I think that your example may apply once you know the character set 
(I am not sure that I follow the analysis correctly, though).
The problem seems to be (in Jacques' example) to "get" the 
characters right first, and only then find the minimum corpus size for
decoding, i.e. an NN* case in Stolfi's classification.

The questions may be rephrased as:

1. How much can one dissect the unknown script to make sure that 
one is dealing with a character? 

and then comes:

2. Given that dissected set, does Chadwick's law apply?

I have the feeling that Jacques' example was questioning 2 when 
you do not know 1.

Cheers,
Gabriel




From jim@mail.rand.org  Fri May 12 12:32:46 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id MAA47728
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 12:32:46 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 4A8881E008; Fri, 12 May 2000 12:32:46 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-green.research.att.com (Postfix) with ESMTP id ACC811E01A
	for <reeds@research.att.com>; Fri, 12 May 2000 12:32:45 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id JAA09830; Fri, 12 May 2000 09:30:04 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id JAA01590; Fri, 12 May 2000 09:30:02 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id JAA03256 for <voynich@rand.org>; Fri, 12 May 2000 09:28:35 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id JAA01429 for <voynich@rand.org>; Fri, 12 May 2000 09:28:35 -0700 (PDT)
Received: from ProdWEB01 ([199.203.199.242]) by mail01-lax.pilot.net with ESMTP id JAA27733 for <voynich@rand.org>; Fri, 12 May 2000 09:28:32 -0700 (PDT)
Received: from mail pickup service by ProdWEB01 with Microsoft SMTPSVC;
	 Fri, 12 May 2000 12:15:41 -0400
From: "WHquestion" <registration@letuknow.com>
To: <voynich@rand.org>
Subject: Is there a real correlation between race and IQ?
Date: Fri, 12 May 2000 12:15:41 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Message-ID: <0efa74115160c50NURONIASRV@ProdWEB01>
Sender: jim@mail.rand.org
Status: OR

Hello,

Members of our community - the WHquestion free info arena - posted the
following questions:

"Is there a real correlation between race and IQ?"

and also -

"WHat's the easiest way to get from Jakarta to the Suez Canal by train?"

If you hold the answers to these or other essential questions, or if you
feel like asking some questions yourself, please drop by our site at:
http://www.whquestion.com/

To answer one of the questions above click here:
http://www.whquestion.com/emailentry.asp?p=qgvjw;xTRWDQIWV;OlmCalsj761aA

To see what's it all about click here: http://www.whquestion.com/

Not interested? 
Just reply to this email with "remove" as the subject line and we won't
bother you again.

========================================================
This message complies with the US Federal requirements as well as the
Washington State Commercial Email Bill.
Sender information: Neuronia Ltd.
Email: contact@WHquestion.com, Tel: +972 (3) 6394304
========================================================

WHo? WHat? WHy? WHen? WHere? WHquestion is the answer!
http://www.whquestion.com/

From jim@mail.rand.org  Fri May 12 12:24:34 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id MAA02505
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 12:24:33 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id BA0064CE2A; Fri, 12 May 2000 12:24:33 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-blue.research.att.com (Postfix) with ESMTP id 385214CE29
	for <reeds@research.att.com>; Fri, 12 May 2000 12:24:33 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id JAA27467; Fri, 12 May 2000 09:22:39 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id JAA00942; Fri, 12 May 2000 09:22:38 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id JAA01876 for <voynich@rand.org>; Fri, 12 May 2000 09:21:11 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id JAA00784 for <voynich@rand.org>; Fri, 12 May 2000 09:21:10 -0700 (PDT)
Received: from tollhouse.psnw.com (tollhouse.psnw.com [205.199.144.121]) by mail01-lax.pilot.net with ESMTP id JAA23884 for <voynich@rand.org>; Fri, 12 May 2000 09:21:10 -0700 (PDT)
Received: from micro-net.com (ts001d20.bat-la.concentric.net [64.1.27.32])
	by tollhouse.psnw.com (8.9.3/8.9.3) with ESMTP id JAA12981
	for <voynich@rand.org>; Fri, 12 May 2000 09:20:42 -0700 (PDT)
Message-ID: <391C30CF.97A6833F@micro-net.com>
Date: Fri, 12 May 2000 11:26:55 -0500
From: Dennis <ixohoxi@micro-net.com>
X-Mailer: Mozilla 4.05 [en] (Win95; I)
MIME-Version: 1.0
To: Voynich List <voynich@rand.org>
Subject: Psychedelic Imagery in the VMs
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: jim@mail.rand.org
Status: OR

>

On a different topic, here's something I don't ever recall having seen on
the list. From Richard Shand, a friend of the VMs.

>   It's great if the VMs can be published.  The
> folios on the Yale site are fascinating.  My first impression is that
> the images indicate that they were done under the influence of an
> hallucinogenic drug.  I remember an artist  showing me elaborate
> calligraphic and embroidered images (like the psychedelic posters of
> the 60s) and said he was in telepathic communication with his soul
> mate on Neptune.  With him the boundaries between schizophrenic
> delusion and self-induced fantasy were blurred.  The pictures in the
> VMs have the same combination of fanciful artificiality and obsession
> with detail.
> The plants, real or imaginary, may represent psychotropic agents and
> the text, real or imaginary, may (at least in part) represent recipes
> for their alchemical preparation.  The nymphs have strong erotic
> overtones and the association with water (ritual cleansing?) brings
> to mind Wagner's Rhine maidens. Their depiction with stars on the
> astrological charts suggests that they are angelic beings.




From jim@mail.rand.org  Fri May 12 15:17:23 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id PAA55527
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 15:17:23 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 9C68B1E022; Fri, 12 May 2000 15:17:23 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-green.research.att.com (Postfix) with ESMTP id 24EA61E016
	for <reeds@research.att.com>; Fri, 12 May 2000 15:17:23 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id MAA24035; Fri, 12 May 2000 12:15:36 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id MAA15999; Fri, 12 May 2000 12:15:34 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id MAA24729 for <voynich@rand.org>; Fri, 12 May 2000 12:15:15 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id MAA15973 for <voynich@rand.org>; Fri, 12 May 2000 12:15:14 -0700 (PDT)
Received: from mailout02.sul.t-online.com (mailout02.sul.t-online.com [194.25.134.17]) by mail02-lax.pilot.net with ESMTP id MAA23846 for <voynich@rand.org>; Fri, 12 May 2000 12:15:13 -0700 (PDT)
Received: from fwd05.sul.t-online.de 
	by mailout02.sul.t-online.com with smtp 
	id 12qKu1-0002yD-01; Fri, 12 May 2000 21:15:13 +0200
Received: from Noname (0625764225-0001@[62.156.39.17]) by fwd05.sul.t-online.de
	with esmtp id 12qKtn-1QXkO0C; Fri, 12 May 2000 21:14:59 +0200
Message-ID: <391C58CE.6AB55D04@voynich.nu>
Date: Fri, 12 May 2000 21:17:34 +0200
From: Zandbergen@t-online.de (Rene Zandbergen)
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.01 [de]C-DT  (Win95; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
X-Priority: 3 (Normal)
References: <391B9DBF.87F@alphalink.com.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Sender: 0625764225-0001@t-dialin.net
Sender: jim@mail.rand.org
Status: OR

Here is another pair of cents.

The proposed binary encoding is a bit of a nasty trick and
I suspect that Chadwick's knee-jerk reaction to such a trick
might be something like: that's not what I meant.

If the binary encoding is done without loss of information
(which would be fair), one needs two more symbols: a character
space and a word space. These were available before too.
Then, any child will see that instead of 4 symbols, there 
really are N+2 macro-symbols, and one is back to square 1.
So Chadwick's rule (valid or not) could be rephrased (in a 
way that makes it pretty useless) by working with such macro-
symbols.

That was $0.01. Here's no. 2: How does Chadwick's formula apply
to the VMs? Looks pretty good for us I'm sure. So do we have
one counter-example to his rule here? Or does his rule only
work in one direction: you can't ... if you've got fewer than...

Cheers, Rene

From jim@mail.rand.org  Fri May 12 15:24:11 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id PAA90295
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 15:24:11 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 461601E009; Fri, 12 May 2000 15:24:11 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id B47EC1E008
	for <reeds@research.att.com>; Fri, 12 May 2000 15:24:10 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id MAA16553; Fri, 12 May 2000 12:22:37 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id MAA16576; Fri, 12 May 2000 12:22:35 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id MAA25386 for <voynich@rand.org>; Fri, 12 May 2000 12:22:28 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id MAA16558 for <voynich@rand.org>; Fri, 12 May 2000 12:22:28 -0700 (PDT)
Received: from yarf.eecs.umich.edu (yarf.eecs.umich.edu [141.213.12.211]) by mail01-lax.pilot.net with ESMTP id MAA13696 for <voynich@rand.org>; Fri, 12 May 2000 12:22:27 -0700 (PDT)
Received: (from kckluge@localhost)
	by yarf.eecs.umich.edu (8.9.3/8.9.1) id PAA17820;
	Fri, 12 May 2000 15:22:25 -0400 (EDT)
Date: Fri, 12 May 2000 15:22:25 -0400 (EDT)
Message-Id: <200005121922.PAA17820@yarf.eecs.umich.edu>
From: Karl Kluge <kckluge@eecs.umich.edu>
To: voynich@rand.org
In-reply-to: <391C58CE.6AB55D04@voynich.nu> (Zandbergen@t-online.de)
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
References: <391B9DBF.87F@alphalink.com.au> <391C58CE.6AB55D04@voynich.nu>
Sender: jim@mail.rand.org
Status: OR


I seem to recall Jim Reeds' paper on Trithemius containing a reference to
the number of characters one needs (on info theoretic grounds) to solve
a monoalphabetic. Jim?
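
One standard information-theoretic estimate is Shannon's unicity distance, U = H(K)/D. This is not necessarily the one in Jim's paper. H(K) is the key entropy and D is the plaintext redundancy per letter. For a simple substitution on 26 letters, H(K) = log2(26!), which is about 88.4 bits. The redundancy D is an assumed input: it is often quoted at around 3.2 bits per letter for English, and that value gives roughly 28 letters. A minimal Python sketch:

```python
import math

def unicity_distance(alphabet_size, redundancy_bits_per_letter):
    """Shannon's unicity distance U = H(K) / D for a simple substitution.

    H(K) = log2(n!) assumes every permutation of the alphabet is an equally
    likely key; D (redundancy in bits per letter) is supplied by the caller.
    """
    key_entropy_bits = math.lgamma(alphabet_size + 1) / math.log(2)  # log2(n!)
    return key_entropy_bits / redundancy_bits_per_letter
```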

From jim@mail.rand.org  Fri May 12 16:19:51 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id QAA70509
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 16:19:51 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 6D82A1E022; Fri, 12 May 2000 16:19:51 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id EC2211E016
	for <reeds@research.att.com>; Fri, 12 May 2000 16:19:50 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id NAA06366; Fri, 12 May 2000 13:18:55 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id NAA21008; Fri, 12 May 2000 13:18:54 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id NAA02995 for <voynich@rand.org>; Fri, 12 May 2000 13:18:36 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id NAA20974 for <voynich@rand.org>; Fri, 12 May 2000 13:18:36 -0700 (PDT)
Received: from mailout05.sul.t-online.com (mailout05.sul.t-online.com [194.25.134.82]) by mail01-lax.pilot.net with ESMTP id NAA06247 for <voynich@rand.org>; Fri, 12 May 2000 13:18:35 -0700 (PDT)
Received: from fwd01.sul.t-online.de 
	by mailout05.sul.t-online.com with smtp 
	id 12qLtK-0006Oe-02; Fri, 12 May 2000 22:18:34 +0200
Received: from Noname (0625764225-0001@[193.159.4.96]) by fwd01.sul.t-online.de
	with esmtp id 12qLtF-2ApSmeC; Fri, 12 May 2000 22:18:29 +0200
Message-ID: <391C67AE.E30DD985@voynich.nu>
Date: Fri, 12 May 2000 22:21:02 +0200
From: Zandbergen@t-online.de (Rene Zandbergen)
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.01 [de]C-DT  (Win95; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: An inventory of the collections of Rudolpf II
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Sender: 0625764225-0001@t-dialin.net
Sender: jim@mail.rand.org
Status: OR

Dear all,

there have been occasional references to "a catalogue of Rudolph's
collections". Here's one in:
Jacqueline Dauxois: L'empereur des alchimistes, Rodolphe II de
Habsbourg (German translation of 1997):

   In an inventory list of 1619, seven years after his death, 3000
   paintings are listed from Michelangelo, Da Vinci, Raphael [etc],
   2500 sculptures and thousands of other objects, which have
   been estimated at the incredible value of 17 Million florins.

But here's another one:
After world war two, one Gustav Wilhelm discovered an ancient
document in a library in Liechtenstein. It was a hitherto unknown
inventory of Rudolph's "Kunstkammer", started in 1607 and 
containing entries up to 1611.
He made a transcription in 1947 and handed this over to one 
Erwin Neumann in 1956. The latter managed to identify a large 
number of the objects but died prematurely. The edition
finally appeared in 1976.
The inventory was written by the painter Daniel Fröschl, 
who succeeded Ottavio Strada as imperial antiquarian on 1 May
1607.

Has anyone ever seen specific references to the latter inventory?
Its edition appeared after Evans' Rudolf II and His World (1973) 
but he may have known about it. Does anyone who has access
to this work know if he refers to it? 

This inventory is now kept in the library of the prince (Fürst)
of Liechtenstein. There are two publications about it (which I
have not yet seen):

E. Neumann: "Das Inventar der rudolfinischen Kunstkammer
von 1607/11", in: Analecta Reginensia. Queen Christina of
Sweden, documents and studies. Stockholm 1966, pp. 262-265.

R. Bauer and H. Haupt: "Das Kunstkammerinventar Kaiser
Rudolphs II. 1607-1611", in: Jahrbuch der Kunsthistorischen
Sammlungen in Wien, vol. 72, 1976.

Any hint / suggestion would be greatly appreciated. I will
of course try to find the above references.

Cheers, Rene

From jim@mail.rand.org  Fri May 12 18:26:12 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id SAA60868
	for <reeds@fry.research.att.com>; Fri, 12 May 2000 18:26:12 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 825601E008; Fri, 12 May 2000 18:26:12 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id BCF3F1E005
	for <reeds@research.att.com>; Fri, 12 May 2000 18:26:08 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id PAA20725; Fri, 12 May 2000 15:24:35 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA00714; Fri, 12 May 2000 15:24:34 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id PAA19282 for <voynich@rand.org>; Fri, 12 May 2000 15:24:22 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA00704 for <voynich@rand.org>; Fri, 12 May 2000 15:24:20 -0700 (PDT)
Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.7.8]) by mail02-lax.pilot.net with ESMTP id PAA08761 for <voynich@rand.org>; Fri, 12 May 2000 15:24:16 -0700 (PDT)
Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.3/8.9.3) with ESMTP id TAA05721
	for <voynich@rand.org>; Fri, 12 May 2000 19:24:09 -0300 (EST)
Received: from coruja.dcc.unicamp.br (coruja.dcc.unicamp.br [143.106.24.80])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with ESMTP id TAA12737
	for <voynich@rand.org>; Fri, 12 May 2000 19:24:08 -0300 (EST)
Received: (from stolfi@localhost)
	by coruja.dcc.unicamp.br (8.8.5/8.8.5) id TAA24652;
	Fri, 12 May 2000 19:24:06 -0300 (EST)
Date: Fri, 12 May 2000 19:24:06 -0300 (EST)
Message-Id: <200005122224.TAA24652@coruja.dcc.unicamp.br>
From: Jorge Stolfi <stolfi@dcc.unicamp.br>
To: voynich@rand.org
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
In-Reply-To: <391C58CE.6AB55D04@voynich.nu>
References: <391B9DBF.87F@alphalink.com.au>
	<391C58CE.6AB55D04@voynich.nu>
Reply-To: stolfi@dcc.unicamp.br
Sender: jim@mail.rand.org
Status: OR


    > [Rene:] If the binary encoding is done without loss of
    > information (which would be fair), one needs two more symbols: a
    > character space and a word space.
    
Um, not really. The word separator would be coded in binary just 
like any letter. As for character separators, they are
not needed if one uses a fixed-length code (e.g. ASCII), or
any of a number of self-delimiting codes.
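
Stolfi's point about fixed-length codes can be sketched in Python. The word space is treated as just another symbol of a hypothetical 27-symbol alphabet, and each symbol gets a 5-bit codeword. The bit string is then cut back into 5-bit groups with no separators at all. The helper names are illustrative.

```python
def build_code(alphabet):
    """Give every symbol, word separator included, a fixed-length binary codeword."""
    width = max(1, (len(alphabet) - 1).bit_length())
    return {sym: format(i, f"0{width}b") for i, sym in enumerate(alphabet)}

def encode(text, code):
    """Concatenate codewords; no character or word separators are added."""
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Cut the bit string into fixed-width chunks and look each one up."""
    width = len(next(iter(code.values())))
    inverse = {word: sym for sym, word in code.items()}
    return "".join(inverse[bits[i:i + width]] for i in range(0, len(bits), width))

code = build_code(" abcdefghijklmnopqrstuvwxyz")
bits = encode("linear b", code)
```

The encoded text uses only two symbols, but decoding depends entirely on knowing the group width.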

    > Then, any child will see that instead of 4 symbols, there really
    > are N+2 macro-symbols, and one is back to square 1.
    
Well, but how would one know that certain groups of characters are
really macro-characters, before deciphering the script?

    > [Andras:] The formula may not be as wrong as everybody here
    > seems to suppose... First, it should be taken to apply only to
    > "NYN" type languages.
    > 
    > The 0th step of the analysis is to arrange the symbols in
    > frequency order. The 1st step is to construct a grid of what can
    > follow what. ... we need n^2 data points (actually, some
    > constant time n^2 is better) ...
    
This analysis assumes a cryptographic-style attack based on digraph
frequencies. But my impression is that decipherment of known natural
languages with unknown scripts (my "NYN" case) rarely happens that way.
For one thing, the frequencies of letters --- not to mention digrams
--- are strongly affected by subject matter and spelling anomalies,
which are the rule in ancient texts. 

Unfortunately there don't seem to be many historical examples of pure
NYN decipherment to go by. In most cases (Egyptian, Cuneiform, Maya,
Hittite), decipherment relied heavily on "cribs", i.e. on some
information about the meaning of the text. 

Perhaps Linear B is one legitimate example of crib-less NYN
decipherment? But, in that specific case, I believe that the solution
was found by successfully guessing the structure of some sentences,
and identifying some characteristic morphological elements ---
particles, inflections, connective verbs, whatever. From that Ventris
got tentative values for some letters, which then made it possible to
identify other non-function words. Is this account correct?

For this approach, one needs a corpus that is just long enough to
display recognizable morphological regularities. I don't see how this
parameter could be related to the alphabet's size (except that the
approach would not work well with a logographic script).
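
One crude way to look for the kind of morphological regularities mentioned above is to tally word endings. This minimal Python sketch assumes words are separated by whitespace in the transcription. `common_endings` is a hypothetical helper, and a frequent ending is only a hint of an inflection, not proof of one.

```python
from collections import Counter

def common_endings(text, length=2, top=5):
    """Most frequent word endings of the given length (words longer than it only)."""
    endings = Counter(w[-length:] for w in text.split() if len(w) > length)
    return endings.most_common(top)
```

For example, `common_endings("servus dominus amicus servum dominum", 2, 2)` gives `[('us', 3), ('um', 2)]`.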

All the best,

--stolfi

From jim@mail.rand.org  Sat May 13 20:26:00 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id UAA51135
	for <reeds@fry.research.att.com>; Sat, 13 May 2000 20:26:00 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 620551E00F; Sat, 13 May 2000 20:26:00 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id E4B351E00D
	for <reeds@research.att.com>; Sat, 13 May 2000 20:25:59 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id RAA03531; Sat, 13 May 2000 17:25:35 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id RAA27433; Sat, 13 May 2000 17:25:34 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id RAA01636 for <voynich@rand.org>; Sat, 13 May 2000 17:23:45 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id RAA27393 for <voynich@rand.org>; Sat, 13 May 2000 17:23:45 -0700 (PDT)
Received: from yarf.eecs.umich.edu (yarf.eecs.umich.edu [141.213.12.211]) by mail01-lax.pilot.net with ESMTP id RAA03304 for <voynich@rand.org>; Sat, 13 May 2000 17:23:44 -0700 (PDT)
Received: (from kckluge@localhost)
	by yarf.eecs.umich.edu (8.9.3/8.9.1) id UAA20011;
	Sat, 13 May 2000 20:23:42 -0400 (EDT)
Date: Sat, 13 May 2000 20:23:42 -0400 (EDT)
Message-Id: <200005140023.UAA20011@yarf.eecs.umich.edu>
From: Karl Kluge <kckluge@eecs.umich.edu>
To: voynich@rand.org
In-reply-to: <391C67AE.E30DD985@voynich.nu> (Zandbergen@t-online.de)
Subject: Re: An inventory of the collections of Rudolpf II
References:  <391C67AE.E30DD985@voynich.nu>
Sender: jim@mail.rand.org
Status: OR


Rene,

Looking at the U. Michigan library catalog, we don't have much on
good old Rudy. While it's pre-discovery of the Voynich,

 Author:         Bolton, Henry Carrington, 1843-1903.
 Title:          The follies of science at the court of Rudolph II, 1576-1612.
 Published:      Milwaukee, Pharmaceutical review publishing co., 1904.
 Description:    5 p.l., 217 p., 1 l. front., illus., plates, ports. 23 cm.
 SUBJECT HEADINGS (Library of Congress; use s=):
                 Rudolf II, Holy Roman Emperor, 1552-1612.
                 Science--Czech Republic--Bohemia--History
                 Alchemy--Czech Republic--Bohemia--History.

might nevertheless be interesting to look through. We do have

 Author:         Platen, Magnus von. ed.
 Title:          Queen Christina of Sweden; documents and studies.
 Published:      Stockholm, 1966.
 Description:    389 p., (1) l. illus. 25 cm.
 Series:         Sweden. Nationalmuseum. Skriftserie, nr. 12.
                 Analecta reginensia, 1.
 SUBJECT HEADINGS (Library of Congress; use s=):
                 Kristina, Queen of Sweden, 1626-1689..
 ------------------------------------------------------------------------------
  LOCATION:              CALL NUMBER:               STATUS:
  FINE ARTS              DL 719 .P72                Chkd-Out, Due: 06/01/2000 

I can put a hold on it, if you like -- I need to put holds on a couple other
things anyways. The other reference, unfortunately, is held in Special
Collections, meaning I'd have to get over there during business hours. I'll
see if I can make some time this week.

Karl

From jim@mail.rand.org  Sun May 14 13:13:55 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id NAA34649
	for <reeds@fry.research.att.com>; Sun, 14 May 2000 13:13:55 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id C09461E00F; Sun, 14 May 2000 13:13:54 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id F027D1E00D
	for <reeds@research.att.com>; Sun, 14 May 2000 13:13:53 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id KAA12861; Sun, 14 May 2000 10:13:29 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id KAA07356; Sun, 14 May 2000 10:13:29 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id KAA17656 for <voynich@rand.org>; Sun, 14 May 2000 10:11:47 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id KAA07317 for <voynich@rand.org>; Sun, 14 May 2000 10:11:47 -0700 (PDT)
Received: from mail.msen.com (root@conch.msen.com [148.59.19.5]) by mail02-lax.pilot.net with ESMTP id KAA16482 for <voynich@rand.org>; Sun, 14 May 2000 10:11:46 -0700 (PDT)
Received: from mail.msen.com (fl36-d11.msen.net [148.59.239.19])
	by mail.msen.com (8.9.3/8.9.3) with ESMTP id NAA06783
	for <voynich@rand.org>; Sun, 14 May 2000 13:11:44 -0400 (EDT)
Message-ID: <391F8788.B65DF27C@mail.msen.com>
Date: Mon, 15 May 2000 01:13:44 -0400
From: Bruce Grant <bgrant@mail.msen.com>
X-Mailer: Mozilla 4.7 [en]C-NSCPCD  (Win95; U)
X-Accept-Language: ru
MIME-Version: 1.0
To: voynich@rand.org
Subject: Re: Specialty words
References: <391B6D3A.9C33D2F0@gte.net> <001101bfbbff$e0a49400$318a6395@outlander>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: jim@mail.rand.org
Status: OR



John Grove wrote:

>     One of the problems of trying to match word patterns is that the VMS
> does not use double characters as frequently as would be expected in any
> natural language that I can see. Whether syllabic or alphabetic - you would
> expect a certain amount of 'doubling' (except if there is a character that
> signifies - double the preceding letter). We also have a problem with the
> consistent 'end forms' - specifically the an, ain, aiin, aiiin types. This
> may indicate that a character's shape depends on where it is in the word
> (like Arabic) and makes it difficult once again to make the comparison you
> suggest - if STAR is ABCD, but the T is written differently in PLANET - then
> you don't really see it as the same character.
>
>     John Grove

Maybe the language suppresses double letters, as Spanish appears to do compared
to French.
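
To compare doubling rates across texts, one can count how often a letter is immediately repeated inside a word. This minimal Python sketch assumes whitespace-separated words. For the VMs, the result depends heavily on which transcription alphabet is used to split the script into characters.

```python
def doubling_rate(text):
    """Fraction of within-word adjacent letter pairs that repeat the same letter."""
    pairs = doubles = 0
    for word in text.split():
        for a, b in zip(word, word[1:]):
            pairs += 1
            doubles += (a == b)
    return doubles / pairs if pairs else 0.0
```

For "belle ville" the rate is 0.25: two doubled pairs ("ll" twice) out of eight adjacent pairs.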

Or, maybe it's Arabic, and they neglected to write in the superscript characters
(shaddas) to indicate doubling. (My copy of "Teach Yourself Arabic" indicates
that the shaddas are not always there in present-day printed matter.)

Bruce Grant

From jim@mail.rand.org  Mon May 15 08:23:35 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id IAA91837
	for <reeds@fry.research.att.com>; Mon, 15 May 2000 08:23:35 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id ABB084CE21; Mon, 15 May 2000 08:23:35 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-blue.research.att.com (Postfix) with ESMTP id D93B74CE19
	for <reeds@research.att.com>; Mon, 15 May 2000 08:23:34 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id FAA17576; Mon, 15 May 2000 05:22:36 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id FAA25409; Mon, 15 May 2000 05:22:35 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id FAA11054 for <voynich@rand.org>; Mon, 15 May 2000 05:20:55 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id FAA25363 for <voynich@rand.org>; Mon, 15 May 2000 05:20:55 -0700 (PDT)
Received: from hme0.mailrouter04.sprint.ca (itac-sun-13.sprint.ca [207.107.250.17] (may be forged)) by mail02-lax.pilot.net with ESMTP id FAA17328 for <voynich@rand.org>; Mon, 15 May 2000 05:20:54 -0700 (PDT)
Received: from outlander (spc-isp-ott-uas-14-13.sprint.ca [209.103.35.164])
	by hme0.mailrouter04.sprint.ca (8.8.8/8.8.8) with SMTP id IAA23314
	for <voynich@rand.org>; Mon, 15 May 2000 08:20:21 -0400 (EDT)
Message-ID: <000701bfbe68$a73667a0$a42367d1@outlander>
Reply-To: "John Grove" <4groves@sprint.ca>
From: "John Grove" <4groves@sprint.ca>
To: <voynich@rand.org>
References: <391B6D3A.9C33D2F0@gte.net> <001101bfbbff$e0a49400$318a6395@outlander> <391F8788.B65DF27C@mail.msen.com>
Subject: Re: Specialty words
Date: Mon, 15 May 2000 08:24:38 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: jim@mail.rand.org
Status: OR

    I like to think that I'm unbiased and open to all avenues of attack on
the VMS, but the more I focus on what appear to me to be 'key elements',
the more I lean toward an Arabic type of language...
1. Letter shapes change according to position in a word/syllable, so that
'o', 'a', and 'y' could be the same character. Sometimes a character you
might expect at the beginning (o) is instead replaced by (y), because in
reality the letter is followed by a glottal stop or a specific vowel. [The
downside to this: the overall character set is reduced, unless small
differences in the writing are significant rather than incidental.] We find
the same characters (iin) almost always in word-final position. [Yes, it
could be that the spaces aren't real boundaries but are caused by the style
of this character.]
2. Doubling, as I've discussed before and as Bruce mentions below: it is
possible that the 'double' marker isn't written, adding to the confusion.
dain might sometimes actually be daain. [Any consonant-only system that
'solves' the VMS should be able to withstand criticism over double letters
occurring within one word. For example, if I made dain equivalent to
{d = b in word-initial and medial positions, a = t in medial position,
in = r in word-final position}, I could create the words "bitter",
"butter", or "biter".]
3. Even though this text is obviously written left to right [it could be
mirrored], I am convinced that the centered text following a paragraph or
page is a title for the preceding text, and this possibly indicates that the
text should be read from the bottom of a page to the top [Wild assumption -
yes].
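Point 2's "bitter, butter, biter" example can be made concrete with a small sketch. It assumes, purely for illustration, a script that writes neither vowels nor doubling (no shadda), so a word is reduced to a consonant skeleton by first collapsing doubled letters and then dropping vowels. The helper names and the English vowel set are hypothetical choices, not anything proposed in this thread.

```python
from collections import defaultdict

# Assumed vowel set for English test words; another language would need its own.
VOWELS = set("aeiou")

def consonant_skeleton(word: str) -> str:
    """Reduce a word to what a vowel-less, shadda-less script might write."""
    collapsed = []
    for ch in word.lower():
        # Collapse adjacent doubled letters first (the unwritten 'double' marker)...
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    # ...then drop the vowels, which the script does not write.
    return "".join(ch for ch in collapsed if ch not in VOWELS)

def group_by_skeleton(words):
    """Group candidate plaintext words that collapse to the same written form."""
    groups = defaultdict(list)
    for w in words:
        groups[consonant_skeleton(w)].append(w)
    return dict(groups)

print(group_by_skeleton(["bitter", "butter", "biter", "batter", "tote"]))
# {'btr': ['bitter', 'butter', 'biter', 'batter'], 'tt': ['tote']}
```

Collapsing doubles before dropping vowels matters: it keeps "tote" as "tt", since only adjacent doubling is assumed to go unwritten.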

    Well so much for not being biased!
I think that, [IF] the script represents a natural language, it is either:
1. - A Consonant script akin to Arabic (not Ukrainian, sorry)
2. - A Syllabic script with an inherent vowel (to reduce the number of
variations), akin to Brahmi script
3. - An Alphabetic script that we've unfortunately reduced to too small a
set of actual characters, because letters that look the same are really
significantly different.

I suppose Jorge's Chinese could fit no. 2 above 8-), but I'm leaning
toward no. 1, although I'm still very open-minded about what the
underlying language is!

    John

----- Original Message -----
From: Bruce Grant <bgrant@mail.msen.com>
To: <voynich@rand.org>
Sent: Monday, May 15, 2000 1:13 AM
Subject: Re: Specialty words


> Maybe the language suppresses double letters, as Spanish appears to do
> compared to French.
>
> Or, maybe it's Arabic, and they neglected to write in the superscript
> characters (shaddas) to indicate doubling. (My copy of "Teach Yourself
> Arabic" indicates that the shaddas are not always there in present-day
> printed matter.)
>
> Bruce Grant
>

From jim@mail.rand.org  Mon May 15 17:52:10 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id RAA37314
	for <reeds@fry.research.att.com>; Mon, 15 May 2000 17:52:10 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 46BF81E01E; Mon, 15 May 2000 17:52:10 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id AD0BC1E018
	for <reeds@research.att.com>; Mon, 15 May 2000 17:52:09 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id OAA09384; Mon, 15 May 2000 14:50:58 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id OAA09727; Mon, 15 May 2000 14:50:56 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id OAA22482 for <voynich@rand.org>; Mon, 15 May 2000 14:50:28 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id OAA09642 for <voynich@rand.org>; Mon, 15 May 2000 14:50:25 -0700 (PDT)
Received: from mailout03.sul.t-online.com (mailout03.sul.t-online.com [194.25.134.81]) by mail03-lax.pilot.net with ESMTP id OAA09186 for <voynich@rand.org>; Mon, 15 May 2000 14:50:24 -0700 (PDT)
Received: from fwd05.sul.t-online.de 
	by mailout03.sul.t-online.com with smtp 
	id 12rSkp-0003S1-04; Mon, 15 May 2000 23:50:23 +0200
Received: from Noname (0625764225-0001@[193.159.5.44]) by fwd05.sul.t-online.de
	with esmtp id 12rSkh-1xi47EC; Mon, 15 May 2000 23:50:15 +0200
Message-ID: <392071BB.61511C2B@voynich.nu>
Date: Mon, 15 May 2000 23:52:59 +0200
From: Zandbergen@t-online.de (Rene Zandbergen)
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.01 [de]C-DT  (Win95; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: Re: An inventory of the collections of Rudolpf II
X-Priority: 3 (Normal)
References: <391C67AE.E30DD985@voynich.nu> <200005140023.UAA20011@yarf.eecs.umich.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Sender: 0625764225-0001@t-dialin.net
Sender: jim@mail.rand.org
Status: OR

Karl wrote:
> [...] We do have
> 
>  Author:         Platen, Magnus von. ed.
>  Title:          Queen Christina of Sweden; documents and studies.
>  Published:      Stockholm, 1966.
>  Description:    389 p., (1) l. illus. 25 cm.
>  Series:         Sweden. Nationalmuseum. Skriftserie, nr. 12.
>                  Analecta reginensia, 1.
>  SUBJECT HEADINGS (Library of Congress; use s=):
>                  Kristina, Queen of Sweden, 1626-1689..
>  ------------------------------------------------------------------------------
>   LOCATION:              CALL NUMBER:               STATUS:
>   FINE ARTS              DL 719 .P72                Chkd-Out, Due: 06/01/2000
> 
> I can put a hold on it, if you like -- I need to put holds on a couple other
> things anyways. The other reference, unfortunately, is held in Special
> Collections, meaning I'd have to get over there during business hours. I'll
> see if I can make some time this week.

Wow! You've been luckier than I - no hits so far in the libraries
whose catalogues I can access online (well, the ones I tried).

I presume, by implication, that you can read German.
The second reference should, according to my interpretation, be
the more interesting one, but if anyone on the list knows that this
catalogue has already been combed by someone knowledgeable about the
VMs, he/she could save us some time by speaking up :-)

Cheers, Rene

From jim@mail.rand.org  Thu May 18 05:37:49 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id FAA84528
	for <reeds@fry.research.att.com>; Thu, 18 May 2000 05:37:49 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 1EBD74CE48; Thu, 18 May 2000 05:37:49 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id 6CCC74CE43
	for <reeds@research.att.com>; Thu, 18 May 2000 05:37:48 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id CAA15267; Thu, 18 May 2000 02:36:24 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id CAA27538; Thu, 18 May 2000 02:36:24 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id CAA29294 for <voynich@rand.org>; Thu, 18 May 2000 02:35:55 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id CAA27528 for <voynich@rand.org>; Thu, 18 May 2000 02:35:54 -0700 (PDT)
Received: from esacom43.esoc.esa.de (esacom43.esoc.esa.de [131.176.86.4]) by mail03-lax.pilot.net with ESMTP id CAA15351 for <voynich@rand.org>; Thu, 18 May 2000 02:35:53 -0700 (PDT)
Received: from esacom53.esoc.esa.de (esacom53.esoc.esa.de [131.176.85.6])
	by esacom43.esoc.esa.de (8.9.2/8.9.2/ESA-ESOC-v1.8) with ESMTP id JAA19195
	for <voynich@rand.org>; Thu, 18 May 2000 09:13:30 GMT
Received: from voynich.nu (dcla4.dev.esoc.esa.de [131.176.58.162])
	by esacom53.esoc.esa.de (8.9.2/8.9.2/ESA-ESOC-mail-gw-v1.5) with ESMTP id JAA29104
	for <voynich@rand.org>; Thu, 18 May 2000 09:35:32 GMT
Sender: rzandber@mail-gw.esoc.esa.de
Message-ID: <3923B957.C871F919@voynich.nu>
Date: Thu, 18 May 2000 09:35:19 +0000
From: Rene Zandbergen <rene@voynich.nu>
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.61 [en] (X11; I; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: voynich@rand.org
Subject: Re: An inventory of the collections of Rudolpf II
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

I wrote

> I presume, by implication, that you can read German.

and Karl:

> Actually, not a blessed word -- but I can fax or mail copies to someone
> who can...

I see :-)

The first reference (Analecta reginensia) contains a section
(in German) of only four or so pages about the Inventory.

On the other hand, the second book (Bauer & Haupt) is likely to
be much longer, and copying it is likely to be infeasible.
(Note that the original MS has >400 pages, and it cannot be ruled
out that the book contains the transcription....)

If it isn't too much of a hassle, it might be worth just having
a look at it and sending us a short description. Meanwhile, I
will provide a translation of the complete reference I found.
     Cheers, Rene

From reeds Sat May 20 17:11:13 2000
From: reeds@fry.research.att.com (Jim Reeds)
Message-Id: <1000520171113.ZM2676015@fry.research.att.com>
Date: Sat, 20 May 2000 17:11:12 -0400
In-Reply-To: Karl Kluge <kckluge@eecs.umich.edu>
        "Re: John Chadwick (Linear B) of corpus size. Comments invited." (May 12, 15:22)
References: <391B9DBF.87F@alphalink.com.au>  <391C58CE.6AB55D04@voynich.nu> 
	<200005121922.PAA17820@yarf.eecs.umich.edu>
X-Mailer: Z-Mail (4.0.1 13Jan97)
To: Karl Kluge <kckluge@eecs.umich.edu>, voynich@rand.org
Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Status: OR

I've been away on a trip, and only just saw this correspondence
about Chadwick's formula.  I agree with Jacques: it would be
horrible if "Chadwick's formula" were to be presented to the
world in a widely distributed popular book.  I think Andras
is right about the genesis of the n^2 formula: if one is to
make use of bigram counts (as Kober and Ventris did) one needs
enough data that the bigram count distribution is distinguishable
from some uninteresting null hypothesis distribution, and the
traditional rule of thumb for this is that the expected count per 
cell should be 5 or more.  (I am sure this is a pessimistic rule
of thumb, but the principle that the bigram distribution be
noticeably different from the null hypothesis is sound.)
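A minimal sketch of how the "5 per cell" rule of thumb turns into an n^2 corpus size. It assumes the simplest null hypothesis, in which all n*n bigram cells are equally likely, so the expected count per cell is N/n^2 for N bigrams; real null models with unequal letter frequencies would need more text for the rare cells. The function name is illustrative only.

```python
import math

def min_bigrams(n_symbols: int, min_expected: float = 5.0) -> int:
    """Smallest bigram count N giving at least `min_expected` per cell.

    Assumes a uniform null hypothesis: every one of the n*n bigram
    cells is equally likely, so the expected count per cell is N / n**2.
    """
    cells = n_symbols * n_symbols
    return math.ceil(min_expected * cells)

for n in (20, 26, 30):
    print(n, min_bigrams(n))
# 20 2000
# 26 3380
# 30 4500
```

A running text of N + 1 characters supplies N overlapping bigrams, so these figures are also roughly the text lengths needed; the quadratic growth in n is the source of the n^2 in the formula.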

On May 12, 15:22, Karl Kluge wrote:

> Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
> 
> I seem to recall Jim Reeds's paper on Trithemius containing a reference to
> the number of characters one needs (on info theoretic grounds) to solve
> a monoalphabetic. Jim?

This is the "unicity distance formula" of Claude Shannon,
"Communication Theory of Secrecy Systems", Bell System Technical
Journal, vol. 28, 1949, pp. 656-715; see section 15.  A very clear
explanation of this is in a little book about information theory
by Gordon Raisbeck, which I seem to have mislaid or lost.  It
is cited in the Friedmans' "The Shakespearean Ciphers Examined",
pp. 22-26.  They refer to Shannon's section 16, which is about
validity of solution, which cites as examples of invalid
cryptographic solutions the Bacon-Shakespeare ciphers and the
Voynich MS.  See also essays by Cy Deavours and myself on
the "unicity distance" in Cryptologia, vol. 1,  numbers 1, and
respectively, both 1977.  And there is another by Martin
Hellman, in IEEE Trans Info Theory, May 1977.  (The Deavours 
and Reeds essays are anthologized in "Cryptology Yesterday, Today,
and Tomorrow", Artech House, 1987.)
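For reference, Shannon's unicity distance is U = H(K)/D, where H(K) is the entropy of the key and D is the redundancy of the plaintext language per character. A small sketch for a monoalphabetic substitution, assuming a 26-letter alphabet (so H(K) = log2(26!)) and an English redundancy of about 3.2 bits per letter; that redundancy figure is an assumed ballpark, not a value taken from this thread.

```python
import math

def substitution_key_entropy(alphabet_size: int = 26) -> float:
    """log2(n!) bits: the key space of a simple substitution on n letters."""
    # lgamma(n + 1) = ln(n!), which avoids building the huge integer n!.
    return math.lgamma(alphabet_size + 1) / math.log(2)

def unicity_distance(key_entropy_bits: float, redundancy_bits_per_char: float) -> float:
    """Shannon's unicity distance U = H(K) / D, in characters."""
    return key_entropy_bits / redundancy_bits_per_char

h_k = substitution_key_entropy(26)  # about 88.4 bits
u = unicity_distance(h_k, 3.2)      # 3.2 bits/letter: assumed redundancy of English
print(f"H(K) = {h_k:.1f} bits, U = {u:.1f} letters")
# H(K) = 88.4 bits, U = 27.6 letters
```

Note that U grows with the key entropy, so a cipher with a larger key space than simple substitution needs correspondingly more ciphertext before a solution can be considered unique.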



-- 
Jim Reeds, AT&T Labs - Research
Shannon Laboratory, Room C229, Building 103
180 Park Avenue, Florham Park, NJ 07932-0971, USA

reeds@research.att.com, phone: +1 973 360 8414, fax: +1 973 360 8178

From jim@mail.rand.org  Mon May 22 09:55:02 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id JAA23168
	for <reeds@fry.research.att.com>; Mon, 22 May 2000 09:55:02 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id BA5E64CE5F; Mon, 22 May 2000 09:55:02 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-blue.research.att.com (Postfix) with ESMTP id 206084CE55
	for <reeds@research.att.com>; Mon, 22 May 2000 09:55:02 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id GAA11796; Mon, 22 May 2000 06:54:10 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id GAA28161; Mon, 22 May 2000 06:54:08 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id GAA05680 for <voynich@rand.org>; Mon, 22 May 2000 06:51:18 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id GAA27959 for <voynich@rand.org>; Mon, 22 May 2000 06:51:17 -0700 (PDT)
Received: from sun7.bham.ac.uk (sun7.bham.ac.uk [147.188.128.108]) by mail02-lax.pilot.net with ESMTP id GAA10872 for <voynich@rand.org>; Mon, 22 May 2000 06:51:16 -0700 (PDT)
Received: from ds13.bham.ac.uk ([147.188.72.20] helo=golem)
	by sun7.bham.ac.uk with esmtp (Exim 3.03 #1)
	id 12tsck-00006M-00
	for voynich@rand.org; Mon, 22 May 2000 14:52:02 +0100
From: "Gabriel Landini" <G.Landini@bham.ac.uk>
Organization: The University of Birmingham, UK.
To: voynich@rand.org
Date: Mon, 22 May 2000 14:49:47 +0100
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Zipf's law
Reply-To: G.Landini@bham.ac.uk
Message-ID: <3929490B.19986.145A665@localhost>
X-Confirm-Reading-To: G.Landini@bham.ac.uk
X-pmrqc: 1
X-mailer: Pegasus Mail for Win32 (v3.12c)
Sender: jim@mail.rand.org
Status: OR

Hi all,
I've made a randomised version (well, actually two) of the VMS to see
whether Zipf's law still applies, and also to corroborate the feeling
that the random versions should have very different token and word
distributions.

I also revised the old Zipf's law text to comply with the common
meanings of 'word' and 'token', which I had mixed up.
The new bit is at the bottom of the document, under Addenda.
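A rough sketch of this kind of test. The randomisation used on the web page is not described here, so `shuffle_characters` is just one assumed option: pool every character of the text, shuffle, and refill words of the original lengths. In the comments, "token" means one running occurrence and "type" a distinct word (the usual sense, assumed here). All function names are hypothetical.

```python
import math
import random
from collections import Counter

def rank_frequency(tokens):
    """(rank, frequency) pairs over word types, most frequent first."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def zipf_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's law predicts a slope near -1. Needs at least two distinct ranks.
    """
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def shuffle_characters(tokens, seed=0):
    """One possible randomisation (assumed): pool all characters of all
    tokens, shuffle them, and refill words of the original lengths."""
    rng = random.Random(seed)
    chars = [c for t in tokens for c in t]
    rng.shuffle(chars)
    out, i = [], 0
    for t in tokens:
        out.append("".join(chars[i:i + len(t)]))
        i += len(t)
    return out
```

Shuffling only the order of tokens would leave the rank-frequency table unchanged; shuffling characters breaks up the word types, which is why such a randomised text can be expected to behave quite differently.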

http://web.bham.ac.uk/G.Landini/evmt/zipf.htm

All comments are welcome.

cheers,
Gabriel

From jim@mail.rand.org  Mon May 22 18:27:52 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id SAA56248
	for <reeds@fry.research.att.com>; Mon, 22 May 2000 18:27:52 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 682201E039; Mon, 22 May 2000 18:27:52 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id 6F31F1E038
	for <reeds@research.att.com>; Mon, 22 May 2000 18:27:51 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id PAA00863; Mon, 22 May 2000 15:23:37 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA11959; Mon, 22 May 2000 15:23:35 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id PAA12301 for <voynich@rand.org>; Mon, 22 May 2000 15:21:37 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA11741 for <voynich@rand.org>; Mon, 22 May 2000 15:21:36 -0700 (PDT)
Received: from mailout04.sul.t-online.com (mailout04.sul.t-online.com [194.25.134.18]) by mail03-lax.pilot.net with ESMTP id PAA08674 for <voynich@rand.org>; Mon, 22 May 2000 15:21:35 -0700 (PDT)
Received: from fwd02.sul.t-online.de 
	by mailout04.sul.t-online.com with smtp 
	id 12u0Zq-0002Jt-0A; Tue, 23 May 2000 00:21:34 +0200
Received: from Noname (0625764225-0001@[193.159.4.17]) by fwd02.sul.t-online.de
	with esmtp id 12u0Zl-0McxWaC; Tue, 23 May 2000 00:21:29 +0200
Message-ID: <3929B3A0.E277186C@voynich.nu>
Date: Tue, 23 May 2000 00:24:32 +0200
From: Zandbergen@t-online.de (Rene Zandbergen)
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.01 [de]C-DT  (Win95; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: Inventory of Rudolf's museum (Kunstkammer)
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Sender: 0625764225-0001@t-dialin.net
Sender: jim@mail.rand.org
Status: OR

Dear all,

The German text of the reference to the handwritten, contemporary
catalogue of Rudolf's museum may be found at:
http://www.geocities.com/voynichms/kunstkammer.html

I then offered it to Babelfish. The result is interesting
enough to be forwarded unmodified, as attached.

Cheers, René

------------------------------------


DANIEL FROESCHL 
Augsburg 1573 - Prague 1613 

Inventory of the art chamber emperor Rudolfs II. 
1607 - 1611 

Cover: Perkament over cardboard; 
19.5 x 34 cm 
Blind pressing: three-way line iron lines and ornamentation strips, pressed in with the rolerole role
Paper block: 415 paper sheets. The former binding pages torn off 
On the inside of the Vordeckels: Ex libris Liechtensteinianis 
Literature: Neumann 1966, 262-265; Farmer head 1976 (edition) 

The inventory has the title: > of Anno 1607. Verzaichnus, which contains
Nachtraege of the Roem:Kay:May:Kunstkammer into found worden.< it into
the year 1611. As an author the painter Daniel Froeschl was determined,
which was since 1 May 1607 successors Ottavio Stradas as imperial
Antiquarius. The inventory was verschollen, to it Gustav William, who
former director of the art collections of the governing prince of
Liechtenstein, after which discovered the Second World War. It made a
Transkription to 1947 and transferred Erwin Neumann 1956 the edition.
Numerous object identifications arrive to this, but died he all
too-early. The publication took place finally 1976.

The inventory changed the old conceptions over Rudolf II. as collecting
tank thoroughly. If one had predominantly limited beforehand Rudofs
interests on Mirabilia (miracle things), Rara and strange things - which
he naturally also had - and him only as painting collecting tank
estimated, then now the picture of the universalists and Aestheten
stepped into the foreground. Fitter had judged its collections still as
a multicolored and adventurous mixture without any methodical tendency
(fitters 1908, 76 - 80). The inventory shows the universal requirement
of a enzyklopaedischen art chamber on highest level now. It contains
Spezimina from the most diverse areas of nature (Naturalia), the art and
technical skill (Artefacta) and the science (Scientifica) and is
essentially arranged according to these classes. The art chamber was
thus a mirror of the universe, a mikrokosmos, whose center of the
emperors was. From the elitaeren position of the Kaisertums Rudolf
derived the requirement on the noblest and rarest gifts of nature as
well as on the most admirable ones and most precious bringing out of
human abilities. Behind the interest in that variety of the things and
appearances was a pansophische longing after a universal system, which
connects everything in one > concordia discors< (harmony in apparent
discord) - like the universal emperor idea link of the realm was.

The inventory contains only the art chamber in the closer sense. Are
missing rudolfinische inventories of the paintings, the splendor
weapons, the Regalia and pieces of treasure, the Gemmen and Antiken, the
tapisserien and pieces of configuration of the premises.

From jim@mail.rand.org  Mon May 22 18:34:54 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id SAA13953
	for <reeds@fry.research.att.com>; Mon, 22 May 2000 18:34:54 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 3E6564CE14; Mon, 22 May 2000 18:34:53 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id A09CF4CE07
	for <reeds@research.att.com>; Mon, 22 May 2000 18:34:52 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id PAA04714; Mon, 22 May 2000 15:30:29 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA12479; Mon, 22 May 2000 15:30:27 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id PAA13481 for <voynich@rand.org>; Mon, 22 May 2000 15:28:50 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA12363 for <voynich@rand.org>; Mon, 22 May 2000 15:28:49 -0700 (PDT)
Received: from mailout01.sul.t-online.com (mailout01.sul.t-online.com [194.25.134.80]) by mail02-lax.pilot.net with ESMTP id PAA13239 for <voynich@rand.org>; Mon, 22 May 2000 15:28:48 -0700 (PDT)
Received: from fwd02.sul.t-online.de 
	by mailout01.sul.t-online.com with smtp 
	id 12u0gp-0006bR-00; Tue, 23 May 2000 00:28:47 +0200
Received: from Noname (0625764225-0001@[193.159.4.17]) by fwd02.sul.t-online.de
	with esmtp id 12u0gm-0SZR3oC; Tue, 23 May 2000 00:28:44 +0200
Message-ID: <3929B553.54C38A6A@voynich.nu>
Date: Tue, 23 May 2000 00:31:47 +0200
From: Zandbergen@t-online.de (Rene Zandbergen)
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.01 [de]C-DT  (Win95; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: Oops
X-Priority: 3 (Normal)
References: <3929B3A0.E277186C@voynich.nu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Sender: 0625764225-0001@t-dialin.net
Sender: jim@mail.rand.org
Status: OR

I wrote:

> the German text of the reference to the handwritten, contemporary
> catalogue of Rudolf's museum may be found at:
> http://www.geocities.com/voynichms/kunstkammer.html

Actually:

http://www.geocities.com/voynichms/inventory.html 

In case anyone wants to check it out.

Cheers, Rene

From jim@mail.rand.org  Wed May 24 15:46:50 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id PAA43752
	for <reeds@fry.research.att.com>; Wed, 24 May 2000 15:46:50 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id E43951E00C; Wed, 24 May 2000 15:46:49 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-green.research.att.com (Postfix) with ESMTP id 5E77F1E007
	for <reeds@research.att.com>; Wed, 24 May 2000 15:46:49 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id MAA07038; Wed, 24 May 2000 12:41:51 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id MAA21684; Wed, 24 May 2000 12:41:49 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id MAA14385 for <voynich@rand.org>; Wed, 24 May 2000 12:39:39 -0700 (PDT)
Received: from mail02-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id MAA21510 for <voynich@rand.org>; Wed, 24 May 2000 12:39:38 -0700 (PDT)
Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.7.8]) by mail02-lax.pilot.net with ESMTP id MAA06008 for <voynich@rand.org>; Wed, 24 May 2000 12:39:15 -0700 (PDT)
Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.3/8.9.3) with ESMTP id QAA19987
	for <voynich@rand.org>; Wed, 24 May 2000 16:38:55 -0300 (EST)
Received: from coruja.dcc.unicamp.br (coruja.dcc.unicamp.br [143.106.24.80])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with ESMTP id QAA04860
	for <voynich@rand.org>; Wed, 24 May 2000 16:38:54 -0300 (EST)
Received: (from stolfi@localhost)
	by coruja.dcc.unicamp.br (8.8.5/8.8.5) id QAA02522;
	Wed, 24 May 2000 16:38:52 -0300 (EST)
Date: Wed, 24 May 2000 16:38:52 -0300 (EST)
Message-Id: <200005241938.QAA02522@coruja.dcc.unicamp.br>
From: Jorge Stolfi <stolfi@ic.unicamp.br>
To: voynich@rand.org
Subject: The letters <p> and <f>, again
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
Reply-To: stolfi@ic.unicamp.br
Sender: jim@mail.rand.org
Status: OR


Hi folks,

You may recall my observation, several months ago, that there seemed
to be two variants of the EVA letters <p> and <f>, distinguished by
the presence or absence of a "hook" at the end of the horizontal arm.

The main argument for making that distinction is the "pronunciation
table" on f66r, where the two variants of <f> are listed next to each
other in the middle column, and then "exemplified" with two successive
words on the left column (and also on the right-hand text).

[Fancy versions?]

On the other hand, we all know that <p> and <f> are found almost (but
not quite) exclusively on paragraph-initial lines, where <k> and <t> seem to
be comparatively rare; and that the four letters tend to occur in
similar contexts (e.g. as parts of "platform gallows").

These facts, besides their shapes, strongly suggest that <p> and <f>
are basically ornate variants of <t> and <k>. But then the
hooked/straight distinction would have no parallel in <k> and <t>.

[Statistical differences]

Now, while reviewing the statistics of the "crust/mantle/core" word
paradigm, I just noticed another detail which may be relevant to that
question: namely, the EVA letter <e> essentially *never* --- with 1
exception in the whole book --- follows <p> and <f>, whereas it is
pretty common after <k> and <t>. (I believe Currier already had
noticed that.)

Here you can find a tabulation of the "mantle suffixes" --- essentially,
the <e>/<ch>/<sh> combinations that can follow each of the four gallows
letters:
http://www.dcc.unicamp.br/~stolfi/voynich/Notes/057/ktpf-cmp.txt 
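To make the idea of a mantle-suffix tabulation concrete, here is a rough sketch. It is not the procedure behind the table at the URL above: it assumes words are given in EVA, treats a gallows as platformed only when it sits in a c...h frame (as in <ckh>), and assigns the longest matching suffix from the ten columns of the condensed table further down, with "-" meaning none of them. The function names and the sample words are illustrative.

```python
from collections import Counter, defaultdict

GALLOWS = "ktpf"
# Suffix columns of the condensed table, longest first so that a greedy
# match prefers e.g. "chee" over "ch" and "ee" over "e".
SUFFIXES = ["chee", "eee", "che", "she", "ech", "ee", "ch", "sh", "e"]

def is_platformed(word: str, i: int) -> bool:
    """True if the gallows at word[i] sits inside a c...h platform."""
    return i > 0 and word[i - 1] == "c" and word[i + 1:i + 2] == "h"

def mantle_suffix_counts(words):
    """Count the suffix following each naked (non-platformed) gallows."""
    counts = defaultdict(Counter)
    for word in words:
        for i, letter in enumerate(word):
            if letter not in GALLOWS or is_platformed(word, i):
                continue
            rest = word[i + 1:]
            # '-' means none of the listed suffixes follows.
            suffix = next((s for s in SUFFIXES if rest.startswith(s)), "-")
            counts[letter][suffix] += 1
    return counts

def fractions(counter):
    """Normalise counts to fractions of the total for one gallows."""
    total = sum(counter.values())
    return {s: n / total for s, n in counter.items()}

sample = ["qokeey", "pchedy", "qotey", "ckhey", "okeedy"]  # illustrative words only
for g, c in sorted(mantle_suffix_counts(sample).items()):
    print(g, fractions(c))
```

Platformed gallows are skipped entirely, matching the decision to leave them aside; the greedy longest-first match is a simplifying assumption.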

[Platform gallows don't matter]

That table shows that the distributions of mantle suffixes after the
*platformed* gallows <ckh>, <cth>, <cph>, and <cfh> are extremely
similar, even for the <e> suffix. 

We can explain this fact by assuming that the gallows conceptually
lies "inside" the platform, and is effectively shielded by it; hence
all four combinations are equivalent to <ch> as far as the following
letter is concerned.  So let's leave platformed gallows aside for now.

[Naked gallows are peculiar]

The suffix distributions after "naked" gallows, however, are quite
different between the "normal" and "fancy" versions. Below is a
condensed version of the table, restricted to the few significant
suffixes. (The rest, I believe, are mostly accidents created by
omission of word spaces.) The fractions, as in the big table, are
relative to the total for each gallows:

     mantle suffix
     ----------------------------------------------------------
     -     e     ech   ch    sh    ee    che   she   eee   chee
     ----  ----  ----  ----  ----  ----  ----  ----  ----  ----
  t  .446  .168  .003  .112  .016  .128  .052  .012  .008  .004
  k  .443  .167  .005  .070  .010  .206  .032  .008  .017  .002  
  p  .290  ....  ....  .232  .019  ....  .269  .028  ....  .026
  f  .386  ....  ....  .234  .024  ....  .193  .017  .006  .027


[About <ee> and <ch>]

Note first the anomalously high counts for <ch> (but not <sh>) after
<p> and <f>, and the total absence of <ee> --- which is quite common
after <k> and <t>.

However, note that the combined frequencies of <ch>+<ee> after <k> and
<t> are quite close to the frequencies of <ch> only after <p> and <f>.
A possible explanation for these numbers is that <ee> is actually
<ch> with the ligature omitted. (Or, possibly, <ch> is actually <ee>
with a ligature added to resolve ambiguities.) If that is the case,
then it seems plausible that, in those contexts where <k> and <t> are
replaced by fancier versions <f> and <p>, the scribe would also be
more careful about "crossing his <ee>s" with a ligature.

However, this explanation fails to account for the <che>, <she>, and
<eee> suffixes. True, we see a relative absence of <eee> after <p> and
<f>, and an increase in the frequency of <che>; but the latter is way
too large to be explained as a "transmutation" of <eee> into <che>.
Perhaps some of the <k> and <t> mutated into <pche> and <fche>? Or is
there a more complex dance going on?
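Here is a similar check for the <che>/<eee> columns. It pairs each fancy gallows with the plain one it is supposed to replace: <p> with <t>, and <f> with <k>. This pairing is an assumption taken from the "fancy variant" reading. If <eee> had simply become <che>, the gain in <che> should roughly match the loss in <eee>:

```python
# Fractions from the condensed table; "...." is read as 0.
che = {"t": 0.052, "k": 0.032, "p": 0.269, "f": 0.193}
eee = {"t": 0.008, "k": 0.017, "p": 0.0, "f": 0.006}

# Pair each fancy gallows with the plain one it is thought to replace
# (p ~ t, f ~ k) and compare the <che> gain with the <eee> loss.
results = {}
for fancy, plain in (("p", "t"), ("f", "k")):
    gain = che[fancy] - che[plain]
    loss = eee[plain] - eee[fancy]
    results[(fancy, plain)] = (gain, loss)
    print(f"{fancy} vs {plain}: che gain {gain:.3f}, eee loss {loss:.3f}")
```

In both pairs, the <che> gain is more than ten times the <eee> loss.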

[About <e> and <ech>]

The point of this message is actually to note the absence of suffixes
<e> and <ech> after <p> and <f>, contrasting with their extreme
popularity after <k> and <t>.

Given my theory that an isolated <e> is always a modifier for the
preceding letter, I think that these two columns are at least
consistent with each other: we can summarize both by saying that the
letters <p> and <f> do not accept the <e> modifier (unlike <k> and
<t>).

[Hooked arm is an extra <e>?]

How, then, do we explain this difference between fancy and regular
gallows? Perhaps the difference doesn't actually exist. An intriguing
possibility is that the hooked-arm <p> and <f> are actually fancy
versions of <pe> and <fe>.

Note that even though the hook comes physically before the <p>/<f>
body, it comes *after* it temporally: hence it is not unreasonable to
read <pe>/<fe>, as opposed to <ep>/<ef>. I believe that there are
plenty of examples of such "non-linear fancifications" in medieval
manuscripts.

[Hooked arms on platformed gallows]

A complication of this theory is that the hooked arms often occur on
platformed gallows. For example, on page f1r, line 13, we read

  shoy.ckhey.kodaiin.cphy.cphodaiils.cthey.she.oldain.d

The first <p> has a hooked arm, the second one is straight.

So we need readings for those combinations too. My current guess is
that hooked-<cph> should be read as <cphe>. Note that if we apply this
guess to the above example, we get an intriguing "alliteration":

  ckhey ... cphey ... cthey

If the "fancy variant" theory is true, that should be read as

  ckhey ... cthey ... cthey

For whatever it is worth.
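The two readings above can be applied mechanically. The plain transcription string does not show the hooked/straight distinction, so this sketch supplies it by hand. The function name and the exact rule are my own hypothetical rendering of the "hook = <e>" guess. The rule inserts <e> right after the first <cph>/<cfh>, or otherwise after the first <p>/<f>. The p -> t, f -> k mapping is the "fancy variant" theory.

```python
# Assumed pairing from the "fancy variant" theory: p ~ t, f ~ k.
FANCY_TO_PLAIN = str.maketrans("pf", "tk")
GALLOWS = ("cph", "cfh", "p", "f")  # platformed forms checked first


def hook_reading(word, hooked):
    """Hypothetical 'hook = <e>' rule: a hooked <p>/<f> reads as <pe>/<fe>,
    and a hooked <cph>/<cfh> reads as <cphe>/<cfhe>.  Assumes the word has
    at most one fancy gallows."""
    if not hooked:
        return word
    for g in GALLOWS:
        i = word.find(g)
        if i >= 0:
            j = i + len(g)
            return word[:j] + "e" + word[j:]
    return word


line = "shoy.ckhey.kodaiin.cphy.cphodaiils.cthey.she.oldain.d"
hooked = {3}  # index of the hooked <p> word, marked by hand
words = line.split(".")
with_hooks = [hook_reading(w, i in hooked) for i, w in enumerate(words)]
as_plain = [w.translate(FANCY_TO_PLAIN) for w in with_hooks]
print(with_hooks[1], with_hooks[3], with_hooks[5])  # ckhey cphey cthey
print(as_plain[1], as_plain[3], as_plain[5])        # ckhey cthey cthey
```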

I will try to get a larger sample of alternative "hook = <e>" 
readings.  From a few examples I have seen, however, it seems 
that the new reading is sometimes a plausible word, sometimes
not quite.  So, again, the truth may be more complicated.

Any comments?
  
All the best,

--stolfi

From jim@mail.rand.org  Wed May 24 21:16:29 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id VAA52346
	for <reeds@fry.research.att.com>; Wed, 24 May 2000 21:16:11 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id DD1F51E09A; Wed, 24 May 2000 21:16:10 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail03-lax.pilot.net (mail-lax-3.pilot.net [205.139.40.17])
	by mail-green.research.att.com (Postfix) with ESMTP id 5706A1E007
	for <reeds@research.att.com>; Wed, 24 May 2000 21:16:10 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail03-lax.pilot.net with ESMTP id SAA01993; Wed, 24 May 2000 18:15:42 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id SAA15449; Wed, 24 May 2000 18:15:41 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id SAA22052 for <voynich@rand.org>; Wed, 24 May 2000 18:15:29 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id SAA15439 for <voynich@rand.org>; Wed, 24 May 2000 18:15:29 -0700 (PDT)
Received: from yarf.eecs.umich.edu (yarf.eecs.umich.edu [141.213.12.211]) by mail03-lax.pilot.net with ESMTP id SAA01875 for <voynich@rand.org>; Wed, 24 May 2000 18:15:28 -0700 (PDT)
Received: (from kckluge@localhost)
	by yarf.eecs.umich.edu (8.9.3/8.9.1) id VAA08381;
	Wed, 24 May 2000 21:15:23 -0400 (EDT)
Date: Wed, 24 May 2000 21:15:23 -0400 (EDT)
Message-Id: <200005250115.VAA08381@yarf.eecs.umich.edu>
From: Karl Kluge <kckluge@eecs.umich.edu>
To: voynich@rand.org
In-reply-to: <200005241938.QAA02522@coruja.dcc.unicamp.br> (message from Jorge
	Stolfi on Wed, 24 May 2000 16:38:52 -0300 (EST))
Subject: Re: The letters <p> and <f>, again
References:  <200005241938.QAA02522@coruja.dcc.unicamp.br>
Sender: jim@mail.rand.org
Status: OR


Stolfi wrote:

> On the other hand, we all know that <p> and <f> are found almost
> exclusively (but not entirely) on paragraph-initial lines, where <k> and <t> seem to
> be comparatively rare; and that the four letters tend to occur in
> similar contexts (e.g. as parts of "platform gallows").
> 
> These facts, besides their shapes, strongly suggest that <p> and <f>
> are basically ornate variants of <t> and <k>. But then the
> hooked/straight distinction would have no parallel in <k> and <t>.

According to my transcription alphabet table:

EVA   Currier
p     B
f     V
k     F
t     P

D'Imperio points to f57r, where a sequence of characters is repeated,
except that in one repetition V occurs in place of B. On that basis,
she suggests that perhaps B = V (and therefore, possibly, F = P), i.e.,
p = f and k = t rather than p = t and f = k.
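To keep the two alphabets straight, the proposed equivalences can be translated through the table above. In this trivial sketch, the variable names are mine. It shows that D'Imperio's B = V, F = P becomes p = f, k = t in EVA. It also shows that this does not overlap the p ~ t, f ~ k pairing in the quoted text.

```python
# Currier -> EVA, for the four gallows in the table above.
CURRIER_TO_EVA = {"B": "p", "V": "f", "F": "k", "P": "t"}


def to_eva(pair):
    """Translate a pair of Currier symbols into EVA letters."""
    return tuple(CURRIER_TO_EVA[c] for c in pair)


dimperio = [to_eva(("B", "V")), to_eva(("F", "P"))]
stolfi = [("p", "t"), ("f", "k")]  # "ornate variant" pairing quoted above
print("D'Imperio:", dimperio)  # [('p', 'f'), ('k', 't')]
print("Stolfi:   ", stolfi)
```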

Karl

From jim@mail.rand.org  Thu May 25 04:42:41 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id EAA22962
	for <reeds@fry.research.att.com>; Thu, 25 May 2000 04:42:26 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 650D24CE3A; Thu, 25 May 2000 04:42:21 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail02-lax.pilot.net (mail-lax-2.pilot.net [205.139.40.16])
	by mail-blue.research.att.com (Postfix) with ESMTP id 2289F4CE20
	for <reeds@research.att.com>; Thu, 25 May 2000 04:42:07 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail02-lax.pilot.net with ESMTP id BAA15358; Thu, 25 May 2000 01:41:20 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id BAA26730; Thu, 25 May 2000 01:41:20 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id BAA05089 for <voynich@rand.org>; Thu, 25 May 2000 01:41:09 -0700 (PDT)
Received: from mail01-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id BAA26718 for <voynich@rand.org>; Thu, 25 May 2000 01:41:08 -0700 (PDT)
Received: from sun7.bham.ac.uk (sun7.bham.ac.uk [147.188.128.108]) by mail01-lax.pilot.net with ESMTP id BAA23922 for <voynich@rand.org>; Thu, 25 May 2000 01:41:07 -0700 (PDT)
Received: from ds13.bham.ac.uk ([147.188.72.20] helo=golem)
	by sun7.bham.ac.uk with esmtp (Exim 3.03 #1)
	id 12utDF-0002HW-01
	for voynich@rand.org; Thu, 25 May 2000 09:41:54 +0100
From: "Gabriel Landini" <G.Landini@bham.ac.uk>
Organization: The University of Birmingham, UK.
To: voynich@rand.org
Date: Thu, 25 May 2000 09:39:37 +0100
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: The letters <p> and <f>, again
Reply-To: G.Landini@bham.ac.uk
Message-ID: <392CF4D9.2549.257D09@localhost>
In-reply-to: <200005241938.QAA02522@coruja.dcc.unicamp.br>
X-mailer: Pegasus Mail for Win32 (v3.12c)
Sender: jim@mail.rand.org
Status: OR

On 24 May 2000, at 16:38, Jorge Stolfi wrote:
> Any comments?
In my first draft I get more *pe* and *fe* words, although I agree
that their occurrence is extremely low (each word occurs once):
_pe_
 1 tope
 1 peshy
 1 ypeeeg
 1 shepe
 1 peedeedy
 1 peed
 1 pechol
 1 opehaldg
 1 opeesy
 1 opeedy
 1 peedy
 1 qopeeedar
 1 peeo
 1 opees
 1 chepeeees  
 1 chepeedy
 1 cheepeedy
 1 apeeety

_fe_
 1 ofee
 1 ofe*edar
 1 feedly
 1 aifehey

As I said, this has not been checked.
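A list like this is essentially a substring search over a word-frequency table. This minimal sketch assumes the transcription has already been reduced to a word-to-count dictionary. How Gabriel's draft was actually produced is not stated. The sketch uses a few of the words listed above:

```python
def words_containing(freq, pattern):
    """Return (count, word) pairs for words containing `pattern`,
    most frequent first, ties broken alphabetically."""
    hits = [(n, w) for w, n in freq.items() if pattern in w]
    return sorted(hits, key=lambda t: (-t[0], t[1]))


# A handful of the words listed above, each with count 1 as reported.
freq = {"tope": 1, "peshy": 1, "opeedy": 1, "ofee": 1, "feedly": 1}
for pattern in ("pe", "fe"):
    print(f"_{pattern}_")
    for n, w in words_containing(freq, pattern):
        print(f"{n:2d} {w}")
```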
 
The other comment is that there are "in between" forms as well 
(with and without the hooks) which complicates things a bit.

Cheers,
Gabriel



From jim@mail.rand.org  Thu May 25 10:02:00 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-green.research.att.com (mail-green.research.att.com [135.207.30.103])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id KAA65385
	for <reeds@fry.research.att.com>; Thu, 25 May 2000 10:01:45 -0400 (EDT)
Received: by mail-green.research.att.com (Postfix)
	id 530851E02A; Thu, 25 May 2000 10:01:41 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-green.research.att.com (Postfix) with ESMTP id A30001E029
	for <reeds@research.att.com>; Thu, 25 May 2000 10:01:40 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id HAA21469; Thu, 25 May 2000 07:00:37 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id HAA05187; Thu, 25 May 2000 07:00:37 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id HAA13995 for <voynich@rand.org>; Thu, 25 May 2000 07:00:20 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id HAA05153 for <voynich@rand.org>; Thu, 25 May 2000 07:00:19 -0700 (PDT)
Received: from esacom43.esoc.esa.de (esacom43.esoc.esa.de [131.176.86.4]) by mail03-lax.pilot.net with ESMTP id HAA25811 for <voynich@rand.org>; Thu, 25 May 2000 07:00:18 -0700 (PDT)
Received: from esacom53.esoc.esa.de (esacom53.esoc.esa.de [131.176.85.6])
	by esacom43.esoc.esa.de (8.9.2/8.9.2/ESA-ESOC-v1.8) with ESMTP id NAA25858
	for <voynich@rand.org>; Thu, 25 May 2000 13:37:32 GMT
Received: from voynich.nu (dcla4.dev.esoc.esa.de [131.176.58.162])
	by esacom53.esoc.esa.de (8.9.2/8.9.2/ESA-ESOC-mail-gw-v1.5) with ESMTP id NAA21991
	for <voynich@rand.org>; Thu, 25 May 2000 13:59:32 GMT
Sender: rzandber@mail-gw.esoc.esa.de
Message-ID: <392D31C2.5E6DEF4F@voynich.nu>
Date: Thu, 25 May 2000 13:59:30 +0000
From: Rene Zandbergen <rene@voynich.nu>
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.61 [en] (X11; I; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: voynich@rand.org
Subject: Re: The letters <p> and <f>, again
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: O

Hi all,

Stolfi's table shows that one can usually exchange <k> for <t> (or v.v.)
and come up with a valid Voynich word. This is also true for the pair
<f> and <p>. At the same time, it shows that you cannot exchange
{<k> or <t>} for {<f> or <p>} (although there may be exceptions).
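The swap test can be sketched as follows. Translate every <k> to <t> and vice versa, then look the result up in a word-frequency table. Reporting both counts also shows whether the swapped word is about as frequent as the original. The counts here are hypothetical. The same function handles the <p>/<f> pair with a different translation table.

```python
# Swap every <k> with <t> and vice versa (one character at a time).
K_T_SWAP = str.maketrans({"k": "t", "t": "k"})


def swap_partners(freq, table=K_T_SWAP):
    """For each word affected by the swap, return
    {word: (swapped_word, count, count_of_swapped)}; a swapped count of 0
    means the swapped form never occurs."""
    out = {}
    for w, n in freq.items():
        s = w.translate(table)
        if s != w:
            out[w] = (s, n, freq.get(s, 0))
    return out


# Hypothetical counts, for illustration only.
freq = {"okeey": 20, "oteey": 8, "qokal": 3}
for w, (s, n, m) in swap_partners(freq).items():
    print(f"{w} -> {s}: {n} vs {m}")
```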

The appearance of <f> and <p> (virtually) only at the top lines of
paragraphs should remind us of the gallows in the letter shown in
Capelli, where these are purely ornamental additions to existing
letters at the top and bottom lines only. Do we get valid Voynich
words if the f's and p's are simply removed? Or are they ornate
variations of other letters?

On a more frivolous note, having recently been to Prague, I find it
irresistible to learn a bit more about the Czech language.
(I can already say: "do not enter or leave the train, the doors
are about to close") :-)
To the point. Czech has a number of orthographic rules which remind
me a bit of some of the observations made by Stolfi (no 'e' after
'f' or 'p').
I'm not saying that Voynichese is Czech, but if it is an invented
language, perhaps it was invented by someone who knew the Czech
language very well... Perhaps this is a feature of other Slavonic
languages too...
(Although I should add that I have no idea what the pronunciation
or spelling rules were in the 15th or 16th C.)

Cheers, Rene 


-- 
Rene

From jim@mail.rand.org  Sun May 28 18:06:59 2000
Return-Path: <jim@mail.rand.org>
Received: from mail-blue.research.att.com (mail-blue.research.att.com [135.207.30.102])
	by fry.research.att.com (980427.SGI.8.8.8/8.8.7) with ESMTP id SAA58437
	for <reeds@fry.research.att.com>; Sun, 28 May 2000 18:06:59 -0400 (EDT)
Received: by mail-blue.research.att.com (Postfix)
	id 8CFEF4CE0E; Sun, 28 May 2000 18:06:59 -0400 (EDT)
Delivered-To: reeds@research.att.com
Received: from mail01-lax.pilot.net (mail-lax-1.pilot.net [205.139.40.18])
	by mail-blue.research.att.com (Postfix) with ESMTP id 012B14CE01
	for <reeds@research.att.com>; Sun, 28 May 2000 18:06:58 -0400 (EDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail01-lax.pilot.net with ESMTP id PAA22512; Sun, 28 May 2000 15:06:30 -0700 (PDT)
Received: from mail.rand.org (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA05031; Sun, 28 May 2000 15:06:30 -0700 (PDT)
Received: from mail01-rand.pilot.net (unknown-23-138.pilot.net [204.48.23.138]) by mail.rand.org (8.9.3/8.9.3) with ESMTP id PAA14935 for <voynich@rand.org>; Sun, 28 May 2000 15:05:51 -0700 (PDT)
Received: from mail03-lax.pilot.net (localhost [127.0.0.1]) by mail01-rand.pilot.net (8.8.5/8.8.5) with ESMTP id PAA05009 for <voynich@rand.org>; Sun, 28 May 2000 15:05:50 -0700 (PDT)
Received: from mailout00.sul.t-online.com (mailout00.sul.t-online.com [194.25.134.16]) by mail03-lax.pilot.net with ESMTP id PAA21054 for <voynich@rand.org>; Sun, 28 May 2000 15:05:49 -0700 (PDT)
Received: from fwd06.sul.t-online.de 
	by mailout00.sul.t-online.com with smtp 
	id 12wBBs-0001FL-01; Mon, 29 May 2000 00:05:48 +0200
Received: from Noname (0625764225-0001@[193.159.5.7]) by fwd06.sul.t-online.de
	with esmtp id 12wBBh-1B3EYKC; Mon, 29 May 2000 00:05:37 +0200
Message-ID: <393198F2.37BB784C@voynich.nu>
Date: Mon, 29 May 2000 00:08:50 +0200
From: Zandbergen@t-online.de (Rene Zandbergen)
Reply-To: rene@voynich.nu
X-Mailer: Mozilla 4.01 [de]C-DT  (Win95; I)
MIME-Version: 1.0
To: voynich@rand.org
Subject: Re: The letters <p> and <f>, again
X-Priority: 3 (Normal)
References: <392D31C2.5E6DEF4F@voynich.nu> <200005262227.TAA08368@coruja.dcc.unicamp.br>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Sender: 0625764225-0001@t-dialin.net
Sender: jim@mail.rand.org
Status: OR

Stolfi wrote:

> Indeed, swapping <k> for <t> in a whole word generally produces
> another valid word.  Ditto for swapping <p> with <f>.  *But these
> swaps change the word frequencies substantially*, and there
> doesn't seem to be a clear pattern.
> 
> So I would rather believe that <k> and <t> are "phonetically" similar,
> but semantically distinct (like "t" and "d" in Spanish or Italian,
> say), and that the vocabulary is so "dense" that almost any
> phonetically valid string is a common word.

Well, this observation would make perfect sense if the VMs were a
rather phonetic rendering of some language. In that case 'oteey'
and 'okeey' would be the same word, only pronounced differently.
But that would reduce the size of the vocabulary from 'good' to
'rather small' for the size of the text, especially if combined
with all the other cases of 'similar' letters occurring in
similar locations.
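The effect on vocabulary size can be made concrete by counting distinct words before and after treating <t> and <k> as one letter. Here is a sketch on a made-up token list (not manuscript counts):

```python
def vocab_size(tokens, merge=None):
    """Number of distinct word types, optionally after mapping letters
    with a str.translate table (e.g. t -> k)."""
    if merge is not None:
        tokens = [w.translate(merge) for w in tokens]
    return len(set(tokens))


# Hypothetical token stream, for illustration only.
tokens = ["okeey", "oteey", "okeey", "chedy", "qokeey", "qoteey"]
T_TO_K = str.maketrans("t", "k")
print(vocab_size(tokens), vocab_size(tokens, T_TO_K))  # 5 3
```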

>     > The appearance of <f> and <p> at top lines of paragraphs only
>     > (virtually), should remind us of the gallows in the letter shown
>     > in Capelli, where these are purely ornamental additions to
>     > existing letters at the top and bottom lines only.
> 
> Surely <k> and <t> are not ornamental; their distribution is too
> consistent for that.

Absolutely.
And this is probably significant. p/f are like k/t in that their
shapes distinguish them from all the other letters (except q, Currier
4). At the same time, k/t behave normally while p/f don't.
What did the person(s) who made this up have in mind?

>     > Do we get valid Voynich words if the f's and p's are simply
>     > removed?
> 
> Good question.
> 
> Here is a tentative answer. 
> Below are the words with <p>/<f> gallows,
> where both variants together have at least 5 occurrences in the book,
> for which omitting the gallows produces a word with less than 5
> occurrences. 

All these can be written with k/t instead of p/f !!

> And here are those where the <p> and <f> variants occur
> at least 5 times, and the gallows-less variant occurs
> more than 10 times:

A much longer list. Again, k/t are usually valid replacements for
p/f, except (or at least less often) when 'l' precedes the gallows.
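Stolfi's deletion test can be sketched like this, under a few assumptions of mine. Only words with exactly one <p> or <f> are considered. The <p> and <f> variants are counted together for both thresholds. The counts are invented for illustration. Deleting the gallows from a platformed form such as <cph> leaves <ch>.

```python
def pf_deletion_rows(freq):
    """For words with exactly one <p> or <f>: merge the <p> and <f>
    variants, delete the gallows letter, and return
    {pattern: (p_plus_f_count, deleted_word, deleted_count)},
    where '*' in the pattern stands for either gallows."""
    rows = {}
    for w in freq:
        if w.count("p") + w.count("f") != 1:
            continue
        i = next(j for j, c in enumerate(w) if c in "pf")
        head, tail = w[:i], w[i + 1:]
        pattern = head + "*" + tail
        if pattern in rows:
            continue  # already handled via the other variant
        both = freq.get(head + "p" + tail, 0) + freq.get(head + "f" + tail, 0)
        deleted = head + tail
        rows[pattern] = (both, deleted, freq.get(deleted, 0))
    return rows


# Hypothetical counts, for illustration only.
freq = {"pchedy": 4, "fchedy": 2, "chedy": 40, "opchey": 6, "ochey": 1}
rows = pf_deletion_rows(freq)
# Thresholds as quoted: gallows variants together >= 5 occurrences, and
# the gallows-less word either rare (< 5) or common (> 10).
hard = [p for p, (b, _, n) in rows.items() if b >= 5 and n < 5]
easy = [p for p, (b, _, n) in rows.items() if b >= 5 and n > 10]
print("deletion gives a rare word:  ", hard)
print("deletion gives a common word:", easy)
```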

> It seems that the <p> and <f> gallows are deletable when they
> occur at the beginning of the word (with or without a
> platform).

Likely to be paragraph-starters...

> On the other hand, the instances that cannot be
> deleted are generally preceded by <o> or <qo>.

> On the other hand, here are some common <t>/<k> words where the
> gallows apparently can be omitted:

Another very long list. A special family of these is formed by the
platform or pedestalled gallows: (ckh, cth, cfh, cph) => ch.
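That platform family can be collapsed with a single regular expression, merging the counts of words that become identical. Here is a minimal sketch with hypothetical counts:

```python
import re
from collections import Counter

# Platform gallows: c + (k|t|p|f) + h, written ckh, cth, cph, cfh in EVA.
PLATFORM = re.compile(r"c[ktpf]h")


def strip_platform_gallows(word):
    """Replace every platform gallows by a plain <ch>."""
    return PLATFORM.sub("ch", word)


def collapse_counts(freq):
    """Merge the counts of words that become equal after stripping."""
    merged = Counter()
    for w, n in freq.items():
        merged[strip_platform_gallows(w)] += n
    return merged


freq = {"ckhy": 5, "cthy": 7, "chy": 30, "cphol": 2}  # hypothetical counts
print(collapse_counts(freq))
```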

> Of course, the fact that a given letter can be removed
> from many words does not mean that it is superfluous. (Consider
> final "s" or "y" in English, 

What it all means in Voynichese is an open question. I would not
expect letters that can be removed to be superfluous. But such
removability is not a typical feature of any language I know (except
for some obvious cases at word endings, as above).

>     > On a more frivolous note, having recently been to Prague, I find
>     > it irresistible to learn a bit more about the Czech language.
>     > (I can already say: "do not enter or leave the train, the doors
>     > are about to close") :-)
> 
> Let me see, I bet it sounds something like
> "U concete prosm u vstup a nstup ..."  8-)

Close!
If this is from memory, I am impressed.
The initial 'u' is actually attached to the word concete.
This word is the verb 'stop', where the initial 'u' is attached as
a regular verb mode modifier (but don't ask me how regular). Other
such prefixes exist (na-, ne-). Intrigues me...

The bit that struck me most about Czech (but I know too little)
is that the consonants can be split up into 'hard' and 'soft' ones,
and this has some semi-regular consequences for orthography.
Also, there are certain fixed consonant change patterns included in
various grammatical rules. 
I know that this is too vague to make any sense, but I'll explain 
when I understand it better.

>     > To the point. Czech has a number of orthographic rules which remind
>     > me a bit of some of the observations made by Stolfi (no 'e' after
>     > 'f' or 'p').
> 
> Unfortunately many other languages have these rules, too.
> (Even... you know which one. 8-)

I Am Not A Linguist, as the saying goes, but I've browsed books dealing
with a variety of languages. Czech is the first major European language
which I've seen behave in suspicious ways, from a Voynich point of view.
Another 'suspicious language', for a very different reason, is Arabic
(I guess that could have been formulated more precisely....).

Cheers, Rene

