From koontz@spot.colorado.edu  Wed Oct  6 19:32:40 2004
Return-Path: <koontz@spot.colorado.edu>
X-Original-To: stolfi@brasilia.ic.unicamp.br
Delivered-To: stolfi@brasilia.ic.unicamp.br
Received: from localhost (localhost [127.0.0.1])
	by brasilia.ic.unicamp.br (Postfix) with ESMTP id 68BD910D63F
	for <stolfi@brasilia.ic.unicamp.br>; Wed,  6 Oct 2004 19:32:40 -0300 (BRST)
Received: from brasilia.ic.unicamp.br ([127.0.0.1])
 by localhost (brasilia.ic.unicamp.br [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 23460-01 for <stolfi@brasilia.ic.unicamp.br>;
 Wed,  6 Oct 2004 19:32:40 -0300 (BRT)
Received: from maceio.ic.unicamp.br (maceio.ic.unicamp.br [143.106.7.31])
	by brasilia.ic.unicamp.br (Postfix) with ESMTP id 06AAE10D627
	for <stolfi@brasilia.ic.unicamp.br>; Wed,  6 Oct 2004 19:32:40 -0300 (BRST)
Received: from spot.colorado.edu (daemon@spot.colorado.edu [128.138.129.2])
	by maceio.ic.unicamp.br (8.11.6/8.11.6) with ESMTP id i96MWcr07688
	for <stolfi@ic.unicamp.br>; Wed, 6 Oct 2004 19:32:38 -0300
Received: from localhost (koontz@localhost)
	by spot.colorado.edu (8.12.10/8.12.10/ITS-6.0/test) with ESMTP id i96MWc8X029471
	for <stolfi@ic.unicamp.br>; Wed, 6 Oct 2004 16:32:38 -0600 (MDT)
Date: Wed, 6 Oct 2004 16:32:38 -0600 (MDT)
From: Koontz John E <John.Koontz@Colorado.EDU>
To: Jorge Stolfi <stolfi@ic.unicamp.br>
Subject: Re: OP Text Archives
In-Reply-To: <33361.143.106.24.42.1097094767.squirrel@webmail.ic.unicamp.br>
Message-ID: <Pine.GSO.4.58.0410061556500.9711@spot.colorado.edu>
References: <Pine.GSO.4.58.0410060845520.4673@spot.colorado.edu>
 <33361.143.106.24.42.1097094767.squirrel@webmail.ic.unicamp.br>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: by amavisd-new at ic.unicamp.br
Status: RO
X-Status: 
X-Keywords:                 

On Wed, 6 Oct 2004, Jorge Stolfi wrote:
> > - Accented y (in Windows view) represents accent over the preceding vowel.
>
> I presume that you mean an acute accent, right?

Yes.  Sorry - acute is the default accent!  In any event, the character
representing accent is the odd-ball character that occurs once or twice in
most words, while the rarer one (still common) is raised n!  These looked
reasonable in a DOS version of the file, which gives you a timeline on the
preparation of the files ...

I think I decided to write accent as a separate character partly to avoid
any asymmetry with writing nasality that way.  I'd probably go with
accented vowels today.  Note that only i and o can be nasalized.  But
(unmarked) nasalized o is written "a" following m and n.  Students of this
subfamily of Siouan waffle over whether to write oN or aN and on whether
oN actually contrasts with aN.  Historically whatever it is derives from
*uN and *aN.
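
As a rough illustration of the "accented vowels" alternative, here is a
minimal Python sketch.  It assumes the accent marker is the accented y
seen in a Windows view and that it sits immediately after the vowel it
marks; the function name and the vowel set are hypothetical, and the
interaction with the raised-n nasal marker is not handled.

```python
import re
import unicodedata

# Assumption: the accent marker shows up as "ý" (U+00FD) in a Windows view
# and immediately follows the vowel it accents.
ACCENT_MARK = "\u00fd"
COMBINING_ACUTE = "\u0301"


def compose_accents(text: str) -> str:
    """Turn vowel + separate accent marker into a single acute-accented vowel."""
    # Swap the marker for a combining acute, only when it follows a vowel.
    marked = re.sub(
        f"([aeiouAEIOU]){ACCENT_MARK}",
        lambda m: m.group(1) + COMBINING_ACUTE,
        text,
    )
    # NFC folds vowel + combining acute into the precomposed letter.
    return unicodedata.normalize("NFC", marked)
```

A marker that does not follow a vowel is left untouched, so stray cases
remain visible for checking by hand.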

Up to a point you could get away with writing nasalization as n (or m before
labials) after a vowel, but Siouanists don't do this for two reasons.
One is that in some languages VN finally contrasts with Vn and Vm, e.g.,
in many Dakota or "Sioux" dialects.  This does not apply in Omaha-Ponca's
subfamily, but it does happen in OP that you get sequences like aNa
(otherwise oNa) that contrast with aNna and anaN (oNna and anoN).  In
fact, whereas in European languages with nasal vowels nasality always
seems to develop historically from postvocalic n and/or m, in Siouan
languages it is the reverse, and mV and nV seem to develop from wVN and
rVN.  For example, Omaha-Ponca aNthaN (th being the r here), a common
sequence in some verb paradigms, comes out aNnaN in Osage.

In Siouan languages that have lost nasality in nasal vowels you only get m
and n as allophones of w and r in, for example, initial position, and half
the time the speakers just say w and r.  It used to give the early
transcribers conniptions.

Somewhat similar situations exist in many Amazonian languages.

> > - This text suppresses the difference between tense or geminate stops (pp,
> > tt, cc, kk) and aspirated ones (ph, th, ch, kh).  Dorsey represents it
> > only sporadically, and as I intended this version for pre-XML Web use, I
> > gritted my teeth and consolidated.
>
> Do you mean that all ph etc. were mapped to pp, or that the
> files were just concatenated so usage is presently inconsistent?

All ph and pp mapped to p.  All th and tt mapped to t.  Etc.  So the
geminate/aspirate contrast is completely neutralized.  Aspirates are far
rarer than geminates in the vocabulary, though some aspirates are high
frequency in text, e.g., the [tHe] 'the [inanimate, vertically extended]'.
Note that the future marker is tte ~ tta written te ~ ta in this file.
If this merger is not the case, then I need to look again at the files!
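
The merger can be sketched in Python as follows.  This is only an
illustration under stated assumptions, not the procedure actually used
on the files: the input is taken to be a notation that writes
aspiration as H (as in /tHe/) and geminates doubled (as in /tte/), so
that the "th" standing for the r-like sound is left alone.  The
function name is hypothetical and the affricates are left out.

```python
import re

# Assumed input notation: aspirates as pH, tH, kH; geminates as pp, tt, kk.
# A stop followed by H, or by a copy of itself, collapses to the plain stop.
_MERGE = re.compile(r"([ptk])(?:H|\1)")


def neutralize(text: str) -> str:
    """Merge aspirated and geminate stops into the plain stop letter."""
    return _MERGE.sub(r"\1", text)
```

Under these assumptions both /tte/ and /tHe/ come out as "te", which is
exactly the ambiguity described above, while "the" is unchanged.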

I hope I was clear about the actual "th" in the files representing,
essentially, r (or l), which sounds like English voiced "th."  It seems
in fact to be a retroflex lateral as opposed to an interdental, but the
sound quality is similar.

> > Ch or c, whichever I used are cc/ch (tense and aspirated "ch").
>
> Do you mean both "ch" and "c" occur in the files, or only one of them?

I believe I reduced all ts^ affricates (otherwise c^) (^ representing
hacek) to ch, representing both c^h (aspirated c^) and c^c^ (geminate/
tense c^).  Surprising as it may seem, it is possible to have unaspirated,
aspirated, and geminate/tense variants of c^.

Incidentally, geminate is probably clear enough - a la Italian - but tense
refers to a somewhat mysterious quality that distinguishes pp etc. from p
etc. in initial position.  In Osage (closely related to Omaha-Ponca) this
mysterious quality is preaspiration, i.e., pp etc. are hp etc.
Omaha-Ponca pp etc. are not preaspirated, but simple p etc. do not occur
initially, the simple or unaspirated or lax series being replaced by
voiced forms b etc.  So in Omaha-Ponca the mysterious distinguishing
quality is voicelessness.  But there may be something more, e.g., perhaps
some effect on the sound quality of the following vowel.  We're not sure.

Anyway, one would never mistake /tte/ for /tHe/ unless one were an English
speaker, which, of course, the first transcribers were.

