# Last edited on 2004-02-25 18:35:45 by stolfi
#
# ACIP-JS Encoding for transcription of the Tibetan script
# including Tibetanized Sanskrit.
#
# This is a slight modification of the encoding used by the ACIP (Asian
# Classics Input Project), http://www.asianclassics.org/
#
# The table says that "W" is a letter but "V" is not, yet
# in two files the opposite seems to be true.
#
#
# Consonants (TIBETAN_LETTER_xxx)
#
# ACIP | Unicode UnicodeName | Appearance+Obs
# ---- | --------- --------------- | --------------
# A | 0F68 A | Base for dependent vowels.
# B | 0F56 BA |
# BH | 0F57 BHA | = 0F56+0FB7
# C | 0F45 CA |
# CH | 0F46 CHA |
# D | 0F51 DA |
# DH | 0F52 DHA | = 0F51+0FB7
# DZ | 0F5B DZA |
# DZH | 0F5C DZHA | = 0F5B+0FB7
# G | 0F42 GA |
# GH | 0F43 GHA | = 0F42+0FB7
# H | 0F67 HA |
# J | 0F47 JA |
# K | 0F40 KA |
# KH | 0F41 KHA |
# Ksh | 0F69 KSSA | = 0F40+0FB5
# L | 0F63 LA |
# M | 0F58 MA |
# N | 0F53 NA |
# NG | 0F44 NGA |
# NY | 0F49 NYA |
# P | 0F54 PA |
# PH | 0F55 PHA |
# R | 0F62 RA |
# S | 0F66 SA |
# SH | 0F64 SHA |
# T | 0F4F TA |
# TH | 0F50 THA |
# TS | 0F5A TSHA |
# TZ | 0F59 TSA |
# W | 0F5D WA | "V" in some files?
# Y | 0F61 YA |
# Z | 0F5F ZA |
# ZH | 0F5E ZHA |
# ` | 0F60 -A |
# d | 0F4C DDA |
# dH | 0F4D DDHA | = 0F4C+0FB7
# n | 0F4E NNA |
# sh | 0F65 SSA |
# t | 0F4A TTA |
# th | 0F4B TTHA |
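#
# Because some codes are prefixes of others ("K", "KH", "Ksh"), a reader of
# this table has to pick the longest code that matches. The following is a
# minimal sketch of that idea, not part of the ACIP-JS definition: the names
# CONSONANTS and match_consonant are hypothetical, only an excerpt of the
# table is included, and greedy longest-match is an assumption about how
# ambiguous runs should be resolved.
#
```python
# Excerpt of the consonant table (ACIP-JS code -> Unicode code point).
CONSONANTS = {
    "K": 0x0F40, "KH": 0x0F41, "Ksh": 0x0F69,
    "G": 0x0F42, "NG": 0x0F44, "NY": 0x0F49, "N": 0x0F53,
    "T": 0x0F4F, "TH": 0x0F50, "TS": 0x0F5A, "TZ": 0x0F59,
    "D": 0x0F51, "DZ": 0x0F5B,
    "S": 0x0F66, "SH": 0x0F64, "sh": 0x0F65,
}


def match_consonant(text, pos=0):
    """Return (code, code_point) for the longest consonant code found at
    text[pos:], or None if no code from the excerpt matches there."""
    for length in (3, 2, 1):  # try the longest codes first
        code = text[pos:pos + length]
        if code in CONSONANTS:
            return code, CONSONANTS[code]
    return None


if __name__ == "__main__":
    code, cp = match_consonant("KHA")
    print(code, f"U+{cp:04X}")
```
#
# Codes are case-sensitive, so "SH" (SHA) and "sh" (SSA) stay distinct.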
#
#
# Vowels after consonants (TIBETAN_VOWEL_SIGN_xxx)
#
# ACIP | Unicode UnicodeName | Appearance+Obs
# ---- | --------- --------------- | --------------
# A | ---- -- | No mark.
# 'A | 0F71 AA | "2" below.
# I | 0F72 I | Left-hook above.
# 'I | 0F73 II | Left-hook above and "2" below = 0F71+0F72.
# U | 0F74 U | Right-pipe below.
# 'U | 0F75 UU | Right-pipe below and "2" below = 0F71+0F74.
#
# E | 0F7A E | Spout above.
# EE | 0F7B EE | Spout-doubled above.
# O | 0F7C O | Wings above.
# OO | 0F7D OO | Wings-doubled above.
#
# Ri | 0F76 VOCALIC_R | Spout below and right-hook above = 0FB2+0F80.
# R'i | 0F77 VOCALIC_RR | Spout below and right-hook above and "2" below ~ 0FB2+0F81.
#
# Li | 0F78 VOCALIC_L | "21" below and right-hook above = 0FB3+0F80.
# L'i | 0F79 VOCALIC_LL | "21" below and right-hook above and "2" below ~ 0FB3+0F81.
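#
# The equivalences listed for the composite vowel signs can be checked
# against the Unicode character database. The sketch below assumes Python's
# standard unicodedata module; COMPOSITE_VOWELS, decomposition and
# check_table are hypothetical names. In the Unicode data, 0F77 and 0F79
# carry a <compat> tag (compatibility rather than canonical equivalence),
# which the helper ignores.
#
```python
import unicodedata

# Precomposed vowel signs and the sequences given for them in the table.
COMPOSITE_VOWELS = {
    0x0F73: (0x0F71, 0x0F72),  # II
    0x0F75: (0x0F71, 0x0F74),  # UU
    0x0F76: (0x0FB2, 0x0F80),  # VOCALIC_R
    0x0F77: (0x0FB2, 0x0F81),  # VOCALIC_RR (compatibility)
    0x0F78: (0x0FB3, 0x0F80),  # VOCALIC_L
    0x0F79: (0x0FB3, 0x0F81),  # VOCALIC_LL (compatibility)
}


def decomposition(cp):
    """Code points of the one-level Unicode decomposition of cp,
    with any formatting tag such as <compat> dropped."""
    fields = unicodedata.decomposition(chr(cp)).split()
    return tuple(int(f, 16) for f in fields if not f.startswith("<"))


def check_table():
    """Map each composite vowel to True if the table agrees with Unicode."""
    return {cp: decomposition(cp) == parts
            for cp, parts in COMPOSITE_VOWELS.items()}


if __name__ == "__main__":
    for cp, ok in check_table().items():
        print(f"U+{cp:04X}", "ok" if ok else "MISMATCH")
```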
#
#
# Vowels in initial position: (TIBETAN_LETTER_xxx + TIBETAN_VOWEL_SIGN_yyy)
#
# ACIP | Unicode UnicodeName | Appearance+Obs
# ---- | --------- --------------- | --------------
# AA | 0F68 A | Like "6V".
# A'A | 0F68+0F71 A+AA | AA with "2" below.
# AI | 0F68+0F72 A+I | AA with left-hook above.
# A'I | 0F68+0F73 A+II | AA with left-hook above and "2" below.
# AU | 0F68+0F74 A+U | AA with right-pipe below.
# A'U | 0F68+0F75 A+UU | AA with right-pipe below and "2" below.
# |
# AE | 0F68+0F7A A+E | AA with spout above.
# AEE | 0F68+0F7B A+EE | AA with spout-doubled above.
# AO | 0F68+0F7C A+O | AA with wings above.
# AOO | 0F68+0F7D A+OO | AA with wings-doubled above.
# |
# Ri | 0F62+0F76 RA+VOCALIC_R | RA with spout base and right-hook above.
# R'i | 0F62+0F77 RA+VOCALIC_RR | RA with spout base and "2" below.
# |
# Li | 0F63+0F78 LA+VOCALIC_L | LA with right-hook above.
# L'i | 0F63+0F79 LA+VOCALIC_LL | LA with "2" below.
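#
# For the "A"-based rows, an initial vowel is simply the base letter 0F68
# followed by the vowel sign from the previous table. A minimal sketch,
# assuming that reading; A_LETTER, VOWEL_SIGNS and initial_vowel are
# hypothetical names, and the Ri/Li rows (which use RA and LA as bases)
# are deliberately left out.
#
```python
A_LETTER = "\u0F68"  # base letter for initial vowels

# ACIP-JS vowel code -> Unicode vowel sign ("" means no mark).
VOWEL_SIGNS = {
    "A": "", "'A": "\u0F71",
    "I": "\u0F72", "'I": "\u0F73",
    "U": "\u0F74", "'U": "\u0F75",
    "E": "\u0F7A", "EE": "\u0F7B",
    "O": "\u0F7C", "OO": "\u0F7D",
}


def initial_vowel(acip):
    """Convert an initial-vowel code such as "AI" or "A'U" to Unicode."""
    if not acip.startswith("A") or acip[1:] not in VOWEL_SIGNS:
        raise ValueError(f"not an A-based initial vowel: {acip!r}")
    return A_LETTER + VOWEL_SIGNS[acip[1:]]


if __name__ == "__main__":
    for code in ("AA", "A'I", "AOO"):
        print(code, [f"U+{ord(c):04X}" for c in initial_vowel(code)])
```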
#
#
# Postfix vowel modifiers (TIBETAN_SIGN_xxx)
#
# ACIP | Unicode UnicodeName | Appearance+Obs
# ---- | --------- --------------- | --------------
# m | 0F7E RJES_SU_NGA_RO | App: Circle above letter.
# | | Eqv: Sanskrit anusvara.
#
# : | 0F7F RNAM_BCAD | App: Colon with open circles at left.
# | | Eqv: Sanskrit visarga.
#
#
# Punctuation (TIBETAN_MARK_xxx)
#
# ACIP | Unicode UnicodeName | Appearance+Obs
# ---- | --------- --------------- | --------------
# \ | 0F84 HALANTA | Aka: srog med.
# | | App: backslash below.
# | | Sem: Separates syllables that
# | | are actually consonant clusters.
# | | Eqv: Devanagari virama.
#
# SP | 0F0B INTERSYLLABIC_TSHEG | App: A dot aligned with the letter clothesline.
# | 0F0C DELIMITER_TSHEG_BSTAR | Sem: Separates syllables (not just words).
# | | The Unicode names are misleading.
# | | 0F0B is morpheme delim (breaking)
# | | 0F0C is syllable delim (non-breaking)
#
# & | 0F85 PALUTA | App: Curly "3" with tail, similar to NYA.
# | | Sem: Sanskrit apostrophe.
# | | Eqv: Sanskrit avagraha.
#
# , | 0F0D SHAD | App: a nail-like vertical stroke.
# | | Sem: Used in pairs to delimit phrases.
# | | Marks the end of a section of text.
#
# = | 0F0E NYIS_SHAD | App: double shad.
# | | Sem: Used after double- or triple-scroll
# | | at the beginning of the text, or as a
# | | full stop. Marks the end of a whole topic.
#
# ` | 0F08 SBRUL_SHAD | App: nail with wings and tilde on top.
# | | Sem: Decorative version of the shad, sometimes
# | | used at the beginning or end of a text
# | | Separates sections of meanings equivalent to
# | | topics and sub-topics.
#
# ; | 0F11 RIN_CHEN_SPUNGS_SHAD | App: nail-like stroke with three dots above
# | | Sem: Decorative version of the shad, sometimes
# | | used at the beginning or end of a text.
# | | Shad which follows a tsheg-bar
# | | that starts a new line.
#
# ^ | | App: like Greek lowercase upsilon, prefixed to syll.
#
# ° | 0F37 NGAS_BZUNG_SGOR_RTAGS | App: small circle under preceding syllable
# | | Sem: Emphasis; used as underlining.
#
# × | | App: small "x" under preceding syllable
#
# % | 0F35 NGAS_BZUNG_NYI_ZLA | App: small circle-on-bowl under preceding syllable
# | | Sem: Honorific; emphasis; used like underlining.
#
# / | 0F3C ANG_KHANG_GYON (opn) | App: diagonal brace (rise = open, fall = close)
# | 0F3D ANG_KHANG_GYAS (cls) | Sem: open or close Tibetan brace
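#
# Most punctuation codes map one-to-one, but "/" stands for both the opening
# and the closing brace. The sketch below assumes that braces are never
# nested, so successive "/" simply alternate between U+0F3C (ANG KHANG GYON)
# and U+0F3D (ANG KHANG GYAS), and that a plain space always means the
# breaking tsheg 0F0B. PUNCT and convert_punctuation are hypothetical names;
# letters and vowels are passed through unchanged.
#
```python
# One-to-one punctuation codes (ACIP-JS character -> Unicode character).
PUNCT = {
    " ": "\u0F0B",   # tsheg
    ",": "\u0F0D",   # shad
    "=": "\u0F0E",   # double shad
    ";": "\u0F11",   # rin chen spungs shad
    "&": "\u0F85",   # paluta
    "\\": "\u0F84",  # halanta
}
OPEN_BRACE, CLOSE_BRACE = "\u0F3C", "\u0F3D"


def convert_punctuation(text):
    """Replace punctuation codes in text, alternating "/" between the
    opening and closing brace; all other characters are left as they are."""
    out = []
    brace_open = False
    for ch in text:
        if ch == "/":
            out.append(CLOSE_BRACE if brace_open else OPEN_BRACE)
            brace_open = not brace_open
        else:
            out.append(PUNCT.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in convert_punctuation("/KA/ GA,")])
```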
#
#
# Decorative signs (TIBETAN_MARK_xxx):
#
# ~~ | 0F04+0F05 INITIAL_YIG_MGO_MDUN_MA+ | App: Double-scroll.
# | CLOSING_YIG_MGO_SGAB_MA | Sem: Decorative sign used
# | | at the beginning of texts.
#
# ~~~ | 0F04+0F05+0F05 INITIAL_YIG_MGO_MDUN_MA+ | App: Triple-scroll.
# | 2 × CLOSING_YIG_MGO_SGAB_MA(?) | Sem: Decorative sign used at
# | | the beginning of texts.
#
#
# Formatting codes:
#
# X-Y X and Y are separate letters, side by side.
# X+Y Sanskrit stack of letter X over letter Y.
# ÷ End of line in the original book.
# * Unreadable/missing/untranscribed character.
# @op{..}{..} Control line ("@" must be in column 1).
# {...} Text between braces is an editorial note.
# # Rest of line is a comment.
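#
# A minimal sketch of how a reader might handle some of these codes before
# converting letters. Two assumptions that this table does not state: the
# lower letters of an "X+Y" stack are encoded with Unicode's subjoined forms,
# which sit 0x50 above the base letters for U+0F40..U+0F69, and editorial
# notes are not nested. strip_annotations and stack are hypothetical names.
#
```python
import re

SUBJOINED_OFFSET = 0x50  # U+0F40..U+0F69 -> U+0F90..U+0FB9


def strip_annotations(line):
    """Drop the '#' comment and any '{...}' editorial notes from a line.
    Control lines ('@' in column 1) yield None. Spaces are left untouched,
    since a space is itself a code (tsheg)."""
    if line.startswith("@"):
        return None
    line = line.split("#", 1)[0]
    return re.sub(r"\{[^{}]*\}", "", line)


def stack(code_points):
    """Unicode string for a stack, given the code points of its base
    letters from top to bottom; all but the first become subjoined."""
    head, *rest = code_points
    for cp in rest:
        if not 0x0F40 <= cp <= 0x0F69:
            raise ValueError(f"no subjoined form assumed for U+{cp:04X}")
    return chr(head) + "".join(chr(cp + SUBJOINED_OFFSET) for cp in rest)


if __name__ == "__main__":
    print(repr(strip_annotations("KA{unclear}GA# checked")))
    print([f"U+{ord(c):04X}" for c in stack([0x0F66, 0x0F40])])  # S+K
```
#
# With these assumptions, stacking KA over SSA gives 0F40+0FB5, the same
# sequence the consonant table lists for "Ksh".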
#
# The following changes to the ACIP code were made by J. Stolfi
# for compatibility with existing scripts:
#
# "~~~" was "#"
# "~~" was "*"
# "=" was ",,"
# "×" was "x"
# "°" was "o"
# "{...}" was "[...]"
# "÷" was a blank line.