# Last edited on 2012-02-02 01:49:17 by stolfilocal

FORMAT OF THE ".src" FILES

  The file contains a mix of "text" and "directives".
  A "stand-alone" directive must appear on a line by itself.
  An "embedded" directive may be inserted in text or in #-comments.

  Comments:

    (blank line)      = treated like a comment
    # ...             = #-comment (stand-alone)
    {...}             = {}-comment (embedded)

    A #-comment should use the construct @{TEXT}
    to mark parts of the comment that are in the
    target language.  In this way, encoding changes
    can be applied to those parts of comments, too.

  Sectioning directives (stand-alone):

    @begin {TAG}        = start of sub-section with tag TAG 
    @end {TAG}          = end of sub-section with tag TAG 
    @section LEV {TAG}  = start of level-LEV section with tag TAG
    
    The TAG must not include blanks or braces. Nested sections
    must have distinct TAGs. An "@end" may be omitted if it comes
    before another "@end". The directive "@section LEV {TAG}" is
    equivalent to (1) "@end" all open sections with level
    greater than or equal to LEV, then (2) "@begin {TAG}".
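
    The expansion of "@section" into "@end" and "@begin" can be
    sketched as below. This is only an illustration: the function
    name "expand_sections" and the input shape (a list of (LEV, TAG)
    pairs) are hypothetical, and the sketch assumes that every open
    section was created by "@section", since the level of a section
    opened by a plain "@begin" is not defined above.

```python
def expand_sections(sections):
    """Expand a sequence of (LEV, TAG) pairs into @end/@begin directives."""
    open_stack = []  # (level, tag) of open sections, innermost last
    out = []
    for lev, tag in sections:
        # (1) close every open section whose level is >= LEV
        while open_stack and open_stack[-1][0] >= lev:
            _, old_tag = open_stack.pop()
            out.append("@end {%s}" % old_tag)
        # (2) open the new section
        out.append("@begin {%s}" % tag)
        open_stack.append((lev, tag))
    return out
```

    Sections still open at the end of the input are left open here;
    what the real processor does in that case is not stated.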
  
  Include directives (stand-alone):

    @include {FILE}        = insert contents of FILE here

  Charset specs (stand-alone):

    @chars alpha {CHARS}   = a word is "alpha" iff it uses these chars only.
    @chars symbol {CHARS}  = any of these chars turns a word into a "symbol".
    @chars punct {CHARS}   = each of these chars is a word by itself.
    @chars blank {CHARS}   = these chars are word separators.
    @chars null {CHARS}    = these chars should be deleted.
    @chars invalid {CHARS} = these chars are not allowed (default).

    The CHARS must be ASCII SP or printable ISO-Latin-1 chars.
    ASCII SP is always implicitly included in the "blank" chars.
    The characters "#@{}" cannot be included in any chars.
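
    A check of the CHARS argument against the three rules above
    might look like the sketch below. The name "check_chars" is
    hypothetical, and "printable ISO-Latin-1" is assumed here to
    mean code points 0x21-0x7E and 0xA1-0xFF; the actual processor
    may draw that line differently.

```python
FORBIDDEN = set("#@{}")  # never allowed in any @chars spec

def check_chars(chars):
    """Return the list of characters in CHARS that violate the rules."""
    bad = []
    for c in chars:
        code = ord(c)
        # ASCII SP (0x20) or assumed printable ISO-Latin-1 ranges
        allowed = 0x20 <= code <= 0x7E or 0xA1 <= code <= 0xFF
        if c in FORBIDDEN or not allowed:
            bad.append(c)
    return bad
```

    An empty result means the CHARS argument is acceptable.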
    
    Characters "*", "÷" and "=" should be reserved for invalid
    (or unreadable) characters, significant line breaks in the 
    original text (e.g. verse separators), and paragraph-like
    breaks (e.g. stanza separators), respectively.

  Word mapping directives (match /^[ ]*[@]/): 
    
    @wordmap alpha {FILE}   = words listed in FILE are "alpha"
    @wordmap symbol {FILE}  = words listed in FILE are "symbol"
    @wordmap punct {FILE}   = words listed in FILE are "punct"
    @wordmap blank {FILE}   = words listed in FILE should be deleted
    @wordmap null {FILE}    = words listed in FILE should be deleted
    @wordmap invalid {FILE} = words listed in FILE are not allowed

    The FILE must contain a list of words or word pairs, one
    per line.
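
    Reading such a FILE into a lookup table could be sketched as
    follows. The name "parse_wordmap" is hypothetical, and two
    points are assumptions not stated above: a line with a single
    word maps that word to itself, and blank lines are skipped.

```python
def parse_wordmap(lines):
    """Build a dict from an iterable of wordmap lines."""
    table = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumed: blank lines are ignored
        if len(fields) > 2:
            raise ValueError("line %d: expected a word or a word pair" % lineno)
        # "W X" maps W to X; a lone "W" is assumed to map to itself
        table[fields[0]] = fields[-1]
    return table
```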
    
  Word type directives (may be embedded):

    @TYPE{STRING}
    
    where TYPE is one of "a" ("alpha"), "s" ("symbol"), "p" ("punct"),
    "b" ("blank"), or "n" ("null").  The STRING cannot contain any of the
    characters "#@{}".  In the STRING, SP is the only special character
    (word separator) and the @wordmap directives do not apply.
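
    Recognizing these directives in a line could be done with a
    regular expression, as in the sketch below. The names
    "WORD_DIRECTIVE", "TYPE_NAMES" and "find_word_directives" are
    hypothetical, introduced only for this illustration.

```python
import re

TYPE_NAMES = {"a": "alpha", "s": "symbol", "p": "punct",
              "b": "blank", "n": "null"}

# "@", one type letter, then an argument free of "#@{}" inside braces
WORD_DIRECTIVE = re.compile(r"@([aspbn])\{([^#@{}]*)\}")

def find_word_directives(text):
    """Return (type name, list of words) for each directive in TEXT."""
    found = []
    for m in WORD_DIRECTIVE.finditer(text):
        # SP is the only word separator inside the argument
        words = [w for w in m.group(2).split(" ") if w]
        found.append((TYPE_NAMES[m.group(1)], words))
    return found
```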
    
  Every non-blank line that is neither a stand-alone #-comment
  nor an @-directive is parsed from left to right into a sequence
  of zero or more colored words, delimited by spaces, as follows:
       
      0. Let W, the current word, be empty. Insert spaces at both
      ends of the line. Repeat 1--9 below until the line is
      exhausted.
         
      1. If the next character belongs to the "null" chars, delete it.

      2. If the next thing on the line is an {}-comment, delete it.

      3. If it is an @n{...} directive, delete it.
         
      4. If it is an "alpha" or "symbol" character, append it to
      the current word W.
         
      5. If W is not empty, color it according to the "@chars"
      directives and the word tables, output it, and reset W to empty.
         
      6. If the next thing is a @b{...} directive, delete it.
         
      7. If it is a "punct" character, make it into a word by
      itself, color it "p", map it through the word tables, and
      output it.
           
      8. If it is a @p{}, @s{} or @a{} directive,
      output the blank-delimited words in the argument, all
      colored "p", "s" or "a", respectively.
         
      9. If it is none of the above, the input is invalid.
  
  In step 5, the word W is colored "alpha" iff it uses only
  characters from the "alpha" chars. Otherwise it must consist
  of one or more "symbol" characters possibly mixed with "alpha",
  and the whole word is colored "symbol".
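
  Steps 0 through 9, together with the coloring rule for step 5,
  can be sketched as below. This is a reading of the description,
  not the actual implementation: the name "tokenize" and its
  parameters are hypothetical, word tables are left out, and the
  skipping of plain "blank" characters after step 5 is an
  assumption, since the steps mention only the @b{...} directive.

```python
import re

DIRECTIVE = re.compile(r"@([aspbn])\{([^#@{}]*)\}")  # @TYPE{STRING}
COMMENT = re.compile(r"\{[^{}]*\}")                  # {}-comment

def tokenize(line, alpha, symbol, punct, blank="", null=""):
    """Parse one text line into a list of (color, word) pairs."""
    alpha, symbol, punct = set(alpha), set(symbol), set(punct)
    blank, null = set(blank) | {" "}, set(null)
    line = " " + line + " "                      # step 0
    words, w, i = [], "", 0
    while i < len(line):
        c = line[i]
        comment = COMMENT.match(line, i)
        directive = DIRECTIVE.match(line, i)
        kind = directive.group(1) if directive else None
        if c in null:                            # step 1
            i += 1
        elif comment:                            # step 2
            i = comment.end()
        elif kind == "n":                        # step 3
            i = directive.end()
        elif c in alpha or c in symbol:          # step 4
            w += c
            i += 1
        else:
            if w:                                # step 5
                color = "a" if set(w) <= alpha else "s"
                words.append((color, w))
                w = ""
            if kind == "b":                      # step 6
                i = directive.end()
            elif c in blank:                     # assumed: skip separators
                i += 1
            elif c in punct:                     # step 7
                words.append(("p", c))
                i += 1
            elif directive:                      # step 8
                words.extend((kind, x) for x in directive.group(2).split())
                i = directive.end()
            else:                                # step 9
                raise ValueError("invalid input at %r" % line[i:])
    return words
```

  Note that, read this way, {}-comments and @n{...} directives do
  not end the current word, because steps 2 and 3 come before the
  word is output in step 5.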
  
  In steps 5 and 7, before the word W is written out
  it is looked up in the word tables. If the "alpha"
  table contains a pair "W X" then W is replaced by X
  and re-colored "alpha".  Similarly for the other tables.
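
  The table lookup could be sketched as below. The name
  "apply_word_tables" and the layout of "tables" (a dict from type
  name to a dict of word pairs) are hypothetical. The order in
  which the tables are consulted, and the treatment of the
  "blank", "null" and "invalid" tables (drop the word, drop the
  word, reject the input), are assumptions based on the @wordmap
  descriptions.

```python
def apply_word_tables(color, word, tables):
    """Return the (color, word) to output, or None if the word is dropped."""
    # A pair "W X" in a coloring table replaces W by X and re-colors it.
    for new_color in ("alpha", "symbol", "punct"):
        table = tables.get(new_color, {})
        if word in table:
            return new_color, table[word]
    # Words listed in the "blank" or "null" tables are deleted.
    for dropped in ("blank", "null"):
        if word in tables.get(dropped, {}):
            return None
    # Words listed in the "invalid" table are not allowed.
    if word in tables.get("invalid", {}):
        raise ValueError("word not allowed: %r" % word)
    return color, word
```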