# Last edited on 2004-12-30 17:48:14 by stolfi

OVERVIEW

  The input to the SELVA grammar preprocessor consists of three sections:

    MARKERS   zero or more {MARKDEF}s
    SYMBOLS   one or more {SYMDEF}s
    RULES     one or more {RULE}s.

  Comments start with "#" and extend to the end of the current line.

MARKERS SECTION

  The MARKERS section defines the valid marker types and their value
  ranges. Each {MARKDEF} has the form

    MARKDEF => MARKTYPE ":" MARKVAL |.. "\n"

    MARKTYPE => [A-Z]+

  A marker type is identified by one or more uppercase letters:
  "G", "P", "EO".

    MARKVAL => [a-z0-9]+

  Each marker type has a set of possible values ({MARKVAL}s), which are
  strings of lowercase letters and/or digits: "sin", "fem", "1", "p1".

SYMBOLS SECTION

  The SYMBOLS section defines the terminal and non-terminal symbols,
  and their markers. Each entry has the form

    SYMDEF => {"*"}? SYMBOL {"(" MARKVAR ,.. ")"}? "\n"

  The names of the {MARKVAR}s (formal parameters) in the definition of
  a {SYMBOL} must all be distinct, for documentation purposes; but only
  their {MARKTYPE}s are significant.

  The prefix "*" specifies that the symbol is not important and should
  be "short-circuited" when displaying the parse trees.

    SYMBOL => NTSYMBOL | TSYMBOL

    NTSYMBOL => [A-Z][A-Za-z0-9_+]*

    TSYMBOL => [a-z][a-z0-9_]*

  Non-terminal symbols start with an uppercase letter, and terminal
  symbols with a lowercase letter. Both classes must be defined in the
  SYMBOLS section and may have marker parameters.

    MARKVAR => MARKTYPE MARKSUFF?

    MARKSUFF => [0-9]

  The type of a marker variable is determined by the uppercase letters
  in its name: thus "P" and "P2" both have {MARKTYPE} = "P". Numeric
  suffixes allow more than one marker variable of the same type to
  appear in a single rule.

RULES

  Grammar rules have the form

    RULE => NTSYMBOL "{" RTAG "}" {"(" MARKER,.. ")"}? "->" "\n"
              FACTOR "\n"..
            "." "\n"

  Each factor of the right-hand side must be on a separate line.
  The rule ends with a period on a line by itself.
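
  The line-oriented layout of a rule (a header ending in "->", one
  factor per line, and a closing "." line) can be illustrated with a
  minimal Python sketch. This is not the SELVA preprocessor itself:
  the names `strip_comment` and `split_rules` are hypothetical, and the
  sketch assumes that "#" always starts a comment (no escaping) and
  that blank lines may be skipped.

```python
def strip_comment(line):
    # Assumption: "#" always begins a comment; no escape mechanism is described.
    return line.split("#", 1)[0].rstrip()


def split_rules(text):
    """Split the text of a RULES section into (header, factors) pairs.

    The header is the rule's left-hand side with the trailing "->" removed;
    factors is the list of right-hand-side lines, still unparsed.
    """
    rules = []
    current = []  # non-empty lines of the rule being collected
    for raw in text.splitlines():
        line = strip_comment(raw).strip()
        if not line:
            continue  # assumption: blank and comment-only lines are ignored
        if line == ".":
            # A period on a line by itself closes the current rule.
            if not current:
                raise ValueError("'.' found with no rule to close")
            header, factors = current[0], current[1:]
            if not header.endswith("->"):
                raise ValueError("rule header must end with '->': " + header)
            rules.append((header[:-2].rstrip(), factors))
            current = []
        else:
            current.append(line)
    if current:
        raise ValueError("last rule is not closed by a '.' line")
    return rules
```

  For example, a text containing the hypothetical rules "S{main}(N) ->"
  and "NP{bare}(N) ->" yields one pair per rule, with the header and
  the factor lines still as plain strings to be parsed later.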
    RTAG => [a-z][a-z0-9]*

  Each rule for the same {NTSYMBOL} must have a distinct {RTAG}.

    MARKER => MARKVAR { ":" MARKVAL |.. }?
           | MARKVAL

  The types of the {MARKER} parameters (values and/or variables) must
  agree with the {MARKTYPE}s listed in the SYMBOLS section for the
  {NTSYMBOL}.

    FACTOR => { FUNCLABEL ":"} KERNEL

    FUNCLABEL => [A-Z][A-Za-z0-9_+]*

  The {FUNCLABEL} is used to label the factors according to their
  syntactic function. An occurrence of a factor like "F:X"
  automatically generates a rule "F:X -> X" in the expanded grammar.

    KERNEL => ATOM "?"
           => ATOM "^" { MARKVAR | [01] }
           => ATOM

  The "?" notation means that the {KERNEL} is either {ATOM} or empty.
  In the "^" notation, the exponent (the {MARKVAL}, or all possible
  values of the {MARKVAR}) must be either "0" (meaning that the
  {KERNEL} is empty) or "1" (meaning that the {KERNEL} is just the
  {ATOM}).

    ATOM => NTSYMBOL {"{" RTAG ,.. "}"}? {"(" MARKER,.. ")"}?
         => TSYMBOL {"(" MARKER,.. ")"}?

  The types of the markers (values or variables) must agree with the
  {MARKTYPE}s listed in the symbol's {SYMDEF}. The {RTAG}s must be all
  distinct, and each of them must match the {RTAG} of a {RULE} for the
  {NTSYMBOL}.
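
  The {FACTOR} and {KERNEL} notations can be illustrated with a small
  Python sketch. It is only a sketch under stated assumptions, not the
  preprocessor's actual code: the names `FACTOR_RE`, `parse_factor`,
  `kernel_alternatives` and `bindings` are hypothetical, the
  {FUNCLABEL} prefix is treated as optional, and a factor is assumed to
  contain no embedded whitespace. The {ATOM} is kept as an unparsed
  string.

```python
import re

# label (assumed optional), then the atom, then an optional "?" or "^exponent".
FACTOR_RE = re.compile(
    r"^(?:(?P<label>[A-Z][A-Za-z0-9_+]*):)?"
    r"(?P<atom>[^?^]+?)"
    r"(?:(?P<opt>\?)|\^(?P<exp>[A-Z]+[0-9]?|[01]))?$"
)


def parse_factor(text):
    """Return (label, atom, opt, exp) for one right-hand-side line."""
    m = FACTOR_RE.match(text.strip())
    if m is None:
        raise ValueError("malformed factor: " + repr(text))
    return m.group("label"), m.group("atom"), m.group("opt"), m.group("exp")


def kernel_alternatives(atom, opt, exp, bindings=None):
    """List the expansions of a kernel; None stands for the empty string.

    bindings maps marker variable names to their chosen values
    (an assumption about how exponent variables get instantiated).
    """
    if opt:
        return [atom, None]          # ATOM "?": either the atom or empty
    if exp is None:
        return [atom]                # plain ATOM
    value = exp if exp in ("0", "1") else (bindings or {}).get(exp)
    if value == "1":
        return [atom]                # exponent 1: just the atom
    if value == "0":
        return [None]                # exponent 0: empty
    raise ValueError("exponent must be 0 or 1, got " + repr(value))
```

  Because the label pattern must be followed immediately by ":", a
  colon inside a marker list (as in "NP(P:1|2)") is not mistaken for a
  {FUNCLABEL} separator.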