The Holy Quran - consonant-only text in Arabic encoding
Last edited on 2004-01-27 03:13:23 by stolfi

FETCHING

  Fetched the consonant-only Quran file "quran.txt" from somewhere
  on the net.  It appears to be in ISO-8859-6 encoding, possibly
  the subset U+0600-U+065F mapped to single-byte x60
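
  One way to test the ISO-8859-6 guess is to tabulate the bytes above
  x7F and see which of them that code page actually defines.  The
  sketch below is only an illustration: the function name
  high_byte_report is hypothetical, and it relies on Python's
  standard "iso8859_6" codec rather than on any tool used for these
  notes.

```python
from collections import Counter


def high_byte_report(data: bytes) -> dict:
    """Map each byte >= 0x80 found in data to (decoded char or None, count).

    A None entry means ISO-8859-6 leaves that byte undefined, which would
    argue against the file being plain ISO-8859-6.
    """
    counts = Counter(b for b in data if b >= 0x80)
    report = {}
    for byte, n in sorted(counts.items()):
        try:
            ch = bytes([byte]).decode("iso8859_6")
        except UnicodeDecodeError:
            ch = None  # byte has no assignment in ISO-8859-6
        report[byte] = (ch, n)
    return report


if __name__ == "__main__":
    # Assumed file name, taken from the notes; adjust the path as needed.
    with open("quran.txt", "rb") as f:
        for byte, (ch, n) in high_byte_report(f.read()).items():
            name = "UNDEFINED" if ch is None else "U+%04X" % ord(ch)
            print("x%02X %-10s %d" % (byte, name, n))
```

  If every high byte decodes to a code point in the Arabic block, the
  ISO-8859-6 reading is at least plausible; undefined bytes would
  point to some other single-byte mapping.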
  

SOURCE CLEANUP
  
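    # Strip form feeds (\014), carriage returns (\015) and DOS
    # end-of-file marks (\032), then squeeze runs of blank lines.
    # Note that ">>" appends: start with no byte.txt (or an empty one).
    set impb = "/home/staff/stolfi/IMPORT/texts/arabic/Quran-Bytes"
    cat ${impb}/quran.txt \
      | tr -d '\014\015\032' \
      | cat -s \
      >> byte.txt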
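
  For reference, the same cleanup can be sketched in Python, working
  on raw bytes so that the Arabic encoding is never interpreted.  This
  is an assumed equivalent of the tr/cat pipeline, not a script that
  was used here; the name clean_bytes is hypothetical.

```python
def clean_bytes(data: bytes) -> bytes:
    """Mimic `tr -d '\\014\\015\\032' | cat -s` on a byte string."""
    # Delete form feed (0x0C), carriage return (0x0D) and Ctrl-Z (0x1A).
    data = data.translate(None, b"\x0c\x0d\x1a")
    out = []
    prev_blank = False
    # After the deletion above, only "\n" is left as a line terminator.
    for line in data.splitlines(keepends=True):
        blank = line == b"\n"
        if blank and prev_blank:
            continue  # drop repeated empty lines, as `cat -s` does
        out.append(line)
        prev_blank = blank
    return b"".join(out)


if __name__ == "__main__":
    # Assumed file names, taken from the notes above.
    with open("quran.txt", "rb") as src:
        cleaned = clean_bytes(src.read())
    with open("byte.txt", "wb") as dst:
        dst.write(cleaned)
```

  Unlike the ">>" in the shell version, this sketch overwrites
  byte.txt rather than appending to it.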

  The file main.raw is simply byte.txt with some initial comments and
  a more regular format, with directives @chapter{CHAPNUM}{TITLE} and
  @verse{CHAPNUM}{VERSENUM}.
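
  A reader for this format might look like the sketch below.  It
  assumes details the notes do not state: that each directive sits
  alone on its own line, that CHAPNUM and VERSENUM are decimal
  numbers, and that the lines following a @verse directive hold that
  verse's text.  The name parse_raw is hypothetical.

```python
import re

# Assumed layout: one directive per line, numeric chapter/verse fields.
CHAPTER_RE = re.compile(rb"^@chapter\{(\d+)\}\{(.*)\}\s*$")
VERSE_RE = re.compile(rb"^@verse\{(\d+)\}\{(\d+)\}\s*$")


def parse_raw(data: bytes):
    """Return (titles, verses) from main.raw-style bytes.

    titles maps chapter number -> title bytes; verses is a list of
    (chapter, verse, text bytes).  Text is kept as undecoded bytes.
    """
    titles = {}
    verses = []
    current = None  # [chapter, verse, list of text lines]
    for line in data.split(b"\n"):
        m = CHAPTER_RE.match(line)
        if m:
            titles[int(m.group(1))] = m.group(2)
            current = None
            continue
        m = VERSE_RE.match(line)
        if m:
            current = [int(m.group(1)), int(m.group(2)), []]
            verses.append(current)
            continue
        # Lines before the first @verse (e.g. initial comments) are skipped.
        if current is not None and line.strip():
            current[2].append(line)
    return titles, [(c, v, b"\n".join(t)) for c, v, t in verses]
```

  Keeping the verse text as bytes avoids committing to a particular
  reading of the encoding at this stage.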
  
  The file main.org has the official @-directives (@unit, etc.), and
  its text has been converted to JSAR (through iso-8859-6-to-hexbytes
  and hexbytes-to-jsar).