#! /usr/bin/python3

import sys, re
import html_gen as h
from process_funcs import bash
import html_report_funcs as hr

last_edit = "Last edited on 2026-02-28 04:25:43 by stolfi"
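
# A minimal sketch (hypothetical helper, not called by main() below) for
# checking the word-count headers such as "80 words" that precede the
# romanized layouts on this page.  It assumes that every sub-entry line
# starts with a one-letter label like "(A)", that brackets only mark the
# keyword (so a suffix key like "che[daiin]" still counts as one word),
# and that ":", "," and ";" act as word spaces.  It is not meant for the
# hanzi layouts, which are counted in characters rather than words.
def count_layout_words(layout):
  total = 0
  for line in layout.splitlines():
    line = line.strip()
    # Skip blank lines and headers; keep only lines labeled "(A)", "(B)", etc.
    if len(line) < 3 or line[0] != "(" or line[2] != ")": continue
    body = line[3:]
    # Drop the brackets without adding spaces, so "o[daiin]" stays one word.
    body = body.replace("[", "").replace("]", "")
    # Treat punctuation as word spaces.
    for ch in ":,;": body = body.replace(ch, " ")
    total += len(body.split())
  return total
# Possible use: compare count_layout_words(text) with the number in the
# header line of the same preformatted block before publishing the page.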

def main():

  global last_edit
  
  title = "The Red Rooster recipe of the Shennong Bencaojing in various languages"
  st = h.new_doc(title, "#eeffdd", text_width = 1600)
  thumb_width = 80*st['text_width']//100 # Width for image thumbnails.
  
  h.section(st, 2, "Summary")

  h.parags(st, """The Starred Paragraphs section (SPS) of the Voynich Manuscript (VMS) has been identified as a transcription or translation of the Shennong Bencaojing (SBJ), a classic Chinese <i>materia medica</i> (list of remedies and their indications).  The identification is based on the close match between the structure of the longest entry of the SBJ ("Red rooster") and that of the longest paragraph of the SPS (folio f105v, lines 32-38). Namely, both can be parsed into eight sub-entries of matching sizes, seven of them marked by the "keyword" 主 (meaning roughly "main uses") in the SBJ version and by the similar-looking words @daiin, @dair, and @laiin in the SPS version.
  
  The structure of the SPS words suggests that its text is a phonetic transcription of the SBJ, read and possibly translated into some monosyllabic language, which has not yet been identified.  There are hundreds of possible candidates, including all the so-called "dialects" of Chinese, as well as Vietnamese, Lao, Thai, Burmese, Tibetan, and other languages that were spoken in East Asia around 1400 (the presumed date of composition of the VMS).
  
  This webpage presents versions of the "Red rooster" recipe in some of those monosyllabic East Asian languages, parsed into the eight sub-entries and showing the matching keywords.  These versions are phonetic transcriptions of the modern pronunciation of the original Chinese text or of its translations.  This page also shows the English translation of the Rooster recipe, as well as the matching SPS paragraph.
  
    A significant obstacle to the identification of the language is that all transcriptions of the VMS contain a significant number of errors, due to our current ignorance of the "true" Voynichese alphabet, the large number of deformed glyphs with uncertain identification, and probably many errors by the Scribe and by later owners who tried to restore faded parts of the text.   Even the division of the text into words and paragraphs is uncertain, because inter-word spaces are sometimes hard to distinguish from inter-glyph spaces. 
    
    Another major obstacle is that all those languages have substantially changed their pronunciation since the 1400s.  The changes are particularly large for languages with ideographic script (like all Chinese "dialects" and Vietnamese) or with etymological rather than phonetic spelling (like Thai and Tibetan).
    
    A third major obstacle is that the original Chinese text of the SBJ as it circulated in the 1400s has been lost. The existing versions are reconstructions produced by scholars after the 1500s, from fragments and quotes in other books. Finally, there is evidence that the SPS was created from an abridged version of the SBJ that omitted some fields of the entries (like taste, alternate names, and place of origin of the remedies), and may have followed a different ordering of the entries and of the disease lists within the entries.
  
  <b>Important disclaimer:</b> My knowledge of the East Asian languages shown below is absolutely zero. I relied completely on Google AI (GAI) for the translations of the Chinese text into those languages and into English. I bet that there are many errors.  GAI claimed support of scholarly sources for all translations; however, in spite of my ignorance, I caught several gross errors, such as mixing different languages in the same translation.  Therefore, the texts provided by GAI are included only to give an idea, hopefully not too wrong, of what the correct translations may look like. 
  
  On the other hand, I did not use GAI for any analysis or processing of the VMS text. The parsing of paragraph f105v.32-38 and the establishment of its correspondence to the "Red rooster" entry of the SBJ were done entirely by hand, without the assistance of GAI or any other LLM/AI system. """)

  h.section(st, 2, "The SBJ entry (Chinese)")

  h.parags(st, """
    First, my reference version of the Red Rooster recipe in Chinese characters (hanzi):""")
  
  h.append_preformatted(st, """
80 chars
丹雄鸡主治女子崩中漏下赤白沃补虚温中止血通神杀毒辟不祥头主杀鬼肪主治耳聋肠主治遗溺肶胵裹黄皮主治泄利屎白主治消渴伤寒寒热翮羽主下血闭鸡子除热火疮痫痓可作虎魄神物
""", ind = 4, centered = False)

  h.parags(st, """
The entry is actually eight mostly separate sub-entries (A)-(H), each for a part of the red rooster.  Each sub-entry has the part's name, followed by 主治 zhǔ zhì ("main uses", "indications", etc.) or just 主 zhǔ ("mainly for", "treats", etc.), and a list of diseases or uses of the part.  The exception is the last sub-entry (H), for the eggs (!), which has no key.  Here is the same text, with modern punctuation, parsed into its eight sub-entries:""")
  h.append_preformatted(st, """
80 chars
(A)　　　　丹雄鸡：［主治］女子崩中漏下，赤白沃，补虚，温中，止血，通神，杀毒，辟不祥。
(B)　　　　　　头：［主］　杀鬼。
(C)　　　　　　肪：［主治］耳聋。
(D)　　　　　　肠：［主治］遗溺。
(E)　　肶胵裹黄皮：［主治］泄利。
(F)　　　　　屎白：［主治］消渴，伤寒寒热。
(G)　　　　　翮羽：［主］　下血闭。
(H)　　　　　鸡子：［］　　除热火疮，痫痓，可作虎魄神物。
""", ind = 4, centered = False)

  h.parags(st, """
The original SBJ, as it may have existed in the 1400s, would have had no punctuation.  The text above was taken from two digital files that I obtained from the internet, which had some differences but agreed on that recipe (which apparently was an important and often quoted one).  

The text above lacks the "nature" field ("sweet and slightly warm"［味］甘微温) that, in the internet version, came right after the recipe name.  It also lacks the veterinary use ("white meat fattens pigs", 鸡白蠹，肥猪) and the "habitat" field ("grows in plains and marshes",［生］平泽) that were at the end of the entry in the internet version. I removed those fields because they seem to have been omitted in the VMS version.  Thus the original from which the SPS was transcribed seems to be an abridged edition of the SBJ, as would be owned by a practicing doctor, rather than a scholarly version.  

Google AI claims that some versions had an extra 主 ("[also] mainly for") inserted between 止血 ("stops bleeding") and 通神 ("communicate with spirits").  I considered inserting one in my SBJ file, since the SPS parag has an extra @dair (as a suffix) in that sub-entry.  However it does not seem to be quite in the right place, so I opted against that correction. """)

  h.section(st, 2, "Translations")
  
  h.section(st, 3, "English (literal)")

  h.parags(st, """
Here is a rather literal translation of the Rooster entry in English, with modern punctuation.""")
  
  h.append_preformatted(st, """  
(A)                   Red male chicken: [main uses] women's mid-collapse and leaking downward, red and white discharges, supplementing vacuity, warming the center, stopping blood, communicating with the spirit, killing toxins, and warding off the unpropitious.
(B)                               Head: [mainly]    killing ghosts.
(C)                                Fat: [main uses] ear deafness.
(D)                         Intestines: [main uses] involuntary urination.
(E)         Gizzard lining yellow skin: [main uses] diarrhea and dysentery.
(F)                    Excrement white: [main uses] wasting-thirst, damage by cold with chills and fever.
(G)                     Quill feathers: [mainly]    discharge blood-closure.
(H)                                Egg: []          eliminates heat, fire-sores, and spasms; can be made into an amber-like divine substance.
""", ind = 4, centered = False)

  h.section(st, 3, "English (free)")

  h.parags(st, """
    Here is a more meaningful translation:""")
  
  h.append_preformatted(st, """  
(A)                  Red rooster: [main uses] uterine bleeding and chronic vaginal discharge, tonifies exhaustion, warms the digestive system, stops bleeding, enhances mental clarity, detoxifies, and averts misfortune.
(B)                         Head: [mainly] exorcises malevolent spirits.
(C)                          Fat: [main uses] hearing loss.
(D)                   Intestines: [main uses] urinary incontinence.
(E) Yellow lining of the gizzard: [main uses] acute and chronic diarrhea.
(F)      White part of droppings: [main uses] diabetes-like thirst, and alternating chills and fever from acute infections.
(G)               Quill feathers: [mainly] resolves menstrual blockages and blood stasis.
(H)                          Egg: []       clears inflammatory heat, treats burns, and relieves convulsions; can be transformed alchemically into an amber-like material.
""", ind = 4, centered = False)

  h.section(st, 3, "Mandarin (pinyin)")

  h.parags(st, """
    Here is a reading of that text in Mandarin, in the pinyin phonetic romanization. """)
  
  h.append_preformatted(st, """  
80 words
(A)        dān xióng jī [zhǔ zhì] nǚ zǐ bēng zhōng lòu xià chì bái wò bǔ xū wēn zhōng zhǐ xiě tōng shén shā dú pì bù xiáng                
(B)                 tóu [zhǔ]     shā guǐ                                             
(C)                fáng [zhǔ zhì] ěr lóng                                             
(D)               cháng [zhǔ zhì] yí nì                                               
(E) pí chǐ guǒ huáng pí [zhǔ zhì] xiè lì                                              
(F)             shǐ bái [zhǔ zhì] xiāo kě shāng hán hán rè                            
(G)               hé yǔ [zhǔ]     xià xuè bì                                          
(H)               jī zǐ []        chú rè huǒ chuāng xián zhì kě zuò hǔ pò shén wù     
""", ind = 4, centered = False)

  h.section(st, 3, "Voynichese")

  h.parags(st, """
    And here is the identified version in the SPS (parag f105v.32-38), from my fresh transcription, treating commas like word spaces.
The parsing assumes that the @daiin, as a word or suffix, means either 主治 zhǔ zhì ("main uses") or just 主 zhǔ ("mainly for").  There are five occurrences of that word, whose relative positions match five of the seven occurrences of the Chinese keywords.  There is also one occurrence each of @laiin and @dair as words, that match the other two Chinese keywords. Both of these words are very close to @daiin in "ink distance".  (And there is also one occurrence of @dair as a suffix, but it does not match any 主 in the Chinese text.)

In the following layout, the breaks between the list of diseases of a sub-entry and the name of the part of the next sub-entry are only approximate,
based on rough word counts.  It is possible that the true breaks are within words rather than between them.""")
  
  h.append_preformatted(st, """ 
73 words  
(A)                  poar keeo [daiin] qoaiin ar acphhey qoeedeody qokaiin qotedaiin apo raiin apy lsheody taiin oteey oteeo ol otaiin okeey qokaiin or aiiin al dal 
(B)                      sheeo [daiin] chsd                                                                            
(C)                    qokeeey [daiin] okaiin otaiin                                                                   
(D)                         che[daiin] olkal lkl dain doee okcheeo                                                     
(E) ltaiin otcheedy chor aiin o[daiin] chedy otaiin                                                                    
(F)                  al kaishd [laiin] sheod okeeody qoaiin ytaiin otaiin chdal                                        
(G)                 dy daiil ch[daiin] ockhhy yshey ckhy                                                               
(H)                 sheo qoeeo []      lkaiin chs okol tchdy sheeey okaiin ar aildy cheody oaiiin ain okshey           
""", ind = 4, centered = False)

  h.section(st, 3, "Cantonese")

  h.parags(st, """
Below is the reading of the Chinese 80-character sequence in phonetic Cantonese, the language (or "dialect", in the Chinese view) spoken in Guangzhou (Canton) and Hong Kong.  The grammar of Cantonese is close enough to that of Mandarin that the reading is one-to-one on the syllables.""")
  
  h.append_preformatted(st, """
80 words
(A)         daan1 hung4 gai1 [zyu2 zi6] neoi5 zi2 bang1 zung1 lau6 haa6 cek3 baak6 juk6 bou2 heoi1 wan1 zung1 zi2 hyut3 tung1 san4 saat3 duk6 pik1 bat1 coeng4                      
(B)                     tau4 [zyu2]     saat3 gwai2                                                           
(C)                    fong1 [zyu2 zi6] ji5 lung4                                                             
(D)                   coeng4 [zyu2 zi6] wai6 niu6                                                             
(E) bei2 ci2 gwo2 wong4 pei4 [zyu2 zi6] sit3 lei6                                                             
(F)                si2 baak6 [zyu2 zi6] siu1 hot3 soeng1 hon4 hon4 jit6                                       
(G)               gaak3 jyu5 [zyu2]     haa6 hyut3 bai3                                                       
(H)                 gai1 zi2 []         ceoi4 jit6 fo2 cong1 haan4 zi3 ho2 zok3 fu2 paak3 san4 mat6           
""", ind = 4, centered = False)

  h.section(st, 3, "Bai (Jianchuan)")

  h.parags(st, """This is the reading of that same Chinese character sequence in phonetic Jianchuan, one of the languages ("Chinese dialects") spoken by the Bai people in the Chinese province of Yunnan.  Like Cantonese, its grammar is close enough to that of Mandarin that the reading is one-to-one on the syllables.""")
  
  h.append_preformatted(st, """
80 words
(A)              tã33 xuŋ44 kai33 [tsu33 tsz42] ɲy33 tsz33 pɑ̃33 tsoŋ33 lo42 xɑ42 tshe44 pɛ44 jo44 po33 xy33 uɛ̃33 tsoŋ33 tsz33 xy44 thoŋ33 xɛ̃44 sɑ44 to44 phi44 pɛ44 tsiɔ̃44
(B)                         tho44 [tsu33]       sɑ44 kuɛ33                                                              
(C)                          fɑ̃44 [tsu33 tsz42] ji44 loŋ44                                                              
(D)                         tso44 [tsu33 tsz42] ji44 ni42                                                               
(E) pi33 tshz33 kuo33 xuɑ̃44 phi44 [tsu33 tsz42] siɛ42 li42                                                              
(F)                     sz33 pɛ44 [tsu33 tsz42] siɔ33 kho42 xuã33 xon44 xon44 ye42                                      
(G)                     kɛ44 jy33 [tsu33]       xɑ42 xy44 pi42                                                          
(H)                   kai33 tsz33 []            tsho44 tsz42 fo33 tsho33 xɛ̃33 tshz42 kho33 tsuo44 xu33 pho44 sɛ̃44 mu44  
""", ind = 4, centered = False)

  h.section(st, 3, "Vietnamese")

  h.parags(st, """And here is the translation of that entry into (mostly modern) Vietnamese, in its standard script (Chữ Quốc ngữ).  While Vietnamese is monosyllabic and tonal, it does not belong to the same family as Mandarin and other "Chinese dialects".  Linguists place it in the Austroasiatic family, together with Khmer (Cambodian) and other languages of Southeast Asia.  Its grammar and lexicon are very different.  Still the translation turns out to have about the same number of syllables:""")
  
  h.append_preformatted(st, """
81 words
(A)                   đan hùng kê [chủ trị] nữ tử băng trung lậu hạ xích bạch oắc bổ hư ôn trung chỉ huyết thông thần sát độc tịch bất tường                     
(B)                           đầu [chủ]     sát quỷ                                                         
(C)                         phóng [chủ trị] nhĩ lung                                                        
(D)                        trường [chủ trị] di nịch                                                         
(E)                    kê nội kim [chủ trị] tiết lị                                                         
(F)                   kê thỉ bạch [chủ trị] tiêu khát thương hàn hàn nhiệt                                  
(G)                       cách vũ [chủ]     hạ huyết bế                                                     
(H)                         kê tử []        trừ nhiệt hỏa sang nhàn chí có thể chế thành hổ phách thần vật  
""", ind = 4, centered = False)

  h.section(st, 3, "Thai")

  h.parags(st, """The Thai language too is monosyllabic and tonal, but it is placed by linguists in a third separate family, Kra-Dai, together with Lao. The Thai translation of the Rooster entry, in phonetic transcription, is shown below.  It is only ~25% longer (in syllable count) than the original Chinese. The language has 5 tones, indicated by a digit suffix; unmarked syllables have tone 1 by default.""")
  
  h.append_preformatted(st, """
102 words
(A)            kai thuek2 [chu tri] su4 tri baang3 chung lao4 ha4 ra du khao5 daeng lai mai3 yut2 bam rung sang5 khan un trong ham3 lueat4 thong shen chit2 kha3 phit4 rai3 khap2 sing2 ap2 mong khon
(B)                  hua5 [chu]     prap2 phi
(C)                   man [chu tri] hu5 tuek2                                                                                             
(D)                  sai5 [chu tri] pa sa5 wa4 rin mai3 yut2                                                                            
(E) phang phuet3 nai kuen [chu tri] thong3 sia5 thong3 ruan                                                                              
(F)       khi3 kai2 khao5 [chu tri] rok3 hiu5 nam4 khai wat2 nao5 ron4                                                                   
(G)             khon pik2 [chu]     la lai lueat4 khang4                                                                                
(H)            khai2 kai2 []        dap2 phit4 ron4 phu phong fai kan chak2 kak2 krieo tham pen khong5 sak2 sit2 am phan
""", ind = 4, centered = False)

  h.section(st, 3, "Tibetan")

  h.parags(st, """The Tibetan language is placed by linguists in a fourth separate family, Tibeto-Burman, a subfamily of the Sino-Tibetan family, and "sister" of the Sinitic subfamily that comprises the "Chinese dialects".  Although it is remotely related to Chinese, Tibetan has a very different grammar, including a subject-object-verb (SOV) sentence structure, instead of subject-verb-object (SVO) like the previous languages.  Thus a proper translation of the SBJ Rooster entry, while it has only ~30% more syllables, does not match the SPS entry, since the key phrases "main uses" and "mainly for" must come at the end of each sub-entry.  Tibetan has two tones, high (here unmarked) and low (here marked with a grave accent).""")
  
  h.append_preformatted(st, """
106 words
(A) byà po dmàr po: mò nèy khrag sab dàng kàr màr gè po khog pa drò wà khrag cho wa sem dug tra mì shi pa dok pà  [sò wa] 
(B)                                                                                 go: bòe gyal po dù po sil wà  [sò wa] 
(C)                                                                                          tshi: nà mà thub pà [sel wa] 
(D)                                                                                        gyù: ti wà mà thub pà [sel wa] 
(E)                                                                              nàng sha sèr po: khog pa shè wà [sel wa] 
(F)                                                                        khyì lùd kar: gòng nèy dàng rìm tshàd [sel wa] 
(G)                                                                                           shog pa: khrag gag [sel wa] 
(H)                   go ngay: tsad pa mè chù dàng sà wa sel wà pò shè tà bù shìng tù khyàd par chèn gyì ngòe po [shò wa] 
""", ind = 4, centered = False)

  h.section(st, 3, "Burmese")

  h.parags(st, """Burmese, the main language of Myanmar, is placed by linguists in the same sub-family as Tibetan.  Like Tibetan, it has an SOV sentence structure, which requires the equivalent of the "uses" key to be at the end of each sub-entry.  A translation into modern Burmese would be quite prolix, but GAI claims that the following more "telegraphic" translation, matching the style of the Chinese SBJ, would have been current in the 1400s. Burmese has 4 tones; unmarked syllables below are tone 3.""")
  
  h.append_preformatted(st, """
110 words
(A) kyet4 ni1 pho1: mi4 ma2 tha1 ein1 thway1 sin1 jin2 a phyu1 sin1 jin2 a1 thoke3 thoke3, a pu1 un1 jin2, thway1 teit4, seit4 wi1 nyin2 kyi1 lin2, a seik4 a tauk4 [ku1 tha1]
(B)                                                                                                                                   gaung3: nat4 soe3 hneim1 nin2 [a1 thone3]
(C)                                                                                                                                         a1 see2: nar1 lay2 jin2 [ku1 tha1]
(D)                                                                                                                               u1 ma2: see1 ma htein1 nine2 jin2 [ku1 tha1]
(E)                                                                                                          kyet4 paung3 at twin2 a mhyay2: wan1 lyaw2 wan1 koite4 [ku1 tha1]
(F)                                                                                                     kyet4 chay1 phyu1: yay1 ngat4 raw1 ga2 hnin3 a phyar1 tone3 [ku1 tha1]
(G)                                                                                                                           taung2 pan2: thway1 peit4 jin2 thsin3 [a1 thone3]
(H)                                                                              kyet4 u3: a pu1 nar1 a tet4 a phyar1 phal1 shar2 pa1 yin2 htoo1 char3 thee2 a yar2 [pyu2 louk4]
""", ind = 4, centered = False)

  h.parags(st, """""")

  h.output_doc(st, sys.stdout, 99, last_edit)
  return 0
  # ----------------------------------------------------------------------

main()