# Last edited on 2026-03-03 11:57:41 by stolfi
096 Investigating similarities between languages A and B
 
  In this note we generate tables of "root" counts and frequencies for each
  section ("hea", "heb", "zod", etc.) and each text type (parags,
  labels, etc.), using fractional counts that take into account
  dubious word spaces (",").  The roots are words reduced
  to equivalence classes by certain character operations.  We also
  create lists of the words that map to the same root.
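
  As an illustration of what a fractional count can mean, here is a
  minimal Python sketch.  It is NOT the code of tabulate_frac_counts.py,
  whose actual weighting may differ.  It assumes that "." is a sure word
  space, that each "," is a real word space with probability 1/2
  (the hypothetical parameter p_space), independently of the others, and
  it returns the expected number of occurrences of each word in a line.
  The function name and the sample words are made up for the example.

```python
from collections import defaultdict


def frac_word_counts(line, p_space=0.5):
    """Expected word counts for one line of text.

    Assumptions (not taken from the note's scripts): '.' is a sure word
    space, and each ',' is a real word space with probability p_space.
    """
    counts = defaultdict(float)
    for chunk in line.split("."):
        # Pieces separated only by dubious spaces; drop empty ones.
        pieces = [p for p in chunk.split(",") if p]
        n = len(pieces)
        for i in range(n):
            for j in range(i, n):
                # The word pieces[i..j] exists if the j-i commas inside
                # it are not spaces and the commas at its ends are spaces.
                prob = (1.0 - p_space) ** (j - i)
                if i > 0:
                    prob *= p_space
                if j < n - 1:
                    prob *= p_space
                counts["".join(pieces[i:j + 1])] += prob
    return dict(counts)
```

  For the line "daiin.chol,dy" this credits 1 to "daiin" and 1/2 each
  to "chol", "dy", and "choldy", so the fractional counts add up to the
  expected number of words in the line (2.5).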
  
  Finally we compare the frequencies of words and roots in the parags
  text of Herbal-A and Herbal-B.
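
  The comparison can be pictured with the following Python sketch.  It
  is only an assumed illustration, not the procedure used by the
  scripts of this note: it takes two hypothetical dictionaries of
  (possibly fractional) counts, one per language, turns each into
  relative frequencies, and lists the words or roots ordered by how
  much their frequencies differ.  All names are invented for the
  example.

```python
def rel_freqs(counts):
    """Convert a {word: count} table into relative frequencies."""
    total = sum(counts.values())
    if total <= 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def compare_freqs(counts_a, counts_b):
    """Return (word, freq_a, freq_b) rows, largest difference first.

    A word missing from one table gets frequency 0.0 there.  Ties are
    broken alphabetically so that the output order is reproducible.
    """
    fa, fb = rel_freqs(counts_a), rel_freqs(counts_b)
    rows = [(w, fa.get(w, 0.0), fb.get(w, 0.0)) for w in set(fa) | set(fb)]
    rows.sort(key=lambda r: (-abs(r[1] - r[2]), r[0]))
    return rows
```

  Sorting by absolute difference puts first the entries that are common
  in one language and rare or absent in the other; a ratio-based
  ordering would be an equally plausible choice.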

SETUP

    ln -s ../.. work
    ln -s work/tabulate_frac_counts.py
    ln -s work/compute_freqs.gawk
    ln -s work/combine_counts.gawk
    ln -s work/root_from_word_funcs.gawk

    ln -s work/ivt_loc_to_type.tbl

    ln -s work/error_funcs.gawk
    ln -s work/error_funcs.py
    ln -s work/process_funcs.py
  
    ln -s ../074/st_files

MAIN OPERATIONS:

  Do it all:
  
    do_note_096.sh 25e1
  
    #0 sec type       lines     JS     RZ   words  roots
    #1 --- ------    ------ ------ ------  ------ ------
                                                        
    |  bio glyphs         6      6      0       4      2
    |  cos glyphs         6      6      0       6      2
    |  hea glyphs        26     26      0      10      3
    |  unk glyphs        43     43      0      16      5
                                                        
    |  bio labels       115    112      3     128     19
    |  cos labels       291    291      0     348     50
    |  hea labels         3      3      0       5      4
    |  pha labels       235    233      2     261     41
    |  zod labels       299    299      0     324     34
                                                        
    |  bio parags       740    294    446    1564     71
    |  cos parags       174    174      0     713     45
    |  hea parags      1209    967    242    2993    117
    |  heb parags       373    265    108    1546     82
    |  pha parags       223    158     65    1092     73
    |  str parags      1082   1082      0    3866    168
    |  unk parags       305    182    123    1449     81
                                                        
    |  cos radios        95     95      0     282     36
                                                        
    |  cos titles        32     32      0     107     32
    |  hea titles        17     17      0      52     18
    |  heb titles         5      5      0      16      6
    |  pha titles         1      1      0       8      5
    |  str titles         3      3      0      13      4
    |  unk titles        25     25      0      48     12
                                                        
    |  cos trings        43     43      0     906     86
    |  zod trings        36     36      0     736     73