# Last edited on 2026-03-03 11:57:41 by stolfi

096 Investigating similarities between languages A and B

  In this note we generate tables of "root" counts and frequencies for
  each section ("hea", "heb", "zod", etc.) and each text type (parags,
  labels, etc.), with fractional frequencies that take into account
  dubious word spaces (",").

  The roots are words reduced to equivalence classes by certain
  character operations. We also create lists of the words that map to
  each root.

  Finally, we compare the frequencies of words and roots in the parags
  text of Herbal-A and Herbal-B.

SETUP

  ln -s ../.. work
  ln -s work/tabulate_frac_counts.py
  ln -s work/compute_freqs.gawk
  ln -s work/combine_counts.gawk
  ln -s work/root_from_word_funcs.gawk
  ln -s work/ivt_loc_to_type.tbl
  ln -s work/error_funcs.gawk
  ln -s work/error_funcs.py
  ln -s work/process_funcs.py
  ln -s ../074/st_files

MAIN OPERATIONS:

  Do it all:

    do_note_096.sh 25e1

    # sec type    lines     JS     RZ  words  roots
    # --- ------ ------ ------ ------ ------ ------
    | bio glyphs      6      6      0      4      2
    | cos glyphs      6      6      0      6      2
    | hea glyphs     26     26      0     10      3
    | unk glyphs     43     43      0     16      5
    | bio labels    115    112      3    128     19
    | cos labels    291    291      0    348     50
    | hea labels      3      3      0      5      4
    | pha labels    235    233      2    261     41
    | zod labels    299    299      0    324     34
    | bio parags    740    294    446   1564     71
    | cos parags    174    174      0    713     45
    | hea parags   1209    967    242   2993    117
    | heb parags    373    265    108   1546     82
    | pha parags    223    158     65   1092     73
    | str parags   1082   1082      0   3866    168
    | unk parags    305    182    123   1449     81
    | cos radios     95     95      0    282     36
    | cos titles     32     32      0    107     32
    | hea titles     17     17      0     52     18
    | heb titles      5      5      0     16      6
    | pha titles      1      1      0      8      5
    | str titles      3      3      0     13      4
    | unk titles     25     25      0     48     12
    | cos trings     43     43      0    906     86
    | zod trings     36     36      0    736     73
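
  The note does not spell out how dubious spaces enter the fractional
  counts. The Python sketch below shows one plausible model, and it is
  NOT the logic of tabulate_frac_counts.py or root_from_word_funcs.gawk.
  It assumes that "." is a definite word space and that each "," is
  independently a real space with probability 1/2. A word made of
  consecutive pieces of a line then gets a factor (1 - p) for each
  dubious space inside it that must be absent, and a factor p for each
  dubious space that must bound it. The result is its expected count
  without enumerating all 2^k readings of the line. The names
  frac_word_counts, tally_roots, to_freqs and toy_root are hypothetical;
  toy_root (folding "sh" into "ch") is only a stand-in for the note's
  real root operations.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def frac_word_counts(line: str, p_break: float = 0.5) -> Dict[str, float]:
    """Expected count of each word in one line of text.

    Hypothetical model (an assumption, not the note's method): '.' is a
    definite word space; each ',' is independently a real space with
    probability p_break.
    """
    counts: Dict[str, float] = defaultdict(float)
    for chunk in line.split("."):
        pieces = chunk.split(",")
        n = len(pieces)
        for i in range(n):
            for j in range(i, n):
                # Pieces i..j form one word iff the j-i commas inside are
                # not spaces and any comma just outside them is a space.
                prob = (1.0 - p_break) ** (j - i)
                if i > 0:
                    prob *= p_break
                if j < n - 1:
                    prob *= p_break
                word = "".join(pieces[i:j + 1])
                if word and prob > 0.0:
                    counts[word] += prob
    return dict(counts)


def tally_roots(lines: Iterable[str],
                root_of: Callable[[str], str] = lambda w: w,
                p_break: float = 0.5) -> Dict[str, float]:
    """Sum fractional word counts over lines, merging words with the same root."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        for word, count in frac_word_counts(line, p_break).items():
            totals[root_of(word)] += count
    return dict(totals)


def to_freqs(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize counts so that they sum to 1 (empty input gives {})."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total > 0 else {}


def toy_root(word: str) -> str:
    """HYPOTHETICAL root function, for illustration only: folds 'sh' into 'ch'."""
    return word.replace("sh", "ch")


if __name__ == "__main__":
    # Toy input lines, made up for illustration.
    lines = ["daiin.chol,daiin", "shol.daiin"]
    freqs = to_freqs(tally_roots(lines, toy_root))
    for root, f in sorted(freqs.items()):
        print(f"{root:12s} {f:.4f}")
```

  For example, with p_break = 0.5 the line "a,b,c" gives a, b, c, ab,
  bc, abc expected counts 0.5, 0.25, 0.5, 0.25, 0.25, 0.25. These total
  2 words, the average over its four equally likely readings.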