# Last edited on 2025-09-12 08:30:55 by stolfi

# 082 Character and digraph frequencies in formulaic text

SUMMARY

  Herbal texts usually follow a formula: name of the plant, list of
  diseases, mode of preparation, mode of application, etc. The syntax
  is usually simplified, and function words are often omitted. This
  structure radically affects the frequencies of words on the first
  and last lines of each entry, and hence the frequencies of
  characters and character pairs (digraphs).

  The goal of this note is to demonstrate and investigate this effect.

DATA

  We use two versions (Latin and English) of the same "alchemical
  herbal" text, provided by Marco Ponzi, namely "latn/ahl" and
  "engl/ahe".

    ln -s ../../langbank
    ln -s ../.. work
    ln -s work/compute_freqs.gawk
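  As a rough illustration of the kind of measurement this note is
  after, the Python sketch below tallies characters and digraphs
  separately for the first, middle, and last lines of each entry. It
  is NOT the compute_freqs.gawk script used here; all function names
  are hypothetical, and it assumes that entries are separated by blank
  lines, that words are separated by whitespace, that text is
  case-folded, and that digraphs are counted only within words.

```python
from collections import Counter
from typing import Dict, List

POSITIONS = ("first", "middle", "last")


def split_entries(text: str) -> List[List[str]]:
    """Split text into entries (lists of non-empty lines).
    ASSUMPTION: entries are separated by one or more blank lines."""
    entries: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            entries.append(current)
            current = []
    if current:
        entries.append(current)
    return entries


def line_position(i: int, n: int) -> str:
    """Classify line i (0-based) of an n-line entry.
    A one-line entry is counted as 'first'."""
    if i == 0:
        return "first"
    if i == n - 1:
        return "last"
    return "middle"


def char_and_digraph_counts(text: str) -> Dict[str, Dict[str, Counter]]:
    """Count characters and within-word digraphs, kept separate by
    line position (first / middle / last) inside each entry."""
    counts = {pos: {"chars": Counter(), "digraphs": Counter()} for pos in POSITIONS}
    for entry in split_entries(text):
        n = len(entry)
        for i, line in enumerate(entry):
            pos = line_position(i, n)
            for word in line.lower().split():
                counts[pos]["chars"].update(word)
                # ASSUMPTION: digraphs do not span word boundaries.
                counts[pos]["digraphs"].update(word[k:k + 2] for k in range(len(word) - 1))
    return counts


def relative_freqs(counter: Counter) -> Dict[str, float]:
    """Convert raw counts to relative frequencies (empty dict if no data)."""
    total = sum(counter.values())
    return {key: c / total for key, c in counter.items()} if total else {}


if __name__ == "__main__":
    # Invented toy input for illustration only; not taken from ahl/ahe.
    toy = "sage\ngood for head\nboil in wine\n\nrue\ngood for eyes\ndrink it"
    result = char_and_digraph_counts(toy)
    for pos in POSITIONS:
        print(pos, result[pos]["chars"].most_common(3), result[pos]["digraphs"].most_common(3))
```

  Comparing relative_freqs() of the "first" and "last" tallies against
  the "middle" ones is one simple way to see whether line position
  shifts the character and digraph statistics.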