#! /usr/bin/python3
# Last edited on 2025-10-29 15:44:41 by stolfi
import sys, re
import html_gen as h
def main():
  wr = sys.stdout
  # Adjacent string literals are concatenated into a single title string.
  title = (
    "Separating inks and paints "
    "by Bayesian classification: "
    "Principles and basic test"
  )
  st = h.preamble(wr, title, "960px")
As a first test of the idea, we use the following clip of page f79r from the "Biological" or "Balneological" section, cropped to the region 800x1024+63+2385. It covers the green-painted pool at the SW corner of the page. (Click on thumbnails for the full-size images.)
For this first test, we consider three provinces, pairwise disjoint regions of the page with distinct natures, which should have relatively homogeneous appearance:
parch: all the areas with blank parchment.
green: all the areas painted green.
dkink: all the text and outlines in dark brown ink.
For reasons discussed below, we cannot expect each province to have a single uniform color, but rather a characteristic gamut of colors. So, from each of these provinces, we manually pick a sample of pixels which we believe to be representative of its color gamut. These samples are conveniently specified by black-and-white masks. For clarity, the sampled pixels of the input image are also shown below each mask.
[Figure: the sample masks parch.png, green.png, and dkink.png, each with the sampled pixels of the input image shown below it.]
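The mask-based sampling described above can be sketched as follows. This is only an illustration: the function name `sample_pixels`, the nested-list image representation, the convention that a nonzero mask value marks a sampled pixel, and the tiny 2x2 image are all assumptions, not taken from the actual programs used for this test.

```python
def sample_pixels(image, mask):
    """Return the colors of `image` at the pixels selected by `mask`.

    `image` is a list of rows of (r, g, b) tuples with components in [0, 1].
    `mask` is a list of rows of 0/1 values with the same shape; a nonzero
    value is ASSUMED to mean "this pixel is part of the sample".
    """
    samples = []
    for img_row, mask_row in zip(image, mask):
        for color, selected in zip(img_row, mask_row):
            if selected:
                samples.append(color)
    return samples

# Tiny made-up 2x2 image and mask, for illustration only.
image = [[(0.8, 0.7, 0.5), (0.2, 0.4, 0.2)],
         [(0.8, 0.7, 0.6), (0.1, 0.1, 0.1)]]
parch_mask = [[1, 0],
              [1, 0]]
parch_samples = sample_pixels(image, parch_mask)
```

Each province gets its own mask, so the same function would be called once per province to obtain its cloud of color points.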
Each color taken from the input image can be visualized as a point of the RGB color cube, whose corners are the eight "basic" RGB colors: (0,0,0) for black, (1,1,1) for white, (1,0,0) for red, (1,1,0) for yellow, etc. A sample of colors is a cloud of points in this cube.
The images below are three different views of the three clouds corresponding to the three samples above. The colors of the points in these pictures are not the colors themselves; they are arbitrarily assigned to identify the samples -- red for the parch sample, blue for the green sample, and cyan for the dkink sample. (For clarity, only a subset of ~1000 points from each cloud is plotted.)
[Figure: three views of the three sample color clouds in the RGB cube.]
It can be seen in these snapshots that the clouds are distinctly not spherical. The causes for the spread include:
The vellum surface has bumps and dents on a scale of up to a few pixels. The oblique illumination creates shadows and bright spots that change the brightness of the surface color. That is probably the main reason for the pronounced elongation of the clouds towards the black point.
There are all sorts of dirt over the surface, including over the painted and inked areas. The color at each pixel is therefore a mix of the "clean" surface color and some modest but random amount of the dirt's color.
The green paint and the writing/drawing ink are applied with varying thickness. They both seem to behave optically as a solid pigment suspension paint. Therefore the color of a pixel in a painted or inked area is a mix of the paint/ink color and of some amount of the parch color.
Because of the limited resolution of the images, pixels near the edges of ink strokes get contaminated by the color of the adjacent blank parch.
The camera imaging hardware and principally the JPEG compression algorithm distort the colors of individual pixels in (effectively) random amounts.
Parts of the text and drawing were probably retraced or retouched on one or more occasions, with inks that seem to have somewhat different compositions and colors.
The way that ink is deposited by the pen naturally creates darker and lighter areas along and across each stroke, which seem to have slightly different hues.
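Two of the effects listed above can be illustrated with a small sketch: shading, which scales a color towards the black point, and mixing, which blends two colors. The linear models, the function names `shade` and `mix`, and the color values are assumptions made for illustration; the actual optical behavior of the parchment, paint, and ink may well be more complicated.

```python
def shade(color, brightness):
    """Scale an RGB color towards black (0,0,0) by a brightness factor."""
    return tuple(brightness * c for c in color)

def mix(color_a, color_b, frac_b):
    """Linear blend: a fraction `frac_b` of color_b and the rest of color_a."""
    return tuple((1.0 - frac_b) * a + frac_b * b
                 for a, b in zip(color_a, color_b))

# Hypothetical "clean" colors, for illustration only.
parch_color = (0.80, 0.70, 0.55)
ink_color = (0.25, 0.15, 0.10)

# A parchment pixel lying in a shallow shadow.
shadowed = shade(parch_color, 0.5)

# A pixel on the edge of a stroke, assumed half covered by ink.
edge_pixel = mix(parch_color, ink_color, 0.5)
```

Under these assumptions, varying the brightness factor spreads a province's colors along a line through the black point, while varying the mixing fraction spreads them along the segment between the two pure colors.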
Those point clouds were approximated by trivariate Gaussian probability density functions (PDFs). Think of each distribution as a fuzzy ellipsoid with three unequal axes, with generic position and orientation. These, in particular, have the longest axis pointing roughly (but not exactly) towards the black point (0,0,0).
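One common way to approximate a cloud of points by a trivariate Gaussian is to take the sample mean and the sample covariance matrix. The sketch below does that in plain Python; it is an assumption that the fit was done this way, and the function name `fit_gaussian` and the sample values are hypothetical.

```python
def fit_gaussian(samples):
    """Return (mean, cov) of a list of RGB triples.

    `mean` is a list of 3 floats; `cov` is the 3x3 sample covariance
    matrix (normalized by n - 1), as a list of lists.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = [sum(s[i] for s in samples) / n for i in range(3)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(3)]
           for i in range(3)]
    return mean, cov

# Made-up sample, for illustration only.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mean, cov = fit_gaussian(samples)
```

The eigenvectors of `cov` give the directions of the three axes of the fuzzy ellipsoid, and the square roots of its eigenvalues give their relative lengths.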
Each distribution is a mathematical model that gives the probability Pr(color(p)=C | p∈S) that a pixel p will have color C, assuming that it lies in the province S (parch, green, or dkink) from which the corresponding set of color samples was taken.
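For a Gaussian model, that quantity is given by the standard trivariate normal density formula (strictly a probability density, since colors vary continuously). The following is a minimal sketch in plain Python; the helper names `det3`, `inv3`, and `gaussian_pdf`, and the numeric parameters, are hypothetical and only meant to show the computation.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    d = det3(m)
    def cofactor(i, j):
        a, b = (i + 1) % 3, (i + 2) % 3
        c, e = (j + 1) % 3, (j + 2) % 3
        return m[a][c] * m[b][e] - m[a][e] * m[b][c]
    # inverse[i][j] = cofactor(j, i) / det
    return [[cofactor(j, i) / d for j in range(3)] for i in range(3)]

def gaussian_pdf(color, mean, cov):
    """Density of the trivariate Gaussian (mean, cov) at an RGB color."""
    diff = [color[i] - mean[i] for i in range(3)]
    inv = inv3(cov)
    # Squared Mahalanobis distance: diff^T * inv(cov) * diff.
    d2 = sum(diff[i] * inv[i][j] * diff[j]
             for i in range(3) for j in range(3))
    norm = math.sqrt((2.0 * math.pi) ** 3 * det3(cov))
    return math.exp(-0.5 * d2) / norm

# Hypothetical model parameters, for illustration only.
mean = [0.80, 0.70, 0.55]
cov = [[0.010, 0.008, 0.006],
       [0.008, 0.009, 0.006],
       [0.006, 0.006, 0.007]]
density_at_mean = gaussian_pdf(mean, mean, cov)
density_far_away = gaussian_pdf([0.2, 0.4, 0.2], mean, cov)
```

The density is largest at the mean color and falls off rapidly for colors far from the cloud, which is what makes it usable for telling the provinces apart.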
What we want instead is Pr(p∈S | color(p)=C), which is the probability that a pixel p belongs to province S, given that its color is C. For that we use Bayes's formula:
Pr(p∈S | color(p)=C) = K · Pr(color(p)=C | p∈S) · Pr(p∈S)
where Pr(p∈S) is the prior probability of pixel p belonging to province S -- that is, the probability we assign to that fact before we are given its color C; and K is a constant such that the sum of the left-hand side over all provinces S is equal to 1.
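In code, Bayes's formula for one pixel amounts to multiplying each likelihood by its prior and then normalizing, with K being the reciprocal of the sum of those products. The sketch below assumes the likelihoods Pr(color(p)=C | p∈S) have already been evaluated; the function name `posteriors` and all the numbers are made up for illustration.

```python
def posteriors(likelihoods, priors):
    """Apply Bayes's formula for one pixel.

    `likelihoods[S]` is Pr(color(p)=C | p in S) and `priors[S]` is
    Pr(p in S). Returns a dict with Pr(p in S | color(p)=C) for each S.
    """
    unnormalized = {s: likelihoods[s] * priors[s] for s in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("color has zero likelihood under every province")
    k = 1.0 / total  # the constant K of the formula
    return {s: k * v for s, v in unnormalized.items()}

# Hypothetical likelihoods of one pixel's color, for illustration only.
likelihoods = {"parch": 2.0, "green": 1.0, "dkink": 1.0}
priors = {"parch": 1 / 3, "green": 1 / 3, "dkink": 1 / 3}
post = posteriors(likelihoods, priors)
```

With equal priors, as in this example, the posterior probabilities are simply the likelihoods rescaled to sum to 1.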
To use this formula we must consider one extra province, OTHER, that comprises all parts of the input image that are not included in any of the provinces of interest.
For each prior probability Pr(p∈S) we could use the fraction of the area of the image that belongs to province S, if that number were known. In practice it is not available, because it depends on assigning each pixel to one of the provinces -- which is precisely the goal of this analysis. Thus we set Pr(p∈OTHER) to some arbitrary value P0, and for each chosen province S we set Pr(p∈S) to (1-P0)/m, where m = 3 is the number of chosen provinces.
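The assignment of priors just described is a one-liner per province. In the sketch below, the function name `make_priors` and the value 0.1 for P0 are hypothetical; the text only says that P0 is arbitrary.

```python
def make_priors(provinces, p_other):
    """Prior Pr(p in S): P0 for OTHER, (1 - P0)/m for each chosen province."""
    m = len(provinces)
    priors = {s: (1.0 - p_other) / m for s in provinces}
    priors["OTHER"] = p_other
    return priors

# P0 = 0.1 is a made-up value, for illustration only.
priors = make_priors(["parch", "green", "dkink"], 0.1)
```

By construction the priors of the m chosen provinces are all equal and, together with P0, sum to 1.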
The following images show the result of this analysis. For each of the three provinces plus OTHER, we get a grayscale image whose value at some pixel p is the probability of p belonging to that province, based on its color. To make the images more intuitive, the image for parch is shown as computed (probability 1 = white) while the others are inverted (probability 1 = black).
[Figure: probability maps for parch, green, dkink, and OTHER.]
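The display convention for these probability maps can be sketched as below. The 8-bit gray scale (0 = black, 255 = white) and the function name `to_gray` are assumptions; only the inversion rule comes from the text.

```python
def to_gray(prob, invert):
    """Map a probability in [0, 1] to an 8-bit gray level.

    With invert=False, probability 1 is white (255), as for the parch map.
    With invert=True, probability 1 is black (0), as for the other maps.
    """
    value = 1.0 - prob if invert else prob
    return round(255 * value)

# Which maps are inverted, following the convention stated in the text.
INVERTED = {"parch": False, "green": True, "dkink": True, "OTHER": True}

gray_parch = to_gray(1.0, INVERTED["parch"])
gray_green = to_gray(1.0, INVERTED["green"])
```

With this convention, inked and painted areas appear dark on a light background in every map, which resembles the original page.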
The results of this analysis are mostly as expected, with a few exceptions. Let's look closely at the OTHER probability map:

(A) These areas, which are parchment without any paint or ink, are not classified as parch because they are affected by the green paint on the verso bleeding through the parchment, so that their color is significantly different from that of the areas selected as samples of the parch province.
(B) and (C) The green-paint province green has a halo of a distinct color that seems to be that same bluish-green component of the paint bleeding sideways all around the edge of the painted area.
(D) These pixels are the red paint applied to the cheeks and lips of the nymph.
(E) These pixels are the light blue paint applied to the water stream.
(F) These pixels are the light yellow-pink paint applied to the hair of this and many other nymphs.
(G) The text on this page was apparently retraced almost entirely at some point. The indicated pixels are a few parts of glyphs that are substantially fainter than the rest of the page and thus may have been spared by the Retracer.
(H) These pixels are inked outlines that were covered with the green paint. In this test analysis they were mostly classified as green, but a few were distinctive enough to be marked OTHER. These are the pixels we are most interested in. We will do better in the next report.
Apart from these specific cases, the OTHER map has scattered pixels along the edges of the ink strokes, both on the text (in the northeast quadrant) and on the drawing (in the water spout at northwest and its water stream, and on the face and arms of the nymph). The color of these pixels is a mix of the full-ink color and the parchment color, so they are excluded from both of those categories.
It seems that the green paint (only) has a bluish-green component that can cross the parchment. This bluish-green pigment seems to be used almost alone in some pages, such as f8r. Here (and in the rest of the Bio section) it seems to be mixed with a yellowish or ocher pigment that does not bleed, resulting in a darker and more yellowish green. The McCrone report says that the green pigment (without distinguishing the two) is a "resinate of copper". But that is awfully vague. Could it be that it is not a tempera (gouache) paint, but a fatty copper dye dissolved in turpentine?
Last edited on 2025-10-24 05:01:45 by stolfi