Last edited on 2005-01-17 13:57:33 by stolfi
Helena Cristina da Gama Leitão
Institute of Computing, UFF
Jorge Stolfi
Institute of Computing, UNICAMP
The topic of this project is the reconstruction of unknown objects that have been broken or torn into a large number of irregular fragments, as in figure (1)
(1) |
The motivating applications come from archaeology and art conservation, where many kinds of friable objects (pottery and glass vessels, clay tabets, papyri, mural paintings, etc.) often have to be reconstructed from thousands of mixed-up fragments [ka96,fr96]. Other possible applications are paleontology and forensics (bone and shell fragments) and failure analysis (concrete and glass debris).
We are mostly concerned with ``large'' reconstruction problems, where (possibly incomplete) collections of thousands of fragments, have to be reassembled into one or more objects of unknown shape and design. Apart from the logistical aspects of handling and cataloguing the fragments, the hardest step in the reconstruction is the identification of matching pairs---fragments that were adjacent in some original object. Given a list of such pairs, it is relatively easy to put the fragments together (either materially or virtually, as in [ka96].
It turns out that, for many friable materials, a substantial fraction of the matching pairs can be identified, with high confidence, by the fact that their outlines are congruent to within fractions of a millimeter.
(2) |
As figure (2) shows, if we take a sufficiently long segment of a fragment's oultine, the pattern of its sub-millimetric irregularities is a characteristic ``fingerprint'' of the fracture line. If this same pattern is found (complemented) on the boundary another fragment, it is very likely that the two fragments are a true matching pair.
This problem has been adressed by several researchers: both computer theoreticians, concerned with efficient algorithms for contour and surface matching) [po94,wo90], and software engineers concerned with developing usable tools, with convenient user interfaces, usually for solving specific restoration problems [ka96].
We believe that this is an inherently difficult computational problem, so that processing a thousand or more fragments will inevitably take hours of computer time, even with good algorithms. Therefore we have chosen to concentrate on the algorithmic aspects of the problem; specifically, on the development of scalable shape matching techniques, that could be used for datasets with tens of thousand fragments, with as little user intervention as possible. In particular, we do not plan to develop a fancy user interface.
For problem instances with thousands of fragments, direct comparison of all possible pairs and alignments would be far too expensive. To address that problem, we have developed a multiscale comparison heuristic that in principle should cut down the processing cost by a fraction of 100 or so, and still recover most of the identifiable matches among the given segments.
For testing, at present we are using ``artificial'' fragments, ``manufactured'' specifically for this project; but we hope to validate the algorithms on more realistic data sets, representative of the motivating applications cited above.
In the case of flat fragments, such as old paper or mural paintings, we can obtain the outlines by digitizing each fragment with a flatbed document scanner (either directly or through photographs), and then applying a suitable edge detection algorithm.
In the case of ceramic or fossil fragments, the fracture lines are three-dimensional curves, and their extraction will require stereoscopic vision techniques. This is the topic of a separate research project.
In any case, it is important that the contours be determined with sub-millimeter accuracy, since the reliable identification of adjacent fragments depends on details of that magnitude. In particular, each outline must follow the fracture line on a specific side of the object, in spite of color variations, turned edges, etc..
Figure (3) shows the outlines of two fragments from figure (1). Each grid square mesures 50×50 pixels. One can see two stretches of about ~200 pixels, one on each contour, that are congruent within a couple of pixels.
(3) |
The extracted outlines are usually contaminated by ``noise'' due to meterial erosion and digitization errors. Therefore, before comparing the outlines, it is necessary to filter away their high-frequency components, which are the ones most affected by such noise.
Filtering a curve is harder than filtering a time series, since the natural parameter for describing the curve---its length---may shrink substantially and non-uniformly as result of filtering. This shrinking changes the wavelength of the curve components, which invalidates the filtering. Ideally, the curve should be parametrized by its length after filtering, and the filtering should be based on this parametrization.
In order to break this vicious circle, we use an iterative process, loosely based on the idea of curvature diffusion [mm92]. Figure (4) shows a detail of the first outline from figure (3), before and after filtering.
(4) |
The next step is to identify pairs (C',C'') of outlines that have long and approximately congruent stretches.
In order to make this task independent of fragment rotation and translation, we use the standard technique of representing each outline by the graph of its local curvature (and torsion, in the case of three-dimensional curves) as a function of arc length [wo90].
When we discretize the curvature into a finite set of ``symbols'' (curvature values), two curves with similar shapes will be mapped into similar strings of symbols. Thus we are left with the problem of finding all pairs of similar sub-chains (modulo inversion and negation) in a given set of strings. This is a standard problem in computational geometry, for which many efficient algorithms are known [ms97,pl88].
Image (5), inspired by the FAST biosequence matching technique, shows the comparison of curvature graphs for the two outlines in figure (3). Each column i in the image corresponds to an extremal curvature value (maximum or minimum) a[i] in the first outline, and each line j ocorresponds to an extremum b[j] in the other. The lightness of pixel (i,j) is proportional to the difference |a[i]-b[j]|.
(5) |
The dark diagonal line, in the lower left part of the image, reveals a long sequence of curvature maxima and minima, in one of the outlines, that occurs---reversed and negated--- on the other outline. This coincidence is a strong indication that the two fragments were originally adjacent.
The large number of spurious dark spots in figure (5) make it difficult to distingush algorithmically the true shape congruences from random coincidences.
Part of the difficulty comes from the fact that two curves with very different shapes may have quite similar curvature graphs. To get around this problem, we plan to perform the identification at multiple scales of resolution [fs94,mm92]. Instead of comparing only the curvature graphs of the original outlines, filtered with cutoff frequency w, we compare also the outlines filtered with cutoffs w/2, w/4, .... The justification being that two curves with very different shapes will have very different curvature graphs in at least one of these scales.
The multiscale matching technique has been tested on a relatively small sample of 112 ceramic fragments. The results are shown in a separate page.