INTERFACE PZMismatch; (* Tools for computing the mismatch of candidates. *) (* Last edited on 1999-08-22 01:38:53 by hcgl *) IMPORT ParseParams; FROM PZTypes IMPORT LONG; (* MISMATCH OF A CANDIDATE A candidate consists of two curves (sequences) of `samples' "a", "b", related by a discrete matching "mt[0...m-1] = (r[0],s[0]),..(r[m-1],s[m-1])". See "PZMatch.T" and "PZCandidate.T" for definitions of these concepts. Here we assume that the curves "a" and "b" have been extracted from the contours and possibly reversed and complemented, as specified by the candidate's representation. The quality of a candidate is typically measured by a `mismatch function'. The mismatch functions for the various kinds of curves that we use generally have the same general form, described below, and differ in a few numeric parameters. CHAIN METRICS The general mismatch formula uses two metrics "h" and "d" that depend on the nature of the curves. The metric "h(a,r1,r2)" gives the `curve length' between two samples "a[r1]" and "a[r2]" of curve "a", and similarly for curve "b". The metric "d(a[r], b[s])" gives the `distance' between the two samples "a[r]" and "b[s]", one from each curve. If the samples are points on R^2 or R^3, then "d" is typically the Euclidean distance, and "h" may be either the Euclidean distance or the length of the arc along the corresponding curve between the two samples. If the samples are single values, then "h" is usually the arc length, and "d" is the absolute value of the difference between the two sample values. GENERAL MISMATCH FORMULA The mismatch between two continuous curves "a" and "b" with matching "(r,s)" is conceptually computed from distances "d(a(r(t)),b(s(t)))" of matched samples "a(r(t))" and "b(s(t))", plus a penalty term that depends on the absolute speed difference between the matching functions "r(t)" and "s(t)". Let's assume that the continuous curves "a(r)" and "b(s)" are parametrized by the respective curve lengths, and that the parameter "t" is the average of "r" and "s", i.e. "r(t) + s(t) = t", The `mismatch' for a continuous matching "(r,s)" is then the integral "Mis(r,s)" with respect to "t" of the quantity | mis(t) = MIN(maxDist, d(a(r(t)),b(s(t))))^2 | - critDist^2 | + skipDist^2 * |r'(t) - s'(t)| / 2 where "maxDist", "critDist", and "skipDist" are parameters described below. MISMATCH PARAMETERS The `critical distance' "critDist" is the average value of the sameple distance "d" (in the root mean square sense) for which a candidate is equally likely to be true or false, in the limit of very long candidates. The `saturation distance' "maxDist" is the value of the sample distance "d" beyond which its precise value is no longer significant; so the computations should use "MIN(d, maxDist)" instead of the raw sample distance "d". It is usually set to a small multiple (2 or 3) of "critDist". This clipping is meant to compensate for the fact that the distribution of "d" values along a true candidate is not Gaussian, due to the significant probability of loss of large crumbs. The `skipped step distance' "skipDist" is used to penalize imperfect matchings. Specifically, we add to the mismatch the integral of "skipDist^2" times half of the absolute speed difference "|r'(t) - s'(t)| / 2" of the two side of the matching, where "t" is the average arclength of the two curves. This penalty term is necessary mainly when matching two curves whose samples are single numbers, since in that case one can often get artifically low discrepancies "d" between random curves by manipulating the parametrizations "r" and "s". It is not necessary for geometric matching. Below we describe how the continuous integral is replaced by a discrete summation for a discrete matcing "mt". STEP LENGTH AND ASYMMETRY Refer to "PZMatch.T" for a definition of `step', `full step', and `half-step' of a matching. The `length' "len(S)" of a matching step "S = p->q", by definition, is the average of the lengths of the two curve steps, i.e. | len(p -> q) = h(a,p[0],q[0]) + h(b,p[1],q[1]) The length "Len(mt)" of the matching is the sum of "len(S)" for all its steps (i.e. the average of the lengths of the two curve segments spanned by the matching). The `asymmetry' "asymm(S)" of a step "S" is the absolute half-difference between the lengths of the two curve steps, i.e. | asymm(p -> q) = |h(a,p[0],q[0]) - h(b,p[1],q[1])| / 2 The `total asymmetry' "Asymm(mt)" of a matching "mt" is the sum of "asymm(S)" for all its steps "S". CHAIN DISTANCE The `average curve distance' "dist(S)" of a matching step is the root mean square distance between corresponding samples along the two curve steps, estimated by the formula | dist((r1,s1) -> (r2,s2)) = | MIN(maxDist, sqrt((d(a[r1],b[s1])^2 + d(a[r2]-b[s2])^2)/2)) The `integrated squared curve distance' "IntgDistSqr(mt)" of the matching "mt" is then the sum of 'dist(S)^2 * len(S)" for all steps "S"; and its `average curve distance' is then | Dist(mt) = sqrt(IntgDistSqr(mt) / Len(mt)) DISCREPANCY The `discrepancy' of a step "S" is by definition | disc(S) = sqrt(dist(S)^2 + skipDist^2 * asymm(S)/len(S)) The `integrated square discrepancy' "IntgDiscSqr(mt)" of the matching is then the sum of "disc(S)^2*len(S)" for all steps "S"; and its `average discrepancy' is then | Disc(mt) = sqrt(IntgDiscSqr(mt) / Len(mt)) MISMATCH The `mismatch' of a step "S" is the square of its discrepancy minus the critical distance squared, times the step length; i.e. | mis(S) = (disc(S)^2 - critDist^2) * len(S) = (dist(S)^2 - critDist^2) * len(S) + skipDist^2 * asymm(s) In particular, if the curve steps have all the same size 'step", then | { (dist(S)^2-critDist^2)*step for full steps, | mis(S) = { | { (dist(S)^2-critDist^2+skipDist^2)*step/2 for half steps. The mismatch of the matching "Mis(mt)" is then the sum of "mis(S)" for all steps of the matching. That is, | Mis(mt) = IntgDistSqr(mt) - Len(mt)*critDist^2 + Asymm(mt)*skipDist^2 Note that the mismatch functions "mis(S)" and "Mis(S)" have units of sample distance squared times arc length; whereas the other functionals defined above ("len", "Len", "asymm", "dist", "Dist", "disc", and "Disc") have units of sample distance. Note also that the mismatch "Mis" is an integral, whereas "Dist" and "Disc" are averages. *) TYPE Options = RECORD critDistSqr: LONG; (* Critical sample distance (xi^2). *) maxDistSqr: LONG; (* Saturation sample distance. *) skipDistSqr: LONG; (* Skipped step equivalent distance (zeta^2). *) END; (* A "MismatchOptions" record contains several parameters that are typically used in the formula for the `mismatch' of a candidate, a quantity that determines the likelyhood of the candidate being true. Note that the values stored in the record are the squares of the parameters, for programming convenience. *) PROCEDURE Eval( length: LONG; (* Total length of matching. *) intgDistSqr: LONG; (* Integral of the curve distance squared and clipped. *) totAsymm: LONG; (* Total step asymmetry. *) READONLY o: Options; (* Mismatch parameters. *) ): LONG; (* Evaluates the mismatch of a matching "mt", given "length = Len(mt)", "intgDistSqr = IntgDistSqr(mt)", and "totAsymm = Asymm(mt)", as defined above. *) CONST OptionsHelp = "-maxDist NNN -critDist NNN -skipDist NNN"; PROCEDURE ParseOptions(pp: ParseParams.T): Options RAISES {ParseParams.Error}; (* Parses mismatch parameters from the command line input, as shown in the "OptionsHelp" string above. *) END PZMismatch. (* Copyright © 2001 Universidade Estadual de Campinas (UNICAMP). Authors: Helena C. G. Leitão and Jorge Stolfi. This file can be freely distributed, used, and modified, provided that this copyright and authorship notice is preserved, and that any modified versions are clearly marked as such. This software has NO WARRANTY of correctness or applicability for any purpose. Neither the authors nor their employers chall be held responsible for any losses or damages that may result from its use. *)