# Last edited on 2001-09-20 17:54:24 by stolfi

DATASET DESCRIPTION
  
  Fragment preparation

    This directory contains the input data for the test "ceramic-3".
    
    The test objects were five unglazed ceramic tiles, each about 5cm
    × 20cm × 7mm. On the flat side, each tile had a bevel all around
    the edge, about 2mm wide and 1 mm deep. The tiles had apparently
    been baked to the "biscuit" stage (the stage just below
    vitrification, where the piece will no longer soften in water but
    can still absorb it).

    The tiles were placed face down on a paper sheet, resting upon a
    cement surface, and were hit repeatedly with a steel bar. The
    material was harder than common brick, but softer than typical
    ceramic tiles, so fracturing generated many small crumbs and dust.
    The edges were rather delicate, and the fragments had to be handled
    with care so as not to damage the fracture edges.

  Fragment scanning

    We recovered 112 pieces large enough to handle, ranging from ~5cm to
    ~0.5 cm in diameter. These were numbered with pencil and scanned
    directly on an ordinary flatbed document scanner (UMAX model UC630
    Maxcolor, driven by Adobe Photoshop running on a Macintosh). The
    pieces were scanned in four batches of about 30 pieces each, in no
    particular order; see the files "batches/fragments-{a,b,c,d}.pgm".
    Individual pieces were later isolated from these images with the
    PZSplit program.

    The fragments were placed, flat side down, in random position and
    orientation. To maximize contrast, the flat face of each piece was
    lightly rubbed with chalk, and a black velvet cloth was used in lieu
    of the scanner's cover. The scanner was set to grayscale mode,
    300dpi (its nominal maximum). The images were saved in TIFF format
    and later converted to 8-bit PGM files.

  Reassembled objects

    After scanning, the pieces were manually reassembled into the
    original tiles. Each tile was wrapped in transparent plastic to
    reduce the risk of further abrasion, and scanned again; see file
    "tiles/assembled.pgm".  Unfortunately, it did not occur to us 
    to scan the original tiles before breaking them up.
  
  File history

    These data are the same fragments and images as "ceramic-2", but
    processed with a corrected and improved filtering program (PZFilter
    of 20/aug/1999). These files were originally stored in
    euler:/n/lac3/hcgl/tests/ceramic-3.

INPUT DATA FILES

  Directory "batches"

    The "batches" directory contains unprocessed images of multiple
    fragments, as obtained from the scanner:

        Bytes  File        
      -------  ---------------
      6035867  fragments-a.pgm
      5357307  fragments-b.pgm
      5113563  fragments-c.pgm
      5958513  fragments-d.pgm
      -------  ---------------

  Directory "fragments"

    The "fragments" directory contains images and extracted outlines
    of individual pieces. Each numbered sub-directory "fragments/0000"
    through "fragments/0111" contains data for one piece of the
    puzzle. The directory number should match the number penciled on
    the piece itself. The files in each sub-directory are:

      File       Produced by         Contents 
      ---------  ----------------    --------------------------------------------
      image.pgm  PZSplit             Grayscale image of the piece.
      r000.flc   PZBoundary          The piece's raw outline, from image.pgm.
      f000.flc   PZFilter            Same as r000.flc, but centered at (0,0).
      f000.lbl   PZFilter            Sample labels for f000.flc.
      f000.ps    PZDraw              Plot of f000.flc.
      ---------  ----------------    --------------------------------------------

  Directory "multiscale"

    The "multiscale" directory contains multiple versions of the
    outline curves in the "fragments" directory, smoothed with
    PZFilter at multiple resolution scales NNN (001, 002, 004, ...
    128, 256). Note that the coarsest versions are not available
    for the smallest pieces.

      File       Produced by         Contents 
      ---------  ----------------    --------------------------------------------
      fNNN.flc   PZFilter            The piece's outline, filtered to scale NNN.
      fNNN.lbl   PZFilter            The numeric label of each sample.
      fNNN.flv   PZComputeVelAcc     The velocity vector at each sample point.
      fNNN.fla   PZComputeVelAcc     The acceleration vector at each sample point.
      fNNN.fcv   PZComputeCurvature  The outline's curvature at each sample point.
      fNNN.cvc   PZEncodeCurvature   Curvature values encoded as letters 'z-a0A-Z'.
      fNNN.ps    PZDraw              Postscript plot of fNNN.flc.
      ---------  ----------------    --------------------------------------------

    Corresponding points in two different versions of the same
    fragment outline can be identified by having the same ".lbl"
    value.
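
    As a minimal sketch of how labels might be used to relate two
    versions of an outline, the function below finds the samples of one
    version whose labels bracket a given label from another version. The
    function name and the sample values are hypothetical. It assumes the
    labels are already read as integers (multiples of "unit") and
    increase along the outline, and it ignores wrap-around at the end of
    a closed curve.

```python
from bisect import bisect_left

def bracket_by_label(labels, target):
    """Return indices (i, j) of the samples whose labels bracket `target`.

    `labels` must be sorted in increasing order (an assumption of this
    sketch). Returns (k, k) on an exact match, or None when `target`
    lies outside the range covered by `labels`.
    """
    k = bisect_left(labels, target)
    if k < len(labels) and labels[k] == target:
        return (k, k)
    if k == 0 or k == len(labels):
        return None
    return (k - 1, k)

# Made-up labels in units of 0.001: 0.60, 0.64, 0.67, 0.70.
original = [600, 640, 670, 700]
print(bracket_by_label(original, 650))   # (1, 2)
print(bracket_by_label(original, 670))   # (2, 2)
```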

  Directory "nonfractal"
    
    This directory contains one file "fNNN-str.seg" for each scale NNN,
    listing the segments of each fragment that are too smooth to be
    considered by the shape matching programs.  (These are likely
    to be outer edges anyway.)
  
  Directory "pairs"

    The "pairs" directory contains reference solutions, compiled by
    hand. These files are used to evaluate the precision and
    recall of our algorithms.

    Each of the main files contains a list of `candidates' --- pairs
    of matching fragment outline segments --- in the ".can" format
    (see below).

    There were 209 `true' candidates, i.e. pairs of outline segments
    which were indeed adjacent in the original tiles. Among these, we
    identified a subset of 195 `recognizable' candidates: those which,
    in our judgement, could possibly be recognized as such by a person,
    looking at the two outline segments. The remaining 14 candidates
    were considered too dissimilar in shape (due to material loss) to be
    recognized as such.

      File                        Contents
      --------------------------  ---------------------------------------
      f000-t.can                  The 195 "recognizable" true candidates.
      f000-t-can.dgr              LaTeX adjacency graph of the same.

      f000-u.can                  The  14 "unrecognizable" true candidates.
      f000-u-can.dgr              LaTeX adjacency graph of the same.

      adj-graph.tex               LaTeX file to display the graphs.
      adj-graph.make              Makefile for the above.

      f004-tr-0000??-dr-f.eps     Postscript plots of some true candidates.
      --------------------------  ---------------------------------------
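
    The precision and recall figures mentioned above could be computed
    along the following lines. This is only a sketch, not the
    evaluation actually used with these files. It makes a simplifying
    assumption: a proposed candidate counts as correct when its pair of
    fragment numbers appears in the reference list, regardless of where
    on the two outlines the segments lie. The function name and the
    sample pairs are hypothetical.

```python
def precision_recall(proposed, reference):
    """Compare two lists of (fragment_a, fragment_b) pairs.

    Pairs are treated as unordered, so (3, 7) and (7, 3) are the same.
    Returns (precision, recall) as floats in [0, 1].
    """
    proposed = {frozenset(p) for p in proposed}
    reference = {frozenset(p) for p in reference}
    hits = len(proposed & reference)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

# Made-up data: four reference pairs, three proposed pairs, two hits.
reference = [(0, 1), (1, 2), (2, 3), (3, 4)]
proposed = [(1, 0), (2, 3), (5, 6)]
precision, recall = precision_recall(proposed, reference)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

    A stricter test would also require the two outline segments to
    overlap those of the reference candidate.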

  Directory "tiles"

    The "tiles" directory contains images of the manually reassembled tiles:

        Bytes  File                    Contents
      -------  ----------------------  ---------------------------------------
      7821075  assembled.pgm           The five tiles, manually reassembled.
      2169088  assembled.jpg           Lossy version of assembled.pgm

       191631  assembled-small.pgm     A reduced version of assembled.pgm.
       389473  assembled-small.eps     Postscript version of the same.

       211468  assembled-detail.pgm    Full-res detail of assembled.pgm.       
       430004  assembled-detail.eps    Postscript version of the same.
       
         1717  assembled.ctrs          Center of each piece in "assembled.pgm"
      -------  ----------------------  ---------------------------------------

FILE FORMATS

  General information

    Data files produced by our programs start with a line of the form
    "begin PZXxxx.T (format of YY-MM-DD)" where "Xxxx" identifies the
    file type, and "YY-MM-DD" specifies a particular version of the
    format.

    Following the "begin" line are some comments, identified by "|" on
    column 1, usually written by the program(s) that produced the file.

    After the comments there are some file-specific parameters; two
    common ones are "samples = NNN" (the number of sample points in the
    file), and "unit = N.NNNN" (a scaling factor for the sample data).

    After the parameters come the data samples, one sample per line.
    The number of coordinates in each sample depends on the file type.
    To save I/O time, each sample coordinate is usually written as an
    integer that is implicitly scaled by the "unit" parameter. So, for
    example, if "unit = 0.001", then the coordinate "9542" actually
    means "9.542".

    The data file is closed by a line of the form "end PZXxxx.T".
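
    The layout above can be read with a few lines of code. The sketch
    below is not one of the PZ programs. It assumes that each parameter
    sits on its own line in the form "name = value" and that every
    sample field is an integer, so it would not handle files whose
    samples contain letters or signs (such as ".cvc" or ".seg"). The
    function name and the sample file contents are made up for
    illustration.

```python
def parse_pz_file(text):
    """Parse one data file into (file_type, comments, params, samples)."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("begin "):
        raise ValueError("file does not start with a 'begin' line")
    file_type = lines[0].split()[1]          # e.g. "PZXxxx.T"
    comments, params, samples = [], {}, []
    closed = False
    for ln in lines[1:]:
        if ln.startswith("|"):               # comment: "|" in column 1
            comments.append(ln[1:].strip())
        elif ln.split() == ["end", file_type]:
            closed = True
            break
        elif "=" in ln:                      # assumed form: "name = value"
            name, value = ln.split("=", 1)
            params[name.strip()] = value.strip()
        else:                                # one integer-coded sample
            samples.append([int(tok) for tok in ln.split()])
    if not closed:
        raise ValueError("missing 'end' line")
    return file_type, comments, params, samples

# Invented file contents, following the layout described in the text.
example = """begin PZLR3Chain.T (format of 99-08-20)
| made-up comment line
samples = 2
unit = 0.001
9542 -1200 0
9610 -1105 0
end PZLR3Chain.T
"""
file_type, comments, params, samples = parse_pz_file(example)
unit = float(params["unit"])
scaled = [[c * unit for c in s] for s in samples]   # 9542 becomes about 9.542
print(file_type, params["samples"], scaled[0])
```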

  File-specific information

    The names [PZXxxx] in brackets are the program modules where the 
    contents and/or the file format is defined.

    .flc   Coordinates (X,Y,Z) of sample points along the boundary of a
           fragment. The Z coordinate is always 0 in these samples. The
           unit of measure is the scanner's pixel (1/300 of an inch)
           times the "unit" parameter. Thus "unit = 0.001" means the
           unit is actually about 0.0000847 mm.  [PZLR3Chain,PZFilter]

    .lbl   A numeric `label' attached to each sample point.  The PZFilter
           program tries to preserve labels when filtering and resampling the 
           outline, so that a point labeled, say, 0.65 in the filtered curve
           corresponds to some point between samples labeled 0.64 and 0.67 
           in the original curve.  The labels are stored in the file
           as integer multiples of the "unit" parameter. [PZLRChain,PZFilter]

    .flv   Tangent direction (velocity vector) of the curve at each of the
           sample points in the corresponding ".flc" file. The unit of
           measure is the "unit" parameter times a scanner pixel. The
           curve is approximately parametrized by arc length, so the
           length of the vector should be close to 1. The Z component of
           the velocity is always 0.  [PZLR3Chain,PZComputeVelAcc]

    .fla   Acceleration vector of the curve at each of the sample points 
           in the corresponding ".flc" file. The unit of measure is the "unit"
           parameter times a scanner pixel.  The vector should be approximately
           normal to the curve. The Z component of the acceleration is always 0.
           [PZLR3Chain,PZComputeVelAcc]

    .fcv   Estimated numerical curvature of the outline at each sample 
           point.  The unit is pixel^{-1}, times the "unit" parameter.
           [PZLRChain,PZComputeCurvature]

    .cvc   Estimated curvature of the outline at each sample point, scaled
           and compressed to the range [-26..+26] and encoded as a
           letter: z-a for [-26..-1], 0 for 0, A-Z for [+1..+26].
           The "sigma" parameter defines the compression function.
           [PZSymbolChain,PZEncodeCurvature]

    .seg   List of fragment outline segments.  The "segments" parameter
           is the number of entries.  Each segment is described by one 
           line with five fields: the fragment number, the number of 
           samples in the whole outline, the index of the first sample
           belonging to the segment, the number of samples in the
           segment, and the reading direction (`+' for increasing
           sample indices, `-' for the opposite direction).

    .can   List of `candidates'. A `candidate' is a pair of outline
           segments that are claimed to have been adjacent in the
           original object. The parameter "candidates" defines the
           number of candidates in the file.
           
           Each candidate is specified by a line with either 13 or 19
           fields. The first 10 fields specify the two outline
           segments, 5 fields each, as in the ".seg" file. After that
           there are 3 fields for the candidate: a `mismatch' measure
           (often 0, meaning `not available'), the number of steps
           averaged between the two segments, and the number of
           samples that were considered `matched' (often 0, meaning
           `not available').

           After those 13 fields, there may be an optional description of
           a proposed pairing of the samples between the two segments.
           The pairing is a sequence of pairs of sample indices, one
           in each segment, that increase at most by one at each step
           --- on only one segment, or on both at the same time. The
           pairing, if present, is defined by 6 additional fields ---
           the last 5 enclosed in parentheses --- giving: the number
           of pairs in the pairing; the indices of the first and last
           samples of the first segment; ditto for the second segment;
           and a string of characters that provides a graphical
           description of the pairing, using "/" for a step only on
           segment 1, "\" for a step only on segment 2, and "|" for a
           step on both segments simultaneously.
           [PZCandidate,PZMatch,PZMapCands,PZRefineCands]

    .ctrs  This file has no headers.  The first two lines specify the dimensions
           "width = NNN" and "height = NNN" of the image showing the
           manually reassembled fragments. After that comes one line
           for each fragment, giving the fragment number (0000 to
           0111) and the H and V coordinates of its approximate center
           in that image.

    .ps    Postscript (printer-ready, not encapsulated) plot of the outline.
           Grid lines, when shown, are 50(??) scanner pixels (4.23 mm) apart.
           [PZDraw,PZDrawCand,PSPlot]
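
    Three of the encodings described above are simple enough to sketch
    in code: the ".cvc" letter code, the pairing string of a ".can"
    candidate, and the conversion of ".flc" coordinates to millimetres.
    The function names are hypothetical. The pairing decoder assumes
    that the string holds one character per step between consecutive
    pairs, so a string of n characters yields n+1 pairs. The text does
    not state this explicitly.

```python
def decode_cvc(ch):
    """Map a .cvc letter to its integer code in [-26..+26]."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch == "0":
        return 0
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1        # A..Z -> +1..+26
    if "a" <= ch <= "z":
        return -(ord(ch) - ord("a") + 1)     # a..z -> -1..-26
    raise ValueError(f"not a .cvc letter: {ch!r}")

def expand_pairing(first1, first2, steps):
    """Expand a pairing string into a list of (index1, index2) pairs."""
    i, j = first1, first2
    pairs = [(i, j)]
    for s in steps:
        if s == "/":          # step on segment 1 only
            i += 1
        elif s == "\\":       # step on segment 2 only
            j += 1
        elif s == "|":        # step on both segments
            i += 1
            j += 1
        else:
            raise ValueError(f"bad pairing character: {s!r}")
        pairs.append((i, j))
    return pairs

def flc_to_mm(coord, unit, dpi=300):
    """Convert an integer .flc coordinate to millimetres."""
    return coord * unit * 25.4 / dpi

print([decode_cvc(c) for c in "zaA0Z"])      # [-26, -1, 1, 0, 26]
print(expand_pairing(0, 0, "|/\\|"))         # [(0, 0), (1, 1), (2, 1), (2, 2), (3, 3)]
print(round(flc_to_mm(50000, 0.001), 2))     # 4.23
```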