# Last edited on 2010-08-07 19:31:06 by stolfilocal

Dataset of text images for text segmentation and tracking.

The dataset consists of three parts:

  full/${TYPE}/${SOURCE}/${IMGNUM}.png
    Original full images containing multiple texts with various
    fonts, colors and sizes. They are either digital photos of
    objects with text on them, or scanned documents (no synthetic
    images).

  crop/${TYPE}/${SOURCE}/${IMGNUM}-${TXTNUM}.png
    Cropped sub-images of the above, each containing a single text
    snippet (from a single character to multiple lines) with
    homogeneous size, font, and colors. If the original text was
    imaged in perspective, it is broken into separate snippets, each
    spanning at most a {sqrt(2)} factor in text size. The text may be
    in any orientation and may follow curved baselines.

  cnat/${TYPE}/${SOURCE}/${IMGNUM}-${TXTNUM}.png
    Same as the files in "crop", but scaled down to the text's
    "natural scale" (where the stroke period is ~1.4 pixels). These
    images are also geometrically corrected to remove any distortions
    due to perspective, character rotation, and curved surfaces or
    baselines.

In these file names, the subdirectory ${TYPE} can be:

  orig -- original color or grayscale image (photo or scan).
  true -- ground-truth image.
  mask -- binary mask (only in the "crop" and "cnat" versions).

Each subdirectory ${SOURCE} gathers images obtained from similar
source(s) and/or processed in the same way; e.g. "dibc2009" for the
DIBCO 2009 Challenge Images, "gg100806" for images obtained through
Google search on 2010-08-06, and so on.

The ${IMGNUM} and ${TXTNUM} are three-digit, zero-padded decimal
numbers. The ${IMGNUM} identifies the full image within the ${SOURCE}
directory. The ${TXTNUM} identifies the text sub-image within a full
image. They need not be consecutive.

The ground-truth images, in "full/true", "crop/true", and "cnat/true",
use the encoding 0 = stroke, 1 = background. They may be grayscale
(antialiased).
The mask images are always bilevel, with value 1 over the strokes and
over a narrow strip of background (about one stroke width wide)
flanking them, and 0 elsewhere.
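
The naming scheme above can be checked mechanically. The following is
a minimal Python sketch, not part of the dataset itself: the names
DatasetPath, parse_dataset_path and format_dataset_path are
hypothetical, and the code assumes that paths are given relative to
the dataset root with "/" as the separator. It also rejects
"full/mask" paths, since masks are described only for the "crop" and
"cnat" versions.

```python
import re
from typing import NamedTuple, Optional


class DatasetPath(NamedTuple):
    """Components of a dataset file name (hypothetical helper type)."""
    part: str               # "full", "crop", or "cnat"
    kind: str               # the ${TYPE} field: "orig", "true", or "mask"
    source: str             # the ${SOURCE} field, e.g. "dibc2009"
    imgnum: int             # the ${IMGNUM} field
    txtnum: Optional[int]   # the ${TXTNUM} field; None for "full" images


# Three-digit, zero-padded numbers; "-${TXTNUM}" is optional at this stage.
_PATH_RE = re.compile(
    r"(full|crop|cnat)/(orig|true|mask)/([^/]+)/(\d{3})(?:-(\d{3}))?\.png"
)


def parse_dataset_path(path: str) -> DatasetPath:
    """Split a relative path into its fields, or raise ValueError."""
    m = _PATH_RE.fullmatch(path)
    if m is None:
        raise ValueError(f"not a dataset file name: {path!r}")
    part, kind, source, imgnum, txtnum = m.groups()
    if part == "full":
        # Full images carry no ${TXTNUM} and have no mask version.
        if txtnum is not None:
            raise ValueError(f"unexpected text number in {path!r}")
        if kind == "mask":
            raise ValueError(f"masks exist only under crop/ and cnat/: {path!r}")
    elif txtnum is None:
        # Cropped images must identify the text snippet.
        raise ValueError(f"missing text number in {path!r}")
    return DatasetPath(
        part, kind, source, int(imgnum),
        None if txtnum is None else int(txtnum),
    )


def format_dataset_path(p: DatasetPath) -> str:
    """Rebuild the relative path from its fields (inverse of parsing)."""
    stem = f"{p.imgnum:03d}"
    if p.txtnum is not None:
        stem += f"-{p.txtnum:03d}"
    return f"{p.part}/{p.kind}/{p.source}/{stem}.png"
```

For instance, parsing the made-up name "crop/mask/dibc2009/003-012.png"
yields part "crop", kind "mask", source "dibc2009", image number 3 and
text number 12, and formatting that result gives back the same string.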