You must specify the MIL OCR font to read/verify the character strings in target images. MIL uses fonts (or typesets) to specify the style and size of characters in the images to be read or verified.
An OCR font contains the following information:
The grayscale representations of the characters.
Codes identifying each character (ASCII codes for the characters).
The number of characters in the OCR font.
Character dimensions.
The above information can be calibrated (with MocrCalibrateFont() or MocrControl()), modified (with MocrModifyFont()), and saved (with MocrSaveFont()) for later restoration.
Unless using one of the three provided predefined OCR fonts, you must define a custom OCR font. You can create a user-defined OCR font from scratch, or make minor modifications to an existing OCR font that is already saved to disk.
To create a custom font:
Allocate an OCR font context, using MocrAllocFont().
When allocating an OCR font context, you must specify the maximum number of characters that can be stored in the font, and the dimensions of the font's character representations and their character cells.
Grab/create the grayscale character representations of the font in a MIL image buffer and then copy them from the image buffer to the OCR font context, using MocrCopyFont(). Alternatively, you can import grayscale character representations from a text file or an image file (for example, a TIFF) into an OCR font context using MocrImportFont().
You can use the MIL OCR Reader utility or an interactive graphics tool, such as Matrox Inspector, to create new characters to add to an existing font.
When importing, or copying, character representations to the OCR font context, the OCR font context must have sufficient space to hold the representations of all the specified characters. You can use MocrInquire() to determine the maximum number of characters that can be stored in the font and the size of each character.
Once copied or imported into an OCR font context, the entire OCR font context can be saved on disk using MocrSaveFont(), and then later restored using MocrRestoreFont().
The following is an example of a character in its MIL OCR font character cell and the dimensions that you will have to specify during OCR font context allocation. Values are to be specified in pixels. Each square in the grid represents one pixel.
The parameters CharCellSizeX, CharCellSizeY, CharOffsetX, CharOffsetY, CharSizeX, and CharSizeY of MocrAllocFont() must comply with the following restrictions:
2 * CharOffsetX + CharSizeX <= CharCellSizeX.
2 * CharOffsetY + CharSizeY <= CharCellSizeY.
CharCellSizeY, CharCellSizeX, CharSizeY, CharSizeX must be >= 6 pixels and <= 256 pixels.
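The restrictions above can be checked before calling MocrAllocFont(). The following is a minimal sketch in plain C that does not call MIL. The FontGeometry structure and the FontGeometryIsValid() function are hypothetical helpers, not part of the MIL API, and the requirement that offsets be non-negative is an assumption added for the sketch.

```c
#include <stdbool.h>

/* Hypothetical helper type (not part of MIL): the six geometry values
   passed to MocrAllocFont(), all expressed in pixels. */
typedef struct {
    int CharCellSizeX, CharCellSizeY;
    int CharOffsetX, CharOffsetY;
    int CharSizeX, CharSizeY;
} FontGeometry;

/* Cell and character sizes must each lie between 6 and 256 pixels. */
static bool SizeInRange(int Value)
{
    return Value >= 6 && Value <= 256;
}

/* Returns true when the geometry satisfies the documented restrictions. */
bool FontGeometryIsValid(const FontGeometry *Geometry)
{
    if (!SizeInRange(Geometry->CharCellSizeX) || !SizeInRange(Geometry->CharCellSizeY) ||
        !SizeInRange(Geometry->CharSizeX) || !SizeInRange(Geometry->CharSizeY))
        return false;

    /* Assumption made for this sketch: offsets cannot be negative. */
    if (Geometry->CharOffsetX < 0 || Geometry->CharOffsetY < 0)
        return false;

    /* The character plus twice its offset must fit in the character cell. */
    return 2 * Geometry->CharOffsetX + Geometry->CharSizeX <= Geometry->CharCellSizeX &&
           2 * Geometry->CharOffsetY + Geometry->CharSizeY <= Geometry->CharCellSizeY;
}
```

For example, a 21x33 cell with 3-pixel offsets and a 15x27 character passes this check, because 2 * 3 + 15 = 21 and 2 * 3 + 27 = 33.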
When the characters in a font do not have a uniform width or height, CharSizeX should specify the width of the widest character in the font and CharSizeY should specify the height of the tallest character in the font. Also, to be able to search for a string over a range of angles, the font context's character size must be greater than 16x16. You can use the MIL OCRReader utility, or an interactive graphics tool, such as Matrox Inspector, to determine font character widths and heights.
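As an illustration of choosing CharSizeX and CharSizeY for a font with non-uniform characters, the following plain C sketch takes measured character sizes and returns the width of the widest and the height of the tallest. The GlyphSize type and both functions are hypothetical helpers, not part of MIL. The angle-range check assumes that "greater than 16x16" means both dimensions strictly exceed 16 pixels.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type (not part of MIL): one character's measured size, in pixels. */
typedef struct {
    int Width;
    int Height;
} GlyphSize;

/* Returns the widest width and the tallest height found among the glyphs.
   These are the values to use for CharSizeX and CharSizeY. */
GlyphSize FontCharSize(const GlyphSize *Glyphs, size_t Count)
{
    GlyphSize Result = {0, 0};
    for (size_t i = 0; i < Count; ++i) {
        if (Glyphs[i].Width > Result.Width)
            Result.Width = Glyphs[i].Width;
        if (Glyphs[i].Height > Result.Height)
            Result.Height = Glyphs[i].Height;
    }
    return Result;
}

/* Assumption: "greater than 16x16" is read as both dimensions > 16 pixels. */
bool SupportsAngleRange(GlyphSize CharSize)
{
    return CharSize.Width > 16 && CharSize.Height > 16;
}
```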
When copying the character representations from an image buffer, or importing them from an image file, the characters must have the dimensions specified during OCR font context allocation.
The following is an example of how to create a user-defined MIL OCR font using a font definition image and MocrCopyFont(). In this case, the CharImageForFontDefinition.mim file contains the grayscale character representations. The image is loaded into an image buffer and then the character representations are copied to an allocated OCR font context using MocrCopyFont().
/* Define a font from an image using MocrCopyFont */
MIL_ID TheFontDefinitionImage;
MbufRestore(MIL_TEXT("CharImageForFontDefinition.mim"), MilSystem, &TheFontDefinitionImage);
MIL_ID TheFont;
/* Allocate an OCR font context */
MocrAllocFont(MilSystem, M_DEFAULT, 6, 21, 33, 3, 3, 15, 27, 3, 6, M_FOREGROUND_BLACK, &TheFont);
/* Copy font characters from the image to the font context */
MocrCopyFont(TheFontDefinitionImage, TheFont, M_COPY_TO_FONT, MIL_TEXT("ABC123"));
/* Save the OCR font context to disk */
MocrSaveFont(MIL_TEXT("TheFontMocrCopyFont.mfo"), M_SAVE, TheFont);
MocrFree(TheFont);
MbufFree(TheFontDefinitionImage);
The following is an example of how to create a user-defined MIL OCR font directly from a font definition image file, using MocrImportFont(). In this case, the CharImageForFontDefinition.mim file contains the grayscale character representations, which are imported into an allocated OCR font context using MocrImportFont().
/* Define a font directly from an image file using MocrImportFont */
MIL_ID TheFont;
/* Allocate an OCR font context */
MocrAllocFont(MilSystem, M_DEFAULT, 6, 21, 33, 3, 3, 15, 27, 3, 6, M_FOREGROUND_BLACK, &TheFont);
/* Import character representations from the font definition image */
MocrImportFont(MIL_TEXT("CharImageForFontDefinition.mim"), M_FONT_MIL, M_LOAD_CHARACTER, MIL_TEXT("ABC123"), TheFont);
/* Save the OCR font context to disk */
MocrSaveFont(MIL_TEXT("TheFontMocrImportFont.mfo"), M_SAVE, TheFont);
MocrFree(TheFont);
When importing the character representations from an ASCII file, font character representations must be presented as follows:
Note that in this format, 'pixels' are delimited by a blank space. So '00' counts as one pixel.
This information breaks down into the following:
Row | Description |
01 | Specifies ASCII file format. |
02 | Blank row. |
03 | Specifies the start of a new character representation and its associated (generally ASCII) character. |
04 to 36 | Specifies the alpha-numerical representation of the character. |
37 | Blank row. |
38 | Specifies the start of a new character representation and its associated (generally ASCII) character. |
39 to 71 | Specifies the alpha-numerical representation of the character. |
etc. | This pattern is repeated for every character in the font. |
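The repeating row pattern described above can be expressed as simple arithmetic. The following plain C sketch computes which rows of the ASCII file belong to a given character, and counts the space-delimited pixels in one row. Both functions and the CharRows type are hypothetical helpers, not part of MIL. The sketch assumes that every character uses the same number of rows (33 in the layout above) and that each character block is followed by exactly one blank row.

```c
#include <ctype.h>

/* Hypothetical helper type (not part of MIL): 1-based row numbers in the ASCII file. */
typedef struct {
    int HeaderRow;    /* Row that starts the character representation. */
    int FirstDataRow; /* First row of the representation itself. */
    int LastDataRow;  /* Last row of the representation itself. */
} CharRows;

/* Returns the rows used by the character at CharIndex (0-based), assuming
   RowsPerChar rows per representation and one blank row between characters. */
CharRows AsciiFontCharRows(int CharIndex, int RowsPerChar)
{
    const int FirstHeaderRow = 3;       /* Rows 1 and 2 hold the format line and a blank row. */
    const int Stride = RowsPerChar + 2; /* Header row + representation rows + blank row. */
    CharRows Rows;
    Rows.HeaderRow = FirstHeaderRow + CharIndex * Stride;
    Rows.FirstDataRow = Rows.HeaderRow + 1;
    Rows.LastDataRow = Rows.HeaderRow + RowsPerChar;
    return Rows;
}

/* Counts the pixels in one row; pixels are tokens separated by blank space,
   so "00" counts as one pixel. The token values are not interpreted. */
int CountPixelsInRow(const char *Row)
{
    int Count = 0;
    int InToken = 0;
    for (; *Row != '\0'; ++Row) {
        if (isspace((unsigned char)*Row)) {
            InToken = 0;
        } else if (!InToken) {
            InToken = 1;
            ++Count;
        }
    }
    return Count;
}
```

With 33 rows per character, the second character (index 1) starts at row 38 and its representation spans rows 39 to 71, which matches the breakdown above.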
The following is an example of how to create a user-defined MIL OCR font using character representations from an ASCII file, using MocrImportFont(). In this case, the AsciiFileForFontDefinition.txt file contains the ASCII character representations, in the format above. The character representations are imported into an allocated OCR font context using MocrImportFont().
/* Define a font from an ASCII file using MocrImportFont */
MIL_ID TheFont;
/* Allocate an OCR font context */
MocrAllocFont(MilSystem, M_DEFAULT, 6, 21, 33, 3, 3, 15, 27, 3, 6, M_FOREGROUND_BLACK, &TheFont);
/* Import character representations from the font definition ASCII file */
MocrImportFont(MIL_TEXT("AsciiFileForFontDefinition.txt"), M_FONT_ASCII, M_LOAD_CHARACTER, M_NULL, TheFont);
/* Save the OCR font context to disk */
MocrSaveFont(MIL_TEXT("TheFontMocrImportFromASCIIFile.mfo"), M_SAVE, TheFont);
MocrFree(TheFont);
Once created, a MIL OCR font can be saved and restored as needed. Restoring a font (using MocrRestoreFont()) rather than creating it from scratch saves time, especially if the restored font requires no further modifications. Note that MocrRestoreFont() restores the entire OCR font context. MIL comes with three predefined SEMI fonts; for more information, see the next subsection.
MIL OCR comes with two ISO-compatible SEMI fonts (M_SEMI_M12_92 and M_SEMI_M13_88) and one generic SEMI font that has no constraints and no checksum (SEMI.mfo). These can be used directly or modified to suit your needs.
To use a SEMI font directly, restore it using MocrRestoreFont() with the FileName parameter set to "SEMI_M12-92.mfo", "SEMI_M13-88.mfo", or "SEMI.mfo". These files are located in the "\contexts\" directory under the MIL installation folder. Once restored, the font can be modified using the MIL OCR functions.
To create a new font based on a SEMI font:
Create an OCR font context using MocrAllocFont() with:
The FontType parameter set to either M_SEMI_M12_92 or M_SEMI_M13_88.
The StringLength parameter set to 12 when using M_SEMI_M12_92 and 18 when using M_SEMI_M13_88.
The CharNumber parameter set to 38. This allows for capital letters (A-Z), digits (0-9), hyphen (-), and period (.).
Use either MocrCopyFont() or MocrImportFont() to add character representations from an existing SEMI font.
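To illustrate why CharNumber is set to 38, the following plain C sketch builds the character set listed above (A-Z, 0-9, hyphen, and period) and returns its length. BuildSemiCharList() is a hypothetical helper, not part of MIL. Whether the resulting string suits your call to MocrCopyFont() or MocrImportFont() depends on the order of the characters in your source, so treat the ordering here as an assumption.

```c
#include <stddef.h>

/* Number of characters in the set: 26 letters + 10 digits + '-' + '.'. */
#define SEMI_CHAR_COUNT 38

/* Hypothetical helper (not part of MIL): writes the character set and a
   terminating null into Out. Returns the number of characters written,
   or 0 if Out is too small. Assumes an ASCII-compatible execution
   character set, where letters and digits are contiguous. */
size_t BuildSemiCharList(char *Out, size_t OutSize)
{
    size_t Count = 0;

    if (OutSize < SEMI_CHAR_COUNT + 1)
        return 0;

    for (char c = 'A'; c <= 'Z'; ++c)
        Out[Count++] = c;
    for (char c = '0'; c <= '9'; ++c)
        Out[Count++] = c;
    Out[Count++] = '-';
    Out[Count++] = '.';
    Out[Count] = '\0';

    return Count;
}
```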
Using high-quality character representations will produce the best results. OCR processing relies on using the cleanest font characters possible.
When using an M_GENERAL OCR font context, broken characters and spaces, even if expected in the target string, should not be defined in the font. Instead, you should enable the ability to read broken characters using MocrControl() with M_BROKEN_CHAR, and/or enable the ability to read spaces using MocrControl() with M_BLANK_CHARACTERS.
When using an M_GENERAL font context type, the characters must be clearly distinguishable from the background, so that the threshold used to separate them (binarization) preserves the shape of the characters.
If the characters in the target image are brighter than the background (for example, white on black), then the character representations included in your font must also be of characters that are brighter than the background. The foreground is specified at context allocation time (MocrAllocFont()) and can be changed later using MocrModifyFont() with M_INVERT. This changes both the character representations and the setting specified at allocation time.
If the size of the character representations in the font is not the same as those in the target string, you can calibrate the font (discussed later). Alternatively, when the physical size of the character representations of the OCR font differs from those in the target image, changing the size of the character representations of the OCR font could improve the robustness of the search. To change the size, use MocrModifyFont() with M_RESIZE. Changing the size permanently in the OCR font can be faster than resizing the font before each read/verify operation, as is done when the font is calibrated.
It might be necessary, at some point during application development, to display the character representations of your MIL OCR font. To do so, use MocrCopyFont() to copy the character representations to a displayable image buffer.
To remove a character from the OCR font, use MocrControl() with M_CHAR_ERASE and specify the ASCII code associated with the character representation to remove. An OCR font can contain a limited number of characters; this number is set during OCR font context allocation. Removing unused or erroneously added characters is the easiest way to ensure that these characters will not be used when looking for matches in the target string and that there is space for new characters to be added.