Training: CNN



Training a CNN requires allocating one of the predefined CNN classifier contexts defined by Matrox, modifying the training mode controls if necessary, and then calling MclassTrain().

Training mode controls are specified for the training context; the actual CNN (the classifier context) is not controlled in this way. That is, the predefined CNN classifier context gets trained, and you are using the training context to control that training.

Note that, during the training process, you can observe the evolution of the classifier to see if it is progressing as expected. Usually, after training, you would analyze your results, and train again to make improvements, until you consider your classifier properly trained. For more information, see the Training: analyze and adjust section later in this chapter.

By default, MIL trains using the best available GPU, whenever possible. Although GPU training is highly recommended, you can train using the CPU. To specify the engine with which to train (GPU or CPU), call MclassControl() with M_CNN_TRAIN_ENGINE. For more information about how to configure your training set-up to increase efficiency and reduce the likelihood of any issues arising, see the Requirements, recommendations, and troubleshooting section later in this chapter.
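For illustration, the following minimal sketch outlines how these calls might fit together: allocating a predefined CNN classifier context and a CNN training context, selecting the CPU training engine, and launching the training. The argument placements, the classifier's position in the MclassTrain() call, and the constants M_TRAIN_CNN, M_TRAIN_CNN_RESULT, and M_CPU are assumptions to verify against the MIL reference; MilSystem, TrainDataset, and DevDataset are assumed to be allocated elsewhere.

  MIL_ID ClassCtx = M_NULL;   // Predefined CNN classifier context (gets trained).
  MIL_ID TrainCtx = M_NULL;   // Training context (controls the training).
  MIL_ID TrainRes = M_NULL;   // Training result.

  // Allocate a predefined CNN; here, the medium FCNet (assumed placement of M_FCNET_M).
  MclassAlloc(MilSystem, M_CLASSIFIER_CNN_PREDEFINED, M_FCNET_M, &ClassCtx);

  // Allocate a CNN training context and a training result (assumed constants).
  MclassAlloc(MilSystem, M_TRAIN_CNN, M_DEFAULT, &TrainCtx);
  MclassAllocResult(MilSystem, M_TRAIN_CNN_RESULT, M_DEFAULT, &TrainRes);

  // Train with the CPU instead of the default (the best available GPU), if required.
  MclassControl(TrainCtx, M_CONTEXT, M_CNN_TRAIN_ENGINE, M_CPU);

  // Train the classifier context using labeled training and development datasets.
  MclassTrain(TrainCtx, ClassCtx, TrainDataset, DevDataset, TrainRes, M_DEFAULT);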

Predefined CNN classifiers to train

Matrox provides different predefined CNN classifiers to address different problems. The classifiers vary from those designed for simple cases to those designed to learn highly-complex problems.

Different aspects of the problem need to be considered when selecting (and training) the right network: the amount of available data, the required speed, and the problem's complexity all point to a specific architecture and training mode.

The predefined CNN classifier contexts that you can allocate (and that must be trained) are divided into categories of small, medium, and extra large. The main difference between these is the image sizes that they are designed to handle (their receptive field); more complicated problems usually require CNNs that handle larger images (these generally take longer to train).

When allocating a predefined CNN (M_CLASSIFIER_CNN_PREDEFINED), you must choose a specific one by specifying M_FCNET_.... The term FCNet refers to a fully convolutional network, and the suffix indicates the size; for example, M_FCNET_M refers to a medium sized classifier (that is, a medium sized network for problems of medium difficulty).

When you first use them, predefined classifiers usually support a complete training mode, though not necessarily the other modes (transfer learning and fine tuning). The mode generally refers to the process by which MIL trains a predefined CNN classifier and the extent to which it uses previously learned information. For more information, see the Training modes subsection of this section.

Note that the size of the training images (the classifier's receptive field) establishes the size of the classifier's source layer, which you can inquire using M_SIZE_X and M_SIZE_Y. For more information, see the CNN subsection of the Classifiers and how they work section earlier in this chapter.
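For example, a hedged sketch of inquiring the source layer's size might look as follows; the M_CONTEXT parameter and the M_TYPE_MIL_INT combination are assumptions, so see the MclassInquire() reference for the exact layer addressing:

  MIL_INT SizeX = 0, SizeY = 0;

  // Inquire the size of the classifier's source layer (its receptive field).
  MclassInquire(ClassCtx, M_CONTEXT, M_SIZE_X + M_TYPE_MIL_INT, &SizeX);
  MclassInquire(ClassCtx, M_CONTEXT, M_SIZE_Y + M_TYPE_MIL_INT, &SizeY);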

FCNet - small

The small FCNet (M_FCNET_S) is a compact and fast CNN that is designed for small images, with a minimum image size (receptive field) of 43 pixels and a step size of 4 pixels. The capacity of the network is low, so it might not be suitable for highly complex problems. M_FCNET_S is for a complete training. You can use grayscale (1-band) or color (3-band) images with this classifier.

FCNet - medium

The medium FCNet (M_FCNET_M) is designed to solve the majority of problems. Its minimum image size (receptive field) is 83 pixels and its step size is 8 pixels. The network's capacity is high enough to cover many problems with a fairly fast prediction speed. M_FCNET_M is for a complete training. You can use grayscale (1-band) or color (3-band) images with this classifier.

FCNet - extra large

The extra large FCNets for monochrome and color images (M_FCNET_MONO_XL and M_FCNET_COLOR_XL) are intended for transfer learning (to build on training done by Matrox). The general extra large FCNet (M_FCNET_XL) is for a complete training.

These extra large classifiers have a minimum image size (receptive field) of 195 pixels and a step size of 32 pixels. The capacity of these classifiers is high but prediction is slower than the other classifiers.

With M_FCNET_MONO_XL, you must use grayscale (1-band) images. With M_FCNET_COLOR_XL, you must use color (3-band) images. With M_FCNET_XL, you can use either grayscale or color images.

Matrox has partly pretrained the M_FCNET_MONO_XL and M_FCNET_COLOR_XL classifiers to properly extract the generally important features suitable for most applications. You must typically continue the training process, with a transfer learning mode and your own labeled images, to complete the training for your specific needs.

Selecting a predefined CNN

Choosing the proper predefined CNN typically depends on your training scenario. The following are some guidelines for making this choice.

  1. The medium FCNet (M_FCNET_M) is typically the most appropriate classifier in the majority of cases, provided you have enough training images. If your images are smaller than the classifier's minimum image size (receptive field), you can resize them. You can also down-sample (shrink) your images to help reduce memory requirements and decrease prediction time.

    Down-sampling is not recommended if it noticeably affects important features; for example, if it causes a defect to disappear or become faint.

  2. If the input images are small, from 43 to 99 pixels, or when prediction speed is a critical factor, you can use the small FCNet (M_FCNET_S). This classifier has a smaller capacity, runs faster, and supports images as small as 43 pixels (if necessary, you can increase the size of smaller images).

  3. If collecting data in quantity is difficult, you can perform a transfer learning type of training. In this case, you should use the extra large FCNets that are intended for transfer learning (M_FCNET_MONO_XL and M_FCNET_COLOR_XL).

In general, you should use the medium FCNet as the starting point for your training. If the resulting prediction speed is not fast enough and you can make the input image size (training images) compatible with the minimum image size (receptive field) of the small FCNet, then you can try that classifier instead. If your training results are unsuccessful due to insufficient data (not enough images) or the complexity of the problem, you should try the extra large FCNets.

Input image sizes

Different predefined CNN classifiers accept different input image sizes (for training and also for prediction). The possible sizes, independent in X and Y, are equal to the minimum image size required by the classifier (also known as the classifier's receptive field) plus an integer multiple of the step size. Specifically, the following formulas establish the valid image sizes in X and in Y, where k and j are integers greater than or equal to 0 that represent the increments at which you can increase the image size:

  SizeX = MinSizeX + (k * StepSize)
  SizeY = MinSizeY + (j * StepSize)

You can retrieve the classifier's minimum image size and step size values using MclassInquire().

The following are examples of different image sizes you can use, given that the classifier's minimum image size in X and in Y is 43 pixels, and the step size is 4 pixels:

Min size   +   k   *   Step   =   Valid size

   43      +   0   *    4     =       43
   43      +   1   *    4     =       47
   43      +   2   *    4     =       51
   43      +   3   *    4     =       55
   43      +   4   *    4     =       59

Theoretically, if you adhere to the given image size pattern, there is no maximum image size, although available memory will at some point limit it.
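The valid-size rule is straightforward to apply in code. The following self-contained C++ helpers (the function names are illustrative, not part of MIL) check whether a dimension is valid for a given classifier, and round an arbitrary dimension up to the next valid size:

  #include <cstdio>

  // True if Size equals MinSize + k * StepSize for some integer k >= 0.
  bool IsValidDimension(int Size, int MinSize, int StepSize)
  {
     return (Size >= MinSize) && (((Size - MinSize) % StepSize) == 0);
  }

  // Round Size up to the nearest valid dimension.
  int NextValidDimension(int Size, int MinSize, int StepSize)
  {
     if (Size <= MinSize)
        return MinSize;
     int Remainder = (Size - MinSize) % StepSize;
     return (Remainder == 0) ? Size : Size + (StepSize - Remainder);
  }

  int main()
  {
     // Small FCNet values: minimum image size 43, step size 4.
     printf("%d\n", IsValidDimension(51, 43, 4));    // Prints 1 (51 = 43 + 2 * 4).
     printf("%d\n", NextValidDimension(50, 43, 4));  // Prints 51.
     return 0;
  }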

Training modes

Depending on the problem definition, different training modes exist. MIL offers the following modes:

  • Complete (the default).

    • Typically, this is for completely restarting the training of a CNN classifier context, or for training a CNN classifier context that is not trained.

  • Transfer learning.

    • Typically, this is for a CNN classifier context that was already trained on a specific classification problem, and that you must train on a similar (but new) problem.

  • Fine tuning.

    • Typically, this is for a CNN classifier context that was already trained on a specific classification problem, and that you must fine tune with new data.

These modes differ in their input and output properties as well as in their initialization. To set these training modes, call MclassControl() with M_RESET_TRAINING_VALUES. When you specify a training mode, MIL resets all related training mode values accordingly. However, you can also change these values explicitly. For more information, see the Training mode controls subsection of this section.

The default values for these training settings should be sufficient for typical cases. To modify them, call MclassControl() with the training context. You can either modify the individual settings (for example, M_INITIAL_LEARNING_RATE), or you can reset them all based on the type of training you want to do (M_RESET_TRAINING_VALUES).
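For example, a hedged sketch of selecting transfer learning and then overriding one of the resulting defaults; the M_CONTEXT parameter and the argument order are assumptions to verify against the MclassControl() reference:

  // Reset all training mode values for a transfer learning type of training.
  MclassControl(TrainCtx, M_CONTEXT, M_RESET_TRAINING_VALUES, M_TRANSFER_LEARNING);

  // Optionally override an individual hyperparameter afterward.
  MclassControl(TrainCtx, M_CONTEXT, M_INITIAL_LEARNING_RATE, 0.001);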

Complete

In this mode (M_COMPLETE), a complete model training is performed. The classifier's source layer adapts to the size (and number of bands) of the input training images, the output layer adapts to the number of target classes, and all the network weights are randomly re-initialized. This is the preferred mode when you have access to a significant amount of training data. This is the default training mode.

Note that once a classifier requiring a complete training (for example, M_FCNET_M) has been trained, you can continue the training process using transfer learning or fine tuning. This requires copying the trained classifier result into a classifier context, using MclassCopyResult(), and retraining it with MclassTrain(), as shown in the sketch below. For more information, see the Steps to train subsection of the Training: in general section earlier in this chapter.
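A hedged sketch of that sequence follows; the copy type M_TRAINED_CLASSIFIER_CONTEXT and the argument placements are assumptions to verify against the MclassCopyResult() and MclassTrain() reference pages, and NewTrainDataset and TrainRes2 are assumed to hold the updated dataset and a new training result:

  // Copy the trained classifier from the training result into a classifier context.
  MclassCopyResult(TrainRes, M_DEFAULT, ClassCtx, M_DEFAULT, M_TRAINED_CLASSIFIER_CONTEXT, M_DEFAULT);

  // Continue training that context, now as a fine tuning type of training.
  MclassControl(TrainCtx, M_CONTEXT, M_RESET_TRAINING_VALUES, M_FINE_TUNING);
  MclassTrain(TrainCtx, ClassCtx, NewTrainDataset, DevDataset, TrainRes2, M_DEFAULT);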

Transfer learning

This mode (M_TRANSFER_LEARNING) performs the transfer learning technique. As with complete training, the source layer adapts to the size of the input training images and the output layer adapts to the number of target classes. Note that the source layer adapts only to the size of the input training images; the number of bands does not change, so the input training images must have the same number of bands as the CNN classifier.

The weights of the classifier's feature extraction layers, however, are those of the pretrained classifier. During a transfer learning type of training, only the classification layers are trained to classify the images. You should use this mode when the quantity of training data is limited. For more information about the classifier's internal layers, see the Classifiers and how they work section earlier in this chapter.

Fine tuning

This mode (M_FINE_TUNING) is used to fine-tune a classifier (model). Fine tuning is used to improve a previously trained model with the addition of new training data.

In this mode, the input and output layers, as well as all the network weights, are those of the previously trained model. As a consequence, the number of target classes, as well as the size and bands of the input images, cannot be changed in this mode.

Typical reasons to fine-tune are:

  • You captured more data and updated the training dataset for all or some specific classes.

  • You must now account for new imaging conditions, such as changes in illumination or camera perspective.

Summary and comparison

The following summarizes and compares the training modes.

Complete

  • Input images: adapts the size and bands of the input images.
  • Feature extraction layers: resets the internal parameters (weights).
  • Classification layers: resets the internal parameters (weights).
  • Output layers: adapts the number of target classes.
  • Usage: a new application with lots of training data.

Transfer learning

  • Input images: adapts the size of the input images.
  • Feature extraction layers: starts from the existing internal parameters (weights).
  • Classification layers: resets the internal parameters (weights).
  • Output layers: adapts the number of target classes.
  • Usage: a new application with limited training data.

Fine tuning

  • Input images: uses the existing sizes.
  • Feature extraction layers: uses the existing internal parameters (weights).
  • Classification layers: uses the existing internal parameters (weights).
  • Output layers: uses the existing classes.
  • Usage: improving an existing application.

Training mode controls

When you specify a specific training mode, MIL automatically sets the related training mode controls to the required settings. You can also modify these controls yourself, to adjust the training process. The training mode controls let you adjust the:

  • Learning rate.

  • Maximum number of epochs.

  • Mini-batch size.

  • Schedule type.

Such training mode controls are also known as hyperparameters.

Learning rate

The learning rate controls the factor by which the optimizer updates the network weights after each iteration. The higher the learning rate, the bigger the steps the optimizer takes to adjust the network parameters, and vice versa.

Training begins with an initial learning rate (M_INITIAL_LEARNING_RATE), which then decreases after each epoch according to the decay value (M_LEARNING_RATE_DECAY). For example, if the decay is set to 0.2, the learning rate loses 20% of its value after each epoch.

At the beginning, a higher learning rate helps the network converge faster to a desirable state. After several epochs, as the learning rate decreases, the network's weights are updated more and more carefully.
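For example, taking the default complete training settings listed at the end of this section (initial learning rate 0.005, decay 0.1), and assuming the simple per-epoch multiplicative decay described above (the cyclical schedule adjusts this pattern), the learning rate evolves as follows:

  Epoch 1: 0.005
  Epoch 2: 0.005 x (1 - 0.1) = 0.0045
  Epoch 3: 0.0045 x (1 - 0.1) = 0.00405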

Maximum number of epochs

The maximum number of epochs (M_MAX_EPOCH) sets the maximum number of epochs to complete the training process. An epoch refers to one complete cycle through the dataset that the classifier must learn during training.

The more complex the application, the more epochs might be needed.

Mini-batch size

Datasets usually cannot fully reside in memory during training. Mini-batches are used to break down the dataset that each epoch cycles through. MIL loads the data and processes it one mini-batch after another for each epoch.

The mini-batch size (M_MINI_BATCH_SIZE) determines the number of images that are part of a mini-batch. The larger the mini-batch size, the faster the training process and, potentially, the more accurate the resulting network; however, more memory is required.

A larger mini-batch size tends to make the learning smoother and faster, and in general improves the network's accuracy, although, depending on the data, this is not systematically the case. The maximum mini-batch size is limited by the memory available for calculations. The mini-batch size typically varies from 32 to 512; sizes smaller than 32 are not recommended, as they typically perform poorly.

The mini-batch size plays a key role in the consumption of resources. It is recommended that you monitor memory usage (GPU or CPU) during training, for example with the operating system's performance monitor.

Schedule type

The schedule type (M_SCHEDULER_TYPE) sets the schedule with which to adjust the learning rate. You can either decay the learning rate on a cyclical schedule, or decay it as the internal parameters (weights) are updated.

When specifying a cyclical schedule (M_CYCLICAL_DECAY), you are expected to have a minimum number of mini-batches per epoch (for example, a dozen). For instance, with 500 training images and a mini-batch size of 32, each epoch consists of 16 mini-batches (the last one partially filled).

Comparing default training mode controls

The following table shows the default settings for the training mode controls and how the defaults differ depending on whether the training is complete, transfer learning, or fine tuning.

Control                    Complete          Transfer learning   Fine tuning

Initial learning rate      0.005             0.005               0.001
Learning rate decay        0.1               0.1                 0.1
Maximum number of epochs   60                60                  60
Mini-batch size            32                32                  32
Schedule type              M_CYCLICAL_DECAY  M_CYCLICAL_DECAY    M_DECAY

Each training mode control must be set to its default if you truly want to train in the corresponding mode. For example, the learning rate is adjusted on a cyclical schedule when you are performing a complete training. However, if you specify a complete training, and then change the schedule, you are no longer performing a true complete training.