imageclassification / ImageClassificationTrainer / 0.2.1

Overview

This algorithm implements an image-classifier training system that can learn from a small number of labeled images. It uses a residual network (ResNet) [1], one of the most successful architectures for image classification tasks. More specifically, using a transfer-learning approach, the algorithm fine-tunes a ResNet pre-trained on the ImageNet dataset [2] by adapting the final layer of the network to the problem of interest and running the backpropagation algorithm to re-adjust the weights of the whole network.

The output of the algorithm can be predictions on a test set, information on test classification performance, or a trained model, depending on the user's choices.

Applicable Scenarios and Problems

This algorithm can be used for any image classification task. One of our main goals with this algorithm is to make it possible for users to train neural networks for image classification with a very small number of examples. Further, models trained with this algorithm can be used on new images in the future by providing those models as input to our ResNetModelRunner algorithm.

Structure

This algorithm takes a ResNet [1], removes the last layer of the network, and replaces it with a fully-connected layer whose number of output units equals the number of classes in the training data. This is a classic example of transfer learning, where the information learned by pre-trained models is used to make the task of distinguishing among a new set of classes easier. As a result, users can train an accurate classifier with as few as 5-10 samples per class in some cases.
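As a rough sketch of what this layer replacement looks like in plain PyTorch/torchvision (our illustration, not the algorithm's actual source), consider:

import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_classes = 2  # e.g., cat vs. dog; the algorithm infers this from the training data

# Load a ResNet-50 pre-trained on ImageNet.
model = models.resnet50(pretrained=True)

# Replace the final fully-connected layer so the number of output
# units matches the number of classes in the training data.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Backpropagation then re-adjusts the weights of the whole network; an
# optimizer matching this algorithm's documented defaults might look like:
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)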

Data Format

There are two main ways of providing data to the training algorithm: (i) the single-folder option, in which all images are provided under a single folder, and (ii) the three-folder option, in which the standard train-validation-test protocol can be followed by providing a folder for each of these sets.

Single folder option

The first option involves providing a single folder with all images. In this case, the default behavior of the algorithm is to split the images in this folder into train, validation, and test sets. The exact number of images in each set depends on the split_ratio parameter. By default, this parameter is set to [0.7, 0.1, 0.2], meaning 70% of the images are used for training, 10% for validation, and the remaining 20% for testing. Data must be provided in a particular format, where the class name and filename are separated by a "-". For example, a sample belonging to the cat class may be named cat-25.jpg, cat-mycat.jpg, or cat-russian_blue.jpg. The algorithm does not need a separate list of classes; the classes are inferred from the names of the training images provided.

Below is a sample file hierarchy for the single-folder option.

/images-dir
|-- cat-image1.jpg
|-- cat-image2.jpg
|-- dog-image1.jpg
|-- dog-image2.jpg

Effectively, this option means that you can run a training task by providing only a set of images as input and get an idea of the training performance. This can be very useful when a user wants to see whether transfer learning with a deep-learning-based method can help at all: with the right naming, and depending on the number of training images, getting an idea of the classification result takes only minutes.
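To make the naming rule and the split_ratio behavior concrete, here is a minimal sketch of how classes can be inferred from filenames and a single folder divided into the three sets (our illustration; the algorithm's internals may differ):

import random

def infer_class(filename):
    # The part before the first "-" is the class name,
    # e.g. "cat-russian_blue.jpg" -> "cat".
    return filename.split("-", 1)[0]

def split_files(filenames, split_ratio=(0.7, 0.1, 0.2)):
    # Shuffle, then carve out train/validation/test sets according to split_ratio.
    files = list(filenames)
    random.shuffle(files)
    n_train = int(split_ratio[0] * len(files))
    n_val = int(split_ratio[1] * len(files))
    return files[:n_train], files[n_train:n_train + n_val], files[n_train + n_val:]

files = ["cat-image1.jpg", "cat-image2.jpg", "dog-image1.jpg", "dog-image2.jpg"]
classes = sorted({infer_class(f) for f in files})  # ["cat", "dog"]
train, val, test = split_files(files)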

Three folder option

In the three folder option, the algorithm expects three separate folders with train, validation and test data, respectively.

Below is a sample file hierarchy for a training task involving two classes (cat and dog), with two images per class in the training, validation, and test sets. Notice that the naming of the files after the "-" is completely up to the user; the only rule is that the part before the "-" must be the class name for that particular image.

/train-dir
|-- cat-image1.jpg
|-- cat-image2.jpg
|-- dog-image1.jpg
|-- dog-image2.jpg

/validation-dir
|-- cat-image1.jpg
|-- cat-image2.jpg
|-- dog-image1.jpg
|-- dog-image2.jpg

/test-dir
|-- cat-image1.jpg
|-- cat-image2.jpg
|-- dog-image1.jpg
|-- dog-image2.jpg

In both cases, the folders can be Algorithmia Data API collections, S3 collections, or Dropbox folders.

Important note on the cost

The cost of the algorithm depends highly on the training time, the number of images used for training, and the augmentation factor used.

The royalty for this algorithm is 50 credits. This is independent of the number of credits you will spend during computation. Since this algorithm trains a neural network on the platform, it is reasonable to expect that a significant portion of the credits will be spent on training time.

For example, when the training algorithm is executed without image augmentation, the user is charged the royalty, 1 credit for each second of training, and 1 credit for each second spent by the helper algorithms that move the files for training. For a training task involving 80 images, no augmentation, 5 epochs, and a batch size of 10, we found that the process takes anywhere from one to three minutes. This works out to roughly 230 credits at the upper end (50 for the royalty + up to 180 for training time), plus the cost of the Smart Image Downloader call.

With image augmentation, we need to factor in the cost of augmentation as well. Our algorithm uses the Image Augmentation algorithm on our platform to augment the provided set of images. For that algorithm, there is a roughly linear relationship between the number of samples used and the number of credits spent: you can expect a usage of about 2 credits per 10 samples, plus the time spent on computation. Referring to the example provided on the Image Augmentation algorithm page, for 100 images and an augmentation factor of 10, we found that roughly 80 credits are spent. The cost of this augmentation call is added to the total cost of the algorithm call. Also, since the Image Augmentation algorithm itself takes time to run, it adds to the overall runtime of the training algorithm, which further increases the cost.
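The pricing description above can be summarized as a back-of-the-envelope formula; the helper below reflects our reading of it (royalty + per-second compute + helper and augmentation costs) and is not an official pricing calculator:

def estimate_cost(train_seconds, helper_seconds=0, augmentation_credits=0, royalty=50):
    # 1 credit per second of training, 1 credit per second spent by helper
    # algorithms (e.g., file movers), plus the augmentation call and the
    # fixed 50-credit royalty.
    return royalty + train_seconds + helper_seconds + augmentation_credits

# 80 images, no augmentation, 5 epochs, batch size 10: one to three minutes
# of training, i.e., roughly 110-230 credits before helper costs.
print(estimate_cost(train_seconds=180))  # upper bound: 230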

Usage

Input

The input to this algorithm is highly flexible and depends on the use case. As in the previous section, we describe the required and optional arguments under the one-folder and three-folder scenarios.

One Folder Case

Required Arguments

  • (Required) A string denoting the folder where train images are located.

Optional Arguments

  • (Optional) A string denoting the folder where validation images are located.
  • (Optional) A string denoting the folder where test images are located.
  • (Optional) An integer denoting the number of training epochs. (Default: 10)
  • (Optional) An integer denoting the number of samples in each batch. (Default: 10)
  • (Optional) An augmentation factor denoting the number of augmented images to be generated for each input image. (Default: 1, i.e., no augmentation)
  • (Optional) An output folder where the augmented images will be written.
  • (Optional) A boolean flag denoting whether the output should be in verbose mode or normal mode. (Default: False)
  • (Optional) Learning rate used by the gradient-based optimizer. (Default: 1e-3)
  • (Optional) Momentum used by the gradient-based optimizer. (Default: 0.9)
  • (Optional) Weight decay used by the gradient-based optimizer. (Default: 5e-4)
  • (Optional) Depth of the ResNet fine-tuned by the algorithm. Possible values are 18, 34, 50, 101, and 152. (Default: 50)

Three Folder Case

Required Arguments

  • (Required) A string denoting the folder where train images are located.
  • (Required) A string denoting the folder where validation images are located.
  • (Required) A string denoting the folder where test images are located.

Optional Arguments

  • (Optional) An integer denoting the number of training epochs. (Default: 10)
  • (Optional) An integer denoting the number of samples in each batch. (Default: 10)
  • (Optional) An augmentation factor denoting the number of augmented images to be generated for each input image. (Default: 1, i.e., no augmentation)
  • (Optional) An output folder where the augmented images will be written.
  • (Optional) A boolean flag denoting whether the output should be in verbose mode or normal mode. (Default: False)
  • (Optional) Learning rate used by the gradient-based optimizer. (Default: 1e-3)
  • (Optional) Momentum used by the gradient-based optimizer. (Default: 0.9)
  • (Optional) Weight decay used by the gradient-based optimizer. (Default: 5e-4)
  • (Optional) Depth of the ResNet fine-tuned by the algorithm. Possible values are 18, 34, 50, 101, and 152. (Default: 50)

Output

In verbose mode

  • Predictions for the test data, paired with the images in the test set
  • Test classification accuracy
  • Time spent on training
  • A list of validation accuracies across epochs

In normal mode

Predictions for the test data, paired with the images in the test set. If a save_path is specified, the algorithm saves the trained model to that path for future use.

Examples

Example 1.

The most basic example for the image-classifier training algorithm is the case with a single folder. By default, the algorithm creates splits of 70%, 10%, and 20% for the training, validation, and test data, respectively. The splits can also be provided explicitly via the split_ratio parameter as a list of probabilities (non-negative real numbers summing to 1, e.g., [0.6, 0.2, 0.2]).

{
        "train_dir": "data://.my/train_path"
}

Output:

{
    test_predictions":
    [["cat-image_1.jpg", "cat"], ["dog-image_1.jpg", "dog"],
    ["boat-image_1.jpg", "yacht"], ["baseball-image_1.jpg", "baseball"],
    ...
    ...
    ...
    ]
}
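For reference, an input like the one above could be submitted through the Algorithmia Python client along the following lines; the API key and algorithm path are placeholders, and split_ratio is included only to illustrate the optional override described earlier:

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")  # placeholder API key
algo = client.algo("<publisher>/ImageClassificationTrainer/0.2.1")  # placeholder path

payload = {
    "train_dir": "data://.my/train_path",
    "split_ratio": [0.6, 0.2, 0.2]  # optional; defaults to [0.7, 0.1, 0.2]
}

result = algo.pipe(payload).result
print(result["test_predictions"][:5])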

Example 2.

Many standard machine learning workflows for image classification involve pre-defined train, validation, and test sets. Below is the simplest example of this:

{
        "train_dir": "data://.my/train_path",
        "val_dir": "data://.my/val_path",
        "test_dir": "data://.my/test_path"
}

Since num_epochs and batch_size are not provided in the above example, they are assigned their default values. However, this could result in exceeding the maximum runtime allowed by the platform, depending on the sizes of the training, validation, and test sets. Hence, among the first things we may want to adjust are num_epochs and batch_size, as follows:

{
        "train_dir": "data://.my/train_path", 
        "val_dir": "data://.my/val_path",
        "test_dir": "data://.my/test_path",
        "num_epochs": 5,
        "batch_size": 10
}

Output:

{
    test_predictions":
    [["cat-image_1.jpg", "cat"], ["dog-image_1.jpg", "dog"],
    ["boat-image_1.jpg", "yacht"], ["baseball-image_1.jpg", "baseball"],
    ...
    ...
    ...
    ]
}

On the other hand, if the verbose flag were set to true, the output would be as follows:

{
    "test_accuracy": 88.8,
    "test_predictions":
    [["cat-image_1.jpg", "cat"], ["dog-image_1.jpg", "dog"],
    ["boat-image_1.jpg", "yacht"], ["baseball-image_1.jpg", "baseball"],
    ...
    ...
    ...
    ],
    "train_time_elapsed": 163.98720526695251,
    "validation_accuracies": [0.444, 0.756, 0.81, 0.866, 0.894]
}

Example 3.

A more complete example involves providing values for the optional parameters as well.

{
        "train_dir": "data://.my/train_path",
        "val_dir": "data://.my/val_path",
        "test_dir": "data://.my/test_path",
        "num_epochs": 5,
        "batch_size": 10,
        "augmentation_factor": 1,
        "validation_threshold": 0,
        "save_path": "data://.my/saved_models/",
        "resnet_depth": 50,
        "verbose": True,
        "learning_rate": 1e-3,
        "momentum": 0.99,
        "weight_decay": 9e-5
    }

Output:

{
    "test_accuracy": 67.08333333333333,
    "test_predictions":
    [["cat-image_1.jpg", "cat"], ["dog-image_1.jpg", "dog"],
    ["boat-image_1.jpg", "yacht"], ["baseball-image_1.jpg", "baseball"],
    ...
    ...
    ...
    ],
    "train_time_elapsed": 35.159491777420044,
    "validation_accuracies": [0.20625, 0.26875, 0.30625, 0.48125, 0.6375]
}

Note on save_path

If the save_path parameter is provided, the algorithm saves the trained model under the name resnet-X.t7, where X is the resnet_depth parameter. The algorithm saves the state_dict of the model, which is an OrderedDict containing the model parameters (weights). This makes future use easier, as users can implement a ResNet in PyTorch themselves and initialize it with the trained network's weights. Our ResNetModelRunner algorithm makes this even easier: it takes a pre-trained model and a set of images as input and performs inference on the images.
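For users who want to reload the weights themselves rather than use ResNetModelRunner, the following sketch shows the idea (assuming the saved file has been downloaded locally; the local path and class count are illustrative):

import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # must match the number of classes used in training

# Rebuild the same architecture used for training (resnet_depth = 50 here),
# then load the saved state_dict (an OrderedDict of parameter tensors).
model = models.resnet50()
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.load_state_dict(torch.load("resnet-50.t7"))
model.eval()  # switch to inference mode before running predictions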

Attributions

The algorithm uses a ResNet [1] pre-trained on the ImageNet dataset. The ResNet implementation and the pre-trained weights are part of the torchvision library and are implemented in PyTorch.

The torchvision library is licensed under the BSD 3-Clause license (a copy of the license can be found at: https://github.com/pytorch/vision/blob/master/LICENSE).

The PyTorch library is licensed under the BSD 3-Clause license (a copy of the license can be found at: https://github.com/pytorch/pytorch/blob/master/LICENSE).

Parts of this algorithm have been adapted from the fine-tuning.pytorch library by Bumsoo Kim. This library is licensed under the MIT License (a copy of the license can be found at: https://github.com/meliketoy/fine-tuning.pytorch/blob/master/LICENSE).

References

[1] https://arxiv.org/abs/1512.03385

[2] http://www.image-net.org/