imageclassification / ResNetFeatureExtraction / 1.0.0

Overview

This algorithm extracts representations of arbitrary images from a pre-trained ResNet[1]. This is a standard feature extraction technique used in many vision applications.

Applicable Scenarios and Problems

Imagine you want to train an image classifier, but you want to use a linear model instead of a neural network. Linear models such as SVMs often fail to learn from raw image data. A common remedy is to create a feature vector for each image, which provides a low-dimensional and noise-resistant representation. One widely used feature extraction technique is to feed the image through a pre-trained neural network and use the activations of one of its intermediate layers as the image's representation. To this end, this algorithm extracts representations of the given images at the penultimate layer of ResNet models pre-trained on the ImageNet[2] dataset.
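
To make the idea concrete, the sketch below shows the general technique with PyTorch and torchvision: load a pre-trained ResNet, drop its final fully-connected layer, and treat the penultimate-layer activations as the feature vector. This is an illustration of the technique rather than this algorithm's actual code; the preprocessing values and the local file name are assumptions.

import torch
from torchvision import models, transforms
from PIL import Image

# Load a ResNet-50 pre-trained on ImageNet (depth 50 is this algorithm's default).
resnet = models.resnet50(pretrained=True)
resnet.eval()

# Drop the final fully-connected layer so the forward pass stops at the
# penultimate (global average pooling) layer, which yields the feature vector.
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])

# Standard ImageNet preprocessing; these are the usual torchvision values and
# may differ from what the hosted algorithm uses internally.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("image1.jpg").convert("RGB")   # hypothetical local file
batch = preprocess(image).unsqueeze(0)            # shape: (1, 3, 224, 224)

with torch.no_grad():
    features = feature_extractor(batch).flatten(1)  # shape: (1, 2048) for ResNet-50

print(features.shape)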

Usage

Input

  • (Required) image_path: a string denoting a folder (data collection) containing the images, or a list of individual image paths (an example call is sketched after this list).
  • (Optional) resnet_depth: depth of the ResNet used by the algorithm. Possible values are 18, 34, 50, 101 and 152. (Default: 50)
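
The data:// paths in the examples below suggest the algorithm is hosted on Algorithmia. If so, a call from Python might look like the sketch below; the API key, algorithm endpoint, and collection path are placeholders, not values confirmed by this README.

import Algorithmia

# Placeholder credentials and endpoint; adjust to your own account and data collection.
client = Algorithmia.client("YOUR_API_KEY")
algo = client.algo("imageclassification/ResNetFeatureExtraction/1.0.0")

result = algo.pipe({
    "image_path": "data://.my/training_data",  # folder of images
    "resnet_depth": 50                         # optional; defaults to 50
}).result

print(len(result), "feature vectors returned")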

Output

  • Feature vectors as a JSON list of objects, each containing the path of an image ("image_path") and its vector representation ("feature_vector").

Note on output size

The size of the output vector for each image depends on the depth of the network used. The table below shows the available depth options and their corresponding feature vector sizes.

| ResNet Depth | Output Vector Size |
| ------------ | ------------------ |
| 18           | 512                |
| 34           | 512                |
| 50           | 2048               |
| 101          | 2048               |
| 152          | 2048               |
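
These sizes are simply the widths of the penultimate layer of each ResNet variant. As an illustrative (not authoritative) way to confirm them with torchvision, one can inspect the input dimension of each model's final fully-connected layer:

from torchvision import models

# Constructors for each supported depth; pre-trained weights are not needed
# just to read the layer sizes.
constructors = {
    18: models.resnet18,
    34: models.resnet34,
    50: models.resnet50,
    101: models.resnet101,
    152: models.resnet152,
}

for depth, build in constructors.items():
    model = build(pretrained=False)
    # The penultimate-layer output feeds the final fc layer, so its width
    # equals model.fc.in_features (512 for ResNet-18/34, 2048 for 50/101/152).
    print(f"ResNet-{depth}: feature vector size = {model.fc.in_features}")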

Examples

Example 1

The following is a bare-minimum example, where image_path points to a folder (data collection) of images:

{
        "image_path": "data://.my/training_data"
}

The JSON output would be:

[
    {
        "image_path": "data://.my/training_data/image1.jpg",
        "feature_vector": [0.5157874226570129, 0.12963859736919403, 0.8657864928245544, 0.7377209663391113, ... ]
    },
    {
        "image_path": "data://.my/training_data/image2.jpg",
        "feature_vector": [0.11067578196525574, 0.5561771988868713, 0.9497724175453186, 0.6040754318237305, ... ]
    },
    {
        "image_path": "data://.my/training_data/image3.jpg",
        "feature_vector": [0.7888457775115967, 0.5351834297180176, 0.14679302275180817, 0.818615734577179, ... ]
    },
    {
        "image_path": "data://.my/training_data/image4.jpg",
        "feature_vector": [0.4363996684551239, 0.3034343123435974, 0.19017453491687775, 0.6557456851005554, ... ]
    }
]

Example 2

A more detailed example, which supplies a list of images together with the optional resnet_depth argument, is shown below:

{
        "image_path": ["data://.my/training_data/image1.jpg",
                       "data://.my/training_data/image2.jpg",
                       "data://.my/training_data/image3.jpg"],
        "resnet_depth": 152
}

As in the first example, the output will be a list of image-path and feature-vector pairs.
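
Once extracted, the feature vectors can be fed to a linear model, as motivated in the scenario above. The sketch below shows one possible way to turn the JSON output into a training set for a linear SVM with scikit-learn; the saved file name and the label rule are hypothetical.

import json
from sklearn.svm import LinearSVC

# Load the algorithm's JSON output, assumed saved locally as features.json.
with open("features.json") as f:
    records = json.load(f)

X = [r["feature_vector"] for r in records]   # one row per image
paths = [r["image_path"] for r in records]

# Hypothetical labels, e.g. derived from file names or a separate label file;
# at least two distinct classes are required to fit the classifier.
y = [0 if "cat" in p else 1 for p in paths]

clf = LinearSVC()
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))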

Attributions

The algorithm uses a ResNet[1] pre-trained on the ImageNet[2] dataset. The ResNet implementation and the pre-trained weights are part of the torchvision library and are implemented in PyTorch.

The torchvision library is licensed under the BSD 3-Clause license (a copy of the license document can be found at: https://github.com/pytorch/vision/blob/master/LICENSE).

The PyTorch library is licensed under the BSD 3-Clause license (a copy of the license document can be found at: https://github.com/pytorch/pytorch/blob/master/LICENSE).

Parts of this algorithm have been adapted from the fine-tuning.pytorch library by Bumsoo Kim. This library is licensed under the MIT License (a copy of the license document can be found at: https://github.com/meliketoy/fine-tuning.pytorch/blob/master/LICENSE).

[1] https://arxiv.org/pdf/1512.03385.pdf

[2] http://www.image-net.org/