ML Zoomcamp 2023 – Deep Learning – Part 4

In this part we’ll use a pre-trained convolutional neural network to understand what is in the image we loaded previously.

Pre-trained convolutional neural networks

This time we take an image and an off-the-shelf neural network that somebody else has already trained, so we can use it directly. We’ll use a model called “Xception” from Keras, which was trained on ImageNet; you can find more pre-trained models in Keras. Before defining the model we need a few imports.

from tensorflow.keras.applications.xception import Xception
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.applications.xception import decode_predictions

# weights="imagenet" means we want to use a network pre-trained on ImageNet

model = Xception(
    weights="imagenet",
    input_shape=(299, 299, 3)
)

Now we want to use this model to classify the image we loaded before. However, the model.predict function expects a batch of images, so we create an array that could hold multiple images; in this case it contains just one.

X = np.array([x])
X.shape

# Output: (1, 299, 299, 3)
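For reference, the batching step can be sketched end to end. This is only a stand-in: the real x came from the photo loaded in the previous part, so the dummy image written here (dummy.jpg) is a placeholder, not the actual input.

```python
import numpy as np
from PIL import Image

# create a dummy 299x299 RGB image on disk (placeholder for the real photo)
Image.new("RGB", (299, 299), (128, 64, 32)).save("dummy.jpg")

# load and resize to the input size Xception expects
img = Image.open("dummy.jpg").resize((299, 299))
x = np.array(img)          # shape: (299, 299, 3)

# model.predict expects a batch, so wrap the single image in one extra axis
X = np.array([x])
print(X.shape)             # (1, 299, 299, 3)
```

np.expand_dims(x, axis=0) or x[np.newaxis] would produce the same batch of one.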

Before predicting, we need some preprocessing: the model expects its inputs scaled in a specific way, which the preprocess_input function takes care of.

X = preprocess_input(X)
X[0]

# Output:
# array([[[ 0.4039216 ,  0.3411765 , -0.2235294 ],
#            [ 0.4039216 ,  0.3411765 , -0.2235294 ],
#            [ 0.41960788,  0.35686278, -0.20784312],
#            ...,
#            [ 0.96862745,  0.9843137 ,  0.94509804],
#            [ 0.96862745,  0.9843137 ,  0.94509804],
#            [ 0.96862745,  0.99215686,  0.9372549 ]],
#
#            [[ 0.47450984,  0.4039216 , -0.12156862],
#            [ 0.4666667 ,  0.39607847, -0.12941176],
#            [ 0.45882356,  0.38823533, -0.15294117],
#             ...,
#            [ 0.96862745,  0.9764706 ,  0.9372549 ],
#            [ 0.96862745,  0.9764706 ,  0.9372549 ],
#            [ 0.96862745,  0.9764706 ,  0.92941177]],
#
#            [[ 0.56078434,  0.48235297, -0.00392157],
#            [ 0.5686275 ,  0.4901961 ,  0.00392163],
#            [ 0.5686275 ,  0.49803925, -0.01176471],
#            ...,
#            [ 0.9607843 ,  0.96862745,  0.92156863],
#            [ 0.9607843 ,  0.96862745,  0.92156863],
#            [ 0.9607843 ,  0.96862745,  0.92156863]],
#
#            ...,
#            [ 0.3411765 ,  0.2313726 , -0.35686272],
#            ...,
#            [ 0.41960788,  0.04313731, -0.827451  ],
#            [ 0.4039216 ,  0.02745104, -0.84313726],
#            [ 0.427451  ,  0.05098045, -0.81960785]]], dtype=float32)
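That is why the values above are no longer in the 0–255 range: for Xception, preprocess_input scales pixel values from [0, 255] to [-1, 1]. A minimal sketch of that transformation (not the actual Keras implementation, which also handles other input modes):

```python
import numpy as np

def scale_like_xception(X):
    # Xception's preprocess_input maps [0, 255] pixel values to [-1, 1]
    return X / 127.5 - 1.0

X = np.array([[[[0.0, 127.5, 255.0]]]], dtype=np.float32)
print(scale_like_xception(X))  # [[[[-1.  0.  1.]]]]
```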

pred = model.predict(X)

# 1/1 [==============================] - 2s 2s/step
pred.shape

# (1, 1000)

The 1000 means there are 1000 different classes, and the 1 means we passed in one image.

pred

# Output:
# array([[3.23712389e-04, 1.57383955e-04, 2.13493346e-04, 1.52370616e-04,
#            2.47626507e-04, 3.05036228e-04, 3.20592342e-04, 1.47499406e-04,
#    ...
#            2.07101941e-04, 2.05870383e-04, 4.28847765e-04, 1.33218389e-04,
#            1.12896028e-04, 1.57900504e-04, 1.94431108e-04, 2.63790804e-04,
#            3.20827705e-04, 2.70084536e-04, 3.43746680e-04, 2.48680328e-04,
#            2.78319319e-04, 3.25885747e-04, 1.71753796e-04, 1.73037348e-04]],
#           dtype=float32)

Each value is the probability that the image belongs to a particular class. To make sense of this output, we need to know what the classes are. Therefore we use another function, decode_predictions, to make the prediction human-readable.

decode_predictions(pred)

# Output:
# [[('n03595614', 'jersey', 0.6819631),
#   ('n02916936', 'bulletproof_vest', 0.038140077),
#   ('n04370456', 'sweatshirt', 0.034324776),
#   ('n03710637', 'maillot', 0.011354236),
#   ('n04525038', 'velvet', 0.0018453619)]]
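Under the hood, decode_predictions simply ranks the probabilities and looks the top indices up in the ImageNet class table. With a plain NumPy array you can do the same ranking yourself; note the class names below are toy placeholders, not the real ImageNet mapping.

```python
import numpy as np

# a toy prediction over 5 classes (the real model outputs 1000)
pred = np.array([[0.02, 0.70, 0.05, 0.15, 0.08]], dtype=np.float32)
class_names = ["tench", "jersey", "sweatshirt", "maillot", "velvet"]  # placeholders

# indices of the top-3 probabilities, highest first
top3 = np.argsort(pred[0])[::-1][:3]
for i in top3:
    print(class_names[i], float(pred[0][i]))
```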

In reality this image shows a t-shirt, but ImageNet is not very good at detecting clothes, so this model doesn’t really work for our purpose here. That means we need to train a different model with the classes we need for our case. The good news is that we don’t have to train a model from scratch: we can reuse this one. In other words, we can build on top of what big companies and universities have provided and adapt it to our specific use case.
