Step 3: Enjoy the images for a while
And while you’re at it, take a look at the wonderful Instagram account where I found these cat fusions! There are a lot of images like these.
Step 4: Predict!
This section shows the code for generating predictions for the images, i.e. what the model thinks it is looking at when presented with these cat-fusion images.
Alright, let’s get down to business! Easy peasy lemon squeezy, we read our images into the shape that our net expects. For our VGG16, that would be tensors of shape (224, 224, 3). That means images that are 224 pixels wide and 224 pixels high, with 3 channels for the three colours in RGB.
import os
from collections import defaultdict

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Read paths into a list
dirr = 'Cat-animals'
imgs_paths = [os.path.join(dirr, path) for path in os.listdir(dirr)
              if path.endswith('jpg')]

# Read images and resize them (yes, I do love list comprehensions)
imgs = [image.img_to_array(image.load_img(path, target_size=(224, 224)))
        for path in imgs_paths]

# Normalize the colour channels
imgs = np.asarray([preprocess_input(img) for img in imgs])

# Print the final tensor shape = (20, 224, 224, 3)
print(imgs.shape)
Now we have our images saved in our imgs tensor. Let’s predict!
# Load the pre-trained VGG16 model with ImageNet weights
# (if you have not done so already)
model = VGG16(weights='imagenet')

preds = model.predict(imgs)
all_top_3_preds = {}

# Pick the top 3 predictions for each image
for img_pred in range(len(imgs)):
    top_3_preds = decode_predictions(
        np.expand_dims(preds[img_pred], axis=0), top=3)
    # Save the top 3 (class_id, class_name, score) tuples
    # along with the file name
    all_top_3_preds[imgs_paths[img_pred]] = top_3_preds[0]
Step 5: Evaluate
So, let us see how the VGG16 net classified these images (and remember that the percentages can be interpreted as how sure the model is that the prediction is correct):
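To put those results on screen, here is a minimal sketch that pretty-prints the top-3 predictions per image. It assumes a dictionary shaped like the all_top_3_preds from the previous step (file path mapped to (class_id, class_name, score) tuples); the class ids and scores in the example are made up for illustration.

```python
def format_top_preds(all_top_preds):
    """Return printable lines like 'img.jpg: hummingbird (43.0%), ...'."""
    lines = []
    for path, preds in all_top_preds.items():
        # Each entry is a (class_id, class_name, score) tuple;
        # the score is the softmax output, shown here as a percentage
        ranked = ', '.join(f'{name} ({score:.1%})' for _, name, score in preds)
        lines.append(f'{path}: {ranked}')
    return lines

# Hypothetical scores and class ids, just to show the output format
example = {'Cat-animals/cat-fish.jpg': [
    ('id0', 'hummingbird', 0.43),
    ('id1', 'bow_tie', 0.21),
    ('id2', 'electric_ray', 0.07)]}
for line in format_top_preds(example):
    print(line)
```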
My personal favourites are the black fish classified as Hummingbird and Bow Tie, and the giraffe that was classified as a Cheetah.
While it is very easy for us humans to see exactly what has happened in the images, it proved difficult for our dear VGG16 net. Not one of the top 3 predictions for any image contained “cat”, which actually surprised me! But, when thinking about it a bit more, it makes sense: the model takes a lot of information into consideration when classifying, and while it is easy for us to see “cat” and “squirrel” as two separate objects that have been fused, the model will try to make sense of these things together.
In fact, the model assumes that all the data that it sees is correct. It would never dream of us trying to fool it, and as such, it tries to make sense out of all the information that it has. For the model, the majority of the information in most of the pictures points to one animal, or at least one type of animal. The fact that these photoshopped animals are completely outside the realms of reality is not considered, although in some cases the predictions get a bit confused (such as the cat-giraffe, where the model sees a cat face and the giraffe pattern and draws the conclusion Cheetah).
An important consideration is that the model you have built only knows what it has previously seen. And that leads me to my point:
It is so important to keep track of data quality, not only when building your model, but also when using it. Just because you trained a model on clean data does not mean that it will perform well on dirty data afterwards. Continuous controls of data, model performance and model verification are of the utmost importance: we cannot have a model that predicts “hummingbird” and “bow tie” where the obvious answers are “fish” and possibly “cat”.
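One cheap control along these lines is to flag predictions whose top score is suspiciously low, so a human can take a second look. This is only a sketch: the function name, the 0.5 threshold and the example scores are all made up for illustration, and the input follows the (class_id, class_name, score) shape produced by decode_predictions.

```python
def flag_low_confidence(top_preds, threshold=0.5):
    """Return paths whose best prediction scores below `threshold`.

    `top_preds` maps a file path to (class_id, class_name, score)
    tuples sorted by score, highest first.
    """
    return [path for path, preds in top_preds.items()
            if preds[0][2] < threshold]

# Made-up scores for illustration
suspect = flag_low_confidence({
    'dog.jpg': [('id0', 'golden_retriever', 0.93)],
    'cat-fish.jpg': [('id1', 'hummingbird', 0.31)],
})
print(suspect)  # ['cat-fish.jpg']
```

A check like this will not catch every bad input, but it is a simple first line of defence before retraining or manual review.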
Step 6 (optional): Create heatmaps
One of the perks of working with CNNs is that we actually can visualize what the model has activated on when classifying the image. Let us look closer at that!
This section is very much inspired by Chollet’s notebook on heatmaps, which you can find at this link. This is mostly for fun, and the analysis above still stands.
Essentially, what happens in this code is that we look at the gradients of the convolutional layers in the model. These gradients give us a hint of how much the model looks at the different parts of an image when classifying. Intuitively, this can be thought of as asking “how important is this specific part of the image for the model’s prediction?”, where the colour red indicates high importance.
import cv2
from keras import backend as K

# We start by looking at the summary of the model. We do this partly
# as a reminder, but also to later get the names of the different
# layers
model.summary()

layer = 'block5_conv3'

# Let us loop through each of the cat images
for i, img_path in enumerate(imgs_paths):
    # This is the entry in the prediction vector for the winning class
    idx = np.argmax(preds[i])
    output = model.output[:, idx]

    # This is the output feature map of the `block5_conv3` layer,
    # the last convolutional layer in VGG16
    last_conv_layer = model.get_layer(layer)

    # This is the gradient of the winning class with regard to
    # the output feature map of `block5_conv3`
    grads = K.gradients(output, last_conv_layer.output)[0]

    # This is a vector of shape (512,), where each entry is the mean
    # intensity of the gradient over a specific feature map channel
    pooled_grads = K.mean(grads, axis=(0, 1, 2))

    # This function allows us to access the values of the quantities
    # we just defined: `pooled_grads` and the output feature map of
    # `block5_conv3`, given a sample image
    iterate = K.function([model.input],
                         [pooled_grads, last_conv_layer.output[0]])

    # These are the values of these two quantities, as Numpy arrays,
    # given our sample image
    x = np.expand_dims(imgs[i], axis=0)
    pooled_grads_value, conv_layer_output_value = iterate([x])

    # We multiply each channel in the feature map array by
    # "how important this channel is" with regard to the winning class
    for channel in range(pooled_grads_value.shape[0]):
        conv_layer_output_value[:, :, channel] *= pooled_grads_value[channel]

    # The channel-wise mean of the resulting feature map
    # is our heatmap of class activation
    heatmap = np.mean(conv_layer_output_value, axis=-1)
    heatmap = np.maximum(heatmap, 0)
    heatmap /= np.max(heatmap)

    # We use cv2 to load the original image. I loaded it as a
    # gray-scale image as I had a hard time seeing the heatmap
    # otherwise
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    img = np.expand_dims(img, axis=2)

    # We resize the heatmap to have the same size as the original image
    heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))

    # We convert the heatmap to RGB
    heatmap = np.uint8(255 * heatmap)

    # We apply the heatmap to the original image
    heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

    # 0.5 here is a heatmap intensity factor
    superimposed_img = heatmap * 0.5 + img

    # Save the image to disk
    cv2.imwrite('Cat-animals/heatmaps/' + img_path[12:-4] +
                '_heatmap_' + layer + '.jpg', superimposed_img)
Alright, now we only looked at the last convolutional layer in the model. If we repeat these steps for each layer, and combine the resulting images into a GIF, we get a better overview of the model and how it activates during the iterations! Let us look at a few.
We now see that the model does indeed consider several parts of each image during the layer iterations, starting with edges and moving on to larger areas. The first layers extract local features, such as edges and patterns, and with each layer, more global features and patterns are taken into consideration. At the end, it seems like the model, in the majority of cases, discards the “cat face” feature as unimportant for the classification and focuses on the areas around it instead. This shows that the model essentially sees what it wants to see: it does not care that a detail such as the face does not cohere with what it has seen before.
Finishing this little project, I was curious about why some of the predictions were wrong even when the model ignored the cat face. As an example, a quick Google image search for Flatworm, which the Octopus was predicted as, shows that the model must have confused the Octopus arms with the sides of a Flatworm; it is possible that this would happen with the original image as well. And the obvious cat faces? Apparently not so obvious for the VGG16 net.