Data Augmentation
In the last section we talked about how to stabilize the network's performance using dropout. Dropout makes the network focus on the overall shape instead of details like logos. It works because in each training iteration a different part of the network (not of the image) is temporarily hidden/frozen, so small details like logos become less relevant.
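For reference, a hedged sketch of where dropout sits in such a model (the layer sizes are illustrative, not the chapter's exact model):

from tensorflow import keras

# Illustrative only: dropout placed between the inner dense layer and
# the output layer. During training, Dropout(0.2) randomly zeroes 20%
# of the activations at every step, so the network cannot rely on any
# single detail (like a logo).
model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10),
])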
Data augmentation is another approach to this problem: we generate more images from the existing ones. Imagine we take our t-shirt image and generate 10 more images from it; the neural network then never sees exactly the same image twice.
Different data augmentations
There are several possible image transformations, which can also be combined:
- Flip an image vertically and horizontally
- Rotate an image
- Shift an image
- Shear an image (for example, move only the upper right and lower right corners)
- Zoom in or out a bit (like shrinking and stretching)
- Change an image in other ways like brightness or contrast
- Black patch (this is what was used earlier to illustrate dropout: a black patch is randomly placed on the image, really hiding part of it; see the sketch below)
In Keras these transformations are provided by the ImageDataGenerator class. There is also a Jupyter notebook in the mlbookcamp-code repository under chapter-07-neural-nets/07-augmentations.ipynb.
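ImageDataGenerator covers most of the transformations above out of the box, but not the black patch. A minimal sketch, assuming we implement it ourselves as a custom function that could be plugged into the generator's preprocessing_function (the 32-pixel patch size is an arbitrary choice):

import numpy as np

def black_patch(img, patch_size=32):
    """Hide a random square region of the image (illustrative sketch)."""
    img = img.copy()
    h, w = img.shape[0], img.shape[1]
    # pick a random top-left corner so the patch fits inside the image
    y = np.random.randint(0, h - patch_size)
    x = np.random.randint(0, w - patch_size)
    img[y:y + patch_size, x:x + patch_size, :] = 0.0
    return img

Note that ImageDataGenerator accepts only one preprocessing_function, so to combine this with preprocess_input you would wrap both in a single function.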
Training a model with augmentations
# ImageDataGenerator comes from Keras; preprocess_input belongs to the
# pre-trained base model used in this chapter (Xception)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.xception import preprocess_input

train_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=30,       # rotate by up to 30 degrees
    width_shift_range=10,    # shift horizontally by up to 10 pixels
    height_shift_range=10,   # shift vertically by up to 10 pixels
    shear_range=10,          # shear by up to 10 degrees
    zoom_range=0.1,          # zoom in/out by up to 10%
    cval=0.0,                # fill value for newly created pixels
    horizontal_flip=False,
    vertical_flip=True,
)
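A note on the units: rotation_range and shear_range are given in degrees; width_shift_range and height_shift_range are interpreted as pixels when given as integers (and as a fraction of the image dimension when given as a float below 1); zoom_range=0.1 means zooming randomly between 90% and 110%; cval is the fill value for newly created pixels, but it only takes effect when fill_mode='constant' (the default fill_mode is 'nearest').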
How to select data augmentations?
- Use your own judgement (does it make sense?) –> For example, if you don't expect to see horizontally flipped images, then this augmentation doesn't make sense here either.
- Look at the dataset: what kind of variations are there?
- Are the objects always centered? –> If not, you can think about shifting and rotation.
- Tune it as a hyperparameter – Try different augmentations and see what works and what doesn't; a sketch of this loop follows below.
- Train with the new augmentation for 10–20 epochs. If the result is better, use the augmentation; if not, drop it. If the result is the same or similar, train for some more epochs (e.g. 20 more) and compare again.
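A minimal sketch of this tuning loop, assuming the make_model helper from earlier in the chapter and the val_ds defined below; the augmentation being tested (vertical_flip) and the epoch count are just examples:

# Treat one augmentation as an on/off hyperparameter and compare
# the best validation accuracy with and without it
for use_flip in [False, True]:
    gen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        vertical_flip=use_flip,
    )
    ds = gen.flow_from_directory(
        './clothing-dataset-small/train',
        target_size=(150, 150),
        batch_size=32,
    )
    model = make_model(learning_rate=0.001, size_inner=100, droprate=0.2)
    history = model.fit(ds, epochs=10, validation_data=val_ds)
    print('vertical_flip=%s: best val_accuracy=%.3f'
          % (use_flip, max(history.history['val_accuracy'])))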
Playing around with the parameters of the previous snippet, the set of parameters that actually works shrinks a bit. The augmentations are applied only to the training dataset; we leave the validation dataset unchanged, because we want consistent results between evaluations. Remember that the transformations are applied randomly, so they cannot be reproduced exactly.
# Augmentations that survived the tuning: shear, zoom and vertical flip
train_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    shear_range=10,
    zoom_range=0.1,
    vertical_flip=True,
)
train_ds = train_gen.flow_from_directory(
    './clothing-dataset-small/train',
    target_size=(150, 150),
    batch_size=32,
)
# The validation data only gets the preprocessing, never the augmentation
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
val_ds = val_gen.flow_from_directory(
    './clothing-dataset-small/validation',
    target_size=(150, 150),
    batch_size=32,
    shuffle=False,  # keep the order fixed for consistent evaluation
)
# Output:
# Found 3068 images belonging to 10 classes.
# Found 341 images belonging to 10 classes.
# The same training generator with the augmentation toggled off
# (comment it back in to compare the two runs)
train_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    # vertical_flip=True,
)
train_ds = train_gen.flow_from_directory(
'./clothing-dataset-small/train',
target_size=(150, 150),
batch_size=32
)
# val_gen and val_ds stay exactly the same as above
learning_rate = 0.001
size = 100
droprate = 0.2
model = make_model(
learning_rate=learning_rate,
size_inner=size,
droprate=droprate
)
history = model.fit(train_ds, epochs=50, validation_data=val_ds)
# Output:
# Epoch 1/50
# 96/96 [==============================] - 146s 1s/step - loss: 1.3221 - accuracy: 0.5645 - val_loss: 0.7635 - val_accuracy: 0.7478
# Epoch 2/50
# 96/96 [==============================] - 145s 2s/step - loss: 0.9227 - accuracy: 0.6868 - val_loss: 0.6985 - val_accuracy: 0.7742
# Epoch 3/50
# 96/96 [==============================] - 140s 1s/step - loss: 0.8057 - accuracy: 0.7210 - val_loss: 0.6810 - val_accuracy: 0.7742
# ...
# Epoch 49/50
# 96/96 [==============================] - 115s 1s/step - loss: 0.1671 - accuracy: 0.9426 - val_loss: 0.7041 - val_accuracy: 0.8035
# Epoch 50/50
# 96/96 [==============================] - 120s 1s/step - loss: 0.1563 - accuracy: 0.9439 - val_loss: 0.7622 - val_accuracy: 0.7859
When doing augmentation, it can happen that GPU utilization stays below 90%. The likely reason is that the augmentation is done by the CPU: first the CPU generates the augmented images, then the GPU fits the model on them, and this repeats for every epoch. If these two steps run strictly one after the other, each processor sits idle while the other works. The fix is to overlap them: while the GPU trains on one batch, the CPU already prepares the next one, so both are better utilized over time. Keras's ImageDataGenerator doesn't do this, but there are ways to set it up (google "tensorflow training pipeline with preprocessing", or see tensorflow.org/guide/data_performance and tensorflow.org/guide/data).
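A minimal sketch of such an overlapped pipeline with tf.data, based on the guides linked above (the augmentation layers only roughly match our setup, and this is not the notebook's code; it assumes a recent TensorFlow 2.x):

import tensorflow as tf

# Load the same training directory as a tf.data pipeline
train_ds = tf.keras.utils.image_dataset_from_directory(
    './clothing-dataset-small/train',
    image_size=(150, 150),
    batch_size=32,
)

# Keras preprocessing layers that roughly correspond to our augmentations
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('vertical'),
    tf.keras.layers.RandomZoom(0.1),
])

train_ds = (
    train_ds
    # run the augmentation on the CPU, parallelized across cores
    .map(lambda x, y: (augment(x, training=True), y),
         num_parallel_calls=tf.data.AUTOTUNE)
    # prepare the next batches while the GPU trains on the current one
    .prefetch(tf.data.AUTOTUNE)
)

Here prefetch is what overlaps the CPU's preprocessing with the GPU's training step.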
import matplotlib.pyplot as plt

# Compare training and validation accuracy over the epochs
hist = history.history
plt.plot(hist['val_accuracy'], label='val')
plt.plot(hist['accuracy'], label='train')
plt.legend()

Testing with data augmentation, we realize that it is not really helpful in this particular case, even though it usually is. As Alexey said, "Tuning neural networks is more art than science." For this case we can go with our untuned network, whose accuracy of around 84% is sufficient for most use cases.