Regularization and Dropout
When we train a model for 10 epochs, it sees the same image 10 times. If an image contains a distinctive detail, such as a logo, the model may memorize it and start declaring everything with that logo a t-shirt. On the validation data this leads to many misclassifications, i.e. worse generalization. Instead, the model should pay attention to more general features, such as shapes, rather than specific details.
What if we could randomly hide a part of the image? That is the main idea behind dropout. In dropout, however, we don't hide a part of the image itself but a part of a layer's input, so the same idea is applied to the inner layers of the network.
Let's assume one fully connected dense layer with four inputs and three outputs. Dropout means that we freeze a part of this layer: the frozen part is not used and not updated in the current iteration, and in the next iteration a different part is frozen. By doing this we force the neural network to focus on the bigger picture (shapes instead of details). At prediction time nothing is frozen, so the output layer still sees all parts of the network, including the ones that were dropped during training.
Regularization means that we introduce something that prevents the neural network from overfitting to patterns that might not generalize. A droprate of 0.5 means that in each iteration we freeze 50% of the layer. Dropout does not change the dimensionality of the layer.
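To see what this looks like in practice, here is a small toy example (not part of the original notes) using Keras' Dropout layer. During training roughly half of the values are zeroed out and the remaining ones are scaled by 1 / (1 - rate); at prediction time the layer does nothing, and the shape of the output never changes.

import numpy as np
from tensorflow import keras

x = np.ones((1, 8), dtype='float32')
dropout = keras.layers.Dropout(0.5)

# training=True: roughly 50% of the entries become 0, the rest are scaled to 2.0
print(dropout(x, training=True).numpy())

# training=False (prediction): the input passes through unchanged
print(dropout(x, training=False).numpy())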
We’ll look at these points:
- Regularizing by freezing a part of the network
- Adding dropout to our model
- Experimenting with different dropout rates
def make_model(learning_rate=0.01, size_inner=100, droprate=0.5):
    base_model = Xception(
        weights='imagenet',
        include_top=False,
        input_shape=(150, 150, 3)
    )

    base_model.trainable = False

    #########################################

    inputs = keras.Input(shape=(150, 150, 3))
    base = base_model(inputs, training=False)
    vectors = keras.layers.GlobalAveragePooling2D()(base)
    inner = keras.layers.Dense(size_inner, activation='relu')(vectors)
    # dropout goes between the inner dense layer and the output layer
    drop = keras.layers.Dropout(droprate)(inner)
    outputs = keras.layers.Dense(10)(drop)
    model = keras.Model(inputs, outputs)

    #########################################

    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)

    model.compile(
        optimizer=optimizer,
        loss=loss,
        metrics=['accuracy']
    )

    return model
The downside of dropout is that the model needs more iterations to learn. Therefore we increase the number of epochs from 10 to 30.
learning_rate = 0.001
size = 100

scores = {}

for droprate in [0.0, 0.2, 0.5, 0.8]:
    print(droprate)

    model = make_model(
        learning_rate=learning_rate,
        size_inner=size,
        droprate=droprate
    )

    history = model.fit(train_ds, epochs=30, validation_data=val_ds)
    scores[droprate] = history.history

    print()
    print()
# Output:
# 0.0
# Epoch 1/30
# 96/96 [==============================] - 128s 1s/step - loss: 0.9759 - accuracy: 0.6659 - val_loss: 0.6152 - val_accuracy: 0.7859
# Epoch 2/30
# 96/96 [==============================] - 102s 1s/step - loss: 0.5078 - accuracy: 0.8233 - val_loss: 0.6221 - val_accuracy: 0.7859
# Epoch 3/30
# 96/96 [==============================] - 104s 1s/step - loss: 0.3430 - accuracy: 0.8908 - val_loss: 0.5946 - val_accuracy: 0.7977
# ...
# Epoch 29/30
# 96/96 [==============================] - 103s 1s/step - loss: 0.0145 - accuracy: 0.9971 - val_loss: 0.7599 - val_accuracy: 0.8299
# Epoch 30/30
# 96/96 [==============================] - 103s 1s/step - loss: 0.0158 - accuracy: 0.9974 - val_loss: 0.7727 - val_accuracy: 0.8182
# 0.5
# Epoch 1/30
# 96/96 [==============================] - 110s 1s/step - loss: 1.2975 - accuracy: 0.5593 - val_loss: 0.7498 - val_accuracy: 0.7537
# Epoch 2/30
# 96/96 [==============================] - 101s 1s/step - loss: 0.8512 - accuracy: 0.7053 - val_loss: 0.6457 - val_accuracy: 0.8006
# Epoch 3/30
# 96/96 [==============================] - 102s 1s/step - loss: 0.7035 - accuracy: 0.7487 - val_loss: 0.5934 - val_accuracy: 0.8123
# ...
# Epoch 29/30
# 96/96 [==============================] - 100s 1s/step - loss: 0.6695 - accuracy: 0.7402 - val_loss: 0.5769 - val_accuracy: 0.8211
# Epoch 30/30
# 96/96 [==============================] - 100s 1s/step - loss: 0.6575 - accuracy: 0.7324 - val_loss: 0.5613 - val_accuracy: 0.8240
for droprate, hist in scores.items():
    plt.plot(hist['val_accuracy'], label=('val=%s' % droprate))

plt.ylim(0.78, 0.86)
plt.legend()

hist = scores[0.0]
plt.plot(hist['val_accuracy'], label=0.0)
hist = scores[0.2]
plt.plot(hist['val_accuracy'], label=0.2)
plt.legend()
#plt.plot(hist['accuracy'], label=('val=%s' % droprate))
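Reading exact numbers off the plot can be tricky, so as a complement we could also compare the dropout rates numerically. This is a small sketch that is not part of the original notes; averaging over the last 5 epochs is an arbitrary choice to smooth out the epoch-to-epoch noise.

import numpy as np

for droprate, hist in scores.items():
    # mean validation accuracy over the last 5 epochs
    print(droprate, np.mean(hist['val_accuracy'][-5:]))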
