ROC AUC – Area under the ROC curve
Useful metric
One way to quantify how close we are to the ideal point is to measure the area under the ROC curve (AUC). AUC equals 0.5 for the random baseline and 1.0 for the ideal curve, so a useful model should fall somewhere between the two. An AUC below 0.5 usually means something went wrong, for example the labels or scores were inverted. As a rough guide, 0.8 is considered good, 0.9 is great, and 0.6 is poor. We can compute AUC with scikit-learn's auc function, which is not specific to ROC curves: it computes the area under any curve, given its x and y coordinates.
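As a quick illustration (a minimal sketch with made-up coordinates, not part of our data), auc will integrate any set of x/y points, so we can check the two reference values directly: the diagonal of the random baseline and a step-shaped ideal curve.
from sklearn.metrics import auc
# Diagonal from (0, 0) to (1, 1): the random baseline
auc([0, 1], [0, 1])
# Output: 0.5
# Curve that goes straight up and then across: the ideal case
auc([0, 0, 1], [0, 1, 1])
# Output: 1.0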
from sklearn.metrics import auc
# auc needs values for x-axis and y-axis
auc(fpr, tpr)
# Output: 0.843850505725819
auc(df_scores.fpr, df_scores.tpr)
# Output: 0.8438732975754537
auc(df_ideal.fpr, df_ideal.tpr)
# Output: 0.9999430203759136
fpr, tpr, thresholds = roc_curve(y_val, y_pred)
auc(fpr, tpr)
# Output: 0.843850505725819
There is a shortcut in scikit-learn: roc_auc_score computes the AUC directly from the labels and the predicted scores.
from sklearn.metrics import roc_auc_score
roc_auc_score(y_val, y_pred)
# Output: 0.843850505725819
AUC interpretation
AUC tells us the probability that a randomly selected positive example has a score that is higher than a randomly selected negative example.
neg = y_pred[y_val == 0]  # scores predicted for the actual negatives
pos = y_pred[y_val == 1]  # scores predicted for the actual positives
import random
# pick the index of one random positive and one random negative example
pos_ind = random.randint(0, len(pos) - 1)
neg_ind = random.randint(0, len(neg) - 1)
We want to compare the score of this positive example with the score of the negative example.
pos[pos_ind] > neg[neg_ind]
# Output: True
So, for this randomly chosen pair, the positive example is ranked higher. We can repeat the comparison 100,000 times and use the fraction of successes as an estimate of AUC.
n = 100000
success = 0
for i in range(n):
    pos_ind = random.randint(0, len(pos) - 1)
    neg_ind = random.randint(0, len(neg) - 1)
    if pos[pos_ind] > neg[neg_ind]:
        success += 1
success / n
# Output: 0.84389
That result is quite close to roc_auc_score(y_val, y_pred) = 0.843850505725819.
Instead of looping in Python, we can vectorize the comparison with NumPy. Be aware that in np.random.randint(low, high, size, dtype), 'low' is inclusive and 'high' is exclusive.
import numpy as np

n = 50000
np.random.seed(1)
pos_ind = np.random.randint(0, len(pos), size=n)
neg_ind = np.random.randint(0, len(neg), size=n)
pos[pos_ind] > neg[neg_ind]
# Output: array([False, True, True, ..., True, True, True])
(pos[pos_ind] > neg[neg_ind]).mean()
# Output: 0.84646
Because of this interpretation, AUC is a popular way of measuring the performance of binary classification models. It is intuitive, and it tells us how well the model ranks positive examples above negative ones, i.e. how well it separates the two classes.
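For completeness, here is a minimal sketch of computing this probability exactly instead of sampling, by comparing every positive score with every negative score; ties count as half, which is how ROC AUC treats equal scores. It assumes y_val and y_pred are the NumPy arrays used above, and the pairwise matrix has len(pos) * len(neg) entries, so it only fits in memory for moderately sized validation sets.
import numpy as np

neg = y_pred[y_val == 0]
pos = y_pred[y_val == 1]

# difference between every positive score and every negative score
diff = pos[:, None] - neg[None, :]

# fraction of pairs where the positive is ranked higher, plus half credit for ties
(diff > 0).mean() + 0.5 * (diff == 0).mean()
# Output: should match roc_auc_score(y_val, y_pred)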