Precision & Recall
Precision and recall are essential metrics for evaluating binary classification models.
Precision measures the fraction of positive predictions that were correct. In other words, it quantifies how accurately the model predicts customers who are likely to churn.
Precision = True Positives / (# Positive Predictions) = True Positives / (True Positives + False Positives)
Recall, on the other hand, quantifies the fraction of actual positive cases that were correctly identified by the model. It assesses how effectively the model captures all customers who are actually churning.
Recall = True Positives / (# Positive Observations) = True Positives / (True Positives + False Negatives)
In summary, precision focuses on the accuracy of positive predictions, while recall emphasizes the model’s ability to capture all positive cases. These metrics are crucial for understanding the trade-offs between correctly identifying churning customers and minimizing false positives.
| Actual Values | Negative Prediction (g(xi) < t) | Positive Prediction (g(xi) >= t) |
|---|---|---|
| Negative example (y = 0) | TN | FP |
| Positive example (y = 1) | FN | TP |
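The four cells of the table can be computed by thresholding the model's scores. Below is a minimal NumPy sketch; `y_val` and `y_pred` are small made-up arrays standing in for the actual validation labels and predicted churn probabilities:

```python
import numpy as np

# Hypothetical labels and predicted churn probabilities (illustrative only)
y_val = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0.2, 0.7, 0.8, 0.3, 0.1, 0.9])

t = 0.5  # decision threshold

predict_positive = y_pred >= t  # g(xi) >= t
predict_negative = y_pred < t   # g(xi) <  t
actual_positive = y_val == 1
actual_negative = y_val == 0

# Each cell of the confusion matrix is a count of co-occurrences
tp = (predict_positive & actual_positive).sum()
tn = (predict_negative & actual_negative).sum()
fp = (predict_positive & actual_negative).sum()
fn = (predict_negative & actual_positive).sum()
```

These four counts are exactly the `tp`, `tn`, `fp`, and `fn` variables used in the calculations below.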
accuracy = (tp + tn) / (tp + tn + fp + fn)
accuracy
# Output: 0.8034066713981547
precision = tp / (tp + fp)
precision
# Output: 0.6752411575562701
# --> the promotional email goes to 311 people, but only 210 of them are actually going to churn (--> ~33% of the positive predictions are mistakes)
tp + fp
# Output: 311
recall = tp / (tp + fn)
recall
# Output: 0.5440414507772021
# --> We failed to identify ~46% of the people who are actually churning
tp + fn
# Output: 386
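The trade-off between the two metrics is controlled by the threshold `t`. As a sketch, the helper below (with made-up scores, not this dataset's arrays) shows that raising `t` tends to raise precision while lowering recall, and vice versa:

```python
import numpy as np

# Illustrative labels and scores (not the actual churn data)
y_val = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.8, 0.9])

def precision_recall(y_true, scores, t):
    """Compute precision and recall at decision threshold t."""
    pred = scores >= t
    tp = (pred & (y_true == 1)).sum()
    fp = (pred & (y_true == 0)).sum()
    fn = (~pred & (y_true == 1)).sum()
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

for t in [0.3, 0.5, 0.7]:
    p, r = precision_recall(y_val, y_pred, t)
    print(f"t={t}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold sends the promotional email to more people (higher recall, more wasted emails); raising it sends fewer, more accurate emails (higher precision, more missed churners).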
While accuracy can give a misleading impression of a model’s performance, metrics like precision and recall are much more informative, especially in situations with class imbalance. Precision and recall provide a more detailed understanding of how well the model is performing in identifying positive cases (in this case, churning customers).
In scenarios where correctly identifying specific cases is critical, such as identifying churning customers to prevent loss, precision and recall help us make more informed decisions and assess the trade-offs between correctly identifying positives and minimizing false positives or false negatives. So, relying solely on accuracy may not provide a complete picture of a model’s effectiveness for a particular task.
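To see why accuracy alone can mislead under class imbalance, consider a sketch (with made-up proportions) of a "model" that simply predicts that nobody churns:

```python
import numpy as np

# Illustrative imbalanced validation set: 90% non-churn, 10% churn (made-up numbers)
y_val = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100)  # baseline "model": predict no churn for everyone

accuracy = (y_pred == y_val).mean()  # 0.9 -- looks impressive

tp = ((y_pred == 1) & (y_val == 1)).sum()
fn = ((y_pred == 0) & (y_val == 1)).sum()
recall = tp / (tp + fn)  # 0.0 -- catches none of the churners
```

Despite 90% accuracy, this baseline has zero recall (and undefined precision, since it makes no positive predictions), which is exactly the failure mode precision and recall are designed to expose.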