ML Zoomcamp 2023 – Evaluation metrics for classification – Part 3

  1. Confusion table / matrix
    1. Different types of errors and correct decisions
    2. Arranging them in a table

Confusion table / matrix

Different types of errors and correct decisions

In this section, we’ll discuss the confusion matrix, a vital tool for evaluating the performance of binary classification models. The confusion matrix allows us to examine the various errors and correct decisions made by our model.

As we’ve previously discussed, class imbalance can significantly impact the accuracy metric. To address this issue, we need alternative evaluation methods that provide a more comprehensive view of our model’s performance.
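
To make this concrete, here is a minimal sketch with a made-up, heavily imbalanced target (the toy arrays below are purely illustrative and are not part of the churn dataset): a "model" that always predicts the majority class reaches 90% accuracy while never identifying a single churner.

import numpy as np

# Hypothetical, imbalanced target: 18 non-churners and 2 churners
y_toy = np.array([0] * 18 + [1] * 2)
# A baseline "model" that predicts "no churn" for every customer
y_toy_pred = np.zeros_like(y_toy)

(y_toy_pred == y_toy).mean()
# Output: 0.9  (high accuracy, yet not a single churner is detected)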

The confusion matrix breaks down the model’s predictions into four categories:

  1. True Positives (TP): These are cases where the model correctly predicted the positive class (churning customers).
  2. True Negatives (TN): These are cases where the model correctly predicted the negative class (non-churning customers).
  3. False Positives (FP): These are cases where the model incorrectly predicted the positive class when the true class was negative. This is also known as a Type I error.
  4. False Negatives (FN): These are cases where the model incorrectly predicted the negative class when the true class was positive. This is also known as a Type II error.

In terms of the score g(xi), the threshold t, and the actual label y:

  • g(xi) < t and y = 0: predicted NEGATIVE (no churn), customer didn't churn → correct → True Negative (TN)
  • g(xi) < t and y = 1: predicted NEGATIVE (no churn), customer churned → incorrect → False Negative (FN)
  • g(xi) >= t and y = 0: predicted POSITIVE (churn), customer didn't churn → incorrect → False Positive (FP)
  • g(xi) >= t and y = 1: predicted POSITIVE (churn), customer churned → correct → True Positive (TP)

Let's compute TN, FN, FP, and TP for the churn project:

# people who are going to churn
actual_positive = (y_val == 1)
# people who are not going to churn
actual_negative = (y_val == 0)
t = 0.5
predict_positive = (y_pred >= t)
predict_negative = (y_pred < t)

We look at the cases where both "predict_positive" and "actual_positive" are true. This is exactly what the "&" operator computes: an element-wise logical AND of the two boolean arrays.

predict_positive & actual_positive
# Output: array([False, False, False, ..., False,  True,  True])

tp = (predict_positive & actual_positive).sum()
tp
# Output: 210
tn = (predict_negative & actual_negative).sum()
tn
# Output: 922

fp = (predict_positive & actual_negative).sum()
fp
# Output: 101
fn = (predict_negative & actual_positive).sum()
fn
# Output: 176

Arranging them in a table

That was preparation for understanding the confusion matrix. The confusion matrix is a way to consolidate all these values (tp, tn, fp, fn) into a single table. This table comprises 4 cells, forming a 2×2 matrix.

  • In the columns of this table, we have the predictions (NEGATIVE: g(xi) < t and POSITIVE: g(xi) >= t).
  • In the rows, we have the actual values (NEGATIVE: y=0 and POSITIVE: y=1).

Now, let’s proceed to implement this confusion matrix in NumPy.

confusion_matrix = np.array([
    [tn, fp],
    [fn, tp]
])

confusion_matrix
# Output:
# array([[922, 101],
#        [176, 210]])
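
For an optional cross-check (assuming scikit-learn is installed), sklearn.metrics.confusion_matrix produces the same 2×2 layout for binary labels: actual values in the rows (0 first, then 1) and predictions in the columns. The import is aliased so it doesn't clash with our confusion_matrix variable.

from sklearn.metrics import confusion_matrix as sk_confusion_matrix

# Cast the boolean predictions to 0/1 labels and compare with the matrix above
sk_confusion_matrix(y_val, predict_positive.astype(int))
# Expected output:
# array([[922, 101],
#        [176, 210]])
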
Predictions in the columns, actual values in the rows:

                             Predicted NO CHURN               Predicted CHURN
                             g(xi) < t (NEGATIVE)             g(xi) >= t (POSITIVE)
Actual NO CHURN (y=0)        True Negative (TN): 922 (65%)    False Positive (FP): 101 (8%)
Actual CHURN (y=1)           False Negative (FN): 176 (12%)   True Positive (TP): 210 (15%)

Accuracy = 65% + 15% = 80%
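
As a sanity check, that ~80% accuracy can be recomputed directly from the four counts above: it is simply the share of correct predictions (TP + TN) among all predictions.

(tp + tn) / (tp + tn + fp + fn)
# Output: 0.8034...  ((922 + 210) / 1409, i.e. the ~80% from the table)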

We observe that we have more false negatives than false positives. False positives represent customers who receive the email even though they are not likely to churn, resulting in a loss of money due to unnecessary discounts. False negatives are customers who do not receive the email and end up leaving, causing financial losses as well. Both situations are undesirable.

Instead of using absolute numbers, we can also express these values in relative terms to gain a better perspective on the model’s performance.

(confusion_matrix / confusion_matrix.sum()).round(2)
# Output:
# array([[0.65, 0.07],
#        [0.12, 0.15]])
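
If you went the scikit-learn route above, recent versions of confusion_matrix also accept a normalize parameter that produces these relative values directly; a minimal sketch, reusing the aliased import:

from sklearn.metrics import confusion_matrix as sk_confusion_matrix

# normalize='all' divides each cell by the total number of observations
sk_confusion_matrix(y_val, predict_positive.astype(int), normalize='all').round(2)
# Expected output:
# array([[0.65, 0.07],
#        [0.12, 0.15]])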
