Model interpretation
Look at the coefficients
Now we want to pair each feature with its corresponding coefficient, that is, the weight the logistic regression model assigned to it.
dv.get_feature_names_out()
# Output:
# array(['contract=month-to-month', 'contract=one_year',
# 'contract=two_year', 'dependents=no', 'dependents=yes',
# 'deviceprotection=no', 'deviceprotection=no_internet_service',
# 'deviceprotection=yes', 'gender=female', 'gender=male',
# 'internetservice=dsl', 'internetservice=fiber_optic',
# 'internetservice=no', 'monthlycharges', 'multiplelines=no',
# 'multiplelines=no_phone_service', 'multiplelines=yes',
# 'onlinebackup=no', 'onlinebackup=no_internet_service',
# 'onlinebackup=yes', 'onlinesecurity=no',
# 'onlinesecurity=no_internet_service', 'onlinesecurity=yes',
# 'paperlessbilling=no', 'paperlessbilling=yes', 'partner=no',
# 'partner=yes', 'paymentmethod=bank_transfer_(automatic)',
# 'paymentmethod=credit_card_(automatic)',
# 'paymentmethod=electronic_check', 'paymentmethod=mailed_check',
# 'phoneservice=no', 'phoneservice=yes', 'seniorcitizen',
# 'streamingmovies=no', 'streamingmovies=no_internet_service',
# 'streamingmovies=yes', 'streamingtv=no',
# 'streamingtv=no_internet_service', 'streamingtv=yes',
# 'techsupport=no', 'techsupport=no_internet_service',
# 'techsupport=yes', 'tenure', 'totalcharges'], dtype=object)
model.coef_[0].round(3)
# Output:
# array([ 0.475, -0.175, -0.408, -0.03 , -0.078, 0.063, -0.089, -0.081,
# -0.034, -0.073, -0.335, 0.316, -0.089, 0.004, -0.258, 0.141,
# 0.009, 0.063, -0.089, -0.081, 0.266, -0.089, -0.284, -0.231,
# 0.124, -0.166, 0.058, -0.087, -0.032, 0.07 , -0.059, 0.141,
# -0.249, 0.215, -0.12 , -0.089, 0.102, -0.071, -0.089, 0.052,
# 0.213, -0.089, -0.232, -0.07 , 0. ])
To combine the two arrays, use the zip function, which pairs each feature name with its coefficient.
list(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))
# Output:
# [('contract=month-to-month', 0.475),
# ('contract=one_year', -0.175),
# ('contract=two_year', -0.408),
# ('dependents=no', -0.03),
# ('dependents=yes', -0.078),
# ('deviceprotection=no', 0.063),
# ('deviceprotection=no_internet_service', -0.089),
# ('deviceprotection=yes', -0.081),
# ('gender=female', -0.034),
# ('gender=male', -0.073),
# ('internetservice=dsl', -0.335),
# ('internetservice=fiber_optic', 0.316),
# ('internetservice=no', -0.089),
# ('monthlycharges', 0.004),
# ('multiplelines=no', -0.258),
# ('multiplelines=no_phone_service', 0.141),
# ('multiplelines=yes', 0.009),
# ('onlinebackup=no', 0.063),
# ('onlinebackup=no_internet_service', -0.089),
# ('onlinebackup=yes', -0.081),
# ('onlinesecurity=no', 0.266),
# ('onlinesecurity=no_internet_service', -0.089),
# ('onlinesecurity=yes', -0.284),
# ('paperlessbilling=no', -0.231),
# ('paperlessbilling=yes', 0.124),
# ...
# ('techsupport=no', 0.213),
# ('techsupport=no_internet_service', -0.089),
# ('techsupport=yes', -0.232),
# ('tenure', -0.07),
# ('totalcharges', 0.0)]
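A positive coefficient pushes the score, and with it the churn probability, up; a negative one pushes it down. To spot the strongest signals at a glance, one option is to sort the pairs by weight. A quick sketch, reusing the dv and model objects from above:
pairs = zip(dv.get_feature_names_out(), model.coef_[0].round(3))
# sort by coefficient, from the strongest negative to the strongest positive
for name, weight in sorted(pairs, key=lambda p: p[1]):
    print(f'{name:40} {weight:>7}')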
Train a smaller model with fewer features
Let’s take a small subset of the features and train a model on just those.
small = ['contract', 'tenure', 'monthlycharges']
df_train[small].iloc[:10]
|   | contract | tenure | monthlycharges |
|---|---|---|---|
| 0 | two_year | 72 | 115.50 |
| 1 | month-to-month | 10 | 95.25 |
| 2 | month-to-month | 5 | 75.55 |
| 3 | month-to-month | 5 | 80.85 |
| 4 | two_year | 18 | 20.10 |
| 5 | month-to-month | 4 | 30.50 |
| 6 | month-to-month | 1 | 75.10 |
| 7 | month-to-month | 1 | 70.30 |
| 8 | two_year | 72 | 19.75 |
| 9 | month-to-month | 6 | 109.90 |
df_train[small].iloc[:10].to_dict(orient='records')
# Output:
# [{'contract': 'two_year', 'tenure': 72, 'monthlycharges': 115.5},
# {'contract': 'month-to-month', 'tenure': 10, 'monthlycharges': 95.25},
# {'contract': 'month-to-month', 'tenure': 5, 'monthlycharges': 75.55},
# {'contract': 'month-to-month', 'tenure': 5, 'monthlycharges': 80.85},
# {'contract': 'two_year', 'tenure': 18, 'monthlycharges': 20.1},
# {'contract': 'month-to-month', 'tenure': 4, 'monthlycharges': 30.5},
# {'contract': 'month-to-month', 'tenure': 1, 'monthlycharges': 75.1},
# {'contract': 'month-to-month', 'tenure': 1, 'monthlycharges': 70.3},
# {'contract': 'two_year', 'tenure': 72, 'monthlycharges': 19.75},
# {'contract': 'month-to-month', 'tenure': 6, 'monthlycharges': 109.9}]
dicts_train_small = df_train[small].to_dict(orient='records')
dicts_val_small = df_val[small].to_dict(orient='records')
dv_small = DictVectorizer(sparse=False)
dv_small.fit(dicts_train_small)
# three binary features for the contract variable and two numerical features for
# monthlycharges and tenure
dv_small.get_feature_names_out()
# Output:
# array(['contract=month-to-month', 'contract=one_year',
#        'contract=two_year', 'monthlycharges', 'tenure'], dtype=object)
X_train_small = dv_small.transform(dicts_train_small)
model_small = LogisticRegression()
model_small.fit(X_train_small, y_train)
w0 = model_small.intercept_[0]
w0
# Output: -2.476775661122344
w = model_small.coef_[0]
w.round(3)
# Output: array([ 0.97 , -0.025, -0.949, 0.027, -0.036])
dict(zip(dv_small.get_feature_names_out(), w.round(3)))
# Output:
# {'contract=month-to-month': 0.97,
# 'contract=one_year': -0.025,
# 'contract=two_year': -0.949,
# 'monthlycharges': 0.027,
# 'tenure': -0.036}
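Before scoring customers by hand, a quick sanity check that the small model still performs reasonably. A minimal sketch, assuming y_val from the earlier train/validation split is in scope (dicts_val_small was prepared above):
X_val_small = dv_small.transform(dicts_val_small)
# probability of churn for each validation customer
y_pred_small = model_small.predict_proba(X_val_small)[:, 1]
(y_val == (y_pred_small >= 0.5)).mean()  # accuracy at the 0.5 cutoff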
Now let’s use the coefficients to score a customer by hand:
-2.47                                    # intercept w0
+ (1*0.97 + 0*(-0.025) + 0*(-0.949))     # contract (month-to-month=1, one_year=0, two_year=0)
+ 50*0.027                               # monthlycharges (customer pays $50 per month)
+ 5*(-0.036)                             # tenure (5 months)
# = -0.33
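The sigmoid function turns this raw score into a probability. If sigmoid isn't already defined in your notebook from earlier, a minimal definition of the standard logistic function:
import numpy as np

def sigmoid(z):
    # the logistic function: maps any real-valued score into (0, 1)
    return 1 / (1 + np.exp(-z))
Adding the terms one at a time shows how each feature moves the probability: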
sigmoid(-2.47)
# Output: 0.07798823512936635
sigmoid(-2.47+0.97)
# Output: 0.18242552380635632
sigmoid(-2.47 + 0.97 + 50*0.027)
# Output: 0.46257015465625034
sigmoid(-2.47 + 0.97 + 50*0.027 + 5*(-0.036))
# Output: 0.41824062315816374
-2.47 + 0.97 + 50*0.027 + 5*(-0.036)
# Output: -0.3300000000000001
# '_' is a special variable in Jupyter that holds the output of the previous cell
sigmoid(_)
# Output: 0.41824062315816374
For this customer, the predicted probability of churning is 41.8%.
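We can cross-check the manual computation against the trained model itself. This is a hypothetical cross-check using the dv_small and model_small objects from above; the result won't match exactly because we rounded the intercept and the coefficients:
customer = {'contract': 'month-to-month', 'tenure': 5, 'monthlycharges': 50}
X_customer = dv_small.transform([customer])
# probability of the positive (churn) class; should land near 0.418
model_small.predict_proba(X_customer)[0, 1]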
Let’s calculate the score for another example, this time one where the raw score before applying the sigmoid is greater than 0. Since sigmoid(0) corresponds to a 50% probability of churning, any score above 0 means the customer is more likely than not to churn.
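A quick check of that decision boundary:
sigmoid(0)
# Output: 0.5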
-2.47 + 0.97 + 60*0.027 + 1*(-0.036)
# Output: 0.08399999999999966
sigmoid(_)
# Output: 0.5209876607065322
Let’s calculate the score for one last example.
-2.47 + (-0.949) + 30*0.027 + 24*(-0.036)
# Output: -3.473
sigmoid(_)
# Output: 0.030090303318277657
The predicted probability that this customer churns is very low, only 3%.