ML Zoomcamp 2023 – Machine Learning for Classification – Part 11

Model interpretation

  1. Model interpretation
    1. Look at the coefficients
    2. Train a smaller model with fewer features
    3. Model interpretation

Look at the coefficients

Now we want to pair each feature with its corresponding coefficient, i.e., the weight the logistic regression model assigned to it.

dv.get_feature_names_out()
# Output:
# array(['contract=month-to-month', 'contract=one_year',
#       'contract=two_year', 'dependents=no', 'dependents=yes',
#       'deviceprotection=no', 'deviceprotection=no_internet_service',
#       'deviceprotection=yes', 'gender=female', 'gender=male',
#       'internetservice=dsl', 'internetservice=fiber_optic',
#       'internetservice=no', 'monthlycharges', 'multiplelines=no',
#       'multiplelines=no_phone_service', 'multiplelines=yes',
#       'onlinebackup=no', 'onlinebackup=no_internet_service',
#       'onlinebackup=yes', 'onlinesecurity=no',
#       'onlinesecurity=no_internet_service', 'onlinesecurity=yes',
#       'paperlessbilling=no', 'paperlessbilling=yes', 'partner=no',
#       'partner=yes', 'paymentmethod=bank_transfer_(automatic)',
#       'paymentmethod=credit_card_(automatic)',
#       'paymentmethod=electronic_check', 'paymentmethod=mailed_check',
#       'phoneservice=no', 'phoneservice=yes', 'seniorcitizen',
#       'streamingmovies=no', 'streamingmovies=no_internet_service',
#       'streamingmovies=yes', 'streamingtv=no',
#       'streamingtv=no_internet_service', 'streamingtv=yes',
#       'techsupport=no', 'techsupport=no_internet_service',
#       'techsupport=yes', 'tenure', 'totalcharges'], dtype=object)

model.coef_[0].round(3)
# Output: 
# array([ 0.475, -0.175, -0.408, -0.03 , -0.078,  0.063, -0.089, -0.081,
#       -0.034, -0.073, -0.335,  0.316, -0.089,  0.004, -0.258,  0.141,
#        0.009,  0.063, -0.089, -0.081,  0.266, -0.089, -0.284, -0.231,
#        0.124, -0.166,  0.058, -0.087, -0.032,  0.07 , -0.059,  0.141,
#       -0.249,  0.215, -0.12 , -0.089,  0.102, -0.071, -0.089,  0.052,
#        0.213, -0.089, -0.232, -0.07 ,  0.   ])

To combine the two, use the zip function, which pairs each feature name with its coefficient.

list(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))

# Output:
# [('contract=month-to-month', 0.475),
# ('contract=one_year', -0.175),
# ('contract=two_year', -0.408),
# ('dependents=no', -0.03),
# ('dependents=yes', -0.078),
# ('deviceprotection=no', 0.063),
# ('deviceprotection=no_internet_service', -0.089),
# ('deviceprotection=yes', -0.081),
# ('gender=female', -0.034),
# ('gender=male', -0.073),
# ('internetservice=dsl', -0.335),
# ('internetservice=fiber_optic', 0.316),
# ('internetservice=no', -0.089),
# ('monthlycharges', 0.004),
# ('multiplelines=no', -0.258),
# ('multiplelines=no_phone_service', 0.141),
# ('multiplelines=yes', 0.009),
# ('onlinebackup=no', 0.063),
# ('onlinebackup=no_internet_service', -0.089),
# ('onlinebackup=yes', -0.081),
# ('onlinesecurity=no', 0.266),
# ('onlinesecurity=no_internet_service', -0.089),
# ('onlinesecurity=yes', -0.284),
# ('paperlessbilling=no', -0.231),
# ('paperlessbilling=yes', 0.124),
# ...
# ('techsupport=no', 0.213),
# ('techsupport=no_internet_service', -0.089),
# ('techsupport=yes', -0.232),
# ('tenure', -0.07),
# ('totalcharges', 0.0)]
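
To make the strongest signals easier to spot, one option (our addition, not in the original notebook) is to sort the pairs by coefficient value, so the features pushing predictions toward churn appear first:

pairs = list(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))
sorted(pairs, key=lambda p: p[1], reverse=True)[:3]
# Output:
# [('contract=month-to-month', 0.475),
#  ('internetservice=fiber_optic', 0.316),
#  ('onlinesecurity=no', 0.266)]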

Train a smaller model with fewer features

Let’s train a smaller model on a subset of the features.

small = ['contract', 'tenure', 'monthlycharges']

df_train[small].iloc[:10]
# Output:
#          contract  tenure  monthlycharges
# 0        two_year      72          115.50
# 1  month-to-month      10           95.25
# 2  month-to-month       5           75.55
# 3  month-to-month       5           80.85
# 4        two_year      18           20.10
# 5  month-to-month       4           30.50
# 6  month-to-month       1           75.10
# 7  month-to-month       1           70.30
# 8        two_year      72           19.75
# 9  month-to-month       6          109.90

df_train[small].iloc[:10].to_dict(orient='records')
# Output:
# [{'contract': 'two_year', 'tenure': 72, 'monthlycharges': 115.5},
# {'contract': 'month-to-month', 'tenure': 10, 'monthlycharges': 95.25},
# {'contract': 'month-to-month', 'tenure': 5, 'monthlycharges': 75.55},
# {'contract': 'month-to-month', 'tenure': 5, 'monthlycharges': 80.85},
# {'contract': 'two_year', 'tenure': 18, 'monthlycharges': 20.1},
# {'contract': 'month-to-month', 'tenure': 4, 'monthlycharges': 30.5},
# {'contract': 'month-to-month', 'tenure': 1, 'monthlycharges': 75.1},
# {'contract': 'month-to-month', 'tenure': 1, 'monthlycharges': 70.3},
# {'contract': 'two_year', 'tenure': 72, 'monthlycharges': 19.75},
# {'contract': 'month-to-month', 'tenure': 6, 'monthlycharges': 109.9}]
dicts_train_small = df_train[small].to_dict(orient='records')
dicts_val_small = df_val[small].to_dict(orient='records')

dv_small = DictVectorizer(sparse=False)
dv_small.fit(dicts_train_small)

# three binary features for the contract variable and two numerical features for 
# monthlycharges and tenure
dv_small.get_feature_names_out()
# Output:
# array(['contract=month-to-month', 'contract=one_year',
#        'contract=two_year', 'monthlycharges', 'tenure'], dtype=object)

X_train_small = dv_small.transform(dicts_train_small)
model_small = LogisticRegression()
model_small.fit(X_train_small, y_train)

w0 = model_small.intercept_[0]
w0
# Output: -2.476775661122344

w = model_small.coef_[0]
w.round(3)
# Output: array([ 0.97 , -0.025, -0.949,  0.027, -0.036])
dict(zip(dv_small.get_feature_names_out(), w.round(3)))

# Output:
# {'contract=month-to-month': 0.97,
#  'contract=one_year': -0.025,
#  'contract=two_year': -0.949,
#  'monthlycharges': 0.027,
#  'tenure': -0.036}
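
As a quick sanity check (not part of the original notes), we can transform the validation dicts prepared earlier and inspect the predicted churn probabilities:

X_val_small = dv_small.transform(dicts_val_small)
model_small.predict_proba(X_val_small)[:5, 1]  # churn probabilities for the first five validation customers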

Model interpretation

Now let’s use the coefficients to score a customer:

         M        1Y           2Y
-2.47 + (1*0.97 + 0*(-0.025) + 0*(-0.949))   # contract: customer has a month-to-month contract
      + 50*0.027                             # monthlycharges: customer pays $50 per month
      + 5*(-0.036)                           # tenure: 5 months
= -0.33
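
The raw score is turned into a probability with the sigmoid function. It was defined earlier in this series; for completeness, a minimal version:

import numpy as np

def sigmoid(z):
    # maps any real-valued score to a probability in (0, 1)
    return 1 / (1 + np.exp(-z))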

sigmoid(-2.47)
# Output: 0.07798823512936635

sigmoid(-2.47+0.97)
# Output: 0.18242552380635632

sigmoid(-2.47 + 0.97 + 50*0.027)
# Output: 0.46257015465625034

sigmoid(-2.47 + 0.97 + 50*0.027 + 5*(-0.036))
# Output: 0.41824062315816374

-2.47 + 0.97 + 50*0.027 + 5*(-0.036)
# Output: -0.3300000000000001

# '_' is a special variable in Jupyter that holds the output of the previous cell
sigmoid(_)
# Output: 0.41824062315816374

We see that for this customer, the predicted probability of churning is 41.8%.
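
The same probability can be obtained from the model directly. A minimal sketch (the customer dict below is ours, built to match the numbers above; the result differs slightly from the manual calculation because that one uses rounded weights):

x = dv_small.transform([{'contract': 'month-to-month',
                         'monthlycharges': 50,
                         'tenure': 5}])
model_small.predict_proba(x)[0, 1]
# Output: approximately 0.42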

Let’s calculate the score for another example where the result before applying the sigmoid function is greater than 0, which means the customer is more likely to churn than not: a score above 0 corresponds to a churn probability above 50%, since sigmoid(0) = 0.5.
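
As a quick check (not in the original notes):

sigmoid(0)
# Output: 0.5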

-2.47 + 0.97 + 60*0.027 + 1*(-0.036)
# Output: 0.08399999999999966

sigmoid(_)
# Output: 0.5209876607065322

Let’s calculate the score for one last example.

-2.47 + (-0.949) + 30*0.027 + 24*(-0.036)
# Output: -3.473

sigmoid(_)
# Output: 0.030090303318277657

The predicted probability of this customer churning is very low, only about 3%.
