Feature importance: Mutual information
The risk ratio gives valuable insight into the importance of categorical variables by examining the likelihood of churn for each value within a variable. For example, for the “contract” variable with the values “month-to-month,” “one_year,” and “two_years,” we can observe that customers on a “month-to-month” contract are more likely to churn than those on a “two_years” contract. This suggests that the “contract” variable is likely to be an important factor in predicting churn. However, without a way to compare this importance across variables, we don’t have a clear picture of its relative significance.
Mutual information, a concept from information theory, addresses this issue by quantifying how much we can learn about one variable when we know the value of another. The higher the mutual information, the more information we gain about churn by observing the value of another variable. In essence, it provides a means to measure the importance of categorical variables and their values in predicting churn, allowing us to compare their significance relative to one another.
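To build intuition, here is a small self-contained sketch with hypothetical toy data (not the telco dataset): when one variable perfectly determines the other, the mutual information is maximal, and when they are unrelated, it is close to zero. Note that sklearn reports the score in nats (natural log), so for a balanced binary variable the maximum is ln(2) ≈ 0.693.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Hypothetical toy data: churn is perfectly determined by the contract
# value, so the mutual information equals the entropy of churn,
# ln(2) ≈ 0.693 for a balanced binary variable.
contract = np.array(["month-to-month", "two_years"] * 50)
churn = np.array([1, 0] * 50)  # mirrors contract exactly

rng = np.random.default_rng(0)
noise = rng.integers(0, 2, size=100)  # random values unrelated to churn

print(mutual_info_score(churn, contract))  # ≈ 0.693 (maximal dependence)
print(mutual_info_score(churn, noise))     # ≈ 0 (no dependence)
```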
from sklearn.metrics import mutual_info_score
mutual_info_score(df_full_train.churn, df_full_train.contract)
# the argument order doesn't matter: mutual information is symmetric
#mutual_info_score(df_full_train.contract, df_full_train.churn)
# Output: 0.0983203874041556
mutual_info_score(df_full_train.churn, df_full_train.gender)
# Output: 0.0001174846211139946
mutual_info_score(df_full_train.churn, df_full_train.partner)
# Output: 0.009967689095399745
Once more, the intuition: how much do we learn about churn by observing the value of the contract variable (or any other variable), and vice versa? We observe, for example, that the gender variable is not particularly informative.
This lets us compare the relative importance of the features: we can apply this metric to all the categorical variables and see which ones have the highest mutual information.
The apply function takes a function with one argument, but mutual_info_score requires two arguments. That’s why we need to implement the mutual_info_churn_score function, which can be applied to the dataframe to compute mutual information scores column-wise.
def mutual_info_churn_score(series):
    return mutual_info_score(series, df_full_train.churn)
mi = df_full_train[categorical].apply(mutual_info_churn_score)
mi
# Output:
# gender 0.000117
# seniorcitizen 0.009410
# partner 0.009968
# dependents 0.012346
# phoneservice 0.000229
# multiplelines 0.000857
# internetservice 0.055868
# onlinesecurity 0.063085
# onlinebackup 0.046923
# deviceprotection 0.043453
# techsupport 0.061032
# streamingtv 0.031853
# streamingmovies 0.031581
# contract 0.098320
# paperlessbilling 0.017589
# paymentmethod 0.043210
# dtype: float64
To put the most important variables first, we sort them by their mutual information scores in descending order.
mi.sort_values(ascending=False)
# Output:
# contract 0.098320
# onlinesecurity 0.063085
# techsupport 0.061032
# internetservice 0.055868
# onlinebackup 0.046923
# deviceprotection 0.043453
# paymentmethod 0.043210
# streamingtv 0.031853
# streamingmovies 0.031581
# paperlessbilling 0.017589
# dependents 0.012346
# partner 0.009968
# seniorcitizen 0.009410
# multiplelines 0.000857
# phoneservice 0.000229
# gender 0.000117
# dtype: float64
Using this approach, we can gain a better understanding of which variables are highly informative for our analysis and which are less so.
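A possible next step (a sketch, not part of the original analysis) is to keep only the features above some cutoff. The threshold of 0.01 below is a hypothetical choice, and the Series is rebuilt here from a few of the scores above so the snippet runs on its own; in the notebook you would use the `mi` Series directly.

```python
import pandas as pd

# A subset of the mutual information scores computed above.
mi = pd.Series({
    "contract": 0.098320,
    "onlinesecurity": 0.063085,
    "gender": 0.000117,
    "phoneservice": 0.000229,
})

threshold = 0.01  # hypothetical cutoff; tune for your dataset
selected = mi[mi > threshold].sort_values(ascending=False).index.tolist()
print(selected)  # ['contract', 'onlinesecurity']
```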