ML Zoomcamp 2023 – Machine Learning for Classification – Part 1

Churn prediction project

In this project, we’re focusing on churn prediction for a telecommunications company. Consider a telecommunications company with a diverse customer base. Some of these customers are satisfied with the services they receive, while others are not. Those who are dissatisfied are contemplating terminating their contracts and switching to another service provider.

What do we want to do?

We want to identify the clients who are likely to churn (that is, stop using the company’s services).
How could we do this? We give every customer a score between 0 and 1 that tells how likely that customer is to leave. Then we can send promotional e-mails to the customers with high scores, offering a discount to keep them.
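
As a minimal sketch of the scoring idea, assume we already have a churn score for each customer. The customer ids, the score values, and the 0.5 cut-off below are all made up for illustration; in practice the cut-off would depend on the cost of the discount and the value of keeping a customer.

```python
# Hypothetical churn scores per customer id (made-up values for illustration).
scores = {
    "cust-001": 0.91,
    "cust-002": 0.12,
    "cust-003": 0.57,
    "cust-004": 0.33,
}

# Assumed cut-off: anyone at or above it gets the promotional e-mail.
THRESHOLD = 0.5


def select_for_promotion(scores, threshold=THRESHOLD):
    """Return ids whose score is at or above the threshold, riskiest first."""
    at_risk = [cid for cid, s in scores.items() if s >= threshold]
    return sorted(at_risk, key=lambda cid: scores[cid], reverse=True)


to_email = select_for_promotion(scores)
print(to_email)  # ['cust-001', 'cust-003']
```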

The way we approach this with Machine Learning is Binary Classification.

Remember the formula for one single observation: g(xi) ~ yi

In this particular case, yi is our target variable (it tells us whether customer i left the company or not) and xi is the feature vector describing the i-th customer.

The output of our model, g(xi), is a value between 0 and 1: the likelihood that customer i is going to churn. The target yi itself takes only one of the two values in {0, 1}.
1 means a positive example (the customer left the company); 0 means a negative example (the customer didn’t leave). So in general, 1 means that the effect we want to predict is present and 0 means that it’s not present.
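
To make the difference between the model’s score and the 0/1 target concrete, here is a small sketch of one possible g. It assumes a logistic (sigmoid) function applied to a weighted sum of the features, which this section does not prescribe; the feature meanings, weights, and bias are hypothetical values chosen only for illustration.

```python
import math


def g(x, weights, bias):
    """Map a feature vector to a score strictly between 0 and 1.

    Assumption: a sigmoid over a linear combination of the features.
    """
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical features for customer i:
# [monthly charges (in hundreds), tenure (in years), month-to-month contract (0/1)]
x_i = [0.8, 0.5, 1.0]

# Hypothetical parameters; a real model would learn these from data.
weights = [1.2, -0.8, 1.5]
bias = -1.0

score = g(x_i, weights, bias)
print(f"churn score for customer i: {score:.2f}")
```

The score is a number between 0 and 1, while the label we compare it against during training is always exactly 0 or 1.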

The way we do this is as follows: Let’s say we take the customers from last month, and for each customer, we know who actually left. For those customers, we can set the target label to 1; for the rest, we can set it to 0 because they stayed. All the target labels together become ‘y.’ The information we have about the customers, including demographics, where they live, how much they pay, what kind of services they have, and the type of contract, becomes ‘X.’
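
The labelling step described above can be sketched as follows. The customer records, field names, and the set of customers who left are hypothetical; they only show how ‘X’ and ‘y’ line up row by row.

```python
# Hypothetical records for last month's customers.
customers = [
    {"id": "cust-001", "monthly_charges": 70.5, "contract": "month-to-month"},
    {"id": "cust-002", "monthly_charges": 29.9, "contract": "two-year"},
    {"id": "cust-003", "monthly_charges": 99.0, "contract": "month-to-month"},
    {"id": "cust-004", "monthly_charges": 45.0, "contract": "one-year"},
]

# Hypothetical ids of the customers we know actually left.
left_last_month = {"cust-001", "cust-003"}

# X: one row of features per customer (everything we know about them).
feature_names = ["monthly_charges", "contract"]
X = [[c[name] for name in feature_names] for c in customers]

# y: 1 if the customer left, 0 if they stayed, in the same order as X.
y = [1 if c["id"] in left_last_month else 0 for c in customers]

print(X[0], y[0])  # [70.5, 'month-to-month'] 1
```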

So, we want to gather information to determine which factors lead to churn. That’s the main idea of this project. We aim to build a model on historical data and then use it to score existing customers.

For that, we are using a dataset from Kaggle titled ‘Telco Customer Churn – Focused Customer Retention Programs.’ The column we intend to predict is the ‘Churn’ column.
