ML Zoomcamp 2023 – Decision Trees and Ensemble Learning – Part 1

Credit Risk Scoring Project

The project for this week involves credit risk scoring. Imagine you want to buy a mobile phone, so you visit your bank to apply for a loan. You fill out an application form that requests various details, such as your income, the price of the phone, and the loan amount you need. The bank evaluates your application, assigns it a score, and then either approves or declines your request.

In this chapter, our goal is to build a model that the bank can use to decide whether to lend money to a customer. The bank provides the model with customer information, and the model returns a risk score: the likelihood that the customer will default on the loan. The bank can then base its lending decision on this score.

Our approach involves analyzing historical data from various customers and their loan applications. For each case, we have information about the requested loan amount and whether the customer successfully repaid the loan or defaulted.

For instance:

  • Customer A –> OK
  • Customer B –> OK
  • Customer C –> DEFAULT
  • Customer D –> DEFAULT
  • Customer E –> OK

This problem can be framed as binary classification, where ‘y’ represents the target variable, and it can take on two values: 0 (OK) or 1 (DEFAULT). Our objective is to train a model to predict, for each new customer, the probability that they will default:

g(x_i) –> PROBABILITY OF DEFAULT

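We have ‘X’, which contains all the customer information, and the target variable ‘y’, which records whether each customer actually defaulted (1) or repaid the loan (0). The model's output, not ‘y’ itself, is the probability of default.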
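As a minimal sketch of this framing, the five example customers above can be encoded as a binary target vector. The `encode_status` and `decide` helpers and the 0.5 threshold are hypothetical, chosen only for illustration; the post does not say which threshold the bank would use.

```python
# Outcomes mirroring the five example customers listed above.
outcomes = {
    "Customer A": "OK",
    "Customer B": "OK",
    "Customer C": "DEFAULT",
    "Customer D": "DEFAULT",
    "Customer E": "OK",
}


def encode_status(status: str) -> int:
    """Map an outcome label to the binary target: 0 for OK, 1 for DEFAULT."""
    if status == "OK":
        return 0
    if status == "DEFAULT":
        return 1
    raise ValueError(f"Unexpected status: {status!r}")


def decide(probability_of_default: float, threshold: float = 0.5) -> str:
    """Turn a predicted default probability into a lending decision.

    The threshold is an assumption for this sketch, not a value from the post.
    """
    return "decline" if probability_of_default >= threshold else "approve"


# Target vector y: one 0/1 label per historical customer.
y = [encode_status(status) for status in outcomes.values()]

# Share of historical customers who defaulted.
default_rate = sum(y) / len(y)

print(y)
print(default_rate)
print(decide(0.2), decide(0.8))
```

Here `y` holds the observed labels used for training, while `decide` consumes the probability that a trained model g would produce for a new customer.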

Dataset: You can find the dataset at this link. The ‘Status’ variable in the dataset denotes whether the customer defaulted or not.
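The sketch below shows one way the ‘Status’ column could be split off as the target once the file is loaded. The inline sample, the `Income` and `Amount` column names, and the `ok`/`default` values are assumptions made for illustration; the real file may name and encode these differently, so check it before reusing this code.

```python
import csv
import io

# Hypothetical sample standing in for the real dataset file.
# Column names other than 'Status', and all values, are assumptions.
sample_csv = """Status,Income,Amount
ok,120,800
default,60,1000
ok,200,1200
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Target: 1 if the customer defaulted, 0 otherwise.
y = [1 if row["Status"] == "default" else 0 for row in rows]

# Features: every column except the target, converted to numbers.
X = [
    {name: float(value) for name, value in row.items() if name != "Status"}
    for row in rows
]

print(y)
print(X[0])
```

Keeping ‘Status’ out of `X` matters: if the target leaks into the features, the model would appear accurate on historical data while being useless for new applicants.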
