Credit Risk Scoring Project: The project for this week involves credit risk scoring. Imagine you want to buy a mobile phone, so you visit your bank to apply for a loan. You fill out an application form that requests various details, such as your income, the price of the phone, and the loan amount you … Continue reading “ML Zoomcamp 2023 – Decision Trees and Ensemble Learning – Part 1”
Tag Archives: Classification
ML Zoomcamp 2023 – Evaluation metrics for classification – Part 2
Accuracy and Dummy Model: In the last article, we calculated that our model achieved an accuracy of 80% on the validation data. Now, let’s determine whether this is a good value or not. Accuracy measures the fraction of correct predictions made by the model. In our evaluation, we checked each customer in the validation dataset … Continue reading “ML Zoomcamp 2023 – Evaluation metrics for classification – Part 2”
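To make the comparison concrete, here is a minimal sketch of accuracy as the fraction of correct predictions, set against a baseline. The labels and predictions are made up for illustration, and the baseline assumes that a "dummy model" means one that always predicts the majority class (no churn); the excerpt itself does not define it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical validation labels: 1 = churn, 0 = no churn (made-up data).
y_val = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
# Hypothetical predictions of a trained model for the same customers.
y_model = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
# Assumed dummy baseline: always predict the majority class (no churn).
y_dummy = [0] * len(y_val)

model_acc = accuracy(y_val, y_model)
dummy_acc = accuracy(y_val, y_dummy)
print(f"model: {model_acc:.2f}, dummy: {dummy_acc:.2f}")
```

On imbalanced data a baseline like this can already score high, so a model's accuracy is only informative relative to it.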
ML Zoomcamp 2023 – Evaluation metrics for classification – Part 1
Overview: Today’s post recaps the lines of code that are crucial for the rest of this chapter. This includes the necessary imports, data preparation, splitting the data into training, validation, and test sets, separating the target variable ‘churn’, training the logistic regression model, and finally, validating the model on the validation data and outputting the … Continue reading “ML Zoomcamp 2023 – Evaluation metrics for classification – Part 1”
ML Zoomcamp 2023 – Machine Learning for Classification – Part 12
Using the model
customerid gender seniorcitizen partner dependents tenure phoneservice multiplelines internetservice onlinesecurity … deviceprotection techsupport streamingtv streamingmovies contract paperlessbilling paymentmethod monthlycharges totalcharges churn
0 5442-pptjy male 0 yes yes 12 yes no no no_internet_service … no_internet_service no_internet_service no_internet_service no_internet_service two_year no mailed_check 19.70 258.35 0
1 6261-rcvns female 0 no no 42 yes no … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 12”
ML Zoomcamp 2023 – Machine Learning for Classification – Part 11
Model interpretation: Look at the coefficients. Now, we want to combine each feature with its corresponding coefficient. This involves associating each feature with the weight (coefficient) assigned to it by the logistic regression model. To combine both sets of information, you can use the zip function. This function allows you to pair each feature with … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 11”
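The pairing described above can be sketched as follows. The feature names and weights are hypothetical placeholders rather than values from the article's model, and the sketch assumes the weights are already available as a flat list in the same order as the feature names.

```python
# Hypothetical feature names and weights (not taken from the article's model).
feature_names = ["contract=month-to-month", "contract=two_year", "tenure"]
weights = [0.475, -0.408, -0.072]

# zip pairs the i-th feature name with the i-th weight;
# dict turns those pairs into a name -> weight lookup.
feature_weights = dict(zip(feature_names, weights))

for name, weight in feature_weights.items():
    print(f"{name:>25}: {weight:+.3f}")
```

The pairing is only meaningful if both sequences share the same order, since zip matches elements purely by position.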
ML Zoomcamp 2023 – Machine Learning for Classification – Part 10
Training logistic regression with Scikit-Learn: Train a model with Scikit-Learn. When you want to train a logistic regression model, the process is quite similar to training a linear regression model. You can use the ‘coef_’ attribute to display the weights (coefficients) in a logistic regression model. The ‘coef_’ attribute in logistic regression returns a 2-dimensional … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 10”
ML Zoomcamp 2023 – Machine Learning for Classification – Part 9
Logistic Regression: As mentioned earlier, classification problems can be categorized into binary problems and multi-class problems. Binary problems are the types of problems that logistic regression is typically used to solve. In binary classification, the target variable $y_i$ belongs to one of two classes: 0 or 1. These classes are often referred to as “negative” … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 9”
ML Zoomcamp 2023 – Machine Learning for Classification – Part 8
One-hot encoding: One-hot encoding is a technique used in machine learning to convert categorical (non-numeric) data into a numeric format that can be used by machine learning algorithms. It’s particularly useful when working with algorithms that require numerical input, such as many classification and regression models. Scikit-Learn, a popular machine learning library in Python, provides … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 8”
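As a rough illustration of the idea, the sketch below one-hot encodes a single categorical column in plain Python. It is a library-free stand-in, not the Scikit-Learn tooling the excerpt goes on to introduce; the helper `one_hot_encode` and the sample records are hypothetical.

```python
def one_hot_encode(records, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in records})
    encoded = []
    for row in records:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per category, named "column=value".
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical customer records (made-up data).
customers = [
    {"contract": "month-to-month", "tenure": 1},
    {"contract": "two_year", "tenure": 12},
    {"contract": "one_year", "tenure": 24},
]

encoded = one_hot_encode(customers, "contract")
for row in encoded:
    print(row)
```

Each record ends up with exactly one indicator set to 1, while numeric columns such as `tenure` pass through untouched.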
ML Zoomcamp 2023 – Machine Learning for Classification – Part 7
Feature importance: Correlation. To measure feature importance for numerical variables, one common approach is to use the correlation coefficient, specifically Pearson’s correlation coefficient. The Pearson correlation coefficient quantifies the degree of linear dependency between two numerical variables. The correlation coefficient (often denoted as “r”) has a range of -1 to 1: The strength of the … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 7”
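For reference, Pearson's r can be computed directly from its definition (covariance divided by the product of the standard deviations). This is a minimal sketch on made-up tenure and churn values, not the article's data or code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: tenure in months and whether the customer churned.
tenure = [1, 5, 12, 24, 48, 60]
churn = [1, 1, 0, 1, 0, 0]

r = pearson_r(tenure, churn)
print(f"r = {r:.3f}")
```

In this toy data r comes out negative, meaning that longer tenure goes together with less churn; the magnitude indicates how strong that linear relationship is.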
ML Zoomcamp 2023 – Machine Learning for Classification – Part 6
Feature importance: Mutual information. Indeed, the risk ratio provides valuable insights into the importance of different categorical variables, particularly when examining the likelihood of churn for each value within a variable. For example, when analyzing the “contract” variable with values like “month-to-month,” “one_year,” and “two_year,” we can observe that customers with a “month-to-month” contract are … Continue reading “ML Zoomcamp 2023 – Machine Learning for Classification – Part 6”
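A small sketch of the per-value comparison mentioned above, assuming the risk ratio is defined as the churn rate within a group divided by the overall churn rate (the excerpt does not spell out the formula). The helper `risk_ratios` and the data are hypothetical.

```python
from collections import defaultdict

def risk_ratios(categories, churned):
    """Churn rate of each category divided by the global churn rate."""
    global_rate = sum(churned) / len(churned)
    totals = defaultdict(int)
    churners = defaultdict(int)
    for category, y in zip(categories, churned):
        totals[category] += 1
        churners[category] += y
    return {c: (churners[c] / totals[c]) / global_rate for c in totals}

# Hypothetical data: contract type and churn flag per customer.
contract = ["month-to-month"] * 4 + ["one_year"] * 2 + ["two_year"] * 2
churn = [1, 1, 1, 0, 1, 0, 0, 0]

ratios = risk_ratios(contract, churn)
for value, ratio in ratios.items():
    print(f"{value:>15}: {ratio:.2f}")
```

Under this definition, a ratio above 1 marks a group that churns more than average and a ratio below 1 marks one that churns less.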