Classification – knowMLedge.com

ML Zoomcamp 2023 – Machine Learning for Classification – Part 12

Using the model

customerid gender seniorcitizen partner dependents tenure phoneservice multiplelines internetservice onlinesecurity … deviceprotection techsupport streamingtv streamingmovies contract paperlessbilling paymentmethod monthlycharges totalcharges churn
0 5442-pptjy male 0 yes yes 12 yes no no no_internet_service … no_internet_service no_internet_service no_internet_service no_internet_service two_year no mailed_check 19.70 258.35 0
1 6261-rcvns female 0 no no 42 yes no …

ML Zoomcamp 2023 – Machine Learning for Classification – Part 11

Model interpretation

Look at the coefficients

Now, we want to pair each feature with the weight (coefficient) that the logistic regression model assigned to it. To combine both sets of information, you can use the zip function, which lets you pair each feature with …
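As a minimal sketch of the pairing step described above: the feature names and coefficient values below are made up for illustration and do not come from the trained model in the post.

```python
# Hypothetical feature names and coefficients (illustrative values only).
feature_names = ["contract=month-to-month", "contract=two_year", "tenure"]
coefficients = [0.97, -0.95, -0.04]

# zip pairs the i-th feature name with the i-th coefficient;
# dict turns those pairs into a lookup table.
weights_by_feature = dict(zip(feature_names, coefficients))

for name, weight in weights_by_feature.items():
    print(f"{name}: {weight:+.2f}")
```

A positive weight pushes the predicted probability towards the positive class, a negative weight pushes it away.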

ML Zoomcamp 2023 – Machine Learning for Classification – Part 10

Training logistic regression with Scikit-Learn

Train a model with Scikit-Learn

Training a logistic regression model is quite similar to training a linear regression model. You can use the ‘coef_’ attribute to display the weights (coefficients) of a logistic regression model. The ‘coef_’ attribute in logistic regression returns a 2-dimensional …
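The following is a small sketch of that training step. The toy data (a single tenure-like feature) is an assumption made for illustration. It is not the churn dataset from the post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, short values labelled 1, long values labelled 0.
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_train = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression()
model.fit(X_train, y_train)

# coef_ is 2-dimensional (one row per decision function), so take row 0.
weights = model.coef_[0]
bias = model.intercept_[0]

# Column 1 of predict_proba holds the probability of the positive class (1).
churn_probability = model.predict_proba(X_train)[:, 1]
```

With a single binary target, `model.coef_` has shape `(1, n_features)`, which is why indexing with `[0]` yields the plain weight vector.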

ML Zoomcamp 2023 – Machine Learning for Classification – Part 9

Logistic Regression

As mentioned earlier, classification problems can be categorized into binary problems and multi-class problems. Logistic regression is typically used to solve binary problems. In binary classification, the target variable $y_i$ belongs to one of two classes: 0 or 1. These classes are often referred to as “negative” …
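To make the 0/1 setting concrete, here is a rough sketch of how logistic regression turns a linear score into a probability by applying the sigmoid function. The names `w0`, `w` and `predict_proba` are illustrative choices, not taken from the post.

```python
import math

def sigmoid(z):
    """Map any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w0, w):
    """Probability of the positive class for one feature vector x.

    w0 is the bias term and w the list of weights (hypothetical names).
    """
    score = w0 + sum(wj * xj for wj, xj in zip(w, x))
    return sigmoid(score)

# A score of zero sits exactly on the decision boundary.
p = predict_proba([1.0, 2.0], w0=0.0, w=[0.0, 0.0])
```

A common convention is to predict class 1 when the probability is at least 0.5 and class 0 otherwise.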

ML Zoomcamp 2023 – Machine Learning for Classification – Part 8

One-hot encoding

One-hot encoding is a technique used in machine learning to convert categorical (non-numeric) data into a numeric format that can be used by machine learning algorithms. It’s particularly useful when working with algorithms that require numerical input, such as many classification and regression models. Scikit-Learn, a popular machine learning library in Python, provides …

ML Zoomcamp 2023 – Machine Learning for Classification – Part 7

Feature importance: Correlation

To measure feature importance for numerical variables, one common approach is to use Pearson’s correlation coefficient, which quantifies the degree of linear dependency between two numerical variables. The correlation coefficient (often denoted as “r”) ranges from -1 to 1. The strength of the …
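As a worked example, Pearson's r can be computed directly from its definition. The tenure and churn values below are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: customers with longer tenure churn less often.
tenure = [1, 5, 12, 30, 60, 72]
churn = [1, 1, 1, 0, 0, 0]
r = pearson_r(tenure, churn)
```

A negative `r` here means that churn tends to decrease as tenure increases.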

ML Zoomcamp 2023 – Machine Learning for Classification – Part 6

Feature importance: Mutual information

The risk ratio provides valuable insights into the importance of different categorical variables, particularly when examining the likelihood of churn for each value within a variable. For example, when analyzing the “contract” variable with values like “month-to-month,” “one_year,” and “two_year,” we can observe that customers with a “month-to-month” contract are …
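Mutual information summarizes in a single number how much knowing one variable tells us about another. The sketch below computes it from the definition in plain Python. The helper name `mutual_information` and the tiny datasets are assumptions for illustration, not the post's own code.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) combination
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts.
        ratio = p_xy * n * n / (count_x[x] * count_y[y])
        mi += p_xy * math.log(ratio)
    return mi

churn = [1, 1, 0, 0]
# Hypothetical case where the contract type fully determines churn.
contract = ["month-to-month", "month-to-month", "two_year", "two_year"]
# Hypothetical case where gender is independent of churn.
gender = ["male", "female", "male", "female"]

mi_contract = mutual_information(contract, churn)
mi_gender = mutual_information(gender, churn)
```

The higher the score, the more informative the variable is about the target. A score of zero means the two are independent in the sample.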

ML Zoomcamp 2023 – Machine Learning for Classification – Part 5

Feature importance: Churn rate and risk ratio

Feature importance analysis is a part of exploratory data analysis (EDA) and involves identifying which features affect our target variable.

Churn rate

Last time, we examined the global churn rate. Now, we are focusing on the churn rate within different groups. For example, we are interested in determining …
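A possible way to compare group churn rates with the global rate is sketched below. It assumes the risk ratio is defined as the group rate divided by the global rate, and the small dataframe is invented for illustration.

```python
import pandas as pd

# Hypothetical data: 8 customers, 3 of whom churned.
df = pd.DataFrame({
    "partner": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "churn":   [0, 0, 0, 1, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the share of ones, i.e. the churn rate.
global_churn = df["churn"].mean()

# Churn rate and group size for each value of the variable.
group = df.groupby("partner")["churn"].agg(["mean", "count"])

# Difference: group rate minus global rate (positive means more churn).
group["diff"] = group["mean"] - global_churn

# Risk ratio (assumed definition): group rate divided by global rate.
group["risk"] = group["mean"] / global_churn
```

A risk ratio above 1 marks a group that churns more than average, and a value below 1 marks a group that churns less.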

ML Zoomcamp 2023 – Machine Learning for Classification – Part 4

EDA – Exploratory Data Analysis

The topics that we cover in this section are:

Checking missing values

The following snippet indicates that the dataset ‘df_full_train’ contains no missing values:

Looking at the target variable (churn)

The first thing we can check is the distribution of our target variable ‘churn’. How many customers are churning and how …
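The two checks mentioned above could look roughly like this. The small dataframe is a stand-in for `df_full_train`, with invented values.

```python
import pandas as pd

# Hypothetical stand-in for df_full_train.
df_full_train = pd.DataFrame({
    "tenure": [12, 42, 5, 60],
    "churn": [0, 0, 1, 0],
})

# Number of missing values per column; all zeros means nothing is missing.
missing_per_column = df_full_train.isnull().sum()

# Distribution of the target: absolute counts and relative shares.
churn_counts = df_full_train["churn"].value_counts()
churn_share = df_full_train["churn"].value_counts(normalize=True)

# Because churn is coded as 0/1, its mean equals the global churn rate.
global_churn_rate = df_full_train["churn"].mean()
```

Passing `normalize=True` to `value_counts` returns shares instead of counts, so the share of the value 1 matches the global churn rate.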

ML Zoomcamp 2023 – Machine Learning for Classification – Part 3

Setting up the validation framework

Perform the train/validation/test split with Scikit-Learn

You can utilize the train_test_split function from the sklearn.model_selection package to automate the splitting of your data into training, validation, and test sets. Before you can use it, make sure to import it first as follows: The train_test_split function divides the dataframe into two …
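Since train_test_split produces two parts per call, a three-way split needs two calls. The sketch below assumes a 60/20/20 split and a fixed `random_state`. Both choices, like the toy dataframe, are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy dataframe standing in for the churn dataset.
df = pd.DataFrame({"tenure": range(100), "churn": [i % 2 for i in range(100)]})

# First call: hold out 20% of all rows as the test set.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)

# Second call: 25% of the remaining 80% equals 20% of the original data.
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

sizes = (len(df_train), len(df_val), len(df_test))
```

Note that the second `test_size` is 0.25 rather than 0.2, because it is taken relative to `df_full_train`, not to the full dataframe.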