Logistic Regression In Machine Learning

Winklix LLC
4 min readJul 30, 2020

--

Hello readers, in this article, we will study about a very commonly used classification algorithm in machine learning, known as logistic regression. Logistic regression is used in predictions where we have a binary output for the dependent variable. It involves modelling the dependent variables as a linear combination of the independent variables. Some of the common use cases of classification algorithms involve — given a set of parameters, predicting employee attrition in an organisation, whether a person has a disease or not, whether it will rain on a given day etc. The list of possible use cases is endless. The point to take away from this is — We use logistic regression to model a binary (yes/no) output to the data.

In order to understand this article better, we suggest you understand the basics of logistic regression and then go through the implementation. We have listed down the core concepts required to understand the concept behind logistic regression below -

  1. Conditional Probability and odds
  2. Bayes Theorem
  3. Basic mathematical functions like logarithm and exponent
  4. Sigmoid function
  5. Decision boundary
  6. Cost function
  7. Confusion Matrix
  8. ROC and AUC curves

Assuming that you have a base clarity on the terms mentioned above, we will now deep dive into how do we go about implementing a logistic regression model in python. For the analysis of the data, we would use “Employee Attrition” data for analysis.

Also Read : What Is Machine Learning & Its Applicaiton ?

So without any further ado, let us begin -

  1. Importing Python Packages
  1. Loading the data

Once we are done importing the desired packages, we will now load the data.

  1. Exploratory Analysis

As we can see, the data shows that roughly 80% of the employees in the organisation stayed and 20% left.

Upon further exploration, we see that the attrition is maximum in the HR department of the organisation, followed by accounting and technical departments

  1. Data Encoding

Since logistic regression does not work on categorical data, we need to encode them in order to fit the model. There are several ways to encode a categorical data. However, we have used label encoding in the code.

  1. Feature Elimination

Data contains a lot of features. However, not all of them may be good predictors. In other words, they may not influence the output. In such a scenario, we need to eliminate such features. One such technique to do the same is called — Recursive Feature Elimination (RFE). Understanding the concept behind RFE will go beyond the scope of this article, hence we will stick to its implementation alone.

  1. Model Fitting

Once we have a list of features that we need to use, we will now fit our logistic regression model to the data.

We again fit the data to the model, this time using sklearn package. To do that, we split the data into 70% train and 30% test data.

We then use the model to predict the values.

Once we gave fit the data into the model, we would use 10-fold cross validation method to increase the accuracy of our model

  1. Confusion Matrix

Printing the confusion matrix to evaluate model results.

  1. ROC and AUC Curve

A Receiver Operating Characteristic curve (ROC) is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

AUC stands for Area Under Curve. It is the integration sum of all the values inside a ROC. The higher this value, the better the model.

That’s it!

You have successfully implemented a logistic regression model in python.

Until next time, bye bye!

--

--

Winklix LLC

Digital Transformation | Mobile App Development | SAP Consultant | Salesforce Consultancy Service