Model Building Process In Python
Hello readers! Continuing our series of articles on learning machine learning with Python, I realised that while we have been covering the essential algorithms and concepts, it is just as important to understand the model building process itself.
In other words, how we build a model, and how we go about testing and validating it on the train and test data. Knowing this process is important for any Python developer or Python development company. A standard set of steps makes model building repeatable and easier to review.
That said, even though model building can be described as a set of steps, it is also a highly subjective and individual process.
Two people can have different outlooks on the type of algorithm to use and the process to follow for the same dataset. One might prefer logistic regression while another prefers a neural network. Hence, model building is an extremely diverse field.
In this article we will try to understand the basic steps involved in any model building process. Please note that these steps are not sacrosanct and may vary depending on the kind of application that we are trying to build.
So, without any further delay, let us begin.
Typically, building a machine learning model involves the following steps:
- Load the data
- Split it into train and test sets
- Build a model
- Fit the model
- Evaluate the model
- Make predictions
As a Python developer, you may already have a feel for what these steps mean. If any of them is unclear, do not worry. We have got you covered. Let us go through each step in detail.
LOAD THE DATA
This is the first step in any machine learning workflow. If you do not load the data, how will you process what is in it? There are several ways to load data into your code. You can load the data from:
- Local machine
- Google Drive
- Kaggle
- Third-party sources
You can also read a wide range of file types, including CSV and Excel (xls) files. pandas is the most widely used library for reading such data, unless the data is too complex to fit into a table.
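To make this concrete, here is a minimal sketch of loading a CSV with pandas. The column names and values are made up for illustration, and the CSV lives in an in-memory string so the snippet runs without any file on disk. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """area,bedrooms,price
1200,2,200000
1500,3,260000
900,1,150000
2000,4,340000
"""

# pd.read_csv accepts a file path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # number of rows and columns
print(df.head())  # first few rows, for a quick sanity check
```

Looking at `shape` and `head()` right after loading is a cheap way to confirm that the data was parsed the way we expected.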
SPLIT INTO TRAIN AND TEST
Once the data is loaded, the next step in the model building process is to split it into train and test sets. This lets us test how well the model performs on data that it has not seen. The split does not prevent overfitting by itself, but it lets us detect it: a model that scores well on the training data and poorly on the test data is overfitted, while a model that scores poorly on both is underfitted. The data is commonly split in the ratio 70:30, so we train our model on 70% of the data and then test it on the remaining 30%. We can then tune the hyperparameters to change the way our model behaves (ideally judging them on a separate validation split, so that the test set stays unseen) and subsequently use the resulting model for deployment.
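As a rough sketch, a 70:30 split could look like this with scikit-learn's `train_test_split`. The data here is synthetic and exists only to make the snippet runnable; the `random_state` value is an arbitrary choice that keeps the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 3 features (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 30% of the rows for testing; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```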
BUILD A MODEL
By now, we have loaded the data and split it into train and test sets. The next step in the process is to build a model around the given data. Choosing which model to build is a different ball game and requires its own analysis, which is beyond the scope of this article. We will assume that we already know which model we need. Once the kind of model is finalised, we go ahead and build it.
A good practice for a Python developer building a machine learning model is to follow the principles of Object Oriented Programming.
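Here is a small sketch of what "building" can look like in scikit-learn, where every model is an object. We assume logistic regression purely as an example, and the hyperparameter values are arbitrary. Note that nothing has been learned at this point; we have only described the model we want.

```python
from sklearn.linear_model import LogisticRegression

# Building the model: choose the algorithm and its hyperparameters.
# C (inverse regularisation strength) and max_iter are illustrative values.
model = LogisticRegression(C=1.0, max_iter=200)

# The configuration is stored on the object...
print(model.get_params()["C"])

# ...but no coefficients exist yet, because the model has not seen any data.
print(hasattr(model, "coef_"))  # False
```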
FIT THE MODEL
Often, people get confused between building the model and fitting the model. The difference is pretty simple. When you build a model, you are just letting the machine know which model you will be going ahead with, what its configuration will be, and what its hyperparameters are. When you fit the model, you actually plug in the values from the training data in order to arrive at the finalised model. The values provided to the model are used to train it and come up with the coefficients. The test data is kept aside at this stage.
So, once we are done building the model, we then use the training data to fit the model.
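The difference between the two steps can be sketched as follows. We assume a scikit-learn `LinearRegression` and synthetic, noise-free training data, so the learned coefficients should land very close to the values used to generate the targets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data generated from y = 4*x1 - 2*x2 + 1 (illustrative).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(70, 2))
y_train = 4.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + 1.0

# Build: only the configuration exists, no coefficients yet.
model = LinearRegression()
print(hasattr(model, "coef_"))  # False

# Fit: the training data is used to estimate the coefficients.
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)
```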
There are several ways to fit the model. One popular method is gradient descent. It uses the derivatives (the gradient) of the cost function to move the parameters step by step towards a minimum. The parameter values at which the cost is lowest give the best-fitting model for the given problem. Several other algorithms can also be used to fit the model.
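To show the idea, here is a minimal sketch of gradient descent fitting a straight line by minimising the mean squared error. The data, learning rate, and number of iterations are all assumptions chosen for illustration; libraries generally use more refined optimisers than this.

```python
import numpy as np

# Synthetic data from y = 3x + 2 plus a little noise (illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.05, size=200)

w, b = 0.0, 0.0       # start from an arbitrary guess
learning_rate = 0.1   # step size, an assumed value

for _ in range(2000):
    error = w * x + b - y
    # Partial derivatives of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step against the gradient to reduce the cost.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should end up close to 3 and 2
```

Each pass nudges `w` and `b` in the direction that lowers the cost, which is all that "fitting" means here.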
EVALUATE THE MODEL
So far, we have loaded the data and split it into train and test sets. We have then built the model and fitted it to the training data. But how do we now evaluate how well our model is performing? Which metric tells us that the model is performing the way it was meant to? All such questions get answered at this stage in the model building process. Here we use different metrics depending on the kind of problem we are solving: error metrics such as RMSE, MSE, and MAPE for regression, and accuracy for classification. There are many metrics to choose from. However, some of the most commonly used ones are:
- Root Mean Squared Error (RMSE)
- Mean Squared Error (MSE)
- Mean Absolute Percentage Error (MAPE)
- Accuracy
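The metrics listed above can be computed by hand with NumPy, as in the sketch below. The true and predicted values are made up for illustration. scikit-learn's `sklearn.metrics` module offers ready-made versions of these as well.

```python
import numpy as np

# Regression example: made-up true values and predictions.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 380.0])

mse = np.mean((y_true - y_pred) ** 2)                      # Mean Squared Error
rmse = np.sqrt(mse)                                        # Root Mean Squared Error
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # in percent

# Classification example: made-up true labels and predicted labels.
labels_true = np.array([1, 0, 1, 1, 0])
labels_pred = np.array([1, 0, 0, 1, 0])
accuracy = np.mean(labels_true == labels_pred)             # share of correct labels

print(mse, rmse, mape, accuracy)
```

For this toy data the MSE is 375, the RMSE is a little over 19, the MAPE is about 7.5%, and the accuracy is 0.8 (four of five labels correct).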
MAKING PREDICTIONS
We have now reached the stage where the actual magic happens. After carefully selecting the hyperparameters and fitting the model to the training data, we picked the model that suited the problem best according to the evaluation metric. We now use this model to predict on the test data to see how well it handles data it has never seen. We check for deviations from the expected behaviour and make changes to the model if the accuracy is too low. If the accuracy on the training data was much higher, overfitting is the likely cause. We also try to consider all possibilities to check that we did not miss any case.
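Putting it together, the prediction step might look like the sketch below. We assume a logistic regression classifier and a synthetic two-feature dataset, both chosen only for illustration. Comparing train and test accuracy is one simple way to spot overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training data only.
model = LogisticRegression().fit(X_train, y_train)

# Predict on data the model has never seen.
y_pred = model.predict(X_test)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)

# A test score far below the train score would hint at overfitting.
print(train_acc, test_acc)
```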
Voila, we have reached the end of this article.
We hope that this article helps you understand the basics of model building and supports you in your machine learning journey.
If you want us to cover any specific topic in the posts to come, let us know in the comment section below.
Until next time, bye bye!