Simple Linear Regression Algorithm in Python using NumPy, Pandas and scikit-learn





Simple Linear Regression is a basic Machine Learning algorithm that predicts a continuous output from only one independent variable. Before going deeper into Simple Linear Regression, let us review some basics of coordinate geometry.

Image Courtesy of Unsplash


Independent Variable (IV)

        It is a variable that does not depend on any other value or parameter. It is represented by x.

Dependent Variable (DV)

          It is a variable whose value depends on other parameters or values. In simple linear regression it depends on a single independent variable. It is represented by y.

Mathematical Equation for Simple Linear Regression

                     y = kx + l
                   y - Dependent variable
                   x - Independent variable
                   k - The slope of the line
                   l - The intercept (the value of y when x = 0)
Simple linear regression fits a straight line through data points that are scattered across the X-axis and Y-axis, choosing the line that lies as close as possible to all of the points.
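To make the equation concrete, here is a minimal sketch of how the slope k and the intercept l can be estimated with NumPy using the standard least-squares formulas. The five data points are made up for illustration and lie exactly on the line y = 2x + 1, so the expected result is easy to check by hand.

```python
import numpy as np

# Illustrative data only: points that lie exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Least-squares estimates for the line y = kx + l
x_mean, y_mean = x.mean(), y.mean()
k = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
l = y_mean - k * x_mean                                              # intercept

print(f"slope k = {k:.2f}, intercept l = {l:.2f}")
# prints: slope k = 2.00, intercept l = 1.00
```

With real, noisy data the points will not fall exactly on a line, and the same formulas give the line with the smallest sum of squared vertical distances to the points.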

                          

Simple Linear Regression Algorithm in Python

   Requirements:
      A Python environment (Spyder or PyCharm) or Google Colab, plus the NumPy, Pandas, Matplotlib, and scikit-learn libraries.

How to install NumPy and Pandas libraries in Python?

For Google Colab (NumPy, Pandas, and scikit-learn come preinstalled, so no separate installation is needed):

  1. Go to the Google Colab link provided in Step 2 of the algorithm
  2. Sign in with your Google account
  3. Create a new notebook
  4. Go to the Runtime section and select a runtime type (GPU, TPU, or None)
  5. Write code in cells and execute it

Step 1 (Importing Required Libraries)

   Import the required libraries: NumPy, Pandas, and Matplotlib.
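A minimal sketch of the imports might look like the following. The aliases np, pd, and plt are the usual conventions rather than requirements, and printing the versions is just an optional way to confirm that every library is available in your environment.

```python
import numpy as np               # numerical arrays
import pandas as pd              # loading and handling the CSV dataset
import matplotlib.pyplot as plt  # plotting the data and the fitted line
import sklearn                   # scikit-learn: splitting, scaling, and the model

# Print the installed versions to confirm that each library can be imported
print("NumPy:", np.__version__)
print("Pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```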

Step 2 (Setting up for Coding)

Here I am using Google Colab for demonstration purposes; you can also use offline environments such as Anaconda or PyCharm.

         Open Google Colab by clicking on the link below

             Go to Google Colab

Step 3 (Dataset Downloading)

After downloading the datasets, open the one available under simple linear regression.

          

Step 4 (Importing Dataset and choosing independent and dependent variables)

The dataset is in Comma Separated Values (CSV) format. It has two columns: one is years of experience and the other is the salary corresponding to that experience.

Here the dependent variable is salary and the independent variable is years of experience.
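As a rough sketch, loading the file and separating the two variables could look like this. The column names YearsExperience and Salary and all of the values below are placeholders I made up so the example runs on its own; with the downloaded file you would pass its path to pd.read_csv and use its actual column names.

```python
import io
import pandas as pd

# Placeholder rows standing in for the real file; the values are made up.
# With the downloaded dataset, call pd.read_csv("<path to your CSV>") instead.
csv_text = """YearsExperience,Salary
1.0,40000
2.0,45000
3.0,50000
4.0,55000
5.0,60000
6.0,65000
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# Independent variable: years of experience, kept 2D (rows x 1 column) for scikit-learn
X = dataset[["YearsExperience"]].values
# Dependent variable: salary, as a 1D array
y = dataset["Salary"].values

print(X.shape, y.shape)
# prints: (6, 1) (6,)
```

Keeping X two-dimensional matters because scikit-learn estimators expect the features as a table of rows and columns, even when there is only one column.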

Step 5 (Splitting data into training and testing data)

Split the data into 67% training and 33% testing data. The model learns from the training part, and the testing part is held back to check how well it generalizes.
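One common way to do the split is scikit-learn's train_test_split, sketched below. The 30 rows of data are made up for illustration, and test_size=0.33 and random_state=0 are my assumptions for a one-third test split that is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 30 rows, one feature column
X = np.arange(1, 31, dtype=float).reshape(-1, 1)   # years of experience
y = 30000.0 + 5000.0 * X.ravel()                   # salary

# test_size=0.33 holds out roughly one third of the rows for testing;
# random_state fixes the shuffle so the split is the same on every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

print(len(X_train), len(X_test))
# prints: 20 10
```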


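Step 6 (Feature Scaling)

Feature scaling brings the values into a common range so that features with large magnitudes do not dominate the others. With a single feature, as in simple linear regression, this step is optional, but it is good practice for models with several features.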
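A minimal sketch of feature scaling, assuming standardization with scikit-learn's StandardScaler (the post does not say which scaler was used, so this choice is an assumption). The values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up stand-in values for the feature column
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[6.0], [7.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std on training data only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on the test data

# After standardization the training column has mean close to 0 and std close to 1
print(X_train_scaled.mean(), X_train_scaled.std())
```

The scaler is fitted only on the training data so that no information from the test set leaks into the preprocessing.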

Step 7 (Fitting to Linear Regression or Model Training)

This is the heart of the algorithm: the model learns the slope and intercept of the best-fit line from the training data.
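Training can be sketched with scikit-learn's LinearRegression as follows. The training values are made up and lie exactly on the line salary = 30000 + 5000 * years, so the learned slope and intercept should come out very close to 5000 and 30000.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on salary = 30000 + 5000 * years
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 30000.0 + 5000.0 * X_train.ravel()

regressor = LinearRegression()
regressor.fit(X_train, y_train)   # estimates the slope and the intercept

print("slope:", regressor.coef_[0])
print("intercept:", regressor.intercept_)
```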

Step 8 (Predicting with new inputs)

Given a new value for years of experience, the trained model can predict the corresponding salary.
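Prediction might look like the sketch below. The model is trained on made-up data first so that the example runs on its own, and the new input of 6 years is just an illustrative value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Train on made-up data lying exactly on salary = 30000 + 5000 * years
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 30000.0 + 5000.0 * X_train.ravel()
regressor = LinearRegression().fit(X_train, y_train)

# New inputs must be 2D as well: one row per sample, one column per feature
new_experience = np.array([[6.0]])
predicted_salary = regressor.predict(new_experience)

print(predicted_salary[0])   # approximately 60000 for this made-up data
```

If the features were scaled before training, the same fitted scaler has to be applied to the new input before calling predict.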

Step 9 (Visualization of Data)

  Visualize the data points in 2D and examine how well the fitted line matches them.
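A possible way to draw the plot with Matplotlib is sketched below. The noisy data is generated at random purely for illustration, and the colors and labels are my own choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Made-up noisy data around salary = 30000 + 5000 * years
rng = np.random.default_rng(0)
X = np.linspace(1, 10, 20).reshape(-1, 1)
y = 30000.0 + 5000.0 * X.ravel() + rng.normal(0, 2000, size=20)

regressor = LinearRegression().fit(X, y)

fig, ax = plt.subplots()
ax.scatter(X.ravel(), y, color="red", label="Data points")                  # observed values
ax.plot(X.ravel(), regressor.predict(X), color="blue", label="Fitted line")  # model output
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary")
ax.set_title("Salary vs. experience")
ax.legend()
plt.show()
```

Points that sit far from the blue line are the cases where the model's prediction differs most from the observed salary.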



That's great! You have reached the end.
Thank you 









