



Simple Linear Regression Implementation From Scratch

Part 4/5 in Linear Regression

Part 1 : Linear Regression From Scratch.

Part 2 : Linear Regression Line Through Brute Force.

Part 3 : Linear Regression Complete Derivation.

Part 4 : Simple Linear Regression Implementation From Scratch.

Part 5 : Simple Linear Regression Implementation Using Scikit-Learn.

In the last article we derived a formula for the “best fit” regression line. Now it’s time to implement it in Python. In this article we will not use libraries such as scikit-learn to find the slope and intercept of the line; instead, we will implement Simple Linear Regression with basic Python code and user-defined functions, using the formula we derived in the last article. We will use matplotlib to visualize the data and the regression line, but if you only want to find the regression line you can skip the plotting code.

(1) Importing the required libraries :

(2) Read the csv file :

There are more columns in our data but due to limited space I can only show a few here.

Read CSV File
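The code for steps (1) and (2) is not reproduced here, so the following is only a minimal sketch. It assumes pandas is used for loading, and that the dataset has columns named `ENGINESIZE`, `CYLINDERS`, `FUELCONSUMPTION_COMB` and `CO2EMISSIONS`. Those names and the sample rows are made up for illustration so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: a few made-up rows so the
# example runs on its own. With the actual dataset you would pass its
# path to pd.read_csv instead of the StringIO object.
SAMPLE_CSV = """ENGINESIZE,CYLINDERS,FUELCONSUMPTION_COMB,CO2EMISSIONS
2.0,4,8.5,196
2.4,4,9.6,221
1.5,4,5.9,136
3.5,6,11.1,255
3.7,6,11.6,267
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Show the first rows to confirm the file was parsed as expected.
print(df.head())
```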

(3) Find out the columns in our data :

Columns

(4) Find additional information about our data :

(5) Print various statistical data of our dataset :

Analysis of Data
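Steps (3) to (5) can be sketched with three standard pandas calls: `DataFrame.columns`, `DataFrame.info()` and `DataFrame.describe()`. The data frame below is a made-up stand-in with assumed column names, not the real dataset.

```python
import io

import pandas as pd

# Made-up rows standing in for the real dataset (column names are assumed).
SAMPLE_CSV = """ENGINESIZE,CYLINDERS,CO2EMISSIONS
2.0,4,196
2.4,4,221
1.5,4,136
3.5,6,255
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step (3): column names.
columns = list(df.columns)
print(columns)

# Step (4): dtypes, non-null counts and memory usage.
df.info()

# Step (5): count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)
```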

(6) Select useful features from our dataset :

(7) Plot the data with its value counts :

Histogram

(8) Plot the data on scatter plot to find out which feature can be used to make the predictions.

Plotted Data

Here we can see that a regression line can easily be fitted to the ENGINE SIZE vs CO2 EMISSION plot.
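Steps (6) to (8) can be sketched roughly as follows. This is an illustration under assumptions, not the original code: the column names and values are made up, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the real dataset (column names are assumed).
df = pd.DataFrame({
    "ENGINESIZE": [1.5, 2.0, 2.4, 3.5, 3.7, 5.0],
    "CYLINDERS": [4, 4, 4, 6, 6, 8],
    "MODELYEAR": [2014] * 6,
    "CO2EMISSIONS": [136, 196, 221, 255, 267, 340],
})

# Step (6): keep only the columns that look useful for prediction.
features = df[["ENGINESIZE", "CYLINDERS", "CO2EMISSIONS"]]

# Step (7): one histogram per selected column.
features.hist()

# Step (8): scatter plot of engine size against CO2 emission.
fig, ax = plt.subplots()
ax.scatter(features["ENGINESIZE"], features["CO2EMISSIONS"], color="blue")
ax.set_xlabel("ENGINE SIZE")
ax.set_ylabel("CO2 EMISSION")
```

In an interactive session, calling `plt.show()` at the end displays both figures.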

(9) Now we will divide our dataset into two parts: one for training and one for testing. We’ll use 80% of the data for training and 20% to test our predictions.
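One way to do an 80/20 split with plain Python is to shuffle the row indices and cut the shuffled list at the 80% mark. The helper name `train_test_split`, the seed and the sample values below are assumptions for illustration.

```python
import random


def train_test_split(xs, ys, train_fraction=0.8, seed=42):
    """Shuffle paired samples and split them into training and testing parts."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


# Made-up values used only to show the call.
engine_size = [1.5, 2.0, 2.4, 3.0, 3.5, 3.7, 4.0, 4.6, 5.0, 5.7]
co2_emission = [136, 196, 221, 230, 255, 267, 290, 320, 340, 380]

train_x, train_y, test_x, test_y = train_test_split(engine_size, co2_emission)
print(len(train_x), len(test_x))  # 8 2
```

Shuffling before cutting matters when the file is sorted (for example by engine size); otherwise the test set would only contain one end of the range.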

(10) Finding the mean of CO2-EMISSION :

Average

(11) Main function to find slope and intercept. Go check out my last article to understand the derivation of the formula used here.

Main Function
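A minimal sketch of such a function, assuming the formula from the last article is the standard least-squares closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The function names are illustrative.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def slope_intercept(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_mean, y_mean = mean(xs), mean(ys)
    # Numerator: sum of (x - x_mean) * (y - y_mean)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of (x - x_mean) squared
    denominator = sum((x - x_mean) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Points that lie exactly on y = 2x + 1, so the expected answer is known.
slope, intercept = slope_intercept([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(slope, intercept)  # 2.0 1.0
```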

(12) Testing our function with basic data :

Testing our Function

Voila! It works perfectly!!

(13) Finding the Slope and Intercept for our actual data :

Finding parameters

(14) Now that we have our Slope and Intercept with us we can make our regression line :

(15) Plot the regression line to visualize it :

Data Visualization
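Steps (14) and (15) can be sketched as below. The points and the `slope` and `intercept` values are placeholders chosen for illustration, not the parameters fitted on the real data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up training points and parameters, used only to make the plot runnable.
train_x = [1.5, 2.0, 2.4, 3.5, 3.7, 5.0]
train_y = [136, 196, 221, 255, 267, 340]
slope, intercept = 50.0, 90.0  # placeholders, not fitted values

# Step (14): the regression line evaluated at every training x.
line_y = [slope * x + intercept for x in train_x]

# Step (15): data as points, model as a line on the same axes.
fig, ax = plt.subplots()
ax.scatter(train_x, train_y, color="blue", label="training data")
ax.plot(train_x, line_y, color="red", label="regression line")
ax.set_xlabel("ENGINE SIZE")
ax.set_ylabel("CO2 EMISSION")
ax.legend()
```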

(16) Now we’ll predict the values with our model. But first we need to make a function for that :

Prediction
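A prediction function only needs to evaluate the line at a given x. The sketch below uses hypothetical names and placeholder parameters.

```python
def predict(x, slope, intercept):
    """Predicted y for a single x on the line y = slope * x + intercept."""
    return slope * x + intercept


def predict_all(xs, slope, intercept):
    """Predicted y for every x in a sequence."""
    return [predict(x, slope, intercept) for x in xs]


# Placeholder parameters (not the values fitted in the article).
slope, intercept = 2.0, 1.0
print(predict(4, slope, intercept))              # 9.0
print(predict_all([0, 1, 2], slope, intercept))  # [1.0, 3.0, 5.0]
```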

(17) Can we predict the engine-size from co2-emission? Of course!! Here’s how to do it.

Reverse Prediction
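One simple way to go in the reverse direction is to solve the line equation for x, giving x = (y − intercept) / slope. The function name and parameters below are illustrative.

```python
def predict_x(y, slope, intercept):
    """Solve y = slope * x + intercept for x."""
    if slope == 0:
        raise ValueError("a horizontal line cannot be inverted")
    return (y - intercept) / slope


# Placeholder parameters (not the values fitted in the article).
slope, intercept = 2.0, 1.0
print(predict_x(9.0, slope, intercept))  # 4.0
```

Note that inverting the fitted line is not the same as fitting a new regression with the two variables swapped; the two approaches generally give different lines unless the data is perfectly linear.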

Now it’s time to check how well our model performed in predicting the testing values. There are many methods to calculate the error/accuracy of a model. Here we’ll cover a few of them:

(1) Residual Sum of Squares (RSS) :

RSS Accuracy
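The residual sum of squares adds up the squared differences between actual and predicted values, so lower is better. A minimal sketch with made-up numbers:

```python
def residual_sum_of_squares(actual, predicted):
    """Sum of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Made-up values used only to show the call.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
print(residual_sum_of_squares(actual, predicted))  # 1.25
```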

(2) R-Squared :
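R-squared compares the model's squared error with the squared error of always predicting the mean: R² = 1 − RSS / TSS, where TSS is the total sum of squares around the mean of the actual values. A value of 1 means a perfect fit. The sketch below uses made-up numbers and an illustrative function name.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    actual_mean = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    tss = sum((a - actual_mean) ** 2 for a in actual)
    if tss == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - rss / tss


# Made-up values: RSS = 1.0 and TSS = 5.0.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(actual, predicted))  # 0.8
```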

MAE Accuracy
MSE Accuracy
MAPE Accuracy
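The three remaining metrics can be sketched in a few lines each, using their standard definitions: MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and MAPE is the mean absolute error relative to the actual value, expressed as a percentage. The function names and sample values are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (actual - predicted) squared."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, as a percentage."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Made-up values used only to show the calls.
actual = [100, 200, 400, 500]
predicted = [110, 180, 400, 500]
print(mean_absolute_error(actual, predicted))             # 7.5
print(mean_squared_error(actual, predicted))              # 125.0
print(mean_absolute_percentage_error(actual, predicted))  # 5.0
```

MSE penalizes large errors more heavily than MAE because the errors are squared, while MAPE is scale-free but breaks down when an actual value is zero.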

In summary, in this article we saw how to implement simple linear regression without scikit-learn. It’s a lot of work, right? But wait! There is an easier way to perform the same calculations and get the same output using Python libraries. In the next article we’ll see how to perform these calculations in minutes with scikit-learn.

In future articles I will try to show which accuracy metric is best suited to different kinds of datasets.

You can follow me on Medium.
