Linear Regression!!!

  • Linear Regression is a supervised learning algorithm that models the relationship between a dependent variable (Y) and one or more independent variables (X).
  • It is used with continuous numeric data: by establishing a relationship between X and Y, it predicts a quantitative (continuous) target. The model relies on the following assumptions:
  • The independent variables should be linearly related to the dependent variable.
    This can be checked with visualization techniques such as scatter plots and correlation heatmaps.
  • The residuals (errors) are normally distributed. This assumption applies to the errors of the model, not to every feature (attribute) in the data.
    This can be checked with a histogram (or a Q-Q plot) of the residuals.
  • There should be little or no multicollinearity in the data.
    A common way to check for multicollinearity is to compute the Variance Inflation Factor (VIF) for each feature; a VIF above roughly 5–10 is usually treated as a warning sign.
  • The mean of the residuals is zero.
    A residual is the difference between an observed value and the value predicted by the regression line (the vertical distance from the data point to the line). The residuals should average out to zero, and the closer the individual residuals are to zero, the better the line fits the data.
    → If the observed points are far from the regression line, the residuals will be large, and so the cost function (e.g. the sum of squared errors) will be high.
  • No autocorrelation:
    The error terms should not be correlated with each other. Autocorrelation occurs when the residual errors depend on one another (common in time-series data), and it can seriously undermine the reliability of the model's estimates. The Durbin-Watson test is a common way to check for it.
  • Homoscedasticity Assumption (Same Variance):
    In regression, homoscedasticity means the residuals have the same variance at every level of the independent variables; if the spread of the residuals grows or shrinks as the predictions change, the data is heteroscedastic. The same equal-variance assumption appears in other statistical tests, such as Analysis of Variance (ANOVA) and Student’s t-test, where it requires the different groups being compared to have the same variance.
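The assumption checks above can be sketched with NumPy alone. This is a minimal sketch on synthetic data: the helper names (`fit_ols`, `vif`, `durbin_watson`, `variance_ratio`) are hypothetical, and `variance_ratio` is only a rough split-half comparison, not a formal homoscedasticity test such as Breusch-Pagan. In practice, statsmodels provides `variance_inflation_factor` and `durbin_watson` ready-made.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return coef, residuals


def vif(X):
    """VIF for each column j: 1 / (1 - R^2_j), where R^2_j comes from
    regressing column j on all the other columns."""
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, res = fit_ols(others, X[:, j])
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - np.sum(res ** 2) / ss_tot
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)


def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation."""
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)


def variance_ratio(fitted, residuals):
    """Rough homoscedasticity check (hypothetical helper): residual variance for the
    upper half of the fitted values divided by that for the lower half.
    Values near 1 are consistent with constant variance."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low, high = residuals[order[:half]], residuals[order[half:]]
    return np.var(high) / np.var(low)


# Synthetic data (made up for illustration)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)  # almost a copy of x1, so it is multicollinear
X = np.column_stack([x1, x2, x3])
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

coef, residuals = fit_ols(X, y)
fitted = y - residuals
print("VIF:", vif(X).round(1))                      # x1 and x3 get large VIFs
print("mean residual:", residuals.mean())           # ~0, guaranteed by the intercept
print("Durbin-Watson:", durbin_watson(residuals))   # near 2 for independent errors
print("variance ratio:", variance_ratio(fitted, residuals))
```

Note that a zero mean residual is automatic whenever the model includes an intercept, so the more informative checks are the residual plots, VIF, and Durbin-Watson statistic.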
  1. Simple Linear Regression:
    This type of regression finds a linear relationship between only two variables: one dependent variable (Y) and one independent variable (X).
    → The formula of Simple Linear Regression is like the straight-line formula y = mx + c, where m is the slope and c is the intercept.
    Our main goal is to find the values of ‘m’ and ‘c’ that give us the smallest sum of squared errors (SSE).
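    → The formula of Simple Linear Regression (the least-squares estimates of m and c): m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and c = ȳ − m·x̄, where x̄ and ȳ are the means of X and Y.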
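Finding m and c can be sketched directly from the standard closed-form least-squares estimates. This is a minimal sketch under stated assumptions: the data is a small made-up set that roughly follows y = 2x + 1, and the function names `simple_linear_regression` and `sse` are hypothetical. NumPy's `np.polyfit` with degree 1 is used only as a cross-check.

```python
import numpy as np


def simple_linear_regression(x, y):
    """Return the slope m and intercept c that minimise the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    c = y_mean - m * x_mean
    return m, c


def sse(x, y, m, c):
    """Sum of squared errors of the line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (m * x + c)) ** 2))


# Made-up toy data, roughly following y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = simple_linear_regression(x, y)
print(f"m = {m:.3f}, c = {c:.3f}")  # → m = 1.970, c = 1.090
print("SSE:", sse(x, y, m, c))

# Cross-check: np.polyfit with degree 1 returns [slope, intercept]
m_np, c_np = np.polyfit(x, y, 1)
```

Because these m and c minimise the SSE, nudging either of them (e.g. `sse(x, y, m + 0.1, c)`) can only make the SSE larger.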
  1. R Square/Adjusted R Square
    R-squared is a statistical measure of goodness of fit, i.e. how well the regression line fits the data.
    → It measures the strength of the relationship between the dependent and independent variables on a scale of 0–100% (or 0–1).
    → A high R-squared means there is little difference between the predicted and actual values, which indicates a good fit.
    → R-squared is also known as the Coefficient of Determination.
    → If we get 60% (0.6), we can say that including the independent variable(s) reduces the variation of the dependent variable around its mean by 60%.
    → In other words, 60% of the total sum of squares of the dependent variable is explained by the independent variable(s).
    → Adjusted R-squared modifies R-squared to penalize adding independent variables that do not improve the model, which makes it more suitable for comparing models with different numbers of variables.
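R-squared and adjusted R-squared can be computed from their standard definitions, R² = 1 − SSE/SST and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). This is a minimal sketch, assuming a small made-up data set fitted with `np.polyfit`; the helper names `r_squared` and `adjusted_r_squared` are hypothetical (scikit-learn offers `r2_score` for the plain R²).

```python
import numpy as np


def r_squared(y, y_pred):
    """R^2 = 1 - SSE / SST: the share of the variation in y around its mean explained by the model."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y - y_pred) ** 2)    # unexplained variation (sum of squared errors)
    sst = np.sum((y - y.mean()) ** 2)  # total variation of the dependent variable
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2: n = number of observations, p = number of independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up toy data with one independent variable
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])
m, c = np.polyfit(x, y, 1)  # slope and intercept of the least-squares line
y_pred = m * x + c

r2 = r_squared(y, y_pred)
adj = adjusted_r_squared(r2, n=len(y), p=1)
print("R^2:", round(r2, 4))
print("Adjusted R^2:", round(adj, 4))
```

A model that always predicts the mean of y has SSE = SST and therefore R² = 0, which is the baseline the "reduction in variation" interpretation is measured against.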

Thanks for Reading

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium here.
  2. Follow me on GitHub here.
  3. Follow me on LinkedIn here.
  4. Be one of the FIRST to follow me on Instagram here.

Prakhar Patel

Hello, I’m a computer student passionate about data science. I believe the best way to broaden our knowledge is to share it with people.