Sunday, 3 March 2019

Regression Analysis

Inferential Statistics

Regression analysis is a common method of prediction. It is used whenever there is a causal relationship between variables.

Points to note
  • Correlation doesn't imply causation: some variables are strongly correlated purely by coincidence, while others we would expect to be related show no correlation.
Linear Regression is a linear approximation of a causal relationship between two or more variables.
  • Regression models are highly valuable as they are one of the most common ways to make inferences and predictions. 
  • Process of linear regression
    • Get sample data
    • Design a model that works for the sample
    • Make predictions for the whole population
    • The dependent variable is predicted from the independent variables:
      Y = F(x1, x2, x3, ...)
      The dependent variable Y is a function of the independent variables x1, x2, ...
Simple Linear Regression is the simplest regression model.
  • ŷ = b0 + b1*x1 (the hat stands for an estimated or predicted value)
    • b0 is the intercept of the line (the predicted value when x1 = 0)
    • b1 is the slope of the line (the change in ŷ per one-unit change in x1)
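A minimal sketch of estimating b0 and b1 with NumPy, on made-up sample data (the x and y values below are hypothetical, chosen only for illustration):

    import numpy as np

    # Hypothetical sample: x = years of experience, y = salary in $1000s
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([35, 42, 47, 55, 58, 66, 70, 79], dtype=float)

    # Closed-form estimates for simple linear regression:
    #   b1 = cov(x, y) / var(x),  b0 = ybar - b1 * xbar
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    print(f"y_hat = {b0:.2f} + {b1:.2f} * x1")

    # Cross-check with NumPy's least-squares fit of a degree-1 polynomial
    slope, intercept = np.polyfit(x, y, deg=1)
    print(intercept, slope)  # should match b0 and b1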
Correlation vs Regression
  • Correlation measures the degree of relationship between two variables; it doesn't imply causation.
  • Regression analysis is about how one variable affects another.
  • Regression is based on causality. It shows not just a degree of connection but cause and effect.
  • Correlation is symmetric: corr(x, y) is the same as corr(y, x).
  • Regression is one-way: regressing y on x is not the same as regressing x on y (see the sketch below).
  • Graphically, a correlation is represented by a single point.
  • Regression is the best-fitting line through the data points, the one that minimizes the distances between the points and the line.
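The symmetry vs one-way distinction is easy to check numerically. A small sketch on synthetic data (the data-generating setup is an assumption for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 2 * x + rng.normal(size=200)     # y depends on x, plus noise

    # Correlation is symmetric: corr(x, y) == corr(y, x)
    print(np.corrcoef(x, y)[0, 1], np.corrcoef(y, x)[0, 1])

    # Regression is one-way: the slope of y-on-x is not the reciprocal
    # of the slope of x-on-y (unless the correlation is perfect)
    slope_yx = np.polyfit(x, y, 1)[0]    # regress y on x
    slope_xy = np.polyfit(y, x, 1)[0]    # regress x on y
    print(slope_yx, 1 / slope_xy)        # these differ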
Decomposing the linear model
  • Sum of squares total (SST) = Σ(yi - ȳ)² - squared differences between the observed values and the mean
  • Sum of squares regression (SSR) = Σ(ŷi - ȳ)² - squared differences between the predicted values and the mean
  • Sum of squares error (SSE) = Σ(yi - ŷi)² - squared differences between the observed and predicted values
SST = SSR + SSE
Total variability = explained variability + unexplained variability
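The identity can be verified numerically; it holds exactly for OLS with an intercept. A sketch on synthetic data:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=50)
    y = 3 + 2 * x + rng.normal(scale=2, size=50)

    slope, intercept = np.polyfit(x, y, 1)
    y_hat = intercept + slope * x

    sst = np.sum((y - y.mean()) ** 2)      # total variability
    ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the regression
    sse = np.sum((y - y_hat) ** 2)         # unexplained (residual)

    print(sst, ssr + sse)  # equal up to floating-point error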

R² (R-squared) = SSR/SST = variability explained by the regression / total variability
    The R-squared shows how much of the total variability of the dataset is explained by your regression model. This may be expressed as: how well your model fits your data. It is incorrect to say your regression line fits the data, as the line is just the geometrical representation of the regression equation. It is also incorrect to say the data fits the model or the regression line, as you are trying to explain the data with a model, not vice versa.
  • R-squared measures the goodness of fit of your model (see the snippet below).
  • The more factors you include, the higher the R-squared.
  • R-squared ranges between 0 and 1; 1 means the model explains the entire variability of the data.
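Given the decomposition above, R-squared is a one-liner. For simple linear regression it also equals the squared Pearson correlation between x and y, which gives a handy cross-check (synthetic data again, for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=50)
    y = 3 + 2 * x + rng.normal(scale=2, size=50)

    slope, intercept = np.polyfit(x, y, 1)
    y_hat = intercept + slope * x

    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r_squared = 1 - sse / sst              # same as SSR / SST

    # Cross-check: for simple linear regression, R^2 = corr(x, y)^2
    print(r_squared, np.corrcoef(x, y)[0, 1] ** 2)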

Ordinary least squares (OLS) minimizes the SSE:
    min Σ ei² = min Σ(yi - xi^T b)²
    S(b) = Σ(yi - xi^T b)² = (y - Xb)^T (y - Xb)
    S(b) is the objective function; the OLS estimator of beta is the b that minimizes it, b = (X^T X)^(-1) X^T y (see the sketch below).
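A sketch of the matrix form with NumPy, solving the normal equations (X^T X)b = X^T y directly; the data-generating coefficients are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

    # Design matrix with a leading column of ones for the intercept
    X = np.column_stack([np.ones(n), x1, x2])

    # b = (X^T X)^(-1) X^T y minimizes S(b) = (y - Xb)^T (y - Xb)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(b)  # approximately [1.0, 2.0, -0.5]

    # SSE at the minimum
    print(np.sum((y - X @ b) ** 2))

In practice np.linalg.lstsq (or a library such as statsmodels) is preferred over explicitly forming and inverting X^T X, since it is numerically more stable.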

Regression Tables
  • Model summary
    • Multiple R
    • R square
    • Adjusted R Square
    • Standard error - sqrt(SSE/(n-2))
    • Observations
  • ANOVA table (Analysis of Variance)
    • SSE
    • SSR
    • SST 
  • Table with coefficients (this is the heart of the regression)
    • Intercept (beta 0)
    • Independent variable coefficient (beta 1)
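All of the above can be produced with statsmodels' OLS summary (assuming statsmodels is installed; the data here are synthetic):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 10, size=60)
    y = 5 + 1.5 * x + rng.normal(scale=3, size=60)

    X = sm.add_constant(x)        # adds the intercept column
    model = sm.OLS(y, X).fit()
    print(model.summary())        # R-squared, adjusted R-squared, F-statistic,
                                  # and the coefficient table with beta 0 and beta 1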

Adjusted R Square
  • It penalizes the excessive use of variables.
  • The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
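A sketch of the penalty in action, using the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors; the junk predictor below is pure noise by construction:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 50
    x = rng.uniform(0, 10, size=n)
    junk = rng.normal(size=n)                # predictor unrelated to y
    y = 3 + 2 * x + rng.normal(scale=2, size=n)

    def r2_and_adjusted(X, y):
        # Fit OLS via least squares and return (R^2, adjusted R^2)
        Xd = np.column_stack([np.ones(len(y)), X])
        b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        y_hat = Xd @ b
        r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
        p = Xd.shape[1] - 1                  # number of predictors
        adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
        return r2, adj

    print(r2_and_adjusted(x.reshape(-1, 1), y))
    print(r2_and_adjusted(np.column_stack([x, junk]), y))
    # R^2 creeps up with the junk predictor; adjusted R^2 typically drops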
