Chapter 4 Linear Regression

Concepts:

  • Regression Line
  • Correlation
  • Mean Squared Errors and Root Mean Squared Errors
  • Estimate \(\hat{\sigma}\)
  • Ordinary Least Squares (OLS) Estimator \(\hat{\beta}\) BLUE
  • Covariance matrix \(\widehat{cov}(\hat{\beta})\)
  • Standard Error
  • Model output (t-values, p-values, F-statistics, p-value of F-statistic)
  • OLS Assumptions

4.1 Simple Linear Regression

Lets look at the Old faithful eruptions data again. In the dataset have eruption duration (mins) and waiting time between each eruption. We can look at the scatterplot and the line that best describes the pattern

eruptions waiting
3.600 79
1.800 54
3.333 74
2.283 62
4.533 85
2.883 55

Here we can see that the length of the eruptions are “positively correlated” with the waiting time i.e. longer the waiting time, longer the eruptions. We can also see that the points are clustered around two areas. The line that best fits the data is known as regression line.

We can do some basic calculations to build this linear model using r.

Mean Eruption times = 3.4877831
Variance Eruption times = 1.3027283
Standard Deviation of Eruption times = 1.1413713
Mean wait times = 70.8970588
Variance wait times = 184.8233124
Standard Deviation of wait times = 13.5949738
Correlation coefficient = 0.8974994
Slope of regression line =
Mean Square Errors

Running inbuilt function to run linear regression

## 
## Call:
## lm(formula = eruptions ~ waiting, data = faithful)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29917 -0.37689  0.03508  0.34909  1.19329 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
## waiting      0.075628   0.002219   34.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4965 on 270 degrees of freedom
## Multiple R-squared:  0.8115,	Adjusted R-squared:  0.8108 
## F-statistic:  1162 on 1 and 270 DF,  p-value: < 2.2e-16

Explanation of the model here {residuals, coefficients, Standard-errors (including residuals), significance, F-statistic}

Residual plot


4.2 Multiple Linear Regression

Extend simple linear regression to multiple linear regression.

Functional form \[y_i = \beta_1x_{i1} + ... \beta_px_{ip} + \epsilon_i \] We do not know true beta

\[Y = X\beta + \epsilon\] So we use \(\hat\beta\) to estimate \(\beta\) and OLS gives us the best result. Sum of squred of errors is minimized when \(\hat\beta = (X'X)^{-1}X'Y\)

Residuals \(e= Y - X\hat\beta\) and e is orthogonal to X

Ordinary Least Squares (involves some Matrix Algebra and mathematical proofs). Assumptions:
- Design matrix X has full rank. ie. there are more rows of data than the number of independent variables (features) in the dastaset - The unobservable random vector \(\epsilon\) is Independent and Identically distributed (IID) with mean 0 and variance of more than 0. - If design matrix X is random then \(\epsilon\) is independent of X.

Clarify the the difference between beta/betahat and e/epsilon

When the disturbance terms epsilon are normally distributed, the beta hat are the Maximum likely estimator for beta

Gradient Decent method Elaborate later

Define Loss Function Loss = residual . Calculate the Mean Squared of Errors

\[ E = \frac 1 n \sum_{i=0}^n(y_i - \bar y_i)^2 = \frac 1 n \sum_{i=0}^n(y_i - (mx_i +c))^2 \] Calculate the Gradient ie. partial derivative with respect to m and c to find the constants where the \(E\) is the smallest. Which are

\(D_m = \delta_m/\delta_y = \frac {-2} n \sum_{i=0}^n x_i(y_i - \bar y_i)\)

\(D_c = \delta_c/\delta_y = \frac {-2} n \sum_{i=0}^n (y_i - \bar y_i)\)

Update \(m\) and \(c\) using \(m = m - L \times D_m\) and \(c=c-L \times D_c\)

Repeat the \(m\) and \(c\) caluclation, plug the value into Loss function with value of x and y. The required \(m\) and \(c\) are when \(E = 0\).

Also need to avoid the local minima values so use multiple random starting point for values of \(m\) and \(c\) and do calculation multiple times (epochs)

Few things to test to ensure model fit

  • plot residual vs x (should have no trend)
  • Mean of residul should center around zero
  • Assumption that residual are normally distributed

4.2.1 Consequences of Incorrect Functional Form

Creating data

## 
## Call:
## lm(formula = y ~ x + I(x^2), data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6521 -3.2214 -0.1858  2.9871 11.4882 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.453364   1.428003   8.021 2.43e-12 ***
## x           -1.208440   0.065264 -18.516  < 2e-16 ***
## I(x^2)       0.020854   0.000626  33.311  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.665 on 97 degrees of freedom
## Multiple R-squared:  0.9774,	Adjusted R-squared:  0.9769 
## F-statistic:  2098 on 2 and 97 DF,  p-value: < 2.2e-16

Residuals for Quadratic Model

Fitting a linear model

## 
## Call:
## lm(formula = y ~ x, data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.077 -14.502  -3.712  12.844  33.444 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -24.35330    3.29860  -7.383 5.13e-11 ***
## x             0.89783    0.05671  15.833  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.37 on 98 degrees of freedom
## Multiple R-squared:  0.7189,	Adjusted R-squared:  0.7161 
## F-statistic: 250.7 on 1 and 98 DF,  p-value: < 2.2e-16

f-test Function

## [1] 6.724646e-55
## 
## Call:
## lm(formula = y ~ x + I(x^2), data = mydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6521 -3.2214 -0.1858  2.9871 11.4882 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.453364   1.428003   8.021 2.43e-12 ***
## x           -1.208440   0.065264 -18.516  < 2e-16 ***
## I(x^2)       0.020854   0.000626  33.311  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.665 on 97 degrees of freedom
## Multiple R-squared:  0.9774,	Adjusted R-squared:  0.9769 
## F-statistic:  2098 on 2 and 97 DF,  p-value: < 2.2e-16

Plots involving Yhat for Quadratic Model

Plots involving Yhat for Linear Model

4.2.2 Design Matrix Assumptions

Rank of Design Matrix for Model2

##   (Intercept) x I(x^2)
## 1           1 1      1
## 2           1 2      4
## 3           1 3      9
## 4           1 4     16
## 5           1 5     25
## 6           1 6     36
## [1] 100   3
## [1] 3
## attr(,"method")
## [1] "tolNorm2"
## attr(,"useGrad")
## [1] FALSE
## attr(,"tol")
## [1] 2.220446e-14
##   (Intercept) x I(x^2) z
## 1           1 1      1 1
## 2           1 2      4 2
## 3           1 3      9 3
## 4           1 4     16 4
## 5           1 5     25 5
## 6           1 6     36 6
## [1] 100   4
## [1] 3
## attr(,"method")
## [1] "tolNorm2"
## attr(,"useGrad")
## [1] FALSE
## attr(,"tol")
## [1] 2.220446e-14
## 
## Call:
## lm(formula = y ~ x + I(x^2) + z)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6521 -3.2214 -0.1858  2.9871 11.4882 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.453364   1.428003   8.021 2.43e-12 ***
## x           -1.208440   0.065264 -18.516  < 2e-16 ***
## I(x^2)       0.020854   0.000626  33.311  < 2e-16 ***
## z                  NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.665 on 97 degrees of freedom
## Multiple R-squared:  0.9774,	Adjusted R-squared:  0.9769 
## F-statistic:  2098 on 2 and 97 DF,  p-value: < 2.2e-16

## [1] 0.9993595
## 
## Call:
## lm(formula = y ~ x + I(x^2) + w)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5002 -3.2165 -0.2137  2.9738 10.7425 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.5987222  1.4380446   8.066 2.06e-12 ***
## x           -1.6267839  0.4624638  -3.518 0.000667 ***
## I(x^2)       0.0208978  0.0006284  33.256  < 2e-16 ***
## w            0.4130417  0.4520250   0.914 0.363133    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.669 on 96 degrees of freedom
## Multiple R-squared:  0.9776,	Adjusted R-squared:  0.9769 
## F-statistic:  1397 on 3 and 96 DF,  p-value: < 2.2e-16

4.2.3 NORMALITY OF ERROR TERMS

Normal Data

## 
## 	Shapiro-Wilk normality test
## 
## data:  e
## W = 0.99737, p-value = 0.1053

## 
## 	Shapiro-Wilk normality test
## 
## data:  e
## W = 0.95765, p-value < 2.2e-16

## 
## 	Shapiro-Wilk normality test
## 
## data:  e
## W = 0.9762, p-value = 0.06698

Achieving Normality with Box-Cox Transformations

##               [,1]       [,2]
##   [1,] -2.00000000 -397.92136
##   [2,] -1.95959596 -391.07290
##   [3,] -1.91919192 -384.25863
##   [4,] -1.87878788 -377.47860
##   [5,] -1.83838384 -370.73285
##   [6,] -1.79797980 -364.02145
##   [7,] -1.75757576 -357.34444
##   [8,] -1.71717172 -350.70173
##   [9,] -1.67676768 -344.09323
##  [10,] -1.63636364 -337.51874
##  [11,] -1.59595960 -330.97801
##  [12,] -1.55555556 -324.47075
##  [13,] -1.51515152 -317.99653
##  [14,] -1.47474747 -311.55492
##  [15,] -1.43434343 -305.14532
##  [16,] -1.39393939 -298.76710
##  [17,] -1.35353535 -292.41952
##  [18,] -1.31313131 -286.10171
##  [19,] -1.27272727 -279.81275
##  [20,] -1.23232323 -273.55156
##  [21,] -1.19191919 -267.31696
##  [22,] -1.15151515 -261.10768
##  [23,] -1.11111111 -254.92228
##  [24,] -1.07070707 -248.75925
##  [25,] -1.03030303 -242.61694
##  [26,] -0.98989899 -236.49355
##  [27,] -0.94949495 -230.38725
##  [28,] -0.90909091 -224.29600
##  [29,] -0.86868687 -218.21775
##  [30,] -0.82828283 -212.15037
##  [31,] -0.78787879 -206.09166
##  [32,] -0.74747475 -200.03947
##  [33,] -0.70707071 -193.99168
##  [34,] -0.66666667 -187.94628
##  [35,] -0.62626263 -181.90162
##  [36,] -0.58585859 -175.85624
##  [37,] -0.54545455 -169.80932
##  [38,] -0.50505051 -163.76098
##  [39,] -0.46464646 -157.71182
##  [40,] -0.42424242 -151.66482
##  [41,] -0.38383838 -145.62388
##  [42,] -0.34343434 -139.59592
##  [43,] -0.30303030 -133.59176
##  [44,] -0.26262626 -127.62421
##  [45,] -0.22222222 -121.71491
##  [46,] -0.18181818 -115.88857
##  [47,] -0.14141414 -110.17968
##  [48,] -0.10101010 -104.63395
##  [49,] -0.06060606  -99.30216
##  [50,] -0.02020202  -94.25458
##  [51,]  0.02020202  -89.56714
##  [52,]  0.06060606  -85.32617
##  [53,]  0.10101010  -81.62837
##  [54,]  0.14141414  -78.56756
##  [55,]  0.18181818  -76.22513
##  [56,]  0.22222222  -74.67758
##  [57,]  0.26262626  -73.96405
##  [58,]  0.30303030  -74.09086
##  [59,]  0.34343434  -75.04712
##  [60,]  0.38383838  -76.77031
##  [61,]  0.42424242  -79.18669
##  [62,]  0.46464646  -82.20836
##  [63,]  0.50505051  -85.73659
##  [64,]  0.54545455  -89.67670
##  [65,]  0.58585859  -93.94605
##  [66,]  0.62626263  -98.46543
##  [67,]  0.66666667 -103.17232
##  [68,]  0.70707071 -108.01550
##  [69,]  0.74747475 -112.95015
##  [70,]  0.78787879 -117.94625
##  [71,]  0.82828283 -122.97679
##  [72,]  0.86868687 -128.02325
##  [73,]  0.90909091 -133.07213
##  [74,]  0.94949495 -138.11241
##  [75,]  0.98989899 -143.13797
##  [76,]  1.03030303 -148.14365
##  [77,]  1.07070707 -153.12676
##  [78,]  1.11111111 -158.08593
##  [79,]  1.15151515 -163.02045
##  [80,]  1.19191919 -167.93077
##  [81,]  1.23232323 -172.81755
##  [82,]  1.27272727 -177.68197
##  [83,]  1.31313131 -182.52546
##  [84,]  1.35353535 -187.34951
##  [85,]  1.39393939 -192.15579
##  [86,]  1.43434343 -196.94595
##  [87,]  1.47474747 -201.72166
##  [88,]  1.51515152 -206.48454
##  [89,]  1.55555556 -211.23618
##  [90,]  1.59595960 -215.97808
##  [91,]  1.63636364 -220.71171
##  [92,]  1.67676768 -225.43839
##  [93,]  1.71717172 -230.15943
##  [94,]  1.75757576 -234.87603
##  [95,]  1.79797980 -239.58931
##  [96,]  1.83838384 -244.30033
##  [97,]  1.87878788 -249.01002
##  [98,]  1.91919192 -253.71923
##  [99,]  1.95959596 -258.42886
## [100,]  2.00000000 -263.13980
## [1] 57

## 
## 	Shapiro-Wilk normality test
## 
## data:  e.tilde
## W = 0.98912, p-value = 0.594

Using Transformed Model to Make Predictions

4.2.4 Constancy of Error Variance (Homoscedasticity)

## 
## 	Modified robust Brown-Forsythe Levene-type test based on the absolute deviations from
## 	the median
## 
## data:  e
## Test Statistic = 7.3656, p-value = 0.007856

## 
## 	Modified robust Brown-Forsythe Levene-type test based on the absolute deviations from
## 	the median
## 
## data:  e.tilde
## Test Statistic = 0.24102, p-value = 0.6246

A Permutation Test Example

Creating Data

## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -51.954 -25.658  -3.725  26.312  54.757 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 206.5557     7.7986   26.49   <2e-16 ***
## x1            2.8750     0.1112   25.85   <2e-16 ***
## x2            4.9691     0.1091   45.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.47 on 97 degrees of freedom
## Multiple R-squared:  0.9695,	Adjusted R-squared:  0.9689 
## F-statistic:  1543 on 2 and 97 DF,  p-value: < 2.2e-16

## 
## 	Shapiro-Wilk normality test
## 
## data:  e
## W = 0.94432, p-value = 0.000357

Model is y = beta0 + beta1*x1 + beta2*x2

Permutation Test for H0: beta2=0

##       x2 
## 4.969105

How do they compare?

## [1] -1.800433  1.622146
## [1] 0