Chapter 14

Simple Linear Regression Analysis

Copyright ©2018 McGraw-Hill Education. All rights reserved.

Chapter Outline

14.1 The Simple Linear Regression Model and the Least Squares Point Estimates

14.2 Simple Coefficients of Determination and Correlation

14.3 Model Assumptions and the Standard Error

14.4 Testing the Significance of the Slope and y-Intercept

14.5 Confidence and Prediction Intervals

Chapter Outline Continued

14.6 Testing the Significance of the Population Correlation Coefficient (Optional)

14.7 Residual Analysis

14.1 The Simple Linear Regression Model and the Least Squares Point Estimates

The dependent (or response) variable is the variable we wish to understand or predict

The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable

Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables

The objective is to build a regression model that can describe, predict and control the dependent variable based on the independent variable

LO14-1: Explain the simple linear regression model.

Form of the Simple Linear Regression Model

y = β0 + β1x + ε

μy = β0 + β1x is the mean value of the dependent variable y when the value of the independent variable is x

β0 is the y-intercept; the mean of y when x is zero

β1 is the slope; the change in the mean of y per unit change in x

ε is an error term that describes the effect on y of all factors other than x

LO14-1

Regression Terms

β0 and β1 are called regression parameters

β0 is the y-intercept

β1 is the slope

We do not know the true values of these parameters

So, we must use sample data to estimate them

b0 is the estimate of β0

b1 is the estimate of β1

LO14-1

LO14-1

The Simple Linear Regression Model Illustrated

Figure 14.3

The Least Squares Point Estimates

LO14-2: Find the least squares point estimates of the slope and y-intercept.

Example 14.2 The Tasty Sub Shop Case: The Least Squares Estimates

LO14-2

Example 14.2 The Tasty Sub Shop Case: The Least Squares Estimates

From last slide,

Σyi = 8,603.1

Σxi = 434.1

Σxi² = 20,757.41

Σxiyi = 403,296.96

Once we have these values, we no longer need the raw data

Calculation of b0 and b1 uses these totals
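As a rough sketch of that calculation, the Python snippet below plugs the four totals into the usual least squares formulas, b1 = SSxy / SSxx and b0 = ȳ − b1x̄. The sample size n = 10 is an assumption: it is not stated in the text above and is read off the slide's worked arithmetic, which divides each total by 10. Variable names are illustrative.

```python
# Totals quoted on the slide.
sum_x = 434.1
sum_y = 8603.1
sum_x2 = 20757.41
sum_xy = 403296.96

# Assumption: n = 10 observations (inferred from the worked arithmetic).
n = 10

# Sums of squares built from the totals alone; no raw data needed.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

# Least squares point estimates of the slope and y-intercept.
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"SSxy = {ss_xy:.3f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.2f}")
```

Under these assumptions the results agree with the slope and intercept used in the prediction later in the example.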

LO14-2

Example 14.2 The Tasty Sub Shop Case (Slope b1)

LO14-2

Example 14.2 The Tasty Sub Shop Case (y-Intercept b0)

Prediction (x = 20.8)

ŷ = b0 + b1x = 183.31 + (15.596)(20.8)

ŷ = 507.69 (computed from the unrounded estimates)

Residual is 527.1 – 507.69 = 19.41
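The prediction and residual above can be reproduced with a short sketch. It recomputes b0 and b1 from the totals in Example 14.2 so that unrounded estimates are used, again assuming n = 10. The function name `predict` is hypothetical.

```python
def predict(b0, b1, x):
    """Point prediction from the least squares line: y_hat = b0 + b1 * x."""
    return b0 + b1 * x

# Totals from Example 14.2; n = 10 is an assumption, not stated in the text.
n = 10
sum_x, sum_y = 434.1, 8603.1
sum_x2, sum_xy = 20757.41, 403296.96

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

x0 = 20.8           # value of x used on the slide
y_observed = 527.1  # observed y at that x, from the slide

y_hat = predict(b0, b1, x0)
residual = y_observed - y_hat

print(f"y_hat = {y_hat:.2f}, residual = {residual:.2f}")
```

Rounding b0 and b1 before multiplying gives a slightly different prediction, which is why the sketch keeps full precision until the final print.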

LO14-2

Figure 14.5

14.2 Simple Coefficients of Determination and Correlation

How useful is a particular regression model?

One measure of usefulness is the simple coefficient of determination

It is represented by the symbol r²

LO14-3: Calculate and interpret the simple coefficients of determination and correlation.

The Simple Coefficient of Determination, r²

Total variation is Σ(yi − ȳ)²

Explained variation is Σ(ŷi − ȳ)²

Unexplained variation is Σ(yi − ŷi)²

Total variation is the sum of explained and unexplained variation

The simple coefficient of determination is r² = explained variation / total variation

r² is the proportion of the total variation in y that is explained by the regression model

LO14-3
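The decomposition of variation can be checked numerically. The sketch below uses a small hypothetical data set (five made-up points, not from the chapter) to compute the three variations and r², and to confirm that total variation equals explained plus unexplained variation.

```python
# Hypothetical data for illustration only (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope and y-intercept.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three variations defined on the slide.
total = sum((yi - y_bar) ** 2 for yi in y)
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = explained / total

print(f"total = {total:.2f}, explained = {explained:.2f}, "
      f"unexplained = {unexplained:.2f}, r^2 = {r_squared:.2f}")
```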

The Simple Correlation Coefficient, r

The simple correlation coefficient between y and x is denoted by r

It is

r = +√r² if b1 is positive

r = −√r² if b1 is negative

where b1 is the slope of the least squares line

The simple correlation coefficient measures the strength of the linear relationship between y and x

LO14-3
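The sign rule for r is easy to express in code. This is a minimal sketch; the function name `correlation_from_fit` and the example inputs are hypothetical.

```python
import math

def correlation_from_fit(r_squared, b1):
    """Return r: the square root of r^2 carrying the sign of the slope b1."""
    root = math.sqrt(r_squared)
    return root if b1 >= 0 else -root

# Hypothetical inputs: a positive-slope fit and a negative-slope fit.
print(correlation_from_fit(0.60, 0.6))   # positive r
print(correlation_from_fit(0.25, -2.0))  # negative r
```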

LO14-3

Different Values of the Correlation Coefficient

Figure 14.8

14.3 Model Assumptions and the Standard Error

Mean of Zero: At any given value of x, the population of potential error term values has a mean equal to zero

Constant Variance Assumption: At any value of x, the population of potential error term values has a variance that does not depend on the value of x

Normality Assumption: At any given value of x, the population of potential error term values has a normal distribution

Independence Assumption: Any one value of the error term ε is statistically independent of any other value of ε

LO14-4: Describe the assumptions behind simple linear regression and calculate the standard error.

Figure 14.9

LO14-4

The Mean Square Error and the Standard Error

Sum of squared errors: SSE = Σ(yi − ŷi)²

Mean square error: s² = MSE = SSE / (n − 2)

Point estimate of the residual variance σ²

Standard error: s = √MSE = √(SSE / (n − 2))

Point estimate of the residual standard deviation σ
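A minimal sketch of these three quantities, assuming the standard definitions SSE = Σ(yi − ŷi)², MSE = SSE / (n − 2) and s = √MSE. The observed and fitted values are hypothetical, and `standard_error` is an illustrative name.

```python
import math

def standard_error(y, y_hat):
    """Return (SSE, MSE, s) for a simple linear regression fit."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    mse = sse / (n - 2)  # two parameters (b0, b1) were estimated
    return sse, mse, math.sqrt(mse)

# Hypothetical observed values and fitted values from y_hat = 2.2 + 0.6x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

sse, mse, s = standard_error(y, y_hat)
print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, s = {s:.4f}")
```

The divisor is n − 2 rather than n because two parameters are estimated from the same data.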

14.4 Testing the Significance of the Slope and y-Intercept

A regression model is not likely to be useful unless there is a significant relationship between x and y

To test significance, we use the null hypothesis:

H0: β1 = 0

Versus the alternative hypothesis:

Ha: β1 ≠ 0
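The test statistic itself is not shown in the text above. A common form, assumed here, is t = b1 / s_b1 with s_b1 = s / √SSxx, compared against a t critical value with n − 2 degrees of freedom. The sketch below uses hypothetical inputs and takes the critical value as a parameter (3.182 is the two-sided 5 percent value for 3 degrees of freedom) so that only the standard library is needed.

```python
import math

def slope_t_test(b1, s, ss_xx, t_crit):
    """Return (t, reject) for H0: beta1 = 0 versus Ha: beta1 != 0."""
    s_b1 = s / math.sqrt(ss_xx)  # standard error of the slope estimate
    t = b1 / s_b1
    return t, abs(t) > t_crit

# Hypothetical fit: b1 = 0.6, s = sqrt(0.8), SSxx = 10, n = 5 (so 3 df).
t, reject = slope_t_test(b1=0.6, s=math.sqrt(0.8), ss_xx=10.0, t_crit=3.182)
print(f"t = {t:.4f}, reject H0: {reject}")
```

With only five hypothetical points the slope is not significant at the 5 percent level, which illustrates that a moderate r² can coexist with an inconclusive test when n is small.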

LO14-5: Test the significance of the slope and y-intercept.

Testing the Significance of the Slope and y-Intercept Continued

LO14-5

An F Test for the Significance of the Slope (Optional)

H0: β1 = 0

Ha: β1 ≠ 0

Reject H0 in favor of Ha at level of significance α if either

F(model) > Fα

p-value < α

Fα is based on 1 numerator and n − 2 denominator degrees of freedom

LO14-6: Test the significance of a simple linear regression model by using an F test (Optional).
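The F statistic is not written out in the text above. The usual definition, assumed here, is F(model) = explained variation / [unexplained variation / (n − 2)]. The sketch uses hypothetical variations and a critical value passed in by the caller (10.13 is the 5 percent point of F with 1 and 3 degrees of freedom).

```python
def model_f_test(explained, unexplained, n, f_crit):
    """Return (F, reject) for H0: beta1 = 0 using the overall F statistic."""
    f_model = explained / (unexplained / (n - 2))
    return f_model, f_model > f_crit

# Hypothetical variations from a five-point fit.
f_model, reject = model_f_test(explained=3.6, unexplained=2.4, n=5, f_crit=10.13)
print(f"F(model) = {f_model:.2f}, reject H0: {reject}")
```

In simple linear regression F(model) equals the square of the t statistic for the slope, so the two tests lead to the same conclusion.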
14.5 Confidence and Prediction Intervals

The point on the regression line corresponding to a particular value x0 of the independent variable x is ŷ = b0 + b1x0

It is unlikely that this value will equal the mean value of y when x equals x0

Therefore, we need to place bounds on how far the predicted value might be from the actual value

We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

LO14-7: Calculate and interpret a confidence interval for a mean value and a prediction interval for an individual value.

Distance Value

Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value

The distance value is a measure of the distance between the value x0 of x and x̄, the average of the observed x values

Notice that the further x0 is from x̄, the larger the distance value

LO14-7
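The formula for the distance value did not survive in the text above. The standard expression, assumed here, is distance value = 1/n + (x0 − x̄)² / SSxx. The sketch below evaluates it on hypothetical x values to show how it grows as x0 moves away from x̄.

```python
def distance_value(x0, x):
    """Distance value for x0 given the observed x values (assumed formula)."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    return 1 / n + (x0 - x_bar) ** 2 / ss_xx

# Hypothetical observed x values with mean 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]

for x0 in (3.0, 4.0, 5.0):
    print(f"x0 = {x0}: distance value = {distance_value(x0, x):.2f}")
```

The smallest possible value is 1/n, reached when x0 equals x̄.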

A Confidence Interval and Prediction Interval

Assume that the regression assumptions hold

The formula for a 100(1 − α) percent confidence interval for the mean value of y is

[ŷ ± tα/2 s √(distance value)]

The formula for a 100(1 − α) percent prediction interval for an individual value of y is

[ŷ ± tα/2 s √(1 + distance value)]

tα/2 is based on n − 2 degrees of freedom

LO14-7
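Both intervals can be sketched in a few lines, assuming the standard forms ŷ ± tα/2 · s · √(distance value) for the mean and ŷ ± tα/2 · s · √(1 + distance value) for an individual value. All inputs below are hypothetical, and the t critical value (3.182 for 95 percent with 3 degrees of freedom) is supplied by the caller.

```python
import math

def intervals(y_hat0, s, distance, t_crit):
    """Return (confidence interval, prediction interval) as (low, high) pairs."""
    ci_half = t_crit * s * math.sqrt(distance)      # mean value of y
    pi_half = t_crit * s * math.sqrt(1 + distance)  # individual value of y
    conf_int = (y_hat0 - ci_half, y_hat0 + ci_half)
    pred_int = (y_hat0 - pi_half, y_hat0 + pi_half)
    return conf_int, pred_int

# Hypothetical inputs: y_hat = 4.6 at x0, s = sqrt(0.8), distance value 0.3.
conf_int, pred_int = intervals(y_hat0=4.6, s=math.sqrt(0.8),
                               distance=0.3, t_crit=3.182)
print("confidence interval:", tuple(round(v, 3) for v in conf_int))
print("prediction interval:", tuple(round(v, 3) for v in pred_int))
```

The extra 1 under the square root accounts for the variation of an individual y around its mean, so the prediction interval is always the wider of the two.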

Which to Use?

The prediction interval is useful if it is important to predict an individual value of the dependent variable

A confidence interval is useful if it is important to estimate the mean value

The prediction interval will always be wider than the confidence interval

LO14-7

14.6 Testing the Significance of the Population Correlation Coefficient (Optional)

The simple correlation coefficient (r) measures the linear relationship between the observed values of x and y from the sample

The population correlation coefficient (ρ) measures the linear relationship between all possible combinations of observed values of x and y

r is an estimate of ρ

LO14-8: Test hypotheses about the population correlation coefficient (Optional).

Testing ρ

We can test to see if the correlation is significant using the hypotheses

H0: ρ = 0

Ha: ρ ≠ 0

The statistic is t = r√(n − 2) / √(1 − r²)

This test will give the same results as the test for significance of the slope β1

LO14-8
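A short sketch of the correlation test, assuming the standard statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The value of r and the sample size are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0 (requires |r| < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r^2 = 0.6 with a positive slope, n = 5.
r = math.sqrt(0.6)
t = correlation_t(r, n=5)
print(f"t = {t:.4f}")
```

For a fit with b1 = 0.6, s = √0.8 and SSxx = 10, the slope statistic b1 / (s / √SSxx) has the same value, which is the equivalence the slide describes.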

14.7 Residual Analysis

Checks of regression assumptions are performed by analyzing the regression residuals

Residuals (e) are defined as the difference between the observed value of y and the predicted value of y, e = y − ŷ

Note that e is the point estimate of ε

If the regression assumptions are valid, the population of potential error terms will be normally distributed with mean zero and variance σ²

Different error terms will be statistically independent
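Computing residuals is a one-line operation once the line has been fitted. The sketch below uses hypothetical data and an illustrative helper `fit_line`; it also shows that least squares residuals from a model with an intercept sum to zero, up to rounding.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e = y - y_hat

print([round(e, 2) for e in residuals])
print(f"sum of residuals = {sum(residuals):.10f}")
```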

LO14-9: Use residual analysis to check the assumptions of simple linear regression.

Residual Analysis Continued

If the regression assumptions hold, the residuals should look as if they were randomly and independently selected from normal populations with mean zero and variance σ²

With any real data, assumptions will not hold exactly

Mild departures do not seriously hinder our ability to make statistical inferences

In checking assumptions, we are looking for pronounced departures from the assumptions

So, we require only that the residuals approximately fit the description above

LO14-9

LO14-9

Example 14.9 The QHIC Case: Constructing Residual Plots

Figure 14.18b

Quality Home Improvement Center (QHIC) operates five stores

Studies the relationship between home value and yearly expenditure on home upkeep

Random sample of 40 homeowners

Intercept = –348.3921

Slope = 7.2583

Residual Plots

Residuals versus independent variable

Residuals versus predicted y’s

Residuals in time order (if the response is a time series)

LO14-9

Constant Variance Assumption

To check the validity of the constant variance assumption, examine residual plots against

The x values

The predicted y values

Time (when data is time series)

A pattern that fans out says the variance is increasing rather than staying constant

A pattern that funnels in says the variance is decreasing rather than staying constant

A pattern that is evenly spread within a band says the assumption has been met

LO14-9

LO14-9

Constant Variance Visually

Figure 14.19

Assumption of Correct Functional Form

If the relationship between x and y is something other than a linear one, the residual plot will often suggest a form more appropriate for the model

For example, if there is a curved relationship between x and y, a plot of residuals will often show a curved relationship

LO14-9

Normality Assumption

If the normality assumption holds, a histogram or stem-and-leaf display of residuals should look bell-shaped and symmetric

Another way to check is a normal plot of residuals

Order residuals from smallest to largest

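Plot e(i) on the vertical axis against z(i) on the horizontal axis, where e(i) is the ith smallest residual

z(i) is the point on the horizontal axis under the standard normal curve such that the area under this curve to the left of z(i) is (3i − 1)/(3n + 1)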

If the normality assumption holds, the plot should have a straight-line appearance
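The coordinates of a normal plot can be computed with the standard library alone. The sketch below orders a set of hypothetical residuals and pairs each with the standard normal point whose left-tail area is (3i − 1)/(3n + 1); `normal_plot_points` is an illustrative name, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (z_i, e_i) pairs: normal points versus ordered residuals."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((3 * i - 1) / (3 * n + 1)) for i in range(1, n + 1)]
    return list(zip(z, ordered))

# Hypothetical residuals for illustration only.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

for z_i, e_i in normal_plot_points(residuals):
    print(f"z = {z_i:+.3f}, e = {e_i:+.2f}")
```

Plotting the e values against the z values and looking for a roughly straight line is the visual check the slide describes.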

LO14-9

Independence Assumption

The independence assumption is most likely to be violated when the data are time series

If the data are not time series, they can be reordered without affecting the analysis

For time-series data, the time-ordered error terms can be autocorrelated

Positive autocorrelation is when a positive error term in time period i tends to be followed by another positive value in i + k

Negative autocorrelation is when a positive error term tends to be followed by a negative value

Positive autocorrelation tends to produce a cyclical pattern in the error terms over time; negative autocorrelation tends to produce an alternating pattern

LO14-9
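The slides check independence visually. As a rough numeric companion, and not a method taken from this section, the sketch below computes a simple lag-1 autocorrelation of time-ordered residuals, Σ e(t)·e(t+1) / Σ e(t)². Both residual series are hypothetical, chosen to mimic a cyclical pattern and an alternating pattern.

```python
def lag1_autocorrelation(residuals):
    """Simple lag-1 autocorrelation of time-ordered residuals (illustrative)."""
    numerator = sum(a * b for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual series in time order.
cyclical = [1.0, 2.0, 1.0, -1.0, -2.0, -1.0, 1.0, 2.0]  # slow swings
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]         # sign flips each period

print(f"cyclical:    {lag1_autocorrelation(cyclical):+.3f}")
print(f"alternating: {lag1_autocorrelation(alternating):+.3f}")
```

A clearly positive value is consistent with positive autocorrelation and a clearly negative value with negative autocorrelation; values near zero give no indication of either.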

LO14-9

Independence Assumption Visually

Figure 14.26 a and b

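The Least Squares Point Estimates (formulas)

Estimation/prediction equation:

$$\hat{y} = b_0 + b_1 x$$

Least squares point estimate of the slope β1:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$

where

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

and

$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

Least squares point estimate of the y-intercept β0:

$$b_0 = \bar{y} - b_1 \bar{x}, \qquad \bar{y} = \frac{\sum y_i}{n}, \qquad \bar{x} = \frac{\sum x_i}{n}$$

Example 14.2 The Tasty Sub Shop Case (Slope b1)

$$SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 403{,}296.96 - \frac{(434.1)(8{,}603.1)}{10} = 29{,}836.389$$

$$SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 20{,}757.41 - \frac{(434.1)^2}{10} = 1{,}913.129$$

$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{29{,}836.389}{1{,}913.129} = 15.596$$

Example 14.2 The Tasty Sub Shop Case (y-Intercept b0)

$$\bar{y} = \frac{\sum y_i}{n} = \frac{8{,}603.1}{10} = 860.31$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{434.1}{10} = 43.41$$

$$b_0 = \bar{y} - b_1 \bar{x} = 860.31 - (15.596)(43.41) = 183.31$$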
