# bowerman_9e_chap_142.pptx

Chapter 14
Simple Linear Regression Analysis

Chapter Outline
14.1 The Simple Linear Regression Model and the Least Squares Point Estimates
14.2 Simple Coefficients of Determination and Correlation
14.3 Model Assumptions and the Standard Error
14.4 Testing the Significance of the Slope and y-Intercept
14.5 Confidence and Prediction Intervals

Chapter Outline Continued
14.6 Testing the Significance of the Population Correlation Coefficient (Optional)
14.7 Residual Analysis

14.1 The Simple Linear Regression Model and the Least Squares Point Estimates
The dependent (or response) variable is the variable we wish to understand or predict
The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable
Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables
The objective is to build a regression model that can describe, predict and control the dependent variable based on the independent variable
LO14-1: Explain the simple linear regression model.

Form of the Simple Linear Regression Model
y = β0 + β1x + ε
μy|x = β0 + β1x is the mean value of the dependent variable y when the value of the independent variable is x
β0 is the y-intercept; the mean of y when x is zero
β1 is the slope; the change in the mean of y per unit change in x
ε is an error term that describes the effect on y of all factors other than x
LO14-1

Regression Terms
β0 and β1 are called regression parameters
β0 is the y-intercept
β1 is the slope
We do not know the true values of these parameters
So, we must use sample data to estimate them
b0 is the estimate of β0
b1 is the estimate of β1
LO14-1

LO14-1
The Simple Linear Regression Model Illustrated
Figure 14.3


The Least Squares Point Estimates
LO14-2: Find the least squares point estimates of the slope and y-intercept.

Example 14.2 The Tasty Sub Shop Case: The Least Squares Estimates
LO14-2

Example 14.2 The Tasty Sub Shop Case: The Least Squares Estimates
From last slide,
Σyi = 8,603.1
Σxi = 434.1
Σxi² = 20,757.41
Σxiyi = 403,296.96
Once we have these values, we no longer need the raw data
Calculation of b0 and b1 uses these totals
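As a rough check of this arithmetic, the slope and y-intercept can be recomputed from the four totals alone. The sketch below is illustrative only: the variable names are made up, and n = 10 is assumed to be the sample size used in the case's calculations.

```python
# Totals from the Tasty Sub Shop slide; n = 10 is the assumed sample size.
n = 10
sum_x = 434.1
sum_y = 8603.1
sum_x2 = 20757.41
sum_xy = 403296.96

# Sums of squares computed from the totals alone (no raw data needed).
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b1 = ss_xy / ss_xx                 # least squares slope
b0 = sum_y / n - b1 * sum_x / n    # least squares y-intercept

print(f"SSxy = {ss_xy:.3f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.2f}")
```

The printed estimates should agree, up to rounding, with the values b1 = 15.596 and b0 = 183.31 used in the case.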
LO14-2

Example 14.2 The Tasty Sub Shop Case (Slope b1)
LO14-2

Example 14.2 The Tasty Sub Shop Case (y-Intercept b0)
Prediction (x = 20.8)
ŷ = b0 + b1x = 183.31 + (15.596)(20.8)
ŷ = 507.69 (computed with unrounded b0 and b1)
Residual is 527.1 – 507.69 = 19.41
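The prediction and residual can be reproduced in a few lines. This is a minimal sketch that recomputes b0 and b1 from the case totals (assuming n = 10) so that the rounding matches the slide; the variable names are illustrative.

```python
# Recompute the estimates from the case totals; n = 10 is the assumed sample size.
n = 10
sum_x, sum_y = 434.1, 8603.1
sum_x2, sum_xy = 20757.41, 403296.96

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

x0 = 20.8            # value of x used on the slide
y_observed = 527.1   # observed y at that x, from the slide

y_hat = b0 + b1 * x0             # point prediction
residual = y_observed - y_hat    # observed minus predicted

print(f"y_hat = {y_hat:.2f}, residual = {residual:.2f}")
```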
LO14-2
Figure 14.5

14.2 Simple Coefficients of Determination and Correlation
How useful is a particular regression model?

One measure of usefulness is the simple coefficient of determination

It is represented by the symbol r²
LO14-3: Calculate and interpret the simple coefficients of determination and correlation.

The Simple Coefficient of Determination, r²
Total variation is Σ(yi – ȳ)²
Explained variation is Σ(ŷi – ȳ)²
Unexplained variation is Σ(yi – ŷi)²
Total variation is the sum of explained and unexplained variation
The simple coefficient of determination is r² = Explained variation / Total variation
r² is the proportion of the total variation in y that is explained by the regression model
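The decomposition of the variation can be checked numerically. The sketch below uses a small hypothetical data set (not from the textbook cases) and an illustrative helper named `least_squares`; it computes the three variations and r² as explained variation divided by total variation.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line through the points (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

total = sum((yi - y_bar) ** 2 for yi in y)
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = explained / total
print(f"total = {total:.2f}, explained + unexplained = {explained + unexplained:.2f}")
print(f"r^2 = {r_squared:.2f}")
```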
LO14-3

The Simple Correlation Coefficient, r
The simple correlation coefficient between y and x is denoted by r
It is…
r = +√r² if b1 is positive
r = –√r² if b1 is negative
where b1 is the slope of the least squares line
The simple correlation coefficient measures the strength of the linear relationship between y and x
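The sign rule for r can be sketched as a one-line function. The function name and the inputs are hypothetical; the rule assumed is that r is the square root of r², carrying the sign of the slope b1.

```python
import math

def simple_correlation(r_squared, b1):
    """Return r: the square root of r^2 with the sign of the slope b1."""
    return math.copysign(math.sqrt(r_squared), b1)

# Hypothetical inputs for illustration.
print(simple_correlation(0.6, 0.6))    # positive slope gives a positive r
print(simple_correlation(0.6, -0.6))   # negative slope gives a negative r
```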
LO14-3

LO14-3
Different Values of the Correlation Coefficient
Figure 14.8


14.3 Model Assumptions and the Standard Error
Mean of Zero: At any given value of x, the population of potential error term values has a mean equal to zero
Constant Variance Assumption: At any value of x, the population of potential error term values has a variance that does not depend on the value of x
Normality Assumption: At any given value of x, the population of potential error term values has a normal distribution
Independence Assumption: Any one value of the error term ε is statistically independent of any other value of ε
LO14-4: Describe the assumptions behind simple linear regression and calculate the standard error.
Figure 14.9

LO14-4
The Mean Square Error and the Standard Error
Sum of squared errors
SSE = Σei² = Σ(yi – ŷi)²

Mean square error
s² = SSE / (n – 2)
Point estimate of the residual variance σ²

Standard error
s = √(SSE / (n – 2))
Point estimate of the residual standard deviation σ
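These three quantities follow directly from the residuals. A minimal sketch on hypothetical data, assuming the usual definitions SSE = Σ(yi – ŷi)², s² = SSE/(n – 2) and s = √s²:

```python
import math

# Hypothetical data for illustration only (not from the textbook cases).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)   # sum of squared errors
mse = sse / (n - 2)                    # mean square error, s^2
s = math.sqrt(mse)                     # standard error, s

print(f"SSE = {sse:.2f}, s^2 = {mse:.2f}, s = {s:.4f}")
```

The divisor is n – 2 rather than n because two parameters, β0 and β1, are estimated from the data.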

14.4 Testing the Significance of the Slope and y-Intercept
A regression model is not likely to be useful unless there is a significant relationship between x and y
To test significance, we use the null hypothesis:

H0: β1 = 0

Versus the alternative hypothesis:

Ha: β1 ≠ 0
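The test statistic is not written out in these notes. A minimal sketch, assuming the usual t statistic for this test, t = b1 / s_b1 with s_b1 = s / √SSxx and n – 2 degrees of freedom, is shown below. The data are hypothetical, and the critical value is an assumed table lookup rather than something computed here.

```python
import math

# Hypothetical data for illustration only (not from the textbook cases).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Standard error s, then the standard error of the slope estimate.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(ss_xx)

t = b1 / s_b1

# Assumed critical value: t_{0.025} with n - 2 = 3 degrees of freedom, from a t table.
t_crit = 3.182
reject_h0 = abs(t) > t_crit
print(f"t = {t:.3f}, reject H0 at alpha = 0.05: {reject_h0}")
```

With only five hypothetical points the slope is not significant at the 0.05 level, which illustrates that a visibly positive slope can still fail the test when n is small.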
LO14-5: Test the significance of the slope and y-intercept.

Testing the Significance of the Slope and y-Intercept Continued

LO14-5

An F Test for the Significance of the Slope (Optional)

H0: β1 = 0
Ha: β1 ≠ 0
Test statistic: F(model) = Explained variation / [(Unexplained variation) / (n – 2)]
Reject H0 in favor of Ha at level of significance α if either
F(model) > Fα
p-value < α
Fα is based on one numerator and n – 2 denominator degrees of freedom
LO14-6: Test the significance of a simple linear regression model by using an F test (Optional).

14.5 Confidence and Prediction Intervals
The point on the regression line corresponding to a particular value x0 of the independent variable x is ŷ = b0 + b1x0
It is unlikely that this value will equal the mean value of y when x equals x0
Therefore, we need to place bounds on how far the predicted value might be from the actual value
We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y
LO14-7: Calculate and interpret a confidence interval for a mean value and a prediction interval for an individual value.

Distance Value
Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
The distance value is a measure of the distance between the value x0 of x and x̄, the average of the observed values of x
Distance value = 1/n + (x0 – x̄)² / SSxx
Notice that the further x0 is from x̄, the larger the distance value
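A short sketch makes this behavior concrete. It assumes the usual formula for simple linear regression, distance value = 1/n + (x0 – x̄)²/SSxx; the function name and data are hypothetical.

```python
def distance_value(x0, x):
    """Distance value for a new point x0, given the observed x values."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    return 1 / n + (x0 - x_bar) ** 2 / ss_xx

# Hypothetical x values for illustration only; their mean is 3.
x = [1, 2, 3, 4, 5]
for x0 in (3, 4, 5):
    print(x0, round(distance_value(x0, x), 2))
```

The distance value is smallest (1/n) at x0 = x̄ and grows as x0 moves away from x̄ in either direction.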
LO14-7

A Confidence Interval and Prediction Interval
Assume that the regression assumptions hold

The formula for a 100(1 – α)% confidence interval for the mean value of y is
[ŷ ± tα/2 s √(Distance value)]

The formula for a 100(1 – α)% prediction interval for an individual value of y is
[ŷ ± tα/2 s √(1 + Distance value)]

tα/2 is based on n – 2 degrees of freedom
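Both intervals can be sketched with one small function. It assumes the usual forms ŷ ± tα/2 · s · √(distance value) for the mean and ŷ ± tα/2 · s · √(1 + distance value) for an individual value. All inputs below are hypothetical, and the critical value is an assumed t-table lookup for 3 degrees of freedom.

```python
import math

def intervals(y_hat, s, dist, t_crit):
    """Return (confidence interval, prediction interval) as (low, high) tuples."""
    half_ci = t_crit * s * math.sqrt(dist)        # half-width for the mean value of y
    half_pi = t_crit * s * math.sqrt(1 + dist)    # half-width for an individual y
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical inputs: y_hat = 4.0, s^2 = 0.8, distance value = 0.2,
# and t_{0.025} = 3.182 for n - 2 = 3 degrees of freedom (from a t table).
ci, pi = intervals(4.0, math.sqrt(0.8), 0.2, 3.182)
print("95% CI:", tuple(round(v, 4) for v in ci))
print("95% PI:", tuple(round(v, 4) for v in pi))
```

The extra 1 under the square root accounts for the variability of an individual error term, which is why the prediction interval is the wider of the two.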
LO14-7

Which to Use?
The prediction interval is useful if it is important to predict an individual value of the dependent variable

A confidence interval is useful if it is important to estimate the mean value

The prediction interval will always be wider than the confidence interval
LO14-7

14.6 Testing the Significance of the Population Correlation Coefficient (Optional)
The simple correlation coefficient (r) measures the linear relationship between the observed values of x and y from the sample

The population correlation coefficient (ρ) measures the linear relationship between all possible combinations of observed values of x and y

r is an estimate of ρ
LO14-8: Test hypotheses about the population correlation coefficient (Optional).

Testing ρ
We can test to see if the correlation is significant using the hypotheses

H0: ρ = 0
Ha: ρ ≠ 0

The statistic is
t = r√(n – 2) / √(1 – r²)

This test will give the same results as the test for the significance of the slope β1
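A minimal sketch of this statistic, assuming the usual form t = r√(n – 2)/√(1 – r²) with n – 2 degrees of freedom. The function name and the inputs are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, given the sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r^2 = 0.6 with a positive slope, n = 5 observations.
r = math.sqrt(0.6)
print(f"t = {correlation_t(r, 5):.3f}")
```

For a data set with r² = 0.6 and n = 5, this matches the t statistic b1 / s_b1 for the slope computed from the same data, which is the equivalence the slide describes.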
LO14-8

14.7 Residual Analysis
Checks of regression assumptions are performed by analyzing the regression residuals
Residuals (e) are defined as the difference between the observed value of y and the predicted value of y: e = y – ŷ
Note that e is the point estimate of ε
If the regression assumptions are valid, the population of potential error terms will be normally distributed with mean zero and variance σ²
Different error terms will be statistically independent
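Computing the residuals is the starting point for every check that follows. A minimal sketch on hypothetical data; it also illustrates that least squares residuals from a model with an intercept sum to zero, up to floating-point error.

```python
# Hypothetical data for illustration only (not from the textbook cases).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Residual e = observed y minus predicted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])
print(f"sum of residuals = {sum(residuals):.6f}")
```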
LO14-9: Use residual analysis to check the assumptions of simple linear regression.

Residual Analysis Continued
If the assumptions hold, the residuals should look as though they were randomly and independently selected from normal populations with mean zero and variance σ²
With any real data, assumptions will not hold exactly
Mild departures do not seriously affect our ability to make statistical inferences
In checking assumptions, we are looking for pronounced departures from the assumptions
So, we only require the residuals to approximately fit the description above
LO14-9

LO14-9
Example 14.9 The QHIC Case: Constructing Residual Plots
Figure 14.18b
Quality Home Improvement Center (QHIC) operates five stores
Studies the relationship between home value and yearly expenditure on home upkeep
Random sample of 40 homeowners
Intercept = –348.3921
Slope = 7.2583

Residual Plots
Residuals versus independent variable
Residuals versus predicted y’s
Residuals in time order (if the response is a time series)
LO14-9

Constant Variance Assumption
To check the validity of the constant variance assumption, examine residual plots against
The x values
The predicted y values
Time (when data is time series)
A pattern that fans out says the variance is increasing rather than staying constant
A pattern that funnels in says the variance is decreasing rather than staying constant
A pattern that is evenly spread within a band says the assumption has been met
LO14-9

LO14-9
Constant Variance Visually
Figure 14.19


Assumption of Correct Functional Form
If the relationship between x and y is something other than a linear one, the residual plot will often suggest a form more appropriate for the model

For example, if there is a curved relationship between x and y, a plot of residuals will often show a curved relationship
LO14-9

Normality Assumption
If the normality assumption holds, a histogram or stem-and-leaf display of residuals should look bell-shaped and symmetric
Another way to check is a normal plot of residuals
Order residuals from smallest to largest
Plot e(i) on the vertical axis against z(i), where e(i) is the i-th smallest residual
z(i) is the point on the horizontal axis under the standard normal curve so that the area under this curve to the left of z(i) is (3i – 1)/(3n + 1)
If the normality assumption holds, the plot should have a straight-line appearance
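The plotting positions can be computed with the Python standard library. This is a sketch under stated assumptions: the curve is taken to be the standard normal curve, the function name is hypothetical, and the residuals are illustrative values rather than data from a textbook case.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each ordered residual e(i) with its standard normal point z(i)."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((3 * i - 1) / (3 * n + 1)), e)
        for i, e in enumerate(ordered, start=1)
    ]

# Hypothetical residuals for illustration only.
points = normal_plot_points([-0.8, 0.6, 1.0, -0.6, -0.2])
for z, e in points:
    print(f"z = {z:+.3f}, e = {e:+.1f}")
```

Each pair gives one point of the normal plot, with z(i) on the horizontal axis and e(i) on the vertical axis.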
LO14-9

Independence Assumption
Independence assumption most likely violated by time-series data
If the data is not time series, it can be reordered without affecting it
For time-series data, the time-ordered error terms can be autocorrelated
Positive autocorrelation is when a positive error term in time period i tends to be followed by another positive value in i + k
Negative autocorrelation is when a positive error term tends to be followed by a negative value
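Positive autocorrelation tends to produce a cyclical pattern in the error terms over time, while negative autocorrelation tends to produce an alternating pattern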
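One simple numeric companion to a time-ordered residual plot is the lag-1 sample autocorrelation of the residuals. This diagnostic is an illustrative addition, not a method taken from these slides; the function name and the residual series are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Sum of products of consecutive residuals divided by the sum of squared residuals."""
    numerator = sum(a * b for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical time-ordered residuals for illustration only.
runs = [1, 1, 1, -1, -1, -1]     # like signs cluster together
alternating = [1, -1, 1, -1]     # signs flip every period

print(lag1_autocorrelation(runs))         # positive value
print(lag1_autocorrelation(alternating))  # negative value
```

A clearly positive value suggests positive autocorrelation and a clearly negative value suggests negative autocorrelation; values near zero are consistent with independence.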
LO14-9

LO14-9
Independence Assumption Visually
Figure 14.26 a and b


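Equations for "The Least Squares Point Estimates"
Estimation/prediction equation:
$$\hat{y} = b_0 + b_1 x$$
Least squares point estimate of the slope β1:
$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$
where
$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$
and
$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$
Least squares point estimate of the y-intercept β0:
$$b_0 = \bar{y} - b_1 \bar{x}$$
where
$$\bar{y} = \frac{\sum y_i}{n} \quad \text{and} \quad \bar{x} = \frac{\sum x_i}{n}$$

Equations for "Example 14.2 The Tasty Sub Shop Case (Slope b1)"
$$SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 403{,}296.96 - \frac{(434.1)(8{,}603.1)}{10} = 29{,}836.389$$
$$SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 20{,}757.41 - \frac{(434.1)^2}{10} = 1{,}913.129$$
$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{29{,}836.389}{1{,}913.129} = 15.596$$

Equations for "Example 14.2 The Tasty Sub Shop Case (y-Intercept b0)"
$$\bar{y} = \frac{\sum y_i}{n} = \frac{8{,}603.1}{10} = 860.31$$
$$\bar{x} = \frac{\sum x_i}{n} = \frac{434.1}{10} = 43.41$$
$$b_0 = \bar{y} - b_1 \bar{x} = 860.31 - (15.596)(43.41) = 183.31$$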
