Chapter 15
Multiple Regression and Model Building
Copyright ©2018 McGraw-Hill Education. All rights reserved.
Chapter Outline
15.1 The Multiple Regression Model and the Least Squares Point Estimate
15.2 R2 and Adjusted R2
15.3 Model Assumptions and the Standard Error
15.4 The Overall F Test
15.5 Testing the Significance of an Independent Variable
15.6 Confidence and Prediction Intervals
Chapter Outline Continued
15.7 The Sales Representative Case: Evaluating Employee Performance
15.8 Using Dummy Variables to Model Qualitative Independent Variables (Optional)
15.9 Using Squared and Interaction Variables (Optional)
15.10 Multicollinearity, Model Building and Model Validation (Optional)
15.11 Residual Analysis and Outlier Detection in Multiple Regression (Optional)
15.1 The Multiple Regression Model and the Least Squares Point Estimate
Simple linear regression used one independent variable to explain the dependent variable
Some relationships are too complex to be described using a single independent variable
Multiple regression uses two or more independent variables to describe the dependent variable
This allows multiple regression models to handle more complex situations
There is no limit to the number of independent variables a model can use
Multiple regression has only one dependent variable
LO15-1: Explain the multiple regression model and the related least squares point estimates.
The Multiple Regression Model
The linear regression model relating y to x1, x2,…, xk is y = β0 + β1x1 + β2x2 + … + βkxk + ε
μy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2,…, xk
β0, β1, β2,…, βk are the unknown regression parameters relating the mean value of y to x1, x2,…, xk
ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2,…, xk
LO15-1
The Least Squares Estimates and Point Estimation and Prediction
Estimation/prediction equation
ŷ = b0 + b1x1 + b2x2 + … + bkxk
ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x1, x2,…, xk
It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x1, x2,…, xk
b0, b1, b2,…, bk are the least squares point estimates of the parameters β0, β1, β2,…, βk
x1, x2,…, xk are specified values of the independent predictor variables x1, x2,…, xk
LO15-1
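The least squares point estimates b0, b1,…, bk can be found by solving the normal equations (XᵀX)b = Xᵀy. Below is a minimal pure-Python sketch; the function names and the tiny data set are hypothetical, chosen so that y is an exact linear function of x1 and x2.

```python
# Compute least squares point estimates by solving the normal equations.
# All data below are made up for illustration.

def solve(A, v):
    """Solve A x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]                         # pivot
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(xs, y):
    """xs: observations, each [x1, ..., xk]; returns [b0, b1, ..., bk]."""
    X = [[1.0] + list(row) for row in xs]               # prepend intercept column
    m = len(X[0])
    XtX = [[sum(a[i] * a[j] for a in X) for j in range(m)] for i in range(m)]
    Xty = [sum(a[i] * yi for a, yi in zip(X, y)) for i in range(m)]
    return solve(XtX, Xty)

# Hypothetical data generated exactly as y = 2 + 3*x1 - 1*x2
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 + 3 * a - b for a, b in xs]
b = least_squares(xs, y)
print([round(v, 6) for v in b])   # → [2.0, 3.0, -1.0]
```

Because the data were built from an exact linear relationship, the estimates recover the parameters exactly; with real data they would only estimate β0, β1,…, βk.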
LO15-1
Example 15.1 The Tasty Sub Shop Case
Figure 15.4 (a)
15.2 R2 and Adjusted R2
Total variation: Σ(yi − ȳ)²
Explained variation: Σ(ŷi − ȳ)²
Unexplained variation (SSE): Σ(yi − ŷi)²
Total variation is the sum of explained and unexplained variation
LO15-2: Calculate and interpret the multiple and adjusted multiple coefficients of determination.
R2 and Adjusted R2 Continued
The multiple coefficient of determination is the ratio of explained variation to total variation
R2 is the proportion of the total variation that is explained by the overall regression model
Multiple correlation coefficient R is the square root of R2
LO15-2
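The two measures above can be sketched directly from their definitions. The observed and fitted values below are hypothetical, standing in for the output of a fitted k = 2 model.

```python
# A minimal sketch of R² and adjusted R²:
#   R² = explained variation / total variation = 1 - SSE/SST
#   adjusted R² = 1 - (1 - R²)(n - 1)/(n - (k + 1))

def r_squared(y, y_hat):
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
    return 1 - sse / sst

def adjusted_r_squared(y, y_hat, k):
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - (k + 1))

y = [1, 2, 3, 4, 5]                  # observed values (made up)
y_hat = [1.1, 1.9, 3.0, 4.2, 4.8]    # fitted values from a k = 2 model (made up)
print(round(r_squared(y, y_hat), 4))              # → 0.99
print(round(adjusted_r_squared(y, y_hat, 2), 4))  # → 0.98
```

Note that adjusted R² is smaller than R², as it penalizes the two degrees of freedom used by the independent variables.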
Multiple Correlation Coefficient R
The multiple correlation coefficient R is just the square root of R2
With simple linear regression, r would take on the sign of b1
There are multiple bi’s with multiple regression
For this reason, R is always positive
To interpret the direction of the relationship between the x’s and y, you must look to the sign of the appropriate bi coefficient
LO15-2
Adjusted R2
Adding an independent variable to multiple regression will raise R2
R2 will rise slightly even if the new variable has no relationship to y
Adjusted R² corrects this tendency in R²
As a result, it gives a better estimate of the importance of the independent variables
LO15-2
15.3 Model Assumptions and the Standard Error
The model is y = β0 + β1x1 + β2x2 + … + βkxk + ε
Assumptions are stated about the model error terms, the ε's
Mean of Zero Assumption: The mean of the error terms is equal to 0
Constant Variance Assumption: The variance of the error terms, σ², is the same for every combination of values of x1, x2,…, xk
Normality Assumption: The error terms follow a normal distribution for every combination of values of x1, x2,…, xk
Independence Assumption: The values of the error terms are statistically independent of each other
LO15-3: Explain the assumptions behind multiple regression and calculate the standard error.
The Mean Square Error and the Standard Error
Sum of squared errors: SSE = Σ(yi − ŷi)²
Mean square error: s² = SSE / (n − (k + 1))
This is the point estimate of the residual variance σ²
Standard error: s = √s²
This is the point estimate of the residual standard deviation σ
LO15-3
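Both point estimates follow directly from the residuals. A minimal sketch, reusing hypothetical observed and fitted values for a k = 2 model:

```python
import math

# s² = SSE / (n - (k + 1)) estimates σ²; s = √s² estimates σ.
def mse_and_std_error(y, y_hat, k):
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared errors
    s2 = sse / (n - (k + 1))                               # mean square error
    return s2, math.sqrt(s2)                               # (s², standard error s)

# Hypothetical data: n = 5 observations, k = 2 independent variables
s2, s = mse_and_std_error([1, 2, 3, 4, 5], [1.1, 1.9, 3.0, 4.2, 4.8], 2)
print(round(s2, 4), round(s, 4))   # → 0.05 0.2236
```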
15.4 The Overall F Test
To test
H0: β1= β2 = …= βk = 0 versus
Ha: At least one of β1, β2,…, βk ≠ 0
Test statistic: F(model) = (Explained variation / k) / (Unexplained variation / (n − (k + 1)))
Reject H0 in favor of Ha if F(model) > Fα or
p-value < α
Fα is based on k numerator and n − (k + 1) denominator degrees of freedom
LO15-4: Test the significance of a multiple regression model by using an F test.
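The F statistic is a ratio of the two variation components, each divided by its degrees of freedom. A minimal sketch with hypothetical observed and fitted values (the critical value Fα would come from an F table):

```python
# F(model) = (explained variation / k) / (unexplained variation / (n - (k + 1)))
def overall_f(y, y_hat, k):
    n = len(y)
    ybar = sum(y) / n
    explained = sum((fi - ybar) ** 2 for fi in y_hat)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return (explained / k) / (unexplained / (n - (k + 1)))

# Hypothetical data: n = 5, k = 2
f = overall_f([1, 2, 3, 4, 5], [1.1, 1.9, 3.0, 4.2, 4.8], 2)
print(round(f, 1))   # → 95.0; a value this large would reject H0
```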
15.5 Testing the Significance of an Independent Variable
A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
To test significance, we use the null hypothesis H0: βj = 0
Versus the alternative hypothesis
Ha: βj ≠ 0
LO15-5: Test the significance of a single independent variable.
Testing the Significance of the Independent Variable xj
Test statistic: t = bj / sbj, based on n − (k + 1) degrees of freedom
Reject H0: βj = 0 in favor of Ha: βj ≠ 0 if |t| > tα/2 or p-value < α
LO15-5
Testing the Significance of an Independent Variable Continued
Customary to test significance of every independent variable
If we can reject H0: βj = 0 at α = 0.05, we have strong evidence that the independent variable xj is significantly related to y
If we can reject H0: βj = 0 at α = 0.01, we have very strong evidence that the independent variable xj is significantly related to y
The smaller the significance level at which H0 can be rejected, the stronger the evidence that xj is significantly related to y
LO15-5
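The test reduces to one division and one comparison. A minimal sketch; the estimate bj, its standard error sbj, and the critical value t_crit (tα/2 with n − (k + 1) degrees of freedom, from a t table) are all assumed supplied.

```python
# t = bj / s_bj; reject H0: βj = 0 when |t| exceeds the table value t_{α/2}.
def t_test(bj, s_bj, t_crit):
    t = bj / s_bj
    return t, abs(t) > t_crit

# Hypothetical values: bj = 12, s_bj = 3, t_{0.025} = 2.306 (8 df)
print(t_test(12.0, 3.0, 2.306))   # → (4.0, True): reject H0 at this α
```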
A Confidence Interval for the Regression Parameter βj
If the regression assumptions hold,
a 100(1 − α) percent confidence interval for βj
is [bj ± tα/2 sbj]
tα/2 is based on n − (k + 1) degrees of freedom
LO15-5
15.6 Confidence and Prediction Intervals
The point on the regression equation corresponding to particular values x1, x2,…, xk of the independent variables is ŷ = b0 + b1x1 + b2x2 + … + bkxk
It is unlikely that this value will equal the mean value of y for these x values
Therefore, we need to place bounds on how far away the predicted value might be
We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y
LO15-6: Find and interpret a confidence interval for a mean value and a prediction interval for an individual value.
Distance Value
Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
With simple regression, we were able to calculate the distance value fairly easily
However, for multiple regression, calculating the distance value requires matrix algebra
LO15-6
A Confidence Interval and a Prediction Interval
Distance value
Assume the regression assumptions hold
Confidence interval for the mean value of y: [ŷ ± tα/2 s √(distance value)]
Prediction interval for an individual value of y: [ŷ ± tα/2 s √(1 + distance value)]
These are based on n - (k + 1) degrees of freedom
LO15-6
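A minimal sketch of the two interval formulas. The point prediction ŷ, standard error s, distance value, and tα/2 are assumed supplied (the distance value itself requires matrix algebra to compute); the numbers in the example are hypothetical.

```python
import math

def intervals(y_hat, s, dist, t_crit):
    """Return (confidence interval for mean y, prediction interval for individual y)."""
    ci = (y_hat - t_crit * s * math.sqrt(dist),
          y_hat + t_crit * s * math.sqrt(dist))        # mean value of y
    pi = (y_hat - t_crit * s * math.sqrt(1 + dist),
          y_hat + t_crit * s * math.sqrt(1 + dist))    # individual value of y
    return ci, pi

ci, pi = intervals(10.0, 2.0, 0.25, 2.0)   # hypothetical ŷ, s, distance value, t
print(ci)   # → (8.0, 12.0)
```

The prediction interval is always wider than the confidence interval, because predicting one y value must also account for the error term ε.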
15.7 The Sales Representative Case: Evaluating Employee Performance
y = Yearly sales of the company's product
x1 = Number of months the representative has been employed
x2 = Sales of products in the sales territory
x3 = Dollar advertising expenditure in the territory
x4 = Weighted average of the company's market share in the territory for the previous four years
x5 = Change in the company's market share in the territory over the previous four years
Partial Excel Output of a Regression Analysis of the Sales Territory Performance Data
Figure 15.10a
Time = 85.42, MktPoten = 35,182.73, Adver = 7,281.65, MktShare = 9.64, Change = 0.28
Predicted Sales = 4,181.74
95% Prediction Interval: [3,233.59, 5,129.89]
15.8 Using Dummy Variables to Model Qualitative Independent Variables (Optional)
So far, we have only looked at including quantitative data in a regression model
However, we may wish to include descriptive qualitative data as well
For example, might want to include the gender of respondents
We can model the effects of different levels of a qualitative variable by using what are called dummy variables
Also known as indicator variables
LO15-7: Use dummy variables to model qualitative independent variables (Optional).
Constructing Dummy Variables
A dummy variable always has a value of either 0 or 1
For example, to model sales at two locations, we would code the first location as 0 and the second as 1
Operationally, it does not matter which is coded 0 and which is coded 1
LO15-7
What If We Have More Than Two Categories?
Consider having three categories, say A, B and C
Cannot code this using one dummy variable
Coding A = 0, B = 1, and C = 2 would be invalid
It assumes the difference between A and B is the same as the difference between B and C
We must use multiple dummy variables
Specifically, k categories requires k - 1 dummy variables
LO15-7
What If We Have Three Categories?
For A, B, and C, would need two dummy variables
x1 is 1 for A, zero otherwise
x2 is 1 for B, zero otherwise
If x1 and x2 are zero, must be C
This is why the third dummy variable is not needed
LO15-7
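The coding scheme above can be sketched as a small helper; the function name and category labels are illustrative. The level left without its own dummy (here the last one) becomes the baseline, represented by all zeros.

```python
# Code k categories with k - 1 dummy variables.
def dummy_code(value, levels):
    """Return k - 1 indicators; the last level is the baseline (all zeros)."""
    return [1 if value == lvl else 0 for lvl in levels[:-1]]

print(dummy_code("A", ["A", "B", "C"]))   # → [1, 0]
print(dummy_code("B", ["A", "B", "C"]))   # → [0, 1]
print(dummy_code("C", ["A", "B", "C"]))   # → [0, 0]  (baseline)
```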
Interaction Models
So far, have only considered dummy variables as stand-alone variables
The model so far is y = β0 + β1x + β2DM + ε
where DM is the dummy variable
However, we can also look at the interaction between the dummy variable and other variables
That model takes the form
y = β0 + β1x + β2DM + β3xDM + ε
With an interaction term, both the intercept and slope are shifted
LO15-7
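Building the interaction column is just a multiplication. A minimal sketch (the function name is illustrative) of the design row for the interaction model:

```python
# Design row for y = β0 + β1 x + β2 DM + β3 xDM + ε:
# the interaction column is the product x * DM.
def design_row(x, dm):
    return [1.0, x, dm, x * dm]

print(design_row(2.0, 1))   # → [1.0, 2.0, 1, 2.0]  (dummy "on": intercept and slope shift)
print(design_row(2.0, 0))   # → [1.0, 2.0, 0, 0.0]  (baseline group)
```

When DM = 1, both β2 (intercept shift) and β3 (slope shift) are active; when DM = 0, the model reduces to y = β0 + β1x + ε.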
15.9 Using Squared and Interaction Variables (Optional)
The quadratic regression model is:
y = β0 + β1x + β2x² + ε
where
β0 + β1x + β2x² is μy
β0, β1, and β2 are the regression parameters
ε is an error term
LO15-8: Use squared and interaction variables.
Using Interaction Variables
Regression models often contain interaction variables
Formed by multiplying two independent variables together
Consider a model where x3 and x4 interact
and x3 is used as a quadratic
y = β0 + β1x4 + β2x3 + β3x3² + β4x4x3 + ε
LO15-8
15.10 Multicollinearity, Model Building, and Model Validation (Optional)
Multicollinearity: when “independent” variables are related to one another
Considered severe when the simple correlation exceeds 0.9
Even moderate multicollinearity can be a problem
Another measurement is variance inflation factors
Multicollinearity considered
Severe when VIF > 10
Moderately strong for VIF > 5
LO15-9: Describe multicollinearity and build and validate a multiple regression model (Optional).
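In general, VIFj = 1 / (1 − R²j), where R²j comes from regressing xj on the other independent variables. With exactly two independent variables, R²j is just the squared simple correlation between them, so the computation can be sketched in pure Python (data below are hypothetical):

```python
import math

def pearson_r(a, b):
    """Simple (Pearson) correlation between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r²) when there are only two independent variables."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r * r)

vif = vif_two_predictors([1, 2, 3, 4], [2, 3, 5, 8])   # hypothetical data
print(round(vif, 3))   # → 21.0, i.e. severe multicollinearity (VIF > 10)
```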
Effect of Adding Independent Variable
Adding any independent variable will increase R²
Even adding an unimportant independent variable
Thus, R² cannot tell us that adding an independent variable is undesirable
LO15-9
A Better Criterion is the Standard Error
A better criterion is the size of the standard error s
If s increases when an independent variable is added, we should not add that variable
However, decreasing s alone is not enough
An independent variable should be included only if it reduces s enough to offset the increase in tα/2 (caused by the lost degree of freedom), thereby shortening the desired prediction interval for y
LO15-9
C Statistic
Another quantity for comparing regression models is the C (also known as Cp) statistic
First, calculate the mean square error, s²p, for the model containing all p potential independent variables
Next, calculate SSE for a reduced model with k independent variables; then C = SSE / s²p − (n − 2(k + 1))
LO15-9
C Statistic Continued
We want the value of C to be small
Adding unimportant independent variables will raise the value of C
While we want C to be small, we also wish to find a model for which C roughly equals k + 1
A model with C substantially greater than k + 1 has substantial bias and is undesirable
If a model has a small value of C and C for this model is less than k + 1, then it is not biased and the model should be considered desirable
LO15-9
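A minimal sketch of the C statistic under the usual form C = SSE / s²p − (n − 2(k + 1)); the numbers below are hypothetical, not from the text.

```python
# C (Cp) statistic: compare a k-variable model against the full p-variable
# model's mean square error s²_p. Small C, with C near k + 1, is desirable.
def c_statistic(sse_k, s2_full, n, k):
    return sse_k / s2_full - (n - 2 * (k + 1))

# Hypothetical: reduced model with k = 3, SSE = 300; full-model s² = 10; n = 30
c = c_statistic(sse_k=300.0, s2_full=10.0, n=30, k=3)
print(c)            # → 8.0
print(c <= 3 + 1)   # → False: C well above k + 1 suggests substantial bias
```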
The Partial F Test: An F Test for a Portion of a Regression Model
To test
H0: All of the βj coefficients corresponding to the independent variables in the subset are zero
Ha: At least one of the βj coefficients is not equal to zero
Reject H0 in favor of Ha if:
F(partial) > Fα or
p-value < α
Fα is based on k − g numerator and n − (k + 1) denominator degrees of freedom, where the complete model has k independent variables and the reduced model has g
LO15-9
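The partial F statistic compares the error sums of squares of the reduced and complete models. A minimal sketch with hypothetical SSE values (the critical value Fα would come from an F table):

```python
# F(partial) = ((SSE_R - SSE_C)/(k - g)) / (SSE_C/(n - (k + 1))),
# where the complete model has k independent variables and the reduced has g.
def partial_f(sse_reduced, sse_complete, k, g, n):
    return (((sse_reduced - sse_complete) / (k - g)) /
            (sse_complete / (n - (k + 1))))

# Hypothetical: dropping 2 of 5 variables raises SSE from 400 to 1000 (n = 30)
f = partial_f(sse_reduced=1000.0, sse_complete=400.0, k=5, g=3, n=30)
print(round(f, 6))   # → 18.0
```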
15.11 Residual Analysis and Outlier Detection in Multiple Regression (Optional)
For an observed value of yi, the residual is
ei = yi − ŷi = yi − (b0 + b1xi1 + … + bkxik)
If the assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ2
Residual plots
Residuals versus each independent variable
Residuals versus predicted y’s
Residuals in time order (if the response is a time series)
LO15-10: Use residual analysis and outlier detection to check the assumptions of multiple regression (Optional).
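The residuals plotted above are simple differences. A minimal sketch with hypothetical observed and fitted values, checking that the residuals average zero (which always holds for least squares fits that include an intercept):

```python
# Residuals e_i = y_i - y_hat_i; data are made up for illustration.
def residuals(y, y_hat):
    return [yi - fi for yi, fi in zip(y, y_hat)]

e = residuals([3.0, 7.0, 6.0], [2.5, 7.5, 6.0])
print(e)                 # → [0.5, -0.5, 0.0]
print(sum(e) / len(e))   # → 0.0
```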
Figure 15.35
LO15-10
Outliers
Figure 15.37 c, d and e