Chapter 5
Predictive Analytics I: Trees, k-Nearest Neighbors, Naive Bayes, and Ensemble Estimates
Copyright ©2018 McGraw-Hill Education. All rights reserved.
Chapter Outline
5.1 Decision Trees I: Classification Trees
5.2 Decision Trees II: Regression Trees
5.3 k-Nearest Neighbors
5.4 Naive Bayes Classification
5.5 An Introduction to Ensemble Estimates
5.1 Decision Trees I: Classification Trees
Decision trees
Regression tree: predicting a quantitative response variable
Classification tree: predicting a qualitative or categorical response variable
Dummy variable: a quantitative variable used to represent a qualitative variable
Training data: portion of the data used to fit the analytic
Validation data: portion of the data used to assess how well the analytic, after being fitted to the training data, predicts data it was not trained on
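To make the training/validation idea concrete, here is a minimal sketch of a 50/50 split (scikit-learn is an assumed stand-in; the textbook's analyses shown in the figures come from JMP, and the data values below are invented):

```python
# Minimal sketch of a training/validation split (hypothetical data).
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented card-upgrade data: Purchases (thousands of dollars),
# Profile (1 = fits the profile, 0 = does not), Upgrade (1 = upgraded).
data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 19.2, 44.3, 32.8, 61.2, 25.1, 47.6],
    "Profile":   [1, 1, 0, 0, 1, 1, 0, 1],
    "Upgrade":   [1, 1, 0, 0, 0, 1, 0, 1],
})

# Fit the analytic on `train`; hold back `valid` to assess how well the
# fitted analytic predicts data it has not seen.
train, valid = train_test_split(data, test_size=0.5, random_state=1)
```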
LO5-1: Interpret the information provided by classification trees.
Decision Trees I: Classification Trees
Example: predicting whether a customer will upgrade for a fee
Studied 40 existing customers who were offered the upgrade
Response variable (Upgrade)
1 – upgraded
0 – did not upgrade
Predictor variables
Purchases: recorded in thousands of dollars
Profile: 1 – fits the profile; 0 – does not fit the profile
Decision Trees I: Classification Trees Continued
Sample proportions
For each potential predictor and split value, examine the proportion of customers:
with purchases ≥ that value who upgraded
with purchases < that value who upgraded
conforming to the profile (1) who upgraded
not conforming to the profile (0) who upgraded
Overall proportion of the 40 customers that upgraded = 19/40 = .4750, or 47.50 percent
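A minimal sketch of these proportion calculations, assuming invented data values and a hypothetical split value of 40:

```python
# Sketch: sample proportions for one candidate split (invented data).
import pandas as pd

data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 19.2, 44.3, 32.8, 61.2, 25.1, 47.6],
    "Profile":   [1, 1, 0, 0, 1, 1, 0, 1],
    "Upgrade":   [1, 1, 0, 0, 0, 1, 0, 1],
})

split = 40.0  # hypothetical split value for Purchases
print(data.loc[data["Purchases"] >= split, "Upgrade"].mean())  # >= split who upgraded
print(data.loc[data["Purchases"] < split, "Upgrade"].mean())   # < split who upgraded
print(data.loc[data["Profile"] == 1, "Upgrade"].mean())        # fits profile who upgraded
print(data.loc[data["Profile"] == 0, "Upgrade"].mean())        # no profile who upgraded
print(data["Upgrade"].mean())                                  # overall proportion
```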
A JMP Classification Tree for the Card Upgrade Data
Figure 5.1 (a)
Decision Trees I: Classification Trees Continued
The algorithm chooses the combination of predictor variable and split point that, intuitively, produces the greatest difference between the proportion who upgraded and the proportion who did not upgrade
It then continues searching for the best split within each of the two resulting groups
Splitting stops at a leaf (terminal node) when:
a further split would produce a leaf smaller than a specified minimum split size, or
the proportion who upgraded in the leaf is either 1 or 0 (a pure leaf – no further splitting is possible)
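These splitting and stopping rules can be sketched with scikit-learn's DecisionTreeClassifier (an assumed stand-in for JMP, which produced the tree in Figure 5.1; data values are invented):

```python
# Sketch: growing a classification tree with a minimum split size.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 19.2, 44.3, 32.8, 61.2, 25.1, 47.6],
    "Profile":   [1, 1, 0, 0, 1, 1, 0, 1],
    "Upgrade":   [1, 1, 0, 0, 0, 1, 0, 1],
})

# min_samples_split plays the role of the minimum split size: a group
# with fewer observations than this becomes a leaf and is not split.
tree = DecisionTreeClassifier(min_samples_split=5, random_state=1)
tree.fit(data[["Purchases", "Profile"]], data["Upgrade"])

# Show the predictor/split-point combinations the algorithm chose.
print(export_text(tree, feature_names=["Purchases", "Profile"]))
```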
Decision Trees I: Classification Trees Continued
Confusion matrix: summarizes a classification analytic's success in classifying observations in the training data set and/or validation data set
Entropy RSquare: the square of the simple correlation coefficient between the observed 0 and 1 upgrade values and the corresponding upgrade probability estimates
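For intuition, a small sketch of a confusion matrix built from invented observed and predicted classifications:

```python
# Sketch: confusion matrix for hypothetical upgrade classifications.
from sklearn.metrics import confusion_matrix

observed  = [1, 0, 1, 1, 0, 0, 1, 0]  # actual upgrade values (invented)
predicted = [1, 0, 1, 0, 0, 1, 1, 0]  # classifications from some analytic

# Rows = observed class (0 then 1); columns = predicted class (0 then 1).
print(confusion_matrix(observed, predicted))
# [[3 1]   3 non-upgraders classified correctly, 1 misclassified
#  [1 3]]  3 upgraders classified correctly, 1 misclassified
```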
5.2 Decision Trees II: Regression Trees
705 applicants studied to predict college GPA
50% - training data set (352)
50% - validation data set (353)
Compute the mean GPA for each group formed by a split; the group mean is the prediction for every applicant in that group
Use the predictions to calculate three quantities (written out below):
MSE
RMSE
RSquare
Examine each predictor variable and every possible way of splitting the values of each predictor variable into two groups
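Assuming the usual definitions (the textbook's notation may differ slightly), the three quantities are

```latex
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,
\qquad
\text{RMSE} = \sqrt{\text{MSE}},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
```

where y_i is an observed GPA, ŷ_i is the tree's prediction for that applicant (the mean GPA of the applicant's group), ȳ is the overall mean GPA, and n is the number of observations in the data set being evaluated (training or validation).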
LO5-2: Interpret the information provided by regression trees.
Final Regression Tree
Figure 5.12 (c)
5.3 k-Nearest Neighbors
The nearest neighbors of an observation are determined by measuring the distance between that observation's set of predictor variable values and the set of predictor variable values of every other observation
To classify a qualitative response variable, an observation is assigned the response value that is most common among its k nearest neighbors
Predicting a quantitative response variable works the same way, except that the prediction is the average of the response variable values of the k nearest neighbors
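A minimal k-nearest-neighbors sketch using scikit-learn (an assumed stand-in for the JMP output in Figures 5.26 and 5.27; the data values and the choice k = 3 are invented):

```python
# Sketch: classifying a new customer with k-nearest neighbors.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Columns: Purchases (thousands of dollars), Profile (1/0); invented values.
X = np.array([[39.9, 1], [50.7, 1], [19.2, 0], [44.3, 0],
              [32.8, 1], [61.2, 1], [25.1, 0], [47.6, 1]])
y = np.array([1, 1, 0, 0, 0, 1, 0, 1])  # 1 = upgraded

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 (assumed)
knn.fit(X, y)

# A new customer with $42,000 in purchases who fits the profile.
print(knn.predict([[42.0, 1]]))        # most common class among the 3 neighbors
print(knn.predict_proba([[42.0, 1]]))  # proportion of neighbors in each class
```

Because Purchases is measured in thousands while Profile is 0 or 1, the predictors are usually standardized before distances are computed; that step is omitted here to keep the sketch short.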
LO5-3: Interpret the information provided by k-nearest neighbors.
Nearest Neighbors in the Upgrade Example
Figure 5.26 (partial)
Classification Using Nearest Neighbors in the Upgrade Example
Figure 5.27 (partial)
5.4 Naive Bayes Classification
Uses a “naive” version of Bayes’ Theorem to classify observations
Full version of Bayes’ Theorem: computes the probability of each class given the observed predictor values from the class’s prior probability and the joint probability of the predictor values given the class
Naive version of Bayes’ Theorem: simplifies the full version by assuming the predictor variables are independent within each class
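Written out in a standard formulation (the textbook's notation may differ), with classes C_j and predictor values x_1, ..., x_m:

```latex
% Full version of Bayes' Theorem:
P(C_j \mid x_1, \dots, x_m)
  = \frac{P(C_j)\, P(x_1, \dots, x_m \mid C_j)}
         {\sum_{k} P(C_k)\, P(x_1, \dots, x_m \mid C_k)}

% Naive version: assume the predictor variables are independent within
% each class, so the joint probability factors into one-variable terms:
P(x_1, \dots, x_m \mid C_j)
  = P(x_1 \mid C_j) \times P(x_2 \mid C_j) \times \dots \times P(x_m \mid C_j)
```

The naive assumption matters because the one-variable probabilities P(x_i | C_j) are far easier to estimate from training data than the full joint probability.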
LO5-4: Interpret the information provided by naive Bayes classification.
5.5 An Introduction to Ensemble Estimates
Ensemble Estimate: combines the estimates or predictions obtained from different analytics to arrive at an overall result
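A minimal sketch of an ensemble estimate that averages the upgrade probabilities from three of this chapter's analytics (the scikit-learn models and data values are assumptions):

```python
# Sketch: ensemble estimate = average of three analytics' estimates.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Invented card-upgrade data: Purchases (thousands), Profile (1/0).
X = np.array([[39.9, 1], [50.7, 1], [19.2, 0], [44.3, 0],
              [32.8, 1], [61.2, 1], [25.1, 0], [47.6, 1]])
y = np.array([1, 1, 0, 0, 0, 1, 0, 1])  # 1 = upgraded

models = [DecisionTreeClassifier(random_state=1),
          KNeighborsClassifier(n_neighbors=3),
          GaussianNB()]

new_customer = np.array([[42.0, 1]])
probs = [m.fit(X, y).predict_proba(new_customer)[0, 1] for m in models]

# Combine the individual estimates into one overall result.
print("Individual P(upgrade):", np.round(probs, 3))
print("Ensemble P(upgrade):  ", round(float(np.mean(probs)), 3))
```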
LO5-5: Interpret the information provided by ensemble models.
Table 5.3