Chapter 5
Predictive Analytics I: Trees, k-Nearest Neighbors, Naive Bayes, and Ensemble Estimates
Copyright ©2018 McGraw-Hill Education. All rights reserved.
Chapter Outline
5.1 Decision Trees I: Classification Trees
5.2 Decision Trees II: Regression Trees
5.3 k-Nearest Neighbors
5.4 Naive Bayes Classification
5.5 An Introduction to Ensemble Estimates
5.1 Decision Trees I: Classification Trees
Decision trees
Regression tree: predicting a quantitative response variable
Classification tree: predicting a qualitative or categorical response variable
Dummy variable: a quantitative variable used to represent a qualitative variable
Training data: portion of the data used to fit the analytic
Validation data: portion of the data used to assess how well the analytic, after being fitted to the training data, predicts data it was not trained on
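To make the training/validation idea concrete, here is a minimal sketch of a 50/50 split (scikit-learn is an assumed stand-in; the textbook's analyses shown in the figures come from JMP, and the data values below are invented):

```python
# Minimal sketch of a training/validation split (hypothetical data).
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented card-upgrade data: Purchases (thousands of dollars),
# Profile (1 = fits the profile, 0 = does not), Upgrade (1 = upgraded).
data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 19.2, 44.3, 32.8, 61.2, 25.1, 47.6],
    "Profile":   [1, 1, 0, 0, 1, 1, 0, 1],
    "Upgrade":   [1, 1, 0, 0, 0, 1, 0, 1],
})

# Fit the analytic on `train`; hold back `valid` to assess how well the
# fitted analytic predicts data it has not seen.
train, valid = train_test_split(data, test_size=0.5, random_state=1)
```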
LO5-1: Interpret the information provided by classification trees.
Decision Trees I: Classification Trees
Example: predicting whether a customer will upgrade for a fee
Studied 40 existing customers who were offered the upgrade
Response variable (Upgrade)
1 – upgraded
0 – did not upgrade
Predictor variables
Purchases: recorded in thousands of dollars
Profile: 1 – fits the profile; 0 – does not fit the profile
Decision Trees I: Classification Trees Continued
Sample proportions
For each potential predictor and split value, examine the proportion of customers:
with purchases ≥ that value who upgraded
with purchases < that value who upgraded
conforming to the profile (1) who upgraded
not conforming to the profile (0) who upgraded
Overall proportion of the 40 customers that upgraded = 19/40 = .4750, or 47.50 percent
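A minimal sketch of these proportion calculations, assuming invented data values and a hypothetical split value of 40:

```python
# Sketch: sample proportions for one candidate split (invented data).
import pandas as pd

data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 19.2, 44.3, 32.8, 61.2, 25.1, 47.6],
    "Profile":   [1, 1, 0, 0, 1, 1, 0, 1],
    "Upgrade":   [1, 1, 0, 0, 0, 1, 0, 1],
})

split = 40.0  # hypothetical split value for Purchases
print(data.loc[data["Purchases"] >= split, "Upgrade"].mean())  # >= split who upgraded
print(data.loc[data["Purchases"] < split, "Upgrade"].mean())   # < split who upgraded
print(data.loc[data["Profile"] == 1, "Upgrade"].mean())        # fits profile who upgraded
print(data.loc[data["Profile"] == 0, "Upgrade"].mean())        # no profile who upgraded
print(data["Upgrade"].mean())                                  # overall proportion
```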
A JMP Classification Tree for the Card Upgrade Data
Figure 5.1 (a)
Decision Trees I: Classification Trees Continued
The algorithm chooses the combination of predictor variable and split point that, intuitively, produces the greatest difference between the proportion who upgraded and the proportion who did not upgrade
It then continues searching for the best split within each of the two resulting groups
Splitting stops at a leaf (terminal node) when:
a further split would produce a leaf smaller than a specified minimum split size, or
the proportion who upgraded in the leaf is either 1 or 0 (a pure leaf – no further splitting is possible)
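These splitting and stopping rules can be sketched with scikit-learn's DecisionTreeClassifier (an assumed stand-in for JMP, which produced the tree in Figure 5.1; data values are invented):

```python
# Sketch: growing a classification tree with a minimum split size.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 19.2, 44.3, 32.8, 61.2, 25.1, 47.6],
    "Profile":   [1, 1, 0, 0, 1, 1, 0, 1],
    "Upgrade":   [1, 1, 0, 0, 0, 1, 0, 1],
})

# min_samples_split plays the role of the minimum split size: a group
# with fewer observations than this becomes a leaf and is not split.
tree = DecisionTreeClassifier(min_samples_split=5, random_state=1)
tree.fit(data[["Purchases", "Profile"]], data["Upgrade"])

# Show the predictor/split-point combinations the algorithm chose.
print(export_text(tree, feature_names=["Purchases", "Profile"]))
```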
Decision Trees I: Classification Trees Continued
Confusion matrix: summarizes a classification analytic's success in classifying observations in the training data set and/or validation data set
Entropy RSquare: the square of the simple correlation coefficient between the observed 0 and 1 upgrade values and the corresponding upgrade probability estimates
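For intuition, a small sketch of a confusion matrix built from invented observed and predicted classifications:

```python
# Sketch: confusion matrix for hypothetical upgrade classifications.
from sklearn.metrics import confusion_matrix

observed  = [1, 0, 1, 1, 0, 0, 1, 0]  # actual upgrade values (invented)
predicted = [1, 0, 1, 0, 0, 1, 1, 0]  # classifications from some analytic

# Rows = observed class (0 then 1); columns = predicted class (0 then 1).
print(confusion_matrix(observed, predicted))
# [[3 1]   3 non-upgraders classified correctly, 1 misclassified
#  [1 3]]  3 upgraders classified correctly, 1 misclassified
```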
5.2 Decision Trees II: Regression Trees
705 applicants studied to predict college GPA
50% - training data set (352)
50% - validation data set (353)
Compute the mean GPA for each group formed by a split; the group mean is the prediction for every applicant in that group
Use the predictions to calculate three quantities (written out below):
MSE
RMSE
RSquare
Examine each predictor variable and every possible way of splitting the values of each predictor variable into two groups
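Assuming the usual definitions (the textbook's notation may differ slightly), the three quantities are

```latex
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,
\qquad
\text{RMSE} = \sqrt{\text{MSE}},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
```

where y_i is an observed GPA, ŷ_i is the tree's prediction for that applicant (the mean GPA of the applicant's group), ȳ is the overall mean GPA, and n is the number of observations in the data set being evaluated (training or validation).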
LO5-2: Interpret the information provided by regression trees.
Final Regression Tree
Figure 5.12 (c)
5.3 k-Nearest Neighbors
The nearest neighbors of an observation are determined by measuring the distance between that observation's set of predictor variable values and the set of predictor variable values of every other observation
To classify a qualitative response variable, an observation is assigned the response value that is most common among its k nearest neighbors
Predicting a quantitative response variable works the same way, except that the prediction is the average of the response variable values of the k nearest neighbors
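A minimal k-nearest-neighbors sketch using scikit-learn (an assumed stand-in for the JMP output in Figures 5.26 and 5.27; the data values and the choice k = 3 are invented):

```python
# Sketch: classifying a new customer with k-nearest neighbors.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Columns: Purchases (thousands of dollars), Profile (1/0); invented values.
X = np.array([[39.9, 1], [50.7, 1], [19.2, 0], [44.3, 0],
              [32.8, 1], [61.2, 1], [25.1, 0], [47.6, 1]])
y = np.array([1, 1, 0, 0, 0, 1, 0, 1])  # 1 = upgraded

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 (assumed)
knn.fit(X, y)

# A new customer with $42,000 in purchases who fits the profile.
print(knn.predict([[42.0, 1]]))        # most common class among the 3 neighbors
print(knn.predict_proba([[42.0, 1]]))  # proportion of neighbors in each class
```

Because Purchases is measured in thousands while Profile is 0 or 1, the predictors are usually standardized before distances are computed; that step is omitted here to keep the sketch short.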
LO5-3: Interpret the information provided by k-nearest neighbors.
Nearest Neighbors in the Upgrade Example
Figure 5.26 (partial)
Classification Using Nearest Neighbors in the Upgrade Example
Figure 5.27 (partial)
5.4 Naive Bayes Classification
Uses a “naive” version of Bayes’ Theorem to classify observations
Full version of Bayes’ Theorem: computes the probability of each class given the observed predictor values from the class’s prior probability and the joint probability of the predictor values given the class
Naive version of Bayes’ Theorem: simplifies the full version by assuming the predictor variables are independent within each class
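Written out in a standard formulation (the textbook's notation may differ), with classes C_j and predictor values x_1, ..., x_m:

```latex
% Full version of Bayes' Theorem:
P(C_j \mid x_1, \dots, x_m)
  = \frac{P(C_j)\, P(x_1, \dots, x_m \mid C_j)}
         {\sum_{k} P(C_k)\, P(x_1, \dots, x_m \mid C_k)}

% Naive version: assume the predictor variables are independent within
% each class, so the joint probability factors into one-variable terms:
P(x_1, \dots, x_m \mid C_j)
  = P(x_1 \mid C_j) \times P(x_2 \mid C_j) \times \dots \times P(x_m \mid C_j)
```

The naive assumption matters because the one-variable probabilities P(x_i | C_j) are far easier to estimate from training data than the full joint probability.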
LO5-4: Interpret the information provided by naive Bayes classification.
5.5 An Introduction to Ensemble Estimates
Ensemble Estimate: combines the estimates or predictions obtained from different analytics to arrive at an overall result
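A minimal sketch of an ensemble estimate that averages the upgrade probabilities from three of this chapter's analytics (the scikit-learn models and data values are assumptions):

```python
# Sketch: ensemble estimate = average of three analytics' estimates.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Invented card-upgrade data: Purchases (thousands), Profile (1/0).
X = np.array([[39.9, 1], [50.7, 1], [19.2, 0], [44.3, 0],
              [32.8, 1], [61.2, 1], [25.1, 0], [47.6, 1]])
y = np.array([1, 1, 0, 0, 0, 1, 0, 1])  # 1 = upgraded

models = [DecisionTreeClassifier(random_state=1),
          KNeighborsClassifier(n_neighbors=3),
          GaussianNB()]

new_customer = np.array([[42.0, 1]])
probs = [m.fit(X, y).predict_proba(new_customer)[0, 1] for m in models]

# Combine the individual estimates into one overall result.
print("Individual P(upgrade):", np.round(probs, 3))
print("Ensemble P(upgrade):  ", round(float(np.mean(probs)), 3))
```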
LO5-5: Interpret the information provided by ensemble models.
Table 5.3