# bowerman_9e_chap_052.pptx

Chapter 5
Predictive Analytics I: Trees, k‑Nearest Neighbors, Naive Bayes’, and Ensemble Estimates


Chapter Outline
5.1 Decision Trees I: Classification Trees
5.2 Decision Trees II: Regression Trees
5.3 k-Nearest Neighbors
5.4 Naive Bayes’ Classification
5.5 An Introduction to Ensemble Estimates


5.1 Decision Trees I: Classification Trees
Decision trees
Regression tree: predicting a quantitative response variable
Classification tree: predicting a qualitative or categorical response variable
Dummy variable: a quantitative variable used to represent a qualitative variable
Training data: portion of the data used to fit the analytic
Validation data: portion of the data used to assess how well the analytic fitted to the training data fits data different from the training data
LO5-1: Interpret the information provided by classification trees.
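The training/validation split defined above can be sketched with scikit-learn (the chapter's examples use JMP; the column names and values below are hypothetical stand-ins for the card-upgrade data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical observations: Purchases (thousands of dollars),
# a profile dummy variable, and the 0/1 upgrade response
data = pd.DataFrame({
    "Purchases": [39.9, 50.7, 24.6, 31.0, 46.4, 28.1, 58.9, 21.2],
    "PlatProfile": [1, 1, 0, 0, 1, 0, 1, 0],   # 1 = fits profile
    "Upgrade": [1, 1, 0, 0, 1, 0, 1, 0],       # 1 = upgraded
})

# Hold out half of the observations to assess how well the analytic
# fitted to the training data fits data different from the training data
train, valid = train_test_split(data, test_size=0.5, random_state=1)
print(len(train), len(valid))  # 4 4
```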

Decision Trees I: Classification Trees
Prediction of whether a cardholder will upgrade for a fee
Studied 40 existing customers
Response variable
Upgrade: 1 = upgraded, 0 = did not upgrade
Predictor variables
Purchases: recorded in thousands of dollars
Profile dummy: 1 = fits profile, 0 = did not fit profile

LO5-1

Decision Trees I: Classification Trees Continued
Sample proportions
For each potential predictor and split value, examine the proportion
with purchases ≥ that value who upgraded
with purchases < that value who upgraded
conforming to profile (1) who upgraded
not conforming to profile (0) who upgraded
Overall proportion that upgraded = 19/40 = .4750, or 47.50 percent
LO5-1

LO5-1
A JMP Classification Tree for the Card Upgrade Data
Figure 5.1 (a)

Decision Trees I: Classification Trees Continued
The combination of predictor variable and split point chosen is the one that, intuitively, produces the greatest difference between the proportion who upgraded and the proportion who did not upgrade
The search then continues on the two resulting groups
Splitting stops at a leaf (terminal node) when
Further splitting would produce a leaf smaller than a specified minimum split size, or
The leaf is pure – every response value in it is either 1 or 0 – so no further splitting is possible
LO5-1

Decision Trees I: Classification Trees Continued
Confusion matrix: summarizes a classification analytic's success in classifying observations in the training data set and/or validation data set
Entropy RSquare: the square of the simple correlation coefficient between the observed 0 and 1 upgrade values and the corresponding upgrade probability estimates
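As a sketch of the ideas above – a classification tree grown with a minimum split size, summarized by a confusion matrix – here is a scikit-learn version (the chapter uses JMP; the data below are hypothetical, not the book's 40 observations):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Hypothetical card-upgrade training data: Purchases (thousands of
# dollars) and a profile dummy (1 = fits profile) predict Upgrade
X = pd.DataFrame({
    "Purchases": [39.9, 50.7, 24.6, 31.0, 46.4, 28.1, 58.9, 21.2, 44.5, 26.9],
    "PlatProfile": [1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
})
y = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]  # 1 = upgraded

# min_samples_split mirrors the "minimum split size" stopping rule;
# splitting also stops when a node is pure
tree = DecisionTreeClassifier(min_samples_split=4, random_state=1)
tree.fit(X, y)

# Confusion matrix on the training data:
# rows = actual class, columns = predicted class
cm = confusion_matrix(y, tree.predict(X))
print(cm)
```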

LO5-1

5.2 Decision Trees II: Regression Trees
705 applicants studied to predict college GPA
50% – training data set (352)
50% – validation data set (353)
Compute the mean college GPA for each group
Use the group mean prediction(s) to calculate three quantities
MSE
RMSE
RSquare
Examine each predictor variable and every possible way of splitting the values of each predictor variable into two groups
LO5-2: Interpret the information provided by regression trees.
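The three validation quantities above can be computed as follows. This is a hedged sketch: the data are simulated stand-ins for the applicant data, and `min_samples_leaf` is just one way to control tree size:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
# Simulated applicant data: high-school GPA and a test score predict
# college GPA (the chapter's actual predictors may differ)
hs_gpa = rng.uniform(2.0, 4.0, 200)
score = rng.uniform(400, 800, 200)
college_gpa = 0.8 * hs_gpa + 0.002 * score + rng.normal(0, 0.2, 200)
X = np.column_stack([hs_gpa, score])

# 50% training, 50% validation, as in the chapter's example
X_tr, X_va, y_tr, y_va = train_test_split(
    X, college_gpa, test_size=0.5, random_state=1)

tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=1).fit(X_tr, y_tr)
pred = tree.predict(X_va)

mse = mean_squared_error(y_va, pred)   # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error
r2 = r2_score(y_va, pred)              # RSquare
print(f"MSE={mse:.4f} RMSE={rmse:.4f} RSquare={r2:.4f}")
```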

LO5-2
Final Regression Tree
Figure 5.12 (c)


5.3 k-Nearest Neighbors
Nearest neighbors to an observation are determined by measuring the distance between the set of predictor variable values for that observation and the set of predictor variable values for every other observation
Predicting a quantitative response variable using k-nearest neighbors is the same as classifying a qualitative response variable except that we predict the quantitative response variable by averaging the response variable values for the k‑nearest neighbors
LO5-3: Interpret the information provided by k-nearest neighbors.
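Both uses of k-nearest neighbors described above – majority vote among neighbors for classification, averaging neighbors' response values for prediction – can be sketched as follows (toy values, not the book's data; in practice the predictors are typically standardized before distances are measured):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical observations: [Purchases, profile dummy]
X = [[39.9, 1], [50.7, 1], [24.6, 0], [31.0, 0], [46.4, 1], [28.1, 0]]
y_class = [1, 1, 0, 0, 1, 0]              # qualitative response: upgrade?
y_quant = [4.1, 5.2, 1.8, 2.3, 4.7, 2.0]  # a quantitative response

# Classification: majority vote among the k = 3 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict([[45.0, 1]]))  # [1]

# Prediction: average of the k = 3 nearest neighbors' response values
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_quant)
print(reg.predict([[45.0, 1]]))  # about 4.67 = (4.7 + 4.1 + 5.2) / 3
```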

LO5-3
Nearest Neighbors in the Upgrade Example
Figure 5.26 partial


LO5-3
Classification Using Nearest Neighbors in the Upgrade Example
Figure 5.27 partial


5.4 Naive Bayes’ Classification
Uses a “naive” version of Bayes’ Theorem to classify observations

Full version of Bayes’ Theorem

Naive version of Bayes’ Theorem
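The two formula placeholders above did not survive the slide export; the standard statements, with $C$ denoting a class value and $x_1,\dots,x_m$ the predictor values, are:

```latex
% Full version of Bayes' Theorem
P(C \mid x_1,\dots,x_m)
  = \frac{P(x_1,\dots,x_m \mid C)\, P(C)}{P(x_1,\dots,x_m)}

% Naive version: assume the predictors are independent within each
% class, so the joint conditional probability factors into a product
P(x_1,\dots,x_m \mid C) = \prod_{j=1}^{m} P(x_j \mid C),
\qquad\text{hence}\qquad
P(C \mid x_1,\dots,x_m) \;\propto\; P(C) \prod_{j=1}^{m} P(x_j \mid C)
```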

LO5-4: Interpret the information provided by naive Bayes’ classification.
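A minimal sketch of naive Bayes classification on categorical 0/1 predictors, using scikit-learn's `CategoricalNB` (hypothetical data; `alpha` applies Laplace smoothing, which the chapter's hand calculations may or may not use):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Two hypothetical 0/1 categorical predictors and a 0/1 class;
# CategoricalNB estimates P(x_j | class) separately for each predictor,
# then multiplies them - the "naive" independence assumption
X = np.array([[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [1, 1], [0, 0]])
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])

nb = CategoricalNB(alpha=1.0).fit(X, y)   # alpha = Laplace smoothing
print(nb.predict([[1, 1]]))               # [1]
print(nb.predict_proba([[1, 1]]).round(3))
```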

5.5 An Introduction to Ensemble Estimates
Ensemble Estimate: combines the estimates or predictions obtained from different analytics to arrive at an overall result

LO5-5: Interpret the information provided by ensemble models.
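A minimal sketch of an unweighted ensemble estimate: fit several different analytics and average their predictions (simulated data; the chapter's Table 5.3 combines the estimates from its own analytics):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, (60, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, 60)

# Three different analytics produce three individual predictions
models = [
    DecisionTreeRegressor(min_samples_leaf=5, random_state=0),
    KNeighborsRegressor(n_neighbors=5),
    LinearRegression(),
]
preds = np.column_stack([m.fit(X, y).predict([[5.0]]) for m in models])

# The ensemble estimate combines them - here, an unweighted average
ensemble = preds.mean()
print(preds.round(2), ensemble.round(2))
```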

Table 5.3

