INTRODUCTION TO DATA MINING
SECOND EDITION
PANG-NING TAN
Michigan State University
MICHAEL STEINBACH
University of Minnesota
ANUJ KARPATNE
University of Minnesota
VIPIN KUMAR
University of Minnesota
330 Hudson Street, NY NY 10013
Director, Portfolio Management: Engineering, Computer Science & Global Editions: Julian Partridge
Specialist, Higher Ed Portfolio Management: Matt Goldstein
Portfolio Management Assistant: Meghan Jacoby
Managing Content Producer: Scott Disanno
Content Producer: Carole Snyder
Web Developer: Steve Wright
Rights and Permissions Manager: Ben Ferrini
Manufacturing Buyer, Higher Ed, Lake Side Communications Inc (LSC): Maura Zaldivar-Garcia
Inventory Manager: Ann Lam
Product Marketing Manager: Yvonne Vannatta
Field Marketing Manager: Demetrius Hall
Marketing Assistant: Jon Bryant
Cover Designer: Joyce Wells, jWellsDesign
Full-Service Project Management: Chandrasekar Subramanian, SPi Global
Copyright ©2019 Pearson Education, Inc. All rights reserved. Manufactured in the
United States of America. This publication is protected by Copyright, and
permission should be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any
means, electronic, mechanical, photocopying, recording, or likewise. For
information regarding permissions, request forms and the appropriate contacts
within the Pearson Education Global Rights & Permissions department, please visit
www.pearsonhighed.com/permissions/.
Many of the designations by manufacturers and sellers to distinguish their products
are claimed as trademarks. Where those designations appear in this book, and the
publisher was aware of a trademark claim, the designations have been printed in
initial caps or all caps.
Library of Congress Cataloging-in-Publication Data on File
Names: Tan, Pang-Ning, author. | Steinbach, Michael, author. | Karpatne, Anuj,
author. | Kumar, Vipin, 1956- author.
Title: Introduction to Data Mining / Pang-Ning Tan, Michigan State University,
Michael Steinbach, University of Minnesota, Anuj Karpatne, University of
Minnesota, Vipin Kumar, University of Minnesota.
Description: Second edition. | New York, NY : Pearson Education, [2019] |
Includes bibliographical references and index.
Identifiers: LCCN 2017048641 | ISBN 9780133128901 | ISBN 0133128903
Subjects: LCSH: Data mining.
Classification: LCC QA76.9.D343 T35 2019 | DDC 006.3/12–dc23 LC record
available at https://lccn.loc.gov/2017048641
ISBN-10: 0133128903
ISBN-13: 9780133128901
To our families …
Preface to the Second Edition
Since the first edition, roughly 12 years ago, much has changed in the field of data
analysis. The volume and variety of data being collected have continued to increase, as
has the rate (velocity) at which data are being collected and used to make decisions.
Indeed, the term Big Data has been used to refer to the massive and diverse data
sets now available. In addition, the term data science has been coined to describe
an emerging area that applies tools and techniques from various fields, such as
data mining, machine learning, statistics, and many others, to extract actionable
insights from data, often big data.
The growth in data has created numerous opportunities for all areas of data
analysis. The most dramatic developments have been in the area of predictive
modeling, across a wide range of application domains. For instance, recent
advances in neural networks, known as deep learning, have shown impressive
results in a number of challenging areas, such as image classification, speech
recognition, as well as text categorization and understanding. While not as
dramatic, other areas, e.g., clustering, association analysis, and anomaly detection,
have also continued to advance. This new edition responds to those advances.
Overview
As with the first edition, the second edition of the book provides a comprehensive
introduction to data mining and is designed to be accessible and useful to students,
instructors, researchers, and professionals. Areas covered include data
preprocessing, predictive modeling, association analysis, cluster analysis, anomaly
detection, and avoiding false discoveries. The goal is to present fundamental
concepts and algorithms for each topic, thus providing the reader with the
necessary background for the application of data mining to real problems. As
before, classification, association analysis, and cluster analysis are each covered in
a pair of chapters. In each pair, the introductory chapter covers basic concepts, representative
algorithms, and evaluation techniques, while the following chapter discusses
advanced concepts and algorithms. As before, our objective is to provide the
reader with a sound understanding of the foundations of data mining, while still
covering many important advanced topics. Because of this approach, the book is
useful both as a learning tool and as a reference.
To help readers better understand the concepts that have been presented, we
provide an extensive set of examples, figures, and exercises. The solutions to the
original exercises, which are already circulating on the web, will be made public.
The exercises are mostly unchanged from the last edition, with the exception of
new exercises in the chapter on avoiding false discoveries. New exercises for the
other chapters and their solutions will be available to instructors via the web.
Bibliographic notes are included at the end of each chapter for readers who are
interested in more advanced topics, historically important papers, and recent
trends. These have also been significantly updated. The book also contains a
comprehensive subject and author index.
What is New in the Second Edition?
Some of the most significant improvements in the text have been in the two
chapters on classification. The introductory chapter uses the decision tree classifier
for illustration, but the discussion on many topics—those that apply across all
classification approaches—has been greatly expanded and clarified, including
topics such as overfitting, underfitting, the impact of training size, model complexity,
model selection, and common pitfalls in model evaluation. Almost every section of
the advanced classification chapter has been significantly updated. The material on
Bayesian networks, support vector machines, and artificial neural networks has
been significantly expanded. We have added a separate section on deep networks
to address the current developments in this area. The discussion of evaluation,
which occurs in the section on imbalanced classes, has also been updated and
improved.
The changes in association analysis are more localized. We have completely
reworked the section on the evaluation of association patterns (introductory
chapter), as well as the sections on sequence and graph mining (advanced
chapter). Changes to cluster analysis are also localized. The introductory chapter
adds a K-means initialization technique and an updated discussion of cluster
evaluation. The advanced clustering chapter adds a new section on spectral graph
clustering. Anomaly detection has been greatly revised and expanded. Existing
approaches (statistical, nearest neighbor/density-based, and clustering-based)
have been retained and updated, while new approaches have been added:
reconstruction-based, one-class classification, and information-theoretic. The
reconstruction-based approach is illustrated using autoencoder networks that are
part of the deep learning paradigm. The data chapter has been updated to include
discussions of mutual information and kernel-based techniques.
The last chapter, which discusses how to avoid false discoveries and produce valid
results, is completely new and is novel among contemporary textbooks on
data mining. It supplements the discussions in the other chapters with a discussion
of the statistical concepts (statistical significance, p-values, false discovery rate,
permutation testing, etc.) relevant to avoiding spurious results, and then illustrates
these concepts in the context of data mining techniques. This chapter addresses
the increasing concern over the validity and reproducibility of results obtained from
data analysis. The addition of this last chapter is a recognition of the importance of
this topic and an acknowledgment that a deeper understanding of this area is
needed for those analyzing data.
The data exploration chapter and the appendices have been removed from the
print edition of the book but remain available on the web. A new appendix
provides a brief discussion of scalability in the context of big data.
To the Instructor
As a textbook, this book is suitable for a wide range of students at the advanced
undergraduate or graduate level. Since students come to this subject with diverse
backgrounds that may not include extensive knowledge of statistics or databases,
our book requires minimal prerequisites. No database knowledge is needed, and
we assume only a modest background in statistics or mathematics, although such a
background will make for easier going in some sections. As before, the book, and
more specifically, the chapters covering major data mining topics, are designed to
be as self-contained as possible. Thus, the order in which topics can be covered is
quite flexible. The core material is covered in Chapters 2 (data), 3 (classification), 5
(association analysis), 7 (clustering), and 9 (anomaly detection). We recommend at
least a cursory coverage of Chapter 10 (Avoiding False Discoveries) to instill in
students some caution when interpreting the results of their data analysis. Although
the introductory data chapter (2) should be covered first, the basic classification (3),
association analysis (5), and clustering (7) chapters can be covered in any order.
Because of the relationship of anomaly detection (9) to classification (3) and
clustering (7), these chapters should precede Chapter 9. Various topics can be
selected from the advanced classification, association analysis, and clustering
chapters (4, 6, and 8, respectively) to fit the schedule and interests of the instructor
and students. We also advise that the lectures be augmented by projects or
practical exercises in data mining. Although they are time-consuming, such hands-on
assignments greatly enhance the value of the course.
Support Materials
The following support materials, available to all readers of this book, can be found at
http://www-users.cs.umn.edu/~kumar/dmbook:
PowerPoint lecture slides
Suggestions for student projects
Data mining resources, such as algorithms and data sets
Online tutorials that give step-by-step examples for selected data mining techniques described in the book using actual data sets and data analysis software
Additional support materials, including solutions to exercises, are available only to
instructors adopting this textbook for classroom use. The book’s resources will be
mirrored at www.pearsonhighered.com/cs-resources. Comments and
suggestions, as well as reports of errors, can be sent to the authors through
[email protected].
Acknowledgments
Many people contributed to the first and second editions of the book. We begin by
acknowledging our families, to whom this book is dedicated. Without their patience
and support, this project would have been impossible.
We would like to thank the current and former students of our data mining groups at
the University of Minnesota and Michigan State for their contributions. Eui-Hong
(Sam) Han and Mahesh Joshi helped with the initial data mining classes. Some of
the exercises and presentation slides that they created can be found in the book
and its accompanying slides. Students in our data mining groups who provided
comments on drafts of the book or who contributed in other ways include Shyam
Boriah, Haibin Cheng, Varun Chandola, Eric Eilertson, Levent Ertöz, Jing Gao,
Rohit Gupta, Sridhar Iyer, Jung-Eun Lee, Benjamin Mayer, Aysel Ozgur, Uygar
Oztekin, Gaurav Pandey, Kashif Riaz, Jerry Scripps, Gyorgy Simon, Hui Xiong,
Jieping Ye, and Pusheng Zhang. We would also like to thank the students of our
data mining classes at the University of Minnesota and Michigan State University
who worked with early drafts of the book and provided invaluable feedback. We
specifically note the helpful suggestions of Bernardo Craemer, Arifin Ruslim,
Jamshid Vayghan, and Yu Wei.
Joydeep Ghosh (University of Texas) and Sanjay Ranka (University of Florida)
class tested early versions of the book. We also received many useful suggestions
directly from the following UT students: Pankaj Adhikari, Rajiv Bhatia, Frederic
Bosche, Arindam Chakraborty, Meghana Deodhar, Chris Everson, David Gardner,
Saad Godil, Todd Hay, Clint Jones, Ajay Joshi, Joonsoo Lee, Yue Luo, Anuj
Nanavati, Tyler Olsen, Sunyoung Park, Aashish Phansalkar, Geoff Prewett, Michael
Ryoo, Daryl Shannon, and Mei Yang.
Ronald Kostoff (ONR) read an early version of the clustering chapter and offered
numerous suggestions. George Karypis provided invaluable LaTeX assistance in
creating an author index. Irene Moulitsas also provided assistance with LaTeX and
reviewed some of the appendices. Musetta Steinbach was very helpful in finding
errors in the figures.
We would like to acknowledge our colleagues at the University of Minnesota and
Michigan State who have helped create a positive environment for data mining
research. They include Arindam Banerjee, Dan Boley, Joyce Chai, Anil Jain, Ravi
Janardan, Rong Jin, George Karypis, Claudia Neuhauser, Haesun Park, William F.
Punch, György Simon, Shashi Shekhar, and Jaideep Srivastava. The collaborators
on our many data mining projects, who also have our gratitude, include Ramesh
Agrawal, Maneesh Bhargava, Steve Cannon, Alok Choudhary, Imme Ebert-Uphoff,
Auroop Ganguly, Piet C. de Groen, Fran Hill, Yongdae Kim, Steve Klooster, Kerry
Long, Nihar Mahapatra, Rama Nemani, Nikunj Oza, Chris Potter, Lisiane Pruinelli,
Nagiza Samatova, Jonathan Shapiro, Kevin Silverstein, Brian Van Ness, Bonnie
Westra, Nevin Young, and Zhi-Li Zhang.
The departments of Computer Science and Engineering at the University of
Minnesota and Michigan State University provided computing resources and a
supportive environment for this project. ARDA, ARL, ARO, DOE, NASA, NOAA,
and NSF provided research support for Pang-Ning Tan, Michael Steinbach, Anuj
Karpatne, and Vipin Kumar. In particular, Kamal Abdali, Mitra Basu, Dick Brackney,
Jagdish Chandra, Joe Coughlan, Michael Coyle, Stephen Davis, Frederica
Darema, Richard Hirsch, Chandrika Kamath, Tsengdar Lee, Raju Namburu, N.
Radhakrishnan, James Sidoran, Sylvia Spengler, Bhavani Thuraisingham, Walt
Tiernin, Maria Zemankova, Aidong Zhang, and Xiaodong Zhang have been
supportive of our research in data mining and high-performance computing.
It was a pleasure working with the helpful staff at Pearson Education. In particular,
we would like to thank Matt Goldstein, Kathy Smith, Carole Snyder, and Joyce
Wells. We would also like to thank George Nichols, who helped with the artwork,
and Paul Anagnostopoulos, who provided LaTeX support.
We are grateful to the following Pearson reviewers: Leman Akoglu (Carnegie
Mellon University), Chien-Chung Chan (University of Akron), Zhengxin Chen
(University of Nebraska at Omaha), Chris Clifton (Purdue University), Joydeep
Ghosh (University of Texas, Austin), Nazli Goharian (Illinois Institute of
Technology), J. Michael Hardin (University of Alabama), Jingrui He (Arizona State
University), James Hearne (Western Washington University), Hillol Kargupta
(University of Maryland, Baltimore County and Agnik, LLC), Eamonn Keogh
(University of California-Riverside), Bing Liu (University of Illinois at Chicago),
Mariofanna Milanova (University of Arkansas at Little Rock), Srinivasan
Parthasarathy (Ohio State University), Zbigniew W. Ras (University of North
Carolina at Charlotte), Xintao Wu (University of North Carolina at Charlotte), and
Mohammed J. Zaki (Rensselaer Polytechnic Institute).
Over the years since the first edition, we have also received numerous comments
from readers and students who have pointed out typos and various other issues.
We are unable to mention these individuals by name, but their input is much
appreciated and has been taken into account for the second edition.
Contents
Preface to the Second Edition
1 Introduction
1.1 What Is Data Mining?
1.2 Motivating Challenges
1.3 The Origins of Data Mining
1.4 Data Mining Tasks
1.5 Scope and Organization of the Book
1.6 Bibliographic Notes
1.7 Exercises
2 Data
2.1 Types of Data
2.1.1 Attributes and Measurement
2.1.2 Types of Data Sets
2.2 Data Quality
2.2.1 Measurement and Data Collection Issues
2.2.2 Issues Related to Applications
2.3 Data Preprocessing
2.3.1 Aggregation
2.3.2 Sampling
2.3.3 Dimensionality Reduction
2.3.4 Feature Subset Selection
2.3.5 Feature Creation
2.3.6 Discretization and Binarization
2.3.7 Variable Transformation
2.4 Measures of Similarity and Dissimilarity
2.4.1 Basics
2.4.2 Similarity and Dissimilarity between Simple Attributes
2.4.3 Dissimilarities between Data Objects
2.4.4 Similarities between Data Objects
2.4.5 Examples of Proximity Measures
2.4.6 Mutual Information
2.4.7 Kernel Functions*
2.4.8 Bregman Divergence*
2.4.9 Issues in Proximity Calculation
2.4.10 Selecting the Right Proximity Measure
2.5 Bibliographic Notes
2.6 Exercises
3 Classification: Basic Concepts and Techniques
3.1 Basic Concepts
3.2 General Framework for Classification
3.3 Decision Tree Classifier
3.3.1 A Basic Algorithm to Build a Decision Tree
3.3.2 Methods for Expressing Attribute Test Conditions
3.3.3 Measures for Selecting an Attribute Test Condition
3.3.4 Algorithm for Decision Tree Induction
3.3.5 Example Application: Web Robot Detection
3.3.6 Characteristics of Decision Tree Classifiers
3.4 Model Overfitting
3.4.1 Reasons for Model Overfitting
3.5 Model Selection
3.5.1 Using a Validation Set
3.5.2 Incorporating Model Complexity
3.5.3 Estimating Statistical Bounds
3.5.4 Model Selection for Decision Trees
3.6 Model Evaluation
3.6.1 Holdout Method
3.6.2 Cross-Validation
3.7 Presence of Hyper-parameters
3.7.1 Hyper-parameter Selection
3.7.2 Nested Cross-Validation
3.8 Pitfalls of Model Selection and Evaluation
3.8.1 Overlap between Training and Test Sets
3.8.2 Use of Validation Error as Generalization Error
3.9 Model Comparison
3.9.1 Estimating the Confidence Interval for Accuracy
3.9.2 Comparing the Performance of Two Models
3.10 Bibliographic Notes
3.11 Exercises
4 Classification: Alternative Techniques
4.1 Types of Classifiers
4.2 Rule-Based Classifier
4.2.1 How a Rule-Based Classifier Works
4.2.2 Properties of a Rule Set
4.2.3 Direct Methods for Rule Extraction
4.2.4 Indirect Methods for Rule Extraction
4.2.5 Characteristics of Rule-Based Classifiers
4.3 Nearest Neighbor Classifiers
4.3.1 Algorithm
4.3.2 Characteristics of Nearest Neighbor Classifiers
…