ITS-632 Introduction to Data Mining
Copyright © Prof. Kwang Lee All rights reserved.
Kwang Lee, Ph.D.
Computer and Information Science
Cumberland University
Association Analysis
Lecture 5
Announcement!!!
Assignment #2 due: 06/05 (11:59 pm, Saturday)
Copyright © Prof. Kwang Lee All rights reserved.
Lecture Overview
Introduce association rules
Review association rule mining
– Itemsets
– General construction
– Frequent itemsets
– Reducing frequent itemsets
Study Apriori algorithms
– Illustrating the Apriori principle
– Apriori algorithm pruning
Review rule generation
– Rule generation for the Apriori algorithm
– Multiple minimum support
Understand pattern evaluation
1. Introduction to Association
This chapter presents a methodology known as
association analysis, which is useful for
discovering interesting hidden relationships in
large data sets
1. Introduction to Association
Association is a data mining function that
discovers the probability of the co-occurrence of
items in a collection.
– The relationships between co-occurring items are
expressed as association rules
First proposed by Agrawal, Imielinski, and Swami
in the context of frequent itemsets and
association rule mining
– Frequent pattern: a pattern (a set of items,
subsequences, substructures, etc.) that occurs
frequently in a data set
1. Introduction to Association
Motivation: finding inherent regularities in data
– What products were often purchased together?
◆ Beer and diapers?
– What are the subsequent purchases after buying a
PC?
◆ Mouse and keyboard?
– What kinds of DNA are sensitive to this new drug?
◆ Tylenol and Advil?
1. Introduction to Association
Applications:
– Basket data analysis, cross-marketing, catalog
design, sale campaign analysis, Web log (click
stream) analysis, and DNA sequence analysis.
Advantages:
– The advantage of association rule discovery over
other data mining techniques is that this method is
relatively “simple” and does not require building and
training a classification or prediction model.
– This advantage is especially significant in real-time
applications like online stores.
1. Introduction to Association
Association rules are widely used to analyze
retail basket or transaction data
– Association rules are intended to identify strong
rules discovered in transaction data using
measures of interestingness.
Visualization of Association Rules: DBMiner
[Figure: association rules visualized in DBMiner]
Visualization of Association Rules: MineSet 3.0
[Figure: association rules visualized in MineSet 3.0]
2. Association Rule Mining
Unlike other data mining functions, association is
transaction-based.
In transaction processing, a case consists of a
transaction such as a market basket.
2. Association Rule Mining
The collection of items in the transaction is an
attribute of the transaction.
– Other attributes might be the date, time, location, or
user ID associated with the transaction.
2. Association Rule Mining
In transactional data, a collection of items is
associated with each case.
In principle, the collection could include every
possible item.
– For example, all products could theoretically be
purchased in a single market-basket transaction.
When an item is not present in a collection, it may
have a null value, or it may simply be missing.
2.1 Itemsets
The first step in association analysis requires the
enumeration of itemsets.
– An itemset is any combination of two or more items in
a transaction.
The maximum number of items in an itemset is
user-specified. If the maximum is two, all the item
pairs will be counted.
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
2.1 Itemsets
Given the set of transactions, find rules that will
predict the occurrence of an item based on the
occurrences of other items in the transaction
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk},
Implication means co-occurrence,
not causality!
2.2 General Construction
Association rules are ranked by these metrics:
– Support – How often do these items occur together in
the data?
– Confidence – When the left-hand-side items appear in a
transaction, how likely are the right-hand-side items to
appear as well?
A simple example of Association Rules
– Assume there are 100 customers
◆ 8 bought Butter, 10 bought Milk, and 6 bought both
– Rule: bought Butter => bought Milk
◆ Support = P(Butter & Milk) = 6/100 = 0.06
◆ Confidence = support/P(Butter) = 0.06/0.08 = 0.75
◆ Lift = confidence/P(Milk) = 0.75/0.10 = 7.5
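The arithmetic above can be checked with a short Python sketch (the counts are the hypothetical ones from this example, not real data):

# Support, confidence, and lift for the butter/milk example above.
n_customers = 100
n_butter = 8    # customers who bought butter
n_milk = 10     # customers who bought milk
n_both = 6      # customers who bought both

support = n_both / n_customers                     # P(Butter & Milk) = 0.06
confidence = support / (n_butter / n_customers)    # 0.06 / 0.08 = 0.75
lift = confidence / (n_milk / n_customers)         # 0.75 / 0.10 = 7.5
print(support, confidence, lift)                   # 0.06 0.75 7.5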
(1) Definition of Frequent Itemset
Itemset
– A collection of one or more items
◆ Example: {Milk, Bread, Diaper}
– k-itemset
◆ An itemset that contains k items
Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
Support
– Fraction of transactions that contain an
itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
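As a quick sketch, the support count σ and support s can be computed directly from the transaction table above (Python; the table is hard-coded for illustration):

# Support count and support of {Milk, Bread, Diaper} over the 5-transaction table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
itemset = {"Milk", "Bread", "Diaper"}
sigma = sum(1 for t in transactions if itemset <= t)   # support count = 2
s = sigma / len(transactions)                          # support = 2/5 = 0.4
print(sigma, s)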
(2) Definition of Association Rule
Association Rule
– An implication expression of the form
X → Y, where X and Y are itemsets
– Example:
{Milk, Diaper} → {Beer}
Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain
both X and Y
– Confidence (c)
◆ Measures how often items in Y
appear in transactions that
contain X
Example: {Milk, Diaper} → {Beer}
s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
(3) Mining Association Rules
Example of Rules:
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
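These six rules and their metrics can be reproduced with a small Python sketch that enumerates every binary partition of {Milk, Diaper, Beer} (the transaction table is hard-coded from the slide):

from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

itemset = frozenset({"Milk", "Diaper", "Beer"})
n = len(transactions)
for k in range(1, len(itemset)):
    for lhs in combinations(sorted(itemset), k):
        X = frozenset(lhs)
        Y = itemset - X
        s = support_count(itemset) / n               # identical for every rule: 0.4
        c = support_count(itemset) / support_count(X)
        print(f"{set(X)} -> {set(Y)}: s={s:.1f}, c={c:.2f}")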
2.3 Frequent Itemsets
Association rules are calculated from itemsets.
If rules are generated from all possible itemsets,
there may be a very high number of rules.
So, the rules may not be very meaningful.
– Introduce a minimum support concept (minsup).
– The minimum support is a user-specified percentage
that limits the number of itemsets used for association
rules.
◆ Frequent itemsets are those that occur with a minimum
frequency specified by the user.
◆ Generate all itemsets whose support ≥ minimum support
(minsup)
2.3 Frequent Itemsets
However, frequent itemset generation is still
computationally expensive.
– For example, how many itemsets could potentially be
generated in the worst case?
◆ The number of frequent itemsets to be generated is sensitive
to the minsup threshold
◆ When minsup is low, there exist potentially an exponential
number of frequent itemsets
◆ The worst case: M^N, where M = # distinct items and N = max
length of transactions
A minimum confidence constraint is then applied
to these frequent itemsets in order to form rules
– Select all rules whose confidence ≥ minimum
confidence (minconf)
2.3 Frequent Itemsets
A frequent itemset is a set of items that frequently
appear together and satisfies the minimum support
threshold; the rules derived from it must also satisfy
the minimum confidence threshold.
Need a two-step approach:
– Frequent itemset generation
◆ Generate all itemsets whose support ≥ minimum support
◆ Select all rules whose confidence ≥ minimum confidence
– Rule generation
◆ Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
2.4 Frequent Itemset Generation
Frequent Itemset
– An itemset whose support is greater than or equal to
the minimum threshold
– Given a set of transactions T, the goal of association
rule mining is to find all rules having
◆ support ≥ minsup threshold
◆ confidence ≥ minconf threshold
Using Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
2.4 Frequent Itemset Generation
Using Brute-force approach:
– Each itemset in the lattice is a candidate frequent
itemset
– Count the support of each candidate by scanning the
database
– Match each transaction against every candidate
– Complexity ≈ O(NMw) => Expensive since M = 2^d !!!
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
[Figure: each of the N transactions is matched against the list of M candidate itemsets; w = maximum transaction width]
2.4 Frequent Itemset Generation
Brute-force search or exhaustive search, also known as
generate and test, is a very general problem-solving
technique that consists of systematically enumerating all
possible candidates for the solution and checking
whether each candidate satisfies the problem’s statement,
without giving any thought to time and space
complexity.
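A minimal generate-and-test sketch (an assumed helper, not code from the slides) makes the O(NMw) cost concrete: it enumerates all M = 2^d − 1 non-empty candidate itemsets and scans all N transactions for each one:

from itertools import chain, combinations

def brute_force_frequent(transactions, minsup):
    # Enumerate every non-empty candidate itemset over the distinct items.
    items = sorted(set().union(*transactions))
    candidates = chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1))
    n = len(transactions)
    frequent = {}
    for cand in candidates:                               # M candidates ...
        cset = set(cand)
        s = sum(1 for t in transactions if cset <= t) / n  # ... times N scans
        if s >= minsup:
            frequent[cand] = s
    return frequent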
Example: Basic Concepts of Association
Find all the rules X → Y with
minimum support and confidence
– support, s: probability that a
transaction contains X ∪ Y
– confidence, c: conditional
probability that a transaction having
X also contains Y
– Let minsup = 50%, minconf = 50%
– Freq. Pat.: Beer:3, Nuts:3, Diaper:4,
Eggs:3, {Beer, Diaper}:3
Association rules: (many more!)
– Beer → Diaper (60%, 100%)
– Diaper → Beer (60%, 75%)
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Milk
[Figure: Venn diagram of customers buying beer, buying diapers, and buying both]
2.5 Reducing Frequent Itemsets
Association rules calculated from itemsets are
relatively “simple” and do not require building
prediction models.
However, they create lots of itemsets, which
makes the computation expensive
– Given d unique items:
◆ Total number of itemsets = 2^d
◆ Total number of possible association rules:
R = Σ_{k=1}^{d-1} [ C(d, k) × Σ_{j=1}^{d-k} C(d−k, j) ] = 3^d − 2^(d+1) + 1
If d = 6, R = 602 rules
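The closed form can be checked numerically (Python):

from math import comb

# R = sum over antecedent sizes k of C(d,k) times the number of
# non-empty consequents chosen from the remaining d-k items.
d = 6
R = sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
        for k in range(1, d))
print(R, 3**d - 2**(d + 1) + 1)   # 602 602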
Example: Frequent Itemset Generation
[Figure: itemset lattice over the items {A, B, C, D, E}, from the null set up to ABCDE]
Given 5 items {A, B, C, D, E}, there are 2^5 = 32 possible candidate
itemsets
2.5 Reducing Frequent Itemsets
We should consider how to reduce the number of
generated itemsets; this remains an active research topic.
There are three main approaches for reducing
the computational complexity of frequent itemset
generation
– Reduce the number of transactions (N)
◆ Reduce size of N as the size of itemset increases
◆ Used by DHP and vertical-based mining algorithms
– Reduce the number of candidates (M)
◆ Complete search: M = 2^d
◆ Use pruning techniques to reduce M
– Reduce the number of comparisons (NM)
◆ Use efficient data structures to store the candidates or
transactions
◆ No need to match every candidate against every transaction
3. Apriori Algorithms
Apriori is an algorithm for frequent item set
mining and association rule learning over
relational databases.
– It proceeds by identifying the frequent individual
items in the database and extending them to larger
and larger item sets as long as those item sets appear
sufficiently often in the database.
3. Apriori Algorithms
Apriori principle: If an itemset is frequent, then
all of its subsets must also be frequent.
– “If an itemset is infrequent, all its supersets will be
infrequent.”
The key concept of the Apriori algorithm is the
anti-monotonicity of the support measure.
[Figure: itemset lattice over {A, B, C, D, E}; once AB is found to be infrequent, all of its supersets are pruned from the search]
3.1 Illustrating Apriori Principle
How to calculate the number of combinations
For example, C(6, 2) means the number of ways to select a
combination of two items out of six
– Given { a, b, c, d, e, f }, select two:
◆ ab, ac, ad, ae, af
◆ bc, bd, be, bf
◆ cd, ce, cf
◆ de, df
◆ ef
– Thus, C(6, 2) = 15
– Similarly, C(6, 1) = 6
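Python’s standard library exposes this directly, so the counts above are easy to verify:

from math import comb
print(comb(6, 2), comb(6, 1))   # 15 6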
3.2 Apriori Algorithm Pruning
The Apriori algorithm involves two time-consuming
pruning steps to exclude infrequent
candidates and keep frequent ones.
Method (see textbook p. 368):
– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
◆ Generate length (k+1) candidate itemsets from length k
frequent itemsets
◆ Prune candidate itemsets containing subsets of length k that
are infrequent
◆ Count the support of each candidate by scanning the DB
◆ Eliminate candidates that are infrequent, leaving only those
that are frequent
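A compact sketch of these steps (Python; the helper names apriori and count are assumptions for illustration, not code from the textbook):

from itertools import combinations

def apriori(transactions, minsup_count):
    # minsup_count is an absolute support count; itemsets are frozensets.
    def count(cands):
        return {c: sum(1 for t in transactions if c <= t) for c in cands}

    # k = 1: frequent single items
    items = {frozenset([i]) for t in transactions for i in t}
    freq = {c: n for c, n in count(items).items() if n >= minsup_count}
    all_frequent = dict(freq)
    k = 1
    while freq:
        # Generate length-(k+1) candidates by merging frequent k-itemsets.
        cands = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # Prune candidates containing an infrequent subset of length k.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        # Count support by scanning the DB; eliminate infrequent candidates.
        freq = {c: n for c, n in count(cands).items() if n >= minsup_count}
        all_frequent.update(freq)
        k += 1
    return all_frequent

For example, apriori(transactions, 2) on the five-transaction table used earlier returns all frequent itemsets at minimum support count 2.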
3.2 Apriori Algorithm Pruning
Candidate generation and pruning algorithm
– Exclude the infrequent candidates and keep the
frequent ones.
Example: Apriori Algorithm Pruning
Simple Apriori principle
– Let’s use the data in this table and assume that the
minimum support is 2.
– Looking for single items that meet
the minimum
Example: Apriori Algorithm Pruning
– Next, taking all of the items that meet the support
requirement, we can build pairs out of them: AB, AC, AD,
AE, BC, BD, BE, CD, CE, DE.
Example: Apriori Algorithm Pruning
– Several of these patterns do not meet the support
threshold of 2, so we remove them from the list of
options.
– At this point, we use the surviving items to make other
patterns that contain 3 items.
◆ You’ll get a list like this: ABC, ABD, ABE, BCD, BCE, BDE
Example: Apriori Algorithm Pruning
– The final list of all of the patterns that have support
greater than or equal to 2 is summarized here.
4. Rule Generation
The goal of association rule generation is to
find interesting patterns and trends in
transaction databases.
In this section, we study how to extract
association rules efficiently from a given frequent
itemset
4. Rule Generation
After finding all itemsets whose support exceeds
the minimum support, we use these large itemsets
to generate the desired rules whose confidence
exceeds the minimum confidence.
4.1 Rule Generation for Apriori Algorithm
Given a frequent itemset L, find all non-empty
subsets f ⊂ L such that f → L − f satisfies the
minimum confidence requirement
Method (see textbook p. 382):
4.1 Rule Generation for Apriori Algorithm
[Figure: lattice of rules derived from the frequent itemset ABCD, from ABCD => {} down to A => BCD; once a rule is found to have low confidence, every rule below it in the lattice (smaller antecedent, larger consequent) is pruned]
4.1 Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules
that share the same prefix in the rule consequent
– join(CD => AB, BD => AC) would produce the
candidate rule D => ABC
– Prune rule D => ABC if its subset rule AD => BC does
not have high confidence
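A simple exhaustive sketch of the f → L − f enumeration (the consequent-merging and pruning described above are an optimization on top of this; support_count is an assumed helper returning an itemset’s count in the database):

from itertools import combinations

def generate_rules(L, support_count, minconf):
    # Emit every rule X -> L - X whose confidence meets minconf.
    L = frozenset(L)
    rules = []
    for k in range(1, len(L)):
        for lhs in combinations(sorted(L), k):
            X = frozenset(lhs)
            conf = support_count(L) / support_count(X)
            if conf >= minconf:
                rules.append((set(X), set(L - X), conf))
    return rules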
Example: Congressional Voting Records
This section demonstrates the result of applying
association analysis to the voting records of
members of the U.S. House of Representatives.
Example: Congressional Voting Records
The Apriori algorithm is then applied to the data
set with minsup = 30% and minconf = 90%
Some of the high-confidence rules extracted by
the algorithm:
4.2 Multiple Minimum Support
How to set the appropriate minsup threshold?
– If minsup is set too high, we could miss itemsets
involving interesting rare items (e.g., expensive
products)
– If minsup is set too low, it is computationally
expensive and the number of itemsets is very large
Using a single minimum support threshold may
not be effective
In 1999, Liu et al. proposed multiple minimum supports
– Keep rare (infrequent) items from being discarded
4.2 Multiple Minimum Support
How to apply multiple minimum supports?
– MS(i): minimum support for item i
– E.g. MS(Milk)=5%, MS(Coke) = 3%,
MS(Broccoli)=0.1%, MS(Salmon)=0.5%
– MS({Milk, Broccoli}) = min (MS(Milk), MS(Broccoli))
= 0.1%
– Challenge: support is no longer anti-monotone
◆ Suppose: Support(Milk, Coke) = 1.5% and
Support(Milk, Coke, Broccoli) = 0.5%
◆ {Milk, Coke} is infrequent (1.5% < MS({Milk, Coke}) = 3%) but
{Milk, Coke, Broccoli} is frequent (0.5% ≥ MS({Milk, Coke, Broccoli}) = 0.1%)
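A sketch of the itemset-level threshold (values taken from the slide above):

# Minimum support of an itemset = smallest MS(i) over its items.
MS = {"Milk": 0.05, "Coke": 0.03, "Broccoli": 0.001, "Salmon": 0.005}

def ms_of(itemset):
    return min(MS[i] for i in itemset)

print(ms_of({"Milk", "Broccoli"}))                   # 0.001 (0.1%)
# Support is no longer anti-monotone w.r.t. these thresholds:
print(0.015 >= ms_of({"Milk", "Coke"}))              # False -> infrequent
print(0.005 >= ms_of({"Milk", "Coke", "Broccoli"}))  # True  -> frequent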
4.2 Multiple Minimum Support

Item MS(I) Sup(I)
A 0.10% 0.25%
B 0.20% 0.26%
C 0.30% 0.29%
D 0.50% 0.05%
E 3% 4.20%

[Figure: itemset lattice over {A, B, C, D, E} illustrating which candidate itemsets survive under the per-item minimum supports above]
4.2 Multiple Minimum Support
Order the items according to their minimum
support (in ascending order)
– e.g.: MS(Milk) = 5%, MS(Coke) = 3%,
MS(Broccoli) = 0.1%, MS(Salmon) = 0.5%
– Ordering: Broccoli, Salmon, Coke, Milk
Need to modify Apriori such that:
– L1 : set of frequent items
– F1 : set of items whose support is ≥ MS(1),
where MS(1) = min_i( MS(i) )
– C2 : candidate itemsets of size 2 generated from F1
instead of L1
4.2 Multiple Minimum Support
Modifications to Apriori:
– In traditional Apriori,
◆ A candidate (k+1)-itemset is generated by merging two
frequent itemsets of size k
◆ The candidate is pruned if it contains any infrequent subsets
of size k
– Pruning step has to be modified:
◆ Prune only if subset contains the first item
◆ e.g. Candidate={Broccoli, Coke, Milk}
– (ordered according to minimum support)
◆ {Broccoli, Coke} and {Broccoli, Milk} are frequent but
{Coke, Milk} is infrequent
– Candidate is not pruned because {Coke, Milk} does not contain
the first item, i.e., Broccoli.
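A sketch of this modified pruning test (the function name is an assumption; items are ordered by ascending minimum support, as in the example above):

from itertools import combinations

def prune(candidate, frequent_k, ms):
    # Prune only if an infrequent size-k subset contains the first
    # (lowest-MS) item of the candidate.
    ordered = sorted(candidate, key=lambda i: ms[i])
    first = ordered[0]
    for subset in combinations(ordered, len(ordered) - 1):
        if first in subset and frozenset(subset) not in frequent_k:
            return True    # prune the candidate
    return False           # keep the candidate

With candidate = {Broccoli, Coke, Milk}, the infrequent subset {Coke, Milk} does not contain Broccoli, so the candidate is kept, matching the example above.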
5. Pattern Evaluation
Association rule algorithms tend to produce too
many rules
– many of them are uninteresting or redundant
– Redundant if {A,B,C} → {D} and {A,B} → {D} have
same support & confidence
Pattern evaluation is defined as identifying
truly interesting patterns representing
knowledge, based on given measures.
– Find a set of criteria to assign an “interestingness
score” to each pattern.
– The “interestingness score” measures can be used
to prune/rank the derived patterns
5. Pattern Evaluation
In the original formulation of association
rules, support & confidence are the only
measures used to find patterns
– Find the “interestingness score” of each pattern.
The use of summarization and visualization makes
the data understandable to the user.
5. Pattern Evaluation
[Figure: table of interestingness measures]
Summary
Introduced association rules
Reviewed association rule mining
– Itemsets
– General construction
– Frequent itemsets
– Reducing frequent itemsets
Studied Apriori algorithms
– Illustrating the Apriori principle
– Apriori algorithm pruning
Reviewed rule generation
– Rule generation for the Apriori algorithm
– Multiple minimum support
Learned pattern evaluation
Thank You!
Assignment #2: 06/05 (11:59pm, Saturday)
Copyright © Prof. Kwang Lee All rights reserved.
Note and Thank you!!!
Ref: Basic Concepts of Frequent Pattern Mining
(Association Rules) R. Agrawal, T. Imielinski, and A. Swami. Mining
association rules between sets of items in large databases. SIGMOD’93
(Max-pattern) R. J. Bayardo. Efficiently mining long patterns from
databases. SIGMOD’98
(Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering
frequent closed itemsets for association rules. ICDT’99
(Sequential pattern) R. Agrawal and R. Srikant. Mining sequential patterns.
ICDE’95
Ref: Apriori and Its Improvements
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB’94
H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering
association rules. KDD’94
A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining
association rules in large databases. VLDB’95
J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD’95
H. Toivonen. Sampling large databases for association rules. VLDB’96
S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and
implication rules for market basket analysis. SIGMOD’97
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining
with relational database systems: Alternatives and implications. SIGMOD’98
Ref: Depth-First, Projection-Based FP Mining
R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation
of frequent itemsets. J. Parallel and Distributed Computing, 2002.
G. Grahne and J. Zhu, Efficiently Using Prefix-Trees in Mining Frequent Itemsets, Proc.
FIMI’03
B. Goethals and M. Zaki. An introduction to workshop on frequent itemset mining
implementations. Proc. ICDM’03 Int. Workshop on Frequent Itemset Mining
Implementations (FIMI’03), Melbourne, FL, Nov. 2003
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation.
SIGMOD’ 00
J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic
Projection. KDD’02
J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining Top-K Frequent Closed Patterns without
Minimum Support. ICDM’02
J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the Best Strategies for Mining
Frequent Closed Itemsets. KDD’03
Ref: Vertical Format and Row Enumeration Methods
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for
discovery of association rules. DAMI:97.
M. J. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset
Mining, SDM’02.
C. Bucila, J. Gehrke, D. Kifer, and W. White. DualMiner: A Dual-Pruning
Algorithm for Itemsets with Constraints. KDD’02.
F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. Zaki , CARPENTER: Finding
Closed Patterns in Long Biological Datasets. KDD’03.
H. Liu, J. Han, D. Xin, and Z. Shao, Mining Interesting Patterns from Very High
Dimensional Data: A Top-Down Row Enumeration Approach, SDM’06.
Ref: Mining Correlations and Interesting Rules
S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing
association rules to correlations. SIGMOD’97.
M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding
interesting rules from large sets of discovered association rules. CIKM’94.
R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Measures of Interest.
Kluwer Academic, 2001.
C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining
causal structures. VLDB’98.
P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure
for Association Patterns. KDD’02.
E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE’03.
T. Wu, Y. Chen, and J. Han, “Re-Examination of Interestingness Measures in Pattern
Mining: A Unified Framework”, Data Mining and Knowledge Discovery, 21(3):371-
397, 2010
Assignment #2
Open the Assignment #2 MS Word file and
answer the following questions. When you have
completed it, please upload it to Blackboard.
1. Consider the data set shown in Table 6.15
(textbook p. 515), an example of market basket
transactions, with the item taxonomy given in
Figure 6.25.
– https://www.geeksforgeeks.org/apriori-algorithm/
– https://www.softwaretestinghelp.com/apriori-algorithm/
– https://www.geeksforgeeks.org/association-rule/
Assignment #2
[Slides showing Table 6.15 and Figure 6.25 from the textbook]
Answer the following questions.
(a) Consider the approach where each transaction t is replaced by
an extended transaction t1 that contains all the items in t as well
as their respective ancestors.
For example, the transaction t = { Chips, Cookies} will be replaced
by t1 = {Chips, Cookies, Snack Food, Food}. Use this approach to
derive all frequent itemsets (up to size 4) with support ≥ 70%.
(b) Consider an alternative approach where the frequent itemsets
are generated one level at a time. Initially, all the frequent itemsets
involving items at the highest level of the hierarchy are generated.
Next, we use the frequent itemsets discovered at the higher level
of the hierarchy to generate candidate itemsets involving items at
the lower levels of the hierarchy.
For example, we generate the candidate itemset {Chips, Diet
Soda} only if {Snack Food, Soda} is frequent. Use this approach to
derive all frequent itemsets (up to size 4) with support ≥ 70%.
Assignment #2
2. Consider the data set shown in the table, an
example of market basket transactions.
Assignment #2
Answer the following questions.
(a) Compute the support for itemsets {e}, {b, d}, and {b,
d, e} by treating each transaction ID as a market basket.
(b) Use the results in part (a) to compute the confidence for
the association rules {b, d} → {e} and {e} → {b, d}. Is
confidence a symmetric measure?
(c) Repeat part (a) by treating each customer ID as a
market basket. Each item should be treated as a binary
variable (1 if an item appears in at least one transaction
bought by the customer, and 0 otherwise).
(d) Use the results in part (c) to compute the confidence for
the association rules {b, d} → {e} and {e} → {b, d}.