ITS-632 Introduction to Data

Mining

Copyright © Prof. Kwang Lee All rights reserved.

Kwang Lee, Ph.D.

Computer and Information Science

Cumberland University

2

Association Analysis

Lecture 5

Assignment #2: 06/05 (11:59pm, Saturday)

Copyright © Prof. Kwang Lee All rights reserved.

Announcement!!!

Lecture Overview

Introduce association rules

Review association rule mining

– Itemsets

– General construction

– Frequent itemsets

– Reducing frequent itemsets

Study Apriori algorithms

– Illustrating apriori principle

– Apriori algorithm pruning

Review rule generation

– Rule generation for Apriori algorithm

– Multiple minimum support

Know pattern evaluation

1. Introduction to Association

This chapter presents a methodology known as

association analysis, which is useful for

discovering interesting hidden relationship in

large sets

1. Introduction to Association

Association is a data mining function that

discovers the probability of the co-occurrence of

items in a collection.

– The relationships between co-occurring items are

expressed as association rules

First proposed by Agrawal, Imielinski, and Swami

in the context of frequent itemsets and

association rule mining

– Frequent pattern: a pattern (a set of items,

subsequences, substructures, etc.) that occurs

frequently in a data set

1. Introduction to Association

Motivation: finding inherent regularities in data

– What products were often purchased together?

◆ Beer and diapers?

– What are the subsequent purchases after buying a

PC?

◆ Mouse and keyboard?

– What kinds of DNA are sensitive to this new drug?

◆ Tylenol and advil?

1. Introduction to Association

Applications:

– Basket data analysis, cross-marketing, catalog

design, sale campaign analysis, Web log (click

stream) analysis, and DNA sequence analysis.

Advantages:

– The advantage of association rule discovery over

other data mining techniques is that this method is

relatively “simple” and does not require building and

training a classification or prediction model.

– This advantage is especially significant in real-time

applications like online stores.

1. Introduction to Association

Association is a data mining function that

discovers the probability of the co-occurrence of

items in a collection.

The relationships between co-occurring items are

expressed as association rules.

1. Introduction to Association

Association rules are widely used to analyze

retail basket or transaction data

– Association rules are intended to identify strong

rules discovered in transaction data using measures

of interesting measures, based on the concept of

strong rules.

Visualization of Association Rules:

DBMiner

Visualization of Association Rules:

DBMiner

Visualization of Association Rules

(MineSet 3.0)

2. Association Rule Mining

Unlike other data mining functions, association is

transaction-based.

In transaction processing, a case consists of a

transaction such as a market basket.

2. Association Rule Mining

The collection of items in the transaction is an

attribute of the transaction.

– Other attributes might be the date, time, location, or

user ID associated with the transaction.

2. Association Rule Mining

In transactional data, a collection of items is

associated with each case.

So, the collection could theoretically include all

possible members of the collection.

– For example, all products could theoretically be

purchased in a single market-basket transaction.

When an item is not present in a collection, it may

have a null value, or it may simply be missing

2.1 Itemsets

The first step in association analysis needs the

enumeration of itemsets.

– An itemset is any combination of two or more items in

a transaction.

The maximum number of items in an itemset is

user-specified. If the maximum is two, all the item

pairs will be counted.

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

2.1 Itemsets

Given the set of transactions, find rules that will

predict the occurrence of an item based on the

occurrences of other items in the transaction

Market-Basket transactions

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Example of Association Rules

{Diaper} → {Beer},

{Milk, Bread} → {Eggs, Coke},

{Beer, Bread} → {Milk},

Implication means co-occurrence,

not causality!

2.2 General Construction

Association rules are ranked by these metrics:

– Support – How often do these items occur together in

the data?

– Confidence – How likely are these items to occur

together in the data?

A simple example of Association Rules

– Assume there are 100 customers

◆ 8 bought Butter, 10 of them bought Milk, and 6 bought both

– bought butter => bought milk

◆ Support = P(Butter & Milk) = 6/100 = 0.06

◆ Confidence = support/P(Butter) = 0.06/0.08 = 0.75

◆ Lift = confidence/P(Milk) = 0.75/0.10 = 7.5

(1) Definition of Frequent Itemset

Itemset

– A collection of one or more items

◆ Example: {Milk, Bread, Diaper}

– k-itemset

◆ An itemset that contains k items

Support count ()

– Frequency of occurrence of an itemset

– E.g. ({Milk, Bread, Diaper}) = 2

Support

– Fraction of transactions that contain an

itemset

– E.g. s({Milk, Bread, Diaper}) = 2/5

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

(3) Definition of Association Rule

Example:

Beer}Diaper,Milk{

4.0

5

2

|T|

)BeerDiaper,,Milk(

===

s

67.0

3

2

)Diaper,Milk(

)BeerDiaper,Milk,(

===

c

Association Rule

– An implication expression of the form

X → Y, where X and Y are itemsets

– Example:

{Milk, Diaper} → {Beer}

Rule Evaluation Metrics

– Support (s)

◆ Fraction of transactions that contain

both X and Y

– Confidence (c)

◆ Measures how often items in Y

appear in transactions that

contain X

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

(4) Mining Association Rules

Example of Rules:

{Milk,Diaper} → {Beer} (s=0.4, c=0.67)

{Milk,Beer} → {Diaper} (s=0.4, c=1.0)

{Diaper,Beer} → {Milk} (s=0.4, c=0.67)

{Beer} → {Milk,Diaper} (s=0.4, c=0.67)

{Diaper} → {Milk,Beer} (s=0.4, c=0.5)

{Milk} → {Diaper,Beer} (s=0.4, c=0.5)

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Observations:

• All the above rules are binary partitions of the same itemset:

{Milk, Diaper, Beer}

• Rules originating from the same itemset have identical support but

can have different confidence

• Thus, we may decouple the support and confidence requirements

2.3 Frequent Itemsets

Association rules are calculated from itemsets.

If rules are generated from all possible itemsets,

there may be a very high number of rules.

So, the rules may not be very meaningful.

– Introduce a minimum concept (minsup).

– The minimum support is a user-specified percentage

that limits the number of itemsets used for association

rules.

◆ Frequent itemsets are those that occur with a minimum

frequency specified by the user.

◆ Generate all itemsets whose support minimum support

(minsup)

2.3 Frequent Itemsets

However, still frequent itemset generation is still

computationally expensive.

– For example, how many itemsets are potentially to be

generated in the worst case?

◆The number of frequent itemsets to be generated is sensitive

to the minsup threshold

◆When minsup is low, there exist potentially an exponential

number of frequent itemsets

◆The worst case: MN where M: # distinct items, and N: max

length of transactions

Using minimum confidence constraint is applied

to these frequent itemsets in order to form rule

– Select all itemsets whose confidence minimum

confidence

2.3 Frequent Itemsets

It refers to set of items that frequently appear

together and satisfies both the minimum

support threshold and the minimum confidence

threshold.

Need two-step approach:

– Frequent itemset generation

◆ Generate all itemsets whose support minimum support

◆ Select all itemsets whose confidence minimum confidence

– Rule generation

◆ Generate high confidence rules from each frequent itemset,

where each rule is a binary partitioning of a frequent itemset

2.4 Frequent Itemset Generation

Frequent Itemset

– An itemset whose support is greater than or equal to

the minimum threshold

– Given a set of transactions T, the goal of association

rule mining is to find all rules having

◆ support ≥ minsup threshold

◆ confidence ≥ minconf threshold

Using Brute-force approach:

– List all possible association rules

– Compute the support and confidence for each rule

– Prune rules that fail the minsup and minconf

thresholds

2.4 Frequent Itemset Generation

Using Brute-force approach:

– Each itemset in the lattice is a candidate frequent

itemset

– Count the support of each candidate by scanning the

database

– Match each transaction against every candidate

– Complexity ≈ O(NMw) => Expensive since M = 2d !!!

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

N

Transactions List of

Candidates

M

w

2.4 Frequent Itemset Generation

Brute-force search or exhaustive search, also known as

generate and test, is a very general problem-solving

technique that consists of systematically enumerating all

possible candidates for the solution and checking

whether each candidate satisfies the problem’s statement

without giving any thoughts regarding the time and space

complexity.

29

Find all the rules X → Y with

minimum support and

confidence

– support, s, probability that a

transaction contains X Y

– confidence, c, conditional

probability that a transaction having

X also contains Y

– Let minsup = 50%, minconf = 50%

– Freq. Pat.: Beer:3, Nuts:3, Diaper:4,

Eggs:3, {Beer, Diaper}:3

Association rules: (many more!)

– Beer → Diaper (60%, 100%)

– Diaper → Beer (60%, 75%)

Customer

buys

diaper

Customer

buys both

Customer

buys beer

Nuts, Eggs, Milk40

Nuts, Coffee, Diaper, Eggs, Milk50

Beer, Diaper, Eggs30

Beer, Coffee, Diaper20

Beer, Nuts, Diaper10

Items boughtTid

Example: Basic Concepts of Association

2.5 Reducing Frequent Itemsets

Association rules calculated from itemsets are

relatively “simple” and does not require building

prediction models.

However, they create lots of itemsets, which

make computational complexity

– Given d unique items:

◆Total number of itemsets = 2d

◆Total number of possible association rules:

123

1

1

1 1

+−=

−

=

+

−

=

−

=

dd

d

k

kd

j j

kd

k

d

R

If d=6, R = 602 rules

Example: Frequent Itemset Generation

null

AB AC AD AE BC BD BE CD CE DE

A B C D E

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

Given 5 items {A, B, C, D, E}, there are 25 possible candidate

itemsets

2.5 Reducing Frequent Itemsets

We should consider how to reduce generating

itemsets. This would be a future research topic.

There are three main approaches for reducing

the computational complexity of frequent itemset

generation

– Reduce the number of transactions (N)

◆ Reduce size of N as the size of itemset increases

◆ Used by DHP and vertical-based mining algorithms

– Reduce the number of candidates (M)

◆ Complete search: M=2d

◆ Use pruning techniques to reduce M

– Reduce the number of comparisons (NM)

◆ Use efficient data structures to store the candidates or

transactions

◆ No need to match every candidate against every transaction

3. Apriori Algorithms

Apriori is an algorithm for frequent item set

mining and association rule learning over

relational databases.

– It proceeds by identifying the frequent individual

items in the database and extending them to larger

and larger item sets as long as those item sets appear

sufficiently often in the database.

3. Apriori Algorithms

Apriori principle: If an itemset is frequent, then

all of its subsets must also be frequent.

– “If an itemset is infrequent, all its supersets will be

infrequent.”

The key concept of Apriori algorithm is its anti-

monotonicity of support measure.

Found to be

Infrequent

null

AB AC AD AE BC BD BE CD CE DE

A B C D E

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

null

AB AC AD AE BC BD BE CD CE DE

A B C D E

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

Pruned

supersets

3. Apriori Algorithms

3.1 Illustrating Apriori Principle

3.1 Illustrating Apriori Principle

How to calculate the combination number!

For example,

6

2

means select a combination set

of two strings out of six data

– { a, b, c, d, e, f }, then select two set

◆ ab, ac, ad, ae, af

◆ bc, bd, be, bf

◆ cd, de, cf

◆ de, df

◆ ef

– Thus,

6

2

= 15

– Thus,

6

1

= 6

3.2 Apriori Algorithm Pruning

Apriori algorithm involves two time-consuming

pruning steps to exclude the infrequent

candidates and hold frequents.

Method: (see 368 page)

– Let k=1

– Generate frequent itemsets of length 1

– Repeat until no new frequent itemsets are identified

◆ Generate length (k+1) candidate itemsets from length k

frequent itemsets

◆ Prune candidate itemsets containing subsets of length k that

are infrequent

◆ Count the support of each candidate by scanning the DB

◆ Eliminate candidates that are infrequent, leaving only those

that are frequent

3.2 Apriori Algorithm Pruning

Candidate generation and pruning algorithm

– Exclude the infrequent candidates and hold

frequents.

Example: Apriori Algorithm Pruning

Simple Apriori principle

– Let’s use the data in this table and assume that the

minimum support is 2.

– Looking for single items that meet

the minimum

Example: Apriori Algorithm Pruning

– Next, we take all of the items that meet the support

requirements, we can out of them; AB, AC, AD, AE,

BC, BD, BE, CD, CE, DE.

Example: Apriori Algorithm Pruning

– Several of these patterns do not meet the support

threshold of 2, so we remove them from the list of

options.

– At this point, we use the surviving items to make other

patterns that contain 3 items.

◆ You’ll get a list like this: ABC, ABD, ABE, BCD, BCE, BDE

Example: Apriori Algorithm Pruning

– The final list of all of the patterns that have support

greater than or equal to 2 are summarized here.

3. Rule Generation

The goal of association rule generation is to

find interesting patterns and trends in

transaction databases.

In this section, we understand how to extract

association rules efficiently from a give frequent

itemset

3. Rule Generation

After found all sets of itemsets that have support

greater than the minimum support, then using the

large itemsets to generate the desired rules that

have confidence greater than the minimum

confidence.

3.1 Rule Generation for Apriori Algorithm

Given a frequent itemset L, find all non-empty

subsets f L such that f → L – f satisfies the

minimum confidence requirement

Method: (see 382 page)

3.1 Rule Generation for Apriori Algorithm

ABCD=>{ }

BCD=>A ACD=>B ABD=>C ABC=>D

BC=>ADBD=>ACCD=>AB AD=>BC AC=>BD AB=>CD

D=>ABC C=>ABD B=>ACD A=>BCD

Lattice of rules

ABCD=>{ }

BCD=>A ACD=>B ABD=>C ABC=>D

BC=>ADBD=>ACCD=>AB AD=>BC AC=>BD AB=>CD

D=>ABC C=>ABD B=>ACD A=>BCD

Pruned

Rules

Low

Confidence

Rule

3.1 Rule Generation for Apriori Algorithm

Candidate rule is generated by merging two rules

that share the same prefix

in the rule consequent

join(CD=>AB, BD=>AC)

would produce the candidate

rule D => ABC

Prune rule D=>ABC if its

subset AD=>BC does not have

high confidence

BD=>ACCD=>AB

D=>ABC

Example: Congressional Voting Records

This section demonstrate the result of applying

association analysis to the voting of member of

US house of Representatives

Example: Congressional Voting Records

The Apriori algorithm is then applied to the data

set with minsup = 30% and minconf = 90%

Some of the high confidence rules extracted by

the algorithms

3.2 Multiple Minimum Support

How to set the appropriate minsup threshold?

– If minsup is set too high, we could miss itemsets

involving interesting rare items (e.g., expensive

products)

– If minsup is set too low, it is computationally

expensive and the number of itemsets is very large

Using a single minimum support threshold may

not be effective

In 1999, Liu suggests multiple minimum support

– Keep infrequent items

3.2 Multiple Minimum Support

How to apply multiple minimum supports?

– MS(i): minimum support for item i

– E.g. MS(Milk)=5%, MS(Coke) = 3%,

MS(Broccoli)=0.1%, MS(Salmon)=0.5%

– MS({Milk, Broccoli}) = min (MS(Milk), MS(Broccoli))

= 0.1%

– Challenge: Support is no longer monotone

◆ Suppose: Support(Milk, Coke) = 1.5% and

Support(Milk, Coke, Broccoli) = 0.5%

◆ {Milk, Coke} is infrequent but {Milk, Coke, Broccoli} is frequent

◆ 0.1/1.5 = 0.06, 0.1/0.5 = 0.2

3.2 Multiple Minimum Support

A

Item MS(I) Sup(I)

A 0.10% 0.25%

B 0.20% 0.26%

C 0.30% 0.29%

D 0.50% 0.05%

E 3% 4.20%

B

C

D

E

AB

AC

AD

AE

BC

BD

BE

CD

CE

DE

ABC

ABD

ABE

ACD

ACE

ADE

BCD

BCE

BDE

CDE

3.2 Multiple Minimum Support

A

B

C

D

E

AB

AC

AD

AE

BC

BD

BE

CD

CE

DE

ABC

ABD

ABE

ACD

ACE

ADE

BCD

BCE

BDE

CDE

Item MS(I) Sup(I)

A 0.10% 0.25%

B 0.20% 0.26%

C 0.30% 0.29%

D 0.50% 0.05%

E 3% 4.20%

3.2 Multiple Minimum Support

Order the items according to their minimum

support (in ascending order)

– e.g.: MS(Milk)=5%, MS(Coke) = 3%,

MS(Broccoli)=0.1%, MS(Salmon)=0.5%

– Ordering: Broccoli, Salmon, Coke, Milk

Need to modify Apriori such that:

– L1 : set of frequent items

– F1 : set of items whose support is MS(1),

where MS(1) is mini( MS(i) )

– C2 : candidate itemsets of size 2 is generated from F1

instead of L1

3.2 Multiple Minimum Support

Modifications to Apriori:

– In traditional Apriori,

◆ A candidate (k+1) – itemset is generated by merging two

frequent itemsets of size k

◆ The candidate is pruned if it contains any infrequent subsets

of size k

– Pruning step has to be modified:

◆ Prune only if subset contains the first item

◆ e.g. Candidate={Broccoli, Coke, Milk}

– (ordered according to minimum support)

◆ {Broccoli, Coke} and {Broccoli, Milk} are frequent but

{Coke, Milk} is infrequent

– Candidate is not pruned because {Coke, Milk} does not contain

the first item, i.e., Broccoli.

4. Pattern Evaluation

Association rule algorithms tend to produce too

many rules

– many of them are uninteresting or redundant

– Redundant if {A,B,C} → {D} and {A,B} → {D} have

same support & confidence

Pattern evaluation is defined as identifying

strictly increasing patterns representing

knowledge based on given measures.

– Find a set of criteria to set “interestingness score”

of each pattern.

– The “interestingness score” measures can be used

to prune/rank the derived patterns

4. Pattern Evaluation

Thus, in the original formulation of association

rules, support & confidence are the only

measures used to find patterns

– Find the “interestingness score” of each pattern.

Use of summarization and visualization makes

data understandable by user.

4. Pattern Evaluation

Interestingness

Measures

Summary

Introduced association rules

Reviewed association rule mining

– Itemsets

– General construction

– Frequent itemsets

– Reducing frequent itemsets

Studied Apriori algorithms

– Illustrating apriori principle

– Apriori algorithm pruning

Reviewed rule generation

– Rule generation for Apriori algorithm

– Multiple minimum support

Knew pattern evaluation

Thank You!

Assignment #2: 06/05 (11:59pm, Saturday)

Copyright © Prof. Kwang Lee All rights reserved.

Note and Thank you!!!

62

Ref: Basic Concepts of Frequent Pattern Mining

(Association Rules) R. Agrawal, T. Imielinski, and A. Swami. Mining

association rules between sets of items in large databases. SIGMOD’93

(Max-pattern) R. J. Bayardo. Efficiently mining long patterns from

databases. SIGMOD’98

(Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering

frequent closed itemsets for association rules. ICDT’99

(Sequential pattern) R. Agrawal and R. Srikant. Mining sequential patterns.

ICDE’95

63

Ref: Apriori and Its Improvements

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB’94

H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering

association rules. KDD’94

A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining

association rules in large databases. VLDB’95

J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining

association rules. SIGMOD’95

H. Toivonen. Sampling large databases for association rules. VLDB’96

S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and

implication rules for market basket analysis. SIGMOD’97

S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining

with relational database systems: Alternatives and implications. SIGMOD’98

64

Ref: Depth-First, Projection-Based FP Mining

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation

of frequent itemsets. J. Parallel and Distributed Computing, 2002.

G. Grahne and J. Zhu, Efficiently Using Prefix-Trees in Mining Frequent Itemsets, Proc.

FIMI’03

B. Goethals and M. Zaki. An introduction to workshop on frequent itemset mining

implementations. Proc. ICDM’03 Int. Workshop on Frequent Itemset Mining

Implementations (FIMI’03), Melbourne, FL, Nov. 2003

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation.

SIGMOD’ 00

J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic

Projection. KDD’02

J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining Top-K Frequent Closed Patterns without

Minimum Support. ICDM’02

J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the Best Strategies for Mining

Frequent Closed Itemsets. KDD’03

65

Ref: Vertical Format and Row Enumeration Methods

M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for

discovery of association rules. DAMI:97.

M. J. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset

Mining, SDM’02.

C. Bucila, J. Gehrke, D. Kifer, and W. White. DualMiner: A Dual-Pruning

Algorithm for Itemsets with Constraints. KDD’02.

F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. Zaki , CARPENTER: Finding

Closed Patterns in Long Biological Datasets. KDD’03.

H. Liu, J. Han, D. Xin, and Z. Shao, Mining Interesting Patterns from Very High

Dimensional Data: A Top-Down Row Enumeration Approach, SDM’06.

66

Ref: Mining Correlations and Interesting Rules

S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing

association rules to correlations. SIGMOD’97.

M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding

interesting rules from large sets of discovered association rules. CIKM’94.

R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Measures of Interest.

Kluwer Academic, 2001.

C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining

causal structures. VLDB’98.

P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure

for Association Patterns. KDD’02.

E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE’03.

T. Wu, Y. Chen, and J. Han, “Re-Examination of Interestingness Measures in Pattern

Mining: A Unified Framework”, Data Mining and Knowledge Discovery, 21(3):371-

397, 2010

Assignment #2

Open the assignment #2 MS word file and

answer the following questions. If you complete it,

please upload into the blackboard.

1. Consider the data set shown in Table (Table

6.15, 515 page), with an item taxonomy given in

Figure 6.25. Example of market basket

transactions

– https://www.geeksforgeeks.org/apriori-algorithm/

– https://www.softwaretestinghelp.com/apriori-algorithm/

– https://www.geeksforgeeks.org/association-rule/

https://www.geeksforgeeks.org/apriori-algorithm/

https://www.softwaretestinghelp.com/apriori-algorithm/

https://www.geeksforgeeks.org/association-rule/

Assignment #2

Assignment #2

Assignment #2

Answer for the following questions.

(a) Consider the approach where each transaction t is replaced by

an extended transaction t1 that contains all the items in t as well

as their respective ancestors.

For example, the transaction t = { Chips, Cookies} will be replaced

by t1 = {Chips, Cookies, Snack Food, Food}. Use this approach to

derive all frequent itemsets (up to size 4) with support ≥ 70%.

(b) Consider an alternative approach where the frequent itemsets

are generated one level at a time. Initially, all the frequent itemsets

involving items at the highest level of the hierarchy are generated.

Next, we use the frequent itemsets discovered at the higher level

of the hierarchy to generate candidate itemsets involving items at

the lower levels of the hierarchy.

For example, we generate the candidate itemset {Chips, Diet

Soda} only if {Snack Food, Soda} is frequent. Use this approach to

derive all frequent itemsets (up to size 4) with support ≥ 70%.

Assignment #2

2. Consider the data set shown in Table. Example

of market basket transactions

Assignment #2

Answer for the following questions.

(a) Compute the support for itemsets {e}, {b, d}, and {b,

d, e} by treating each transaction ID as a market basket.

(b) Use the results in part (a) to compute the confidence for

the association rules {b, d} → {e} and {e} → {b, d}. Is

confidence a symmetric measure?

(c) Repeat part (a) by treating each customer ID as a

market basket. Each item should be treated as a binary

variable (1 if an item appears in at least one transaction

bought by the customer, and 0 otherwise). Use this result to

compute the confidence for the association rules {b, d} →

{e} and {e} → {b, d}.

(d) Use the result in part (c) to compute the confidence for

the association rules {b, d} → {e} and {e} → {b, d}.

The price is based on these factors:

Academic level

Number of pages

Urgency

Basic features

- Free title page and bibliography
- Unlimited revisions
- Plagiarism-free guarantee
- Money-back guarantee
- 24/7 support

On-demand options

- Writer’s samples
- Part-by-part delivery
- Overnight delivery
- Copies of used sources
- Expert Proofreading

Paper format

- 275 words per page
- 12 pt Arial/Times New Roman
- Double line spacing
- Any citation style (APA, MLA, Chicago/Turabian, Harvard)

Delivering a high-quality product at a reasonable price is not enough anymore.

That’s why we have developed 5 beneficial guarantees that will make your experience with our service enjoyable, easy, and safe.

You have to be 100% sure of the quality of your product to give a money-back guarantee. This describes us perfectly. Make sure that this guarantee is totally transparent.

Read moreEach paper is composed from scratch, according to your instructions. It is then checked by our plagiarism-detection software. There is no gap where plagiarism could squeeze in.

Read moreThanks to our free revisions, there is no way for you to be unsatisfied. We will work on your paper until you are completely happy with the result.

Read moreYour email is safe, as we store it according to international data protection rules. Your bank details are secure, as we use only reliable payment systems.

Read moreBy sending us your money, you buy the service we provide. Check out our terms and conditions if you prefer business talks to be laid out in official language.

Read more