Advances and Issues in Frequent Pattern Mining
Database Laboratory, University of Alberta
PAKDD 2004 Tutorial

What Is Frequent Pattern Mining?
• What is a frequent pattern?
  – A pattern (set of items, sequence, etc.) that occurs together frequently in a database [AIS92]

[Figure: the itemset lattice over items A, B, C, D, E, illustrating frequent itemset mining and association rules generation]

© Osmar R. Zaïane & Mohammad El-Hajj, 2004. PAKDD 2004, Sydney. Tutorial: Advances in Frequent Pattern Mining
The Apriori Algorithm

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;

[Figure: the itemset lattice with infrequent items and their pruned supersets marked, and the frequent items highlighted]
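The level-wise pseudocode above can be sketched in Python. This is an illustrative, unoptimized sketch; the function name `apriori` and the simple join-and-prune candidate generation are choices made here, not part of the slides.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Plain Apriori: level-wise candidate generation and counting.

    transactions: iterable of item collections; min_support: absolute count.
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items in one scan
    counts = {}
    for t in transactions:
        for item in t:
            c = frozenset([item])
            counts[c] = counts.get(c, 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Ck+1: extend each Lk itemset, prune candidates with an infrequent k-subset
        items = sorted({i for c in frequent for i in c})
        candidates = set()
        for c in frequent:
            for i in items:
                if i not in c:
                    cand = c | {i}
                    if all(frozenset(s) in frequent for s in combinations(cand, k)):
                        candidates.add(cand)
        # Count all candidates in one database scan
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```

Running it on the example database of the next slide (TIDs 100–400, minimum support 2) yields nine frequent itemsets, including {2,3,5} with support 2.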
The Apriori Algorithm -- Example (support > 1)

Database D:

  TID  Items
  100  1 3 4
  200  2 3 5
  300  1 2 3 5
  400  2 5

Scan D for C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1 = {1}:2, {2}:3, {3}:3, {5}:3

C2 = {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}; scan D:
  {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
L2 = {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

C3 = {2 3 5}; scan D: {2 3 5}:2, so L3 = {2 3 5}:2
(Note: {1,2,3}, {1,2,5} and {1,3,5} are not in C3.)

Generating Association Rules from Frequent Itemsets

• Only strong association rules are generated.
• Frequent itemsets satisfy the minimum support threshold.
• Strong association rules satisfy the minimum confidence threshold.
• Confidence(P ⇒ Q) = Prob(Q|P) = Support(P ∪ Q) / Support(P)

For each frequent itemset f, generate all non-empty subsets of f.
For every non-empty subset s of f do
    output rule s ⇒ (f − s) if support(f)/support(s) ≥ min_confidence
end
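The rule-generation loop above can be sketched directly. `generate_rules` is a hypothetical helper name; it consumes a mapping from frequent itemsets to their supports, such as the one an Apriori run produces.

```python
from itertools import combinations

def generate_rules(freq, min_conf):
    """For each frequent itemset f and non-empty proper subset s,
    emit the rule s => (f - s) when support(f)/support(s) >= min_conf.

    freq: dict mapping frozenset -> support count. By the Apriori
    property every subset of a frequent itemset is in freq.
    """
    rules = []
    for f, sup_f in freq.items():
        if len(f) < 2:
            continue  # a 1-itemset yields no non-trivial rule
        for r in range(1, len(f)):
            for s in map(frozenset, combinations(sorted(f), r)):
                conf = sup_f / freq[s]
                if conf >= min_conf:
                    rules.append((s, f - s, conf))
    return rules
```

On the example's frequent itemsets with min_confidence = 100%, this emits five rules, e.g. {2,3} ⇒ {5} with confidence 2/2 = 1.0.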
General Outline

• Association Rules
• Different Frequent Patterns
• Different Lattice Traversal Approaches
• Different Transactional Layouts
• State-Of-The-Art Algorithms
  – For All Frequent Patterns
  – For Frequent Closed Patterns
  – For Frequent Maximal Patterns
• Adding Constraints
• Parallel and Distributed Mining
• Visualization of Association Rules
• Frequent Sequential Pattern Mining

Other Frequent Patterns

[Figure: lattice fragment illustrating frequent closed patterns, e.g. {abc} vs. {abcd}]
Maximal vs. Closed Itemsets

• A frequent itemset X is closed if no superset of X has the same support; it is maximal if no superset of X is frequent.

  TID  Items
  1    ABC
  2    ABCD
  3    ABC
  4    ACDE
  5    DE

[Figure: the itemset lattice for this database, with ABC marked "closed but not maximal" and ABCD, ACDE marked "closed and maximal"]

Mining the Pattern Lattice

• Breadth-First
  – Uses the current items at level k to generate the items of level k+1 (many database scans)
• Depth-First
  – Uses a current item at level k to generate all its supersets (favored when mining long frequent patterns)

There is also the notion of bottom-up vs. top-down traversal (minimum support = 2 in both examples below):
• Bottom-up: a superset is candidate if ALL its subsets are frequent — 18 candidates to check.
• Top-down: a subset is candidate if it is marked or if one of its supersets is candidate — 23 candidates to check.
One Hybrid Example

  TID  Items
  1    ABC
  2    ABCD
  3    ABC
  4    ACDE
  5    DE

• Minimum support = 2. A superset is candidate if ALL its subsets are frequent and marked — 19 candidates to check.

Leap Traversal Example

• Same database, minimum support = 2. An itemset is candidate if it is marked or if it is a subset of more than one infrequent marked superset — 10 candidates to check, and 5 frequent patterns are found without checking.
• How to find the support of an itemset:
  1. A full scan of the database, OR
  2. Intelligent techniques: the support of an itemset is the summation of the supports of its supersets among the marked patterns.
Finding Maximal Using the Leap Traversal Approach

  TID  Items
  1,3  ABC
  2    ABCD
  4    ACDE
  5    DE

Minimum support = 2.
• Selectively jumps in the lattice.
• Step 1: Define the actual paths (mark paths).
• Find the local maximals.
• Generate patterns from these local maximals.

[Figure: four slides stepping through the lattice, marking the paths for ABC, ABCD, ACDE and DE]

When to Use a Given Strategy

• Breadth-First
  – Suitable for short frequent patterns
  – Unsuitable for long frequent patterns
• Depth-First
  – Suitable for long frequent patterns
  – In general not scalable when long candidate patterns are not frequent
Empirical Tests

Connect database (long frequent patterns):

  Support %  Size of Largest  F1-Size  Total Frequent  Breadth   Depth    Leap
  95         9                17       2205            15626     6654     3044
  90         12               21       27127           394648    144426   29263
  80         15               28       119418          –         –        120177
  65         19               33       1368337         –         –        1369585
  60         20               36       2908632         –         –        2910175
  55         21               37       5996892         –         –        5998715

T10I4D100K (short frequent patterns):

  Support    Size of Largest  F1-Size  Total Frequent  Breadth   Depth    Leap
  95         5                9        78              165       90       136
  90         7                13       628             2768      1842     1191
  85         8                16       2690            20871     9667     4318
  80         10               20       8282            91577     49196    12021
  75         11               23       20846           292363    160362   28986
  70         13               24       48939           731740    560103   65093
  500        5                569      1073            1492      731      1546

[Charts: total candidates created relative to the frequent patterns for Breadth, Depth and Leap traversal, across support levels, on Connect and T10I4D100K]

Transactional Layouts

• Horizontal Layout: each transaction is recorded as a list of items.

  Transaction ID  Items
  1               A G D C B
  2               B C H E D
  3               B D E A M
  6               A C Q R G
  13              M D C G O
  14              C F P Q J
  15              B D E F I
  16              J E B A D
  17              A K E F C
  18              C D L B A

• Candidacy generation can be removed (FP-Growth).
Transactional Layouts: Inverted Matrix Layout
(El-Hajj and Zaïane, ACM SIGKDD'03)

• Minimizes superfluous processing; candidacy generation can be reduced.
• Appropriate for interactive mining: with other layouts, changing the support level means expensive steps (the whole process is redone).

  T#   Items
  T1   A G D C B
  T2   B C H E D
  T3   B D E A M
  T4   C E F A N
  T5   A B N O P
  T6   A C Q R G
  T7   A C H I G
  T8   L E F K B
  T9   A F M N O
  T10  C F P J R
  T11  A D B H I
  T12  D E B K L
  T13  M D C G O
  T14  C F P Q J
  T15  B D E F I
  T16  J E B A D
  T17  A K E F C
  T18  C D L B A

  Loc  Index  Transactional Array (columns 1..11)
  1    R 2    (2,1) (3,2)
  2    Q 2    (12,2) (3,3)
  3    P 3    (4,1) (9,1) (9,2)
  4    O 3    (5,2) (5,3) (6,3)
  5    N 3    (13,1) (17,4) (6,2)
  6    M 3    (14,2) (13,3) (12,4)
  7    L 3    (8,1) (8,2) (15,9)
  8    K 3    (13,2) (14,5) (13,7)
  9    J 3    (13,4) (13,5) (14,7)
  10   I 3    (11,2) (11,3) (13,6)
  11   H 3    (14,1) (12,3) (15,4)
  12   G 4    (15,1) (16,4) (16,5) (15,6)
  13   F 7    (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)
  14   E 8    (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)
  15   D 9    (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)
  16   C 10   (17,1) (17,2) (18,3) (18,5) (18,6) (¤,¤) (¤,¤) (¤,¤) (18,10) (17,10)
  17   B 10   (18,1) (¤,¤) (18,2) (18,4) (¤,¤) (18,8) (¤,¤) (¤,¤) (18,9) (18,11)
  18   A 11   (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤)

[Figure: the KDD process — selection and transformation from databases into a data warehouse, data mining for patterns, then evaluation and knowledge presentation]
State-Of-The-Art Algorithms

• All frequent patterns: Apriori, FP-Growth, COFI*, ECLAT
• Frequent closed patterns: CHARM, CLOSET+, COFI-CLOSED
• Frequent maximal patterns: MaxMiner, MAFIA, GenMax, COFI-MAX
All-Apriori

Problems with Apriori:
• Repetitive I/O scans (a high number of data scans).
• Generation of candidate itemsets is expensive (huge candidate sets):
  – 10^4 frequent 1-itemsets will generate ~10^7 candidate 2-itemsets.
  – To discover a frequent pattern of size 100, e.g. {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates.
All-FP-Growth
(J. Han, J. Pei, Y. Yin, SIGMOD'00)

• 2 I/O scans; reduced candidacy generation.
• High memory requirements.
• Claims to be 1 order of magnitude faster than Apriori.
• Builds a Frequent Pattern Tree (FP-Tree), then mines it with recursive conditional trees and FP-Trees.

Example database (required support: 3):

  F, A, C, D, G, I, M, P
  A, B, C, F, L, M, O
  B, F, H, J, O
  A, F, C, E, L, P, M, N
  B, C, K, S, P
  F, M, C, B, A

Item counts: F:5, C:5, A:4, B:4, M:4, P:3; D, E, G, H, I, J, K, L, O appear once each.
All-FP-Growth: Building the Frequent Pattern Tree

After the first scan, each transaction is reduced to its frequent items, sorted in frequency-descending order:

  F, C, A, M, P
  F, C, A, B, M
  F, B
  C, B, P
  F, C, A, M, P
  F, C, A, B, M

In the second scan, each of these lists is inserted from the root: counts are incremented along the shared prefix, and a new branch is started where the prefix diverges.

[Figure: a sequence of slides stepping through the insertions; the completed tree has children F:5 and C:1 under the root]

All-FP-Growth: Frequent Pattern Growth

• Mining proceeds per item, starting from the least frequent: collect the item's conditional pattern base from the tree paths that reach it, build its conditional FP-Tree, and mine it recursively.
• For P this yields <C:3, P:3>, i.e. the pattern CP with support 3.
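The two-scan construction can be sketched as follows. This is a minimal sketch: the class and function names are choices made here, and ties in the frequency order are broken alphabetically, so C precedes F (the slides keep F first); the resulting counts are the same, only the branch order differs.

```python
class Node:
    """One FP-tree node: an item, its count, and child links."""
    __slots__ = ("item", "count", "children", "parent")

    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Scan 1: count items. Scan 2: insert each transaction's frequent
    items, sorted by descending frequency, sharing common prefixes."""
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    freq = {i: c for i, c in counts.items() if c >= min_support}
    order = sorted(freq, key=lambda i: (-freq[i], i))  # alphabetical tie-break
    rank = {i: r for r, i in enumerate(order)}
    root = Node(None, None)
    root.count = 0
    for t in transactions:
        path = sorted((i for i in t if i in freq), key=rank.__getitem__)
        node = root
        for i in path:
            if i in node.children:
                node.children[i].count += 1  # shared prefix: bump the count
            else:
                node.children[i] = Node(i, node)  # diverge: new branch
            node = node.children[i]
    return root, freq
```

On the slide's six transactions with support 3, the item counts come out as F:5, C:5, A:4, B:4, M:4, P:3, and only one transaction (B, F, H, J, O) starts a branch outside the main C/F prefix.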
All-COFI: Co-Occurrence Frequent Item Trees

• One COFI-tree is built per frequent item from the FP-Tree; each node holds a support count and a participation count (e.g. P:3:0, M:4:0).
• Start with item P: find the locally frequent items with respect to P. Here PC:3 is the only frequent-path-base, and all subsets of PC:3 are frequent with the same support.
• Then item M: the locally frequent items with respect to M are A:4, C:4, F:3, giving the frequent-path-base FCA:3.
All-COFI: How to Mine the Frequent-Path-Bases

• Three approaches; the third is leap traversal. The support of any pattern is the summation of the supports of its supersets among the frequent-path-bases.
  1) Intersect the non-frequent path bases (e.g. FCA:3 ∩ CA:1 = CA).
  2) Find the subsets of the only frequent paths (sure to be frequent).
  3) Find the support of each pattern (e.g. CA:4, CF:3, AF:3 and C:4, A:4, F:3).

All-ECLAT
(M. J. Zaki, IEEE Transactions on Knowledge and Data Engineering, 2000)

• For each item, store a list of transaction ids (tids): the vertical data layout.

  Horizontal layout          Vertical layout (tid-lists)
  TID  Items                 A: 1 4 5 6 7 8 9
  1    A,B,E                 B: 1 2 5 7 8 10
  2    B,C,D                 C: 2 3 4 5 8 9
  3    C,E                   D: 2 4 5 9
  4    A,C,D                 E: 1 3 6
  5    A,B,C,D
  6    A,E
  7    A,B
  8    A,B,C
  9    A,C,D
  10   B
All-ECLAT

• Determine the support of any k-itemset by intersecting the tid-lists of two of its (k−1)-subsets:

  A = {1,4,5,6,7,8,9} ∧ B = {1,2,5,7,8,10} → AB = {1,5,7,8}

• Mines one equivalence class at a time: first all frequent patterns with respect to item A (AB, AC, …, ABC, ABD, ACD, ABCD, …), then all those with respect to item B (BC, BD, …, BCD, BDE, BCDE, …), and so on.
• 3 traversal approaches: top-down, bottom-up and hybrid.
• Advantage: very fast support counting; few scans of the database (best case 2).
• Disadvantage: intermediate tid-lists may become too large for memory.
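The vertical layout and the intersection step can be sketched on the slide's ten-transaction example. The function names `vertical` and `support` are choices made here, not ECLAT's actual interface.

```python
def vertical(db):
    """Horizontal -> vertical layout: one tid-set per item."""
    tids = {}
    for tid, items in db.items():
        for item in items:
            tids.setdefault(item, set()).add(tid)
    return tids

def support(itemset, tids):
    """Support of an itemset = size of the intersection of its tid-sets."""
    items = list(itemset)
    acc = set(tids[items[0]])
    for item in items[1:]:
        acc &= tids[item]  # e.g. t(A) ∩ t(B) = t(AB)
    return len(acc)
```

With the example database, t(A) ∩ t(B) = {1, 5, 7, 8}, so AB has support 4, as on the slide.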
CLOSED-CHARM: Diffset Intersections

• CHARM mines closed itemsets over the vertical layout, but stores diffsets instead of full tidsets: the diffset of an extension records only the tids of the prefix that the extension loses, so sup(PX) = sup(P) − |d(PX)|.

[Figure: the example tidset database and the corresponding diffset database, with the resulting supports of AC, AD, AT, AW, CD, CT, CW, DT, DW and TW]
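The diffset idea can be illustrated in a few lines. Since the slide's own numbers did not survive extraction, this sketch reuses the ECLAT example's tid-sets for A and B (an assumption); the function names are hypothetical.

```python
def diffset(tids_prefix, tids_item):
    """d(Px) = t(P) - t(x): the prefix's tids lost by adding item x."""
    return tids_prefix - tids_item

def support_from_diffset(sup_prefix, d):
    """sup(Px) = sup(P) - |d(Px)|; only the (usually small) difference is stored."""
    return sup_prefix - len(d)
```

With t(A) = {1,4,5,6,7,8,9} and t(B) = {1,2,5,7,8,10}, d(AB) = {4,6,9} and sup(AB) = 7 − 3 = 4, matching the tidset intersection.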
CLOSED-CLOSET+

• Uses the horizontal format of transactions; based on the FP-Growth model.
• Divides the search space and finds closed itemsets recursively.
• Two-level hash-indexed result tree structure for dense datasets; pseudo-projection based upward-checking for sparse datasets.

Subset checking with the result tree:
• Compressed result tree structure; search-space shrinking for subset checking.
• If an itemset Sc can be absorbed by an already mined itemset Sa, they have the following relationships:
  1) sup(Sc) = sup(Sa)
  2) length(Sc) < length(Sa)
  3) ∀ i ∈ Sc ⇒ i ∈ Sa
• Measures to enhance the checking:
  – Two-level hash indices: support and itemID.
  – Record length information in each result tree node.

Pseudo-projection based upward checking:
• The result tree may consume much more memory for sparse datasets, so subset checking is done without maintaining history itemsets:
  – For a certain prefix X, as long as we can find any item which (1) appears in each prefix path w.r.t. X, and (2) does not belong to X, any itemset with prefix X will be non-closed; otherwise, if there is no such item, the union of X and the complete set of its locally frequent items with support sup(X) will form a closed itemset.
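What all this machinery decides can be stated as a brute-force filter over the definition: an itemset is closed iff it has no proper superset with equal support. This sketch illustrates the result, not CLOSET+'s result-tree method; the function name is hypothetical.

```python
def closed_only(freq):
    """Keep the itemsets that have no frequent proper superset
    with the same support (freq maps frozenset -> support)."""
    return {f: s for f, s in freq.items()
            if not any(f < g and s == sg for g, sg in freq.items())}
```

On the Apriori example's frequent itemsets, only {3}, {1,3}, {2,5} and {2,3,5} survive: e.g. {2} is absorbed by {2,5}, which has the same support 3.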
CLOSED-COFI-CLOSED

• Uses the horizontal format, the COFI mining approach and the leap-traversal approach.
• Step 1: Assign the frequent-path-bases to their locations in the OPB structure.
• Step 2: Apply the leap-traversal approach on the frequent-path-bases.

[Figure: a COFI-tree example with frequent-path-bases ADE:1, ABCD:2, ABCDE:1 and ABCE:1 placed in the OPB, stepping toward the closed patterns]
MAXIMAL-MAXMINER: Mining Max-Patterns
(R. J. Bayardo, SIGMOD'98)

  Tid  Items
  10   A,B,C,D,E
  20   B,C,D,E
  30   A,C,D,F

• 1st scan: find frequent items — A, B, C, D, E.
• 2nd scan: find support for AB, AC, AD, AE and ABCDE.
• Abandons a bottom-up traversal; attempts to "look-ahead": identify a long frequent itemset, then prune all its subsets.
• Uses a set-enumeration tree over the items: {} → 1, 2, 3, 4 → 1,2 1,3 1,4 2,3 2,4 3,4 → …

MAXIMAL-MAFIA

• MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases.
• See also GenMax (K. Gouda and M. Zaki, ICDM'01) and COFI-MAX (M. El-Hajj and O. Zaïane).
MAXIMAL-COFIMAX

Example (finding maximal frequent patterns from a COFI-tree):
• Step 1: Assign the frequent-path-bases to their locations in the OPB.
• Step 2: Find the global support for each frequent-path-base (here ABCD:2, ABCE:1, ADE:1, ABCDE:1).
• Step 4: Remove non-frequent patterns, and frequent patterns that have a superset among the frequent patterns.

[Charts: COFI-ALL vs. FP-Growth vs. MAFIA, COFI-CLOSED vs. FPCLOSED vs. MAFIA, and COFI-MAX vs. MAFIA for maximal patterns; time in seconds over synthetic datasets of 10K–250K transactions]
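The final pruning step — keep only frequent patterns with no frequent superset — is exactly the definition of maximality, and can be stated as a brute-force filter (an illustration of the step, not COFI-MAX's actual procedure; the function name is hypothetical).

```python
def maximal_only(freq_sets):
    """Maximal frequent itemsets: those with no frequent proper superset."""
    return [f for f in freq_sets if not any(f < g for g in freq_sets)]
```

On the Apriori example's frequent itemsets, only {1,3} and {2,3,5} are maximal; everything else is a subset of one of them.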
Mining Extremely Large Datasets; Mining Real Datasets (Low and High Support)

[Charts: mining ALL frequent patterns on 5M–100M transactions, and mining ALL and CLOSED frequent patterns on the pumsb dataset at high support (40%–15%); time in seconds for COFI-MAX, FPMAX, MAFIA and GENMAX]
Which Algorithm Is the Winner?

• Not clear yet: with relatively small datasets we can find different winners
  1. by using different datasets,
  2. by changing the support level,
  3. by changing the implementations.

What about extremely large datasets (hundreds of millions of transactions)?
• Most of the existing algorithms do not run on such sizes.
• Vertical and bitmap approaches cannot load the transactions into main memory.
• Repetitive approaches cannot keep scanning these huge databases many times.
• Requirements: we need algorithms that
  1) do not require multiple scans of the database, and
  2) leave a small footprint in main memory at any given time.
Adding Constraints

• Finding all the patterns in a database autonomously? Unrealistic! The patterns could be too many and not focused.
• Data mining should be an interactive process: the user directs what is to be mined using a data mining query language (or a graphical user interface).
• Constraint-based mining
  – User flexibility: provides constraints on what is to be mined.
  – System optimization: explores such constraints for efficient mining.
  – Useful for interactive and ad-hoc mining; reduces the set of association rules discovered and confines them to more relevant rules.
• Before mining
  ✓ Knowledge type constraints: classification, etc.
  ✓ Data constraints: SQL-like queries (DMQL).
  ✓ Dimension/level constraints: relevance to some dimensions and some concept levels.
• While mining
  ✓ Rule constraints: form, size, and content.
  ✓ Interestingness constraints: support, confidence, correlation.
• After mining
  ✓ Querying association rules.
Parallel Association Rule Mining Algorithms

• Replication algorithms: replicate the candidate sets and partition the database.
• Partitioning algorithms: partition the candidate set; each node needs to scan the whole data.
• Hybrid algorithms: combine both ideas of replication and partitioning.
  – e.g. the Hybrid Distribution algorithm (E.-H. Han et al., 1997)

Presentation of Association Rules (Table Form)

[Screenshots: association rules presented in table form in DBMiner]
General Outline
• Input
– A database D of sequences called data-sequences, in
which:
• I={i1, i2,…,in} is the set of items
• each sequence is a list of transactions ordered by transaction-time
• each transaction consists of fields: (sequence-id, transaction-id),
transaction-time and a set of items.
• Problem
– To discover all the sequential patterns with a user-
specified minimum support
© Osmar R. Zaïane & Mohammad El-Hajj, 2004 PAKDD 2004, Sydney Tutorial: Advances in Frequent Pattern Mining [113] © Osmar R. Zaïane & Mohammad El-Hajj, 2004 PAKDD 2004, Sydney Tutorial: Advances in Frequent Pattern Mining [114]
Input Database: Example

45% of customers who bought Foundation will buy Foundation and Empire within the next month.

  Sequence-Id  Transaction-Time  Items
  C1           1                 Ringworld
  C1           2                 Foundation
  C1           15                Ringworld Engineers, Second Foundation
  C2           1                 Foundation, Ringworld
  C2           20                Foundation and Empire
  C2           50                Ringworld Engineers

Example

Database sorted by customer ID and transaction time:

  Customer ID  Transaction Time  Items Bought
  1            June 1            30
  1            June 30           90
  2            June 10           10, 20
  2            June 15           30
  2            June 20           40, 60, 70
  3            June 25           30, 50, 70
  4            June 20           30
  4            June 25           40, 70
  4            June 30           90
  5            June 12           90

Customer-sequence version of the database:

  Customer ID  Customer Sequence
  1            <(30) (90)>
  2            <(10 20) (30) (40 60 70)>
  3            <(30 50 70)>
  4            <(30) (40 70) (90)>
  5            <(90)>

Sequential patterns with minimum support 25% (here, 2 customers) — answer set: <(30) (90)> and <(30) (40 70)>.

• Subsequence: a sequence <a1, a2, …, an> is contained in another sequence <b1, b2, …, bm> if there exist integers i1 < i2 < … < in such that a1 ⊆ bi1, a2 ⊆ bi2, …, an ⊆ bin.
  – <(3) (4 5) (8)> is contained in <(7) (3 8) (9) (4 5 6) (8)> ✓
  – <(3) (5)> is not contained in <(3 5)> ✗
  – <(3) (5)> is contained in <(3 5) (3 5)> ✓
• k-sequence: a sequence that consists of k items; e.g. both <x, y> and <(x, y)> are 2-sequences.
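The containment definition above can be checked with a greedy left-to-right scan: match each element of the candidate subsequence against the earliest remaining element of the data-sequence that contains it. The function name is a choice made here.

```python
def is_subsequence(a, b):
    """True if sequence a is contained in sequence b.

    a and b are lists of itemsets; each element a_i must be a subset of
    some element b_j, with the matched positions strictly increasing.
    """
    j = 0
    for elem in a:
        elem = set(elem)
        # advance until an element of b contains this element of a
        while j < len(b) and not elem <= set(b[j]):
            j += 1
        if j == len(b):
            return False  # ran out of b before matching all of a
        j += 1  # next element of a must match strictly later in b
    return True
```

The greedy choice is safe: taking the earliest feasible match leaves the longest possible suffix of b for the remaining elements of a. The three slide examples behave as stated.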
References: Constraint-based Frequent Pattern Mining

• R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. "Exploratory mining and pruning optimizations of constrained association rules." SIGMOD'98.
• J. Pei, J. Han, and L. V. S. Lakshmanan. "Mining Frequent Itemsets with Convertible Constraints." Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), April 2001.
• J. Pei and J. Han. "Can We Push More Constraints into Frequent Pattern Mining?" Proc. 2000 Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August 2000.
• R. Srikant, Q. Vu, and R. Agrawal. "Mining association rules with item constraints." KDD'97, 67-73, Newport Beach, California.
• C. Bucila, J. E. Gehrke, D. Kifer, and W. White. "DualMiner: A dual-pruning algorithm for itemsets with constraints." Proc. SIGKDD 2002.
• M. Kamber, J. Han, and J. Y. Chiang. "Metarule-guided mining of multi-dimensional association rules using data cubes." KDD'97, 207-210, Newport Beach, California.

Mining Maximal and Closed Patterns

• R. J. Bayardo. "Efficiently mining long patterns from databases." SIGMOD'98, 85-93, Seattle, Washington.
• J. Pei, J. Han, and R. Mao. "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets." Proc. 2000 ACM-SIGMOD Int. Workshop on Data Mining and Knowledge Discovery (DMKD'00), Dallas, TX, May 2000.
• N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. "Discovering frequent closed itemsets for association rules." ICDT'99, 398-416, Jerusalem, Israel, Jan. 1999.
• M. Zaki. "Generating Non-Redundant Association Rules." KDD'00, Boston, MA, Aug. 2000.
• M. Zaki. "CHARM: An Efficient Algorithm for Closed Association Rule Mining." TR99-10, Department of Computer Science, Rensselaer Polytechnic Institute.
• M. Zaki. "Fast Vertical Mining Using Diffsets." TR01-1, Department of Computer Science, Rensselaer Polytechnic Institute.
• D. Burdick, M. Calimlim, and J. Gehrke. "MAFIA: A maximal frequent itemset algorithm for transactional databases." ICDE 2001, IEEE Computer Society, 2001.

References: Sequential Pattern Mining

• R. Agrawal and R. Srikant. "Mining sequential patterns." ICDE'95, 3-14, Taipei, Taiwan.
• R. Srikant and R. Agrawal. "Mining sequential patterns: Generalizations and performance improvements." EDBT'96.
• J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. "FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining." Proc. 2000 Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August 2000.
• H. Mannila, H. Toivonen, and A. I. Verkamo. "Discovery of frequent episodes in event sequences." Data Mining and Knowledge Discovery, 1:259-289, 1997.
• J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth." Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
• B. Ozden, S. Ramaswamy, and A. Silberschatz. "Cyclic association rules." ICDE'98, 412-421, Orlando, FL.
• S. Ramaswamy, S. Mahajan, and A. Silberschatz. "On the discovery of interesting patterns in association rules." VLDB'98, 368-379, New York, NY.
• M. J. Zaki. "Efficient enumeration of frequent sequences." CIKM'98, November 1998.
• M. N. Garofalakis, R. Rastogi, and K. Shim. "SPIRIT: Sequential Pattern Mining with Regular Expression Constraints." VLDB 1999, 223-234, Edinburgh, Scotland.
References: Frequent-pattern Mining in Spatial, Multimedia, Text & Web Databases

• K. Koperski, J. Han, and G. B. Marchisio. "Mining Spatial and Image Data through Progressive Refinement Methods." Revue internationale de géomatique (European Journal of GIS and Spatial Analysis), 9(4):425-440, 1999.
• A. K. H. Tung, H. Lu, J. Han, and L. Feng. "Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules." Int. Conf. on Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, Aug. 1999, pp. 297-301.
• J. Han, G. Dong, and Y. Yin. "Efficient Mining of Partial Periodic Patterns in Time Series Database." Proc. 1999 Int. Conf. on Data Engineering (ICDE'99), Sydney, Australia, March 1999, pp. 106-115.
• H. Lu, L. Feng, and J. Han. "Beyond Intra-Transaction Association Analysis: Mining Multi-Dimensional Inter-Transaction Association Rules." ACM Transactions on Information Systems (TOIS'00), 18(4):423-454, 2000.
• O. R. Zaïane, M. Xin, and J. Han. "Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs." Proc. Advances in Digital Libraries Conf., Santa Barbara, CA, April 1998, pp. 19-29.
• O. R. Zaïane, J. Han, and H. Zhu. "Mining Recurrent Items in Multimedia with Progressive Resolution Refinement." Proc. 2000 Int. Conf. on Data Engineering (ICDE'00), San Diego, CA, Feb. 2000, pp. 461-470.

FIM for Classification & Data Cube Computation

• K. Beyer and R. Ramakrishnan. "Bottom-up computation of sparse and iceberg cubes." SIGMOD 1999, pp. 359-370.
• M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. "Computing iceberg queries efficiently." VLDB'98, 299-310, New York, NY, Aug. 1998.
• J. Han, J. Pei, G. Dong, and K. Wang. "Computing Iceberg Data Cubes with Complex Measures." SIGMOD 2001.
• B. Liu, W. Hsu, and Y. Ma. "Integrating classification and association rule mining." SIGKDD 1998, pp. 80-86.
• M.-L. Antonie and O. R. Zaïane. "Text Document Categorization by Term Association." Proc. IEEE Int. Conf. on Data Mining (ICDM'2002), pp. 19-26, Maebashi City, Japan.

References: Parallel Frequent Itemset Mining

• R. Agrawal and J. Shafer. "Parallel Mining of Association Rules." IEEE Transactions on Knowledge and Data Engineering, 8 (1996), pp. 962-969.
• B. Wilkinson and M. Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall (Alan Apt), New Jersey, USA, 1999.
• D. Cheung, K. Hu, and S. Xia. "Asynchronous Parallel Algorithm for Mining Association Rules on Shared-memory Multi-processors." Proc. 10th ACM Symp. on Parallel Algorithms and Architectures, ACM Press, NY, 1998, pp. 279-288.
• D. W. Cheung, J. Han, V. Ng, A. W. Fu, and Y. Fu. "A Fast Distributed Algorithm for Mining Association Rules." PDIS 1996, pp. 31-42.
• D. W. Cheung and Y. Xiao. "Effect of Data Distribution in Parallel Mining of Associations." Data Mining and Knowledge Discovery, 3(3):291-314, 1999.
• E.-H. Han, G. Karypis, and V. Kumar. "Scalable parallel data mining for association rules." ACM SIGMOD Conf. on Management of Data, May 1997.
• J. S. Park, M. Chen, and P. S. Yu. "Efficient parallel data mining for association rules." ACM Int. Conf. on Information and Knowledge Management, November 1995.
• M. J. Zaki and C.-T. Ho (editors). Large-Scale Parallel Data Mining. Lecture Notes in Artificial Intelligence, Vol. 1759, Springer-Verlag, 2000.