Association Rules & Sequential Patterns: Road Map
Road map
Key Features
Completeness: find all rules.
No target item(s) on the right-hand side.
Mining with data on hard disk (not in main memory).
minsup = 30%
minconf = 80%
An example frequent itemset:
{Chicken, Clothes, Milk} [sup = 3/7]
Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken → Milk, [sup = 3/7, conf = 3/3]
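These numbers follow directly from the definitions: sup(X → Y) = count(X ∪ Y)/n and conf(X → Y) = sup(X ∪ Y)/sup(X). A minimal Python sketch, using a hypothetical 7-transaction data set chosen to reproduce the slide's numbers (the actual transaction table is not shown here):

```python
# Hypothetical transactions (not the slide's table), built so that
# {Chicken, Clothes, Milk} occurs in 3 of 7 and Clothes in exactly 3.
transactions = [
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Clothes", "Milk", "Bread"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk"},
    {"Bread", "Milk"},
    {"Chicken", "Bread"},
    {"Milk", "Bread"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(X -> Y) = sup(X union Y) / sup(X)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

print(support({"Chicken", "Clothes", "Milk"}, transactions))   # 3/7
print(confidence({"Clothes"}, {"Milk", "Chicken"}, transactions))  # 1.0
```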
Road map
(Figure: itemset lattice with 1-itemsets A, B, C, D and 2-itemsets AB, AC, AD, BC, BD, CD)
The Algorithm
Iterative algorithm (also called level-wise search):
find all 1-item frequent itemsets, then all 2-item frequent itemsets, and so on.
In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset.
Find frequent itemsets of size 1: F1.
From k = 2:
  Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1.
  Fk = those itemsets that are actually frequent, Fk ⊆ Ck (need to scan the database once).
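The level-wise search above can be sketched in Python. This is a minimal, unoptimized illustration with candidate generation inlined, not a full Apriori implementation:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise search: find frequent 1-itemsets, then 2-itemsets, and
    so on; a level-k candidate keeps only itemsets whose (k-1)-subsets
    are all frequent."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Fk = {s for s, c in counts.items() if c / n >= minsup}  # F1
    frequent = set(Fk)
    k = 2
    while Fk:
        # candidate generation: join pairs from F(k-1), then prune
        Ck = set()
        for f1 in Fk:
            for f2 in Fk:
                c = f1 | f2
                if len(c) == k and all(
                    frozenset(s) in Fk for s in combinations(c, k - 1)
                ):
                    Ck.add(c)
        # one database scan to count candidate supports
        Fk = {c for c in Ck if sum(c <= t for t in transactions) / n >= minsup}
        frequent |= Fk
        k += 1
    return frequent

tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(tuple(sorted(s)) for s in apriori(tx, minsup=0.6)))
# → [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
```

Here {a, b, c} itself is not frequent (support 2/5 < 0.6), so the search stops after level 2.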
Candidate-gen function
Function candidate-gen(Fk-1)
  Ck ← ∅;
  forall f1, f2 ∈ Fk-1
      with f1 = {i1, …, ik-2, ik-1}
      and f2 = {i1, …, ik-2, i'k-1}
      and ik-1 < i'k-1 do
    c ← {i1, …, ik-1, i'k-1};    // join f1 and f2
    Ck ← Ck ∪ {c};
    for each (k-1)-subset s of c do
      if (s ∉ Fk-1) then
        delete c from Ck;        // prune
    end
  end
  return Ck;
An example
F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4},
{1, 3, 5}, {2, 3, 4}}
After join:
C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
After pruning:
C4 = {{1, 2, 3, 4}}
because {1, 4, 5} is not in F3 ({1, 3, 4, 5} is removed)
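A compact Python version of the join and prune steps, checked against this example. Itemsets are represented as sorted tuples; candidate_gen is an illustrative name mirroring the pseudocode:

```python
from itertools import combinations

def candidate_gen(F_prev, k):
    """Join + prune, as in the candidate-gen function:
    F_prev holds the frequent (k-1)-itemsets."""
    F_prev = {tuple(sorted(f)) for f in F_prev}
    Ck = set()
    for f1 in F_prev:
        for f2 in F_prev:
            # join: identical first k-2 items, f1's last item smaller
            if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]:
                c = f1 + (f2[-1],)
                # prune: every (k-1)-subset of c must be frequent
                if all(s in F_prev for s in combinations(c, k - 1)):
                    Ck.add(c)
    return Ck

F3 = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}]
print(candidate_gen(F3, 4))  # {(1, 2, 3, 4)}
```

The join produces {1, 2, 3, 4} and {1, 3, 4, 5}; the prune step drops {1, 3, 4, 5} because its subset {1, 4, 5} is not in F3.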
On Apriori Algorithm
Road map
⇒ Transaction form:
(Attr1, a), (Attr2, b), (Attr3, d)
(Attr1, b), (Attr2, c), (Attr3, e)
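Converting attribute-value rows into this transaction form is a one-line transformation; a small sketch using the two rows above:

```python
# Convert attribute-value rows into (Attr, value) transaction form.
rows = [
    {"Attr1": "a", "Attr2": "b", "Attr3": "d"},
    {"Attr1": "b", "Attr2": "c", "Attr3": "e"},
]
transactions = [{(attr, val) for attr, val in row.items()} for row in rows]
print(sorted(transactions[0]))
# → [('Attr1', 'a'), ('Attr2', 'b'), ('Attr3', 'd')]
```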
Road map
Minsup of a rule
An Example
Road map
Problem definition
Let T be a transaction data set consisting of n transactions.
Each transaction is also labeled with a class y.
Let I be the set of all items in T, Y be the set of all class labels, and I ∩ Y = ∅.
A class association rule (CAR) is an implication of the form
X → y, where X ⊆ I, and y ∈ Y.
The definitions of support and confidence are the same as those for normal association rules.
An example
A text document data set
doc 1: Student, Teach, School : Education
doc 2: Student, School : Education
doc 3: Teach, School, City, Game : Education
doc 4: Baseball, Basketball : Sport
doc 5: Basketball, Player, Spectator : Sport
doc 6: Baseball, Coach, Game, Team : Sport
doc 7: Basketball, Team, City, Game : Sport
Let minsup = 20% and minconf = 60%. The following are two
examples of class association rules:
Student, School → Education [sup = 2/7, conf = 2/2]
Game → Sport [sup = 2/7, conf = 2/3]
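These numbers can be checked directly against the seven documents. A small sketch; car_sup_conf is an illustrative helper name, not from the slides:

```python
# The seven labeled documents from the example: (item set, class label).
docs = [
    ({"Student", "Teach", "School"}, "Education"),
    ({"Student", "School"}, "Education"),
    ({"Teach", "School", "City", "Game"}, "Education"),
    ({"Baseball", "Basketball"}, "Sport"),
    ({"Basketball", "Player", "Spectator"}, "Sport"),
    ({"Baseball", "Coach", "Game", "Team"}, "Sport"),
    ({"Basketball", "Team", "City", "Game"}, "Sport"),
]

def car_sup_conf(condset, label, data):
    """sup = count(condset and label) / n; conf = same count / count(condset)."""
    n = len(data)
    cond = sum(condset <= items for items, _ in data)
    both = sum(condset <= items and y == label for items, y in data)
    return both / n, both / cond

print(car_sup_conf({"Student", "School"}, "Education", docs))  # sup 2/7, conf 2/2
print(car_sup_conf({"Game"}, "Sport", docs))                   # sup 2/7, conf 2/3
```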
Mining algorithm
Unlike normal association rules, CARs can be mined directly in a single step.
The key operation is to find all ruleitems that have support above minsup. A ruleitem is of the form
(condset, y)
where condset is a set of items from I (i.e., condset ⊆ I), and y ∈ Y is a class label.
Each ruleitem basically represents a rule:
condset → y.
The Apriori algorithm can be modified to generate CARs.
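The one-step idea can be sketched under simplifying assumptions (condset size bounded, no level-wise pruning); frequent_ruleitems is an illustrative name, not from any library:

```python
from collections import Counter
from itertools import combinations

# Labeled documents from the example slide: (item set, class label).
docs = [
    ({"Student", "Teach", "School"}, "Education"),
    ({"Student", "School"}, "Education"),
    ({"Teach", "School", "City", "Game"}, "Education"),
    ({"Baseball", "Basketball"}, "Sport"),
    ({"Basketball", "Player", "Spectator"}, "Sport"),
    ({"Baseball", "Coach", "Game", "Team"}, "Sport"),
    ({"Basketball", "Team", "City", "Game"}, "Sport"),
]

def frequent_ruleitems(data, minsup, max_len=2):
    """Count (condset, y) ruleitems in one pass and keep those with
    support >= minsup. max_len bounds the condset size for brevity; a
    full miner grows condsets level-wise, as in Apriori."""
    n = len(data)
    counts = Counter()
    for items, y in data:
        for size in range(1, max_len + 1):
            for condset in combinations(sorted(items), size):
                counts[(frozenset(condset), y)] += 1
    return {ri: c / n for ri, c in counts.items() if c / n >= minsup}

rules = frequent_ruleitems(docs, minsup=0.2)
print(rules[(frozenset({"Student", "School"}), "Education")])  # 2/7
```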
Road map
Summary
Association rule mining has been extensively studied in the data mining community.
So has sequential pattern mining.
There are many efficient algorithms and model variations.
Other related work includes:
Multi-level or generalized rule mining
Constrained rule mining
Incremental rule mining
Maximal frequent itemset mining
Closed itemset mining
Rule interestingness and visualization
Parallel algorithms
…
Web Text Mining: Association Mining