The document discusses the FP-Growth algorithm for frequent pattern mining. It improves upon the Apriori algorithm by not requiring candidate generation and only requiring two scans of the database. FP-Growth works by first building a compact FP-tree structure using two passes over the data, then extracting frequent itemsets directly from the FP-tree. An example is provided where an FP-tree is constructed from a sample transaction database and frequent patterns are generated from the tree. Advantages of FP-Growth include only needing two scans of data and faster runtime than Apriori, while a disadvantage is the FP-tree may not fit in memory.
FP-Growth stands for frequent pattern growth. It is a scalable technique for mining frequent patterns in a database.
FP-Growth improves on Apriori to a great extent: frequent itemset mining is possible without candidate generation, and only two scans of the database are needed. BUT HOW?
Simply a two-step procedure:
– Step 1: Build a compact data structure called the FP-tree. • Built using 2 passes over the data set.
– Step 2: Extract frequent itemsets directly from the FP-tree.
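The two steps above can be sketched in code. Below is a minimal, illustrative Python sketch of Step 1 (the names `FPNode` and `build_fp_tree` are my own, not from the slides): pass 1 counts item supports, pass 2 inserts each transaction with its items sorted in descending support order, so that transactions sharing a prefix share tree nodes.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, a count, and child links."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Pass 1: count item supports and keep only the frequent items.
    support = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in support.items() if c >= min_support}

    # Pass 2: insert each transaction with its frequent items sorted
    # by descending support (ties broken by name), sharing prefixes.
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1
    return root, support
```

A real implementation would also maintain the header table of node-links mentioned later; this sketch only shows how shared prefixes compress the database into one tree.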
Now let's consider the following transaction table:

TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
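As a quick check, the support counts can be computed with a few lines of Python. Note the sketch takes T100 as {I1, I2, I5}, the standard form of this textbook example, which is what gives I5 a support count of 2.

```python
from collections import Counter

# The nine transactions from the table (T100 taken as I1, I2, I5).
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

# Count how many transactions contain each item.
support = Counter(item for t in transactions for item in t)

# Sort by descending support, ties broken by item name.
ranking = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
# -> [('I2', 7), ('I1', 6), ('I3', 6), ('I4', 2), ('I5', 2)]
```

This descending-support ranking is exactly the order in which items are arranged before the FP-tree is built.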
Now we will build an FP-tree of that database. Itemsets are considered in order of their descending support count.
Item header table (items in descending support count):

Item   Support count
I2     7
I1     6
I3     6
I4     2
I5     2

FP-tree:

null
├── I2:7
│   ├── I1:4
│   │   ├── I5:1
│   │   ├── I3:2
│   │   │   └── I5:1
│   │   └── I4:1
│   ├── I3:2
│   └── I4:1
└── I1:2
    └── I3:2

To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links.

FP-tree construction over!! Now we need to find the conditional pattern base and conditional FP-tree for each item.
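The conditional pattern base of an item can be read off directly from the transactions: for each transaction containing the item, take the items that precede it in descending-support order, together with a count. A small illustrative Python sketch (the helper name `conditional_pattern_base` is my own, and T100 is taken as {I1, I2, I5}, consistent with the support counts above):

```python
from collections import Counter

# The example transactions (T100 taken as I1, I2, I5).
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
support = Counter(i for t in transactions for i in t)

def conditional_pattern_base(target):
    """Prefix paths that lead to `target` in the FP-tree,
    together with their counts."""
    base = Counter()
    for t in transactions:
        if target in t:
            # Order the transaction as it appears in the tree.
            items = sorted(t, key=lambda i: (-support[i], i))
            prefix = tuple(items[:items.index(target)])
            if prefix:
                base[prefix] += 1
    return base

print(conditional_pattern_base("I5"))
# -> Counter({('I2', 'I1'): 1, ('I2', 'I1', 'I3'): 1})
```

FP-Growth then builds a conditional FP-tree from each such base and mines it recursively; for I5 the base above yields the frequent patterns {I2, I5}, {I1, I5}, and {I2, I1, I5}, each with support 2.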