PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Frequent pattern mining aims to discover all patterns in a transactional database whose support is no less than the user-specified minimum support (minSup). The support of a pattern is the number of transactions that contain it, so minSup sets the minimum number of transactions in which a pattern must appear.
Reference: Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93). Association for Computing Machinery, New York, NY, USA, 207–216.
A transactional database is an unordered collection of transactions. Each transaction is a pair consisting of a transaction identifier (tid) and a set of items.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g is shown below:
| tid | Transactions |
|---|---|
| 1 | a b c g |
| 2 | b c d e |
| 3 | a b c d |
| 4 | a c d f |
| 5 | a b c d g |
| 6 | c d e f |
| 7 | a b c d |
| 8 | a e f |
| 9 | a b c d |
| 10 | b c d e |
Note: Duplicate items must not exist within a transaction.
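
As a minimal sketch of how support is counted, the following pure-Python snippet encodes the hypothetical database above and counts the transactions that contain a given pattern. It does not use PAMI; the names `database` and `support` are illustrative only. Storing each transaction as a set also ensures that no item is duplicated within it.

```python
# Hypothetical database from the table above: tid -> set of items.
# (Illustrative only; this is not PAMI's internal representation.)
database = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def support(pattern, db):
    """Count the transactions that contain every item of `pattern`."""
    pattern = set(pattern)
    return sum(1 for items in db.values() if pattern <= items)


print(support({"a", "b"}, database))  # 5
print(support({"e"}, database))       # 4
```

With minSup = 5, the pattern {a, b} would be frequent, while {e} would not.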
Each row in a transactional database must contain only items. The frequent pattern mining algorithms in PAMI implicitly use the row number of a transaction as its transaction identifier to reduce storage and processing costs.
A sample transactional database, say sampleTransactionalDatabase.txt, is provided below.
```
a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e
```
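
To illustrate the implicit row-number convention, the sketch below writes the sample rows to a temporary file and reads them back, using the 1-based row number as the transaction identifier. The helper `load_transactions` is hypothetical and is not PAMI's reader; it only shows one way such a file could be parsed, assuming a single-space separator.

```python
import os
import tempfile

rows = [
    "a b c g", "b c d e", "a b c d", "a c d f", "a b c d g",
    "c d e f", "a b c d", "a e f", "a b c d", "b c d e",
]


def load_transactions(path, sep=" "):
    """Read one transaction per line; the 1-based row number serves as the tid."""
    db = {}
    with open(path, encoding="utf-8") as handle:
        for tid, line in enumerate(handle, start=1):
            # Drop empty tokens so trailing separators do not create empty items.
            db[tid] = [item for item in line.strip().split(sep) if item]
    return db


# Write the sample database to a temporary directory, then load it.
path = os.path.join(tempfile.mkdtemp(), "sampleTransactionalDatabase.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(rows) + "\n")

db = load_transactions(path)
print(len(db))  # 10
print(db[8])    # ['a', 'e', 'f']
```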
The performance of a pattern mining algorithm depends primarily on the statistical nature of the database. Thus, it is important to know the statistical details of a database before mining it. The sample code below prints these details.

```python
import PAMI.extras.dbStats.TransactionalDatabase as stats

obj = stats.TransactionalDatabase('sampleTransactionalDatabase.txt', ' ')
obj.run()
obj.printStats()
```

The input parameters to a frequent pattern mining algorithm are:
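- Input transactional database, provided as one of:
  - String (file name): e.g., 'transactionalDatabase.txt'
  - URL: e.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
  - DataFrame with the header titled 'Transactions'
- minSup, specified as either:
  - a count (between 0 and the length of the database), or
  - a proportion of the database size in [0, 1]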
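
The two ways of specifying minSup can be reconciled by converting a proportion into an absolute transaction count. The sketch below assumes, purely for illustration, that an `int` is treated as a count and a `float` as a proportion of the database size; the function `to_absolute_min_sup` is hypothetical, and PAMI's own handling may differ.

```python
def to_absolute_min_sup(min_sup, db_size):
    """Convert a count or a [0, 1] proportion into an absolute transaction count.

    Assumption (not taken from PAMI): ints are counts, floats are proportions.
    """
    if isinstance(min_sup, bool):
        raise TypeError("minSup must be an int or a float")
    if isinstance(min_sup, int):
        if not 0 <= min_sup <= db_size:
            raise ValueError("a count must lie between 0 and the database size")
        return min_sup
    if isinstance(min_sup, float):
        if not 0.0 <= min_sup <= 1.0:
            raise ValueError("a proportion must lie in [0, 1]")
        return min_sup * db_size
    raise TypeError("minSup must be an int or a float")


print(to_absolute_min_sup(5, 10))    # 5
print(to_absolute_min_sup(0.5, 10))  # 5.0
```

Under this convention, `1` means one transaction while `1.0` means every transaction, so the type of the value matters.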
The patterns discovered by a frequent pattern mining algorithm can be saved into a file or a data frame.
To run an algorithm from the terminal:

```console
foo@bar: cd PAMI/frequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup separator
```

Example:

```console
python3 Apriori.py inputFile.txt outputFile.txt 3 ' '
```
To run an algorithm from a Python program:

```python
import PAMI.frequentPattern.basic.Apriori as alg

iFile = 'sampleTransactionalDatabase.txt'  # input transactional database
minSup = 5                                 # minSup value
separator = ' '                            # separator; the default is a tab
oFile = 'frequentPatterns.txt'             # output file name

obj = alg.Apriori(iFile, minSup, separator)  # initialize the algorithm
obj.mine()                                   # start the mining process
obj.save(oFile)                              # store the patterns in a file
df = obj.getPatternsAsDataFrame()            # get the discovered patterns as a dataframe
obj.printResults()                           # print the statistics of the mining process
```

Output:

```
Frequent patterns were generated successfully using Apriori algorithm
Total number of Frequent Patterns: 13
Total Memory in USS: 81133568
Total Memory in RSS 119091200
Total ExecutionTime in ms: 0.00026297569274902344
```

The saved file can be inspected from a notebook cell. Each line has the format `frequentPattern:support`.

```
!cat frequentPatterns.txt
```

```
a:7
b:7
c:9
d:8
b a:5
c a:6
c b:7
b d:6
c d:8
d a:5
c b a:5
c b d:6
c d a:5
```

Displaying `df` shows the dataframe containing the patterns. In each pattern, items are separated from each other by a tab (`\t`).

| | Patterns | Support |
|---|---|---|
| 0 | a | 7 |
| 1 | b | 7 |
| 2 | c | 9 |
| 3 | d | 8 |
| 4 | b a | 5 |
| 5 | c a | 6 |
| 6 | c b | 7 |
| 7 | b d | 6 |
| 8 | c d | 8 |
| 9 | d a | 5 |
| 10 | c b a | 5 |
| 11 | c b d | 6 |
| 12 | c d a | 5 |
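
The result above can be cross-checked without PAMI. The following is a minimal, unoptimized sketch of level-wise (Apriori-style) mining in pure Python, run on the same sample database with minSup = 5. The function `apriori` and its helper are illustrative names, not PAMI's implementation, and the order of items within a printed pattern may differ from PAMI's output.

```python
from itertools import combinations

transactions = [set(row.split()) for row in [
    "a b c g", "b c d e", "a b c d", "a c d f", "a b c d g",
    "c d e f", "a b c d", "a e f", "a b c d", "b c d e",
]]


def apriori(transactions, min_sup):
    """Return {frozenset(pattern): support} for patterns with support >= min_sup."""

    def count(candidates):
        # Count each candidate's support and keep only the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: s for c, s in counts.items() if s >= min_sup}

    items = {item for t in transactions for item in t}
    level = count({frozenset([item]) for item in items})
    frequent = dict(level)
    k = 1
    while level:
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


patterns = apriori(transactions, min_sup=5)
for pattern, sup in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(" ".join(sorted(pattern)), sup, sep=":")
print(len(patterns))  # 13
```

The sketch finds the same 13 patterns with the same supports as the table above; for example, {a, b, d} is excluded because it appears in only 4 transactions.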