Apriori

The document provides a detailed explanation of the Apriori algorithm using the mlxtend library in Python, including installation instructions and example code for generating frequent itemsets and association rules from transaction datasets. It covers three different datasets: a custom dataset of grocery items, the 'tips' dataset from seaborn, and a small transaction dataset of food items. The results include frequent itemsets and association rules with various metrics such as support, confidence, and lift.

APRIORI ALGORITHM

pip install mlxtend

Collecting mlxtend
Requirement already satisfied: scipy>=1.2.1 in c:\users\cathy\anaconda3\lib\site-packages (from mlxtend) (1.10.1)
Requirement already satisfied: numpy>=1.16.2 in c:\users\cathy\anaconda3\lib\site-packages (from mlxtend) (1.24.3)
Requirement already satisfied: pandas>=0.24.2 in c:\users\cathy\anaconda3\lib\site-packages (from mlxtend) (1.5.3)
Collecting scikit-learn>=1.3.1 (from mlxtend)
Downloading scikit_learn-1.6.1-cp311-cp311-win_amd64.whl (11.1 MB)
Requirement already satisfied: matplotlib>=3.0.0 in c:\users\cathy\anaconda3\lib\site-packages (from mlxtend) (3.7.1)
Requirement already satisfied: joblib>=0.13.2 in c:\users\cathy\anaconda3\lib\site-packages (from mlxtend) (1.2.0)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (1.0.5)
Requirement already satisfied: cycler>=0.10 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (4.25.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (1.4.4)
Requirement already satisfied: packaging>=20.0 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (23.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (10.0.1)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\cathy\anaconda3\lib\site-packages (from matplotlib>=3.0.0->mlxtend) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\cathy\anaconda3\lib\site-packages (from pandas>=0.24.2->mlxtend) (2022.7)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.3.1->mlxtend)
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Requirement already satisfied: six>=1.5 in c:\users\cathy\anaconda3\lib\site-packages (from python-dateutil>=2.7->matplotlib>=3.0.0->mlxtend) (1.16.0)
Installing collected packages: threadpoolctl, scikit-learn, mlxtend
Attempting uninstall: threadpoolctl
Found existing installation: threadpoolctl 2.2.0
Uninstalling threadpoolctl-2.2.0:
Successfully uninstalled threadpoolctl-2.2.0
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.2.2
Uninstalling scikit-learn-1.2.2:
Successfully uninstalled scikit-learn-1.2.2
Successfully installed mlxtend-0.23.4 scikit-learn-1.6.1 threadpoolctl-3.6.0
Note: you may need to restart the kernel to use updated packages.
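
A quick import check (a minimal sketch; the expected versions are simply the ones pip reported above) confirms that the upgraded packages are picked up after restarting the kernel:

# Sanity check after restarting the kernel: print the installed versions.
import mlxtend
import sklearn

print(mlxtend.__version__)   # expected 0.23.4, per the pip output above
print(sklearn.__version__)   # expected 1.6.1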

QUESTION 1:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Transaction dataset: each inner list is one customer's basket
dataset = [
    ['Milk', 'Eggs', 'Butter'],
    ['Milk', 'Eggs'],
    ['Eggs', 'Butter'],
    ['Milk', 'Butter'],
    ['Eggs', 'Butter']
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
oht = te.fit_transform(dataset)
df_oht = pd.DataFrame(oht, columns=te.columns_)

# Apply the Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(df_oht, min_support=0.4, use_colnames=True)

# Generate association rules from the frequent itemsets
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
support itemsets
0 0.8 (Butter)
1 0.8 (Eggs)
2 0.6 (Milk)
3 0.6 (Eggs, Butter)
4 0.4 (Milk, Butter)
5 0.4 (Milk, Eggs)

Association Rules:
  antecedents consequents  antecedent support  consequent support  support  confidence      lift
0      (Eggs)    (Butter)                 0.8                 0.8      0.6    0.750000  0.937500
1    (Butter)      (Eggs)                 0.8                 0.8      0.6    0.750000  0.937500
2      (Milk)    (Butter)                 0.6                 0.8      0.4    0.666667  0.833333
3    (Butter)      (Milk)                 0.8                 0.6      0.4    0.500000  0.833333
4      (Milk)      (Eggs)                 0.6                 0.8      0.4    0.666667  0.833333
5      (Eggs)      (Milk)                 0.8                 0.6      0.4    0.500000  0.833333

   representativity  leverage  conviction  zhangs_metric  jaccard  certainty  kulczynski
0               1.0     -0.04         0.8      -0.250000      0.6  -0.250000    0.750000
1               1.0     -0.04         0.8      -0.250000      0.6  -0.250000    0.750000
2               1.0     -0.08         0.6      -0.333333      0.4  -0.666667    0.583333
3               1.0     -0.08         0.8      -0.500000      0.4  -0.250000    0.583333
4               1.0     -0.08         0.6      -0.333333      0.4  -0.666667    0.583333
5               1.0     -0.08         0.8      -0.500000      0.4  -0.250000    0.583333
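
The headline metrics can be verified by hand. The sketch below (illustrative only; the helper variable names are not part of the exercise) recomputes support, confidence, and lift for the Eggs -> Butter rule directly from the five transactions and matches row 0 of the rules table:

# Hand-check of support, confidence, and lift for the rule Eggs -> Butter.
dataset = [
    ['Milk', 'Eggs', 'Butter'],
    ['Milk', 'Eggs'],
    ['Eggs', 'Butter'],
    ['Milk', 'Butter'],
    ['Eggs', 'Butter']
]
n = len(dataset)

support_eggs = sum('Eggs' in t for t in dataset) / n                          # 4/5 = 0.8
support_butter = sum('Butter' in t for t in dataset) / n                      # 4/5 = 0.8
support_both = sum(('Eggs' in t) and ('Butter' in t) for t in dataset) / n    # 3/5 = 0.6

confidence = support_both / support_eggs    # 0.6 / 0.8 = 0.75
lift = confidence / support_butter          # 0.75 / 0.8 = 0.9375

print(support_both, confidence, lift)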

QUESTION 2:

import pandas as pd
import seaborn as sns
from mlxtend.frequent_patterns import apriori, association_rules
import warnings

warnings.filterwarnings('ignore', category=RuntimeWarning)

# Load the tips dataset from seaborn
tips = sns.load_dataset('tips')

# One-hot encode the categorical columns; 'size' is numeric, so the x > 0 step
# below turns it into a column that is True for every row (which is why it
# appears with support 1.0 in the output).
transaction = pd.get_dummies(tips[['day', 'time', 'sex', 'smoker', 'size']])
transaction = transaction.apply(lambda x: x > 0)

frequent_itemsets = apriori(transaction, min_support=0.1, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
rules = rules.dropna(subset=['lift', 'confidence'])

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
support itemsets
0 1.000000 (size)
1 0.254098 (day_Thur)
2 0.356557 (day_Sat)
3 0.311475 (day_Sun)
4 0.278689 (time_Lunch)
.. ... ...
100 0.131148 (time_Dinner, day_Sat, smoker_No, sex_Male)
101 0.176230 (time_Dinner, sex_Male, smoker_No, day_Sun)
102 0.110656 (day_Sat, smoker_Yes, size, sex_Male, time_Din...
103 0.131148 (day_Sat, size, smoker_No, sex_Male, time_Dinner)
104 0.176230 (size, day_Sun, smoker_No, sex_Male, time_Dinner)

Association Rules:
       antecedents                                   consequents  antecedent support  consequent support   support  confidence      lift
0           (size)                                    (day_Thur)            1.000000            0.254098  0.254098    0.254098  1.000000
1       (day_Thur)                                        (size)            0.254098            1.000000  0.254098    1.000000  1.000000
2        (day_Sat)                                        (size)            0.356557            1.000000  0.356557    1.000000  1.000000
3           (size)                                     (day_Sat)            1.000000            0.356557  0.356557    0.356557  1.000000
4           (size)                                     (day_Sun)            1.000000            0.311475  0.311475    0.311475  1.000000
..             ...                                           ...                 ...                 ...       ...         ...       ...
543         (size)  (time_Dinner, sex_Male, smoker_No, day_Sun)            1.000000            0.176230  0.176230    0.176230  1.000000
544      (day_Sun)     (time_Dinner, sex_Male, smoker_No, size)            0.311475            0.315574  0.176230    0.565789  1.792891
545    (smoker_No)       (sex_Male, time_Dinner, size, day_Sun)            0.618852            0.237705  0.176230    0.284768  1.197990
546     (sex_Male)      (time_Dinner, smoker_No, size, day_Sun)            0.643443            0.233607  0.176230    0.273885  1.172421
547  (time_Dinner)         (sex_Male, smoker_No, size, day_Sun)            0.721311            0.176230  0.176230    0.244318  1.386364

     representativity  leverage  conviction  zhangs_metric   jaccard  certainty  kulczynski
0                 1.0  0.000000    1.000000       0.000000  0.254098   0.000000    0.627049
1                 1.0  0.000000         inf       0.000000  0.254098   0.000000    0.627049
2                 1.0  0.000000         inf       0.000000  0.356557   0.000000    0.678279
3                 1.0  0.000000    1.000000       0.000000  0.356557   0.000000    0.678279
4                 1.0  0.000000    1.000000       0.000000  0.311475   0.000000    0.655738
..                ...       ...         ...            ...       ...        ...         ...
543               1.0  0.000000    1.000000       0.000000  0.176230   0.000000    0.588115
544               1.0  0.077936    1.576254       0.642303  0.390909   0.365585    0.562116
545               1.0  0.029125    1.065801       0.433608  0.259036   0.061739    0.513074
546               1.0  0.025917    1.055472       0.412457  0.251462   0.052556    0.514136
547               1.0  0.049113    1.090102       1.000000  0.244318   0.082655    0.622159
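
At these thresholds the run produces 548 rules (indices 0 to 547), so it usually helps to narrow the output. The following is a minimal sketch continuing from the rules DataFrame above; the cutoffs (lift > 1.2, confidence > 0.5) are illustrative, not part of the exercise:

# Keep only the stronger rules and rank them by lift (cutoffs are illustrative).
strong_rules = rules[(rules['lift'] > 1.2) & (rules['confidence'] > 0.5)]
strong_rules = strong_rules.sort_values('lift', ascending=False)

print(strong_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(10))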

QUESTION 3:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import warnings

# Suppress the warning
warnings.filterwarnings('ignore', category=DeprecationWarning)

# Example of a small transaction dataset: 1 = item purchased, 0 = not purchased
data = {
    'milk':   [1, 0, 1, 1, 0],
    'bread':  [1, 1, 1, 0, 1],
    'butter': [0, 1, 1, 1, 0],
    'cheese': [1, 0, 1, 1, 1]
}
df = pd.DataFrame(data)

# Convert the 0/1 columns to booleans, as expected by apriori
transaction = df.apply(lambda x: x > 0)

frequent_itemsets = apriori(transaction, min_support=0.1, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
support itemsets
0 0.6 (milk)
1 0.8 (bread)
2 0.6 (butter)
3 0.8 (cheese)
4 0.4 (milk, bread)
5 0.4 (milk, butter)
6 0.6 (milk, cheese)
7 0.4 (bread, butter)
8 0.6 (cheese, bread)
9 0.4 (cheese, butter)
10 0.2 (milk, bread, butter)
11 0.4 (milk, bread, cheese)
12 0.4 (cheese, milk, butter)
13 0.2 (cheese, bread, butter)
14 0.2 (cheese, milk, bread, butter)

Association Rules:
                antecedents              consequents  antecedent support  consequent support  support  confidence      lift
0                    (milk)                 (butter)                 0.6                 0.6      0.4    0.666667  1.111111
1                  (butter)                   (milk)                 0.6                 0.6      0.4    0.666667  1.111111
2                    (milk)                 (cheese)                 0.6                 0.8      0.6    1.000000  1.250000
3                  (cheese)                   (milk)                 0.8                 0.6      0.6    0.750000  1.250000
4             (milk, bread)                 (cheese)                 0.4                 0.8      0.4    1.000000  1.250000
5           (cheese, bread)                   (milk)                 0.6                 0.6      0.4    0.666667  1.111111
6                    (milk)          (cheese, bread)                 0.6                 0.6      0.4    0.666667  1.111111
7                  (cheese)            (milk, bread)                 0.8                 0.4      0.4    0.500000  1.250000
8            (cheese, milk)                 (butter)                 0.6                 0.6      0.4    0.666667  1.111111
9          (cheese, butter)                   (milk)                 0.4                 0.6      0.4    1.000000  1.666667
10           (milk, butter)                 (cheese)                 0.4                 0.8      0.4    1.000000  1.250000
11                 (cheese)           (milk, butter)                 0.8                 0.4      0.4    0.500000  1.250000
12                   (milk)         (cheese, butter)                 0.6                 0.4      0.4    0.666667  1.666667
13                 (butter)           (cheese, milk)                 0.6                 0.6      0.4    0.666667  1.111111
14  (cheese, bread, butter)                   (milk)                 0.2                 0.6      0.2    1.000000  1.666667
15    (milk, bread, butter)                 (cheese)                 0.2                 0.8      0.2    1.000000  1.250000
16         (cheese, butter)            (milk, bread)                 0.4                 0.4      0.2    0.500000  1.250000
17            (milk, bread)         (cheese, butter)                 0.4                 0.4      0.2    0.500000  1.250000
18                 (cheese)    (milk, bread, butter)                 0.8                 0.2      0.2    0.250000  1.250000
19                   (milk)  (cheese, bread, butter)                 0.6                 0.2      0.2    0.333333  1.666667

    representativity  leverage  conviction  zhangs_metric   jaccard  certainty  kulczynski
0                1.0      0.04    1.200000       0.250000  0.500000   0.166667    0.666667
1                1.0      0.04    1.200000       0.250000  0.500000   0.166667    0.666667
2                1.0      0.12         inf       0.500000  0.750000   1.000000    0.875000
3                1.0      0.12    1.600000       1.000000  0.750000   0.375000    0.875000
4                1.0      0.08         inf       0.333333  0.500000   1.000000    0.750000
5                1.0      0.04    1.200000       0.250000  0.500000   0.166667    0.666667
6                1.0      0.04    1.200000       0.250000  0.500000   0.166667    0.666667
7                1.0      0.08    1.200000       1.000000  0.500000   0.166667    0.750000
8                1.0      0.04    1.200000       0.250000  0.500000   0.166667    0.666667
9                1.0      0.16         inf       0.666667  0.666667   1.000000    0.833333
10               1.0      0.08         inf       0.333333  0.500000   1.000000    0.750000
11               1.0      0.08    1.200000       1.000000  0.500000   0.166667    0.750000
12               1.0      0.16    1.800000       1.000000  0.666667   0.444444    0.833333
13               1.0      0.04    1.200000       0.250000  0.500000   0.166667    0.666667
14               1.0      0.08         inf       0.500000  0.333333   1.000000    0.666667
15               1.0      0.04         inf       0.250000  0.250000   1.000000    0.625000
16               1.0      0.04    1.200000       0.333333  0.333333   0.166667    0.500000
17               1.0      0.04    1.200000       0.333333  0.333333   0.166667    0.500000
18               1.0      0.04    1.066667       1.000000  0.250000   0.062500    0.625000
19               1.0      0.08    1.200000       1.000000  0.333333   0.166667    0.666667
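
Because antecedents and consequents are returned as frozensets, a small post-processing step makes the rules easier to read. The following is a minimal sketch continuing from the rules DataFrame above; the 'rule' column name is just illustrative:

# Convert the frozenset columns to plain strings such as "milk, bread -> cheese".
readable = rules.copy()
readable['antecedents'] = readable['antecedents'].apply(lambda s: ', '.join(sorted(s)))
readable['consequents'] = readable['consequents'].apply(lambda s: ', '.join(sorted(s)))
readable['rule'] = readable['antecedents'] + ' -> ' + readable['consequents']

print(readable[['rule', 'support', 'confidence', 'lift']])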
