Commit 737a2d7 (initial commit, 0 parents)

491 files changed: +118275 -0 lines changed

.gitignore

Lines changed: 232 additions & 0 deletions
```gitignore
private
images
# Created by https://www.toptal.com/developers/gitignore/api/python,pycharm+all
# Edit at https://www.toptal.com/developers/gitignore?templates=python,pycharm+all

### PyCharm+all ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser

### PyCharm+all Patch ###
# Ignores the whole .idea folder and all .iml files
# See https://github.com/joeblau/gitignore.io/issues/186 and https://github.com/joeblau/gitignore.io/issues/360

.idea/

# Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-249601023

*.iml
modules.xml
.idea/misc.xml
*.ipr

# Sonarlint plugin
.idea/sonarlint

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
pytestdebug.log

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
doc/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# End of https://www.toptal.com/developers/gitignore/api/python,pycharm+all
```
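With the Python and PyCharm patterns above in place, a quick way to check which pattern is responsible for ignoring a given path is git's built-in `check-ignore`, run from the repository root (the paths below are hypothetical examples, not files from this commit):

```shell
# -v prints the .gitignore source line that matched each path
git check-ignore -v __pycache__/some_module.cpython-38.pyc
git check-ignore -v .idea/workspace.xml
git check-ignore -v build/lib/foo.py
```

Each command exits with status 0 when the path is ignored, which also makes `check-ignore` usable in scripts.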

README.md

Lines changed: 130 additions & 0 deletions
# Multi-Directional Rule Set Learning

This project contains a Python module for learning multi-directional rule sets, accompanying the paper:
> Schouterden J., Davis J., Blockeel H.: *Multi-Directional Rule Set Learning.* To be presented at: Discovery Science 2020

__________________________________
[Abstract](https://github.com/joschout/Multi-Directional_Rule_Set_Learning#abstract) -
[Basic use](https://github.com/joschout/Multi-Directional_Rule_Set_Learning#basic-use) -
[Experiments](https://github.com/joschout/Multi-Directional_Rule_Set_Learning#experiments) -
[Dependencies](https://github.com/joschout/Multi-Directional_Rule_Set_Learning#dependencies) -
[References](https://github.com/joschout/Multi-Directional_Rule_Set_Learning#references)
_________________

## Abstract
The following is the abstract of our paper:

> A rule set is a type of classifier that, given attributes X, predicts a target Y. Its main advantage over other types of classifiers is its simplicity and interpretability. A practical challenge is that the end user of a rule set does not always know in advance which target will need to be predicted. One way to deal with this is to learn a multi-directional rule set, which can predict any attribute from all others. An individual rule in such a multi-directional rule set can have multiple targets in its head, and can thus be used to predict any one of them.
>
> Compared to the naive approach of learning one rule set for each possible target and merging them, a multi-directional rule set containing multi-target rules is potentially smaller and more interpretable. Training a multi-directional rule set involves two key steps: generating candidate rules and selecting rules. However, the best way to tackle these steps remains an open question.
>
> In this paper, we investigate the effect of using Random Forests as candidate rule generators and propose two new approaches for selecting rules with multi-target heads: MIDS, a generalization of the recent single-target IDS approach, and RR, a new simple algorithm focusing only on predictive performance.
>
> Our experiments indicate that (1) using multi-target rules leads to smaller rule sets with similar predictive performance, (2) using Forest-derived rules instead of association rules leads to rule sets of similar quality, and (3) RR outperforms MIDS, underlining the usefulness of simple selection objectives.
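To make the abstract's central idea concrete, a multi-target rule pairs a body of attribute tests with a head covering several target attributes, so one rule can serve predictions in several directions. The sketch below is a toy illustration only (the class, attribute names, and Titanic-style values are our own, not this project's API):

```python
from dataclasses import dataclass

@dataclass
class MultiTargetRule:
    body: dict  # attribute tests, e.g. {"Sex": "female", "Pclass": 1}
    head: dict  # predicted values for several target attributes

    def covers(self, instance: dict) -> bool:
        # A rule applies when every test in its body holds for the instance.
        return all(instance.get(attr) == val for attr, val in self.body.items())

    def predict(self, instance: dict, target: str):
        # The same rule can predict any attribute appearing in its head.
        if target in self.head and self.covers(instance):
            return self.head[target]
        return None

rule = MultiTargetRule(body={"Sex": "female", "Pclass": 1},
                       head={"Survived": 1, "Embarked": "S"})
passenger = {"Sex": "female", "Pclass": 1, "Age": 30}
print(rule.predict(passenger, "Survived"))  # → 1
print(rule.predict(passenger, "Embarked"))  # → S
```

A single-target rule is the special case where `head` contains exactly one attribute; the paper's point is that allowing larger heads lets one rule replace several.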
## Basic use

The basic use of IDS, RR and MIDS is illustrated by the following Jupyter notebooks:

* [Single-target Interpretable Decision Sets (IDS)](./notebooks/basic_use/ids_on_titanic.ipynb)
* [Round Robin (RR)](./notebooks/basic_use/rr_on_titanic.ipynb)
* [Multi-directional IDS (MIDS)](./notebooks/basic_use/mids_on_titanic.ipynb)

To use this project as a Python module, you can install it in your Python environment after cloning this repository as follows:
```shell
git clone https://github.com/joschout/Multi-Directional_Rule_Set_Learning.git
cd Multi-Directional_Rule_Set_Learning/
python setup.py install develop --user
```
## Experiments

In our paper, we include two sets of experiments. Here, we describe how to reproduce them. We use data generated using the scripts from the [arcBench benchmarking suite](https://github.com/kliegr/arcBench), by Tomas Kliegr. You can find the data we used [in this repository, in the `data` directory](./data).

### 1. Comparing models generated from association rules and Random Forest derived rules

First, we compared using single-target association rules versus single-target rules derived from Random Forest trees as the candidate rule set, out of which the rules of the associative classifier are selected. As the rule selection method, we used single-target IDS. (Note: in our experiments, we use our MIDS implementation, which corresponds to IDS when given single-target rules.) The code for these experiments can be found in [`experiments/e1_st_association_vs_tree_rules`](./experiments/e1_st_association_vs_tree_rules).

To reproduce our experiments, do the following for each candidate rule set type:

* When considering single-target **association rules** as the candidate rule set:
  1. [Mine single-target association rules.](./experiments/e1_st_association_vs_tree_rules/rule_mining/single_target_car_mining_ifo_confidence_level.py)
  2. [Fit an AR-IDS model.](./experiments/e1_st_association_vs_tree_rules/model_induction/single_target_car_mids_model_induction.py) That is, use IDS to select a subset of the candidate single-target association rules.
  3. [Evaluate the AR-IDS model on the test data](./experiments/e1_st_association_vs_tree_rules/model_evaluation/single_target_car_mids_model_evaluation.py), measuring both predictive performance and interpretability.
* When considering single-target **rules derived from random forests (i.e. decision trees)** as the candidate rule set:
  1. [Generate rules from single-target Random Forests.](./experiments/e1_st_association_vs_tree_rules/rule_mining/single_target_tree_rule_generation_ifo_confidence_bound.py)
  2. [Fit a T-IDS model.](./experiments/e1_st_association_vs_tree_rules/model_induction/single_target_tree_mids_model_induction.py) That is, use IDS to select a subset of the candidate rules derived from single-target Random Forest trees.
  3. [Evaluate the T-IDS model on the test data](./experiments/e1_st_association_vs_tree_rules/model_evaluation/single_target_tree_mids_model_evaluation.py), measuring both predictive performance and interpretability.
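The core of generating tree-derived candidate rules is converting each root-to-leaf path of a decision tree into one rule: the body is the conjunction of the tests along the path, the head is the leaf's prediction. The sketch below illustrates that conversion on a hand-written toy tree; the tree encoding and function are our own for illustration, not this project's implementation:

```python
def tree_to_rules(node, path=()):
    """Convert a toy decision tree into rules, one per root-to-leaf path.

    A node is either ("leaf", prediction) or (test, left_subtree, right_subtree),
    where the left subtree is taken when the test holds.
    """
    if node[0] == "leaf":
        return [(list(path), node[1])]
    test, left, right = node
    rules = []
    rules += tree_to_rules(left, path + ((test, True),))
    rules += tree_to_rules(right, path + ((test, False),))
    return rules

# A tiny tree over Titanic-style attributes:
tree = ("Sex == female",
        ("leaf", {"Survived": 1}),
        ("Pclass <= 2",
         ("leaf", {"Survived": 1}),
         ("leaf", {"Survived": 0})))

for body, head in tree_to_rules(tree):
    print(body, "=>", head)
```

A forest of such trees yields one rule per leaf per tree, which is the candidate pool the selection step (IDS here, RR/MIDS in experiment 2) then prunes.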
### 2. Comparing multi-directional models generated from multi-target and single-target tree rules

In our second experiment, we compare using multi-target rules versus single-target rules as the candidate rule set. From the multi-target rules, we fit two multi-directional models, using Round Robin and MIDS as the rule selectors. From the single-target rules, we fit an ensemble of single-target IDS models.

To reproduce our experiments, do the following steps for each rule type:

* When using multi-target rules:
  1. [Generate multi-target rules from multi-target Random Forest trees.](./experiments/e2_multi_directional_model_comparison/rule_mining/mine_multi_target_rules_from_random_forests2.py)
  2. Choose a rule selector able to select multi-directional rules:
     * When using **Round Robin** as the rule selector:
       1. [Fit a multi-directional RR model.](./experiments/e2_multi_directional_model_comparison/model_induction/round_robin_tree_based_model_induction.py)
       2. [Evaluate the RR model on the test data.](./experiments/e2_multi_directional_model_comparison/model_evaluation/round_robin_tree_based_model_evaluation.py)
     * When using **MIDS** as the rule selector:
       1. [Fit a multi-directional MIDS model.](./experiments/e2_multi_directional_model_comparison/model_induction/mids_tree_based_model_induction.py)
       2. [Evaluate the MIDS model on the test data.](./experiments/e2_multi_directional_model_comparison/model_evaluation/mids_tree_based_model_evaluation.py)
* When using single-target rules:
  1. [Generate single-target rules from single-target Random Forest trees, for each attribute in the dataset.](./experiments/e2_multi_directional_model_comparison/rule_mining/single_target_tree_based_rule_generation.py) This results in one candidate rule set per attribute.
  2. [Fit a single-target IDS model for each attribute in the dataset.](./experiments/e2_multi_directional_model_comparison/model_induction/single_target_tree_mids_model_induction.py) This results in one single-target IDS model per attribute.
  3. [Merge the single-target IDS models into one ensemble (eIDS) model, and evaluate it on the test data.](./experiments/e2_multi_directional_model_comparison/eids_model_merging/single_target_tree_mids_model_merging.py)
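The merging step in the single-target baseline can be pictured as follows: keep one model per attribute and answer a prediction request for target Y with the model trained for Y. This is a toy sketch of that dispatch idea only (function and lambdas are ours, not the project's eIDS code):

```python
def merge_models(models_per_target):
    """models_per_target: {target attribute: callable(instance) -> value}.

    Returns a multi-directional predictor that dispatches each request
    to the single-target model trained for the requested target.
    """
    def predict(instance, target):
        return models_per_target[target](instance)
    return predict

eids_predict = merge_models({
    "Survived": lambda inst: 1 if inst.get("Sex") == "female" else 0,
    "Sex": lambda inst: "female" if inst.get("Survived") == 1 else "male",
})
print(eids_predict({"Sex": "female"}, "Survived"))  # → 1
print(eids_predict({"Survived": 0}, "Sex"))         # → male
```

The paper's comparison is between this ensemble (one rule set per target) and a single multi-directional rule set whose multi-target rules serve all targets at once.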
## Dependencies

Depending on what you will use, you need to install some of the following packages. Note: we assume you have a recent Python 3 distribution installed (we used Python 3.8). Our installation instructions assume the use of a Unix shell.

* [*submodmax*](https://github.com/joschout/SubmodularMaximization), for unconstrained submodular maximization of the (M)IDS objective functions. This package is required for our versions of single-target IDS and multi-directional IDS, as it contains the algorithms used for finding a locally optimal rule set. You can install it as follows:
  ```shell
  git clone https://github.com/joschout/SubmodularMaximization.git
  cd SubmodularMaximization/
  python setup.py install develop --user
  ```
* [PyFIM](https://borgelt.net/pyfim.html), by Christian Borgelt. This package is used for frequent itemset mining and (single-target) association rule mining, and is a dependency of pyARC. We downloaded the precompiled version and added it to our conda environment. This package is necessary wherever `import fim` is used.
* [pyARC](https://github.com/jirifilip/pyARC), by Jiří Filip. This package provides a Python implementation of the *Classification Based on Association Rules (CBA)* algorithm, one of the oldest *associative classifiers*. It is a requirement for pyIDS. We make use of some of its data structures and base some code snippets on theirs. *Note: there seems to be an error in the pyARC pip package that makes the `QuantitativeDataFrame` class unavailable, so we recommend installing it directly from the repository:*
  ```shell
  git clone https://github.com/jirifilip/pyARC.git
  cd pyARC/
  python setup.py install develop --user
  ```
* [pyIDS](https://github.com/jirifilip/pyIDS), by Jiří Filip and Tomas Kliegr. This package provides a great reimplementation of *Interpretable Decision Sets (IDS)*. We include a reworked IDS implementation in this repository, based on and using classes from pyIDS. To install pyIDS, run:
  ```shell
  git clone https://github.com/jirifilip/pyIDS.git
  cd pyIDS
  ```
  Next, copy our `install_utls/pyIDS/setup.py` to the `pyIDS` directory and run:
  ```shell
  python setup.py install develop --user
  ```
* *MLxtend* is a Python library with an implementation of *FP-growth* that allows extracting *single-target class association rules*. Out of efficiency considerations, most association rule mining implementations only allow mining single-target rules for a given target attribute. We forked MLxtend and modified it to also generate *multi-target* association rules. [Our fork can be found here](https://github.com/joschout/mlxtend/), while [the regular source code can be found here](https://github.com/rasbt/mlxtend). To install our fork, run:
  ```shell
  git clone https://github.com/joschout/mlxtend.git
  cd mlxtend/
  python setup.py install develop --user
  ```
* *gzip* and *jsonpickle* ([code](https://github.com/jsonpickle/jsonpickle), [docs](https://jsonpickle.readthedocs.io/en/latest/)) are used to save learned rule sets to disk.
* [tabulate](https://github.com/astanin/python-tabulate) is used to pretty-print tabular data, such as the different subfunction values of the (M)IDS objective function.
* [Apyori](https://github.com/ymoch/apyori), by Yu Mochizuki: an Apriori implementation written entirely in Python.
* STAC: Statistical Tests for Algorithms Comparisons ([website](https://tec.citius.usc.es/stac/), [code](https://gitlab.citius.usc.es/ismael.rodriguez/stac/), [docs](https://tec.citius.usc.es/stac/doc/index.html), [paper PDF](http://persoal.citius.usc.es/manuel.mucientes/pubs/Rodriguez-Fdez15_fuzz-ieee-stac.pdf)).
* graphviz, for visualizing decision trees during decision-tree-to-rule conversion.
* [bidict](https://github.com/jab/bidict), used to encode the training data during association rule mining, so that large strings don't have to be used as data. Install with `pip install bidict`.
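The encoding idea behind the bidict dependency can be sketched with two plain dicts: map each item string (an attribute-value pair) to a small integer before mining, then decode mined rules back to strings afterwards. This is an illustrative stand-in, not the repository's actual encoding code:

```python
def build_encoding(transactions):
    """Assign a small integer code to each distinct item string."""
    to_code, to_item = {}, {}
    for transaction in transactions:
        for item in transaction:
            if item not in to_code:
                code = len(to_code)
                to_code[item] = code   # string  -> int, used before mining
                to_item[code] = item   # int -> string, used to decode rules
    return to_code, to_item

transactions = [["Sex=female", "Survived=1"], ["Sex=male", "Survived=0"]]
to_code, to_item = build_encoding(transactions)
encoded = [[to_code[item] for item in t] for t in transactions]
print(encoded)                  # → [[0, 1], [2, 3]]
print(to_item[encoded[0][0]])   # → Sex=female
```

bidict packages exactly this forward/inverse pair in one object, so the two directions cannot drift out of sync.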
## References

* Liu, B., Hsu, W., and Ma, Y. (1998). Integrating Classification and Association Rule Mining. In Proceedings of KDD-98, New York, 27-31 August. AAAI, pp. 80-86.
* Kliegr, T. (2017). Quantitative CBA: Small and Comprehensible Association Rule Classification Models. arXiv preprint arXiv:1711.10166.
* Filip, J., and Kliegr, T. (2019). PyIDS - Python Implementation of Interpretable Decision Sets Algorithm by Lakkaraju et al, 2016. RuleML+RR 2019 Rule Challenge. http://ceur-ws.org/Vol-2438/paper8.pdf
* Rodríguez-Fdez, I., Canosa, A., Mucientes, M., and Bugarín, A. (2015). STAC: A web platform for the comparison of algorithms using statistical tests. In Proceedings of the 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE).
