An automated tool for binary and multi-class classification and hyperparameter optimization on stationary (batch) and streaming datasets. It trains a range of models for traditional batch datasets (KNN, decision trees, random forests, SVM, bagging, boosting, etc.) and for streaming datasets (Hoeffding Tree classifier, SAM-KNN, Adaptive Hoeffding Trees, Adaptive Random Forests, OzaBag, OzaBoost, etc.), then generates metric dumps and performance evaluation graphs (ROC curves) comparing the best models. Hypothesis testing with the Friedman test and the Nemenyi post-hoc test is also supported for statistical comparison of algorithms.
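The Friedman and Nemenyi comparison mentioned above can be sketched in plain Python. This is a minimal illustration, not the tool's own implementation: the function names and the score table are hypothetical, the Friedman statistic is computed without a tie correction, and the Nemenyi critical values are the commonly tabulated ones for alpha = 0.05.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05, keyed by the number of
# compared algorithms k (commonly tabulated values; assumption of this sketch).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def average_ranks(scores):
    """Rank algorithms on each dataset (1 = best, ties share the mean rank)
    and return each algorithm's average rank across datasets."""
    n_datasets, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # higher score is better
        i = 0
        while i < k:
            # Extend the group [i, j] while the scores are tied.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for pos in range(i, j + 1):
                totals[order[pos]] += mean_rank
            i = j + 1
    return [t / n_datasets for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic (no tie correction)."""
    k = len(avg_ranks)
    return (12 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )


def nemenyi_critical_difference(k, n_datasets, q_alpha=Q_ALPHA_005):
    """Smallest average-rank gap treated as significant by the Nemenyi test."""
    return q_alpha[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Hypothetical AUC scores for illustration only:
# rows are datasets, columns are three algorithms.
scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.84, 0.79],
    [0.92, 0.86, 0.81],
    [0.91, 0.83, 0.82],
]
ranks = average_ranks(scores)
chi2 = friedman_statistic(ranks, len(scores))
cd = nemenyi_critical_difference(len(ranks), len(scores))
```

Two algorithms are then considered significantly different when their average ranks differ by more than `cd`. With only a handful of datasets the chi-square approximation is rough, so real comparisons would normally use more datasets than this toy table.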

    ├───configs
    │   │───model_hparams.py
    │
    ├───data
    │   │───drug_consumption.data
    │
    ├───dataset
    │   │───dataset_base.py
    │   │───feature_select.py
    │
    ├───driver
    │   │───driver.py
    │
    ├───models
    │   │───models.py
    │
    ├───output
    │   ├───run_20221001-124242
    │   ...
    │   ...
    │   ...
    └───utils
        │───plot_results.py
        │───scoring.py

- model_hparams.py: Hyperparameter combinations for each model can be specified here.
- drug_consumption.data: Stores the dataset (all datasets are stored under the data folder).
- driver.py: Starting point for execution of the program (default).
- driver_online.py: Starting point for execution of the program for online models.
- models.py: Model classes and definitions.
- plot_results.py: Utility to plot ROC curves.
- scoring.py: Utility to compute different metrics such as G-mean, F-score, AUC, etc.
- dataset.py: Dataset class, used for preparing train/test splits and pre-processing data.
- feature_select.py: Feature selection algorithms used for feature reduction based on statistical tests.
- output: Directory where run dumps are generated, with evaluation of models and visualization of performance through ROC plots and confusion matrices.
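As a rough illustration of the kind of metrics scoring.py is described as computing, the sketch below derives G-mean and F1 from binary confusion-matrix counts. The function name `binary_scores` and the counts are hypothetical; the repository's own scoring code may define these metrics differently, for example for the multi-class case.

```python
import math


def binary_scores(tp, fp, fn, tn):
    """Return G-mean and F1 computed from binary confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # G-mean: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"g_mean": g_mean, "f1": f1}


# Hypothetical counts for illustration only.
result = binary_scores(tp=40, fp=10, fn=10, tn=40)
```

G-mean is often preferred over plain accuracy on imbalanced data because it drops toward zero when either class is poorly recognized.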
- Batch-based models

      # Navigate to the root directory
      >> python ./driver/driver.py

- Online models

      # Navigate to the root directory
      >> python ./driver/driver_online.py