A set of machine learning algorithms implemented in Python 3.5. See also my related Python Data Science repository, which contains various scripts for data analysis and visualisation.
Each program takes one of three forms, marked with asterisks in the lists below:
- Algorithm is implemented from scratch in Python *
- Algorithm is implemented using Scikit Learn * *
- Algorithm is implemented both ways * * *
The included programs are:
Regression:
- Linear Regression * * *
- Neural Network Regression * *
- Decision Tree Regression * * *
Classification:
- Logistic Regression for Classification * * *
- Logistic Regression for Classification with PCA * * *
- Naive Bayes Classification * * *
- Support Vector Machine Classification * *
- Neural Network Classification * *
- Decision Tree Classification * * *
- Random Forest Classification * * *
Clustering:
- K-Means Clustering * * *
- K-Nearest-Neighbor * * *
- Mean-Shift Clustering * * *
- K-Medoids Clustering *
- DBSCAN Clustering * * *
In addition to the main algorithm files, the "ml_helpers.py" file provides the following helper functions:
- Train and Test data splitting
- Random shuffling of data
- Compute Euclidean Distance
- Compute Mean and Variance of features
- Normalize data
- Divide dataset based on feature threshold
- Retrieve a random subset of the data with a random subset of the features
- Compute entropy
- Compute Mean Squared Error
- Sigmoid function
- Derivative of the sigmoid function
- Compute the covariance matrix
- Perform PCA dimensionality reduction
- Gaussian function 1D
- Gaussian function 2D
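
To give a feel for what helpers like these do, here is a minimal sketch of a few of them. It is not the code from "ml_helpers.py": the function names, signatures, and defaults below are assumptions chosen for illustration, and it uses only the Python standard library so it runs without any of the packages, whereas the repository's versions may well be NumPy-based.

```python
import math
import random


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid(x):
    """Logistic function, mapping a real number into (0, 1).

    Not hardened against overflow for very large negative x.
    """
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_gradient(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def train_test_split(X, y, test_size=0.2, seed=None):
    """Shuffle X and y together, then split off a test set.

    test_size is the fraction of samples reserved for testing
    (a hypothetical parameter name, not taken from the repository).
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

For example, `euclidean_distance([0, 0], [3, 4])` returns `5.0`, and `entropy([0, 0, 1, 1])` returns `1.0`, the maximum for two equally likely classes. Shuffling by index rather than shuffling `X` and `y` separately keeps each sample paired with its label.
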
Requirements:
- Python 3.5
- NumPy
- SciPy
- Scikit Learn
- Matplotlib
The above packages can be installed by running the commands listed in the "install.txt" file.