The goal of this project is to practise supervised learning using the provided data. We need to build a model for prediction/classification. Each group will research and implement the supervised machine learning methods listed below.
-
Please perform EDA and data cleaning.
-
Please conduct EDA and descriptive analytics
-
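The cleaning and descriptive steps above could start from something like the sketch below. The toy `raw` frame, its column names, and the `clean` helper are hypothetical stand-ins for the provided dataset. Median/mode imputation and one-hot encoding are only one reasonable choice, not a required method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the provided dataset.
raw = pd.DataFrame({
    "age": [25, 32, np.nan, 47, 32],
    "city": ["Kyiv", "Lviv", "Kyiv", None, "Lviv"],
    "target": [0, 1, 0, 1, 1],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, impute missing values, one-hot encode categoricals."""
    df = df.drop_duplicates().copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric column: fill gaps with the median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Categorical column: fill gaps with the most frequent value.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return pd.get_dummies(df, drop_first=True)


# Descriptive overview before cleaning: summary statistics and missing counts.
print(raw.describe(include="all"))
print(raw.isna().sum())

cleaned = clean(raw)
print(cleaned.head())
```

On real data, check the missing-value counts first and decide per column whether imputing or dropping rows makes more sense.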
As soon as your dataset is ready, start researching your models. Note that each group member should research at least one model:
- Logistic regression
- NuSVC
- BernoulliNB
- AdaBoostClassifier
- Linear Discriminant Analysis
-
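All five models listed above are available in scikit-learn, so one way to keep them together for later comparison is a plain dictionary. This is a minimal sketch: the `models` name and the chosen settings (for example `max_iter=1000`) are assumptions for illustration, not tuned values.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import NuSVC

# One entry per model to research; settings are placeholders, not tuned values.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NuSVC": NuSVC(),
    "BernoulliNB": BernoulliNB(),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=42),
    "LinearDiscriminantAnalysis": LinearDiscriminantAnalysis(),
}

for name, model in models.items():
    # get_params() lists the hyperparameters each group member should describe.
    print(name, sorted(model.get_params()))
```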
Feature selection (if needed)
-
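If feature selection turns out to be needed, a univariate filter is one simple starting point. The sketch below uses scikit-learn's `SelectKBest` with the ANOVA F-test on synthetic data. The data, `k=3`, and the variable names are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the cleaned dataset: 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Keep the 3 features with the highest ANOVA F-score against the target.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print("Kept feature indices:", kept)
print("Shape after selection:", X_selected.shape)
```

Fit the selector on the training split only, so the test data does not leak into the choice of features.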
Implement your models on your data
-
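As a rough illustration of the implementation step, the sketch below wraps scaling and a model in one scikit-learn pipeline and scores it on a held-out split. The synthetic data and the `fit_and_score` helper are hypothetical. The same helper could be called once per model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned and encoded dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def fit_and_score(model, X_train, y_train, X_test, y_test):
    """Scale features, fit the model, and return the pipeline with its test accuracy."""
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    return pipeline, pipeline.score(X_test, y_test)


pipeline, test_accuracy = fit_and_score(
    LogisticRegression(max_iter=1000), X_train, y_train, X_test, y_test
)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means it is fitted on the training split only.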
Do not forget about hyperparameter tuning
-
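One common way to tune hyperparameters is an exhaustive grid search with cross-validation. The sketch below tunes only the regularization strength `C` of logistic regression. The synthetic data, the grid values, and the `roc_auc` scoring choice are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned and encoded dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated ROC AUC: {search.best_score_:.3f}")
```

The same pattern applies to the other models by swapping the `clf` step and the grid, for example `clf__nu` for NuSVC or `clf__n_estimators` for AdaBoostClassifier.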
Implement AutoML (TPOT)
-
Compare the results using metrics:
- accuracy
- recall
- precision
- ROC AUC score
- ROC curve plot (with AUC)
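The metrics above can all be computed with `sklearn.metrics`. The sketch below does so for a single model on synthetic data and draws its ROC curve with matplotlib. The data, the model choice, and the `results` dictionary are assumptions for illustration. In the project the same computation would be repeated per model to fill the comparison table.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned and encoded dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard labels for accuracy/recall/precision
y_score = model.predict_proba(X_test)[:, 1]  # positive-class probability for ROC

results = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for metric, value in results.items():
    print(f"{metric}: {value:.3f}")

# ROC curve: true positive rate against false positive rate over all thresholds.
fpr, tpr, _ = roc_curve(y_test, y_score)
fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"LogisticRegression (AUC = {results['roc_auc']:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
```

Models without `predict_proba`, such as NuSVC with default settings, can supply `decision_function` scores to `roc_auc_score` and `roc_curve` instead.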
- Clean, well-commented code
- Clean data with EDA
- Clear board in Trello with logged time for each task
- Clear description of each model
- Models implementation and comparison
- 1. 'data.csv' with clean and encoded data
- 2. 'project7.ipynb' with all code for data cleaning and modelling
- 3. Slides/dashboard/notebook with the must-have EDA, a description of each model (how it works, what its parameters are, what exactly you used), and results (for each model, plus a final table comparing the models)
- 4. Please state a conclusion about the usability of each model
- 5. Trello board with logged time
- Data cleaning - 2 hours
- Data preprocessing (features, scaling) - 1 hour
- Models investigation - 3 hours (this task can be split)
- Models implementation - 2 hours
- Slides/dashboard/notebook - 2 hours
- Finalization and "beautification" (GitHub, etc.) - 2 hours

Average time per person: 4-6 hours
Upon completion, add your deliverables to git. Then commit your changes and push your branch to the remote.
