Adding support for Baseline Model #58

gbayomi · 2022-09-13T09:04:50Z

I'm adding support for baseline models.

Baselines come from auto-sklearn (https://automl.github.io/auto-sklearn/master/) and data processing comes from Automunge (https://www.automunge.com/).

baseline.py extracts pure sklearn models from auto-sklearn. We then use Automunge to create a processor object and build a prediction function.
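
A minimal sketch of the shape described above, not the code in baseline.py: a fitted processor is bundled into a prediction function that transforms raw rows before calling the model. `StandInPreprocessor`, `StandInModel`, and `build_predict_function` are hypothetical names used only for illustration; the PR itself relies on Automunge and on models extracted from auto-sklearn.

```python
from typing import Callable, List, Sequence


class StandInPreprocessor:
    """Hypothetical stand-in for a fitted processor: scales every value."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def transform(self, rows: Sequence[Sequence[float]]) -> List[List[float]]:
        return [[value * self.scale for value in row] for row in rows]


class StandInModel:
    """Hypothetical stand-in for an extracted model exposing predict_proba."""

    def predict_proba(self, rows: Sequence[Sequence[float]]) -> List[List[float]]:
        probabilities = []
        for row in rows:
            # Toy rule: class 1 when the processed features sum above 1.0.
            positive = 1.0 if sum(row) > 1.0 else 0.0
            probabilities.append([1.0 - positive, positive])
        return probabilities


def build_predict_function(preprocessor) -> Callable:
    """Return a function that preprocesses raw input, then calls the model."""

    def predict_proba(model, input_features):
        processed = preprocessor.transform(input_features)
        return model.predict_proba(processed)

    return predict_proba


predict = build_predict_function(StandInPreprocessor(scale=0.5))
result = predict(StandInModel(), [[1.0, 2.0], [0.5, 0.5]])
```

Closing over the processor keeps the model artifact and the preprocessing step together, so callers only ever pass raw feature rows.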

@gustavocidornelas will help me merge this PR so I can go back to building the sales pipeline (:

The logic is relatively simple though!

gustavocidornelas

Overall, it's a cool feature! I can fix the points I raised; I commented here just so I don't forget. Lmk what you think about the train_df, val_df, and df thing, because I think that's an important design decision :)

unboxapi/__init__.py

unboxapi/baseline.py

gustavocidornelas · 2022-09-13T11:55:31Z

unboxapi/models.py

     if model_type is ModelType.custom or model_type is ModelType.rasa:
         return "@artifacts([PickleArtifact('function'), PickleArtifact('kwargs')])"
+    elif model_type is ModelType.sklearn:
+        return f"@artifacts([PickleArtifact('model'), PickleArtifact('function'), PickleArtifact('kwargs')])"


Did you test this for regular sklearn models? PickleArtifact is more general, so it should work for sklearn models. However, I think it's safer to use BentoML's SklearnModelArtifact for regular sklearn models. You probably made this choice because of the "Too many open files" error, so I would suggest using PickleArtifact only for the models from the quick baseline and not for all sklearn models we deploy.

Talked to @gbayomi about this: "PickleArtifact, according to Bo (from BentoML), is the same as SklearnArtifact, but more reliable, that’s why I made the change"
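
For reference, the alternative raised in the review (PickleArtifact only for quick-baseline models) could look roughly like the sketch below. It is not what the PR does: per the reply above, the thread settled on PickleArtifact for all sklearn models. The `is_quick_baseline` flag and the function name `artifacts_decorator` are hypothetical, and `ModelType` is a reduced stand-in for the real enum.

```python
from enum import Enum


class ModelType(Enum):
    """Reduced stand-in for the project's ModelType enum (assumption)."""

    custom = "custom"
    rasa = "rasa"
    sklearn = "sklearn"


def artifacts_decorator(model_type: ModelType, is_quick_baseline: bool = False) -> str:
    """Pick the BentoML artifacts decorator string for a model type.

    `is_quick_baseline` is a hypothetical flag, not part of the PR.
    """
    if model_type is ModelType.custom or model_type is ModelType.rasa:
        return "@artifacts([PickleArtifact('function'), PickleArtifact('kwargs')])"
    if model_type is ModelType.sklearn and is_quick_baseline:
        # Quick-baseline models: plain pickling, as in the diff above.
        return (
            "@artifacts([PickleArtifact('model'), "
            "PickleArtifact('function'), PickleArtifact('kwargs')])"
        )
    if model_type is ModelType.sklearn:
        # Regular sklearn models: the sklearn-specific artifact.
        return (
            "@artifacts([SklearnModelArtifact('model'), "
            "PickleArtifact('function'), PickleArtifact('kwargs')])"
        )
    raise ValueError(f"Unsupported model type: {model_type}")
```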

unboxapi/__init__.py

examples/tabular-classification/sklearn/churn-classifier/churn-classifier-sklearn.ipynb

whoseoyster · 2022-09-29T21:03:22Z

openlayer/projects.py

- *args,
- **kwargs,
- ) -> Model:
+ def add_model(self, *args, **kwargs,) -> Model:


this auto-formatting violates black. Can you run black on the openlayer directory and push?

whoseoyster · 2022-09-29T21:04:18Z

setup.cfg

 requests
 tqdm
 marshmallow
+ scikit-learn==0.24.1


this is going to cause problems / warnings if the user is uploading their own scikit-learn models with a different package version
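
One possible way to surface the mismatch described above, sketched under assumptions: instead of a hard pin, compare the version a model was trained with against the installed one and warn when the major.minor parts differ. The helper names (`major_minor`, `installed_version`, `warn_on_mismatch`) are hypothetical and not part of this PR, and treating equal major.minor as compatible is a simplification.

```python
import re
import warnings
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.24.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_version(package: str = "scikit-learn") -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def warn_on_mismatch(trained_with: str, installed: Optional[str] = None) -> bool:
    """Warn (and return True) when major.minor versions differ."""
    if installed is None:
        installed = installed_version()
    if installed is None:
        # Nothing installed, so there is nothing to compare against.
        return False
    if major_minor(trained_with) != major_minor(installed):
        warnings.warn(
            f"Model was trained with scikit-learn {trained_with}, "
            f"but {installed} is installed."
        )
        return True
    return False
```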

whoseoyster · 2022-09-29T21:06:51Z

openlayer/__init__.py

+ categorical_feature_names=categorical_feature_names,
+ requirements_txt_file="auto-requirements.txt",
+ col_names=col_names,
+ preprocessor=AutoMunge(),


Anything that isn't exposed via the API should just be the default value for the arg.

Then you can remove the "from Automunge import AutoMunge" import from this file.

whoseoyster · 2022-09-29T21:08:02Z

openlayer/baseline.py

+from sklearn.ensemble import VotingClassifier
+from sklearn.preprocessing import LabelEncoder
+
+# ------------------------------- MONKEY PATCH ------------------------------- #


The code for creating a baseline in this manner is now available for anyone to copy / use without actually using Openlayer. Are you sure you don't want to run this in the backend?

That's true. Personally, I would prefer running on the backend. cc: @gbayomi

gbayomi requested review from Parthib, gustavocidornelas and whoseoyster September 13, 2022 09:04

gustavocidornelas requested changes Sep 13, 2022

gustavocidornelas reviewed Sep 14, 2022

examples/tabular-classification/sklearn/churn-classifier/churn-classifier-sklearn.ipynb (outdated)

gbayomi and others added 5 commits September 28, 2022 12:25

add support for baseline model

da6d7bb

WIP - Update quick baseline according to PR reviews

2c1b3ea

adding back validation df and fixing minor bugs

d306d5e

Add missing docstrings, update API reference, and update changelog

1a60542

Upload dataset before training model auto-sklearn. Fail early if there are issues with the dataset

e597790

gustavocidornelas force-pushed the gabe/baseline-model branch from 6759f87 to e597790 Compare September 28, 2022 15:28

Update Unbox -> Openlayer references for the quick baseline

2f84830

whoseoyster requested changes Sep 29, 2022

Fix black formatting issues

a06a98a

whoseoyster closed this Feb 18, 2023

whoseoyster deleted the gabe/baseline-model branch February 21, 2023 05:38

gustavocidornelas Sep 30, 2022

4 participants
