- Notifications
You must be signed in to change notification settings - Fork 1
Adding support for Baseline Model #58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it's a cool feature! I can fix the points that I raised, commented here just so I don't forget. Lmk what you think about the train_df, val_df, and df thing, because I think that's an important design decision :)
| if model_type is ModelType.custom or model_type is ModelType.rasa: | ||
| return "@artifacts([PickleArtifact('function'), PickleArtifact('kwargs')])" | ||
| elif model_type is ModelType.sklearn: | ||
| return f"@artifacts([PickleArtifact('model'), PickleArtifact('function'), PickleArtifact('kwargs')])" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test this for regular sklearn models? The PickeArtifact is more general, so it should work for sklearn models. However, I think it's safer to use BentoML's SklearnModelArtifact for regular sklearn models. You probably made this choice because of the Too many open files error, so I would suggest using PickleArtifact only for the models from the quick baseline and not for all sklearn models we deploy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to @gbayomi about this: "PickleArtifact, according to Bo (from BentoML), is the same as SklearnArtifact, but more reliable, that’s why I made the change"
examples/tabular-classification/sklearn/churn-classifier/churn-classifier-sklearn.ipynb Outdated Show resolved Hide resolved
…e are issues with the dataset
6759f87 to e597790 Compare openlayer/projects.py Outdated
| *args, | ||
| **kwargs, | ||
| ) -> Model: | ||
| def add_model(self, *args, **kwargs,) -> Model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this auto-formatting violates black. Can you run black on the openlayer directory and push?
| requests | ||
| tqdm | ||
| marshmallow | ||
| scikit-learn==0.24.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to cause problems / warnings if the user is uploading their own scikit-learn models with a different package version
| categorical_feature_names=categorical_feature_names, | ||
| requirements_txt_file="auto-requirements.txt", | ||
| col_names=col_names, | ||
| preprocessor=AutoMunge(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anything that isn't exposed via the api should just be the default value for the arg.
Then you can remove from Automunge import AutoMunge from this file
| from sklearn.ensemble import VotingClassifier | ||
| from sklearn.preprocessing import LabelEncoder | ||
| | ||
| # ------------------------------- MONKEY PATCH ------------------------------- # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for creating a baseline in this manner is now available for anyone copy / use without actually using Openlayer. Are you sure you don't want to run this in the backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. Personally, I would prefer running on the backend. cc: @gbayomi
I'm adding support for baseline models.
Baselines come from auto-sklearn (https://automl.github.io/auto-sklearn/master/) and data processing comes from Automunge (https://www.automunge.com/).
The baseline.py basically extracts pure sklearn models from auto-sklearn. Then, we use Automunge to create processor object and build a prediction function.
@gustavocidornelas will help me merge this PR so I can go back to building the sales pipeline (:
The logic is relatively simple though!