Conversation

@gbayomi gbayomi commented Sep 13, 2022

I'm adding support for baseline models.

Baselines come from auto-sklearn (https://automl.github.io/auto-sklearn/master/) and data processing comes from Automunge (https://www.automunge.com/).

baseline.py basically extracts pure sklearn models from auto-sklearn. We then use Automunge to create a preprocessor object and build a prediction function.

@gustavocidornelas will help me merge this PR so I can go back to building the sales pipeline (:

The logic is relatively simple though!
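Roughly, the composition described above can be sketched as follows. The helper names and the stand-in preprocessor/model below are hypothetical, since baseline.py itself isn't quoted in this thread:

```python
def build_predict_function(preprocessor, model):
    """Compose a preprocessing step (Automunge in the PR) with an
    extracted sklearn model into a single prediction function."""
    def predict(raw_rows):
        features = preprocessor(raw_rows)
        return model.predict(features)
    return predict


class ConstantModel:
    """Stand-in for a pure sklearn model extracted from auto-sklearn."""
    def predict(self, X):
        return [0 for _ in X]


# identity preprocessor as a stand-in for the Automunge processor object
predict = build_predict_function(lambda rows: rows, ConstantModel())
```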

@gustavocidornelas (Contributor) left a comment

Overall, it's a cool feature! I can fix the points I raised; I commented here just so I don't forget. Let me know what you think about the train_df, val_df, and df thing, because I think that's an important design decision :)

if model_type is ModelType.custom or model_type is ModelType.rasa:
    return "@artifacts([PickleArtifact('function'), PickleArtifact('kwargs')])"
elif model_type is ModelType.sklearn:
    return "@artifacts([PickleArtifact('model'), PickleArtifact('function'), PickleArtifact('kwargs')])"

Did you test this with regular sklearn models? PickleArtifact is more general, so it should work for sklearn models. However, I think it's safer to use BentoML's SklearnModelArtifact for regular sklearn models. You probably made this choice because of the Too many open files error, so I would suggest using PickleArtifact only for the models from the quick baseline, not for all sklearn models we deploy.


Talked to @gbayomi about this: "PickleArtifact, according to Bo (from BentoML), is the same as SklearnArtifact, but more reliable, that’s why I made the change"
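For reference, the dispatch in the hunk above can be exercised in isolation; `ModelType` below is a minimal stand-in for the package's real enum, not the actual definition:

```python
from enum import Enum


class ModelType(Enum):
    """Minimal stand-in for the package's ModelType enum."""
    custom = "custom"
    rasa = "rasa"
    sklearn = "sklearn"


def artifacts_decorator(model_type: ModelType) -> str:
    # custom/rasa services pickle only the function and kwargs;
    # sklearn services additionally pickle the model itself
    if model_type is ModelType.custom or model_type is ModelType.rasa:
        return "@artifacts([PickleArtifact('function'), PickleArtifact('kwargs')])"
    if model_type is ModelType.sklearn:
        return (
            "@artifacts([PickleArtifact('model'), "
            "PickleArtifact('function'), PickleArtifact('kwargs')])"
        )
    raise ValueError(f"unsupported model type: {model_type}")
```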

*args,
**kwargs,
) -> Model:
def add_model(self, *args, **kwargs,) -> Model:

This auto-formatting violates black. Can you run black on the openlayer directory and push?
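For what it's worth, black's magic trailing comma means a signature that ends with `**kwargs,` stays exploded one parameter per line rather than collapsed onto one line. A self-contained sketch (the stub `Model` and `OpenlayerClient` classes are just for illustration):

```python
class Model:
    """Stub standing in for the package's Model class."""


class OpenlayerClient:
    """Hypothetical holder for add_model, just to show the formatting."""

    # black-compliant: the trailing comma after **kwargs is "magic",
    # so black keeps one parameter per line
    def add_model(
        self,
        *args,
        **kwargs,
    ) -> Model:
        return Model()
```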

requests
tqdm
marshmallow
scikit-learn==0.24.1

This is going to cause problems/warnings if the user uploads their own scikit-learn model built with a different package version.
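One common way to soften this (a suggestion, not what the PR does) is a compatible-release specifier instead of an exact pin, so users' own scikit-learn versions conflict less often:

```
scikit-learn~=0.24  # compatible-release spec instead of ==0.24.1
```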

categorical_feature_names=categorical_feature_names,
requirements_txt_file="auto-requirements.txt",
col_names=col_names,
preprocessor=AutoMunge(),

Anything that isn't exposed via the API should just be the default value for the arg.

Then you can remove `from Automunge import AutoMunge` from this file.
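The suggestion amounts to resolving the default inside the callee, so the calling file never needs the Automunge import. A sketch with hypothetical names (`deploy_baseline` and `_default_preprocessor` are illustrative, not the actual API):

```python
def _default_preprocessor():
    # stand-in for AutoMunge(); lives in the callee's module, so the
    # caller never needs `from Automunge import AutoMunge`
    return lambda rows: rows


def deploy_baseline(col_names, preprocessor=None):
    # arguments not exposed via the API default to None and are
    # resolved here instead of at every call site
    if preprocessor is None:
        preprocessor = _default_preprocessor()
    return preprocessor(col_names)
```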

from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import LabelEncoder

# ------------------------------- MONKEY PATCH ------------------------------- #

The code for creating a baseline in this manner is now available for anyone to copy and use without actually using Openlayer. Are you sure you don't want to run this in the backend?


That's true. Personally, I would prefer running on the backend. cc: @gbayomi
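For context, the monkey-patch pattern flagged by the banner in the hunk above generally looks like the sketch below. `Encoder` is a stand-in class (the actual patched sklearn internals aren't shown in this diff):

```python
class Encoder:
    """Stand-in for a third-party class such as sklearn's LabelEncoder."""
    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self


_original_fit = Encoder.fit  # keep a reference to the original method


def _patched_fit(self, values):
    # example tweak: coerce labels to str before delegating to the original
    return _original_fit(self, [str(v) for v in values])


Encoder.fit = _patched_fit  # the monkey patch, applied at import time

enc = Encoder().fit([2, 1, 2])
```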

@whoseoyster whoseoyster deleted the gabe/baseline-model branch February 21, 2023 05:38
