
@gustavocidornelas
Contributor

Adds support for uploading predictions via the Python API.

Summary

  • Predictions are uploaded together with a dataset.
  • In the add_dataset and add_dataframe methods, the user can pass a predictions_column_name argument.
  • In the dataset itself, the predictions column should contain lists of class probabilities. For instance, for the Churn dataset, it should look like:
    [Screenshot (2023-01-26): example predictions column for the Churn dataset]
  • The validations done for predictions are:
    1. Check that predictions_column_name is a column in the dataset;
    2. Check that all values in predictions_column_name are lists;
    3. Check that all lists in predictions_column_name have the same length;
    4. Check that the length of these lists exactly matches the number of classes specified in class_names;
    5. Check that the class probabilities in each list sum to 1. I'm allowing a 10% margin of error, so any sum between 0.9 and 1.1 is accepted.
  • All 27 possible states of the commit bundle are outlined in this Notion doc. A warning is raised in the cases where the bundle could be ambiguous.
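As a rough illustration of the five validation checks listed above, here is a minimal, dependency-free sketch. It is not the code from this PR: the function name validate_predictions, the tolerance parameter, the choice of inclusive bounds, and the example column and class names are all hypothetical. To keep it self-contained, the dataset is modeled as a plain dict mapping column names to lists of values instead of a pandas DataFrame.

```python
from typing import Any, Dict, List, Sequence


def validate_predictions(
    dataset: Dict[str, List[Any]],
    predictions_column_name: str,
    class_names: Sequence[str],
    tolerance: float = 0.1,  # hypothetical: the 10% margin described above
) -> List[str]:
    """Hypothetical helper: return validation errors (empty list if valid)."""
    # 1. The predictions column must exist in the dataset.
    if predictions_column_name not in dataset:
        return [f"'{predictions_column_name}' is not a column in the dataset."]
    predictions = dataset[predictions_column_name]

    # 2. Every value in the column must be a list.
    if not all(isinstance(p, list) for p in predictions):
        return [f"All values in '{predictions_column_name}' must be lists."]

    errors: List[str] = []

    # 3. All lists must have the same length.
    lengths = {len(p) for p in predictions}
    if len(lengths) > 1:
        errors.append(
            f"Lists in '{predictions_column_name}' have different lengths: "
            f"{sorted(lengths)}."
        )
    # 4. That common length must equal the number of classes.
    elif lengths and next(iter(lengths)) != len(class_names):
        errors.append(
            f"Lists have length {next(iter(lengths))}, but class_names has "
            f"{len(class_names)} classes."
        )

    # 5. Each list of probabilities must sum to 1 within the tolerance
    #    (bounds treated as inclusive here, which is an assumption).
    bad_rows = [
        i
        for i, p in enumerate(predictions)
        if not (1 - tolerance <= sum(p) <= 1 + tolerance)
    ]
    if bad_rows:
        errors.append(f"Rows {bad_rows} have probabilities that do not sum to ~1.")

    return errors


# Usage with a tiny, made-up Churn-like dataset (hypothetical values).
churn = {
    "Age": [42, 35, 58],
    "predictions": [[0.8, 0.2], [0.3, 0.7], [0.55, 0.45]],
}
print(validate_predictions(churn, "predictions", ["Retained", "Exited"]))
# prints []
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem with a dataset at once; checks 1 and 2 return early because the remaining checks cannot run meaningfully without a column of lists.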
@gustavocidornelas gustavocidornelas force-pushed the cid/predictions branch 2 times, most recently from 86f8639 to c10b5ac January 30, 2023 11:35
@gustavocidornelas gustavocidornelas merged commit 5f1b3b3 into mvp Jan 30, 2023
@gustavocidornelas gustavocidornelas deleted the cid/predictions branch January 30, 2023 12:22
