Configuration
The MLSToolbox Code Generator uses a defined model that makes it possible to customize and configure the graphical editor and the nodes it provides, which represent the steps of the ML pipelines. This model is configured through several JSON files.
The information required for the nodes is defined in several configuration files in the mls_code_generator_config repository.
For each node representing a task in the editor, there is a node definition in the file nodes.json.
Let's go through the fields that define a node:
- `node`: Defines the display name of the node on the client.
- `info.title`: Brief description of the functionality of the node.
- `category`: Used for organizing the nodes on the client.
- `params`: List of parameters that are used to customize the behavior of the node.
  - `param_label`: Name of the parameter.
  - `param_type`: Type of the value that goes in the parameter.
    - `string`
    - `number`
    - `description`: String that cannot be parametrized.
    - `list`
    - `map`
    - `option`: Value must be taken from a list of values provided in `options.json`.
    - `option_of_options`: Similar to `option`, but the available options are split into categories.
  - `optionId` (opt): Identifies which list of options is used when `param_type` is `option` or `option_of_options`.
  - `show`: Boolean that indicates whether the value of the parameter is shown on the node.
- `inputs`: List of the sockets that will be used as inputs.
  - `port_label`: Label of the socket in the interface.
  - `port_value`: Name of the argument in the implementation of the node that uses the value of this input.
  - `port_type`: Type of the socket, defined in `sockets.json`. It should refer to the type of data that is moving through the port.
- `outputs`: List of the sockets that will be used as outputs.
  - `port_label`: Label of the socket in the interface.
  - `port_value`: Name of the argument in the implementation of the node that uses the value of this output.
  - `port_type`: Type of the socket, defined in `sockets.json`. It should refer to the type of data that is moving through the port.
- `dependencies`: Dictionary used by the code generator to know which module of mls_lib to import this node from, along with any other code it needs:
  - KEY: Defines the module this node implementation is in.
  - VALUE:
    - `origin`: Values can be `custom` or `parameter`.
    - `value`:
      - In case of `origin=custom`: The import is done from the value in this field.
      - In case of `origin=parameter`: The import is done from the value of the parameter specified in this field.
- `origin`: Definition used to generate the code and where to get the name of the implementation from.
  - `custom`: Custom name for the class.
  - OR
  - `parameter`: The value here is the label of the parameter that has the name of the class as its value.
As an example, here we have the node "Replace Nulls Average" for the "Data Cleaning" stage.
With the following configuration:
```json
{
  "node": "Replace Nulls Average",
  "category": "Data Cleaning",
  "info": {
    "title": "Replace null values with the average of the column"
  },
  "params": [
    {"param_label": "description", "param_type": "description", "show": true},
    {"param_label": "column", "param_type": "string", "show": false}
  ],
  "inputs": [
    {"port_label": "data_in", "port_type": "DataFrame"}
  ],
  "outputs": [
    {"port_label": "out", "port_type": "DataFrame"}
  ],
  "dependencies": {
    "data_cleaning": {
      "origin": "custom",
      "value": "ReplaceNullAverage"
    }
  },
  "origin": {
    "custom": "ReplaceNullAverage"
  }
}
```
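The field descriptions and the example above can be turned into a small sanity check for a node definition. The following is only a sketch under stated assumptions, not part of the MLSToolbox code: `validate_node` is a hypothetical helper, and the sets of required keys and allowed `param_type` values are inferred from the description on this page rather than taken from the generator.

```python
import json

# Top-level keys and parameter types as described on this page
# (assumed to be complete; the real generator may accept more or fewer).
REQUIRED_KEYS = {"node", "info", "category", "params",
                 "inputs", "outputs", "dependencies", "origin"}
PARAM_TYPES = {"string", "number", "description", "list", "map",
               "option", "option_of_options"}


def validate_node(node):
    """Return a list of problems found; an empty list means none were found."""
    problems = ["missing key: " + key
                for key in sorted(REQUIRED_KEYS - set(node))]
    for param in node.get("params", []):
        label = param.get("param_label", "<unnamed>")
        ptype = param.get("param_type")
        if ptype not in PARAM_TYPES:
            problems.append(f"param {label}: unknown param_type {ptype!r}")
        # Option-based parameters need an optionId pointing into options.json.
        if ptype in {"option", "option_of_options"} and "optionId" not in param:
            problems.append(f"param {label}: optionId is required for {ptype}")
    for side in ("inputs", "outputs"):
        for port in node.get(side, []):
            if "port_label" not in port or "port_type" not in port:
                problems.append(f"{side}: port needs port_label and port_type")
    return problems


# The "Replace Nulls Average" definition from this page.
EXAMPLE = json.loads("""
{
  "node": "Replace Nulls Average",
  "category": "Data Cleaning",
  "info": {"title": "Replace null values with the average of the column"},
  "params": [
    {"param_label": "description", "param_type": "description", "show": true},
    {"param_label": "column", "param_type": "string", "show": false}
  ],
  "inputs": [{"port_label": "data_in", "port_type": "DataFrame"}],
  "outputs": [{"port_label": "out", "port_type": "DataFrame"}],
  "dependencies": {
    "data_cleaning": {"origin": "custom", "value": "ReplaceNullAverage"}
  },
  "origin": {"custom": "ReplaceNullAverage"}
}
""")

print(validate_node(EXAMPLE))  # prints []
```

Returning a list of messages instead of raising on the first error makes it easier to report every issue in a large nodes.json in one pass.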
Some of the node configuration is parametrized, i.e., some entries in the `params` attribute need a set of options to choose from. These options are defined in the file options.json. This file contains two lists: `options` for simple parameters and `option_of_options` for complex parameters.

In `options`, each entry has an OPTION, the name that identifies the option, and a VALUE, the list of possible values for that option. In `option_of_options`, the attributes are `label`, `value`, and `items`.
The following piece of the configuration file defines:
- "options": possible values for `socket_type` (`port_type` in nodes.json) and `parameter_type` (`param_type` in nodes.json).
- "option_of_options": the possible values for the complex parameter `model_type` (`port_type`: "Model" in nodes.json).
```json
{
  "options": {
    "socket_type": [
      "Any", "DataFrame", "Model", "Result", "Object", "Encoder", "Scaler"
    ],
    "parameter_type": [
      "Number", "String"
    ]
  },
  "option_of_options": {
    "model_type": [
      {
        "label": "Linear Models",
        "value": "Linear Models",
        "items": [
          {"label": "Linear Regression", "value": "LinearRegression"},
          {"label": "Logistic Regression", "value": "LogisticRegression"},
          {"label": "Ridge Regression", "value": "RidgeRegression"},
          {"label": "Lasso Regression", "value": "LassoRegression"},
          {"label": "Elastic Net", "value": "ElasticNet"}
        ]
      },
      {
        "label": "Tree Models",
        "value": "Tree Models",
        "items": [
          {"label": "Decision Tree", "value": "DecisionTree"},
          {"label": "Random Forest", "value": "RandomForest"},
          {"label": "Gradient Boosting", "value": "GradientBoosting"},
          {"label": "AdaBoost", "value": "AdaBoost"}
        ]
      }
    ]
  }
}
```
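To illustrate how an `optionId` could be resolved against this file, here is a minimal sketch. The helper `allowed_values` is hypothetical and is not taken from the MLSToolbox code; it assumes options.json has already been loaded into a dictionary with the structure shown above.

```python
def allowed_values(options_cfg, param_type, option_id):
    """Return the values a parameter may take, given its param_type and optionId."""
    if param_type == "option":
        # Simple parameters: a flat list of values.
        return list(options_cfg["options"][option_id])
    if param_type == "option_of_options":
        # Complex parameters: flatten the categories into one list of item values.
        return [item["value"]
                for category in options_cfg["option_of_options"][option_id]
                for item in category["items"]]
    raise ValueError(f"{param_type!r} does not use options.json")


# Abridged copy of the configuration shown above.
OPTIONS = {
    "options": {"parameter_type": ["Number", "String"]},
    "option_of_options": {
        "model_type": [
            {"label": "Linear Models", "value": "Linear Models",
             "items": [{"label": "Linear Regression", "value": "LinearRegression"}]},
            {"label": "Tree Models", "value": "Tree Models",
             "items": [{"label": "Decision Tree", "value": "DecisionTree"}]},
        ]
    },
}

print(allowed_values(OPTIONS, "option", "parameter_type"))
# prints ['Number', 'String']
print(allowed_values(OPTIONS, "option_of_options", "model_type"))
# prints ['LinearRegression', 'DecisionTree']
```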
Sockets are the elements that define the inputs and outputs of the nodes. There are different types of sockets depending on the type of data they represent. The configuration of the sockets is defined in the file sockets.json.
For each data type, the following colors need to be configured:
- `background-color` and `border-color` for the colors in the General Editor area.
- `background-color:hover` and `border-color:hover` for the colors when the mouse moves over the socket.

For the "Replace Nulls Average" example, all the sockets (`port_type`) are of type "DataFrame", configured as follows:

```json
"DataFrame": {
  "background-color": "palegreen",
  "border-color": "darkolivegreen",
  "background-color:hover": "palegreen",
  "border-color:hover": "green"
}
```
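Since every `port_type` in nodes.json is expected to be defined in sockets.json, a quick cross-check can catch typos early. The sketch below is an illustration only: `undefined_port_types` is a hypothetical helper, and it assumes that sockets.json is an object keyed by socket type and that nodes.json is a list of node definitions.

```python
def undefined_port_types(nodes, sockets):
    """Collect every port_type used by a node that has no entry in the socket configuration."""
    used = {port["port_type"]
            for node in nodes
            for side in ("inputs", "outputs")
            for port in node.get(side, [])}
    return used - set(sockets)


# Socket configuration from this page (assumed layout: one key per socket type).
SOCKETS = {
    "DataFrame": {
        "background-color": "palegreen",
        "border-color": "darkolivegreen",
        "background-color:hover": "palegreen",
        "border-color:hover": "green",
    }
}

# Ports of the "Replace Nulls Average" node.
NODES = [{
    "node": "Replace Nulls Average",
    "inputs": [{"port_label": "data_in", "port_type": "DataFrame"}],
    "outputs": [{"port_label": "out", "port_type": "DataFrame"}],
}]

print(undefined_port_types(NODES, SOCKETS))  # prints set()
```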
The values for the `option` and `option_of_options` types of parameters can be extended by modifying the `options.json` file.
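As a rough illustration of such an extension, the sketch below adds one more category to a list under `option_of_options`. Everything specific in it is an assumption made for the example: `add_option_category` is a hypothetical helper, and the "Neighbor Models" category with its `KNeighbors` item is made up and does not come from the MLSToolbox configuration.

```python
import copy
import json


def add_option_category(options_cfg, option_id, label, items):
    """Return a copy of the configuration with one more category under option_of_options[option_id]."""
    updated = copy.deepcopy(options_cfg)  # leave the loaded configuration untouched
    category = {
        "label": label,
        "value": label,
        "items": [{"label": item_label, "value": item_value}
                  for item_label, item_value in items],
    }
    updated["option_of_options"].setdefault(option_id, []).append(category)
    return updated


# Abridged stand-in for the content of options.json.
current = {
    "options": {"parameter_type": ["Number", "String"]},
    "option_of_options": {
        "model_type": [
            {"label": "Tree Models", "value": "Tree Models",
             "items": [{"label": "Decision Tree", "value": "DecisionTree"}]},
        ]
    },
}

# Hypothetical new category, used only for illustration.
extended = add_option_category(
    current, "model_type", "Neighbor Models",
    [("K-Nearest Neighbors", "KNeighbors")],
)

# The result can be serialized back to JSON and saved as options.json.
print(json.dumps(extended["option_of_options"]["model_type"][-1], indent=2))
```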
For each node in the configuration file, a class is implemented in the mls_lib repository. The class must be located in the folder defined as KEY in the `dependencies` property, and the name of the class must be the `value` property for that KEY:

```json
"dependencies": {
  "data_cleaning": {
    "origin": "custom",
    "value": "ReplaceNullAverage"
  }
}
```
The corresponding Python source code is at mls_lib/data_cleaning/replace_null_average.py
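To make the role of `dependencies` more concrete, the sketch below derives an import statement from each entry. This is an assumption about what the generated code might look like, not the actual generator: `import_lines` is a hypothetical helper, the `from mls_lib.<module> import <Class>` form is a guess, and the `model_training` module in the second call is invented for the example.

```python
def import_lines(node, param_values=None):
    """Build one import statement per entry in the node's dependencies."""
    param_values = param_values or {}
    lines = []
    for module, dep in node["dependencies"].items():
        if dep["origin"] == "custom":
            # The class name is given directly in the "value" field.
            name = dep["value"]
        elif dep["origin"] == "parameter":
            # The class name is the value of the parameter named in "value".
            name = param_values[dep["value"]]
        else:
            raise ValueError(f"unknown origin: {dep['origin']!r}")
        # Assumed import form; the real generator may emit something different.
        lines.append(f"from mls_lib.{module} import {name}")
    return lines


replace_nulls = {
    "dependencies": {
        "data_cleaning": {"origin": "custom", "value": "ReplaceNullAverage"}
    }
}
print(import_lines(replace_nulls))
# prints ['from mls_lib.data_cleaning import ReplaceNullAverage']

# Hypothetical node whose class name comes from a parameter.
train_model = {
    "dependencies": {
        "model_training": {"origin": "parameter", "value": "model_type"}
    }
}
print(import_lines(train_model, {"model_type": "RandomForest"}))
# prints ['from mls_lib.model_training import RandomForest']
```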