
Configuration

cgomezse edited this page Feb 11, 2025 · 11 revisions

The MLSToolbox Code Generator uses a configuration model to customize the graphical editor and the nodes it provides, which represent the steps of ML pipelines. This model is defined in several JSON files.

Editor configuration

The information required for the nodes is defined in several configuration files in the mls_code_generator_config repository.

Nodes definition

Each node representing a task in the editor has a definition in the file nodes.json.

Let's go through the fields that define a node:

  • node: Defines the display name of the node on the client.
  • info.title: Brief description of the functionality of the node.
  • category: Used for organizing the nodes on the client.
  • params: List of parameters used to customize the behavior of the node.
    • param_label: Name of the parameter.
    • param_type: Type of the value that goes in the parameter.
      • string
      • number
      • description: fixed string that cannot be parametrized
      • list
      • map
      • option: Value must be taken from a list of values provided in options.json.
      • option_of_options: Similar to option, but the available options are split into categories.
    • optionId (optional): Identifies which list of values in options.json is used by an option or option_of_options parameter; required for those two types.
    • show: Boolean indicating whether the value of the parameter is displayed on the node.
  • inputs: List of the sockets that will be used as inputs.
    • port_label: Label of the socket in the interface.
    • port_value: Name of the argument in the implementation of the node that uses the value of this input.
    • port_type: Type of the socket, defined in sockets.json. It should refer to the type of data that is moving through the port.
  • outputs: List of the sockets that will be used as outputs.
    • port_label: Label of the socket in the interface.
    • port_value: Name of the argument in the implementation of the node that holds the value of this output.
    • port_type: Type of the socket, defined in sockets.json. It should refer to the type of data that is moving through the port.
  • dependencies: Dictionary that tells the code generator which module of mls_lib to import this node's implementation (and any other required code) from:
    • KEY: Defines the module this node implementation is in.
    • VALUE:
      • origin: Either custom or parameter.
      • value:
        • If origin=custom: the name given in this field is imported.
        • If origin=parameter: the imported name is the value of the parameter whose label is given in this field.
  • origin: Tells the code generator where to get the name of the implementation class used in the generated code.
    • custom: Custom name for the class.
    • OR
    • parameter: The label of the parameter whose value is the name of the class.
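As a rough sanity check, the field rules above can be expressed as a small Python validator. This is a sketch under assumptions, not part of MLSToolbox: the name validate_node, the set of fields treated as required, and the individual checks are an illustrative reading of the list above, and the real editor may validate differently.

```python
from typing import Any

# Values of param_type listed in the node field descriptions.
PARAM_TYPES = {"string", "number", "description", "list", "map",
               "option", "option_of_options"}

# Assumption: treat these top-level fields as required for every node.
REQUIRED_FIELDS = ("node", "category", "params", "inputs", "outputs",
                   "dependencies", "origin")


def validate_node(node: dict[str, Any]) -> list[str]:
    """Return the problems found in one nodes.json entry (empty list if none)."""
    problems: list[str] = []
    for field in REQUIRED_FIELDS:
        if field not in node:
            problems.append(f"missing field '{field}'")

    for param in node.get("params", []):
        label = param.get("param_label", "?")
        ptype = param.get("param_type")
        if ptype not in PARAM_TYPES:
            problems.append(f"param '{label}': unknown param_type {ptype!r}")
        # option-style parameters must say which list in options.json they use
        if ptype in ("option", "option_of_options") and "optionId" not in param:
            problems.append(f"param '{label}': optionId required for {ptype}")
        if not isinstance(param.get("show"), bool):
            problems.append(f"param '{label}': show must be a boolean")

    for direction in ("inputs", "outputs"):
        for port in node.get(direction, []):
            for key in ("port_label", "port_type"):
                if key not in port:
                    problems.append(f"{direction}: a port is missing '{key}'")

    for module, dep in node.get("dependencies", {}).items():
        if dep.get("origin") not in ("custom", "parameter"):
            problems.append(f"dependency '{module}': origin must be custom or parameter")

    # origin holds either a custom class name or the label of a parameter, not both
    if len({"custom", "parameter"} & set(node.get("origin", {}))) != 1:
        problems.append("origin must define exactly one of custom or parameter")
    return problems
```

Returning a list of messages rather than raising on the first error lets an editor report every problem in a definition at once.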

As an example, here is the node "Replace Nulls Average" from the "Data Cleaning" category.

Replace Nulls Average task

It is defined in nodes.json with the following configuration:

{
  "node": "Replace Nulls Average",
  "category": "Data Cleaning",
  "info": {
    "title": "Replace null values with the average of the column"
  },
  "params": [
    {"param_label": "description", "param_type": "description", "show": true},
    {"param_label": "column", "param_type": "string", "show": false}
  ],
  "inputs": [
    {"port_label": "data_in", "port_type": "DataFrame"}
  ],
  "outputs": [
    {"port_label": "out", "port_type": "DataFrame"}
  ],
  "dependencies": {
    "data_cleaning": {"origin": "custom", "value": "ReplaceNullAverage"}
  },
  "origin": {
    "custom": "ReplaceNullAverage"
  }
}
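To illustrate how dependencies and origin fit together, the sketch below derives an import line and a class name from a node definition such as the "Replace Nulls Average" one. The function names are hypothetical, and the `from mls_lib.<KEY> import <name>` form is an assumption about what generated code looks like; the actual code generator may emit something different.

```python
from typing import Any

# The "Replace Nulls Average" definition, reduced to the fields used here.
REPLACE_NULLS_AVERAGE: dict[str, Any] = {
    "node": "Replace Nulls Average",
    "dependencies": {
        "data_cleaning": {"origin": "custom", "value": "ReplaceNullAverage"}
    },
    "origin": {"custom": "ReplaceNullAverage"},
}


def dependency_name(dep: dict[str, str], param_values: dict[str, str]) -> str:
    """Name to import for one dependencies entry."""
    if dep["origin"] == "custom":
        return dep["value"]  # literal name
    if dep["origin"] == "parameter":
        return param_values[dep["value"]]  # value the user gave that parameter
    raise ValueError(f"unknown origin {dep['origin']!r}")


def class_name(origin: dict[str, str], param_values: dict[str, str]) -> str:
    """Implementation class named by the node's top-level origin field."""
    if "custom" in origin:
        return origin["custom"]
    return param_values[origin["parameter"]]


def import_lines(node: dict[str, Any], param_values: dict[str, str]) -> list[str]:
    # Assumed output shape: one "from mls_lib.<KEY> import <name>" per dependency.
    return [f"from mls_lib.{module} import {dependency_name(dep, param_values)}"
            for module, dep in node["dependencies"].items()]


params = {"column": "age"}  # hypothetical value entered for the "column" parameter
print(import_lines(REPLACE_NULLS_AVERAGE, params))
# ['from mls_lib.data_cleaning import ReplaceNullAverage']
print(class_name(REPLACE_NULLS_AVERAGE["origin"], params))
# ReplaceNullAverage
```

With origin=parameter, the same helpers would instead look up the user's choice, e.g. `class_name({"parameter": "model_type"}, {"model_type": "RandomForest"})` returns `"RandomForest"`.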

Options

Some node parameters take their values from predefined lists, i.e., parameters of type option or option_of_options in params. These lists are defined in the file options.json, which contains two sections: options for simple parameters and option_of_options for complex parameters whose values are grouped into categories.

In options, each entry maps an OPTION, the name that identifies the option, to a VALUE, the list of possible values for that option. In option_of_options, each option maps to a list of categories with the attributes label, value, and items, where items holds the label/value pairs offered in that category.

The following piece of the configuration file defines:

  • "options": possible values for socket_type (port_type in nodes.json) and parameter_type (param_type in nodes.json).
  • "option_of_options": the possible values for the complex parameter model_type (used with port_type "Model" in nodes.json).

{
  "options": {
    "socket_type": ["Any", "DataFrame", "Model", "Result", "Object", "Encoder", "Scaler"],
    "parameter_type": ["Number", "String"]
  },
  "option_of_options": {
    "model_type": [
      {
        "label": "Linear Models",
        "value": "Linear Models",
        "items": [
          {"label": "Linear Regression", "value": "LinearRegression"},
          {"label": "Logistic Regression", "value": "LogisticRegression"},
          {"label": "Ridge Regression", "value": "RidgeRegression"},
          {"label": "Lasso Regression", "value": "LassoRegression"},
          {"label": "Elastic Net", "value": "ElasticNet"}
        ]
      },
      {
        "label": "Tree Models",
        "value": "Tree Models",
        "items": [
          {"label": "Decision Tree", "value": "DecisionTree"},
          {"label": "Random Forest", "value": "RandomForest"},
          {"label": "Gradient Boosting", "value": "GradientBoosting"},
          {"label": "AdaBoost", "value": "AdaBoost"}
        ]
      }
    ]
  }
}
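A hypothetical helper can make the two layouts concrete: it returns the values allowed for a given optionId, flattening the categories of option_of_options. It assumes that a parameter's optionId in nodes.json matches a key such as model_type in options.json; OPTIONS below is an abridged copy of the options.json example, and allowed_values is not an MLSToolbox function.

```python
from typing import Any

# Abridged copy of the options.json example.
OPTIONS: dict[str, Any] = {
    "options": {
        "socket_type": ["Any", "DataFrame", "Model", "Result", "Object", "Encoder", "Scaler"],
        "parameter_type": ["Number", "String"],
    },
    "option_of_options": {
        "model_type": [
            {"label": "Linear Models", "value": "Linear Models", "items": [
                {"label": "Linear Regression", "value": "LinearRegression"},
                {"label": "Logistic Regression", "value": "LogisticRegression"}]},
            {"label": "Tree Models", "value": "Tree Models", "items": [
                {"label": "Decision Tree", "value": "DecisionTree"},
                {"label": "Random Forest", "value": "RandomForest"}]},
        ]
    },
}


def allowed_values(options_file: dict[str, Any], option_id: str) -> list[str]:
    """Values a parameter with this optionId may take (hypothetical helper)."""
    simple = options_file.get("options", {})
    if option_id in simple:
        return list(simple[option_id])
    categories = options_file.get("option_of_options", {}).get(option_id)
    if categories is None:
        raise KeyError(f"unknown optionId {option_id!r}")
    # option_of_options: collect the item values of every category, in order
    return [item["value"] for category in categories for item in category["items"]]


print(allowed_values(OPTIONS, "parameter_type"))  # ['Number', 'String']
print(allowed_values(OPTIONS, "model_type"))
# ['LinearRegression', 'LogisticRegression', 'DecisionTree', 'RandomForest']
```

Note that only the item value (e.g. RandomForest) is returned; the label is what the editor would display.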

Sockets

Sockets define the inputs and outputs of the nodes. There is a different socket type for each type of data a socket carries. Sockets are configured in the file sockets.json.

For each data type, the following colors need to be configured:

  • background-color and border-color for the colors in the General Editor area.
  • background-color:hover and border-color:hover for the colors when the mouse moves over the socket.

For the "Replace Nulls Average" example, all the sockets (port_type) are of type "DataFrame", which is configured in sockets.json as:

"DataFrame": {
  "background-color": "palegreen",
  "border-color": "darkolivegreen",
  "background-color:hover": "palegreen",
  "border-color:hover": "green"
}
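Since every port_type must refer to a socket defined in sockets.json, and each socket is configured with four colors, both conditions can be checked mechanically. check_sockets is a hypothetical sketch, not an MLSToolbox API; it works on already-parsed JSON (a dict for sockets.json and a list of node definitions).

```python
from typing import Any

# The four color keys each socket type is configured with.
COLOR_KEYS = ("background-color", "border-color",
              "background-color:hover", "border-color:hover")


def check_sockets(sockets: dict[str, dict[str, str]],
                  nodes: list[dict[str, Any]]) -> list[str]:
    """Report incomplete socket colors and port types missing from sockets.json."""
    problems: list[str] = []
    for socket_type, colors in sockets.items():
        missing = [key for key in COLOR_KEYS if key not in colors]
        if missing:
            problems.append(f"{socket_type}: missing {', '.join(missing)}")
    for node in nodes:
        for port in node.get("inputs", []) + node.get("outputs", []):
            if port["port_type"] not in sockets:
                problems.append(f"{node['node']}: port {port['port_label']!r} "
                                f"uses undefined socket type {port['port_type']!r}")
    return problems


sockets = {"DataFrame": {"background-color": "palegreen",
                         "border-color": "darkolivegreen",
                         "background-color:hover": "palegreen",
                         "border-color:hover": "green"}}
nodes = [{"node": "Replace Nulls Average",
          "inputs": [{"port_label": "data_in", "port_type": "DataFrame"}],
          "outputs": [{"port_label": "out", "port_type": "DataFrame"}]}]
print(check_sockets(sockets, nodes))  # []
```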

The values available to option and option_of_options parameters can be extended by modifying the options.json file.
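A sketch of extending options.json programmatically, assuming the file has the two-section layout of the options.json example. The helper names and the example value ExtraTrees are hypothetical; the sketch only edits the JSON file and does not check that mls_lib provides a matching implementation.

```python
import json
from pathlib import Path
from typing import Any, Callable


def add_simple_value(options_file: dict[str, Any], option_id: str, value: str) -> None:
    """Append a value to an entry of the "options" section (no duplicates)."""
    values = options_file["options"].setdefault(option_id, [])
    if value not in values:
        values.append(value)


def add_grouped_value(options_file: dict[str, Any], option_id: str,
                      category: str, label: str, value: str) -> None:
    """Append a label/value item to a category of "option_of_options"."""
    groups = options_file["option_of_options"].setdefault(option_id, [])
    for group in groups:
        if group["label"] == category:
            break
    else:
        # Unknown category: create it, reusing the label as its value as the example does
        group = {"label": category, "value": category, "items": []}
        groups.append(group)
    if all(item["value"] != value for item in group["items"]):
        group["items"].append({"label": label, "value": value})


def edit_options_json(path: Path, edit: Callable[[dict[str, Any]], None]) -> None:
    """Load options.json, apply an edit function, and write the file back."""
    data = json.loads(path.read_text(encoding="utf-8"))
    edit(data)
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


# Usage (hypothetical value): add "Extra Trees" to the "Tree Models" category.
# edit_options_json(Path("options.json"),
#                   lambda d: add_grouped_value(d, "model_type", "Tree Models",
#                                               "Extra Trees", "ExtraTrees"))
```

Writing the file back with json.dumps(indent=2) keeps the key order but normalizes the file's whitespace.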

Python Code for the Code Generation

For each node in the configuration file, a class is implemented in the mls_lib repository. The class must be located in the folder given as the KEY of the dependencies property, and its name must match the value field of that KEY's entry. For the Replace Nulls Average node, the dependencies are:

"dependencies": {
  "data_cleaning": {"origin": "custom", "value": "ReplaceNullAverage"}
}

The corresponding Python source code is in mls_lib/data_cleaning/replace_null_average.py.
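This example suggests a naming convention: the CamelCase class name becomes a snake_case file name inside the KEY folder. The convention is inferred from this single example rather than documented, so the helper below (hypothetical names) is only a guess at where an implementation lives.

```python
import re
from pathlib import PurePosixPath


def camel_to_snake(name: str) -> str:
    """ReplaceNullAverage -> replace_null_average (underscore before each inner capital)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def expected_source_path(module: str, class_name: str, root: str = "mls_lib") -> str:
    """Assumed location of a node's class: <root>/<KEY>/<snake_case class name>.py."""
    return str(PurePosixPath(root, module, camel_to_snake(class_name) + ".py"))


print(expected_source_path("data_cleaning", "ReplaceNullAverage"))
# mls_lib/data_cleaning/replace_null_average.py
```

Names with consecutive capitals, such as acronyms, would be split letter by letter, another reason to treat the result as a guess and confirm it against the repository.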

Replace Nulls Average task
