The ML.FEATURE_IMPORTANCE function
This document describes the ML.FEATURE_IMPORTANCE function, which lets you see the feature importance score for each feature used to train a model. This score indicates how useful or valuable each feature was in the construction of a boosted tree or random forest model during training. For more information, see the feature_importances property in the XGBoost library.
Syntax
ML.FEATURE_IMPORTANCE(MODEL `project_id.dataset.model`)
Arguments
ML.FEATURE_IMPORTANCE takes the following arguments:
project_id
: Your project ID.
dataset
: The BigQuery dataset that contains the model.
model
: The name of the model.
Output
ML.FEATURE_IMPORTANCE returns the following columns:
feature
: a STRING value that contains the name of the feature column in the input training data.
importance_weight
: a FLOAT64 value that contains the number of times a feature is used to split the data across all trees.
importance_gain
: a FLOAT64 value that contains the average gain across all splits in which the feature is used.
importance_cover
: a FLOAT64 value that contains the average coverage across all splits in which the feature is used.
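For example, a query along the following lines (using the placeholder model mydataset.mymodel from the example later in this document) lists features ordered by their average gain:

SELECT
  feature,
  importance_gain
FROM
  ML.FEATURE_IMPORTANCE(MODEL `mydataset.mymodel`)
ORDER BY
  importance_gain DESC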
If the TRANSFORM clause was used in the CREATE MODEL statement that created the model, ML.FEATURE_IMPORTANCE returns feature importance information for the pre-transform columns from the query_statement clause of the CREATE MODEL statement.
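As a minimal sketch (the table, column, and model names here are hypothetical), a model created with a statement like the following reports importance for the original f1 column rather than the transformed f1_scaled feature:

CREATE MODEL `mydataset.mymodel`
  TRANSFORM(
    ML.STANDARD_SCALER(f1) OVER() AS f1_scaled,
    f2,
    label_col
  )
  OPTIONS(
    model_type = 'BOOSTED_TREE_CLASSIFIER',
    input_label_cols = ['label_col']
  )
AS SELECT f1, f2, label_col FROM `mydataset.mytable`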
Permissions
You must have the bigquery.models.create and bigquery.models.getData Identity and Access Management (IAM) permissions to run ML.FEATURE_IMPORTANCE.
Limitations
ML.FEATURE_IMPORTANCE is only supported with boosted tree models and random forest models.
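For instance, a model created with one of the tree-based model types, such as the following hypothetical random forest regressor, can be passed to ML.FEATURE_IMPORTANCE:

CREATE MODEL `mydataset.mymodel`
  OPTIONS(
    model_type = 'RANDOM_FOREST_REGRESSOR',
    input_label_cols = ['label_col']
  )
AS SELECT * FROM `mydataset.mytable`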
Example
This example retrieves feature importance from mymodel in mydataset. The dataset is in your default project.
SELECT * FROM ML.FEATURE_IMPORTANCE(MODEL `mydataset.mymodel`)
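If the model lives in a different project, qualify the model path with the project ID, as shown in the syntax above; myproject here is a placeholder:

SELECT * FROM ML.FEATURE_IMPORTANCE(MODEL `myproject.mydataset.mymodel`)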
What's next
- For information about Explainable AI, see BigQuery Explainable AI overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.