The ML.EXPLAIN_FORECAST function
This document describes the ML.EXPLAIN_FORECAST function, which lets you generate forecasts that are based on a trained time series model. It only works on ARIMA_PLUS models with the training option decompose_time_series enabled or on ARIMA_PLUS_XREG models. The ML.EXPLAIN_FORECAST function encompasses the ML.FORECAST function because its output is a superset of the results of ML.FORECAST.
Syntax
 # `ARIMA_PLUS` models: ML.EXPLAIN_FORECAST( MODEL `PROJECT_ID.DATASET.MODEL`, STRUCT( [HORIZON AS horizon] [, CONFIDENCE_LEVEL AS confidence_level])) # `ARIMA_PLUS_XREG` model: ML.EXPLAIN_FORECAST( MODEL `PROJECT_ID.DATASET.MODEL`, STRUCT( [HORIZON AS horizon] [, CONFIDENCE_LEVEL AS confidence_level]), { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }) Arguments
ML.EXPLAIN_FORECAST takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the model.
- TABLE: the name of the input table that contains the data to be evaluated.- If - TABLEis specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.- If there are unused columns from the table, they are ignored. - The - TABLEargument is required for the- ARIMA_PLUS_XREGmodel.
- QUERY_STATEMENT: the GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax for the- QUERY_STATEMENTclause in GoogleSQL, see Query syntax.- If - QUERY_STATEMENTis specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.
- HORIZON: an- INT64value that specifies the number of time points to forecast. The maximum value is the horizon value specified in the- CREATE MODELstatement for the time series model, or- 1000if unspecified. The default value is- 3. When forecasting multiple time series at the same time, this parameter applies to each time series.
- CONFIDENCE_LEVEL: a- FLOAT64value that specifies the percentage of the future values that fall in the prediction interval. The valid input range is- [0, 1). The default value is- 0.95.
Output
The ML.EXPLAIN_FORECAST function returns the following columns:
- time_series_id_color- time_series_id_cols: a value that contains the identifiers of a time series.- time_series_id_colcan be an- INT64or- STRINGvalue.- time_series_id_colscan be an- ARRAY<INT64>or- ARRAY<STRING>value. Only present when forecasting multiple time series at once. The column names and types are inherited from the- TIME_SERIES_ID_COLoption as specified in the- CREATE MODELstatement.
- time_series_timestamp: a- TIMESTAMPvalue that contains the timestamp of the time series. This column has a type of- TIMESTAMPregardless of the type of the input- time_series_timestamp_col. For each time series, the output rows are sorted in chronological order by the- time_series_timestampvalue.
- time_series_type: a- STRINGvalue that contains either- historyor- forecast. The rows that have a value of- historyin this column are used in training, either directly from the training table, or from interpolation using the training data.
- time_series_data: a- FLOAT64value that contains the data of the time series. For rows that have a value of- historyin the- time_series_typecolumn,- time_series_datais either the training data or the interpolated value using the training data. For rows that have a value of- forecastin the- time_series_typecolumn,- time_series_datais the forecast value.
- time_series_adjusted_data: a- FLOAT64value that contains the adjusted data of the time series. For rows that have a value of- historyin the- time_series_typecolumn, this is the value after cleaning spikes and dips, adjusting the step changes, and removing the residuals. It is the aggregation of all the valid components: holiday effect, seasonal components, and trend. For rows that have a value of- forecastin the- time_series_typecolumn, this is the forecast value, which is the same as the value of- time_series_data.
- standard_error: a- FLOAT64value that contains the standard error of the residuals during the ARIMA fitting. The values are the same for all rows that have a value of- historyin the- time_series_typecolumn. For rows that have a value of- forecastin the- time_series_typecolumn, this value increases with time, as the forecast values become less reliable.
- confidence_level: a- FLOAT64value that contains the user-specified confidence level or, if unspecified, the default value. This value is the same for all rows that have a value of- historyin the- time_series_typecolumn. This value is- NULLfor all rows that have a value of- forecastin the- time_series_typecolumn.
- prediction_interval_lower_bound: a- FLOAT64value that contains the lower bound of the prediction result. Only rows that have a value of- forecastin the- time_series_typecolumn have values other than- NULLin this column.
- prediction_interval_upper_bound: a- FLOAT64value that contains the upper bound of the prediction result. Only rows that have a value of- forecastin the- time_series_typecolumn have values other than- NULLin this column.
- trend: a- FLOAT64value that contains the long-term increase or decrease in the time series data.
- seasonal_period_yearly: a- FLOAT64value that contains the time series data value affected by the time of the year. This value is- NULLif no yearly effect is found.
- seasonal_period_quarterly: a- FLOAT64value that contains the time series data value affected by the time of the quarter. This value is- NULLif no quarterly effect is found.
- seasonal_period_monthly: a- FLOAT64value that contains the time series data value affected by the time of the month. This value is- NULLif no monthly effect is found.
- seasonal_period_weekly: a- FLOAT64value that contains the time series data value affected by the time of the week. This value is- NULLif no weekly effect is found.
- seasonal_period_daily: a- FLOAT64value that contains the time series data value affected by the time of the day. This value is- NULLif no daily effect is found.
- holiday_effect: a- FLOAT64value that contains the time series data value affected by different holidays. This is the sum of the maximum positive individual holiday effect and the minimum negative individual holiday effect. This is shown in the following formula, where \(H\) is the overall holiday effect and \(h(i)\) is the individual holiday effect:- \[H=\max\limits_{h(i)>0} h(i) + \min\limits_{h(i)<0} h(i)\] - This value is - NULLif no holiday effect is found.
- spikes_and_dips: a- FLOAT64value that contains the unexpectedly high or low values of the time series. For rows that have a value of- historyin the- time_series_typecolumn, the value is- NULLif no spike or dip is found. For rows that have a value of- forecastin the- time_series_typecolumn, this value is- NULL.
- step_changes: a- FLOAT64value that contains the abrupt or structural change in the distributional properties of the time series. For rows that have a value of- historyin the- time_series_typecolumn, this value is- NULLif no step change is found. For rows that have a value of- forecastin the- time_series_typecolumn, this value is- NULL.
- residual: a- FLOAT64value that contains the difference between the actual time series and the fitted time series after model training. The residual value is only meaningful for historical data. For rows that have a value of- forecastin the- time_series_typecolumn, the- residualvalue is- NULL.
- holiday_effect_holiday_name: a- FLOAT64value that contains the time series data value affected by the holiday that's identified in holiday_name. If no holiday effect is found, this value is- NULL.- There is one - holiday_effect_holiday_namecolumn for each holiday that's included in the model.
- attribution_feature_name: a- FLOAT64value that contains the attribution of each feature to the final forecast. This only applies to- ARIMAX_PLUS_XREGmodels. The value is calculated by multiplying the weight of the feature with the feature value. This is shown in the following formula, where \(\beta_{fn}\) is the weight of feature- fnin the linear regression and \(X_{fn}\) is the numericalized feature value:- \[attribution_{fn}=\beta_{fn} * X_{fn}\] 
Mathematical explanation
The mathematical relationship of the output columns is described in the following sections.
time_series_data
 The time_series_data value is decomposed into several components to get better explainability. For ARIMA_PLUS models, the component list includes the following components for better explainability:
- trend
- seasonal_period_yearly
- seasonal_period_quarterly
- seasonal_period_monthly
- seasonal_period_weekly
- seasonal_period_daily
- holiday_effect
- spikes_and_dips
- step_changes
- residual
For ARIMA_PLUS_XREG models, this list also includes the feature contribution attribution_feature_name. For future data, the spikes_and_dips, step_changes, and residuals values aren't applicable.
The following formulas show what components make up the time_series_data value for historical and forecast data for time series models
- For - ARIMA_PLUSmodels:- Historical data:
 - time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + spikes_and_dips + step_changes + residual- Forecast data:
 - time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect
- For - ARIMA_PLUS_XREGmodels:- Historical data:
 - time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + spikes_and_dips + step_changes + residual + (attribution_feature_1 + ... + attribution_feature_n)- Forecast data:
 - time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + (attribution_feature_1 + ... + attribution_feature_n)
time_series_adjusted_data
 The time_series_adjusted_data value is the value that remains after cleaning spikes and dips, adjusting the step changes, and removing the residuals. Its formula is the same for both historical and forecast data.
- For - ARIMA_PLUSmodels:- time_series_adjusted_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect
- For - ARIMA_PLUS_XREGmodels:- time_series_adjusted_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + (attribution_feature_1 + ... + attribution_feature_n)
holiday_effect
 The holiday_effect_holiday_name value is a subcomponent . The holiday_effect value is the sum of all the holiday_effect_holiday_name values. For example, if you specify holidays xmas and mlk, the formula is holiday_effect = holiday_effect_xmas + holiday_effect_mlk.
ARIMA_PLUS example
 The following example forecasts 30 time points with a confidence level of 0.8:
SELECT * FROM ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`, STRUCT(30 AS horizon, 0.8 AS confidence_level))
ARIMA_PLUS_XREG example
 The following example forecasts 30 time points with a confidence level of 0.8 with future features:
SELECT * FROM ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`, STRUCT(30 AS horizon, 0.8 AS confidence_level), (SELECT * FROM `mydataset.mytable`))
What's next
- For more information about Explainable AI, see BigQuery Explainable AI overview.
- For more information about supported SQL statements and functions for time series forecasting models, see End-to-end user journeys for time series forecasting models.