
How many early_stopping_rounds should you use? What should the max_depth of a tree be? Searching through high-dimensional hyperparameter spaces to find the most performant model can get unwieldy very fast. Hyperparameter sweeps provide an organized and efficient way to conduct a battle royale of models and crown a winner. They do this by automatically searching through combinations of hyperparameter values to find the best ones. In this tutorial we’ll see how you can run sophisticated hyperparameter sweeps on XGBoost models in 3 easy steps using W&B. For a teaser, check out the plots below:
Sweeps: An Overview
Running a hyperparameter sweep with W&B is very easy. There are just 3 simple steps:
- Define the sweep: we do this by creating a dictionary-like object that specifies the sweep: which parameters to search through, which search strategy to use, which metric to optimize.
- Initialize the sweep: with one line of code we initialize the sweep and pass in the dictionary of sweep configurations:
sweep_id = wandb.sweep(sweep_config)
- Run the sweep agent: also accomplished with one line of code, we call wandb.agent() and pass the sweep_id along with a function that defines your model architecture and trains it:
wandb.agent(sweep_id, function=train)
1. Define the Sweep
W&B sweeps give you powerful levers to configure your sweeps exactly how you want them, with just a few lines of code. The sweep config can be defined as a dictionary or a YAML file. Let’s walk through some of the options together:
- Metric: This is the metric the sweep is attempting to optimize. A metric takes a name (this metric should be logged by your training script) and a goal (maximize or minimize).
- Search Strategy: Specified using the "method" key. We support several different search strategies with sweeps.
  - Grid Search: Iterates over every combination of hyperparameter values.
  - Random Search: Iterates over randomly chosen combinations of hyperparameter values.
  - Bayesian Search: Creates a probabilistic model that maps hyperparameters to the probability of a metric score, and chooses parameters with a high probability of improving the metric. Bayesian optimization spends more time choosing each set of hyperparameter values, and in exchange tries out fewer sets overall.
- Parameters: A dictionary containing the hyperparameter names, and the discrete values, range, or distribution from which to pull their values on each iteration.
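As a rough illustration of how these three pieces fit together, here is a minimal sketch of a sweep configuration as a Python dictionary. The specific hyperparameters, value lists, and ranges are assumptions chosen for an XGBoost classifier, not recommended settings, and the metric name "accuracy" only works if the training script logs a metric under that exact name.

```python
# A sketch of a sweep configuration. All values below are illustrative
# assumptions, not tuned recommendations.
sweep_config = {
    # Search strategy: "grid", "random", or "bayes".
    "method": "random",
    # The metric to optimize; the training script must log this name.
    "metric": {"name": "accuracy", "goal": "maximize"},
    "parameters": {
        # Discrete choices are listed under "values".
        "booster": {"values": ["gbtree", "gblinear"]},
        "max_depth": {"values": [3, 6, 9, 12]},
        "subsample": {"values": [1.0, 0.5, 0.3]},
        # A continuous range is described by a distribution and its bounds.
        "learning_rate": {"distribution": "uniform", "min": 0.05, "max": 0.2},
    },
}
```

Switching "method" to "grid" would require every parameter to use a discrete "values" list, since a grid cannot enumerate a continuous range.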
2. Initialize the Sweep
Calling wandb.sweep starts a Sweep Controller: a centralized process that provides settings of the parameters to any agent that queries it and expects the agent to report performance on the metric back via wandb logging.
Define your training process
Before we can run the sweep, we need to define a function that creates and trains the model: the function that takes in hyperparameter values and spits out metrics. We’ll also need wandb to be integrated into our script. There are three main components:
- wandb.init(): Initialize a new W&B Run. Each run is a single execution of the training script.
- run.config: Save all your hyperparameters in a config object. This lets you use our app to sort and compare your runs by hyperparameter values.
- run.log(): Logs metrics and custom objects, such as images, videos, audio files, HTML, plots, or point clouds.
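A training function built from these three components might look roughly like the sketch below. Several things here are assumptions rather than part of this tutorial: the scikit-learn breast cancer dataset is a stand-in for your own data, the DEFAULTS dictionary holds arbitrary fallback values, and the metric is logged as "accuracy" on the assumption that the sweep config optimizes a metric with that name.

```python
# Fallback hyperparameters used when the script runs outside a sweep.
# Inside a sweep, the values chosen by the sweep take precedence.
# These defaults are illustrative assumptions.
DEFAULTS = {
    "booster": "gbtree",
    "max_depth": 3,
    "learning_rate": 0.1,
    "subsample": 1.0,
}


def train():
    # Third-party imports live inside the function so that DEFAULTS can be
    # inspected without installing wandb, xgboost, or scikit-learn.
    import wandb
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Stand-in dataset (an assumption); swap in your own data loading here.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=7
    )

    # wandb.init() starts a new run; the context manager finishes it on exit.
    with wandb.init(config=DEFAULTS) as run:
        config = run.config

        # Build the model from the hyperparameters stored in run.config.
        model = XGBClassifier(
            booster=config["booster"],
            max_depth=config["max_depth"],
            learning_rate=config["learning_rate"],
            subsample=config["subsample"],
        )
        model.fit(X_train, y_train)

        # Log the metric under the name the sweep is asked to optimize.
        accuracy = accuracy_score(y_test, model.predict(X_test))
        run.log({"accuracy": accuracy})
```

Note that max_depth and subsample only affect tree boosters, so XGBoost may warn that they are unused when the sweep picks gblinear.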
3. Run the Sweep with an agent
Now, we call wandb.agent to start up our sweep. You can call wandb.agent on any machine where you’re logged into W&B that has
- the sweep_id,
- the dataset and train function
Note: a random sweep will by default run forever, trying new parameter combinations until the cows come home, or until you turn the sweep off from the app UI. You can prevent this by providing the total count of runs you’d like the agent to complete.
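Putting the last two steps together, a small wrapper could register the sweep and start a capped agent in one go. This is a sketch under assumptions: run_sweep is a hypothetical helper name, the project name "xgboost-sweeps-demo" is made up, and the default of 25 runs is arbitrary.

```python
def run_sweep(sweep_config, train_fn, count=25, project="xgboost-sweeps-demo"):
    """Register a sweep with W&B and run up to `count` trials in this process.

    The helper name, project name, and default count are illustrative
    assumptions.
    """
    # Imported here so the helper can be defined without W&B installed.
    import wandb

    # Register the sweep with W&B and get back its ID.
    sweep_id = wandb.sweep(sweep_config, project=project)

    # The agent calls train_fn once per trial, each time with a new set of
    # hyperparameters. `count` caps the number of trials so that a random
    # sweep does not run forever.
    wandb.agent(sweep_id, function=train_fn, count=count, project=project)
    return sweep_id
```

With a sweep configuration dictionary and a training function in hand, calling run_sweep(sweep_config, train, count=25) would run 25 trials and then stop. You need to be logged into W&B for either call to succeed.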
Visualize your results
Now that your sweep is finished, it’s time to look at the results. W&B will generate a number of useful plots for you automatically.
Parallel coordinates plot
This plot maps hyperparameter values to model metrics. It’s useful for homing in on combinations of hyperparameters that led to the best model performance. This plot seems to indicate that using a tree as our learner slightly, but not mind-blowingly, outperforms using a simple linear model as our learner.
Hyperparameter importance plot
The hyperparameter importance plot shows which hyperparameters had the biggest impact on your metrics. We report both the correlation (treating the hyperparameter as a linear predictor) and the feature importance (after training a random forest on your results) so you can see which parameters had the biggest effect and whether that effect was positive or negative. Reading this chart, we see quantitative confirmation of the trend we noticed in the parallel coordinates chart above: the largest impact on validation accuracy came from the choice of learner, and the gblinear learners were generally worse than gbtree learners.
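To make the correlation half of that report concrete, here is a small plain-Python sketch of a Pearson correlation between one hyperparameter and the metric. It is not how W&B computes the plot internally: the run data is made up purely for illustration, the 0/1 encoding of the booster is an assumption, and the random-forest feature importance is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up results from six hypothetical runs (not real sweep output).
# The booster is encoded as 1 for gbtree and 0 for gblinear.
uses_gbtree = [1, 1, 0, 0, 1, 0]
accuracy = [0.78, 0.80, 0.70, 0.72, 0.79, 0.71]

# A positive value means gbtree runs tended to score higher.
booster_correlation = pearson(uses_gbtree, accuracy)
```

The sign of the correlation tells you the direction of the effect, which is what lets the chart say whether a hyperparameter helped or hurt.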