Efficient Optimization Algorithms

Optuna enables efficient hyperparameter optimization by adopting state-of-the-art algorithms for sampling hyperparameters and for efficiently pruning unpromising trials.

Sampling Algorithms

Samplers continually narrow down the search space using the records of suggested parameter values and evaluated objective values, converging on a region whose parameters yield better objective values. A more detailed explanation of how samplers suggest parameters is in optuna.samplers.BaseSampler.
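
As a rough illustration of how a sampler can use the history of suggested values and objective values, the sketch below runs a few uniform random trials and then draws new candidates from a shrinking window around the best value seen so far. This is a toy heuristic, not TPE or any other Optuna sampler; the names toy_objective and narrowing_search and all constants are assumptions made for this example.

```python
import random


def toy_objective(x):
    # Hypothetical objective with its minimum at x = 2.
    return (x - 2.0) ** 2


def narrowing_search(objective, low, high, n_trials=50, n_startup=10, seed=0):
    rng = random.Random(seed)
    history = []  # records of (suggested value, objective value)
    width = high - low
    for i in range(n_trials):
        if i < n_startup:
            # Startup phase: sample uniformly over the whole range.
            x = rng.uniform(low, high)
        else:
            # Later trials: sample near the best value seen so far,
            # using a window that shrinks by 10% on every trial.
            best_x, _ = min(history, key=lambda record: record[1])
            width *= 0.9
            lo = max(low, best_x - width / 2)
            hi = min(high, best_x + width / 2)
            x = rng.uniform(lo, hi)
        history.append((x, objective(x)))
    return history


history = narrowing_search(toy_objective, low=-10.0, high=10.0)
best_x, best_value = min(history, key=lambda record: record[1])
print(f"best x = {best_x:.3f}, best value = {best_value:.5f}")
```

Real samplers such as TPESampler build a statistical model of the history instead of a fixed shrinking window, but the input is the same: past parameter values paired with their objective values.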

Optuna provides several sampling algorithms; they are listed in optuna.samplers.

The default sampler is optuna.samplers.TPESampler.

Switching Samplers

import optuna

By default, Optuna uses TPESampler as follows.

study = optuna.create_study()
print(f"Sampler is {study.sampler.__class__.__name__}")

Out:

Sampler is TPESampler

To use a different sampler, for example RandomSampler or CmaEsSampler, pass it to create_study():

study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

Out:

Sampler is RandomSampler
Sampler is CmaEsSampler

Pruning Algorithms

Pruners automatically stop unpromising trials at the early stages of the training (a.k.a., automated early-stopping).

Optuna provides several pruning algorithms; they are listed in optuna.pruners.

Most examples use optuna.pruners.MedianPruner, although optuna.pruners.SuccessiveHalvingPruner and optuna.pruners.HyperbandPruner generally outperform it, as shown in this benchmark result.
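
For readers unfamiliar with successive halving, the idea behind SuccessiveHalvingPruner can be sketched as follows. This is the synchronous textbook form with a made-up toy loss; Optuna's pruner implements an asynchronous variant, so the function and parameter names here are illustrative assumptions rather than Optuna's API.

```python
def successive_halving(configs, evaluate, min_budget=1, reduction_factor=2):
    """Keep the best 1/reduction_factor of `configs` at each rung.

    `evaluate(config, budget)` returns a loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving configuration with the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Promote the top fraction and give it a larger budget.
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]
        budget *= reduction_factor
    return survivors[0]


def toy_loss(config, budget):
    # Hypothetical loss: distance from 0.3 plus a term that shrinks
    # as the training budget grows.
    return abs(config - 0.3) + 1.0 / budget


candidates = [i / 10 for i in range(8)]  # 0.0, 0.1, ..., 0.7
best = successive_halving(candidates, toy_loss)
print(best)
```

Most of the budget goes to the few configurations that survive the early rungs, which is why this family of pruners can discard poor trials cheaply.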

Activating Pruners

To turn on pruning, call report() and should_prune() after each step of the iterative training. report() records the intermediate objective value for the current step. should_prune() tells whether the trial should be terminated because it does not meet a predefined condition.

We recommend using the integration modules for major machine learning frameworks. The full list is in optuna.integration, and use cases are available in optuna/examples.

import logging
import sys

import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection


def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0
    )

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1, log=True)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value.
        intermediate_value = 1.0 - clf.score(valid_x, valid_y)
        trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.TrialPruned()

    return 1.0 - clf.score(valid_x, valid_y)

Set up the median stopping rule as the pruning condition.

# Add stream handler of stdout to show the messages
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)

Out:

A new study created in memory with name: no-name-ac9fb8f9-9426-450f-b142-a18760b67ef8
Trial 0 finished with value: 0.07894736842105265 and parameters: {'alpha': 4.723850833728441e-05}. Best is trial 0 with value: 0.07894736842105265.
Trial 1 finished with value: 0.3157894736842105 and parameters: {'alpha': 9.165912868474045e-05}. Best is trial 0 with value: 0.07894736842105265.
Trial 2 finished with value: 0.07894736842105265 and parameters: {'alpha': 0.006168487800010215}. Best is trial 0 with value: 0.07894736842105265.
Trial 3 finished with value: 0.21052631578947367 and parameters: {'alpha': 0.0001374616158100757}. Best is trial 0 with value: 0.07894736842105265.
Trial 4 finished with value: 0.368421052631579 and parameters: {'alpha': 1.7093928202691693e-05}. Best is trial 0 with value: 0.07894736842105265.
Trial 5 pruned.
Trial 6 finished with value: 0.02631578947368418 and parameters: {'alpha': 0.01051578671412293}. Best is trial 6 with value: 0.02631578947368418.
Trial 7 pruned.
Trial 8 finished with value: 0.10526315789473684 and parameters: {'alpha': 0.0004955536268239461}. Best is trial 6 with value: 0.02631578947368418.
Trial 9 finished with value: 0.052631578947368474 and parameters: {'alpha': 0.010389831369281085}. Best is trial 6 with value: 0.02631578947368418.
Trial 10 finished with value: 0.3421052631578947 and parameters: {'alpha': 0.09919202768290344}. Best is trial 6 with value: 0.02631578947368418.
Trial 11 finished with value: 0.07894736842105265 and parameters: {'alpha': 0.022239081903621165}. Best is trial 6 with value: 0.02631578947368418.
Trial 12 pruned.
Trial 13 finished with value: 0.13157894736842102 and parameters: {'alpha': 0.006640122447343879}. Best is trial 6 with value: 0.02631578947368418.
Trial 14 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.030559126239526894}. Best is trial 6 with value: 0.02631578947368418.
Trial 15 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.0030848171853979367}. Best is trial 6 with value: 0.02631578947368418.
Trial 16 finished with value: 0.07894736842105265 and parameters: {'alpha': 0.035447166254879245}. Best is trial 6 with value: 0.02631578947368418.
Trial 17 finished with value: 0.07894736842105265 and parameters: {'alpha': 0.0021780277701493947}. Best is trial 6 with value: 0.02631578947368418.
Trial 18 finished with value: 0.23684210526315785 and parameters: {'alpha': 0.012383544060612841}. Best is trial 6 with value: 0.02631578947368418.
Trial 19 finished with value: 0.368421052631579 and parameters: {'alpha': 0.09061832508198175}. Best is trial 6 with value: 0.02631578947368418.

As you can see, several trials were pruned (stopped) before they finished all of the iterations. The message format is "Trial <Trial Number> pruned.".
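
The median stopping rule used above can be sketched in plain Python to show what the pruning decision looks like. This is a simplified illustration under stated assumptions, not Optuna's MedianPruner implementation: the function name, its arguments, and the sample data are made up, and n_startup_trials=5 is chosen to mirror the output above, in which no trial is pruned before trial 5.

```python
from statistics import median


def should_prune_by_median(current_value, step, completed_histories, n_startup_trials=5):
    """Return True if `current_value` at `step` is worse than the median.

    `completed_histories` is a list of lists; each inner list holds the
    intermediate values a finished trial reported at steps 0, 1, 2, ...
    Lower values are better (minimization).
    """
    # Do not prune until enough finished trials exist for comparison.
    if len(completed_histories) < n_startup_trials:
        return False
    # Collect what the earlier trials reported at the same step.
    values_at_step = [h[step] for h in completed_histories if step < len(h)]
    if not values_at_step:
        return False
    return current_value > median(values_at_step)


# Five finished trials, each with validation errors for steps 0..2.
finished = [
    [0.50, 0.30, 0.10],
    [0.60, 0.40, 0.20],
    [0.55, 0.35, 0.15],
    [0.70, 0.50, 0.30],
    [0.45, 0.25, 0.05],
]

# The median of the values reported at step 1 is 0.35.
print(should_prune_by_median(0.65, step=1, completed_histories=finished))  # True
print(should_prune_by_median(0.30, step=1, completed_histories=finished))  # False
```

In the tutorial code, trial.report() supplies the per-step values and trial.should_prune() makes the corresponding decision.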

Which Sampler and Pruner Should be Used?

For tasks other than deep learning, recommendations can be drawn from the benchmark results available at optuna/optuna - wiki “Benchmarks with Kurobako”.

Note, however, that these benchmarks do not cover deep learning. For deep learning tasks, consult the table below, which is taken from Ozaki et al., "Hyperparameter Optimization Methods: Overview and Characteristics", IEICE Trans., Vol. J103-D, No. 9, pp. 615-631, 2020 (written in Japanese).

Parallel Compute Resource   Categorical/Conditional Hyperparameters   Recommended Algorithms
-------------------------   ---------------------------------------   -------------------------------------------------------------
Limited                     No                                        TPE. GP-EI if search space is low-dimensional and continuous.
Limited                     Yes                                       TPE. GP-EI if search space is low-dimensional and continuous.
Sufficient                  No                                        CMA-ES, Random Search
Sufficient                  Yes                                       Random Search or Genetic Algorithm
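
To make the table easier to apply, it can be encoded as a small lookup. The sketch below only transcribes the four rows above; the names RECOMMENDED and recommend_algorithm are hypothetical helpers for this example and are not part of Optuna.

```python
# Recommendations transcribed from the table above, keyed by
# (parallel compute resource, categorical/conditional hyperparameters).
RECOMMENDED = {
    ("limited", False): "TPE. GP-EI if search space is low-dimensional and continuous.",
    ("limited", True): "TPE. GP-EI if search space is low-dimensional and continuous.",
    ("sufficient", False): "CMA-ES, Random Search",
    ("sufficient", True): "Random Search or Genetic Algorithm",
}


def recommend_algorithm(parallel_resource, has_categorical_or_conditional):
    """Look up the recommended algorithms for a deep learning task."""
    key = (parallel_resource.lower(), bool(has_categorical_or_conditional))
    if key not in RECOMMENDED:
        raise ValueError("parallel_resource must be 'limited' or 'sufficient'")
    return RECOMMENDED[key]


print(recommend_algorithm("Sufficient", has_categorical_or_conditional=False))
```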

Integration Modules for Pruning

To make pruning simpler to implement, Optuna provides integration modules for a number of libraries.

For the complete list of Optuna’s integration modules, see optuna.integration.

For example, XGBoostPruningCallback introduces pruning without directly changing the logic of training iteration. (See also example for the entire script.)

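# Inside the objective function; xgb is the imported xgboost module, and param, dtrain, and dvalid are defined earlier.
pruning_callback = optuna.integration.XGBoostPruningCallback(trial, 'validation-error')
bst = xgb.train(param, dtrain, evals=[(dvalid, 'validation')], callbacks=[pruning_callback])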
