optuna.pruners.WilcoxonPruner

class optuna.pruners.WilcoxonPruner(*, p_threshold=0.1, n_startup_steps=0)[source]

Pruner based on the Wilcoxon signed-rank test.

This pruner performs the Wilcoxon signed-rank test between the current trial and the current best trial, and stops the trial whenever the test concludes, at a given p-value threshold, that the current trial is worse than the best one.

This pruner is effective for optimizing the mean/median of some (costly-to-evaluate) performance scores over a set of problem instances. Example applications include the optimization of:

  • the mean performance of a heuristic method (simulated annealing, genetic algorithm, SAT solver, etc.) on a set of problem instances,

  • the k-fold cross-validation score of a machine learning model, and

  • the accuracy of outputs of a large language model (LLM) on a set of questions.

Problem instances may be “easy” or “hard”; the pruner pairs each instance’s results across trials to account for this. Within each trial, it is recommended to shuffle the evaluation order so that the optimization does not overfit to the instances evaluated first.

When you use this pruner, you must call the Trial.report(value, step) method for each step (instance id) with the evaluated value. The instance ids need not be reported in ascending order. Unlike with other pruners, the reported values need not converge to the final objective value; to use a pruner such as SuccessiveHalvingPruner in the same setting, you would instead have to report, e.g., the running average of the evaluated values.
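For contrast, the running-average reporting that a convergence-based pruner such as SuccessiveHalvingPruner would require can be sketched as follows (the losses are hypothetical, and the trial.report call is shown as a comment since this fragment has no real trial object):

```python
# Hypothetical per-instance losses, in evaluation order.
losses = [0.4, 0.1, 0.3, 0.2]

running_means = []
total = 0.0
for step, loss in enumerate(losses):
    total += loss
    running_means.append(total / (step + 1))
    # With a real Optuna trial, you would report the running mean:
    # trial.report(running_means[-1], step)
```

Each reported value is then an estimate of the final mean, which converges as more instances are evaluated.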

See also

Please refer to report().

Example

import optuna
import numpy as np


# We minimize the mean evaluation loss over all the problem instances.
def evaluate(param, instance):
    # A toy loss function for demonstrative purpose.
    return (param - instance) ** 2


problem_instances = np.linspace(-1, 1, 100)


def objective(trial):
    # Sample a parameter.
    param = trial.suggest_float("param", -1, 1)

    # Evaluate performance of the parameter.
    results = []

    # For best results, shuffle the evaluation order in each trial.
    instance_ids = np.random.permutation(len(problem_instances))
    for instance_id in instance_ids:
        loss = evaluate(param, problem_instances[instance_id])
        results.append(loss)

        # Report loss together with the instance id.
        # CAVEAT: You need to pass the same id for the same instance,
        # otherwise WilcoxonPruner cannot correctly pair the losses across trials and
        # the pruning performance will degrade.
        trial.report(loss, instance_id)

        if trial.should_prune():
            # Return the current predicted value instead of raising `TrialPruned`.
            # This is a workaround to tell Optuna about the evaluation
            # results in pruned trials. (See the note below.)
            return sum(results) / len(results)

    return sum(results) / len(results)


study = optuna.create_study(pruner=optuna.pruners.WilcoxonPruner(p_threshold=0.1))
study.optimize(objective, n_trials=100)

Note

This pruner cannot handle infinity or nan values. Trials containing those values are never pruned.
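Since a single non-finite value disables pruning for the whole trial, one possible workaround is to report only finite losses. The helper below is hypothetical, not part of Optuna, and note that skipping an instance also removes it from the paired comparison:

```python
import math


def report_if_finite(trial, loss, instance_id):
    """Report the loss only when it is finite; return whether it was reported."""
    if math.isfinite(loss):
        trial.report(loss, instance_id)
        return True
    # nan/inf losses are skipped so they do not disable pruning.
    return False
```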

Note

If should_prune() returns True, you can return an estimate of the final value (e.g., the average of all evaluated values) instead of raising optuna.TrialPruned(). This is a workaround for the fact that there is currently no way to tell Optuna the predicted objective value of a trial that raises optuna.TrialPruned.

Parameters:
  • p_threshold (float) –

    The p-value threshold for pruning. This value should be between 0 and 1. A trial will be pruned whenever the pruner is sure up to the given p-value that the current trial is worse than the best trial. The larger this value is, the more aggressively trials are pruned. Defaults to 0.1.

    Note

    This pruner repeatedly performs the statistical test between the current trial and the current best trial as more samples accumulate. The false-positive rate of such a sequential test differs from that of a single test. To achieve the nominal false-positive rate, please specify a Pocock-corrected p-value.

  • n_startup_steps (int) – The number of steps before which no trials are pruned. Pruning starts only after at least n_startup_steps observations are available for comparing the current trial with the best trial. Defaults to 0 (pruning can kick in from the very first step).
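The inflation of the false-positive rate under sequential testing, mentioned in the note on p_threshold above, can be illustrated with a small simulation. Everything here is an assumed setup for demonstration: both trials draw losses from the same distribution (so the null hypothesis holds), and scipy performs the test:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
alpha = 0.1     # per-look threshold, analogous to p_threshold
n_trials = 100  # simulated null trials
n_steps = 30    # instances per trial

false_positives = 0
for _ in range(n_trials):
    # Under the null, paired loss differences are symmetric around zero.
    diffs = rng.normal(size=n_steps)
    # Test after each new instance, as a sequential pruner would.
    for k in range(5, n_steps + 1):
        if wilcoxon(diffs[:k], alternative="greater").pvalue < alpha:
            false_positives += 1
            break

rate = false_positives / n_trials
# The observed rate typically exceeds the nominal alpha, which is why a
# Pocock-corrected threshold is needed for a sequential test.
```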

Note

Added in v3.6.0 as an experimental feature. The interface may change in newer versions without prior notice. See https://github.com/optuna/optuna/releases/tag/v3.6.0.

Methods

prune(study, trial)

Judge whether the trial should be pruned based on the reported values.

prune(study, trial)[source]

Judge whether the trial should be pruned based on the reported values.

Note that this method is not supposed to be called by library users. Instead, optuna.trial.Trial.report() and optuna.trial.Trial.should_prune() provide user interfaces to implement pruning mechanism in an objective function.

Parameters:
  • study (Study) – Study object of the target study.

  • trial (FrozenTrial) – FrozenTrial object of the target trial. Take a copy before modifying this object.

Returns:

A boolean value representing whether the trial should be pruned.

Return type:

bool