optuna.integration.OptunaSearchCV

class optuna.integration.OptunaSearchCV(estimator, param_distributions, cv=5, enable_pruning=False, error_score=nan, max_iter=1000, n_jobs=1, n_trials=10, random_state=None, refit=True, return_train_score=False, scoring=None, study=None, subsample=1.0, timeout=None, verbose=0)

Hyperparameter search with cross-validation.

Parameters
  • estimator – Object to use to fit the data. This is assumed to implement the scikit-learn estimator interface. Either the estimator needs to provide score, or scoring must be passed.

  • param_distributions – Dictionary where keys are parameters and values are distributions. Distributions are assumed to implement the optuna distribution interface.

  • cv

    Cross-validation strategy. Possible inputs for cv are:

    • integer to specify the number of folds in a CV splitter,

    • a CV splitter,

    • an iterable yielding (train, validation) splits as arrays of indices.

    For integer inputs, if estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used; otherwise, sklearn.model_selection.KFold is used.

  • enable_pruning – If True, pruning is performed in the case where the underlying estimator supports partial_fit.

  • error_score – Value to assign to the score if an error occurs in fitting. If ‘raise’, the error is raised. If numeric, sklearn.exceptions.FitFailedWarning is issued. This does not affect the refit step, which will always raise the error.

  • max_iter – Maximum number of epochs. This is only used if the underlying estimator supports partial_fit.

  • n_jobs

    Number of threading-based parallel jobs. -1 means that the number of jobs is set to the CPU count.

    Note

    n_jobs allows parallelization using threading and may suffer from Python’s GIL. It is recommended to use process-based parallelization if the workload is CPU bound.

    Warning

    Deprecated in v2.7.0. This feature will be removed in the future. It is recommended to use process-based parallelization. The removal of this feature is currently scheduled for v4.0.0, but this schedule is subject to change. See https://github.com/optuna/optuna/releases/tag/v2.7.0.

  • n_trials – Number of trials. If None, there is no limitation on the number of trials. If timeout is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs quality of the solution.

  • random_state – Seed of the pseudo random number generator. If int, this is the seed used by the random number generator. If numpy.random.RandomState object, this is the random number generator. If None, the global random state from numpy.random is used.

  • refit – If True, refit the estimator with the best found hyperparameters. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly.

  • return_train_score – If True, training scores will be included. Training scores can be used to gain insight into how different hyperparameter settings impact the overfitting/underfitting trade-off. However, computing training scores can be computationally expensive and is not strictly required to select the hyperparameters that yield the best generalization performance.

  • scoring – String or callable to evaluate the predictions on the validation data. If None, the score method of the estimator is used.

  • study – Study that corresponds to the optimization task. If None, a new study is created.

  • subsample

    Number or proportion of samples that are used during hyperparameter search.

    • If int, then draw subsample samples.

    • If float, then draw subsample * X.shape[0] samples.

  • timeout – Time limit in seconds for the search for appropriate models. If None, the study is executed without time limitation. If n_trials is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs quality of the solution.

  • verbose – Verbosity level. Higher values produce more messages.
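The way n_trials and timeout jointly bound the search can be illustrated with a small stand-alone loop. This is a hypothetical sketch, not Optuna's implementation: run_trials and the lambda objectives are illustrative, and checking the timeout only between trials (so a running trial is never cut short) is an assumption.

```python
import time
from typing import Callable, List, Optional


def run_trials(
    objective: Callable[[int], float],
    n_trials: Optional[int] = None,
    timeout: Optional[float] = None,
) -> List[float]:
    """Hypothetical loop: evaluate `objective` until a limit is reached.

    Stops after `n_trials` evaluations or once `timeout` seconds have
    elapsed, whichever comes first. If both are None, it only stops when
    interrupted from outside (for example with Ctrl+C).
    """
    start = time.monotonic()
    values: List[float] = []
    while True:
        if n_trials is not None and len(values) >= n_trials:
            break
        # Checked between trials, so a running trial is never cut short.
        if timeout is not None and time.monotonic() - start >= timeout:
            break
        values.append(objective(len(values)))
    return values


# Five trials, no time limit.
print(run_trials(lambda i: float(i), n_trials=5))  # [0.0, 1.0, 2.0, 3.0, 4.0]
# No trial limit; stops after roughly 0.05 s of 10 ms trials.
print(len(run_trials(lambda i: time.sleep(0.01) or 0.0, timeout=0.05)))
```

Setting both limits gives whichever bound is hit first, which is the runtime vs quality trade-off the two parameters describe.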

best_estimator_

Estimator that was chosen by the search. This is present only if refit is set to True.

n_splits_

Number of cross-validation splits.

refit_time_

Time for refitting the best estimator. This is present only if refit is set to True.

sample_indices_

Indices of samples that are used during hyperparameter search.

scorer_

Scorer function.

study_

Study actually used for the search: the one passed as study, or a newly created one.
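To make the subsample, random_state, and cv rules concrete, the following standard-library sketch shows how sample_indices_ could be drawn and how an explicit iterable of (train, validation) index splits determines n_splits_. check_random_state, subsample_indices, and kfold_splits are hypothetical helpers; Python's random module stands in for numpy.random, and drawing without replacement while truncating subsample * n_samples to an integer are assumptions rather than documented behavior.

```python
import random
from typing import Iterator, List, Tuple, Union

# Stand-in for the documented random_state types, with stdlib random
# replacing numpy.random (an assumption for this sketch).
RandomStateLike = Union[None, int, random.Random]


def check_random_state(seed: RandomStateLike):
    """Return a generator following the documented random_state rules."""
    if seed is None:
        return random  # the module-level (global) generator
    if isinstance(seed, random.Random):
        return seed  # use the caller's generator as-is
    if isinstance(seed, int):
        return random.Random(seed)  # a fresh generator seeded with the int
    raise ValueError(f"invalid random_state: {seed!r}")


def subsample_indices(
    n_samples: int,
    subsample: Union[int, float],
    random_state: RandomStateLike = None,
) -> List[int]:
    """Hypothetical helper: choose the row indices used for the search."""
    if isinstance(subsample, int):
        n_draw = subsample  # int: draw exactly `subsample` rows
    else:
        n_draw = int(subsample * n_samples)  # float: a fraction, truncated
    rng = check_random_state(random_state)
    # Draw without replacement, then sort so indices keep the data order.
    return sorted(rng.sample(range(n_samples), n_draw))


def kfold_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for unshuffled K-fold CV.

    The first n_samples % n_splits folds get one extra sample.
    """
    start = 0
    for i in range(n_splits):
        size = n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


cv = list(kfold_splits(10, 3))  # a list is one form of iterable cv input
print(len(cv))   # 3, the value n_splits_ would report for this cv
print(cv[0][1])  # [0, 1, 2, 3]
print(len(subsample_indices(150, 0.5, random_state=0)))  # 75
```

Materializing the splits with list(...) keeps the cv input re-iterable; NumPy index arrays would be the more typical container in practice.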

Examples

import optuna
from sklearn.datasets import load_iris
from sklearn.svm import SVC

clf = SVC(gamma="auto")
param_distributions = {"C": optuna.distributions.LogUniformDistribution(1e-10, 1e10)}
optuna_search = optuna.integration.OptunaSearchCV(clf, param_distributions)
X, y = load_iris(return_X_y=True)
optuna_search.fit(X, y)
y_pred = optuna_search.predict(X)

Note

Added in v0.17.0 as an experimental feature. The interface may change in newer versions without prior notice. See https://github.com/optuna/optuna/releases/tag/v0.17.0.

Methods

fit(X[, y, groups])

Run fit with all sets of parameters.

get_params([deep])

Get parameters for this estimator.

score(X[, y])

Return the score on the given data.

set_params(**params)

Set the parameters of this estimator.

Attributes

best_index_

Index which corresponds to the best candidate parameter setting.

best_params_

Parameters of the best trial in the Study.

best_score_

Mean cross-validated score of the best estimator.

best_trial_

Best trial in the Study.

classes_

Class labels.

decision_function

Call decision_function on the best estimator.

inverse_transform

Call inverse_transform on the best estimator.

n_trials_

Actual number of trials.

predict

Call predict on the best estimator.

predict_log_proba

Call predict_log_proba on the best estimator.

predict_proba

Call predict_proba on the best estimator.

score_samples

Call score_samples on the best estimator.

set_user_attr

Call set_user_attr on the Study.

transform

Call transform on the best estimator.

trials_

All trials in the Study.

trials_dataframe

Call trials_dataframe on the Study.

user_attrs_

User attributes in the Study.

property best_index_

Index which corresponds to the best candidate parameter setting.

property best_params_

Parameters of the best trial in the Study.

property best_score_

Mean cross-validated score of the best estimator.

property best_trial_

Best trial in the Study.

property classes_

Class labels.

property decision_function

Call decision_function on the best estimator.

This is available only if the underlying estimator supports decision_function and refit is set to True.
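The "available only if" rule shared by decision_function and the other delegating properties can be pictured as a property that looks the method up on best_estimator_. The sketch below is illustrative only: DelegatingSearch, Linear, and NoDecision are toy classes, not the OptunaSearchCV source, and they assume that a missing refit simply means best_estimator_ was never set.

```python
from typing import Any, List, Optional


class DelegatingSearch:
    """Toy stand-in for a fitted search object (not the real class).

    `best_estimator_` is set only when a refitted estimator is given,
    mirroring refit=True.
    """

    def __init__(self, best_estimator: Optional[Any] = None) -> None:
        if best_estimator is not None:
            self.best_estimator_ = best_estimator

    @property
    def decision_function(self):
        # Both lookups raise AttributeError when unavailable (no refit, or
        # the estimator lacks the method), so hasattr() reports False.
        return self.best_estimator_.decision_function


class Linear:
    def decision_function(self, X: List[float]) -> List[float]:
        return [2 * x for x in X]


class NoDecision:
    pass


print(hasattr(DelegatingSearch(Linear()), "decision_function"))      # True
print(hasattr(DelegatingSearch(NoDecision()), "decision_function"))  # False
print(hasattr(DelegatingSearch(), "decision_function"))              # False
print(DelegatingSearch(Linear()).decision_function([1.0, 2.0]))      # [2.0, 4.0]
```

Because the property returns the bound method rather than calling it, search.decision_function(X) reads like a normal method call.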

fit(X, y=None, groups=None, **fit_params)

Run fit with all sets of parameters.

Parameters
  • X (Union[List[List[float]], numpy.ndarray, pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]) – Training data.

  • y (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series, List[List[float]], pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]]) – Target variable.

  • groups (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series]]) – Group labels for the samples used while splitting the dataset into train/validation set.

  • **fit_params (Any) – Parameters passed to fit on the estimator.

Returns

self – The fitted OptunaSearchCV instance.

Return type

OptunaSearchCV
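While fit cross-validates each trial, a fold that fails to fit is handled according to error_score. A minimal standard-library sketch of those rules follows; score_fold and failing_fit are hypothetical names, and FitFailedWarning here is a local stand-in class rather than the scikit-learn one. As documented for error_score, this kind of guard would not apply to the refit step.

```python
import math
import warnings
from typing import Callable, Union


class FitFailedWarning(RuntimeWarning):
    """Local stand-in for sklearn.exceptions.FitFailedWarning."""


def score_fold(
    fit_and_score: Callable[[], float],
    error_score: Union[str, float] = math.nan,
) -> float:
    """Hypothetical helper: score one CV fold under the error_score rules."""
    try:
        return fit_and_score()
    except Exception as exc:
        if error_score == "raise":
            raise  # propagate the original error unchanged
        # Numeric error_score: warn and substitute the value as the score.
        warnings.warn(
            f"Fitting failed; score set to {error_score}. Details: {exc!r}",
            FitFailedWarning,
        )
        return float(error_score)


def failing_fit() -> float:
    raise ValueError("invalid hyperparameter value")


print(score_fold(lambda: 0.9))                 # 0.9
print(score_fold(failing_fit))                 # nan (with a FitFailedWarning)
print(score_fold(failing_fit, error_score=0))  # 0.0
```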

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict
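With deep=True, parameters of contained estimators are reported under "<name>__<parameter>" keys. The toy Component class below is a hypothetical sketch of that flattening in the style of scikit-learn's convention; it is not the actual get_params implementation used by OptunaSearchCV.

```python
from typing import Any, Dict


class Component:
    """Toy estimator-like object that records its constructor arguments."""

    def __init__(self, **params: Any) -> None:
        self._params = dict(params)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                # Nested estimator-like objects (instances, not classes)
                # contribute "<name>__<param>" entries.
                if hasattr(value, "get_params") and not isinstance(value, type):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


inner = Component(C=1.0, gamma="auto")
outer = Component(estimator=inner, n_trials=10)
print(sorted(outer.get_params(deep=False)))  # ['estimator', 'n_trials']
print(sorted(outer.get_params(deep=True)))
# ['estimator', 'estimator__C', 'estimator__gamma', 'n_trials']
```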

property inverse_transform

Call inverse_transform on the best estimator.

This is available only if the underlying estimator supports inverse_transform and refit is set to True.

property n_trials_

Actual number of trials.

property predict

Call predict on the best estimator.

This is available only if the underlying estimator supports predict and refit is set to True.

property predict_log_proba

Call predict_log_proba on the best estimator.

This is available only if the underlying estimator supports predict_log_proba and refit is set to True.

property predict_proba

Call predict_proba on the best estimator.

This is available only if the underlying estimator supports predict_proba and refit is set to True.

score(X, y=None)

Return the score on the given data.

Parameters
  • X (Union[List[List[float]], numpy.ndarray, pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]) – Data.

  • y (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series, List[List[float]], pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]]) – Target variable.

Returns

score – Scalar score.

Return type

float
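The score is computed with the scorer held in scorer_, which falls back to the estimator's own score method when scoring is None. The sketch below assumes the scikit-learn convention that a scorer is called as scorer(estimator, X, y) and that greater is better; resolve_scorer, Threshold, and neg_error_rate are hypothetical, and string values of scoring (which scikit-learn resolves by name) are not handled here.

```python
from typing import Any, Callable, List, Optional

# Assumed scikit-learn scorer convention: scorer(estimator, X, y) -> float,
# where greater is better.
Scorer = Callable[[Any, List[float], List[int]], float]


def resolve_scorer(scoring: Optional[Scorer]) -> Scorer:
    """Hypothetical helper: fall back to estimator.score when scoring is None."""
    if scoring is None:
        return lambda estimator, X, y: estimator.score(X, y)
    return scoring


class Threshold:
    """Toy classifier: predicts 1 for positive inputs, else 0."""

    def predict(self, X: List[float]) -> List[int]:
        return [1 if x > 0 else 0 for x in X]

    def score(self, X: List[float], y: List[int]) -> float:
        # Accuracy, as in scikit-learn classifiers' default score.
        pred = self.predict(X)
        return sum(p == t for p, t in zip(pred, y)) / len(y)


def neg_error_rate(estimator: Any, X: List[float], y: List[int]) -> float:
    pred = estimator.predict(X)
    return -sum(p != t for p, t in zip(pred, y)) / len(y)


X, y = [-1.0, 2.0, 3.0, -4.0], [0, 1, 0, 0]
print(resolve_scorer(None)(Threshold(), X, y))            # 0.75
print(resolve_scorer(neg_error_rate)(Threshold(), X, y))  # -0.25
```

Negating the error rate keeps the "greater is better" convention, so a maximizing search treats both scorers the same way.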

property score_samples

Call score_samples on the best estimator.

This is available only if the underlying estimator supports score_samples and refit is set to True.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
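The <component>__<parameter> form can be sketched as splitting each key on its first "__" and forwarding the remainder to the named component. Node below is a hypothetical toy class illustrating that routing; it is not the implementation OptunaSearchCV inherits.

```python
from typing import Any, Dict


class Node:
    """Toy estimator-like object whose attributes are its parameters."""

    def __init__(self, **params: Any) -> None:
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params: Any) -> "Node":
        nested: Dict[str, Dict[str, Any]] = {}
        for key, value in params.items():
            # Split on the first "__": "clf__C" -> ("clf", "C").
            name, sep, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                nested.setdefault(name, {})[sub_key] = value
            else:
                setattr(self, name, value)
        # Forward "<component>__<parameter>" updates to each component.
        for name, sub_params in nested.items():
            getattr(self, name).set_params(**sub_params)
        return self


pipe = Node(clf=Node(C=1.0), n_trials=10)
pipe.set_params(clf__C=0.5, n_trials=20)
print(pipe.clf.C, pipe.n_trials)  # 0.5 20
```

Applying top-level values before nested ones means a call that replaces a component and tunes it in the same call updates the new component.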

property set_user_attr

Call set_user_attr on the Study.

property transform

Call transform on the best estimator.

This is available only if the underlying estimator supports transform and refit is set to True.

property trials_

All trials in the Study.

property trials_dataframe

Call trials_dataframe on the Study.

property user_attrs_

User attributes in the Study.