optuna.integration.OptunaSearchCV

class optuna.integration.OptunaSearchCV(estimator: BaseEstimator, param_distributions: Mapping[str, optuna.distributions.BaseDistribution], cv: Union[BaseCrossValidator, int, None] = 5, enable_pruning: bool = False, error_score: Union[numbers.Number, str] = nan, max_iter: int = 1000, n_jobs: int = 1, n_trials: int = 10, random_state: Union[int, numpy.random.mtrand.RandomState, None] = None, refit: bool = True, return_train_score: bool = False, scoring: Union[Callable[[…], float], str, None] = None, study: Optional[optuna.study.Study] = None, subsample: Union[float, int] = 1.0, timeout: Optional[float] = None, verbose: int = 0)
Hyperparameter search with cross-validation.
 Parameters
estimator – Object to use to fit the data. This is assumed to implement the scikit-learn estimator interface. Either this needs to provide score, or scoring must be passed.
param_distributions – Dictionary where keys are parameters and values are distributions. Distributions are assumed to implement the optuna distribution interface.
cv – Cross-validation strategy. Possible inputs for cv are:
    integer to specify the number of folds in a CV splitter,
    a CV splitter,
    an iterable yielding (train, validation) splits as arrays of indices.
    For integer, if estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. Otherwise, sklearn.model_selection.KFold is used.
enable_pruning – If True, pruning is performed in the case where the underlying estimator supports partial_fit.
error_score – Value to assign to the score if an error occurs in fitting. If ‘raise’, the error is raised. If numeric, sklearn.exceptions.FitFailedWarning is raised. This does not affect the refit step, which will always raise the error.
max_iter – Maximum number of epochs. This is only used if the underlying estimator supports partial_fit.
n_jobs – Number of parallel jobs. -1 means using all processors.
n_trials – Number of trials. If None, there is no limitation on the number of trials. If timeout is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs quality of the solution.
random_state – Seed of the pseudo random number generator. If int, this is the seed used by the random number generator. If numpy.random.RandomState object, this is the random number generator. If None, the global random state from numpy.random is used.
refit – If True, refit the estimator with the best found hyperparameters. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly.
return_train_score – If True, training scores will be included. Computing training scores is used to get insights on how different hyperparameter settings impact the overfitting/underfitting tradeoff. However, computing training scores can be computationally expensive and is not strictly required to select the hyperparameters that yield the best generalization performance.
scoring – String or callable to evaluate the predictions on the validation data. If None, score on the estimator is used.
study – Study corresponds to the optimization task. If None, a new study is created.
subsample – Proportion of samples that are used during hyperparameter search.
    If int, then draw subsample samples.
    If float, then draw subsample * X.shape[0] samples.
timeout – Time limit in seconds for the search of appropriate models. If None, the study is executed without time limitation. If n_trials is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs quality of the solution.
verbose – Verbosity level. The higher, the more messages.

best_estimator_
Estimator that was chosen by the search. This is present only if refit is set to True.

n_splits_
Number of cross-validation splits.

refit_time_
Time for refitting the best estimator. This is present only if refit is set to True.

sample_indices_
Indices of samples that are used during hyperparameter search.

scorer_
Scorer function.

study_
Actual study.
Examples
import optuna
from sklearn.datasets import load_iris
from sklearn.svm import SVC

clf = SVC(gamma='auto')
# Search C on a log scale between 1e-10 and 1e+10.
param_distributions = {
    'C': optuna.distributions.LogUniformDistribution(1e-10, 1e+10)
}
optuna_search = optuna.integration.OptunaSearchCV(
    clf,
    param_distributions
)
X, y = load_iris(return_X_y=True)
optuna_search.fit(X, y)
y_pred = optuna_search.predict(X)
Note
Added in v0.17.0 as an experimental feature. The interface may change in newer versions without prior notice. See https://github.com/optuna/optuna/releases/tag/v0.17.0.

__init__(estimator: BaseEstimator, param_distributions: Mapping[str, optuna.distributions.BaseDistribution], cv: Union[BaseCrossValidator, int, None] = 5, enable_pruning: bool = False, error_score: Union[numbers.Number, str] = nan, max_iter: int = 1000, n_jobs: int = 1, n_trials: int = 10, random_state: Union[int, numpy.random.mtrand.RandomState, None] = None, refit: bool = True, return_train_score: bool = False, scoring: Union[Callable[[…], float], str, None] = None, study: Optional[optuna.study.Study] = None, subsample: Union[float, int] = 1.0, timeout: Optional[float] = None, verbose: int = 0) → None
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(estimator, param_distributions[, …]) – Initialize self.
fit(X[, y, groups]) – Run fit with all sets of parameters.
score(X[, y]) – Return the score on the given data.
Attributes
best_index_ – Index which corresponds to the best candidate parameter setting.
best_params_ – Parameters of the best trial in the Study.
best_score_ – Mean cross-validated score of the best estimator.
best_trial_ – Best trial in the Study.
classes_ – Class labels.
decision_function – Call decision_function on the best estimator.
inverse_transform – Call inverse_transform on the best estimator.
n_trials_ – Actual number of trials.
predict – Call predict on the best estimator.
predict_log_proba – Call predict_log_proba on the best estimator.
predict_proba – Call predict_proba on the best estimator.
score_samples – Call score_samples on the best estimator.
set_user_attr – Call set_user_attr on the Study.
transform – Call transform on the best estimator.
trials_ – All trials in the Study.
trials_dataframe – Call trials_dataframe on the Study.
user_attrs_ – User attributes in the Study.
property best_index_
Index which corresponds to the best candidate parameter setting.

property best_score_
Mean cross-validated score of the best estimator.

property classes_
Class labels.

property decision_function
Call decision_function on the best estimator. This is available only if the underlying estimator supports decision_function and refit is set to True.

fit(X: Union[List[List[float]], numpy.ndarray, pd.DataFrame, scipy.sparse.base.spmatrix], y: Union[List[float], numpy.ndarray, pd.Series, List[List[float]], pd.DataFrame, scipy.sparse.base.spmatrix, None] = None, groups: Union[List[float], numpy.ndarray, pd.Series, None] = None, **fit_params: Any) → OptunaSearchCV
Run fit with all sets of parameters.
 Parameters
X – Training data.
y – Target variable.
groups – Group labels for the samples used while splitting the dataset into train/validation set.
**fit_params – Parameters passed to fit on the estimator.
 Returns
Return self.
 Return type
self

property inverse_transform
Call inverse_transform on the best estimator. This is available only if the underlying estimator supports inverse_transform and refit is set to True.

property n_trials_
Actual number of trials.

property predict
Call predict on the best estimator. This is available only if the underlying estimator supports predict and refit is set to True.

property predict_log_proba
Call predict_log_proba on the best estimator. This is available only if the underlying estimator supports predict_log_proba and refit is set to True.

property predict_proba
Call predict_proba on the best estimator. This is available only if the underlying estimator supports predict_proba and refit is set to True.

score(X: Union[List[List[float]], numpy.ndarray, pd.DataFrame, scipy.sparse.base.spmatrix], y: Union[List[float], numpy.ndarray, pd.Series, List[List[float]], pd.DataFrame, scipy.sparse.base.spmatrix, None] = None) → float
Return the score on the given data.
 Parameters
X – Data.
y – Target variable.
 Returns
Scalar score.
 Return type
score

property score_samples
Call score_samples on the best estimator. This is available only if the underlying estimator supports score_samples and refit is set to True.