optuna.integration.OptunaSearchCV¶

class optuna.integration.OptunaSearchCV(estimator, param_distributions, cv=5, enable_pruning=False, error_score=nan, max_iter=1000, n_jobs=1, n_trials=10, random_state=None, refit=True, return_train_score=False, scoring=None, study=None, subsample=1.0, timeout=None, verbose=0)[source]¶

Hyperparameter search with cross-validation.
- Parameters

  - estimator – Object to use to fit the data. This is assumed to implement the scikit-learn estimator interface. Either this needs to provide score, or scoring must be passed.

  - param_distributions – Dictionary where keys are parameters and values are distributions. Distributions are assumed to implement the Optuna distribution interface.

  - cv – Cross-validation strategy. Possible inputs for cv are:
    - an integer to specify the number of folds in a CV splitter,
    - a CV splitter,
    - an iterable yielding (train, validation) splits as arrays of indices.
    For an integer, if estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. Otherwise, sklearn.model_selection.KFold is used.

  - enable_pruning – If True, pruning is performed in the case where the underlying estimator supports partial_fit.

  - error_score – Value to assign to the score if an error occurs in fitting. If 'raise', the error is raised. If numeric, sklearn.exceptions.FitFailedWarning is raised. This does not affect the refit step, which will always raise the error.

  - max_iter – Maximum number of epochs. This is only used if the underlying estimator supports partial_fit.

  - n_jobs – Number of threading-based parallel jobs. -1 means the number of jobs is set to the CPU count.
    Note: n_jobs allows parallelization using threading and may suffer from Python's GIL. It is recommended to use process-based parallelization if func is CPU bound.
    Warning: Deprecated in v2.7.0. This feature will be removed in the future. It is recommended to use process-based parallelization. The removal of this feature is currently scheduled for v4.0.0, but this schedule is subject to change. See https://github.com/optuna/optuna/releases/tag/v2.7.0.

  - n_trials – Number of trials. If None, there is no limitation on the number of trials. If timeout is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs. quality of the solution.

  - random_state – Seed of the pseudo-random number generator. If int, this is the seed used by the random number generator. If a numpy.random.RandomState object, this is the random number generator. If None, the global random state from numpy.random is used.

  - refit – If True, refit the estimator with the best found hyperparameters. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly.

  - return_train_score – If True, training scores will be included. Computing training scores is used to get insights on how different hyperparameter settings impact the overfitting/underfitting trade-off. However, computing training scores can be computationally expensive and is not strictly required to select the hyperparameters that yield the best generalization performance.

  - scoring – String or callable to evaluate the predictions on the validation data. If None, score on the estimator is used.

  - study – Study that corresponds to the optimization task. If None, a new study is created.

  - subsample – Proportion of samples that are used during hyperparameter search.
    - If int, then draw subsample samples.
    - If float, then draw subsample * X.shape[0] samples.

  - timeout – Time limit in seconds for the search of appropriate models. If None, the study is executed without time limitation. If n_trials is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs. quality of the solution.

  - verbose – Verbosity level. The higher, the more messages.
- best_estimator_¶
  Estimator that was chosen by the search. This is present only if refit is set to True.

- n_splits_¶
  Number of cross-validation splits.

- sample_indices_¶
  Indices of samples that are used during hyperparameter search.

- scorer_¶
  Scorer function.

- study_¶
  Actual study.
Examples

import optuna
from sklearn.datasets import load_iris
from sklearn.svm import SVC

clf = SVC(gamma="auto")
param_distributions = {"C": optuna.distributions.LogUniformDistribution(1e-10, 1e10)}
optuna_search = optuna.integration.OptunaSearchCV(clf, param_distributions)
X, y = load_iris(return_X_y=True)
optuna_search.fit(X, y)
y_pred = optuna_search.predict(X)
Note
Added in v0.17.0 as an experimental feature. The interface may change in newer versions without prior notice. See https://github.com/optuna/optuna/releases/tag/v0.17.0.
Methods

- fit(X[, y, groups]) – Run fit with all sets of parameters.
- get_params([deep]) – Get parameters for this estimator.
- score(X[, y]) – Return the score on the given data.
- set_params(**params) – Set the parameters of this estimator.
Attributes

- best_index_ – Index which corresponds to the best candidate parameter setting.
- best_params_ – Parameters of the best trial in the Study.
- best_score_ – Mean cross-validated score of the best estimator.
- best_trial_ – Best trial in the Study.
- classes_ – Class labels.
- decision_function – Call decision_function on the best estimator.
- inverse_transform – Call inverse_transform on the best estimator.
- n_trials_ – Actual number of trials.
- predict – Call predict on the best estimator.
- predict_log_proba – Call predict_log_proba on the best estimator.
- predict_proba – Call predict_proba on the best estimator.
- score_samples – Call score_samples on the best estimator.
- set_user_attr – Call set_user_attr on the Study.
- transform – Call transform on the best estimator.
- trials_ – All trials in the Study.
- trials_dataframe – Call trials_dataframe on the Study.
- user_attrs_ – User attributes in the Study.

- property best_index_¶
  Index which corresponds to the best candidate parameter setting.

- property best_score_¶
  Mean cross-validated score of the best estimator.

- property classes_¶
  Class labels.

- property decision_function¶
  Call decision_function on the best estimator. This is available only if the underlying estimator supports decision_function and refit is set to True.
- fit(X, y=None, groups=None, **fit_params)[source]¶
  Run fit with all sets of parameters.

  - Parameters
    - X (Union[List[List[float]], numpy.ndarray, pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]) – Training data.
    - y (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series, List[List[float]], pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]]) – Target variable.
    - groups (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series]]) – Group labels for the samples used while splitting the dataset into train/validation set.
    - **fit_params (Any) – Parameters passed to fit on the estimator.
  - Returns
    Return self.
  - Return type
    self
- get_params(deep=True)¶
  Get parameters for this estimator.
- property inverse_transform¶
  Call inverse_transform on the best estimator. This is available only if the underlying estimator supports inverse_transform and refit is set to True.
- property n_trials_¶
  Actual number of trials.
- property predict¶
  Call predict on the best estimator. This is available only if the underlying estimator supports predict and refit is set to True.

- property predict_log_proba¶
  Call predict_log_proba on the best estimator. This is available only if the underlying estimator supports predict_log_proba and refit is set to True.

- property predict_proba¶
  Call predict_proba on the best estimator. This is available only if the underlying estimator supports predict_proba and refit is set to True.
- score(X, y=None)[source]¶
  Return the score on the given data.

  - Parameters
    - X – Data.
    - y – Target variable.
  - Returns
    Scalar score.
  - Return type
    float
- property score_samples¶
  Call score_samples on the best estimator. This is available only if the underlying estimator supports score_samples and refit is set to True.
- set_params(**params)¶
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

  - Parameters
    - **params (dict) – Estimator parameters.
  - Returns
    self – Estimator instance.
  - Return type
    estimator instance