optuna.integration.OptunaSearchCV¶

class optuna.integration.OptunaSearchCV(estimator, param_distributions, cv=5, enable_pruning=False, error_score=nan, max_iter=1000, n_jobs=1, n_trials=10, random_state=None, refit=True, return_train_score=False, scoring=None, study=None, subsample=1.0, timeout=None, verbose=0)[source]¶

Hyperparameter search with cross-validation.

Parameters

estimator – Object to use to fit the data. This is assumed to implement the scikit-learn estimator interface. Either this needs to provide score, or scoring must be passed.
param_distributions – Dictionary where keys are parameters and values are distributions. Distributions are assumed to implement the optuna distribution interface.
cv –
Cross-validation strategy. Possible inputs for cv are:
- integer to specify the number of folds in a CV splitter,
- a CV splitter,
- an iterable yielding (train, validation) splits as arrays of indices.
For an integer, if estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. Otherwise, sklearn.model_selection.KFold is used.
enable_pruning – If True, pruning is performed in the case where the underlying estimator supports partial_fit.
error_score – Value to assign to the score if an error occurs in fitting. If ‘raise’, the error is raised. If numeric, that value is used as the score and sklearn.exceptions.FitFailedWarning is issued. This does not affect the refit step, which will always raise the error.
max_iter – Maximum number of epochs. This is only used if the underlying estimator supports partial_fit.
n_jobs –
Number of thread-based parallel jobs. -1 sets the number of jobs to the CPU count.

Note

n_jobs allows parallelization using threading and may suffer from Python’s GIL. It is recommended to use process-based parallelization if the workload (fitting and scoring the estimator) is CPU bound.

Warning

Deprecated in v2.7.0. This feature will be removed in the future. It is recommended to use process-based parallelization. The removal of this feature is currently scheduled for v4.0.0, but this schedule is subject to change. See https://github.com/optuna/optuna/releases/tag/v2.7.0.
n_trials – Number of trials. If None, there is no limitation on the number of trials. If timeout is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs quality of the solution.
random_state – Seed of the pseudo random number generator. If int, this is the seed used by the random number generator. If numpy.random.RandomState object, this is the random number generator. If None, the global random state from numpy.random is used.
refit – If True, refit the estimator with the best found hyperparameters. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly.
return_train_score – If True, training scores will be included. Training scores give insight into how different hyperparameter settings affect the overfitting/underfitting trade-off. However, computing them can be computationally expensive and is not strictly required to select the hyperparameters that yield the best generalization performance.
scoring – String or callable to evaluate the predictions on the validation data. If None, score on the estimator is used.
study – Study corresponds to the optimization task. If None, a new study is created.
subsample –
Proportion of samples that are used during hyperparameter search.
- If int, then draw subsample samples.
- If float, then draw subsample * X.shape[0] samples.
timeout – Time limit in seconds for the search of appropriate models. If None, the study is executed without time limitation. If n_trials is also set to None, the study continues to create trials until it receives a termination signal such as Ctrl+C or SIGTERM. This trades off runtime vs quality of the solution.
verbose – Verbosity level. The higher, the more messages.
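
The integer rule described for cv matches the behaviour of scikit-learn's check_cv helper. The sketch below calls check_cv directly to show which splitter an integer resolves to. Whether OptunaSearchCV delegates to this helper internally is an assumption, not something this page states.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Toy multiclass target: three classes, ten samples each.
y = np.repeat([0, 1, 2], 10)

# Integer cv + classifier + multiclass y -> StratifiedKFold.
clf_cv = check_cv(cv=5, y=y, classifier=True)

# Integer cv + non-classifier (e.g. a regressor) -> KFold.
reg_cv = check_cv(cv=5, y=y.astype(float), classifier=False)

print(type(clf_cv).__name__, clf_cv.get_n_splits())
print(type(reg_cv).__name__, reg_cv.get_n_splits())
```

Passing a splitter object or an iterable of (train, validation) index arrays bypasses this rule, since the splits are then fully specified by the caller.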

best_estimator_¶: Estimator that was chosen by the search. This is present only if refit is set to True.

n_splits_¶: Number of cross-validation splits.

refit_time_¶: Time for refitting the best estimator. This is present only if refit is set to True.

sample_indices_¶: Indices of samples that are used during hyperparameter search.

scorer_¶: Scorer function.

study_¶: Actual study.
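
The relationship between subsample and sample_indices_ can be sketched with NumPy alone. The helper draw_sample_indices below is hypothetical and only mirrors the rule stated for subsample (an int is an absolute count, a float is a fraction of X.shape[0]); the library's actual sampling code may differ in detail.

```python
import numpy as np


def draw_sample_indices(n_samples, subsample, seed=0):
    """Hypothetical helper: choose the row indices used during the search."""
    rng = np.random.RandomState(seed)
    # int -> absolute number of rows; float -> fraction of the rows.
    if isinstance(subsample, int):
        max_samples = subsample
    else:
        max_samples = int(subsample * n_samples)
    indices = np.arange(n_samples)
    if max_samples < n_samples:
        # Sample without replacement and keep the original row order.
        indices = np.sort(rng.choice(indices, max_samples, replace=False))
    return indices


all_rows = draw_sample_indices(150, 1.0)   # default: every row is used
half_rows = draw_sample_indices(150, 0.5)  # float: 0.5 * 150 rows
some_rows = draw_sample_indices(150, 30)   # int: exactly 30 rows
print(len(all_rows), len(half_rows), len(some_rows))
```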

Examples

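import optuna
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Estimator to tune and the search space for its regularization parameter C.
clf = SVC(gamma="auto")
param_distributions = {"C": optuna.distributions.LogUniformDistribution(1e-10, 1e10)}
optuna_search = optuna.integration.OptunaSearchCV(clf, param_distributions)

# Run the search; with refit=True (the default) the best estimator is refitted on X, y.
X, y = load_iris(return_X_y=True)
optuna_search.fit(X, y)

# predict delegates to the refitted best estimator.
y_pred = optuna_search.predict(X)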

Note

Added in v0.17.0 as an experimental feature. The interface may change in newer versions without prior notice. See https://github.com/optuna/optuna/releases/tag/v0.17.0.

Methods

`fit`(X[, y, groups])	Run fit with all sets of parameters.
`get_params`([deep])	Get parameters for this estimator.
`score`(X[, y])	Return the score on the given data.
`set_params`(**params)	Set the parameters of this estimator.

Attributes

`best_index_`	Index which corresponds to the best candidate parameter setting.
`best_params_`	Parameters of the best trial in the `Study`.
`best_score_`	Mean cross-validated score of the best estimator.
`best_trial_`	Best trial in the `Study`.
`classes_`	Class labels.
`decision_function`	Call `decision_function` on the best estimator.
`inverse_transform`	Call `inverse_transform` on the best estimator.
`n_trials_`	Actual number of trials.
`predict`	Call `predict` on the best estimator.
`predict_log_proba`	Call `predict_log_proba` on the best estimator.
`predict_proba`	Call `predict_proba` on the best estimator.
`score_samples`	Call `score_samples` on the best estimator.
`set_user_attr`	Call `set_user_attr` on the `Study`.
`transform`	Call `transform` on the best estimator.
`trials_`	All trials in the `Study`.
`trials_dataframe`	Call `trials_dataframe` on the `Study`.
`user_attrs_`	User attributes in the `Study`.

property best_index_¶: Index which corresponds to the best candidate parameter setting.

property best_params_¶: Parameters of the best trial in the Study.

property best_score_¶: Mean cross-validated score of the best estimator.

property best_trial_¶: Best trial in the Study.

property classes_¶: Class labels.

property decision_function¶

Call decision_function on the best estimator.

This is available only if the underlying estimator supports decision_function and refit is set to True.

fit(X, y=None, groups=None, **fit_params)[source]¶

Run fit with all sets of parameters.

Parameters

X (Union[List[List[float]], numpy.ndarray, pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]) – Training data.
y (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series, List[List[float]], pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]]) – Target variable.
groups (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series]]) – Group labels for the samples used while splitting the dataset into train/validation set.
**fit_params (Any) – Parameters passed to fit on the estimator.

Returns

Return self.

Return type

self

get_params(deep=True)¶

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

property inverse_transform¶

Call inverse_transform on the best estimator.

This is available only if the underlying estimator supports inverse_transform and refit is set to True.

property n_trials_¶: Actual number of trials.

property predict¶

Call predict on the best estimator.

This is available only if the underlying estimator supports predict and refit is set to True.

property predict_log_proba¶

Call predict_log_proba on the best estimator.

This is available only if the underlying estimator supports predict_log_proba and refit is set to True.

property predict_proba¶

Call predict_proba on the best estimator.

This is available only if the underlying estimator supports predict_proba and refit is set to True.

score(X, y=None)[source]¶

Return the score on the given data.

Parameters

X (Union[List[List[float]], numpy.ndarray, pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]) – Data.
y (Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series, List[List[float]], pandas.core.frame.DataFrame, scipy.sparse._base.spmatrix]]) – Target variable.

Returns

Scalar score.

Return type

score

property score_samples¶

Call score_samples on the best estimator.

This is available only if the underlying estimator supports score_samples and refit is set to True.

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance
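
The <component>__<parameter> naming can be illustrated with a plain scikit-learn Pipeline. The step names "scale" and "svc" are arbitrary choices for this sketch.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Nested parameters are addressed as <component>__<parameter>.
pipe.set_params(svc__C=10.0, scale__with_mean=False)

# get_params(deep=True) reports the same nested names.
params = pipe.get_params(deep=True)
print(params["svc__C"], params["scale__with_mean"])
```

Following the same scikit-learn convention, a search object wrapping such a pipeline would be expected to expose these names under an estimator__ prefix; that is an inference from the convention rather than something stated on this page.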

property set_user_attr¶: Call set_user_attr on the Study.

property transform¶

Call transform on the best estimator.

This is available only if the underlying estimator supports transform and refit is set to True.

property trials_¶: All trials in the Study.

property trials_dataframe¶: Call trials_dataframe on the Study.

property user_attrs_¶: User attributes in the Study.