Automated machine learning (AutoML) is attracting more and more attention.
This is the first post in a series of blog posts on auto-sklearn, similar to my introduction to auto-keras series. This means that we will have a look at the performance of auto-sklearn after an hour of training on several (not so well-known) datasets.
About auto-sklearn
Auto-sklearn is the result of research at the University of Freiburg. It was introduced by Feurer et al. in 2015 at NIPS: Efficient and Robust Automated Machine Learning, Advances in Neural Information Processing Systems 28 (NIPS 2015), available online at https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.
In contrast to Auto-Keras, it does not focus on neural architecture search for deep neural networks but uses Bayesian optimization for hyperparameter tuning of “traditional” machine learning algorithms implemented in scikit-learn. It supports XGBoost as well.
In the next weeks we will have a look at its functions, increasing the level of detail with every blog post.
Installation
The installation of auto-sklearn is a bit tricky. There are no conda packages for it; however, auto-sklearn requires some compilers, which we can get from the conda environment (depending on the OS version, the swig compiler provided by the distribution may cause some trouble):
(env) $ conda install gxx_linux-64 gcc_linux-64 swig
We can install auto-sklearn now:
(env) $ pip install git+https://github.com/automl/auto-sklearn
The official installation guide is more comprehensive.
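If the installation succeeded, a quick import should work. A minimal sanity check (assuming the pip install above finished without errors):

import autosklearn
import autosklearn.classification

# Print the installed version to confirm the package is importable.
print(autosklearn.__version__)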
Walk-through standard examples
The team behind auto-sklearn provides a variety of examples. Today, we will start with the first two.
Regression example
The regression example is one of the simplest ones. It uses the Boston Housing Prices dataset.
First, we have to load the dataset and do train-test splitting:
import sklearn.datasets
import sklearn.model_selection
X, y = sklearn.datasets.load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
Next, we have to predefine whether features are numerical or categorical:
feature_types = (['numerical'] * 3) + ['categorical'] + (['numerical'] * 9)
The pipeline decides how to handle categorical features.
Next, we have to initialize the regressor and run the process:
import sklearn.metrics
import autosklearn.regression
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=3600,
    per_run_time_limit=300,
    tmp_folder='/tmp/autosklearn_regression_example_tmp',
    output_folder='/tmp/autosklearn_regression_example_out',
)
automl.fit(X_train, y_train,
           dataset_name='boston',
           feat_type=feature_types)
print(automl.show_models())
predictions = automl.predict(X_test)
print("R2 score:", sklearn.metrics.r2_score(y_test, predictions))
I ran it a few times in a Jupyter notebook while experimenting a bit. First, after interrupting a run without a kernel restart, it complained that the files already exist. This means that it is probably safe from overwriting results by accident.
FileExistsError: [Errno 17] File exists: '/tmp/autosklearn_regression_example_tmp'
However, after deleting the folders this came up:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/autosklearn_regression_example_tmp/smac3-output/run_1/stats.json'
The second problem can be solved by restarting the IPython kernel.
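Alternatively, we can clean up the stale folders before starting a new run. A small sketch (the paths are the ones passed to AutoSklearnRegressor above):

import shutil

# Remove leftover output folders from an interrupted run.
for folder in ['/tmp/autosklearn_regression_example_tmp',
               '/tmp/autosklearn_regression_example_out']:
    shutil.rmtree(folder, ignore_errors=True)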
The full example code looks as follows:
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.regression
def main():
    X, y = sklearn.datasets.load_boston(return_X_y=True)
    feature_types = (['numerical'] * 3) + ['categorical'] + (['numerical'] * 9)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=3600,
        per_run_time_limit=300,
        tmp_folder='/tmp/autosklearn_regression_example_tmp',
        output_folder='/tmp/autosklearn_regression_example_out',
    )
    automl.fit(X_train, y_train, dataset_name='boston',
               feat_type=feature_types)

    print(automl.show_models())
    predictions = automl.predict(X_test)
    print("R2 score:", sklearn.metrics.r2_score(y_test, predictions))

if __name__ == '__main__':
    main()
After 60 minutes it outputs:
[(0.540000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'one_hot_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'feature_agglomeration', 'regressor:__choice__': 'adaboost', 'rescaling:__choice__': 'quantile_transformer', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:feature_agglomeration:affinity': 'manhattan', 'preprocessor:feature_agglomeration:linkage': 'average', 'preprocessor:feature_agglomeration:n_clusters': 257, 'preprocessor:feature_agglomeration:pooling_func': 'median', 'regressor:adaboost:learning_rate': 0.13904708846178393, 'regressor:adaboost:loss': 'linear', 'regressor:adaboost:max_depth': 8, 'regressor:adaboost:n_estimators': 464, 'rescaling:quantile_transformer:n_quantiles': 1819, 'rescaling:quantile_transformer:output_distribution': 'uniform', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.1026144927608074},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.240000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'no_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'kernel_pca', 'regressor:__choice__': 'ridge_regression', 'rescaling:__choice__': 'minmax', 'preprocessor:kernel_pca:kernel': 'rbf', 'preprocessor:kernel_pca:n_components': 1927, 'regressor:ridge_regression:alpha': 0.00040213071820876666, 'regressor:ridge_regression:fit_intercept': 'True', 'regressor:ridge_regression:tol': 0.00014526904072022776, 'preprocessor:kernel_pca:gamma': 0.026606124688788306},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.140000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'no_encoding', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'random_trees_embedding', 'regressor:__choice__': 'sgd', 'rescaling:__choice__': 'standardize', 'preprocessor:random_trees_embedding:bootstrap': 'False', 'preprocessor:random_trees_embedding:max_depth': 8, 'preprocessor:random_trees_embedding:max_leaf_nodes': 'None', 'preprocessor:random_trees_embedding:min_samples_leaf': 1, 'preprocessor:random_trees_embedding:min_samples_split': 12, 'preprocessor:random_trees_embedding:min_weight_fraction_leaf': 1.0, 'preprocessor:random_trees_embedding:n_estimators': 88, 'regressor:sgd:alpha': 0.03819847836558832, 'regressor:sgd:average': 'False', 'regressor:sgd:fit_intercept': 'True', 'regressor:sgd:learning_rate': 'invscaling', 'regressor:sgd:loss': 'squared_epsilon_insensitive', 'regressor:sgd:penalty': 'l2', 'regressor:sgd:tol': 4.423214145587373e-05, 'regressor:sgd:epsilon': 1.166808740803192e-05, 'regressor:sgd:eta0': 0.050920044977088394, 'regressor:sgd:power_t': 0.5334091888441694},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
(0.080000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'no_encoding', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'no_preprocessing', 'regressor:__choice__': 'libsvm_svr', 'rescaling:__choice__': 'standardize', 'regressor:libsvm_svr:C': 24.135539315821017, 'regressor:libsvm_svr:epsilon': 0.0016437064038004703, 'regressor:libsvm_svr:kernel': 'poly', 'regressor:libsvm_svr:max_iter': -1, 'regressor:libsvm_svr:shrinking': 'False', 'regressor:libsvm_svr:tol': 0.024988692113831024, 'regressor:libsvm_svr:coef0': 0.6826167099946432, 'regressor:libsvm_svr:degree': 4, 'regressor:libsvm_svr:gamma': 0.018698202462750394},
dataset_properties={
'task': 4,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'regression',
'signed': False})),
]
R2 score: 0.8952946820454217
Classification example
The classification example is simple as well. The classification is done on a handwritten digits dataset (loaded via sklearn.datasets.load_digits). In this case we do not have to predefine feature types.
In this example auto-sklearn fits classifiers sequentially and afterwards builds an ensemble model over them.
The code looks as follows:
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification
def main():
    X, y = sklearn.datasets.load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=3600,
        per_run_time_limit=300,
        tmp_folder='/tmp/autosklearn_sequential_example_tmp',
        output_folder='/tmp/autosklearn_sequential_example_out',
        # Do not construct ensembles in parallel to avoid using more than one
        # core at a time. The ensemble will be constructed after auto-sklearn
        # finished fitting all machine learning models.
        ensemble_size=0,
        delete_tmp_folder_after_terminate=False,
    )
    automl.fit(X_train, y_train, dataset_name='digits')

    # This call to fit_ensemble uses all models trained in the previous call
    # to fit to build an ensemble which can be used with automl.predict()
    automl.fit_ensemble(y_train, ensemble_size=50)

    print(automl.show_models())
    predictions = automl.predict(X_test)
    print(automl.sprint_statistics())
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))

if __name__ == '__main__':
    main()
This outputs a lot more than the regression example:
[(0.220000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'qda', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'select_rates', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'False', 'classifier:qda:reg_param': 0.6396026761675004, 'preprocessor:select_rates:alpha': 0.06544340428506021, 'preprocessor:select_rates:mode': 'fwe', 'preprocessor:select_rates:score_func': 'f_classif'},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.140000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 20.269651182050794, 'classifier:libsvm_svc:gamma': 0.0009326051547535202, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 1.9176814818606268e-05, 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 'preprocessor:extra_trees_preproc_for_classification:criterion': 'entropy', 'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_classification:max_features': 0.4242777824925543, 'preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 8, 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 20, 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.010000000000000004},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.100000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'pca', 'rescaling:__choice__': 'robust_scaler', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 1724.2556715341675, 'classifier:libsvm_svc:gamma': 0.001353320460697879, 'classifier:libsvm_svc:kernel': 'poly', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 4.1028293009675063e-05, 'preprocessor:pca:keep_variance': 0.9999, 'preprocessor:pca:whiten': 'False', 'rescaling:robust_scaler:q_max': 0.75, 'rescaling:robust_scaler:q_min': 0.25, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.010000000000000004, 'classifier:libsvm_svc:coef0': 0.860892562017646, 'classifier:libsvm_svc:degree': 4},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.100000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'rescaling:__choice__': 'none', 'classifier:libsvm_svc:C': 8.93425908281578, 'classifier:libsvm_svc:gamma': 0.001226763138737657, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 1.0648631563506635e-05, 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'False', 'preprocessor:extra_trees_preproc_for_classification:criterion': 'entropy', 'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_classification:max_features': 0.4242777824925543, 'preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 8, 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 20, 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'fast_ica', 'rescaling:__choice__': 'robust_scaler', 'classifier:libsvm_svc:C': 282.7778160201695, 'classifier:libsvm_svc:gamma': 0.0006851364763478802, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 4.265967995808305e-05, 'preprocessor:fast_ica:algorithm': 'parallel', 'preprocessor:fast_ica:fun': 'logcosh', 'preprocessor:fast_ica:whiten': 'False', 'rescaling:robust_scaler:q_max': 0.75, 'rescaling:robust_scaler:q_min': 0.2954432786946423},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'pca', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 200.96294573012327, 'classifier:libsvm_svc:gamma': 0.0004789329856033374, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 2.3080402841468966e-05, 'preprocessor:pca:keep_variance': 0.9999, 'preprocessor:pca:whiten': 'False', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.12447215687605924},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 200.96294573012327, 'classifier:libsvm_svc:gamma': 0.0004789329856033374, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 3.6332032397182576e-05, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.002385546176068135},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'fast_ica', 'rescaling:__choice__': 'robust_scaler', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 178.5651027749005, 'classifier:libsvm_svc:gamma': 0.00046917674427437626, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 1.273241991475758e-05, 'preprocessor:fast_ica:algorithm': 'parallel', 'preprocessor:fast_ica:fun': 'logcosh', 'preprocessor:fast_ica:whiten': 'False', 'rescaling:robust_scaler:q_max': 0.75, 'rescaling:robust_scaler:q_min': 0.25, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.010000000000000004},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 13105.64747902622, 'classifier:libsvm_svc:gamma': 0.0008571081478684856, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 1.6845667281551194e-05, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.0014560012385566967},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'most_frequent', 'preprocessor:__choice__': 'fast_ica', 'rescaling:__choice__': 'robust_scaler', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 699.0336479631634, 'classifier:libsvm_svc:gamma': 0.0004575509483673049, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 2.8547476038081654e-05, 'preprocessor:fast_ica:algorithm': 'parallel', 'preprocessor:fast_ica:fun': 'logcosh', 'preprocessor:fast_ica:whiten': 'False', 'rescaling:robust_scaler:q_max': 0.75, 'rescaling:robust_scaler:q_min': 0.25, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.005289435968696683},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.040000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'pca', 'rescaling:__choice__': 'robust_scaler', 'classifier:libsvm_svc:C': 23665.229181086608, 'classifier:libsvm_svc:gamma': 0.0006416897235411666, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 3.337713408855307e-05, 'preprocessor:pca:keep_variance': 0.9999, 'preprocessor:pca:whiten': 'False', 'rescaling:robust_scaler:q_max': 0.75, 'rescaling:robust_scaler:q_min': 0.25},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'pca', 'rescaling:__choice__': 'none', 'classifier:libsvm_svc:C': 13105.64747902622, 'classifier:libsvm_svc:gamma': 0.0006416897235411666, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 3.337713408855307e-05, 'preprocessor:pca:keep_variance': 0.9999, 'preprocessor:pca:whiten': 'False'},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'pca', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 13105.64747902622, 'classifier:libsvm_svc:gamma': 0.0006416897235411666, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 3.337713408855307e-05, 'preprocessor:pca:keep_variance': 0.9999, 'preprocessor:pca:whiten': 'False', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.13745214536441364},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'pca', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 200.96294573012327, 'classifier:libsvm_svc:gamma': 0.0004789329856033374, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 3.253865181117026e-05, 'preprocessor:pca:keep_variance': 0.9999, 'preprocessor:pca:whiten': 'False', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.002385546176068135},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 29.72525640589007, 'classifier:libsvm_svc:gamma': 0.0005147805968347301, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 1.5008701254838985e-05, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.008172906777681075},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'median', 'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 891.7242386751988, 'classifier:libsvm_svc:gamma': 0.0005281245945286075, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 3.4729522830318765e-05, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.0028598529720937414},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'no_preprocessing', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 29.72525640589007, 'classifier:libsvm_svc:gamma': 0.0008013672456514697, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'False', 'classifier:libsvm_svc:tol': 1.1610167251351588e-05, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.0007008398711053638},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'fast_ica', 'rescaling:__choice__': 'robust_scaler', 'classifier:libsvm_svc:C': 7025.283062123013, 'classifier:libsvm_svc:gamma': 0.0006851364763478802, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 2.2445046956557538e-05, 'preprocessor:fast_ica:algorithm': 'parallel', 'preprocessor:fast_ica:fun': 'logcosh', 'preprocessor:fast_ica:whiten': 'False', 'rescaling:robust_scaler:q_max': 0.75, 'rescaling:robust_scaler:q_min': 0.25},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
(0.020000, SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'categorical_encoding:__choice__': 'one_hot_encoding', 'classifier:__choice__': 'libsvm_svc', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'feature_agglomeration', 'rescaling:__choice__': 'none', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'classifier:libsvm_svc:C': 1.772947230488284, 'classifier:libsvm_svc:gamma': 0.0004789329856033374, 'classifier:libsvm_svc:kernel': 'rbf', 'classifier:libsvm_svc:max_iter': -1, 'classifier:libsvm_svc:shrinking': 'True', 'classifier:libsvm_svc:tol': 6.58869648864534e-05, 'preprocessor:feature_agglomeration:affinity': 'cosine', 'preprocessor:feature_agglomeration:linkage': 'complete', 'preprocessor:feature_agglomeration:n_clusters': 177, 'preprocessor:feature_agglomeration:pooling_func': 'mean', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.002385546176068135},
dataset_properties={
'task': 2,
'sparse': False,
'multilabel': False,
'multiclass': True,
'target_type': 'classification',
'signed': False})),
]
auto-sklearn results:
Dataset name: digits
Metric: accuracy
Best validation score: 0.993258
Number of target algorithm runs: 549
Number of successful target algorithm runs: 539
Number of crashed target algorithm runs: 9
Number of target algorithms that exceeded the time limit: 1
Number of target algorithms that exceeded the memory limit: 0
Accuracy score 0.9911111111111112
The results look promising. However, a comparison by MLJAR claims that the performance is rather poor. We should be careful with their results since they want to sell their own product. Further, they compared log-loss only, and choosing the right metric for the problem at hand matters.
Pipeline Explanation
Overview
Auto-sklearn uses a 2.5-stage preprocessing pipeline. First, hyperparameters and preprocessing choices are predetermined by the meta-learning part; the meta-learning results narrow down the search space.
The AutoML pipeline then works through this search space iteratively, starting with the data preprocessor, followed by the feature preprocessor, followed by training a classifier/regressor. The results are evaluated and the hyperparameters are tuned by the Bayesian optimizer. Finally, the best models are ensembled (if we want to).
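To make this loop concrete, here is a toy sketch of the idea (my illustration, not auto-sklearn's actual code; plain random search stands in for the SMAC-based Bayesian optimizer, and a random forest stands in for the full pipeline):

import random
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

def fit_and_validate(config, X, y):
    # One "configuration" fixes the whole pipeline; here it only sets two
    # hyperparameters of a stand-in model.
    model = sklearn.ensemble.RandomForestClassifier(random_state=1, **config)
    return sklearn.model_selection.cross_val_score(model, X, y, cv=3).mean()

X, y = sklearn.datasets.load_digits(return_X_y=True)
warmstart = [{'n_estimators': 100, 'max_depth': 8}]  # a "meta-learning" suggestion
history = []
for step in range(5):  # auto-sklearn loops until the time budget is used up
    if warmstart:
        config = warmstart.pop(0)
    else:
        # auto-sklearn would ask its Bayesian optimizer for the next candidate here
        config = {'n_estimators': random.choice([10, 50, 100]),
                  'max_depth': random.choice([4, 8, None])}
    history.append((fit_and_validate(config, X, y), config))
best = max(history, key=lambda entry: entry[0])
print(best)  # best (score, config) found; auto-sklearn would ensemble the top models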
The data preprocessor consists of the following processes:
- Balancing out classifiers on imbalanced datasets via the class_weight parameter if supported.
- Dealing with datasets containing missing values
- Applying One-Hot-Encoding to categorical features
- Rescaling for faster convergence
- Activating sklearn.feature_selection.VarianceThreshold to reduce dimensionality if used.
The metalearning folder
Since the meta-learning code is quite messy and sometimes seems to end in loops, I am not going to explain the meta-learning in detail. According to the bug tracker, many things seem to be broken or not even intended to work. Looking at some of the metalearning data, I doubt that it will help much outside (carefully selected?) toy examples (at least that is what my tests show me).
It seems like all the metalearning handling is done via mismbo.py. In the files folder we find the pretrained model configurations and results from which we transfer knowledge (metalearning) to new datasets. It seems like the rest of the scripts are part of training models to update the metalearning database. Unfortunately, it seems like only classification is supported.
The metalearning folder has the following structure:
.
├── files
│ ├── accuracy_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── accuracy_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── accuracy_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── accuracy_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── average_precision_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── average_precision_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── average_precision_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── average_precision_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── balanced_accuracy_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── balanced_accuracy_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── balanced_accuracy_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── balanced_accuracy_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_macro_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_macro_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_macro_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_macro_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_micro_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_micro_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_micro_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_micro_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_weighted_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_weighted_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_weighted_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── f1_weighted_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── log_loss_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── log_loss_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── log_loss_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── log_loss_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── pac_score_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── pac_score_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── pac_score_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── pac_score_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_macro_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_macro_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_macro_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_macro_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_micro_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_micro_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_micro_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_micro_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_weighted_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_weighted_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_weighted_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── precision_weighted_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_macro_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_macro_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_macro_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_macro_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_micro_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_micro_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_micro_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_micro_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_weighted_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_weighted_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_weighted_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── recall_weighted_multiclass.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── roc_auc_binary.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── roc_auc_binary.classification_sparse
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ ├── roc_auc_multiclass.classification_dense
│ │ ├── algorithm_runs.arff
│ │ ├── configurations.csv
│ │ ├── description.txt
│ │ ├── feature_costs.arff
│ │ ├── feature_runstatus.arff
│ │ ├── feature_values.arff
│ │ └── readme.txt
│ └── roc_auc_multiclass.classification_sparse
│ ├── algorithm_runs.arff
│ ├── configurations.csv
│ ├── description.txt
│ ├── feature_costs.arff
│ ├── feature_runstatus.arff
│ ├── feature_values.arff
│ └── readme.txt
├── __init__.py
├── input
│ ├── aslib_simple.py
│ └── __init__.py
├── metafeatures
│ ├── __init__.py
│ ├── metafeature.py
│ ├── metafeatures.py
│ └── plot_metafeatures.py
├── metalearning
│ ├── clustering
│ │ ├── cluster_instances.py
│ │ ├── gmeans.py
│ │ └── __init__.py
│ ├── create_datasets.py
│ ├── __init__.py
│ ├── kNearestDatasets
│ │ ├── __init__.py
│ │ ├── kNDEvaluateSurrogate.py
│ │ ├── kNDFeatureSelection.py
│ │ └── kND.py
│ ├── meta_base.py
│ └── metrics
│ ├── __init__.py
│ ├── misc.py
│ └── result_correlation.py
├── mismbo.py
├── optimizers
│ ├── __init__.py
│ ├── metalearn_optimizer
│ │ ├── __init__.py
│ │ ├── metalearner.py
│ │ ├── metalearn_optimizerDefault.cfg
│ │ ├── metalearn_optimizer_parser.py
│ │ └── metalearn_optimizer.py
│ └── optimizer_base.py
└── utils
├── __init__.py
└── plot_utils.py
Balancing
Balancing out classifiers on imbalanced datasets happens via the class_weight parameter if the estimator supports it. This is used for 'decision_tree', 'liblinear_svc' and 'libsvm_svc'.
If class_weight is not supported, then sample weights are used instead. This method is used for 'adaboost', 'gradient_boosting', 'random_forest', 'extra_trees', 'sgd', 'passive_aggressive' and 'xgradient_boosting'.
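A minimal sketch of the two mechanisms in plain scikit-learn (my illustration of the idea, not auto-sklearn's code):

import sklearn.datasets
import sklearn.ensemble
import sklearn.svm
from sklearn.utils.class_weight import compute_sample_weight

# A toy imbalanced binary dataset (90% / 10%).
X, y = sklearn.datasets.make_classification(weights=[0.9, 0.1], random_state=1)

# 1) Estimators that support it get class_weight directly:
svc = sklearn.svm.LinearSVC(class_weight='balanced', max_iter=10000).fit(X, y)

# 2) Otherwise, balanced weights are passed per sample to fit():
weights = compute_sample_weight('balanced', y)
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=100).fit(
    X, y, sample_weight=weights)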
I am not so sure whether this function works properly. A previous test on the “seismic bumps” dataset, which is highly imbalanced, showed performance similar to a manual approach without any balancing.
Imputation
If auto-sklearn detects missing values, then it will decide to handle them using sklearn.preprocessing.Imputer. It seems like auto-sklearn uses strategy='median' only.
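For illustration, median imputation in plain scikit-learn (a sketch; note that newer scikit-learn versions replaced Imputer with sklearn.impute.SimpleImputer):

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])
# NaNs are replaced by the column medians (4.0 and 2.5).
print(SimpleImputer(strategy='median').fit_transform(X))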
One-Hot-Encoding
Auto-sklearn will decide whether one-hot encoding is used or not depending on the feature. If we define an input feature as 'numerical', then no encoding will happen. If we define it as 'categorical' or do not define it, then auto-sklearn will decide depending on the results from meta-learning. However, it does not work with classes that are defined as strings or encoded binary strings (bug report from 2016). The other option is that 'no encoding' is used.
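What one-hot encoding does to a single integer-coded categorical feature, as a quick sketch:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0], [1], [2], [1]])  # one categorical feature, integer-encoded
# One binary column per category:
print(OneHotEncoder().fit_transform(X).toarray())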
Rescaling
One of the most important ways to preprocess datasets is to rescale them. Auto-sklearn chooses between the following options:
normalize
standardize
abstract_rescaling (dummy option/not implemented?)
robust_scaler
min_max_scaler
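The main scalers side by side in plain scikit-learn (a sketch):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])      # contains an outlier
print(StandardScaler().fit_transform(X).ravel())  # zero mean, unit variance
print(MinMaxScaler().fit_transform(X).ravel())    # squashed into [0, 1]
print(RobustScaler().fit_transform(X).ravel())    # scales by the IQR, robust to the outlier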
Variance Threshold
If the variance threshold is used, then features with low variance will be dropped using sklearn.feature_selection.VarianceThreshold.
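A quick sketch of what that does:

import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0.0, 1.0],
              [0.0, 2.0],
              [0.0, 3.0]])  # the first column is constant
# With the default threshold of 0.0, only the second column survives.
print(VarianceThreshold().fit_transform(X))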
Feature preprocessing pipeline
The preprocessing pipeline consists of the following files:
.
├── densifier.py
├── extra_trees_preproc_for_classification.py
├── extra_trees_preproc_for_regression.py
├── fast_ica.py
├── feature_agglomeration.py
├── __init__.py
├── kernel_pca.py
├── kitchen_sinks.py
├── liblinear_svc_preprocessor.py
├── no_preprocessing.py
├── nystroem_sampler.py
├── pca.py
├── polynomial.py
├── random_trees_embedding.py
├── select_percentile_classification.py
├── select_percentile.py
├── select_percentile_regression.py
├── select_rates.py
└── truncatedSVD.py
The class diagram looks as follows:
The preprocessing functions usually call the corresponding function from scikit-learn. The preprocessing functions are chosen based on meta-learning data and the results of the training and cross-validation process. This process of selecting the hyperparameters is a bit opaque since it utilizes meta-learning data from the results of training and testing more than 100 datasets. There is also the option to do no feature preprocessing (no_preprocessing.py).
Some of the hyperparameters are not suitable for all combinations. I am grouping them anyhow to make them more accessible. Further, I am displaying the possible search space and not what is used for which dataset. That would require analysing the meta-learning data further, which is quite opaque (as experienced before) and hardly leads to better results than my brute-force approach to feature engineering.
A single hyperparameter is chosen from the set of possible hyperparameters via one of the following functions:
UniformIntegerHyperparameter()
UniformFloatHyperparameter()
CategoricalHyperparameter()
These functions sample from a given set or interval but come with a default value as well. So, they do exactly what their names suggest and provide a default choice on top.
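These classes come from the ConfigSpace package that auto-sklearn builds on. A minimal sampling sketch (the values mirror the PCA search space shown below):

from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (CategoricalHyperparameter,
                                         UniformFloatHyperparameter)

cs = ConfigurationSpace()
cs.add_hyperparameters([
    UniformFloatHyperparameter("keep_variance", 0.5, 0.9999, default_value=0.9999),
    CategoricalHyperparameter("whiten", ["False", "True"], default_value="False"),
])
print(cs.get_default_configuration())  # the default choices
print(cs.sample_configuration())       # one random draw from the search space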
Matrix decomposition
The first set of possible feature preprocessors of auto-sklearn are preprocessors that do some form of matrix decomposition.
PCA
Auto-sklearn can use PCA (Principal Component Analysis) for feature reduction. It calls sklearn.decomposition.PCA and chooses from a predefined set of hyperparameters:
keep_variance = UniformFloatHyperparameter("keep_variance", 0.5, 0.9999, default_value=0.9999)
whiten = CategoricalHyperparameter("whiten", ["False", "True"], default_value="False")
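For illustration: in plain scikit-learn, a fractional n_components has the same effect as keep_variance, i.e. keeping just enough components to explain that share of the variance (a sketch, not auto-sklearn's internal code):

import sklearn.datasets
from sklearn.decomposition import PCA

X, _ = sklearn.datasets.load_digits(return_X_y=True)
pca = PCA(n_components=0.9999, svd_solver='full').fit(X)
print(pca.n_components_)  # number of components needed for 99.99% of the variance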
Kernel PCA
It can use a kernel PCA as well. In this case the Principal Component Analysis is carried out in a non-linear feature space. Again, it uses scikit-learn's implementation sklearn.decomposition.KernelPCA and chooses hyperparameters for the kernel PCA from:
n_components = UniformIntegerHyperparameter("n_components", 10, 2000, default_value=100)
kernel = CategoricalHyperparameter('kernel', ['poly', 'rbf', 'sigmoid','cosine'], 'rbf')
degree = UniformIntegerHyperparameter('degree', 2, 5, 3)
coef0 = UniformFloatHyperparameter("coef0", -1, 1, default_value=0)
gamma = UniformFloatHyperparameter("gamma", 3.0517578125e-05, 8, log=True, default_value=1.0)
Truncated SVD
Another option is to use the truncated Singular Value Decomposition (SVD), which is basically a faster implementation of the standard Singular Value Decomposition.
It uses sklearn.decomposition.TruncatedSVD with the following hyperparameters:
target_dim = UniformIntegerHyperparameter("target_dim", 10, 256, default_value=128)
ICA
The last tool from matrix decomposition is Independent Component Analysis (ICA). Auto-sklearn uses sklearn.decomposition.FastICA with the following hyperparameters:
n_components = UniformIntegerHyperparameter("n_components", 10, 2000, default_value=100)
algorithm = CategoricalHyperparameter('algorithm', ['parallel', 'deflation'], 'parallel')
whiten = CategoricalHyperparameter('whiten', ['False', 'True'], 'False')
fun = CategoricalHyperparameter('fun', ['logcosh', 'exp', 'cube'], 'logcosh')
Univariate feature selection
Auto-sklearn uses almost the entire feature_selection toolbox of scikit-learn. It basically uses sklearn.feature_selection.GenericUnivariateSelect and sklearn.feature_selection.SelectPercentile. The preprocessor is selected with the following function
self.preprocessor = sklearn.feature_selection.SelectPercentile(
    score_func=self.score_func,
    percentile=self.percentile)
and chooses from the following functions for feature selection:
score functions:
sklearn.feature_selection.mutual_info_regression
sklearn.feature_selection.f_regression
sklearn.feature_selection.chi2
sklearn.feature_selection.f_classif
sklearn.feature_selection.mutual_info_classif
hyperparameters:
percentile = UniformFloatHyperparameter(
    name="percentile", lower=1, upper=99, default_value=50)
score_func = CategoricalHyperparameter(
    name="score_func", choices=["chi2", "f_classif", "mutual_info"],
    default_value="chi2")
alpha = UniformFloatHyperparameter(
    name="alpha", lower=0.01, upper=0.5, default_value=0.1)
and for regression:
score_func = CategoricalHyperparameter(
    name="score_func", choices=["f_regression", "mutual_info"])
Classification-based feature selection
Auto-sklearn may use a feature selection based on the results of a linear support vector machine classification (sklearn.svm.LinearSVC) with the following hyperparameters:
penalty = Constant("penalty", "l1")
loss = CategoricalHyperparameter("loss", ["hinge", "squared_hinge"], default_value="squared_hinge")
dual = Constant("dual", "False")
tol = UniformFloatHyperparameter("tol", 1e-5, 1e-1, default_value=1e-4, log=True)
C = UniformFloatHyperparameter("C", 0.03125, 32768, log=True, default_value=1.0)
fit_intercept = Constant("fit_intercept", "True")
intercept_scaling = Constant("intercept_scaling", 1)
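A common scikit-learn idiom for this kind of selection, as a sketch (my illustration, not auto-sklearn's exact code): the L1 penalty drives many coefficients to zero, and SelectFromModel keeps only features with non-zero weights.

import sklearn.datasets
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = sklearn.datasets.load_digits(return_X_y=True)
svc = LinearSVC(penalty='l1', dual=False, C=1.0, max_iter=5000).fit(X, y)
X_reduced = SelectFromModel(svc, prefit=True).transform(X)
print(X.shape, '->', X_reduced.shape)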
Feature clustering
Next, we have feature clustering, also known as feature agglomeration. This means that features with a high correlation are combined (e.g. added). Auto-sklearn uses sklearn.cluster.FeatureAgglomeration and feeds it with the following hyperparameters:
n_clusters = UniformIntegerHyperparameter("n_clusters", 2, 400, 25)
affinity = CategoricalHyperparameter("affinity", ["euclidean", "manhattan", "cosine"], "euclidean")
linkage = CategoricalHyperparameter("linkage", ["ward", "complete", "average"], "ward")
pooling_func = CategoricalHyperparameter("pooling_func", ["mean", "median", "max"])
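A short usage sketch: merging the 64 pixel features of the digits dataset into 16 cluster features.

import numpy as np
import sklearn.datasets
from sklearn.cluster import FeatureAgglomeration

X, _ = sklearn.datasets.load_digits(return_X_y=True)
agglo = FeatureAgglomeration(n_clusters=16, linkage='ward', pooling_func=np.mean)
print(agglo.fit_transform(X).shape)  # (1797, 16)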
Feature embeddings
Feature embeddings are features projected into a non-linear feature space. In this case it is done with random forests. All hyperparameters are the ones of random_forest and extra_trees, therefore I will not post them here.
Kernel approximation
Auto-sklearn may use kernel approximation using sklearn.kernel_approximation.RBFSampler (random kitchen sinks) or sklearn.kernel_approximation.Nystroem with the following hyperparameters:
Random kitchen sinks
gamma = UniformFloatHyperparameter("gamma", 3.0517578125e-05, 8, default_value=1.0, log=True)
n_components = UniformIntegerHyperparameter("n_components", 50, 10000, default_value=100, log=True)
Nystroem sampling
kernel = CategoricalHyperparameter('kernel', ['poly', 'rbf', 'sigmoid', 'cosine', 'chi2'], 'rbf')
n_components = UniformIntegerHyperparameter("n_components", 50, 10000, default_value=100, log=True)
gamma = UniformFloatHyperparameter("gamma", 3.0517578125e-05, 8, log=True, default_value=0.1)
degree = UniformIntegerHyperparameter('degree', 2, 5, 3)
coef0 = UniformFloatHyperparameter("coef0", -1, 1, default_value=0)
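The point of these samplers is that they turn a linear model into an approximate kernel method. A sketch in plain scikit-learn:

import sklearn.datasets
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = sklearn.datasets.load_digits(return_X_y=True)
# Random kitchen sinks followed by a linear classifier approximates an RBF SVM.
model = make_pipeline(RBFSampler(gamma=0.001, n_components=500, random_state=1),
                      SGDClassifier(max_iter=1000, random_state=1))
print(cross_val_score(model, X, y, cv=3).mean())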
Polynomial feature expansion
Auto-sklearn can create new features by polynomial combination of all features using sklearn.preprocessing.PolynomialFeatures. However, it is restricted to a degree of 3 due to computational costs. Auto-sklearn chooses between the following hyperparameters:
# More than degree 3 is too expensive!
degree = UniformIntegerHyperparameter("degree", 2, 3, 2)
interaction_only = CategoricalHyperparameter("interaction_only", ["False", "True"], "False")
include_bias = CategoricalHyperparameter("include_bias", ["True", "False"], "True")
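A quick sketch of what the expansion and the interaction_only option produce:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])
print(PolynomialFeatures(degree=2).fit_transform(X))
# -> [[1. 2. 3. 4. 6. 9.]]  (bias, x1, x2, x1^2, x1*x2, x2^2)
print(PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X))
# -> [[1. 2. 3. 6.]]        (the pure powers are dropped)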
Sparse representation
It seems like densifying is all the “sparse representation” step does:
def transform(self, X):
    from scipy import sparse
    if sparse.issparse(X):
        return X.todense().getA()
    else:
        return X
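In other words, it converts a sparse matrix back into a dense numpy array, as this short sketch shows:

import numpy as np
from scipy import sparse

X_sparse = sparse.csr_matrix(np.eye(3))
X_dense = X_sparse.todense().getA()  # the same call chain as in transform() above
print(type(X_dense))                 # <class 'numpy.ndarray'>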