Revisiting Machine Learning Datasets

Today, we will have a look at the Concrete Slump Test dataset by Yeh (2007) [1] as part of my “Exploring Less Known Datasets for Machine Learning” series. Let us see how state-of-the-art algorithms compare with the results from 2007. (My views on this dataset are entirely based on Yeh (2007).)

Contents

Dataset, baseline results from Yeh (2007) and some domain knowledge
Preprocessing and feature engineering
Applying classical machine learning algorithms
- Basic considerations for regression with multi variable targets
- Results of classical machine learning algorithms
Applying neural networks
Discussion of results

Dataset, baseline results from Yeh (2007) and some domain knowledge

Legal

Dataset © Prof. I-Cheng Yeh [1]; published in the UCI Machine Learning Repository [2].

The aim of the dataset is to predict the 28-day compressive strength of high performance concrete (HPC) and to determine its workability through the slump and slump flow measurements.

Therefore, our target variables (Y) are:

Concrete compressive strength after 28 days [MPa]
Slump [cm]
Flow [cm]

The original paper only covers the prediction of slump. Moreover, Yeh (2007) used a dataset with 25 fewer data points.

To predict these variables, we have the following input features (X) available:

Cement \(\left[\frac{kg}{m^3}\right]\)
Fly ash \(\left[\frac{kg}{m^3}\right]\)
Slag \(\left[\frac{kg}{m^3}\right]\)
Water \(\left[\frac{kg}{m^3}\right]\)
Superplasticizer \(\left[\frac{kg}{m^3}\right]\)
Fine aggregate \(\left[\frac{kg}{m^3}\right]\)
Coarse aggregate \(\left[\frac{kg}{m^3}\right]\)

Baseline results

The published models reached the following performance (for slump):

Testing results of a second-order regression
- R2: 0.13 - 0.46 (mean: 0.32)
- RMSE (cm): 10.11 - 22.29 (mean: 15.57)
Testing results of the neural networks
- R2: 0.69 - 0.81 (mean: 0.72)
- RMSE (cm): 7.51 - 9.93 (mean: 8.51)
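All comparisons in this post rely on R2, RMSE and MAE. As a reference, here is a minimal pure-Python sketch of these metrics using their standard definitions (it computes the same quantities as scikit-learn's `r2_score`, `mean_squared_error` and `mean_absolute_error`, but is not their implementation). RMSE and MAE carry the unit of the target (cm for slump), whereas R2 is dimensionless. The predictions below are hypothetical.

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the unit of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Slump [cm] of the first five mixes and hypothetical predictions for them
slump_true = [23.0, 0.0, 1.0, 3.0, 20.0]
slump_pred = [20.0, 2.0, 2.0, 5.0, 18.0]
print(round(r2(slump_true, slump_pred), 3),
      round(rmse(slump_true, slump_pred), 3),
      mae(slump_true, slump_pred))  # 0.956 2.098 2.0
```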

Let us have a closer look at some of the features:

Fly ash and slag

Together with cement, both features can be considered binders. Both increase the strength and durability of concrete. However, the hardening process takes longer, so more time is required to reach full compressive strength.

Superplasticizer

Superplasticizers are used to improve the flow properties of fresh concrete because they minimize particle segregation. Further, they allow the water-cement ratio to be reduced, which leads to higher compressive strength.

Water

Workability is influenced by the water content: with increasing water content, the mixture behaves more and more like a fluid.

The raw dataset looks like this:

	No	Cement	Slag	Fly ash	Water	SP	Coarse Aggr.	Fine Aggr.	SLUMP(cm)	FLOW(cm)	Compressive Strength (28-day)(Mpa)
0	1	273.0	82.0	105.0	210.0	9.0	904.0	680.0	23.0	62.0	34.99
1	2	163.0	149.0	191.0	180.0	12.0	843.0	746.0	0.0	20.0	41.14
2	3	162.0	148.0	191.0	179.0	16.0	840.0	743.0	1.0	20.0	41.81
3	4	162.0	148.0	190.0	179.0	19.0	838.0	741.0	3.0	21.5	42.08
4	5	154.0	112.0	144.0	220.0	10.0	923.0	658.0	20.0	64.0	26.82

The boxplots tell us a bit more about basic statistics of each feature:

Preprocessing and feature engineering

A key factor in concrete engineering is the water-cement ratio.
Since we are dealing with workability estimates, the superplasticizer is another key feature.

Yeh (2007) calculated the following ratios but apparently did not use them for the neural networks:

Water to cement
Water to binder
Water to solid
Superplasticizer to binder
Fly ash to binder
Slag to binder
Fly ash + slag to binder
Aggregate to binder
Fine aggregate to coarse aggregate

This raises a few questions. Mainly: what exactly is binder?
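To make the list concrete, the following sketch computes all nine ratios for the first mix of the raw data. Since the exact definitions are not given, it assumes that binder = cement + fly ash + slag (unweighted) and that solid = binder + coarse and fine aggregate; both are my assumptions, and the function name `mix_ratios` is hypothetical.

```python
# First mix of the raw data, all quantities in kg/m^3
mix = {"Cement": 273.0, "Slag": 82.0, "Fly ash": 105.0, "Water": 210.0,
       "SP": 9.0, "Coarse Aggr.": 904.0, "Fine Aggr.": 680.0}

def mix_ratios(m):
    """Ratios listed by Yeh (2007), under ASSUMED definitions:
    binder = cement + fly ash + slag (unweighted), solid = binder + aggregates."""
    binder = m["Cement"] + m["Fly ash"] + m["Slag"]
    aggregate = m["Coarse Aggr."] + m["Fine Aggr."]
    solid = binder + aggregate
    return {
        "water/cement": m["Water"] / m["Cement"],
        "water/binder": m["Water"] / binder,
        "water/solid": m["Water"] / solid,
        "SP/binder": m["SP"] / binder,
        "fly ash/binder": m["Fly ash"] / binder,
        "slag/binder": m["Slag"] / binder,
        "(fly ash + slag)/binder": (m["Fly ash"] + m["Slag"]) / binder,
        "aggregate/binder": aggregate / binder,
        "fine/coarse aggregate": m["Fine Aggr."] / m["Coarse Aggr."],
    }

for name, value in mix_ratios(mix).items():
    print(f"{name:>24}: {value:.4f}")
```

The water-cement and water-binder values (0.7692 and 0.4565) agree with the `wc_ratio` and `wb_ratio` columns of the tables below.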

Water to cement ratio

The water-to-cement ratio is the simplest one:

\[\text{w/c} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right]}\]

Water to binder ratio

Depending on the definition, fly ash and slag can be counted as binder as well. Usually, they are weighted with k-values, which we do not have. Hence, we have to add them unweighted:

\[\text{w/b} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right] + \text{Fly ash} \left[\frac{kg}{m^3}\right] + \text{Blast furnace slag} \left[\frac{kg}{m^3}\right]}\]
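For reference, the k-value concept weights each addition with an efficiency factor before adding it to the cement content. With hypothetical factors \(k_F\) for fly ash and \(k_S\) for slag (values that are not available for this dataset), the ratio would read as follows; the equation above is the special case \(k_F = k_S = 1\), evaluated here for the first mix of the raw data:

\[\text{w/b}_k = \frac{\text{Water}}{\text{Cement} + k_F \cdot \text{Fly ash} + k_S \cdot \text{Slag}}, \qquad k_F = k_S = 1: \quad \text{w/b} = \frac{210}{273 + 105 + 82} = \frac{210}{460} \approx 0.4565\]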

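# input_data: raw data without the 'No' column (assumed); the last 3 columns are the targets,
# so inserting at shape[-1] - 3 places the new ratio right before the targets

# Variant 1: replace Water and Cement by the water-cement ratio
input_data_wc_ratio = input_data.copy()
input_data_wc_ratio.insert(input_data_wc_ratio.shape[-1] - 3, 'wc_ratio',
                           input_data_wc_ratio['Water'] / input_data_wc_ratio['Cement'])
input_data_wc_ratio.drop(['Water', 'Cement'], inplace=True, axis=1)

# Variant 2: replace Water, Cement, Fly ash and Slag by the (unweighted) water-binder ratio
input_data_wb_ratio = input_data.copy()
input_data_wb_ratio.insert(input_data_wb_ratio.shape[-1] - 3, 'wb_ratio',
                           input_data_wb_ratio['Water'] / (input_data_wb_ratio['Cement'] +
                                                           input_data_wb_ratio['Fly ash'] +
                                                           input_data_wb_ratio['Slag']))
input_data_wb_ratio.drop(['Water', 'Cement', 'Fly ash', 'Slag'], inplace=True, axis=1)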


    
      
	Slag	Fly ash	SP	Coarse Aggr.	Fine Aggr.	wc_ratio	SLUMP(cm)	FLOW(cm)	Compressive Strength (28-day)(Mpa)
0	82.0	105.0	9.0	904.0	680.0	0.769231	23.0	62.0	34.99
1	149.0	191.0	12.0	843.0	746.0	1.104294	0.0	20.0	41.14
2	148.0	191.0	16.0	840.0	743.0	1.104938	1.0	20.0	41.81
3	148.0	190.0	19.0	838.0	741.0	1.104938	3.0	21.5	42.08
4	112.0	144.0	10.0	923.0	658.0	1.428571	20.0	64.0	26.82

And this is what the dataset looks like if we use the water-binder ratio:

	SP	Coarse Aggr.	Fine Aggr.	wb_ratio	SLUMP(cm)	FLOW(cm)	Compressive Strength (28-day)(Mpa)
0	9.0	904.0	680.0	0.456522	23.0	62.0	34.99
1	12.0	843.0	746.0	0.357853	0.0	20.0	41.14
2	16.0	840.0	743.0	0.357285	1.0	20.0	41.81
3	19.0	838.0	741.0	0.358000	3.0	21.5	42.08
4	10.0	923.0	658.0	0.536585	20.0	64.0	26.82

Since SLUMP and FLOW both describe the consistency (rheological properties) of fresh concrete, it is reasonable to expect them to be correlated. Mechtcherine and Shyshko (2015) provide a mathematical description of this process using a discrete element approach [3].
Moreover, we can assume that neither of them is strongly correlated with compressive strength, since strength results from other processes during the hardening (aging) of the concrete.
The following table shows the Pearson correlation coefficients for all three target variables (UCS denotes the compressive strength):

	SLUMP	FLOW	UCS
SLUMP	1.000000	0.906135	-0.223358
FLOW	0.906135	1.000000	-0.124029
UCS	-0.223358	-0.124029	1.000000
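The coefficients above come from the full dataset (pandas' `DataFrame.corr()` uses Pearson by default, which is presumably how they were obtained). The following sketch spells out the Pearson formula and applies it to the five rows shown earlier. With only five mixes the values are not comparable to the table, but SLUMP and FLOW already come out strongly positively correlated and SLUMP negatively correlated with UCS.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Targets of the first five mixes of the raw data
targets = {
    "SLUMP": [23.0, 0.0, 1.0, 3.0, 20.0],
    "FLOW": [62.0, 20.0, 20.0, 21.5, 64.0],
    "UCS": [34.99, 41.14, 41.81, 42.08, 26.82],
}

names = list(targets)
for a in names:
    # one row of the correlation matrix
    print(a, [round(pearson(targets[a], targets[b]), 3) for b in names])
```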

Further, we have to rescale our data to make it suitable for most ML algorithms.
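The scaler itself is not shown. The evaluation code further down multiplies scaled predictions by a per-column factor (`scaler_array`), which is consistent with a purely multiplicative scaling. A minimal sketch under that assumption, using max-abs scaling (the same idea as scikit-learn's `MaxAbsScaler`); the function name `max_abs_scale` is hypothetical. It keeps the factors so that errors can be reported in cm or MPa again.

```python
import numpy as np

def max_abs_scale(data):
    """Scale every column by its maximum absolute value.

    Returns the scaled data and a (1, n_columns) array of factors such that
    data == scaled * factors (a hypothetical stand-in for 'scaler_array')."""
    data = np.asarray(data, dtype=float)
    factors = np.abs(data).max(axis=0, keepdims=True)
    factors[factors == 0] = 1.0  # leave all-zero columns unchanged
    return data / factors, factors

# Last three columns of the raw data: SLUMP [cm], FLOW [cm], strength [MPa]
raw_targets = np.array([[23.0, 62.0, 34.99],
                        [0.0, 20.0, 41.14],
                        [1.0, 20.0, 41.81],
                        [3.0, 21.5, 42.08],
                        [20.0, 64.0, 26.82]])
scaled, factors = max_abs_scale(raw_targets)

# Undo the scaling for the last target column, i.e. index -(i + 1) with i = 0
strength = scaled[:, -1] * factors[:, -1]
```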

Applying classical machine learning algorithms

Basic considerations for regression with multi variable targets

As mentioned in a previous blog post, my view on multi-variable prediction differs a bit from that of many other people.

Further, we are dealing with data that has physical units. Therefore, we cannot mix errors in cm and MPa. Moreover, although SLUMP and FLOW are both measured in cm, their values in the dataset cover completely different ranges.
Hence, we are more or less forced by physics to build three separate models instead of one.

Since the dataset is small, we can start to train our models directly with hyperparameter optimization using grid search:

import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import GridSearchCV

# Relies on objects defined elsewhere in the notebook: kfold_vs_size (cv setting),
# datasets (splits and 'scaler_array') and i (target index of the calling loop)
def train_test_random_forest_regression(X_train, X_test, y_train, y_test, scorer, dataset_id):
    random_forest_regression = RandomForestRegressor(random_state=42)
    grid_parameters_random_forest_regression = {'n_estimators': [3, 5, 10, 15, 18],
                                                'max_depth': [None, 2, 3, 5, 7, 9]}
    start_time = time.time()
    grid_obj = GridSearchCV(random_forest_regression, param_grid=grid_parameters_random_forest_regression,
                            cv=kfold_vs_size, n_jobs=-1, scoring=scorer, verbose=0)
    grid_fit = grid_obj.fit(X_train, y_train)
    training_time = time.time() - start_time
    best_random_forest_regression = grid_fit.best_estimator_
    prediction = best_random_forest_regression.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)

    # metrics on the true scale:
    # R2 is unaffected by undoing the scaling, while MSE and MAE change;
    # only the rescaled values carry a physical meaning (cm or MPa)
    scale = datasets[dataset_id]['scaler_array'][:, -(i + 1)]
    prediction_true_scale = prediction * scale
    y_test_true_scale = y_test * scale
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)

    return {'Regression type': 'Random Forest Regression', 'model': grid_fit, 'Predictions': prediction,
            'R2': r2, 'MSE': mse, 'MAE': mae, 'MSE_true_scale': mse_true_scale,
            'RMSE_true_scale': np.sqrt(mse_true_scale), 'MAE_true_scale': mae_true_scale,
            'MedAE_true_scale': medae_true_scale, 'Training time': training_time,
            'dataset': str(dataset_id) + str(-(i + 1))}

and we can simply iterate over all three targets:

# results and counter are assumed to be initialized beforehand
for dataset in datasets:
    X_train, X_test = datasets[dataset]['X_train'], datasets[dataset]['X_test']
    y_train, y_test = datasets[dataset]['y_train'], datasets[dataset]['y_test']
    # one model per target column; column -(i + 1) with i = 0 is the last target
    for i in range(y_test.shape[1]):
        results[counter] = train_test_linear_regression(X_train, X_test, y_train[:, -(i + 1)],
                                                        y_test[:, -(i + 1)], scorer, dataset)
        # ....
        # ....
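The comment in the training function states that R2 is unchanged when the scaling is undone, whereas MSE and MAE are not. A quick pure-Python check of that claim for a multiplicative factor `c` (all numbers hypothetical): MAE grows by `c`, MSE by `c ** 2`, and R2 stays the same.

```python
def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical scaled targets/predictions and the factor mapping them back to cm
y_scaled = [0.30, 0.97, 0.31, 0.34, 1.00]
p_scaled = [0.35, 0.90, 0.33, 0.40, 0.95]
c = 64.0
y_true = [c * v for v in y_scaled]
p_true = [c * v for v in p_scaled]

print(r2(y_scaled, p_scaled), r2(y_true, p_true))    # identical up to rounding
print(mae(y_true, p_true) / mae(y_scaled, p_scaled))  # approximately c
print(mse(y_true, p_true) / mse(y_scaled, p_scaled))  # approximately c ** 2
```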

Results of classical machine learning algorithms

The performance for predicting the compressive strength is similar to the performance of ML algorithms on the concrete compressive strength dataset.

Regression type	R2 for Compressive strength on scaled data	R2 for Compressive strength on scaled data with wc ratio	R2 for Compressive strength on scaled data with wb ratio
Linear Regression	0.915434	0.870247	0.575527
Decision Tree Regression	0.659896	0.406098	0.222941
SVM Regression	0.932872	0.936492	0.557471
Random Forest Regression	0.751478	0.835204	0.602958
AdaBoost Regression	0.797401	0.811232	0.582100
XGBoost Regression	0.886348	0.885758	0.681528
	MAE for Compressive strength on scaled data	MAE for Compressive strength on scaled data with wc ratio	MAE for Compressive strength on scaled data with wb ratio
Linear Regression	1.731016	2.137284	4.165805
Decision Tree Regression	3.173140	3.581667	4.676940
SVM Regression	1.335125	1.268278	4.238225
Random Forest Regression	2.602524	2.248668	3.879512
AdaBoost Regression	2.525000	2.441642	4.144793
XGBoost Regression	1.856098	1.856570	3.446844
	RMSE for Compressive strength on scaled data	RMSE for Compressive strength on scaled data with wc ratio	RMSE for Compressive strength on scaled data with wb ratio
Linear Regression	2.068523	2.562253	4.634345
Decision Tree Regression	4.148294	5.481768	6.270327
SVM Regression	1.842961	1.792570	4.731885
Random Forest Regression	3.546059	2.887601	4.482099
AdaBoost Regression	3.201709	3.090490	4.598325
XGBoost Regression	2.398015	2.404230	4.014202

The results on the FLOW variable look less optimistic:

Regression type	R2 for FLOW on scaled data	R2 for FLOW on scaled data with wc ratio	R2 for FLOW on scaled data with wb ratio
Linear Regression	0.386403	0.332854	0.146184
Decision Tree Regression	0.176075	-0.191239	-0.489613
SVM Regression	0.607207	0.590282	0.284920
Random Forest Regression	-0.268938	-0.478631	-0.560620
AdaBoost Regression	0.041484	0.154690	-0.350432
XGBoost Regression	0.278578	0.057298	0.292440
	MAE for FLOW on scaled data	MAE for FLOW on scaled data with wc ratio	MAE for FLOW on scaled data with wb ratio
Linear Regression	9.643626	9.987325	11.792952
Decision Tree Regression	10.836241	12.602535	14.315440
SVM Regression	7.460085	7.845292	10.232747
Random Forest Regression	14.319428	15.263048	16.267756
AdaBoost Regression	11.576389	10.678649	13.111255
XGBoost Regression	10.434241	11.828690	10.306791
	RMSE for FLOW on scaled data	RMSE for FLOW on scaled data with wc ratio	RMSE for FLOW on scaled data with wb ratio
Linear Regression	11.831774	12.337268	13.956955
Decision Tree Regression	13.710464	16.485715	18.435082
SVM Regression	9.466519	9.668319	12.772792
Random Forest Regression	17.014866	18.367002	18.869352
AdaBoost Regression	14.787951	13.887252	17.552731
XGBoost Regression	12.829306	14.665458	12.705446

The results on the SLUMP variable look less optimistic as well:

Regression type	R2 for SLUMP on scaled data	R2 for SLUMP on scaled data with wc ratio	R2 for SLUMP on scaled data with wb ratio
Linear Regression	0.256969	0.231385	0.129299
Decision Tree Regression	-0.287176	-0.000154	-0.301644
SVM Regression	0.519161	0.495967	0.252016
Random Forest Regression	-0.200935	-0.337925	-0.330451
AdaBoost Regression	-0.353549	0.035814	0.105920
XGBoost Regression	0.278448	-0.025920	0.570008
	MAE for SLUMP on scaled data	MAE for SLUMP on scaled data with wc ratio	MAE for SLUMP on scaled data with wb ratio
Linear Regression	5.411023	5.442704	5.639649
Decision Tree Regression	5.777566	5.519243	5.831611
SVM Regression	3.986152	4.076479	5.383039
Random Forest Regression	6.355653	6.797921	6.579892
AdaBoost Regression	6.284446	5.299650	5.177292
XGBoost Regression	4.851443	5.401885	3.871492
	RMSE for SLUMP on scaled data	RMSE for SLUMP on scaled data with wc ratio	RMSE for SLUMP on scaled data with wb ratio
Linear Regression	6.251594	6.358312	6.767399
Decision Tree Regression	8.228225	7.253051	8.274339
SVM Regression	5.029064	5.148925	6.272395
Random Forest Regression	7.947802	8.388863	8.365398
AdaBoost Regression	8.437704	7.121438	6.857651
XGBoost Regression	6.160571	7.345883	4.755730

Applying neural networks

Let us test some neural networks that are close to the one in the original paper.
The paper describes a model that is trained with:

no. hidden layers: {0,1,2}
no. hidden units: {5,7,10,14}
learning rates: {0.1, 0.3, 1.0, 3.0}
momentum: {0.0, 0.25, 0.5, 0.75}
iterations: {500, 1000, 2000, 5000}

I did not test all of these combinations: instead of varying the gradient descent settings (learning rate and momentum), I used the Adam optimizer.

However, the original paper does not state which of these architectures yields the baseline performance.
Moreover, there is no information on whether the input data was normalized/scaled or used raw. Yeh (2007) did not use a separate test set but relied on cross-validation only; therefore, the reported testing results are biased.
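To obtain an unbiased estimate, part of the data has to be held out before any cross-validation or hyperparameter search and used only once at the very end. A minimal pure-Python sketch of that protocol; the function name, the 20 % test share, the seed and the five folds are illustrative choices, not values from Yeh (2007) or from this post.

```python
import random

def holdout_then_kfold(n_samples, test_share=0.2, n_folds=5, seed=42):
    """Split indices into a held-out test set and k cross-validation folds.

    The folds are built from the remaining (training) indices only, so the
    test indices never influence model selection."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_share)
    test, train = indices[:n_test], indices[n_test:]
    folds = [train[k::n_folds] for k in range(n_folds)]
    return train, test, folds

train, test, folds = holdout_then_kfold(100)
for k, val in enumerate(folds):
    val_set = set(val)
    fit_idx = [i for i in train if i not in val_set]
    # fit on fit_idx and validate on val to choose hyperparameters
    print(f"fold {k}: {len(fit_idx)} for fitting, {len(val)} for validation")
# afterwards: refit on all of `train` and evaluate exactly once on `test`
```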

The baseline model is built with the following function:

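from keras.models import Sequential
from keras.layers import Dense

def build_baseline_model_orig(input_dim, units, layers):
    model = Sequential()
    # first layer: input_dim sigmoid units
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    # 'layers' additional hidden layers with 'units' sigmoid units each
    for layer in range(layers):
        model.add(Dense(units, activation='sigmoid'))
    # single linear output: one model per target
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model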

In total, 144 models are trained with this base function (48 per dataset).
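The 48 models per dataset are consistent with a grid over the hidden layers, hidden units and iterations listed above (3 × 4 × 4 = 48) once learning rate and momentum are dropped in favour of Adam. This is my reading, not something stated explicitly. A short sketch enumerating both grids with `itertools.product`:

```python
from itertools import product

hidden_layers = [0, 1, 2]
hidden_units = [5, 7, 10, 14]
learning_rates = [0.1, 0.3, 1.0, 3.0]
momenta = [0.0, 0.25, 0.5, 0.75]
iterations = [500, 1000, 2000, 5000]

# Full grid as described in Yeh (2007)
full_grid = list(product(hidden_layers, hidden_units, learning_rates, momenta, iterations))
# Reduced grid: Adam replaces the learning-rate and momentum variations
adam_grid = list(product(hidden_layers, hidden_units, iterations))

print(len(full_grid), len(adam_grid))  # 768 48
```

Note that in `build_baseline_model_orig` the `units` argument has no effect when `layers=0`, so the combinations with zero hidden layers differ only in an unused argument.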

The results are not very promising for SLUMP and FLOW. Apparently, they differ considerably from those of the original publication.

Next, we vary the base model slightly and use different activation functions:

from keras.models import Sequential
from keras.layers import Dense

# Variant 1: sigmoid first layer, ReLU hidden layer
def build_baseline_model_modified_activation(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

# Variant 2: ReLU in both layers
def build_baseline_model_modified_activations(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

A few deeper neural networks to test some further architectures:

from keras.models import Sequential
from keras.layers import BatchNormalization, Dense

# Model 1: two hidden ReLU layers with 6 units each
def build_model_1(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

# Model 2: hidden layers with 6-12-6 units
def build_model_2(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

# Model 3: hidden layers with 5-12-5 units, batch normalization after the first layer
def build_model_3(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(5, activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(5, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

# Model 4: wider hidden layers (20-40-30-20) with batch normalization in between
def build_model_4(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(40, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

All models are trained with a batch size of 16 and a validation split of 0.2 for 1000 epochs.
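In Keras, these settings correspond to the `batch_size`, `validation_split` and `epochs` arguments of `model.fit`. A sketch of a shared training helper; the names `FIT_SETTINGS` and `train_model` and the `verbose=0` choice are my own. It only forwards the arguments, so any compiled model returned by the functions above could be passed in.

```python
# Shared training settings (verbose=0 is an added choice, not from the post)
FIT_SETTINGS = {"batch_size": 16, "validation_split": 0.2, "epochs": 1000, "verbose": 0}

def train_model(model, X_train, y_train, **overrides):
    """Fit a compiled Keras model with the shared settings.

    Keyword overrides (e.g. epochs=10 for a quick test) replace single settings
    without modifying FIT_SETTINGS. Returns whatever model.fit returns
    (a History object for Keras models)."""
    settings = {**FIT_SETTINGS, **overrides}
    return model.fit(X_train, y_train, **settings)
```

A call could look like `history = train_model(build_model_1(X_train.shape[1]), X_train, y_train)`. Keras takes the validation samples from the end of the arrays before shuffling, so the training data should already be in random order.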

Again, it is not surprising that the prediction of compressive strength yields reasonable results:

Regression type	R2 for Compressive strength on scaled data	R2 for Compressive strength on scaled data with wc ratio	R2 for Compressive strength on scaled data with wb ratio
Baseline NN with adam and relu	0.881841	0.776062	0.563822
Baseline NN with adam and relus	0.909270	0.876102	-0.074257
Model 1	0.891060	0.733605	0.500460
Model 2	0.868767	0.925675	0.539743
Model 3	0.774544	0.830290	0.378531
Model 4	0.615885	-0.008820	-0.163422
	MAE for Compressive strength on scaled data	MAE for Compressive strength on scaled data with wc ratio	MAE for Compressive strength on scaled data with wb ratio
Baseline NN with adam and relu	1.911479	2.808184	4.072204
Baseline NN with adam and relus	1.891581	1.966362	5.927383
Model 1	2.027864	2.962191	4.383823
Model 2	2.219140	1.481955	4.251803
Model 3	2.676268	2.295996	4.256732
Model 4	3.815574	4.277224	6.414563
	RMSE for Compressive strength on scaled data	RMSE for Compressive strength on scaled data with wc ratio	RMSE for Compressive strength on scaled data with wb ratio
Baseline NN with adam and relu	2.445096	3.366102	4.697807
Baseline NN with adam and relus	2.142593	2.503772	7.372544
Model 1	2.347781	3.671354	5.027460
Model 2	2.576822	1.939240	4.825736
Model 3	3.377495	2.930336	5.607549
Model 4	4.408530	7.144473	7.672412

Again the predictions for FLOW are questionable:

Regression type	R2 for FLOW on scaled data	R2 for FLOW on scaled data with wc ratio	R2 for FLOW on scaled data with wb ratio
Baseline NN with adam and relu	0.217175	0.083128	0.105846
Baseline NN with adam and relus	0.633208	0.029447	0.058350
Model 1	0.496916	0.228210	0.029159
Model 2	0.437770	0.100727	0.111989
Model 3	0.051917	0.557840	-0.168060
Model 4	-0.293932	-0.420341	-0.347595

Regression type	MAE for FLOW on scaled data	MAE for FLOW on scaled data with wc ratio	MAE for FLOW on scaled data with wb ratio
Baseline NN with adam and relu	8.553436	9.286669	9.267833
Baseline NN with adam and relus	5.601089	9.232718	9.444457
Model 1	6.468055	8.454191	9.748035
Model 2	7.076157	8.531679	8.998165
Model 3	8.129520	6.284620	9.569247
Model 4	9.932128	11.630792	11.138774

Regression type	RMSE for FLOW on scaled data	RMSE for FLOW on scaled data with wc ratio	RMSE for FLOW on scaled data with wb ratio
Baseline NN with adam and relu	10.028236	10.852924	10.717625
Baseline NN with adam and relus	6.864392	11.166113	10.998594
Model 1	8.039198	9.957310	11.167768
Model 2	8.498636	10.748260	10.680741
Model 3	11.036095	7.536716	12.249687
Model 4	12.892823	13.507923	13.157461
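The tables report three standard regression metrics. A minimal pure-Python sketch of their usual definitions makes them easier to read, in particular why R2 can be negative (Model 4): it drops below zero whenever a model does worse than always predicting the mean of the test targets. The function names are illustrative; in practice, scikit-learn's `r2_score`, `mean_absolute_error` and `mean_squared_error` compute the same quantities. The numbers in the usage lines are made up and not taken from the dataset.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more strongly than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    It is 0 for a model that always predicts the mean of y_true and
    negative for anything worse than that.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y_true = [20.0, 40.0, 60.0]  # made-up values, not from the dataset
    print(r2(y_true, [25.0, 35.0, 70.0]))  # 0.8125
    print(r2(y_true, [60.0, 40.0, 20.0]))  # -3.0, worse than predicting the mean
```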

It does not look better for SLUMP:

Regression type	R2 for SLUMP on scaled data	R2 for SLUMP on scaled data with wc ratio	R2 for SLUMP on scaled data with wb ratio
Baseline NN with adam and relu	0.155706	0.037265	0.077118
Baseline NN with adam and relus	0.478263	0.030323	-0.116686
Model 1	0.505250	0.367836	0.281306
Model 2	-0.136331	0.600081	0.259247
Model 3	-0.013574	0.276476	0.179358
Model 4	-0.883918	-1.996697	-2.179000

Regression type	MAE for SLUMP on scaled data	MAE for SLUMP on scaled data with wc ratio	MAE for SLUMP on scaled data with wb ratio
Baseline NN with adam and relu	11.613087	12.152679	12.163982
Baseline NN with adam and relus	8.866301	11.672534	13.491269
Model 1	8.626318	9.675867	10.362348
Model 2	13.269832	7.183069	10.811023
Model 3	10.425955	10.210033	9.762248
Model 4	12.573368	16.200413	18.518716

Regression type	RMSE for SLUMP on scaled data	RMSE for SLUMP on scaled data with wc ratio	RMSE for SLUMP on scaled data with wb ratio
Baseline NN with adam and relu	13.449766	14.362210	14.061798
Baseline NN with adam and relus	10.572891	14.413895	15.467968
Model 1	10.295811	11.638119	12.409085
Model 2	15.603433	9.256648	12.598081
Model 3	14.736539	12.450711	13.260030
Model 4	20.090872	25.338983	26.098349

I tried many other neural network architectures. It seems that the dataset is too small for a train/validation/test split. All standard methods for fine-tuning neural networks (regularization, deeper nets, more units, different numbers of iterations, different batch sizes, etc.), applied by changing one parameter at a time, did not yield usable information on how to improve them.
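With this few samples, k-fold cross-validation is the usual way around a fixed train/validation split: every training sample is used for validation exactly once across the k rounds. A minimal sketch of the index splitting follows; the function name, `k=5` and the seed are illustrative assumptions, and scikit-learn's `KFold` provides the same functionality.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Each sample lands in exactly one validation fold, so all samples are
    used for both training and validation over the k rounds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    # e.g. the 82 training samples mentioned later in the article
    print([len(val) for _, val in k_fold_indices(82, k=5)])  # [17, 17, 16, 16, 16]
```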

Possible improvements

The results for FLOW and SLUMP are not satisfactory. As with the concrete compressive strength dataset, many machine learning algorithms perform acceptably when predicting the compressive strength after 28 days.

The situation is as follows:

We have a very small dataset on which a proper train/validation/test split is questionable.
The two poorly predicted target variables (FLOW and SLUMP) are significantly correlated.

One idea is to use the best model for each of FLOW and SLUMP (R2 ~ 0.6) to predict one target and add that prediction to the inputs of a model predicting the other target. This is not an elegant solution in times of end-to-end deep learning, but it could help a lot here.
Another approach is to add the water content back into the datasets that contain the water-to-binder and water-to-cement ratios. Perhaps fly ash should be added back in as well, since the original paper found it to be significant.
Since the original paper gives no information on preprocessing, we may use unscaled raw data and see what happens. However, this goes against good practice and would probably lead to a less general model when it faces completely new data with wider ranges of input and output values.
If this does not help, we could try to augment the dataset. This is a bit tricky because we are dealing with somewhat uncertain experimental data. We could use numerical simulations, but that is not feasible just for “revisiting some machine learning dataset”. Our other option is to create additional data points from the existing ones by adding some noise. This would resemble the uncertainty of lab testing as long as the noise is small.
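Adding water back next to the ratio features is a small feature-engineering step. A minimal sketch for a single mix design follows, assuming hypothetical key names and treating fly ash and slag as binder together with cement (as discussed for this dataset). Whether the earlier ratio datasets dropped the raw water content is inferred from the wording above, hence the `keep_water` switch.

```python
def add_ratio_features(sample, keep_water=True):
    """Return a copy of one mix design with w/c and w/b ratios added.

    sample: dict of contents in kg/m^3 with hypothetical keys
    "cement", "fly_ash", "slag", "water" (other keys are passed through).
    Binder is assumed to be cement + fly ash + slag.
    """
    features = dict(sample)  # copy, so the input dict is not modified
    binder = sample["cement"] + sample["fly_ash"] + sample["slag"]
    features["wc_ratio"] = sample["water"] / sample["cement"]
    features["wb_ratio"] = sample["water"] / binder
    if not keep_water:
        # Variant in which water only enters the model through the ratios.
        del features["water"]
    return features


if __name__ == "__main__":
    mix = {"cement": 300.0, "fly_ash": 100.0, "slag": 100.0, "water": 200.0}  # made-up mix
    f = add_ratio_features(mix)
    print(round(f["wc_ratio"], 3), f["wb_ratio"])  # 0.667 0.4
```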

Results of using water-cement ratio and water as features

This approach leads to similar or slightly lower performance than using only the water-cement ratio.

Results of data augmentation

Initially, we had 82 samples for training and cross-validation.
By adding some noise to the data, we can extend this to 656 and 2624 samples (8 and 32 times the original size). The original test data is not changed, to ensure comparability.
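The augmentation can be sketched as follows. The article does not specify the noise model, so relative (multiplicative) Gaussian noise with a standard deviation of 2 %, applied to features and targets, is an assumption here, as are the function name and the choice to keep the original rows. With `factor=8` and `factor=32`, 82 training samples become 656 and 2624; the test set is left untouched, as in the text.

```python
import random


def augment_with_noise(X, y, factor=8, rel_std=0.02, seed=0):
    """Enlarge a training set to factor * len(X) samples.

    The original rows are kept; each one gets (factor - 1) noisy copies in
    which every feature and target value is multiplied by 1 + N(0, rel_std).
    Apply this to training data only; the test set stays unchanged.
    """
    rng = random.Random(seed)
    X_aug = [list(row) for row in X]
    y_aug = [list(target) for target in y]
    for row, target in zip(X, y):
        for _ in range(factor - 1):
            X_aug.append([v * (1.0 + rng.gauss(0.0, rel_std)) for v in row])
            y_aug.append([v * (1.0 + rng.gauss(0.0, rel_std)) for v in target])
    return X_aug, y_aug


if __name__ == "__main__":
    X = [[float(i), 2.0 * i] for i in range(82)]  # placeholder features
    y = [[1.0, 2.0, 3.0] for _ in range(82)]      # placeholder targets
    print(len(augment_with_noise(X, y, factor=8)[0]))   # 656
    print(len(augment_with_noise(X, y, factor=32)[0]))  # 2624
```

Multiplicative noise keeps zero contents at zero (for example, mixes without fly ash or slag), so the augmented mixes do not suddenly contain small amounts of a component that was never added.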

The changes for predicting compressive strength are negligible.
Some regressors improve when predicting FLOW. However, the number of generated samples seems to affect different algorithms differently:

FLOW (R2 values)

Regressor	82 samples	656 samples	2624 samples
Linear regression	0.39	0.42	0.38
Decision tree	0.18	0.09	0.06
SVM	0.61	0.71	0.70
Random Forest	-0.26	-0.52	-0.47
AdaBoost	0.04	0.66	0.21
XGBoost	0.28	0.57	0.64

I did not spend more time on training NNs. Some initial tests did not show better performance than the SVM, so training more networks would have been a waste of time and electricity.

At least the SVM approaches the results from the original paper.
The results for SLUMP look as follows:

SLUMP (R2 values)

Regressor	82 samples	656 samples	2624 samples
Linear regression	0.26	0.28	0.28
Decision tree	-0.29	0.78	0.09*
SVM	0.52	0.61	0.59
Random Forest	-0.2	-0.43	-0.40
AdaBoost	-0.35	0.46	0.24*
XGBoost	0.28	0.42	0.44

* It seems that decision trees and AdaBoost are much more sensitive to the random noise.

Results of sequential model

This does not help either. Although slump and flow are highly correlated, this approach did not yield better results.
Predicting flow with the predicted slump as an additional input leads to an R2 of 0.52; the other way around, we end up with an R2 of 0.56.
In both cases, the results are worse than single predictions using SVMs. Apparently, the initial prediction is not good enough in either case.
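The sequential approach can be sketched as a two-stage (stacked) model: stage 1 predicts one target (e.g. SLUMP) from the mix design, and stage 2 predicts the other (e.g. FLOW) from the mix design plus the stage-1 prediction. This is a minimal, dependency-free sketch under stated assumptions: the k-nearest-neighbour regressor is only a stand-in for the SVMs used in the article, and the function names, `k=3` and the fold assignment by index are illustrative. Training stage 2 on out-of-fold stage-1 predictions is a design choice of this sketch; the article does not say how the intermediate predictions were generated.

```python
import math


def knn_fit(X, y, k=3):
    """Stand-in regressor (k-nearest neighbours); returns a predict function.

    Any fit(X, y) -> predict(X_new) pair, e.g. an SVM, could be swapped in.
    """
    data = list(zip(X, y))

    def predict(X_new):
        preds = []
        for x in X_new:
            nearest = sorted(data, key=lambda d: math.dist(d[0], x))[:k]
            preds.append(sum(t for _, t in nearest) / len(nearest))
        return preds

    return predict


def fit_sequential(X, y_first, y_second, fit=knn_fit, n_folds=5):
    """Stage 1: X -> y_first. Stage 2: X + predicted y_first -> y_second.

    Stage 2 is trained on out-of-fold stage-1 predictions, so it sees
    predictions with realistic errors instead of optimistic in-sample fits.
    """
    n = len(X)
    oof = [0.0] * n
    for f in range(n_folds):
        val = [i for i in range(n) if i % n_folds == f]
        tr = [i for i in range(n) if i % n_folds != f]
        predict = fit([X[i] for i in tr], [y_first[i] for i in tr])
        for i, p in zip(val, predict([X[i] for i in val])):
            oof[i] = p
    stage1 = fit(X, y_first)  # refit on all data for use at prediction time
    stage2 = fit([list(x) + [p] for x, p in zip(X, oof)], y_second)

    def predict(X_new):
        first = stage1(X_new)
        second = stage2([list(x) + [p] for x, p in zip(X_new, first)])
        return first, second

    return predict


if __name__ == "__main__":
    X = [[float(i)] for i in range(20)]  # toy data, not from the dataset
    y1 = [float(i) for i in range(20)]
    y2 = [2.0 * v for v in y1]
    print(fit_sequential(X, y1, y2)([[10.0]]))
```

Using in-sample stage-1 predictions instead would give stage 2 an overly accurate extra feature during training that it cannot rely on at test time.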

Discussion of results

This dataset points to one of the main problems in (building) materials science. Every day, a few thousand tests are performed worldwide. However, almost none of the data is stored and used to build better models. This is not even a matter of publishing the datasets and making them available.

We could try to find better models using automated machine learning toolkits, automated feature engineering and so on.
However, there is a fundamental question left:

Is the input data suitable to model FLOW and SLUMP?

Since our main goal is to predict rheological properties for workability, we might be better off if we would know other parameters such as:

Water temperature, and therefore rheological properties (especially viscosity)
Rheological properties of the superplasticizer (though it mainly separates aggregates)
Shear stresses during preparation
Gas (air) content

References

[1] Yeh, I.-C. (2007): Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29 (6), 474 - 480. doi: 10.1016/j.cemconcomp.2007.02.001.

[2] Dua, D.; Taniskidou, K.E. (2018): UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.

[3] Mechtcherine, V.; Shyshko, S. (2015): Simulating the behaviour of fresh concrete with the Distinct Element Method - Deriving model parameters related to the yield stress. Cement and Concrete Composites 55, 81 - 90. doi: 10.1016/j.cemconcomp.2014.08.004.

Acknowledgements

I would like to thank I-Cheng Yeh for making the dataset available.