Today, we will have a look at the Concrete Slump Test dataset by Yeh (2007) [1] as part of my “Exploring Less Known Datasets for Machine Learning” series. Let us see how state-of-the-art algorithms compare with the results from 2007. (My views on this dataset are based entirely on Yeh (2007).)


Dataset, baseline results from Yeh (2007) and some domain knowledge


Legal

Dataset © Prof. I-Cheng Yeh [1]; published in the UCI Machine Learning Repository [2].




The aim of the dataset is to predict the compressive strength of high-performance concrete (HPC) after 28 days as well as its workability, expressed by measurements of slump and slump flow.

Therefore, our target variables are:

target Y

  • Concrete compressive strength after 28 days [MPa]
  • Slump [cm]
  • Flow [cm]

The original paper covers only the prediction of slump flow. Yeh (2007) also used a dataset with 25 fewer data points.

To predict our variables, we have these features available:

input X

  • Cement \(\left[\frac{kg}{m^3}\right]\)
  • Fly ash \(\left[\frac{kg}{m^3}\right]\)
  • Slag \(\left[\frac{kg}{m^3}\right]\)
  • Water \(\left[\frac{kg}{m^3}\right]\)
  • Superplasticizer \(\left[\frac{kg}{m^3}\right]\)
  • Fine aggregate \(\left[\frac{kg}{m^3}\right]\)
  • Coarse aggregate \(\left[\frac{kg}{m^3}\right]\)

Baseline results

The published models reached the following performances:

  • Testing results of a 2nd-order regression
    • R2: 0.13 - 0.46 (mean: 0.32)
    • RMSE (cm): 10.11 - 22.29 (mean: 15.57)
  • Testing results of neural networks
    • R2: 0.69 - 0.81 (mean: 0.72)
    • RMSE (cm): 7.51 - 9.93 (mean: 8.51)

Let us have a closer look at some of the features:

Fly ash and slag

Together with cement, both can be considered binders. Both increase the strength and durability of concrete. However, the hardening process takes longer, so the concrete needs more time to reach its full compressive strength.

Superplasticizer

Superplasticizers are used to ensure better flow properties because they minimize particle segregation. Furthermore, they allow decreasing the water-cement ratio, which leads to higher compressive strength.

Water

Workability is influenced by the water content: with increasing water content, the mixture behaves more and more like a fluid.
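If you want to follow along, the data can be pulled straight from the UCI repository. A minimal loading sketch; the URL and file name follow the repository layout for this dataset and may need adjustment, and `raw_data` is my own variable name:

import pandas as pd

# Hypothetical loading sketch for the UCI slump test data;
# the first column 'No' is just a running index.
url = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
       'concrete/slump/slump_test.data')
raw_data = pd.read_csv(url)
print(raw_data.head())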

The raw dataset looks like this:

No Cement Slag Fly ash Water SP Coarse Aggr. Fine Aggr. SLUMP(cm) FLOW(cm) Compressive Strength (28-day)(Mpa)
0 1 273.0 82.0 105.0 210.0 9.0 904.0 680.0 23.0 62.0 34.99
1 2 163.0 149.0 191.0 180.0 12.0 843.0 746.0 0.0 20.0 41.14
2 3 162.0 148.0 191.0 179.0 16.0 840.0 743.0 1.0 20.0 41.81
3 4 162.0 148.0 190.0 179.0 19.0 838.0 741.0 3.0 21.5 42.08
4 5 154.0 112.0 144.0 220.0 10.0 923.0 658.0 20.0 64.0 26.82



The boxplots tell us a bit more about the basic statistics of each feature:

[Figure: boxplots of the features]

Preprocessing and feature engineering

A key factor in concrete engineering is the water-cement ratio.
Since we are dealing with workability estimates, the superplasticizer is another key feature.

Yeh (2007) calculated the following ratios but did not use them for the NNs:

  • Water to cement
  • Water to binder
  • Water to solid
  • Superplasticizer to binder
  • Fly ash to binder
  • Slag to binder
  • Fly ash + slag to binder
  • Aggregate to binder
  • Fine aggregate to coarse aggregate

This raises a few questions, mainly: what exactly counts as binder?

Water to cement ratio

Water to cement seems to be the simplest:

\[\text{w/c} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right]}\]

Water to binder ratio

Depending on the definition, we could count fly ash and slag as binders as well. Usually, they are weighted with k-values, which we do not have; hence, we have to try without them:

\[\text{w/b} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right] + \text{Fly ash} \left[\frac{kg}{m^3}\right] + \text{Blast furnace slag} \left[\frac{kg}{m^3}\right]}\]
# Replace water and cement by their ratio; the new column is inserted
# directly before the three target columns.
input_data_wc_ratio = input_data.copy()
input_data_wc_ratio.insert(input_data_wc_ratio.shape[-1] - 3, 'wc_ratio',
                           input_data_wc_ratio['Water'] / input_data_wc_ratio['Cement'])
input_data_wc_ratio.drop(['Water', 'Cement'], inplace=True, axis=1)

# Same idea for the water-binder ratio: cement, fly ash and slag
# together form the binder.
input_data_wb_ratio = input_data.copy()
input_data_wb_ratio.insert(input_data_wb_ratio.shape[-1] - 3, 'wb_ratio',
                           input_data_wb_ratio['Water'] / (input_data_wb_ratio['Cement'] +
                                                           input_data_wb_ratio['Fly ash'] +
                                                           input_data_wb_ratio['Slag']))
input_data_wb_ratio.drop(['Water', 'Cement', 'Fly ash', 'Slag'], inplace=True, axis=1)
This is how the dataset looks with the water-cement ratio:

Slag Fly ash SP Coarse Aggr. Fine Aggr. wc_ratio SLUMP(cm) FLOW(cm) Compressive Strength (28-day)(Mpa)
0 82.0 105.0 9.0 904.0 680.0 0.769231 23.0 62.0 34.99
1 149.0 191.0 12.0 843.0 746.0 1.104294 0.0 20.0 41.14
2 148.0 191.0 16.0 840.0 743.0 1.104938 1.0 20.0 41.81
3 148.0 190.0 19.0 838.0 741.0 1.104938 3.0 21.5 42.08
4 112.0 144.0 10.0 923.0 658.0 1.428571 20.0 64.0 26.82


And this is how the dataset looks if we use the water-binder ratio:

SP Coarse Aggr. Fine Aggr. wb_ratio SLUMP(cm) FLOW(cm) Compressive Strength (28-day)(Mpa)
0 9.0 904.0 680.0 0.456522 23.0 62.0 34.99
1 12.0 843.0 746.0 0.357853 0.0 20.0 41.14
2 16.0 840.0 743.0 0.357285 1.0 20.0 41.81
3 19.0 838.0 741.0 0.358000 3.0 21.5 42.08
4 10.0 923.0 658.0 0.536585 20.0 64.0 26.82


Since SLUMP and FLOW both describe the consistency (rheological properties) of fresh concrete, it is reasonable to expect them to be correlated. Mechtcherine and Shyshko (2015) provide a mathematical description of this behaviour using a discrete element approach [3].
Moreover, we can assume that both are not correlated with compressive strength, since strength is the result of other processes during the hardening (aging) of the concrete.
The following table shows the Pearson correlation coefficients for all 3 target variables (UCS stands for the compressive strength):

SLUMP FLOW UCS
SLUMP 1.000000 0.906135 -0.223358
FLOW 0.906135 1.000000 -0.124029
UCS -0.223358 -0.124029 1.000000
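For reference, a minimal sketch of how this table can be computed, assuming the targets are still part of the `raw_data` DataFrame loaded above (both names are mine):

# Pearson correlation between the three targets; column names are
# shortened for readability.
targets = raw_data[['SLUMP(cm)', 'FLOW(cm)',
                    'Compressive Strength (28-day)(Mpa)']].copy()
targets.columns = ['SLUMP', 'FLOW', 'UCS']
print(targets.corr(method='pearson'))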


Further, we have to rescale our data to make it suitable for most ML algorithms.
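A sketch of the scaling step; it assumes `input_data` holds the features followed by the three target columns, as in the tables above. Whether min-max or standard scaling was used originally is not documented, so MinMaxScaler is an assumption here:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Split features and targets; the last three columns are the targets.
X = input_data.iloc[:, :-3].values
y = input_data.iloc[:, -3:].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit the scaler on training data only to avoid leakage. The targets were
# scaled analogously (cf. `scaler_array` below); omitted here for brevity.
x_scaler = MinMaxScaler().fit(X_train)
X_train = x_scaler.transform(X_train)
X_test = x_scaler.transform(X_test)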

Applying classical machine learning algorithms

Basic considerations for regression with multi-variable targets

As mentioned in a previous blog post, I take a somewhat different view on multi-variable prediction than many other people.

Further, we are dealing with data that carries real-world physical dimensions. Therefore, we cannot mix up errors in cm and MPa. Moreover, although SLUMP and FLOW are both measured in cm, their value ranges are completely different.
Hence, we are more or less forced by physics to build 3 models instead of one.

Since the dataset is small, we can start to train our models directly with hyperparameter optimization using grid search:

import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import GridSearchCV

def train_test_random_forest_regression(X_train, X_test, y_train, y_test,
                                        scorer, dataset_id):
    # `kfold_vs_size`, `datasets` and the loop variable `i` (index of the
    # current target) come from the surrounding notebook scope.
    random_forest_regression = RandomForestRegressor(random_state=42)
    grid_parameters_random_forest_regression = {'n_estimators': [3, 5, 10, 15, 18],
                                                'max_depth': [None, 2, 3, 5, 7, 9]}
    start_time = time.time()
    grid_obj = GridSearchCV(random_forest_regression,
                            param_grid=grid_parameters_random_forest_regression,
                            cv=kfold_vs_size, n_jobs=-1, scoring=scorer, verbose=0)
    grid_fit = grid_obj.fit(X_train, y_train)
    training_time = time.time() - start_time
    best_random_forest_regression = grid_fit.best_estimator_
    prediction = best_random_forest_regression.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)

    # Metrics on the original (true) scale: R2 is scale-invariant, but MSE,
    # MAE etc. carry physical units and therefore have to be rescaled.
    prediction_true_scale = prediction * datasets[dataset_id]['scaler_array'][:, -(i + 1)]
    y_test_true_scale = y_test * datasets[dataset_id]['scaler_array'][:, -(i + 1)]
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)

    return {'Regression type': 'Random Forest Regression', 'model': grid_fit,
            'Predictions': prediction, 'R2': r2, 'MSE': mse, 'MAE': mae,
            'MSE_true_scale': mse_true_scale, 'RMSE_true_scale': np.sqrt(mse_true_scale),
            'MAE_true_scale': mae_true_scale, 'MedAE_true_scale': medae_true_scale,
            'Training time': training_time, 'dataset': str(dataset_id) + str(-(i + 1))}

and we can simply iterate over all three targets:

results = {}
counter = 0
for dataset in datasets:
    X_train, X_test = datasets[dataset]['X_train'], datasets[dataset]['X_test']
    y_train, y_test = datasets[dataset]['y_train'], datasets[dataset]['y_test']
    for i in range(y_test.shape[1]):
        results[counter] = train_test_linear_regression(X_train, X_test,
                                                        y_train[:, -(i + 1)],
                                                        y_test[:, -(i + 1)],
                                                        scorer, dataset)
        counter += 1
        # ... analogous calls for the other regressors ...

Results of classical machine learning algorithms

The performance for predicting the compressive strength is similar to the performance of ML algorithms on the concrete compressive strength dataset.

Regression type R2 for Compressive strength on scaled data R2 for Compressive strength on scaled data with wc ratio R2 for Compressive strength on scaled data with wb ratio
Linear Regression 0.915434 0.870247 0.575527
Decision Tree Regression 0.659896 0.406098 0.222941
SVM Regression 0.932872 0.936492 0.557471
Random Forest Regression 0.751478 0.835204 0.602958
AdaBoost Regression 0.797401 0.811232 0.582100
XGBoost Regression 0.886348 0.885758 0.681528
Regression type MAE for Compressive strength on scaled data MAE for Compressive strength on scaled data with wc ratio MAE for Compressive strength on scaled data with wb ratio
Linear Regression 1.731016 2.137284 4.165805
Decision Tree Regression 3.173140 3.581667 4.676940
SVM Regression 1.335125 1.268278 4.238225
Random Forest Regression 2.602524 2.248668 3.879512
AdaBoost Regression 2.525000 2.441642 4.144793
XGBoost Regression 1.856098 1.856570 3.446844
Regression type RMSE for Compressive strength on scaled data RMSE for Compressive strength on scaled data with wc ratio RMSE for Compressive strength on scaled data with wb ratio
Linear Regression 2.068523 2.562253 4.634345
Decision Tree Regression 4.148294 5.481768 6.270327
SVM Regression 1.842961 1.792570 4.731885
Random Forest Regression 3.546059 2.887601 4.482099
AdaBoost Regression 3.201709 3.090490 4.598325
XGBoost Regression 2.398015 2.404230 4.014202



The results on the FLOW variable look less optimistic:

Regression type R2 for FLOW on scaled data R2 for FLOW on scaled data with wc ratio R2 for FLOW on scaled data with wb ratio
Linear Regression 0.386403 0.332854 0.146184
Decision Tree Regression 0.176075 -0.191239 -0.489613
SVM Regression 0.607207 0.590282 0.284920
Random Forest Regression -0.268938 -0.478631 -0.560620
AdaBoost Regression 0.041484 0.154690 -0.350432
XGBoost Regression 0.278578 0.057298 0.292440
Regression type MAE for FLOW on scaled data MAE for FLOW on scaled data with wc ratio MAE for FLOW on scaled data with wb ratio
Linear Regression 9.643626 9.987325 11.792952
Decision Tree Regression 10.836241 12.602535 14.315440
SVM Regression 7.460085 7.845292 10.232747
Random Forest Regression 14.319428 15.263048 16.267756
AdaBoost Regression 11.576389 10.678649 13.111255
XGBoost Regression 10.434241 11.828690 10.306791
Regression type RMSE for FLOW on scaled data RMSE for FLOW on scaled data with wc ratio RMSE for FLOW on scaled data with wb ratio
Linear Regression 11.831774 12.337268 13.956955
Decision Tree Regression 13.710464 16.485715 18.435082
SVM Regression 9.466519 9.668319 12.772792
Random Forest Regression 17.014866 18.367002 18.869352
AdaBoost Regression 14.787951 13.887252 17.552731
XGBoost Regression 12.829306 14.665458 12.705446



The results on the SLUMP variable look less optimistic as well:

Regression type R2 for SLUMP on scaled data R2 for SLUMP on scaled data with wc ratio R2 for SLUMP on scaled data with wb ratio
Linear Regression 0.256969 0.231385 0.129299
Decision Tree Regression -0.287176 -0.000154 -0.301644
SVM Regression 0.519161 0.495967 0.252016
Random Forest Regression -0.200935 -0.337925 -0.330451
AdaBoost Regression -0.353549 0.035814 0.105920
XGBoost Regression 0.278448 -0.025920 0.570008
Regression type MAE for SLUMP on scaled data MAE for SLUMP on scaled data with wc ratio MAE for SLUMP on scaled data with wb ratio
Linear Regression 5.411023 5.442704 5.639649
Decision Tree Regression 5.777566 5.519243 5.831611
SVM Regression 3.986152 4.076479 5.383039
Random Forest Regression 6.355653 6.797921 6.579892
AdaBoost Regression 6.284446 5.299650 5.177292
XGBoost Regression 4.851443 5.401885 3.871492
Regression type RMSE for SLUMP on scaled data RMSE for SLUMP on scaled data with wc ratio RMSE for SLUMP on scaled data with wb ratio
Linear Regression 6.251594 6.358312 6.767399
Decision Tree Regression 8.228225 7.253051 8.274339
SVM Regression 5.029064 5.148925 6.272395
Random Forest Regression 7.947802 8.388863 8.365398
AdaBoost Regression 8.437704 7.121438 6.857651
XGBoost Regression 6.160571 7.345883 4.755730


Applying neural networks

Let us test some neural networks that are close to the ones in the original paper.
The paper describes models trained with:

  • no. hidden layers: {0,1,2}
  • no. hidden units: {5,7,10,14}
  • learning rates: {0.1, 0.3, 1.0, 3.0}
  • momentum: {0.0, 0.25, 0.5, 0.75}
  • iterations: {500, 1000, 2000, 5000}

I did not test all of these combinations; in particular, I skipped the plain gradient-descent variants and used Adam instead.

However, it is not stated which of the architectures mentioned in the original paper leads to the baseline performance.
Moreover, there is no information on whether the input data was normalized/scaled or used raw. Yeh (2007) did not use real test sets but relied on cross-validation only; therefore, the reported testing results are biased.

The baseline model is built with the following function:

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization  # BatchNormalization is used further below

def build_baseline_model_orig(input_dim, units, layers):
    # Sigmoid activations as in the original paper; the number of hidden
    # layers and units is passed in from the hyperparameter grid.
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    for layer in range(layers):
        model.add(Dense(units, activation='sigmoid'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

In total, 144 models are trained with this base function (48 per dataset).
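A sketch of how such a sweep could look; the grids come from the paper, while the loop structure, the fixed batch size and training one target at a time are my assumptions:

# 3 layer counts x 4 unit counts x 4 epoch settings = 48 runs per dataset,
# 144 runs over the three dataset variants.
for dataset_id in datasets:
    X_tr = datasets[dataset_id]['X_train']
    y_tr = datasets[dataset_id]['y_train'][:, -1]   # one target at a time
    for layers in (0, 1, 2):
        for units in (5, 7, 10, 14):
            for epochs in (500, 1000, 2000, 5000):
                model = build_baseline_model_orig(X_tr.shape[1], units, layers)
                model.fit(X_tr, y_tr, epochs=epochs, batch_size=16,
                          validation_split=0.2, verbose=0)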

The results are not very promising for SLUMP and FLOW. Apparently, the results deviate a lot from the original publication.

Next, we vary the base model a bit and use different activation functions:

def build_baseline_model_modified_activation(input_dim):    
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
def build_baseline_model_modified_activations(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

Some deeper neural networks to test a few more architectures:

def build_model_1(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
def build_model_2(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
def build_model_3(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(5, activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(5, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
def build_model_4(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(40, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model

All models are trained with a batch size of 16, a validation split of 0.2, and for 1000 epochs.
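In code, a minimal sketch of this training setup could look as follows; the dictionary of builders and the choice of target column are mine:

# Train each of the deeper architectures with the settings above.
builders = {'Model 1': build_model_1, 'Model 2': build_model_2,
            'Model 3': build_model_3, 'Model 4': build_model_4}
histories = {}
for name, build_fn in builders.items():
    model = build_fn(input_dim=X_train.shape[1])
    histories[name] = model.fit(X_train, y_train[:, -1], batch_size=16,
                                validation_split=0.2, epochs=1000, verbose=0)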

Again, it is not surprising that the prediction of compressive strength yields reasonable results:

Regression type R2 for Compressive strength on scaled data R2 for Compressive strength on scaled data with wc ratio R2 for Compressive strength on scaled data with wb ratio
Baseline NN with adam and relu 0.881841 0.776062 0.563822
Baseline NN with adam and relus 0.909270 0.876102 -0.074257
Model 1 0.891060 0.733605 0.500460
Model 2 0.868767 0.925675 0.539743
Model 3 0.774544 0.830290 0.378531
Model 4 0.615885 -0.008820 -0.163422
Regression type MAE for Compressive strength on scaled data MAE for Compressive strength on scaled data with wc ratio MAE for Compressive strength on scaled data with wb ratio
Baseline NN with adam and relu 1.911479 2.808184 4.072204
Baseline NN with adam and relus 1.891581 1.966362 5.927383
Model 1 2.027864 2.962191 4.383823
Model 2 2.219140 1.481955 4.251803
Model 3 2.676268 2.295996 4.256732
Model 4 3.815574 4.277224 6.414563
Regression type RMSE for Compressive strength on scaled data RMSE for Compressive strength on scaled data with wc ratio RMSE for Compressive strength on scaled data with wb ratio
Baseline NN with adam and relu 2.445096 3.366102 4.697807
Baseline NN with adam and relus 2.142593 2.503772 7.372544
Model 1 2.347781 3.671354 5.027460
Model 2 2.576822 1.939240 4.825736
Model 3 3.377495 2.930336 5.607549
Model 4 4.408530 7.144473 7.672412



Again, the predictions for FLOW are questionable:

Regression type R2 for FLOW on scaled data R2 for FLOW on scaled data with wc ratio R2 for FLOW on scaled data with wb ratio
Baseline NN with adam and relu 0.217175 0.083128 0.105846
Baseline NN with adam and relus 0.633208 0.029447 0.058350
Model 1 0.496916 0.228210 0.029159
Model 2 0.437770 0.100727 0.111989
Model 3 0.051917 0.557840 -0.168060
Model 4 -0.293932 -0.420341 -0.347595
Regression type MAE for FLOW on scaled data MAE for FLOW on scaled data with wc ratio MAE for FLOW on scaled data with wb ratio
Baseline NN with adam and relu 8.553436 9.286669 9.267833
Baseline NN with adam and relus 5.601089 9.232718 9.444457
Model 1 6.468055 8.454191 9.748035
Model 2 7.076157 8.531679 8.998165
Model 3 8.129520 6.284620 9.569247
Model 4 9.932128 11.630792 11.138774
Regression type RMSE for FLOW on scaled data RMSE for FLOW on scaled data with wc ratio RMSE for FLOW on scaled data with wb ratio
Baseline NN with adam and relu 10.028236 10.852924 10.717625
Baseline NN with adam and relus 6.864392 11.166113 10.998594
Model 1 8.039198 9.957310 11.167768
Model 2 8.498636 10.748260 10.680741
Model 3 11.036095 7.536716 12.249687
Model 4 12.892823 13.507923 13.157461



It does not look better for SLUMP:

Regression type R2 for SLUMP on scaled data R2 for SLUMP on scaled data with wc ratio R2 for SLUMP on scaled data with wb ratio
Baseline NN with adam and relu 0.155706 0.037265 0.077118
Baseline NN with adam and relus 0.478263 0.030323 -0.116686
Model 1 0.505250 0.367836 0.281306
Model 2 -0.136331 0.600081 0.259247
Model 3 -0.013574 0.276476 0.179358
Model 4 -0.883918 -1.996697 -2.179000
Regression type MAE for SLUMP on scaled data MAE for SLUMP on scaled data with wc ratio MAE for SLUMP on scaled data with wb ratio
Baseline NN with adam and relu 11.613087 12.152679 12.163982
Baseline NN with adam and relus 8.866301 11.672534 13.491269
Model 1 8.626318 9.675867 10.362348
Model 2 13.269832 7.183069 10.811023
Model 3 10.425955 10.210033 9.762248
Model 4 12.573368 16.200413 18.518716
Regression type RMSE for SLUMP on scaled data RMSE for SLUMP on scaled data with wc ratio RMSE for SLUMP on scaled data with wb ratio
Baseline NN with adam and relu 13.449766 14.362210 14.061798
Baseline NN with adam and relus 10.572891 14.413895 15.467968
Model 1 10.295811 11.638119 12.409085
Model 2 15.603433 9.256648 12.598081
Model 3 14.736539 12.450711 13.260030
Model 4 20.090872 25.338983 26.098349


I tried many other neural network architectures. It seems that the dataset is too small for train/valid/test splitting. All standard methods to fine-tune neural networks (regularization, deeper nets, more units, different numbers of iterations, different batch sizes, etc.), applied by changing one parameter at a time, did not yield usable information on how to improve the models.

Possible improvements

The results are not satisfying for FLOW and SLUMP. Similar to the concrete compressive strength dataset, the performance of many machine learning algorithms is acceptable for predicting the compressive strength after 28 days.

The situation is as follows:

  1. We have a very small dataset on which proper train/validation/test splitting is almost questionable.
  2. The two poorly predicted target variables are significantly correlated.

Some ideas to address this:

  • One idea is to use the best model for FLOW or SLUMP (R2 ~ 0.6) to predict that value and feed the prediction into the input of a model for the other variable. This is not an elegant solution in times of end-to-end deep learning, but it could help a lot here.
  • Another approach is to move the water content back into the datasets that contain the water-binder and water-cement ratios, and perhaps to add the fly ash back in as well, since the original paper found it to be significant.
  • Since there is no information on preprocessing in the original paper, we may use unscaled raw data and see what happens. However, this goes against all good standards and would probably lead to a less general model when confronted with completely new inputs covering bigger ranges.
  • If this does not help, we could try to augment the dataset. This is a bit tricky because we are dealing with somewhat uncertain experimental data. We could use numerical simulations, but that is not feasible just for “revisiting some machine learning dataset”. Our other option is to create additional data points from the existing ones by adding some noise (see the sketch below); as long as the noise is not too strong, it resembles the uncertainty of lab testing.
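A minimal sketch of such noise-based augmentation; the noise level and the choice to keep the targets unchanged are my assumptions:

import numpy as np

rng = np.random.default_rng(42)

def augment_with_noise(X, y, factor, noise_std=0.01):
    """Replicate the (scaled) training data `factor` times with Gaussian noise.

    factor=8 turns the 82 training samples into 656, factor=32 into 2624.
    The noise level is meant to mimic the uncertainty of lab testing.
    """
    X_parts, y_parts = [X], [y]
    for _ in range(factor - 1):
        X_parts.append(X + rng.normal(0.0, noise_std, size=X.shape))
        y_parts.append(y)   # targets kept unchanged
    return np.vstack(X_parts), np.concatenate(y_parts)

# Usage: X_train_aug, y_train_aug = augment_with_noise(X_train, y_train[:, -1], 8)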

Results of using water-cement ratio and water as features

This approach leads to similar or slightly lower performance compared with using the water-cement ratio alone.
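For reference, the corresponding dataset variant can be built like this, mirroring the earlier feature-engineering snippet but without dropping the 'Water' column (the variable name is mine):

# Keep the raw water content and add the water-cement ratio.
input_data_wc_plus_water = input_data.copy()
input_data_wc_plus_water.insert(
    input_data_wc_plus_water.shape[-1] - 3, 'wc_ratio',
    input_data_wc_plus_water['Water'] / input_data_wc_plus_water['Cement'])
input_data_wc_plus_water.drop(['Cement'], inplace=True, axis=1)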

Results of data augmentation

Initially, we had 82 samples for training and cross-validation.
By adding some noise to the data, we can extend this to 656 and 2624 samples. The original testing data is not changed, to ensure comparability.

The changes for predicting compressive strength are negligible.
There is some improvement for some regressors when predicting FLOW. However, it seems that the number of generated samples affects different algorithms differently:

FLOW (R2 values)

  82 samples 656 samples 2624 samples
Linear regression 0.39 0.42 0.38
Decision tree 0.18 0.09 0.06
SVM 0.61 0.71 0.70
Random Forest -0.26 -0.52 -0.47
AdaBoost 0.04 0.66 0.21
XGBoost 0.28 0.57 0.64

I did not spend more time on training NNs. Some initial tests did not show performance higher than the SVM, and therefore it would have been a waste of time and electricity to train more.

At least the SVM approaches the results from the original paper. The result for the SVM looks as follows:

[Figure: SVM results for FLOW]

SLUMP (R2 values)

  82 samples 656 samples 2624 samples
Linear regression 0.26 0.28 0.28
Decision tree -0.29 0.78 0.09*
SVM 0.52 0.61 0.59
Random Forest -0.2 -0.43 -0.40
AdaBoost -0.35 0.46 0.24*
XGBoost 0.28 0.42 0.44

* It seems that DTs and AdaBoost are much more sensitive to the random noise.


Results of sequential model

This does not help either. Although slump and flow show a high correlation, this approach did not yield better results.
Predicting flow using the results of the slump prediction leads to an R2 of 0.52; the other way around, we end up with an R2 of 0.56.
In both cases, we end up with worse results than the single-target predictions using SVMs. Apparently, the initial prediction is not good enough in either case.
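A sketch of the sequential approach just described; SVR is used because it was the strongest single-target model, `y_train_slump`/`y_train_flow` are hypothetical names for the respective target columns, and hyperparameters are omitted:

import numpy as np
from sklearn.svm import SVR

# Step 1: predict SLUMP from the base features.
slump_model = SVR().fit(X_train, y_train_slump)

# Step 2: append the SLUMP prediction as an extra feature for the FLOW model.
X_train_ext = np.column_stack([X_train, slump_model.predict(X_train)])
X_test_ext = np.column_stack([X_test, slump_model.predict(X_test)])

flow_model = SVR().fit(X_train_ext, y_train_flow)
flow_predictions = flow_model.predict(X_test_ext)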

Discussion of results

This dataset points to one of the main problems in (building) material science. Every day, a few thousand tests are performed worldwide. However, almost none of the data is stored and used to build better models; the problem starts even before publishing the datasets and making them available.

We could try to find better models using automated machine learning toolkits, automated feature engineering, and so on.
However, there is a fundamental question left:

  • Is the input data suitable to model FLOW and SLUMP?

Since our main goal is to predict rheological properties for workability, we might be better off if we knew other parameters such as:

  • Water temperature, which influences rheological properties (especially viscosity)
  • Rheological properties of the superplasticizer (though it mainly separates aggregates)
  • Shear stresses during preparation
  • Gas (air) content

References

[1] Yeh, I.-C. (2007): Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29 (6), 474 - 480. doi: 10.1016/j.cemconcomp.2007.02.001.

[2] Dua, D.; Taniskidou, K.E. (2018). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.

[3] Mechtcherine, V.; Shyshko, S. (2015): Simulating the behaviour of fresh concrete with the Distinct Element Method - Deriving model parameters related to the yield stress. Cement and Concrete Composites 55, 81 - 90. doi: 10.1016/j.cemconcomp.2014.08.004.

Acknowledgements

I would like to thank I-Cheng Yeh for making the dataset available.