Today, we will have a look at this dataset on Concrete Slump Test by Yeh (2007) [1] as part of my “Exploring Less Known Datasets for Machine Learning” series. Let us see how state-of-the-art algorithms compare with the results from 2007. (My views on this dataset are entirely based on Yeh (2007)).
Contents
- Dataset, baseline results from Yeh (2007) and some domain knowledge
- Preprocessing and feature engineering
- Applying classical machine learning algorithms
- Applying neural networks
- Discussion of results
Dataset, baseline results from Yeh (2007) and some domain knowledge
Legal
dataset © Prof. I-Cheng Yeh [1]; published in UCI Machine Learning Repository [2].
The aim of the dataset is to predict concrete compressive strength of high performance concrete (HPC) after 28 days as well as to determine the workability with the measurements of slump and slump flow.
Therefore, our target variables are:
target Y
- Concrete compressive strength after 28 days [MPa]
- Slump [cm]
- Flow [cm]
The original paper covers only the prediction of slump. Yeh (2007) used a dataset with 25 fewer data points.
To predict our variables, we have these features available:
input X
- Cement \(\left[\frac{kg}{m^3}\right]\)
- Fly ash \(\left[\frac{kg}{m^3}\right]\)
- Slag \(\left[\frac{kg}{m^3}\right]\)
- Water \(\left[\frac{kg}{m^3}\right]\)
- Superplasticizer \(\left[\frac{kg}{m^3}\right]\)
- Fine aggregate \(\left[\frac{kg}{m^3}\right]\)
- Coarse aggregate \(\left[\frac{kg}{m^3}\right]\)
Baseline results
The published models reached the following performance:
- Testing results of a second-order regression
  - R2: 0.13 - 0.46 (mean: 0.32)
  - RMSE (cm): 10.11 - 22.29 (mean: 15.57)
- Testing results of a neural network
  - R2: 0.69 - 0.81 (mean: 0.72)
  - RMSE (cm): 7.51 - 9.93 (mean: 8.51)
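For reference, here is a minimal sketch of how the two metrics quoted above can be computed with NumPy. The function names `rmse` and `r2` and the example values are my own and only serve as illustration. The sketch also shows that R2 becomes negative as soon as a model is worse than always predicting the mean, which is worth keeping in mind for the result tables further down.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the unit of the target (cm for slump)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# A handful of slump values in cm, for illustration only
y_true = np.array([23.0, 0.0, 1.0, 3.0, 20.0])
mean_prediction = np.full_like(y_true, y_true.mean())
bad_prediction = np.array([0.0, 25.0, 25.0, 25.0, 0.0])  # deliberately poor guess

print(r2(y_true, y_true))           # perfect prediction
print(r2(y_true, mean_prediction))  # always predicting the mean
print(r2(y_true, bad_prediction))   # worse than the mean, hence negative
print(rmse(y_true, bad_prediction))
```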
Let us have a closer look at some of the features:
Fly ash and slag
Both features can be considered as binder together with cement. Both increase strength and durability of concrete. However, the hardening process takes longer and therefore it requires more time to reach full compressive strength.
Superplasticizer
Superplasticizers are used to ensure better flow properties because they minimize particle segregation. Further, they make it possible to decrease the water-cement ratio, which leads to a higher compressive strength.
Water
Workability is influenced by the water content. With increasing water content, the mixture behaves more and more like a fluid.
The raw dataset looks like this:
| | No | Cement | Slag | Fly ash | Water | SP | Coarse Aggr. | Fine Aggr. | SLUMP(cm) | FLOW(cm) | Compressive Strength (28-day)(Mpa) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 273.0 | 82.0 | 105.0 | 210.0 | 9.0 | 904.0 | 680.0 | 23.0 | 62.0 | 34.99 |
1 | 2 | 163.0 | 149.0 | 191.0 | 180.0 | 12.0 | 843.0 | 746.0 | 0.0 | 20.0 | 41.14 |
2 | 3 | 162.0 | 148.0 | 191.0 | 179.0 | 16.0 | 840.0 | 743.0 | 1.0 | 20.0 | 41.81 |
3 | 4 | 162.0 | 148.0 | 190.0 | 179.0 | 19.0 | 838.0 | 741.0 | 3.0 | 21.5 | 42.08 |
4 | 5 | 154.0 | 112.0 | 144.0 | 220.0 | 10.0 | 923.0 | 658.0 | 20.0 | 64.0 | 26.82 |
The boxplots tell us a bit more about basic statistics of each feature:
Preprocessing and feature engineering
A key factor in concrete engineering is the water-cement ratio.
Since we are dealing with workability estimates, the superplasticizer is another key feature.
Yeh (2007) calculated the following ratios but apparently did not use them for the NNs:
- Water to cement
- Water to binder
- Water to solid
- Superplasticizer to binder
- Fly ash to binder
- Slag to binder
- Fly ash + slag to binder
- Aggregate to binder
- Fine aggregate to coarse aggregate
This raises a few questions. Mainly: what exactly is binder?
Water to cement ratio
Water to cement seems to be the simplest:
\[\text{w/c} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right]}\]
Water to binder ratio
Depending on the definition we could count fly ash and slag as binder as well. Usually they are weighted with k-values which we do not have. Hence, we have to try without them:
\[\text{w/b} = \frac{\text{Water} \left[\frac{kg}{m^3}\right]}{\text{Cement} \left[\frac{kg}{m^3}\right] + \text{Fly ash} \left[\frac{kg}{m^3}\right] + \text{Slag} \left[\frac{kg}{m^3}\right]}\]
In code, each ratio replaces the columns it is built from:
```python
# Variant 1: replace Water and Cement by the water-cement ratio.
# The new column is inserted right in front of the three target columns.
input_data_wc_ratio = input_data.copy()
input_data_wc_ratio.insert(input_data_wc_ratio.shape[-1] - 3, 'wc_ratio',
                           input_data_wc_ratio['Water'] / input_data_wc_ratio['Cement'])
input_data_wc_ratio.drop(['Water', 'Cement'], inplace=True, axis=1)

# Variant 2: replace Water, Cement, Fly ash and Slag by the water-binder ratio
input_data_wb_ratio = input_data.copy()
input_data_wb_ratio.insert(input_data_wb_ratio.shape[-1] - 3, 'wb_ratio',
                           input_data_wb_ratio['Water'] / (input_data_wb_ratio['Cement'] +
                                                           input_data_wb_ratio['Fly ash'] +
                                                           input_data_wb_ratio['Slag']))
input_data_wb_ratio.drop(['Water', 'Cement', 'Fly ash', 'Slag'], inplace=True, axis=1)
```
With the water-cement ratio, the dataset looks like this:
| | Slag | Fly ash | SP | Coarse Aggr. | Fine Aggr. | wc_ratio | SLUMP(cm) | FLOW(cm) | Compressive Strength (28-day)(Mpa) |
---|---|---|---|---|---|---|---|---|---|
0 | 82.0 | 105.0 | 9.0 | 904.0 | 680.0 | 0.769231 | 23.0 | 62.0 | 34.99 |
1 | 149.0 | 191.0 | 12.0 | 843.0 | 746.0 | 1.104294 | 0.0 | 20.0 | 41.14 |
2 | 148.0 | 191.0 | 16.0 | 840.0 | 743.0 | 1.104938 | 1.0 | 20.0 | 41.81 |
3 | 148.0 | 190.0 | 19.0 | 838.0 | 741.0 | 1.104938 | 3.0 | 21.5 | 42.08 |
4 | 112.0 | 144.0 | 10.0 | 923.0 | 658.0 | 1.428571 | 20.0 | 64.0 | 26.82 |
And this is what the dataset looks like if we use the water-binder ratio:
| | SP | Coarse Aggr. | Fine Aggr. | wb_ratio | SLUMP(cm) | FLOW(cm) | Compressive Strength (28-day)(Mpa) |
---|---|---|---|---|---|---|---|
0 | 9.0 | 904.0 | 680.0 | 0.456522 | 23.0 | 62.0 | 34.99 |
1 | 12.0 | 843.0 | 746.0 | 0.357853 | 0.0 | 20.0 | 41.14 |
2 | 16.0 | 840.0 | 743.0 | 0.357285 | 1.0 | 20.0 | 41.81 |
3 | 19.0 | 838.0 | 741.0 | 0.358000 | 3.0 | 21.5 | 42.08 |
4 | 10.0 | 923.0 | 658.0 | 0.536585 | 20.0 | 64.0 | 26.82 |
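Some of the other ratios from the list above could be derived in the same way. The following is only a sketch: it uses the first two rows of the raw table, assumes that binder means cement + slag + fly ash without k-value weighting (as for the water-binder ratio), and the new column names are made up for this example.

```python
import pandas as pd

# First two rows of the raw table (all quantities in kg/m^3)
mix = pd.DataFrame({
    'Cement': [273.0, 163.0],
    'Slag': [82.0, 149.0],
    'Fly ash': [105.0, 191.0],
    'Water': [210.0, 180.0],
    'SP': [9.0, 12.0],
    'Coarse Aggr.': [904.0, 843.0],
    'Fine Aggr.': [680.0, 746.0],
})

# Assumption: binder = cement + slag + fly ash, without k-value weighting
binder = mix['Cement'] + mix['Slag'] + mix['Fly ash']

# Hypothetical column names, chosen for this sketch only
ratios = pd.DataFrame({
    'sp_binder_ratio': mix['SP'] / binder,
    'fly_ash_binder_ratio': mix['Fly ash'] / binder,
    'slag_binder_ratio': mix['Slag'] / binder,
    'fine_coarse_ratio': mix['Fine Aggr.'] / mix['Coarse Aggr.'],
})
print(ratios.round(4))
```

Ratios such as water to solid or aggregate to binder are left out on purpose, because their exact definition in the paper is not clear to me.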
Since SLUMP and FLOW both describe the consistency (rheological properties) of fresh concrete, it is reasonable to expect that they are correlated. Mechtcherine and Shyshko (2015) provide a mathematical description of this process using a discrete element approach [3].
Moreover, we can expect both to be at most weakly correlated with compressive strength, since that is the result of other processes during hardening (aging) of the concrete.
The following table shows the Pearson correlation coefficients for all 3 target features (UCS denotes the compressive strength):
| | SLUMP | FLOW | UCS |
---|---|---|---|
SLUMP | 1.000000 | 0.906135 | -0.223358 |
FLOW | 0.906135 | 1.000000 | -0.124029 |
UCS | -0.223358 | -0.124029 | 1.000000 |
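A correlation table like this one can be produced with pandas. The sketch below uses only the five rows of the raw table shown earlier, so its numbers differ from the table above, which is based on the full dataset. The DataFrame name `targets` is an assumption of mine.

```python
import pandas as pd

# Target values of the five rows shown in the raw table
targets = pd.DataFrame({
    'SLUMP': [23.0, 0.0, 1.0, 3.0, 20.0],
    'FLOW': [62.0, 20.0, 20.0, 21.5, 64.0],
    'UCS': [34.99, 41.14, 41.81, 42.08, 26.82],
})

# Pairwise Pearson correlation coefficients between all columns
corr = targets.corr(method='pearson')
print(corr.round(3))
```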
Further, we have to rescale our data to make it suitable for most ML algorithms.
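The exact scaling is not spelled out here, so the following is only a sketch under an assumption: every column is divided by its maximum absolute value, which makes it easy to convert predictions back to cm or MPa by a simple multiplication. The name `scaler_array` mirrors the one used in the training function further down, but how it is built here is my guess.

```python
import numpy as np

# Two target columns from the raw table: SLUMP [cm] and compressive strength [MPa]
y = np.array([[23.0, 34.99],
              [0.0, 41.14],
              [1.0, 41.81],
              [3.0, 42.08],
              [20.0, 26.82]])

# Assumption: scale each column by its maximum absolute value
scaler_array = np.abs(y).max(axis=0, keepdims=True)  # shape (1, n_columns)
y_scaled = y / scaler_array

# Undo the scaling for a single column, here the last one
y_last_true_scale = y_scaled[:, -1] * scaler_array[:, -1]
print(y_scaled.round(3))
print(y_last_true_scale)
```

In a real pipeline the scaling factors should be derived from the training data only and then reused for the test data.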
Applying classical machine learning algorithms
Basic considerations for regression with multi variable targets
As mentioned in a previous blog post, my view on multi-target prediction differs a bit from that of many other people.
Further, we are dealing with data that has physical dimensions in the real world. Therefore, we cannot mix up errors in cm and MPa. Moreover, although SLUMP and FLOW are both measured in cm, their values cover completely different ranges.
Hence, we are more or less forced by physics to build three models and not one.
Since the dataset is small, we can start to train our models directly with hyperparameter optimization using grid search:
```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import GridSearchCV


def train_test_random_forest_regression(X_train, X_test, y_train, y_test, scorer, dataset_id):
    # Note: kfold_vs_size (cross-validation setting), datasets (dict holding the
    # scaler arrays) and i (index of the current target) come from the enclosing scope.
    random_forest_regression = RandomForestRegressor(random_state=42)
    grid_parameters_random_forest_regression = {'n_estimators': [3, 5, 10, 15, 18],
                                                'max_depth': [None, 2, 3, 5, 7, 9]}
    start_time = time.time()
    grid_obj = GridSearchCV(random_forest_regression,
                            param_grid=grid_parameters_random_forest_regression,
                            cv=kfold_vs_size, n_jobs=-1, scoring=scorer, verbose=0)
    grid_fit = grid_obj.fit(X_train, y_train)
    training_time = time.time() - start_time
    best_random_forest_regression = grid_fit.best_estimator_
    prediction = best_random_forest_regression.predict(X_test)
    r2 = r2_score(y_test, prediction)
    mse = mean_squared_error(y_test, prediction)
    mae = mean_absolute_error(y_true=y_test, y_pred=prediction)
    # metrics for true values
    # r2 remains unchanged; MSE and MAE change and have to be reported in true
    # scale because there is some physical meaning behind them
    prediction_true_scale = prediction * datasets[dataset_id]['scaler_array'][:, -(i+1)]
    y_test_true_scale = y_test * datasets[dataset_id]['scaler_array'][:, -(i+1)]
    mae_true_scale = mean_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    medae_true_scale = median_absolute_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    mse_true_scale = mean_squared_error(y_true=y_test_true_scale, y_pred=prediction_true_scale)
    return {'Regression type': 'Random Forest Regression', 'model': grid_fit,
            'Predictions': prediction, 'R2': r2,
            'MSE': mse, 'MAE': mae, 'MSE_true_scale': mse_true_scale,
            'RMSE_true_scale': np.sqrt(mse_true_scale), 'MAE_true_scale': mae_true_scale,
            'MedAE_true_scale': medae_true_scale, 'Training time': training_time,
            'dataset': str(dataset_id) + str(-(i+1))}
```
We can then simply iterate over all datasets and all three targets:
```python
for dataset in datasets:
    X_train, X_test, y_train, y_test = (datasets[dataset]['X_train'], datasets[dataset]['X_test'],
                                        datasets[dataset]['y_train'], datasets[dataset]['y_test'])
    # one model per target column, starting with the last column
    for i in range(y_test.shape[1]):
        results[counter] = train_test_linear_regression(X_train, X_test, y_train[:,-(i+1)], y_test[:,-(i+1)], scorer, dataset)
        ....
        ....
```
Results of classical machine learning algorithms
The performance for predicting the compressive strength is similar to the performance of ML algorithms on the concrete compressive strength dataset.
Regression type | R2 for Compressive strength on scaled data | R2 for Compressive strength on scaled data with wc ratio | R2 for Compressive strength on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 0.915434 | 0.870247 | 0.575527 |
Decision Tree Regression | 0.659896 | 0.406098 | 0.222941 |
SVM Regression | 0.932872 | 0.936492 | 0.557471 |
Random Forest Regression | 0.751478 | 0.835204 | 0.602958 |
AdaBoost Regression | 0.797401 | 0.811232 | 0.582100 |
XGBoost Regression | 0.886348 | 0.885758 | 0.681528 |
Regression type | MAE for Compressive strength on scaled data | MAE for Compressive strength on scaled data with wc ratio | MAE for Compressive strength on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 1.731016 | 2.137284 | 4.165805 |
Decision Tree Regression | 3.173140 | 3.581667 | 4.676940 |
SVM Regression | 1.335125 | 1.268278 | 4.238225 |
Random Forest Regression | 2.602524 | 2.248668 | 3.879512 |
AdaBoost Regression | 2.525000 | 2.441642 | 4.144793 |
XGBoost Regression | 1.856098 | 1.856570 | 3.446844 |
Regression type | RMSE for Compressive strength on scaled data | RMSE for Compressive strength on scaled data with wc ratio | RMSE for Compressive strength on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 2.068523 | 2.562253 | 4.634345 |
Decision Tree Regression | 4.148294 | 5.481768 | 6.270327 |
SVM Regression | 1.842961 | 1.792570 | 4.731885 |
Random Forest Regression | 3.546059 | 2.887601 | 4.482099 |
AdaBoost Regression | 3.201709 | 3.090490 | 4.598325 |
XGBoost Regression | 2.398015 | 2.404230 | 4.014202 |
The results for the FLOW variable look less promising:
Regression type | R2 for FLOW on scaled data | R2 for FLOW on scaled data with wc ratio | R2 for FLOW on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 0.386403 | 0.332854 | 0.146184 |
Decision Tree Regression | 0.176075 | -0.191239 | -0.489613 |
SVM Regression | 0.607207 | 0.590282 | 0.284920 |
Random Forest Regression | -0.268938 | -0.478631 | -0.560620 |
AdaBoost Regression | 0.041484 | 0.154690 | -0.350432 |
XGBoost Regression | 0.278578 | 0.057298 | 0.292440 |
Regression type | MAE for FLOW on scaled data | MAE for FLOW on scaled data with wc ratio | MAE for FLOW on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 9.643626 | 9.987325 | 11.792952 |
Decision Tree Regression | 10.836241 | 12.602535 | 14.315440 |
SVM Regression | 7.460085 | 7.845292 | 10.232747 |
Random Forest Regression | 14.319428 | 15.263048 | 16.267756 |
AdaBoost Regression | 11.576389 | 10.678649 | 13.111255 |
XGBoost Regression | 10.434241 | 11.828690 | 10.306791 |
Regression type | RMSE for FLOW on scaled data | RMSE for FLOW on scaled data with wc ratio | RMSE for FLOW on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 11.831774 | 12.337268 | 13.956955 |
Decision Tree Regression | 13.710464 | 16.485715 | 18.435082 |
SVM Regression | 9.466519 | 9.668319 | 12.772792 |
Random Forest Regression | 17.014866 | 18.367002 | 18.869352 |
AdaBoost Regression | 14.787951 | 13.887252 | 17.552731 |
XGBoost Regression | 12.829306 | 14.665458 | 12.705446 |
The results for the SLUMP variable look less promising as well:
Regression type | R2 for SLUMP on scaled data | R2 for SLUMP on scaled data with wc ratio | R2 for SLUMP on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 0.256969 | 0.231385 | 0.129299 |
Decision Tree Regression | -0.287176 | -0.000154 | -0.301644 |
SVM Regression | 0.519161 | 0.495967 | 0.252016 |
Random Forest Regression | -0.200935 | -0.337925 | -0.330451 |
AdaBoost Regression | -0.353549 | 0.035814 | 0.105920 |
XGBoost Regression | 0.278448 | -0.025920 | 0.570008 |
Regression type | MAE for SLUMP on scaled data | MAE for SLUMP on scaled data with wc ratio | MAE for SLUMP on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 5.411023 | 5.442704 | 5.639649 |
Decision Tree Regression | 5.777566 | 5.519243 | 5.831611 |
SVM Regression | 3.986152 | 4.076479 | 5.383039 |
Random Forest Regression | 6.355653 | 6.797921 | 6.579892 |
AdaBoost Regression | 6.284446 | 5.299650 | 5.177292 |
XGBoost Regression | 4.851443 | 5.401885 | 3.871492 |
Regression type | RMSE for SLUMP on scaled data | RMSE for SLUMP on scaled data with wc ratio | RMSE for SLUMP on scaled data with wb ratio |
---|---|---|---|
Linear Regression | 6.251594 | 6.358312 | 6.767399 |
Decision Tree Regression | 8.228225 | 7.253051 | 8.274339 |
SVM Regression | 5.029064 | 5.148925 | 6.272395 |
Random Forest Regression | 7.947802 | 8.388863 | 8.365398 |
AdaBoost Regression | 8.437704 | 7.121438 | 6.857651 |
XGBoost Regression | 6.160571 | 7.345883 | 4.755730 |
Applying neural networks
Let us test some neural networks that are close to the one in the original paper.
The paper describes a model that is trained with:
- no. hidden layers: {0,1,2}
- no. hidden units: {5,7,10,14}
- learning rates: {0.1, 0.3, 1.0, 3.0}
- momentum: {0.0, 0.25, 0.5, 0.75}
- iterations: {500, 1000, 2000, 5000}
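To get a feeling for the size of this search space, the settings listed above can be enumerated with `itertools.product`. This is just a counting sketch; that the paper evaluated the full Cartesian product is an assumption on my side, and the variable names are made up.

```python
from itertools import product

# Settings as listed above
hidden_layers = [0, 1, 2]
hidden_units = [5, 7, 10, 14]
learning_rates = [0.1, 0.3, 1.0, 3.0]
momenta = [0.0, 0.25, 0.5, 0.75]
iterations = [500, 1000, 2000, 5000]

# Every combination of the five settings
grid = list(product(hidden_layers, hidden_units, learning_rates, momenta, iterations))
print(len(grid))  # 3 * 4 * 4 * 4 * 4 = 768
```

Note that the number of hidden units has no effect for networks with zero hidden layers, so the number of truly distinct models is somewhat smaller.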
I did not test all these combinations. I skipped the gradient descent variations and used Adam instead.
However, the paper does not state which of the architectures mentioned leads to the baseline performance.
Moreover, there is no information on whether the input data was normalized/scaled or used raw. Yeh (2007) did not use a separate testing set but only cross-validation; therefore, the reported testing results are biased.
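The baseline model is built with the following function:
```python
from keras.models import Sequential
from keras.layers import Dense


def build_baseline_model_orig(input_dim, units, layers):
    model = Sequential()
    # first layer has as many units as there are input features
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    # `layers` hidden layers with `units` sigmoid units each
    for layer in range(layers):
        model.add(Dense(units, activation='sigmoid'))
    # single linear output unit for regression
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
```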
In total, 144 models are trained with this base function (48 per dataset).
The results are not very promising for SLUMP and FLOW. Apparently, they differ considerably from the original publication.
Next, we vary the base model a bit and use different activation functions:
```python
def build_baseline_model_modified_activation(input_dim):
    # sigmoid in the first layer, ReLU in the hidden layer
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='sigmoid'))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model


def build_baseline_model_modified_activations(input_dim):
    # ReLU in both the first and the hidden layer
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
```
A few deeper neural networks test out additional architectures:
```python
from keras.layers import BatchNormalization


def build_model_1(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model


def build_model_2(input_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(6, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model


def build_model_3(input_dim):
    # like model 2, but narrower and with batch normalization after the first layer
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(5, activation='relu'))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(5, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model


def build_model_4(input_dim):
    # widest and deepest variant, batch normalization between the hidden layers
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(40, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(30, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
    return model
```
All models are trained with a batch size of 16, a validation split of 0.2 and for 1000 epochs.
Again, it is not surprising that the prediction of compressive strength yields reasonable results:
Regression type | R2 for Compressive strength on scaled data | R2 for Compressive strength on scaled data with wc ratio | R2 for Compressive strength on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 0.881841 | 0.776062 | 0.563822 |
Baseline NN with adam and relus | 0.909270 | 0.876102 | -0.074257 |
Model 1 | 0.891060 | 0.733605 | 0.500460 |
Model 2 | 0.868767 | 0.925675 | 0.539743 |
Model 3 | 0.774544 | 0.830290 | 0.378531 |
Model 4 | 0.615885 | -0.008820 | -0.163422 |
Regression type | MAE for Compressive strength on scaled data | MAE for Compressive strength on scaled data with wc ratio | MAE for Compressive strength on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 1.911479 | 2.808184 | 4.072204 |
Baseline NN with adam and relus | 1.891581 | 1.966362 | 5.927383 |
Model 1 | 2.027864 | 2.962191 | 4.383823 |
Model 2 | 2.219140 | 1.481955 | 4.251803 |
Model 3 | 2.676268 | 2.295996 | 4.256732 |
Model 4 | 3.815574 | 4.277224 | 6.414563 |
Regression type | RMSE for Compressive strength on scaled data | RMSE for Compressive strength on scaled data with wc ratio | RMSE for Compressive strength on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 2.445096 | 3.366102 | 4.697807 |
Baseline NN with adam and relus | 2.142593 | 2.503772 | 7.372544 |
Model 1 | 2.347781 | 3.671354 | 5.027460 |
Model 2 | 2.576822 | 1.939240 | 4.825736 |
Model 3 | 3.377495 | 2.930336 | 5.607549 |
Model 4 | 4.408530 | 7.144473 | 7.672412 |
Again the predictions for FLOW are questionable:
Regression type | R2 for FLOW on scaled data | R2 for FLOW on scaled data with wc ratio | R2 for FLOW on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 0.217175 | 0.083128 | 0.105846 |
Baseline NN with adam and relus | 0.633208 | 0.029447 | 0.058350 |
Model 1 | 0.496916 | 0.228210 | 0.029159 |
Model 2 | 0.437770 | 0.100727 | 0.111989 |
Model 3 | 0.051917 | 0.557840 | -0.168060 |
Model 4 | -0.293932 | -0.420341 | -0.347595 |
Regression type | MAE for FLOW on scaled data | MAE for FLOW on scaled data with wc ratio | MAE for FLOW on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 8.553436 | 9.286669 | 9.267833 |
Baseline NN with adam and relus | 5.601089 | 9.232718 | 9.444457 |
Model 1 | 6.468055 | 8.454191 | 9.748035 |
Model 2 | 7.076157 | 8.531679 | 8.998165 |
Model 3 | 8.129520 | 6.284620 | 9.569247 |
Model 4 | 9.932128 | 11.630792 | 11.138774 |
Regression type | RMSE for FLOW on scaled data | RMSE for FLOW on scaled data with wc ratio | RMSE for FLOW on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 10.028236 | 10.852924 | 10.717625 |
Baseline NN with adam and relus | 6.864392 | 11.166113 | 10.998594 |
Model 1 | 8.039198 | 9.957310 | 11.167768 |
Model 2 | 8.498636 | 10.748260 | 10.680741 |
Model 3 | 11.036095 | 7.536716 | 12.249687 |
Model 4 | 12.892823 | 13.507923 | 13.157461 |
It does not look better for SLUMP:
Regression type | R2 for SLUMP on scaled data | R2 for SLUMP on scaled data with wc ratio | R2 for SLUMP on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 0.155706 | 0.037265 | 0.077118 |
Baseline NN with adam and relus | 0.478263 | 0.030323 | -0.116686 |
Model 1 | 0.505250 | 0.367836 | 0.281306 |
Model 2 | -0.136331 | 0.600081 | 0.259247 |
Model 3 | -0.013574 | 0.276476 | 0.179358 |
Model 4 | -0.883918 | -1.996697 | -2.179000 |
Regression type | MAE for SLUMP on scaled data | MAE for SLUMP on scaled data with wc ratio | MAE for SLUMP on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 11.613087 | 12.152679 | 12.163982 |
Baseline NN with adam and relus | 8.866301 | 11.672534 | 13.491269 |
Model 1 | 8.626318 | 9.675867 | 10.362348 |
Model 2 | 13.269832 | 7.183069 | 10.811023 |
Model 3 | 10.425955 | 10.210033 | 9.762248 |
Model 4 | 12.573368 | 16.200413 | 18.518716 |
Regression type | RMSE for SLUMP on scaled data | RMSE for SLUMP on scaled data with wc ratio | RMSE for SLUMP on scaled data with wb ratio |
---|---|---|---|
Baseline NN with adam and relu | 13.449766 | 14.362210 | 14.061798 |
Baseline NN with adam and relus | 10.572891 | 14.413895 | 15.467968 |
Model 1 | 10.295811 | 11.638119 | 12.409085 |
Model 2 | 15.603433 | 9.256648 | 12.598081 |
Model 3 | 14.736539 | 12.450711 | 13.260030 |
Model 4 | 20.090872 | 25.338983 | 26.098349 |
I tried many other neural network architectures. It seems that the dataset is too small for a train/validation/test split. None of the standard methods for fine-tuning neural networks (regularization, deeper nets, more units, different numbers of iterations, different batch sizes etc.), applied by changing one parameter at a time, yielded usable hints on how to improve the results.
Possible improvements
The results are not satisfying for FLOW and SLUMP. Similar to the concrete compressive strength dataset, the performance of many machine learning algorithms is acceptable for predicting the compressive strength after 28 days.
The situation is as follows:
- We have a very small dataset, for which a proper train/validation/test split is questionable.
- The two poorly predicted target variables, SLUMP and FLOW, are strongly correlated with each other.
- One idea is to take the best model for one of the two variables (R2 ~ 0.6) and add its prediction to the input of a second model that predicts the other variable. This is not an elegant solution in times of end-to-end deep learning but could help a lot here.
- Another approach is to move the water content back into the datasets that contain the water to binder and water to cement ratios, and perhaps to add the fly ash back in as well, since the original paper found it to be significant.
- Since there is no information on preprocessing in the original paper, we may use unscaled raw data and see what happens. However, this goes against all good standards and would probably lead to a model that generalizes worse to completely new inputs with larger ranges in input and output.
- If this does not help, we could try to augment the dataset. This is a bit tricky because we are dealing with somewhat uncertain experimental data. We could use numerical simulations, but that is not feasible just for “revisiting some machine learning dataset”. Our other option is to generate additional data points by adding some noise to the existing ones. As long as the noise is not too large, this would resemble the uncertainty in lab testing.
Results of using water-cement ratio and water as features
This approach leads to similar or slightly lower performance than using the water-cement ratio alone.
Results of data augmentation
Initially, we had 82 samples for training and cross-validation.
By adding some noise to the data, we can extend the training set to 656 and 2624 samples. The original testing data is not changed to ensure comparability.
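A minimal sketch of such a noise-based augmentation is shown below. The details are assumptions: Gaussian noise with a standard deviation of 2 % of each column's standard deviation, the original samples kept as part of the augmented set, and placeholder random data instead of the real training split. The helper name `augment_with_noise` is made up for this example.

```python
import numpy as np


def augment_with_noise(X, y, n_copies, noise_level=0.02, seed=42):
    """Return the original samples plus n_copies noisy copies of each.

    noise_level is the standard deviation of the Gaussian noise, relative
    to the standard deviation of each column (an assumed choice).
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(0.0, noise_level * X.std(axis=0), size=X.shape))
        y_parts.append(y + rng.normal(0.0, noise_level * y.std(axis=0), size=y.shape))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Placeholder training data with the shape used here: 82 samples, 7 features
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(82, 7))
y_train = rng.uniform(0.0, 1.0, size=82)

X_aug_8, y_aug_8 = augment_with_noise(X_train, y_train, n_copies=7)     # 82 * 8 = 656
X_aug_32, y_aug_32 = augment_with_noise(X_train, y_train, n_copies=31)  # 82 * 32 = 2624
print(X_aug_8.shape, X_aug_32.shape)
```

Only the training data is augmented; the test data stays untouched so that results remain comparable.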
The changes for predicting compressive strength are negligible.
There is some improvement for some regressors for predicting FLOW. However, it seems that the number of generated samples affects different algorithms differently:
FLOW (R2 values)
Regression type | 82 samples | 656 samples | 2624 samples |
---|---|---|---|
Linear regression | 0.39 | 0.42 | 0.38 |
Decision tree | 0.18 | 0.09 | 0.06 |
SVM | 0.61 | 0.71 | 0.70 |
Random Forest | -0.26 | -0.52 | -0.47 |
AdaBoost | 0.04 | 0.66 | 0.21 |
XGBoost | 0.28 | 0.57 | 0.64 |
I did not spend more time on training NNs. Some initial tests did not show a performance higher than that of the SVM; therefore, it would have been a waste of time and electricity to train more.
At least the SVM approaches the results from the original paper.
The result for the SVM looks as follows:
SLUMP (R2 values)
Regression type | 82 samples | 656 samples | 2624 samples |
---|---|---|---|
Linear regression | 0.26 | 0.28 | 0.28 |
Decision tree | -0.29 | 0.78 | 0.09* |
SVM | 0.52 | 0.61 | 0.59 |
Random Forest | -0.2 | -0.43 | -0.40 |
AdaBoost | -0.35 | 0.46 | 0.24* |
XGBoost | 0.28 | 0.42 | 0.44 |
* It seems that decision trees and AdaBoost are much more sensitive to the random noise.
Results of sequential model
This does not help either. Although slump and flow are highly correlated, this approach did not yield better results.
Predicting flow using the results from predicting slump leads to an R2 of 0.52. The other way around, we end up with an R2 of 0.56.
In both cases, we end up with worse results than single predictions using SVMs. Apparently, in both cases the initial prediction is not good enough.
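For illustration, a two-stage setup of this kind could look like the sketch below. It is not the exact code behind the numbers above: the data is synthetic, the SVR models use default hyperparameters, and feeding out-of-fold predictions into the second stage (via `cross_val_predict`) is a design choice of this sketch to avoid that the second model learns from overly optimistic first-stage predictions.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in data: 82 training and 21 test samples, 7 scaled features
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(82, 7))
slump_train = X_train[:, 3] + 0.1 * rng.normal(size=82)
flow_train = 0.9 * slump_train + 0.1 * rng.normal(size=82)
X_test = rng.uniform(0.0, 1.0, size=(21, 7))

# Stage 1: predict SLUMP from the mix design. Out-of-fold predictions are used
# for the training rows, the refitted model for the test rows.
stage1 = SVR()
slump_oof = cross_val_predict(stage1, X_train, slump_train, cv=5)
stage1.fit(X_train, slump_train)
slump_test_pred = stage1.predict(X_test)

# Stage 2: predict FLOW from the mix design plus the predicted SLUMP
stage2 = SVR()
stage2.fit(np.column_stack([X_train, slump_oof]), flow_train)
flow_test_pred = stage2.predict(np.column_stack([X_test, slump_test_pred]))
print(flow_test_pred.shape)
```

Swapping the roles of the two targets gives the other direction (predicting slump with the help of predicted flow).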
Discussion of results
This dataset points to one of the main problems in (building) materials science. Every day, a few thousand tests are performed worldwide. However, almost none of the data is stored and used to build better models. It is not even about publishing the datasets and making them available.
We could try to find better models using automated machine learning toolkits or automate feature engineering and so on.
However, there is a fundamental question left:
- Is the input data suitable to model FLOW and SLUMP?
Since our main goal is to predict rheological properties for workability, we might be better off if we knew other parameters such as:
- Water temperature, and therefore rheological properties (especially viscosity)
- Rheological properties of the superplasticizer (though it mainly separates aggregates)
- Shear stresses during preparation
- Gas (air) content
References
[1] Yeh, I.-C. (2007): Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29 (6), 474 - 480. doi: 10.1016/j.cemconcomp.2007.02.001.
[2] Dua, D.; Taniskidou, K.E. (2018). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.
[3] Mechtcherine, V.; Shyshko, S. (2015): Simulating the behaviour of fresh concrete with the Distinct Element Method - Deriving model parameters related to the yield stress. Cement & Concrete Composites 55, 81 - 90. doi:10.1016/j.cemconcomp.2014.08.004.
Acknowledgements
I would like to thank I-Cheng Yeh for making the dataset available.