Today, we’ll have a look at this dataset on Steel Plate Faults by the Semeion Research Center of Sciences of Communication as part of my Exploring Less Known Datasets for Machine Learning series.
The dataset deals with detecting surface defects in stainless steel plates [1].
Let’s see how classical ML algorithms and some simple DNNs compare to the ANNs used in the original publication.
Contents
- Dataset exploration and preprocessing
- Machine Learning Algorithms
- Results
- Discussion
- References

Dataset exploration and preprocessing
The steel plate dataset is hosted on the UCI Machine Learning Repository. It is split into two files, the data itself and a separate file containing the column names, so we have to load both and combine them:
import numpy as np
import pandas as pd

InputDataHeader = pd.read_csv("./data/Faults27x7_var",
                              header=None)
display(InputDataHeader.values)
array([['X_Minimum'],
['X_Maximum'],
['Y_Minimum'],
['Y_Maximum'],
['Pixels_Areas'],
['X_Perimeter'],
['Y_Perimeter'],
['Sum_of_Luminosity'],
['Minimum_of_Luminosity'],
['Maximum_of_Luminosity'],
['Length_of_Conveyer'],
['TypeOfSteel_A300'],
['TypeOfSteel_A400'],
['Steel_Plate_Thickness'],
['Edges_Index'],
['Empty_Index'],
['Square_Index'],
['Outside_X_Index'],
['Edges_X_Index'],
['Edges_Y_Index'],
['Outside_Global_Index'],
['LogOfAreas'],
['Log_X_Index'],
['Log_Y_Index'],
['Orientation_Index'],
['Luminosity_Index'],
['SigmoidOfAreas'],
['Pastry'],
['Z_Scratch'],
['K_Scatch'],
['Stains'],
['Dirtiness'],
['Bumps'],
['Other_Faults']], dtype=object)
InputData = pd.read_csv("./data/Faults.NNA",
                        header=None, sep="\t")
InputData.set_axis(InputDataHeader.values.flatten(),
                   axis=1,
                   inplace=True)
Next, we can have a look at the dataset:
display(InputData.head(2))
display(InputData.tail(2))
display(InputData.describe())
| | X_Minimum | X_Maximum | Y_Minimum | Y_Maximum | Pixels_Areas | X_Perimeter | Y_Perimeter | Sum_of_Luminosity | Minimum_of_Luminosity | Maximum_of_Luminosity | ... | Orientation_Index | Luminosity_Index | SigmoidOfAreas | Pastry | Z_Scratch | K_Scatch | Stains | Dirtiness | Bumps | Other_Faults |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | 50 | 270900 | 270944 | 267 | 17 | 44 | 24220 | 76 | 108 | ... | 0.8182 | -0.2913 | 0.5822 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 645 | 651 | 2538079 | 2538108 | 108 | 10 | 30 | 11397 | 84 | 123 | ... | 0.7931 | -0.1756 | 0.2984 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| | X_Minimum | X_Maximum | Y_Minimum | Y_Maximum | Pixels_Areas | X_Perimeter | Y_Perimeter | Sum_of_Luminosity | Minimum_of_Luminosity | Maximum_of_Luminosity | ... | Orientation_Index | Luminosity_Index | SigmoidOfAreas | Pastry | Z_Scratch | K_Scatch | Stains | Dirtiness | Bumps | Other_Faults |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1939 | 137 | 170 | 422497 | 422528 | 419 | 97 | 47 | 52715 | 117 | 140 | ... | -0.0606 | -0.0171 | 0.9919 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1940 | 1261 | 1281 | 87951 | 87967 | 103 | 26 | 22 | 11682 | 101 | 133 | ... | -0.2000 | -0.1139 | 0.5296 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| | X_Minimum | X_Maximum | Y_Minimum | Y_Maximum | Pixels_Areas | X_Perimeter | Y_Perimeter | Sum_of_Luminosity | Minimum_of_Luminosity | Maximum_of_Luminosity | ... | Orientation_Index | Luminosity_Index | SigmoidOfAreas | Pastry | Z_Scratch | K_Scatch | Stains | Dirtiness | Bumps | Other_Faults |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1941.000000 | 1941.000000 | 1.941000e+03 | 1.941000e+03 | 1941.000000 | 1941.000000 | 1941.000000 | 1.941000e+03 | 1941.000000 | 1941.000000 | ... | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 | 1941.000000 |
| mean | 571.136012 | 617.964451 | 1.650685e+06 | 1.650739e+06 | 1893.878413 | 111.855229 | 82.965997 | 2.063121e+05 | 84.548686 | 130.193715 | ... | 0.083288 | -0.131305 | 0.585420 | 0.081401 | 0.097888 | 0.201443 | 0.037094 | 0.028336 | 0.207110 | 0.346728 |
| std | 520.690671 | 497.627410 | 1.774578e+06 | 1.774590e+06 | 5168.459560 | 301.209187 | 426.482879 | 5.122936e+05 | 32.134276 | 18.690992 | ... | 0.500868 | 0.148767 | 0.339452 | 0.273521 | 0.297239 | 0.401181 | 0.189042 | 0.165973 | 0.405339 | 0.476051 |
| min | 0.000000 | 4.000000 | 6.712000e+03 | 6.724000e+03 | 2.000000 | 2.000000 | 1.000000 | 2.500000e+02 | 0.000000 | 37.000000 | ... | -0.991000 | -0.998900 | 0.119000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 51.000000 | 192.000000 | 4.712530e+05 | 4.712810e+05 | 84.000000 | 15.000000 | 13.000000 | 9.522000e+03 | 63.000000 | 124.000000 | ... | -0.333300 | -0.195000 | 0.248200 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 435.000000 | 467.000000 | 1.204128e+06 | 1.204136e+06 | 174.000000 | 26.000000 | 25.000000 | 1.920200e+04 | 90.000000 | 127.000000 | ... | 0.095200 | -0.133000 | 0.506300 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 1053.000000 | 1072.000000 | 2.183073e+06 | 2.183084e+06 | 822.000000 | 84.000000 | 83.000000 | 8.301100e+04 | 106.000000 | 140.000000 | ... | 0.511600 | -0.066600 | 0.999800 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| max | 1705.000000 | 1713.000000 | 1.298766e+07 | 1.298769e+07 | 152655.000000 | 10449.000000 | 18152.000000 | 1.159141e+07 | 203.000000 | 253.000000 | ... | 0.991700 | 0.642100 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
One problem stands out: the target variables are already one-hot encoded, so we have to reverse that to get the single label column scikit-learn expects. It's also helpful to gain a better, visual impression of the dataset:
X_df = InputData.copy()
X_df.drop(["Pastry", "Z_Scratch", "K_Scatch", "Stains", "Dirtiness", "Bumps", "Other_Faults"],
          axis=1, inplace=True)
y_df = InputData[["Pastry", "Z_Scratch", "K_Scatch", "Stains", "Dirtiness", "Bumps", "Other_Faults"]].copy()

# prepare y for scikit-learn: reverse the one-hot encoding into class labels
y = []
for i in range(y_df.shape[0]):
    if y_df["Pastry"].values[i] == 1:
        y.append("Pastry")
    elif y_df["Z_Scratch"].values[i] == 1:
        y.append("Z_Scratch")
    elif y_df["K_Scatch"].values[i] == 1:
        y.append("K_Scatch")
    elif y_df["Stains"].values[i] == 1:
        y.append("Stains")
    elif y_df["Dirtiness"].values[i] == 1:
        y.append("Dirtiness")
    elif y_df["Bumps"].values[i] == 1:
        y.append("Bumps")
    else:
        y.append("Other_Faults")

# count how many samples fall into each failure mode
FailureModeDistribution = {}
for FailureMode in y_df:
    FailureModeDistribution[FailureMode] = np.bincount(y_df[FailureMode])[1]
FailureModeCheckSum = np.sum([FailureModeDistribution[FailureMode]
                              for FailureMode in FailureModeDistribution])
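Incidentally, the whole if/elif cascade can be replaced by a single pandas call. A minimal sketch, assuming (as the checksum discussed below confirms) that each row has exactly one fault flag set:

# vectorized alternative to the loop above:
# take the name of the column holding the 1 in each row
y_alt = y_df.idxmax(axis=1).tolist()
# class counts in one line
print(y_df.sum(axis=0))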
The checksum comes out at 1941, matching the number of rows, which means that no sample carries more than one label. However, we are dealing with an uneven class distribution:
I don’t know what K_Scatch is; I assume that it should be a “Scratch” as well.
Let’s have a look at the input features and their distributions:
Some input features are distributed fairly evenly; others contain a lot of outliers, which may be significant for predicting certain failure modes.
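The distribution figures are not reproduced here, but a quick way to get such an overview, for example with one box plot per feature, could look like this (plot type and grid layout are my choice, not necessarily what produced the original figures):

import matplotlib.pyplot as plt

# one box plot per raw input feature (27 features -> 7x4 grid)
X_df.plot(kind="box", subplots=True, layout=(7, 4),
          figsize=(14, 18), sharex=False, sharey=False)
plt.tight_layout()
plt.show()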
Let’s rescale the data and see if we can extract a bit more by visual assessment:
import sklearn.preprocessing

scaler = sklearn.preprocessing.MaxAbsScaler()
scaler.fit(X_df)
X_scaled = scaler.transform(X_df)
X_df_scaled = pd.DataFrame(X_scaled)
X_df_scaled.set_axis(InputDataHeader.values.flatten()[:-y_df.shape[1]],
                     axis=1,
                     inplace=True)
import matplotlib.pyplot as plt

# plot one example per failure mode on the rescaled data
plt.figure(figsize=(11, 9))
for FailureMode in y_df:
    plt.plot(X_df_scaled[y_df[FailureMode] == 1].values[0], label=FailureMode)
plt.title("Examples for each failure mode (max abs scaled data)")
plt.legend()
plt.show()
Well, it looks like there are some differences visible.
Machine Learning Algorithms
We are going to use my standard "brute force routine", which has performed quite well so far: a full grid search via GridSearchCV over the following hyperparameters:
# no parameter variation for Gaussian Naive Bayes
grid_parameters_decision_tree_classification = {'max_depth': [None, 3, 5, 7, 9, 10, 11]}
grid_parameters_random_forest_classification = {'n_estimators': [3, 5, 10, 15, 18],
                                                'max_depth': [None, 2, 3, 5, 7, 9]}
grid_parameters_adaboost_classifier = {'n_estimators': [3, 5, 10, 20, 50, 60, 80, 100, 200, 250, 300, 350, 400],
                                       'learning_rate': [0.001, 0.01, 0.1, 0.8, 1.0]}
grid_parameters_x_gradient_boosting_classification = {'n_estimators': [3, 5, 10, 15, 18, 20, 25, 50, 60, 80, 100, 120, 150, 200],
                                                      'max_depth': [1, 2, 3, 5, 7, 9, 10, 11, 15],
                                                      'learning_rate': [0.001, 0.01, 0.1],
                                                      'booster': ['gbtree', 'dart']}
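For illustration, this is roughly how one of these grids gets wired up; the split ratio, fold count, and scoring metric are assumptions, since the routine itself is not shown here:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# hypothetical train/test split (the exact sizes used are not documented)
X_train, X_test, y_train, y_test = train_test_split(
    X_df_scaled, y, test_size=0.2, stratify=y, random_state=42)

grid_search = GridSearchCV(RandomForestClassifier(random_state=42),
                           grid_parameters_random_forest_classification,
                           cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
print("test accuracy:", grid_search.score(X_test, y_test))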
- The SVM classifier ran into some internal errors that I did not have time to resolve.
- I did not manage to get CatBoost running; see the sketch below. It accepted the target data neither one-hot encoded nor as class labels the way scikit-learn requires, and CatBoost's own one-hot encoding led to the same error messages as providing one-hot-encoded targets manually.
Furthermore, we can try a few simple NNs (just testing how they perform, with no optimization for this dataset):
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_baseline_model_1(input_dim, output_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(input_dim//2, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

# same architecture as model 1, but with dropout raised to 0.5
def build_baseline_model_2(input_dim, output_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim//2, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

# one additional hidden block compared to model 2
def build_baseline_model_3(input_dim, output_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim//2, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

# two additional hidden blocks compared to model 2
def build_baseline_model_4(input_dim, output_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim//2, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model

# like model 4, but with a wider (input_dim*3) middle layer
def build_baseline_model_5(input_dim, output_dim):
    model = Sequential()
    model.add(Dense(input_dim, input_dim=input_dim, activation='relu'))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*3, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim*2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(input_dim//2, activation='relu'))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model
All neural networks are trained for 1500 epochs and use a batch size of 32. The best model (criterion: validation loss) is selected for final testing.
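As a minimal sketch of that procedure (the checkpoint file name and the 20 % validation split are my assumptions; they are not documented above):

from keras.callbacks import ModelCheckpoint
from keras.utils import to_categorical

# Keras expects one-hot targets for categorical_crossentropy
labels, class_names = pd.factorize(y)
y_onehot = to_categorical(labels)

model = build_baseline_model_1(X_df_scaled.shape[1], y_onehot.shape[1])
# keep the weights with the lowest validation loss seen during training
checkpoint = ModelCheckpoint("baseline_model_1.h5", monitor='val_loss',
                             save_best_only=True)
model.fit(X_df_scaled.values, y_onehot,
          epochs=1500, batch_size=32,
          validation_split=0.2,
          callbacks=[checkpoint], verbose=0)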
Results
Before we look at the results, we should see what the baseline results from the original paper were:
"Class A" to "Class F"? It is not noted how these map to the fault classes in the dataset. Nevertheless, the results average around 89 % overall accuracy. At this point I ran into my usual problem: it is not documented whether there was a train-test split and which set sizes were used, or whether it was simple KFold cross-validation. Further, the dataset was used as an example to demonstrate a patented neural network.
Okay, let’s look at the results:
It seems like AdaBoost performs worst, since it mainly predicts "Other_Faults".
Decision Tree and Gaussian Naive Bayes perform as expected. Random Forest and the neural networks achieve similar overall accuracies, though with some variation in the per-class accuracies. XGBoost performs best.
Discussion
I’m not happy with these results, eventhough we didn’t optimize much and didn’t do any feature engineering. It is also not clear how good these results are because it is not clear how the baseline results were obtained.
Update
TPOT and Auto-Sklearn led to results similar to those of XGBoost.
References
[1] Buscema, M. (1998): MetaNet: The Theory of Independent Judges, in Substance Use & Misuse, 33(2), 439-461.