Today, we will have a look at the NASA Airfoil Noise dataset as part of my “Exploring Less Known Datasets for Machine Learning” series.
Contents
Exploring the dataset
This dataset contains results of NASA airfoil tests from 1989 [1] and is published in the UCI Machine Learning Repository [2]. It deals with self-induced (or rather self-caused) noise due to airflow over an airfoil. In this blog post, we will look at the dataset only and will not investigate the aeroacoustics behind it. Lighthill (1992) [3] wrote a short introduction to aeroacoustics that is highly recommended reading. The results are based on (scaled!) wind tunnel experiments. (Scaled experiments in wind tunnels are extremely difficult to “back-scale” to full scale and involve some nasty aspects of fluid dynamics and thermodynamics (gas dynamics). Pure CFD was not feasible back then, but nowadays it replaces a lot of wind tunnel testing. Wind tunnel tests are really only essential for transient and hypersonic experiments, since those are the areas that are still difficult to model numerically.) Another approach to estimate airfoil-induced noise is to perform numerical simulations that cover fluid-structure interaction.
Let’s load the dataset and have a look at it. If we load a .dat file with pandas.read_csv, then we have to set delim_whitespace to True.
import numpy as np
import pandas as pd

# Read the whitespace-delimited .dat file and assign descriptive column names
filepath_input_data = "./data/airfoil_self_noise.dat"
input_data_df = pd.read_csv(filepath_input_data, delim_whitespace=True,
                            names=['Frequency (Hz)',
                                   'Angle of Attack (deg)',
                                   'Chord length (m)',
                                   'Free-stream velocity (m/s)',
                                   'Suction side displacement thickness (m)',
                                   'Noise (dB)'])
display(input_data_df.head(3))
display(input_data_df.tail(3))
input_data_df.describe()
|   | Frequency (Hz) | Angle of Attack (deg) | Chord length (m) | Free-stream velocity (m/s) | Suction side displacement thickness (m) | Noise (dB) |
|---|---|---|---|---|---|---|
| 0 | 800 | 0.0 | 0.3048 | 71.3 | 0.002663 | 126.201 |
| 1 | 1000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.201 |
| 2 | 1250 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.951 |
|   | Frequency (Hz) | Angle of Attack (deg) | Chord length (m) | Free-stream velocity (m/s) | Suction side displacement thickness (m) | Noise (dB) |
|---|---|---|---|---|---|---|
| 1500 | 4000 | 15.6 | 0.1016 | 39.6 | 0.052849 | 106.604 |
| 1501 | 5000 | 15.6 | 0.1016 | 39.6 | 0.052849 | 106.224 |
| 1502 | 6300 | 15.6 | 0.1016 | 39.6 | 0.052849 | 104.204 |
|   | Frequency (Hz) | Angle of Attack (deg) | Chord length (m) | Free-stream velocity (m/s) | Suction side displacement thickness (m) | Noise (dB) |
|---|---|---|---|---|---|---|
| count | 1503.000000 | 1503.000000 | 1503.000000 | 1503.000000 | 1503.000000 | 1503.000000 |
| mean | 2886.380572 | 6.782302 | 0.136548 | 50.860745 | 0.011140 | 124.835943 |
| std | 3152.573137 | 5.918128 | 0.093541 | 15.572784 | 0.013150 | 6.898657 |
| min | 200.000000 | 0.000000 | 0.025400 | 31.700000 | 0.000401 | 103.380000 |
| 25% | 800.000000 | 2.000000 | 0.050800 | 39.600000 | 0.002535 | 120.191000 |
| 50% | 1600.000000 | 5.400000 | 0.101600 | 39.600000 | 0.004957 | 125.721000 |
| 75% | 4000.000000 | 9.900000 | 0.228600 | 71.300000 | 0.015576 | 129.995500 |
| max | 20000.000000 | 22.200000 | 0.304800 | 71.300000 | 0.058411 | 140.987000 |
The original publication [1] contains detailed information on all features and experimental settings used for these measurements.
Boxplots are a bit better for visual understanding than a summary table:
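A minimal sketch of how such boxplots could be produced with pandas and matplotlib (one subplot per feature, since the value ranges differ by orders of magnitude; the exact layout of the original plots may differ):
import matplotlib.pyplot as plt

# One boxplot per column, because the features live on very different scales
fig, axes = plt.subplots(1, len(input_data_df.columns), figsize=(18, 4))
for ax, column in zip(axes, input_data_df.columns):
    input_data_df.boxplot(column=column, ax=ax)
plt.tight_layout()
plt.show()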
The dataset shows some indications of outliers for several features.
Let’s see if any of the features are correlated directly to the noise level:
And as a correlation matrix:
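A minimal sketch of how the pairwise correlations could be computed and visualized (the heatmap styling is just one possible choice, not necessarily the one used for the plots above):
import matplotlib.pyplot as plt

# Pearson correlation of all features with each other and with the noise level
correlation_matrix = input_data_df.corr()
print(correlation_matrix['Noise (dB)'].sort_values())

# Simple heatmap of the full correlation matrix
plt.imshow(correlation_matrix.values, cmap='coolwarm', vmin=-1, vmax=1)
plt.colorbar()
plt.xticks(range(len(correlation_matrix.columns)), correlation_matrix.columns, rotation=90)
plt.yticks(range(len(correlation_matrix.columns)), correlation_matrix.columns)
plt.show()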
It looks like there is no single variable short-cut here ;).
Next, we have to rescale the dataset and perform train-test splitting:
from sklearn.preprocessing import MaxAbsScaler

# Scale every column to [-1, 1] by dividing by its maximum absolute value
input_data_scaled_df = input_data_df.copy()
scaler = MaxAbsScaler()
input_data_scaled = scaler.fit_transform(input_data_df)
input_data_scaled_df.loc[:, :] = input_data_scaled

# We are dealing with physics here, hence we need to be able to recover the unscaled values.
# Inverse-transforming a row of ones yields the per-column scaling factors (the maximum absolute values).
extract_scaling_function = np.ones((1, input_data_scaled_df.shape[1]))
extract_scaling_function = scaler.inverse_transform(extract_scaling_function)

display(input_data_scaled_df.head(3))
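Later on, model predictions can be mapped back to physical units by multiplying with the extracted scaling factor of the target column. A minimal sketch (y_pred_scaled is a hypothetical placeholder for scaled model output):
# The last column of extract_scaling_function corresponds to 'Noise (dB)'
noise_scale = extract_scaling_function[0, -1]

# Hypothetical example: convert scaled predictions back to dB
y_pred_scaled = np.array([0.85, 0.90])   # placeholder values
y_pred_db = y_pred_scaled * noise_scale
print(y_pred_db)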
from sklearn.model_selection import train_test_split
y = input_data_scaled_df['Noise (dB)'].values.reshape(-1,1)
X_df = input_data_scaled_df.copy()
X_df.drop(['Noise (dB)'], axis=1, inplace=True)
X = X_df.values
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=42,
shuffle=True)
Applying ML algorithms
Let’s throw a small set of ML algorithms at the dataset, with hyperparameters optimized via a full grid search:
grid_parameters_linear_regression = {'fit_intercept': [False, True]}
grid_parameters_decision_tree_regression = {'max_depth': [None, 3, 5, 7, 9, 10, 11]}
grid_parameters_SVR_regression = {'C': [1, 5, 7, 10, 30, 50],
                                  'epsilon': [0.001, 0.01, 0.1, 0.2, 0.5, 0.6, 0.8],
                                  'kernel': ['rbf', 'linear'],
                                  'shrinking': [False, True],
                                  'tol': [0.001, 0.0001, 0.00001]}
grid_parameters_random_forest_regression = {'n_estimators': [3, 5, 10, 15, 18],
                                            'max_depth': [None, 2, 3, 5, 7, 9]}
grid_parameters_adaboost_regression = {'n_estimators': [3, 5, 10, 15, 18, 20, 25, 50, 60, 80, 100, 120],
                                       'loss': ['linear', 'square', 'exponential'],
                                       'learning_rate': [0.001, 0.01, 0.1, 0.8, 1.0]}
grid_parameters_xgboost_regression = {'n_estimators': [3, 5, 10, 15, 18, 20, 25, 50, 60, 80, 100, 120, 150, 200, 300],
                                      'max_depth': [1, 2, 10, 15],
                                      'learning_rate': [0.0001, 0.001, 0.01, 0.1, 0.15, 0.2, 0.8, 1.0]}
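Each parameter grid is then handed to scikit-learn's GridSearchCV together with the corresponding estimator. A minimal sketch for one of the models, the random forest (the scoring metric and cross-validation settings here are assumptions, not necessarily the ones used for the results below):
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Exhaustive grid search with 5-fold cross-validation on the training data
grid_search_rf = GridSearchCV(RandomForestRegressor(random_state=42),
                              param_grid=grid_parameters_random_forest_regression,
                              scoring='neg_mean_squared_error',
                              cv=5)
grid_search_rf.fit(X_train, y_train.ravel())
print(grid_search_rf.best_params_)
print(grid_search_rf.best_score_)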
Furthermore, we are going to run two simple neural networks on it.
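The exact architectures are not spelled out here, but a minimal sketch of one such simple network using scikit-learn's MLPRegressor could look like this (the layer sizes and training settings are assumptions):
from sklearn.neural_network import MLPRegressor

# Small fully-connected network with two hidden layers
mlp = MLPRegressor(hidden_layer_sizes=(64, 32),
                   activation='relu',
                   max_iter=2000,
                   random_state=42)
mlp.fit(X_train, y_train.ravel())
print(mlp.score(X_test, y_test.ravel()))   # R^2 on the test set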
Results
Well, the results contain some ups and downs. Random forests and XGBoost show quite good results. At this point I would like to compare them to the original publication as well as to a PhD thesis [4] and a master's thesis [5]; however, it is somewhat unclear how they achieved their results (train-valid-test splitting etc.), and I did not do any manual feature engineering/selection.
References
[1] Brooks, T.F.; Pope, D.S.; Marcolini, M.A. (1989): Airfoil Self-Noise and Prediction. NASA Technical Report 1218. Available online at: https://ntrs.nasa.gov/search.jsp?R=19890016302
[2] Dua, D.; Taniskidou, K.E. (2018): UCI Machine Learning Repository, http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.
[3] Lighthill, J. (1992): A General Introduction to Aeroacoustics and Atmospheric Sound. NASA Technical Report 92-52/189717. Available online at: https://www.archive.org/details/DTIC_ADA257887
[4] Lopez, R. (2008): Neural Networks for Variational Problems in Engineering. PhD Thesis. Available online at: https://www.cimne.com/flood/docs/PhDThesis.pdf
[5] Errasquin, L. (2014): Airfoil Self-Noise Prediction Using Neural Networks for Wind Turbines. Master's Thesis. Available online at: https://vtechworks.lib.vt.edu/handle/10919/35193