It is time to have a look at another dataset as part of my series on exploring lesser-known datasets.
This time, we’ll look at the Forest Fires dataset by Cortez and Morais (2007).
Some words up front: the first parts cover a classical brute-force data science approach, where only the results obtained are interesting and valuable. The last part covers a manual geospatial approach that draws on a lot of domain knowledge.
Dataset Exploration and Preparation
import time
sys_start = time.time()
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
%matplotlib inline
# not gentlemen-like but it helps to keep the notebook somewhat clean ;)
import warnings
warnings.simplefilter('ignore')
from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from skgarden import MondrianForestRegressor
from sklearn.metrics import r2_score, mean_squared_error, make_scorer, mean_absolute_error, median_absolute_error
import xgboost
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
#config.gpu_options.per_process_gpu_memory_fraction = 0.90
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))
from keras.models import Sequential
from keras.layers import (Input, Dense, BatchNormalization)
from keras import optimizers
from keras import callbacks
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
input_data = pd.read_csv("./data/forestfires.csv")
display(input_data.sample(10))
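# Note: .codes number the categories alphabetically (apr=0, aug=1, ..., sep=11; fri=0, ..., wed=6),
# not in calendar order
input_data['month'] = pd.Categorical(input_data['month']).codes
input_data['day'] = pd.Categorical(input_data['day']).codes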
display(input_data.describe())
| | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
99 | 3 | 4 | aug | sun | 91.4 | 142.4 | 601.4 | 10.6 | 19.8 | 39 | 5.4 | 0.0 | 0.00 |
462 | 1 | 4 | sep | sun | 91.0 | 276.3 | 825.1 | 7.1 | 14.5 | 76 | 7.6 | 0.0 | 3.71 |
8 | 8 | 6 | sep | tue | 91.0 | 129.5 | 692.6 | 7.0 | 13.1 | 63 | 5.4 | 0.0 | 0.00 |
24 | 7 | 4 | aug | sat | 93.5 | 139.4 | 594.2 | 20.3 | 23.7 | 32 | 5.8 | 0.0 | 0.00 |
360 | 6 | 5 | sep | fri | 92.5 | 122.0 | 789.7 | 10.2 | 18.4 | 42 | 2.2 | 0.0 | 1.09 |
127 | 3 | 5 | sep | fri | 93.5 | 149.3 | 728.6 | 8.1 | 17.2 | 43 | 3.1 | 0.0 | 0.00 |
449 | 7 | 4 | aug | sun | 91.6 | 181.3 | 613.0 | 7.6 | 19.3 | 61 | 4.9 | 0.0 | 0.00 |
48 | 4 | 4 | mar | mon | 87.2 | 23.9 | 64.7 | 4.1 | 11.8 | 35 | 1.8 | 0.0 | 0.00 |
219 | 6 | 5 | mar | mon | 90.1 | 39.7 | 86.6 | 6.2 | 15.2 | 27 | 3.1 | 0.0 | 31.86 |
511 | 8 | 6 | aug | sun | 81.6 | 56.7 | 665.6 | 1.9 | 27.8 | 35 | 2.7 | 0.0 | 0.00 |
| | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 | 517.000000 |
mean | 4.669246 | 4.299807 | 5.758221 | 2.736944 | 90.644681 | 110.872340 | 547.940039 | 9.021663 | 18.889168 | 44.288201 | 4.017602 | 0.021663 | 12.847292 |
std | 2.313778 | 1.229900 | 4.373275 | 1.925061 | 5.520111 | 64.046482 | 248.066192 | 4.559477 | 5.806625 | 16.317469 | 1.791653 | 0.295959 | 63.655818 |
min | 1.000000 | 2.000000 | 0.000000 | 0.000000 | 18.700000 | 1.100000 | 7.900000 | 0.000000 | 2.200000 | 15.000000 | 0.400000 | 0.000000 | 0.000000 |
25% | 3.000000 | 4.000000 | 1.000000 | 1.000000 | 90.200000 | 68.600000 | 437.700000 | 6.500000 | 15.500000 | 33.000000 | 2.700000 | 0.000000 | 0.000000 |
50% | 4.000000 | 4.000000 | 6.000000 | 3.000000 | 91.600000 | 108.300000 | 664.200000 | 8.400000 | 19.300000 | 42.000000 | 4.000000 | 0.000000 | 0.520000 |
75% | 7.000000 | 5.000000 | 11.000000 | 4.000000 | 92.900000 | 142.400000 | 713.900000 | 10.800000 | 22.800000 | 53.000000 | 4.900000 | 0.000000 | 6.570000 |
max | 9.000000 | 9.000000 | 11.000000 | 6.000000 | 96.200000 | 291.300000 | 860.600000 | 56.100000 | 33.300000 | 100.000000 | 9.400000 | 6.400000 | 1090.840000 |
Let’s have a look at box plots of each feature:
That looks tricky! Let’s start by creating four different datasets, drawing on the prior experience documented in the forestfires.names file.
# split data into X and y
y = input_data['area'].copy(deep=True)
X = input_data.copy(deep=True)
X.drop(['area'], inplace=True, axis=1)
scaler = MaxAbsScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X,
y,
shuffle=True,
test_size=0.25,
random_state=42)
y_train_log = np.log(y_train+1)
y_test_log = np.log(y_test+1)
X2 = input_data.copy(deep=True)
X2.drop(['area','X','Y','month','day'], inplace=True, axis=1)
scaler2 = MaxAbsScaler()
X2 = scaler2.fit_transform(X2)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2,
y,
shuffle=True,
test_size=0.25,
random_state=42)
y_train_log2 = np.log(y_train2+1)
y_test_log2 = np.log(y_test2+1)
X3 = input_data.copy(deep=True)
X3.drop(['area','X','Y','month','day','FFMC','DMC','DC','ISI'], inplace=True, axis=1)
scaler3 = MaxAbsScaler()
X3 = scaler3.fit_transform(X3)
X_train3, X_test3, y_train3, y_test3 = train_test_split(X3,
y,
shuffle=True,
test_size=0.25,
random_state=42)
y_train_log3 = np.log(y_train3+1)
y_test_log3 = np.log(y_test3+1)
datasets = {}
datasets[0] = {'X_train': X_train,
'X_test' : X_test,
'y_train': y_train,
'y_test' : y_test,
'y_transform_inv' : False,
'comment' : 'X scaled',
'dataset' : 0}
datasets[1] = {'X_train': X_train,
'X_test' : X_test,
'y_train': y_train_log,
'y_test' : y_test_log,
'y_transform_inv' : True,
'comment' : 'X scaled, y log transformed',
'dataset' : 1}
datasets[2] = {'X_train': X_train2,
'X_test' : X_test2,
'y_train': y_train_log2,
'y_test' : y_test_log2,
'y_transform_inv' : True,
'comment' : 'X scaled and reduced by datetime and loc, y log transformed',
'dataset' : 2}
datasets[3] = {'X_train': X_train3,
'X_test' : X_test3,
'y_train': y_train_log3,
'y_test' : y_test_log3,
'y_transform_inv' : True,
'comment' : 'X scaled and weather only, y log transformed',
'dataset' : 3}
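The `y_transform_inv` flag marks the datasets whose target was transformed with log(y+1). Below is a minimal sketch of how predictions for such a dataset could be mapped back to hectares before computing errors on the true scale, assuming the inverse exp(y)-1 is applied. The helper name `true_scale_errors` and the toy numbers are hypothetical and not part of the original pipeline.

```python
import numpy as np

def true_scale_errors(y_true_log, y_pred_log):
    """Return (MSE, RMSE, MAE, MedAE) in ha for log(y+1)-scaled inputs."""
    # np.expm1 inverts np.log1p, i.e. exp(v) - 1 undoes log(y + 1)
    y_true = np.expm1(np.asarray(y_true_log, dtype=float))
    y_pred = np.expm1(np.asarray(y_pred_log, dtype=float))
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return (mse, float(np.sqrt(mse)),
            float(np.mean(np.abs(err))), float(np.median(np.abs(err))))

# Toy burned areas in ha (illustrative values only)
area = np.array([0.0, 0.52, 6.57, 31.86])
area_log = np.log1p(area)                           # same as np.log(area + 1)
pred_log = np.full_like(area_log, area_log.mean())  # constant prediction
mse, rmse, mae, medae = true_scale_errors(area_log, pred_log)
print(mse, rmse, mae, medae)
```

Because the back-transform is exponential, a small error on the log scale can become a large error in hectares for big fires, which is why the log-scale and true-scale columns should be read separately.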
Results Brute Force Approach
| | Regression type | model | Predictions | R2 | MSE | MAE | MSE_true_scale | RMSE_true_scale | MAE_true_scale | MedAE_true_scale | Training time | dataset |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Linear Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.1336006349223275, 18.024697070677405, 30.28... | 0.005595 | 9421.177329 | 21.753111 | 9421.177329 | 97.062749 | 21.753111 | 8.604523 | 1.146974 | 0 |
1 | Bayesian Ridge Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [7.4856478304490395, 15.885277931047936, 16.32... | -0.000958 | 9483.270085 | 21.086756 | 9483.270085 | 97.382083 | 21.086756 | 9.987340 | 0.121314 | 0 |
2 | Decision Tree Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [5.467536231884058, 3.5923684210526314, 12.035... | 0.104634 | 8482.866665 | 20.081024 | 8482.866665 | 92.102479 | 20.081024 | 5.536407 | 0.119878 | 0 |
3 | SVM Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.8784510179695628, 1.8335317307927035, 1.237... | -0.024547 | 9706.756450 | 16.566693 | 9706.756450 | 98.522873 | 16.566693 | 1.453210 | 8.089113 | 0 |
4 | Random Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [3.494984220308278, 16.81784269471455, 5.36831... | -0.024691 | 9708.113030 | 24.982574 | 9708.113030 | 98.529757 | 24.982574 | 7.167931 | 0.535538 | 0 |
5 | Mondrian Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [10.11812391281128, 13.818091583251952, 16.458... | 0.001626 | 9458.788109 | 21.200896 | 9458.788109 | 97.256301 | 21.200896 | 10.118124 | 1.697901 | 0 |
6 | XGBoost Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [5.440169, 5.440169, 16.284695, 5.440169, 13.8... | -0.013411 | 9601.244885 | 18.742720 | 9601.244885 | 97.985942 | 18.742720 | 5.440169 | 40.454583 | 0 |
7 | Linear Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.7784028274834758, 2.6704760888536603, 3.015... | -0.019210 | 2.026366 | 1.159967 | 9674.405235 | 98.358554 | 16.754297 | 2.111086 | 0.069126 | 1 |
8 | Bayesian Ridge Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.8727122438654171, 2.5253682396537913, 2.331... | -0.008174 | 2.004424 | 1.153220 | 9680.445897 | 98.389257 | 16.727040 | 2.021494 | 0.126212 | 1 |
9 | Decision Tree Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [3.5235342935362075, 1.503042439312948, 4.8024... | -0.038682 | 2.065078 | 1.192341 | 9628.467445 | 98.124754 | 16.856583 | 3.523534 | 0.102233 | 1 |
10 | SVM Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.1993260062463706, 1.549642211136983, 1.3196... | -0.032809 | 2.053403 | 1.113957 | 9708.228401 | 98.530343 | 16.619267 | 1.340590 | 170.962216 | 1 |
11 | Random Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.5753320287968324, 2.8780748919609684, 3.014... | 0.013024 | 1.962278 | 1.133680 | 9678.172354 | 98.377703 | 16.666083 | 2.210309 | 0.559681 | 1 |
12 | Mondrian Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.9223503878136885, 2.1185169269986996, 2.299... | 0.002920 | 1.982367 | 1.147979 | 9687.698154 | 98.426105 | 16.708450 | 2.000706 | 1.713547 | 1 |
13 | XGBoost Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.327445, 2.278426, 2.4698129, 1.6745203, 2.0... | 0.014851 | 1.958646 | 1.128802 | 9685.235186 | 98.413592 | 16.654300 | 2.031970 | 38.020106 | 1 |
14 | Linear Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.4548053379730796, 2.531372206626312, 2.6917... | -0.002020 | 1.992188 | 1.140820 | 9685.191742 | 98.413372 | 16.705558 | 2.030815 | 0.063612 | 2 |
15 | Bayesian Ridge Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.5774838029414924, 2.6458780883509987, 2.199... | 0.000047 | 1.988080 | 1.142000 | 9688.058218 | 98.427934 | 16.694051 | 2.081045 | 0.091859 | 2 |
16 | Decision Tree Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.180799874529276, 1.5030424393129471, 4.8024... | -0.003088 | 1.994312 | 1.135828 | 9632.247893 | 98.144016 | 16.710017 | 1.645063 | 0.106734 | 2 |
17 | SVM Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.221631784574699, 1.2814208845094655, 1.5542... | -0.030154 | 2.048123 | 1.116163 | 9710.010402 | 98.539385 | 16.614545 | 1.387431 | 109.919360 | 2 |
18 | Random Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.5959264758488882, 2.9487540752348043, 2.588... | 0.010478 | 1.967341 | 1.134372 | 9673.312710 | 98.353001 | 16.683293 | 2.100372 | 0.624302 | 2 |
19 | Mondrian Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.8782471761387511, 2.0755232547428095, 2.265... | 0.003965 | 1.980289 | 1.145367 | 9689.354387 | 98.434518 | 16.699032 | 2.020898 | 1.575429 | 2 |
20 | XGBoost Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.4204493, 1.9515479, 2.4139762, 1.4615562, 2... | 0.025659 | 1.937157 | 1.114734 | 9682.458130 | 98.399482 | 16.609898 | 1.960499 | 32.396255 | 2 |
21 | Linear Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.9919665877611115, 2.565979785661881, 1.9427... | 0.013862 | 1.960613 | 1.144219 | 9686.180235 | 98.418394 | 16.696826 | 2.041645 | 0.063567 | 3 |
22 | Bayesian Ridge Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.9986848203967886, 1.9994354455602261, 1.998... | -0.001235 | 1.990627 | 1.151480 | 9689.514035 | 98.435329 | 16.717860 | 1.999147 | 0.086282 | 3 |
23 | Decision Tree Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [0.0, 1.959316397576199, 1.959316397576199, 1.... | 0.011838 | 1.964637 | 1.126846 | 9686.498863 | 98.420013 | 16.560336 | 1.959316 | 0.083100 | 3 |
24 | SVM Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.1942361947655091, 1.2247511587539073, 1.407... | -0.019878 | 2.027694 | 1.109437 | 9709.159186 | 98.535066 | 16.592333 | 1.364851 | 115.218320 | 3 |
25 | Random Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.6919650113973606, 2.7418197996767755, 2.291... | 0.025093 | 1.938283 | 1.125163 | 9681.165271 | 98.392913 | 16.615699 | 2.072220 | 0.455700 | 3 |
26 | Mondrian Forest Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.924433519444516, 1.7146863106008867, 1.9136... | 0.026622 | 1.935243 | 1.129337 | 9688.899688 | 98.432209 | 16.628424 | 1.988877 | 1.360185 | 3 |
27 | XGBoost Regression | GridSearchCV(cv=5, error_score='raise-deprecat... | [1.417011, 1.9887323, 2.1122983, 1.9887323, 2.... | 0.025338 | 1.937796 | 1.123950 | 9687.442751 | 98.424808 | 16.627276 | 1.868815 | 25.879341 | 3 |
That looks awful. Let’s visualize it:
That is a complete failure! Yes, a complete failure. However, we should have a look at the results from the original paper as well. The authors ran 300 simulations, that is, 30 runs of a 10-fold cross-validation. That is close to leave-one-out cross-validation and therefore almost a pure description of the dataset. They achieved MAD scores of around 13 - 18 and RMSE scores of 63 - 64. Both metrics express an error in ha. Yes, RMSE is not robust and is susceptible to outliers (high values), though the unit is correct. This is as useless as the results we obtained by throwing a brute-force approach at it. In fact, because we used a single run of 5-fold cross-validation, the brute-force results may generalize even better than what is presented in the original paper. We have to remember that a large share of the wildfires are smaller than 1/100 ha (the lower quartile of area is 0.00 ha and the median is only 0.52 ha). That renders both results absolutely pointless!
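To make the difference between the two validation schemes concrete, here is a minimal NumPy sketch that generates the index splits for repeated k-fold cross-validation. The function name `repeated_kfold_indices` and the seed are hypothetical, and the paper's exact procedure may differ in details such as shuffling.

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_splits, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated, shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)        # new shuffle per repetition
        folds = np.array_split(order, n_splits)   # near-equal fold sizes
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate(
                [f for j, f in enumerate(folds) if j != i])
            yield train_idx, test_idx

n = 517  # number of rows in the Forest Fires dataset
paper_like = list(repeated_kfold_indices(n, n_splits=10, n_repeats=30))
single_run = list(repeated_kfold_indices(n, n_splits=5, n_repeats=1))

print(len(paper_like), len(single_run))              # number of fitted models
print(len(paper_like[0][0]), len(single_run[0][0]))  # training rows per split
```

With 10 folds each model is trained on roughly 90% of the rows, against roughly 80% with 5 folds, so the 5-fold scores come from models that have seen less of the data.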
Manual Approach
Apparently, this is a problem that still requires natural intelligence ;). Spoiler: TPOT doesn’t perform any better. Let’s see how we can solve this problem using a more hand-engineered approach.
First, we have to ask ourselves why we want to predict forest fire sizes. The dataset deals with wildfires in the Montesinho Natural Park, which is located at the northern border of Portugal. Contrary to most published opinions, I would argue that wildfires don’t necessarily destroy ecosystems. They are a part of them, especially in the Mediterranean climate. If we look at this map of the habitats of Quercus suber (cork oak), we can assume that we will find some in the Montesinho Natural Park. Why is this important? Well, this tree is a pyrophyte, meaning that the species is adapted to the wildfires that are part of Mediterranean ecosystems. There are probably more pyrophyte species in this park. Some of these species depend on their fire resistance to reproduce. It’s the survival of the fittest (best adapted).
Let’s find other reasons why wildfires are of any concern here. Well, there are 92 villages in this area (and an airport). Further, the authors claim that most fires are caused by people. To see the potential for disaster, we have to lay the map from the original publication on top of OpenStreetMap or any other map that shows more than squares. Unfortunately, OpenLayers can’t display GeoTIFFs directly without creating map tiles from them. Therefore, we have to make do with a screenshot of the map on top of OSM:
For those who don’t work with geospatial data and maps: I georeferenced the map from the publication and projected it to UTM 29N (based on WGS 84) using QGIS. That is why the overlaid map looks deformed. However, this is much closer to reality (a 2D plane wrapped onto a (deformed) 3D sphere) than what is presented in the paper. An illustrated map is provided by the park.
This shows us a few things. First, the grid cells do not have equal axis scaling, and probably 1/3 to 1/2 of the cells are outside the natural park. Unfortunately, no real coordinates are provided with the dataset. This is particularly bad because we can’t use topographic features and land cover data to evaluate whether they correlate with the fires. Moreover, the original publication tells us that the data was collected from January 2000 to December 2003. Unfortunately, the data is not ordered sequentially. This is tricky since it contains climate proxies, and we usually use time series of at least 30 years for everything climate related. Otherwise, we will detect extreme values that aren’t extreme at all. Considering that many climate factors have periodicities of decades, centuries or even thousands of years, even 30 years are tricky. Furthermore, it might be useful to know how infrastructure and technologies, especially heating, types of electricity supply and road usage, have evolved there over a period of 30 years. Since it is a natural park, we can assume that land cover probably hasn’t changed much. Another land-cover-related aspect would be the estimation of the fuel available for wildfires.
We can conclude that we are dealing with a temporal-geospatial problem and not with a simple dataset. Moreover, we can conclude that the dataset is not suitable considering the periodicity of climate variability. That is not all. We will face another challenge: the dataset is about predicting wildfire sizes, not about if or when they occur. Therefore, we have no reference for whether any of the climate indicators is really correlated with fires or not. Let’s see what we can make of it anyhow.
First, we should see how many wildfires occurred in each grid cell over the period January 2000 to December 2003. We’re going to start with a simple count of wildfires per grid cell.
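A minimal sketch of such a count using pandas. The ten rows are the (X, Y) pairs from the random sample printed earlier and serve only as stand-in data; with the full file one would pass `input_data` instead. The helper name `fires_per_cell` is hypothetical.

```python
import pandas as pd

def fires_per_cell(df):
    """Count fires per grid cell; rows are Y, columns are X, empty cells are 0."""
    return (df.groupby(['Y', 'X']).size()
              .unstack('X', fill_value=0)
              .sort_index())

# (X, Y) pairs of the ten sampled rows shown in the exploration section
sample = pd.DataFrame({
    'X': [3, 1, 8, 7, 6, 3, 7, 4, 6, 8],
    'Y': [4, 4, 6, 4, 5, 5, 4, 4, 5, 6],
})
counts = fires_per_cell(sample)
print(counts)
```

The resulting table has the same layout as the map grid, so it can be handed to a heatmap plot without further reshaping.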
Certainly, there is some spatial importance here. Let us see how it looks if we extend it into the time domain as well.
This confirms the assumption that we are dealing with a spatial-temporal problem. Let’s see if we can observe something similar with fire sizes.
Well, it seems like there could be some spatial reason behind fire sizes, for example kind and amount of fuel available (what can burn and how much is there). Let’s extend it over the time domain as well.
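Extending the aggregation into the time domain could look like the sketch below, which sums the burned area per grid cell and month. It again uses the ten sampled rows shown earlier as stand-in data. The helper name `area_by_cell_and_month` is hypothetical, and the choice of the sum (rather than the mean or median) is an assumption.

```python
import pandas as pd

def area_by_cell_and_month(df):
    """Total burned area (ha) per (X, Y) grid cell, one column per month."""
    return df.pivot_table(index=['X', 'Y'], columns='month',
                          values='area', aggfunc='sum', fill_value=0.0)

# The ten sampled rows shown in the exploration section
sample = pd.DataFrame({
    'X':     [3, 1, 8, 7, 6, 3, 7, 4, 6, 8],
    'Y':     [4, 4, 6, 4, 5, 5, 4, 4, 5, 6],
    'month': ['aug', 'sep', 'sep', 'aug', 'sep',
              'sep', 'aug', 'mar', 'mar', 'aug'],
    'area':  [0.00, 3.71, 0.00, 0.00, 1.09,
              0.00, 0.00, 0.00, 31.86, 0.00],
})
table = area_by_cell_and_month(sample)
print(table)
```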
Basically, we end up with three solutions considering that we are missing so much (potentially) useful data:
- How relevant are wildfire sizes anyhow? Wouldn’t it be better to predict small ones with higher precision and fight them so that they can’t grow? In that case, it should be enough to predict all fires up to a small size (e.g. 20 ha).
- How relevant are small fires? Do they cause any harm (see above), or do they simply occur naturally and are extinguished after a few hours anyhow (e.g. due to a lack of fuel and dryness)? In terms of managing firefighting, wouldn’t it be better to focus on the big ones first? In that case, we could try to predict everything that is larger than 0 ha.
- Since there is some spatial variability and we lack any real data on what is happening from a causation point of view, we may simply build a separate model for every cell in which fires occurred.
A major challenge remains. We still don’t know anything about environmental conditions for all the cases without wildfires. Hence, no matter how good predictions are, they are practically pointless.
If we want to limit our model to certain fire sizes, then we have to decide what the maximum size of fires is supposed to be for training our model. A basic description of fire sizes yields:
count 517.000000
mean 12.847292
std 63.655818
min 0.000000
25% 0.000000
50% 0.520000
75% 6.570000
max 1090.840000
We learned from brute-forcing a model that a log(y+1) transformation showed some improvement. But what are useful limits in terms of fire sizes? To answer this question, we can look at quantile plots.
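The numbers behind such a quantile plot can also be tabulated directly. A minimal sketch, assuming `area` holds burned areas in ha; the function name `area_quantiles`, the chosen probabilities, and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def area_quantiles(area, probs=(0.25, 0.5, 0.75, 0.9, 0.95, 0.99)):
    """Empirical quantiles of fire size in ha, plus their log(y+1) values."""
    area = pd.Series(area, dtype=float)
    q = area.quantile(list(probs))
    return pd.DataFrame({'area_ha': q, 'log1p_area': np.log1p(q)})

# Toy values only; with the real data use area_quantiles(input_data['area'])
toy = [0.0, 0.0, 0.0, 0.52, 1.09, 3.71, 6.57, 31.86, 200.94, 1090.84]
res = area_quantiles(toy)
print(res)
```

Quantiles where the curve bends sharply are natural candidates for the upper limits of the size ranges.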
It could make sense to build models to predict the following ranges of fire sizes:
- 0 - 5 ha
- 0.1 - 5 ha
- 0.1 - 60 ha
- 0.1 - 1090 ha (this probably causes problems with cross-validation due to a lack of large fires)
- 0.1 - 200 ha
- 1 - 60 ha
N.B.: area = 0.00 ha means that a fire had an extent of less than 1/100 ha
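Restricting the data to one of these ranges is a simple filter. A minimal sketch, assuming inclusive bounds on both sides (the text does not say whether the limits are inclusive); the helper name `subset_by_area` is hypothetical, and the ten values are the sampled areas shown earlier.

```python
import numpy as np
import pandas as pd

def subset_by_area(df, area_min, area_max):
    """Keep fires with area_min <= area <= area_max and add a log(y+1) target."""
    mask = df['area'].between(area_min, area_max)   # inclusive on both ends
    out = df.loc[mask].copy()
    out['area_log'] = np.log1p(out['area'])
    return out

sample = pd.DataFrame({'area': [0.00, 3.71, 0.00, 0.00, 1.09,
                                0.00, 0.00, 0.00, 31.86, 0.00]})
small_incl_zero = subset_by_area(sample, 0.0, 5.0)
small_nonzero = subset_by_area(sample, 0.1, 5.0)
print(len(small_incl_zero), len(small_nonzero))
```

Raising the lower bound from 0 to 0.1 ha removes all fires recorded with area = 0.00, which changes the sample size considerably.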
So, what features are available and which should we select? Let’s start with a simple correlation matrix and see what is offered.
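Anyhow, we have to scale it. By the way, scaling has at most a small impact on the correlation matrix: a linear rescaling of each column leaves Pearson coefficients unchanged, and only nonlinear transforms such as log(y+1) shift them.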
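How much scaling affects the matrix depends on the kind of scaling. For Pearson correlation, dividing each column by a positive constant (which is what a max-abs scaling does) should change nothing beyond floating-point noise, whereas a nonlinear transform such as log(y+1) does shift the coefficients. A minimal sketch on synthetic data; the column names mirror the dataset, but the values are random and purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    'temp': rng.uniform(2, 33, size=200),
    'RH':   rng.uniform(15, 100, size=200),
})
# Skewed synthetic target loosely tied to temp (illustration only)
df['area'] = np.exp(0.1 * df['temp'] + rng.normal(0, 1, size=200))

corr_raw = df.corr()                  # Pearson by default
scaled = df / df.abs().max()          # max-abs scaling, column by column
corr_scaled = scaled.corr()

logged = df.assign(area=np.log1p(df['area']))
corr_logged = logged.corr()

print(np.allclose(corr_raw, corr_scaled))   # effect of linear scaling
print(corr_raw.loc['temp', 'area'], corr_logged.loc['temp', 'area'])
```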
X and Y
X and Y are the coordinates, in the local coordinate system of the map, of the grid cell in which a fire occurred. It might be interesting to know how fires advance through different sectors of the map and how this would be reflected in this dataset.
month and day
This is clear. Unfortunately, no year is provided. A year would have enabled us to treat these fires as a time series. Furthermore, it would have enabled us to use proper weather records.
climate data
There is one problem with the following four features: we don’t know when they were collected or what they represent. They were sampled on the day a fire occurred. However, it is unclear at what time. Let us consider an automatic weather station. Such a station provides continuous measurements. Therefore, the question remains when the values were extracted. In contrast, manual weather stations rely on 3 - 4 daily measurements at defined hours. These manual measurements are usually transformed into daily mean values.
We could treat these features as daily averages for simplicity. However, we should remember that the weather (the state of the atmosphere at a given time) at the moment a fire is detected is not really useful, since the preceding days or weeks are more important. If it rained a lot in the days before, then it doesn’t matter how hot it is. Try to imagine natural rainforest fires: even setting a rainforest on fire (slash-and-burn agriculture) takes a major effort. Considering that we want to predict fire sizes and not just occurrence, we should still keep in mind that we end up predicting correlations, not causation.
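If a daily weather record were available, antecedent conditions could be summarized with rolling windows. A minimal sketch on a made-up daily rain series; the dataset contains no such series, and the function name `antecedent_rain`, the dates, and the 7-day window are hypothetical choices.

```python
import pandas as pd

def antecedent_rain(daily_rain, window_days=7):
    """Rain total over the `window_days` days before each day (that day excluded)."""
    # shift(1) moves the window back so the current day is not counted
    return daily_rain.shift(1).rolling(window_days, min_periods=1).sum()

days = pd.date_range('2000-08-01', periods=10, freq='D')   # illustrative dates
rain = pd.Series([0, 0, 5.2, 0, 0, 0, 1.0, 0, 0, 0], index=days, dtype=float)
prior = antecedent_rain(rain, window_days=7)
print(prior)
```

The first day has no history and therefore yields a missing value, which is the honest answer for a record that starts on that day.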
temp
temp is the temperature in degrees Celsius.
RH
RH is the relative humidity. For those of you without a climate/engineering background: relative humidity is the ratio of the partial pressure of water vapor to the equilibrium pressure of water vapor, which is temperature-dependent.
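Written out, RH = 100 · e / e_s(T), where e is the partial pressure of water vapor and e_s(T) the equilibrium (saturation) pressure at temperature T. A minimal sketch that makes the temperature dependence visible, assuming the Magnus approximation for the saturation vapor pressure over water with the coefficients 6.1094 hPa, 17.625 and 243.04 °C. The function names are hypothetical, and the dataset itself only provides the final RH value.

```python
import math

def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapor pressure over water, in hPa."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))

def relative_humidity(vapor_pressure_hpa, temp_c):
    """Relative humidity in percent from partial pressure and temperature."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)

# The same amount of water vapor (10 hPa) at two temperatures
rh_cool = relative_humidity(10.0, 15.0)
rh_hot = relative_humidity(10.0, 30.0)
print(round(rh_cool, 1), round(rh_hot, 1))
```

The same air is considerably "drier" in relative terms at 30 °C than at 15 °C, which is why RH and temp should not be read as independent features.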
wind
This is the wind speed in km/h. We should keep in mind that wind is a velocity vector and therefore has not only a magnitude (speed) but a direction as well.
rain
Rainfall is given in mm/m2. It is reasonable to assume that this is the total amount of rainfall on a day.
Derived forest fire features
There are four more features. They are part of the Canadian Forest Fire Danger Rating System.
FFMC
FFMC is the Fine Fuel Moisture Code. It is an estimate of the moisture content of the surface litter, used to assess ignition and spread. It requires the last 16 hours of all four climatic parameters.
DMC
DMC is the Duff Moisture Code. It requires a time series of 12 days of rain, relative humidity and temperature and estimates soil moisture conditions (shallow layers).
DC
DC is the Drought Code and requires 52 days of rain and temperature data. It estimates deeper soil moisture content.
ISI
ISI is the Initial Spread Index and estimates spread speed.
It really doesn’t matter how many ranges of fire sizes and feature sets we choose, since we can simply automate it with a few lines of Python code. We’ll use the following datasets for building models.
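A minimal sketch of how the 30 combinations (six area ranges times five feature sets) could be enumerated. The three shorter feature sets are taken from the results table; the first two are shown truncated there, so "all features" and "all features without X and Y" are assumptions, as is the order of the ranges after the first two. The function name `build_configs` is hypothetical.

```python
from itertools import product

# Area ranges in ha, in the order listed above (order partly assumed)
AREA_RANGES = [(0.0, 5.0), (0.1, 5.0), (0.1, 60.0),
               (0.1, 1090.0), (0.1, 200.0), (1.0, 60.0)]

WEATHER = ['temp', 'RH', 'wind', 'rain']
FWI = ['FFMC', 'DMC', 'DC', 'ISI']
FEATURE_SETS = [
    ['X', 'Y', 'month', 'day'] + FWI + WEATHER,   # assumed: all features
    ['month', 'day'] + FWI + WEATHER,             # assumed: no location
    FWI + WEATHER,
    WEATHER,
    FWI,
]

def build_configs():
    """One config per (area range, feature set) pair, ranges varying slowest."""
    configs = []
    pairs = product(AREA_RANGES, FEATURE_SETS)
    for idx, ((a_min, a_max), feats) in enumerate(pairs):
        configs.append({'dataset': idx, 'area_min': a_min,
                        'area_max': a_max, 'features': list(feats)})
    return configs

configs = build_configs()
print(len(configs))
```

Each config can then be combined with an area filter and a column selection to produce one training set for the model pipeline.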
Contrary to what I recommend in common mistakes in data science and machine learning, I’m deliberately not using cross-validation for the final evaluation because the datasets would become too small, even for leave-one-out cross-validation. All of the 30 datasets are of different sizes, and each model is trained using 5-fold cross-validation. The final model evaluation is done on the full dataset (one of the 30) and is therefore purely descriptive.
Let’s run the same pipeline as above on 30 different datasets. Here are the results.
| | Regression type | R2 | MSE | MAE | MSE_true_scale | RMSE_true_scale | MAE_true_scale | MedAE_true_scale | Training time | dataset | AreaMin | AreaMax | Number of datapoints | Features |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Linear Regression | 0.036993 | 0.289357 | 0.450349 | 1.533182 | 1.238217 | 0.818386 | 0.452919 | 0.761870 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
1 | Bayesian Ridge Regression | 0.004069 | 0.299250 | 0.463300 | 1.580447 | 1.257158 | 0.833270 | 0.409910 | 0.069403 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
2 | Decision Tree Regression | 0.111554 | 0.266954 | 0.420469 | 1.405103 | 1.185371 | 0.769177 | 0.560928 | 0.074945 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
3 | SVM Regression | -0.190832 | 0.357812 | 0.375489 | 1.842199 | 1.357276 | 0.723560 | 0.105601 | 44.626076 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
4 | Random Forest Regression | 0.106003 | 0.268621 | 0.436830 | 1.451650 | 1.204845 | 0.795650 | 0.418875 | 0.517976 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
5 | Mondrian Forest Regression | 0.037346 | 0.289251 | 0.454477 | 1.545040 | 1.242996 | 0.821024 | 0.415502 | 1.720303 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
6 | XGBoost Regression | -0.047431 | 0.314724 | 0.518575 | 1.485507 | 1.218814 | 0.916338 | 0.634528 | 35.860502 | 0 | 0.0 | 5.0 | 366 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
7 | Linear Regression | 0.023898 | 0.293292 | 0.454742 | 1.552903 | 1.246155 | 0.823643 | 0.447149 | 0.070181 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
8 | Bayesian Ridge Regression | 0.016911 | 0.295391 | 0.455017 | 1.570750 | 1.253296 | 0.822787 | 0.431328 | 0.077154 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
9 | Decision Tree Regression | 0.108201 | 0.267961 | 0.422561 | 1.402095 | 1.184101 | 0.764242 | 0.483437 | 0.082053 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
10 | SVM Regression | -0.193070 | 0.358485 | 0.375906 | 1.843778 | 1.357858 | 0.724022 | 0.105182 | 35.784489 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
11 | Random Forest Regression | 0.104629 | 0.269034 | 0.435082 | 1.455961 | 1.206632 | 0.794622 | 0.423130 | 0.509909 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
12 | Mondrian Forest Regression | 0.022123 | 0.293825 | 0.457700 | 1.564101 | 1.250640 | 0.825705 | 0.425543 | 1.517942 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
13 | XGBoost Regression | -0.033673 | 0.310590 | 0.513916 | 1.478767 | 1.216046 | 0.908795 | 0.614595 | 31.098840 | 1 | 0.0 | 5.0 | 366 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
14 | Linear Regression | 0.023801 | 0.293321 | 0.453771 | 1.553045 | 1.246212 | 0.822133 | 0.450306 | 0.021066 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
15 | Bayesian Ridge Regression | 0.016922 | 0.295388 | 0.456077 | 1.568567 | 1.252424 | 0.824238 | 0.437291 | 0.074578 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
16 | Decision Tree Regression | 0.108201 | 0.267961 | 0.422561 | 1.402095 | 1.184101 | 0.764242 | 0.483437 | 0.072301 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
17 | SVM Regression | -0.193078 | 0.358487 | 0.375907 | 1.843783 | 1.357860 | 0.724023 | 0.105180 | 21.134526 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
18 | Random Forest Regression | 0.104667 | 0.269023 | 0.435336 | 1.455258 | 1.206341 | 0.795113 | 0.437903 | 0.531732 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
19 | Mondrian Forest Regression | 0.023784 | 0.293326 | 0.455813 | 1.562702 | 1.250081 | 0.823492 | 0.431456 | 1.570227 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
20 | XGBoost Regression | -0.043266 | 0.313473 | 0.516000 | 1.489683 | 1.220526 | 0.912180 | 0.623727 | 29.676452 | 2 | 0.0 | 5.0 | 366 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
21 | Linear Regression | 0.012660 | 0.296668 | 0.459248 | 1.569067 | 1.252624 | 0.828054 | 0.427256 | 0.055820 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
22 | Bayesian Ridge Regression | 0.011890 | 0.296900 | 0.456970 | 1.574863 | 1.254936 | 0.824941 | 0.417176 | 0.077074 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
23 | Decision Tree Regression | 0.065432 | 0.280812 | 0.436527 | 1.475400 | 1.214660 | 0.784589 | 0.402813 | 0.085352 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
24 | SVM Regression | -0.192637 | 0.358354 | 0.375813 | 1.843471 | 1.357745 | 0.723919 | 0.105273 | 1.244197 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
25 | Random Forest Regression | 0.073131 | 0.278499 | 0.445301 | 1.491058 | 1.221089 | 0.806817 | 0.424915 | 0.467741 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
26 | Mondrian Forest Regression | 0.056244 | 0.283573 | 0.448705 | 1.516553 | 1.231484 | 0.810572 | 0.404195 | 1.314378 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
27 | XGBoost Regression | -0.032702 | 0.310298 | 0.509423 | 1.498286 | 1.224045 | 0.901386 | 0.578400 | 24.760447 | 3 | 0.0 | 5.0 | 366 | [rain, RH, wind, temp] |
28 | Linear Regression | 0.012981 | 0.296572 | 0.458688 | 1.567481 | 1.251991 | 0.828100 | 0.456725 | 0.056864 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
29 | Bayesian Ridge Regression | 0.012313 | 0.296773 | 0.456124 | 1.573468 | 1.254379 | 0.824565 | 0.441740 | 0.069708 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
30 | Decision Tree Regression | 0.108875 | 0.267758 | 0.425162 | 1.376610 | 1.173290 | 0.768924 | 0.481253 | 0.070310 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
31 | SVM Regression | -0.193078 | 0.358487 | 0.375906 | 1.843783 | 1.357860 | 0.724022 | 0.105172 | 1.407259 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
32 | Random Forest Regression | 0.081713 | 0.275920 | 0.439929 | 1.479050 | 1.216162 | 0.801920 | 0.452215 | 0.483249 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
33 | Mondrian Forest Regression | 0.018068 | 0.295043 | 0.457961 | 1.565101 | 1.251040 | 0.826676 | 0.457056 | 1.084028 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
34 | XGBoost Regression | -0.063390 | 0.319519 | 0.522166 | 1.503408 | 1.226135 | 0.922153 | 0.634684 | 25.179769 | 4 | 0.0 | 5.0 | 366 | [DMC, FFMC, DC, ISI] |
35 | Linear Regression | 0.139219 | 0.151556 | 0.320090 | 1.387809 | 1.178053 | 0.945395 | 0.866292 | 0.056266 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
36 | Bayesian Ridge Regression | 0.003063 | 0.175529 | 0.346762 | 1.653474 | 1.285875 | 1.030543 | 0.903256 | 0.095309 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
37 | Decision Tree Regression | 0.272846 | 0.128029 | 0.290674 | 1.231867 | 1.109895 | 0.870912 | 0.720000 | 0.071607 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
38 | SVM Regression | 0.097159 | 0.158961 | 0.322195 | 1.452612 | 1.205244 | 0.955651 | 0.888855 | 0.408338 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
39 | Random Forest Regression | 0.236863 | 0.134364 | 0.302131 | 1.286515 | 1.134247 | 0.901620 | 0.877673 | 0.440848 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
40 | Mondrian Forest Regression | 0.864586 | 0.023842 | 0.114294 | 0.256671 | 0.506627 | 0.349479 | 0.219568 | 0.934590 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
41 | XGBoost Regression | 0.166479 | 0.146756 | 0.318951 | 1.447718 | 1.203212 | 0.952517 | 0.902110 | 23.067257 | 5 | 0.1 | 5.0 | 118 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
42 | Linear Regression | 0.099758 | 0.158504 | 0.328253 | 1.474158 | 1.214149 | 0.972062 | 0.869027 | 0.055260 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
43 | Bayesian Ridge Regression | 0.002672 | 0.175597 | 0.346800 | 1.654228 | 1.286168 | 1.030652 | 0.906024 | 0.102510 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
44 | Decision Tree Regression | 0.272846 | 0.128029 | 0.290674 | 1.231867 | 1.109895 | 0.870912 | 0.720000 | 0.067233 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
45 | SVM Regression | 0.070061 | 0.163732 | 0.324603 | 1.512059 | 1.229658 | 0.963866 | 0.846885 | 0.465582 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
46 | Random Forest Regression | 0.220092 | 0.137317 | 0.304992 | 1.309569 | 1.144364 | 0.908550 | 0.882586 | 0.436600 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
47 | Mondrian Forest Regression | 0.132683 | 0.152707 | 0.323161 | 1.444799 | 1.201998 | 0.960841 | 0.862463 | 1.195543 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
48 | XGBoost Regression | 0.120455 | 0.154860 | 0.327494 | 1.545141 | 1.243037 | 0.976201 | 0.873858 | 22.625245 | 6 | 0.1 | 5.0 | 118 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
49 | Linear Regression | 0.097521 | 0.158897 | 0.328588 | 1.476999 | 1.215319 | 0.972613 | 0.850691 | 0.053982 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
50 | Bayesian Ridge Regression | 0.039062 | 0.169190 | 0.337692 | 1.592360 | 1.261888 | 1.003815 | 0.889502 | 0.087339 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
51 | Decision Tree Regression | 0.272846 | 0.128029 | 0.290674 | 1.231867 | 1.109895 | 0.870912 | 0.720000 | 0.069422 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
52 | SVM Regression | 0.054713 | 0.166435 | 0.325338 | 1.542318 | 1.241901 | 0.965052 | 0.799573 | 0.303247 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
53 | Random Forest Regression | 0.203007 | 0.140325 | 0.307177 | 1.329921 | 1.153222 | 0.915026 | 0.867277 | 0.468100 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
54 | Mondrian Forest Regression | 0.079131 | 0.162135 | 0.329406 | 1.520738 | 1.233182 | 0.977887 | 0.886207 | 0.988298 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
55 | XGBoost Regression | 0.112866 | 0.156196 | 0.328001 | 1.555300 | 1.247117 | 0.978134 | 0.890496 | 22.145435 | 7 | 0.1 | 5.0 | 118 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
56 | Linear Regression | 0.039947 | 0.169034 | 0.339276 | 1.581008 | 1.257381 | 1.007331 | 0.935088 | 0.018930 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
57 | Bayesian Ridge Regression | 0.029330 | 0.170904 | 0.339598 | 1.607543 | 1.267889 | 1.009331 | 0.908884 | 0.066081 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
58 | Decision Tree Regression | 0.168795 | 0.146349 | 0.314138 | 1.347306 | 1.160735 | 0.926528 | 0.771988 | 0.062309 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
59 | SVM Regression | 0.033985 | 0.170084 | 0.336051 | 1.607412 | 1.267837 | 0.998987 | 0.911783 | 0.243271 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
60 | Random Forest Regression | 0.142468 | 0.150984 | 0.320372 | 1.420241 | 1.191739 | 0.951535 | 0.906098 | 0.411116 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
61 | Mondrian Forest Regression | 0.081758 | 0.161673 | 0.328507 | 1.521735 | 1.233586 | 0.975767 | 0.859526 | 0.913395 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
62 | XGBoost Regression | 0.083912 | 0.161294 | 0.328899 | 1.562243 | 1.249897 | 0.978451 | 0.881312 | 21.515475 | 8 | 0.1 | 5.0 | 118 | [rain, RH, wind, temp] |
63 | Linear Regression | 0.059140 | 0.165655 | 0.335228 | 1.543789 | 1.242493 | 0.995253 | 0.929790 | 0.056756 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
64 | Bayesian Ridge Regression | 0.000105 | 0.176049 | 0.347195 | 1.657844 | 1.287573 | 1.031789 | 0.905061 | 0.081088 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
65 | Decision Tree Regression | 0.216521 | 0.137945 | 0.304689 | 1.333948 | 1.154967 | 0.917257 | 0.750000 | 0.063691 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
66 | SVM Regression | 0.023198 | 0.171983 | 0.333757 | 1.552367 | 1.245940 | 0.989684 | 0.822640 | 0.233162 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
67 | Random Forest Regression | 0.173489 | 0.145522 | 0.318531 | 1.383005 | 1.176012 | 0.948364 | 0.870869 | 0.412509 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
68 | Mondrian Forest Regression | 0.131467 | 0.152921 | 0.324794 | 1.446059 | 1.202522 | 0.965480 | 0.881857 | 0.834365 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
69 | XGBoost Regression | 0.084505 | 0.161189 | 0.335594 | 1.609262 | 1.268567 | 0.999189 | 0.882940 | 21.365956 | 9 | 0.1 | 5.0 | 118 | [DMC, FFMC, DC, ISI] |
70 | Linear Regression | 0.038983 | 0.918217 | 0.786867 | 165.906788 | 12.880481 | 7.571652 | 3.935535 | 0.020302 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
71 | Bayesian Ridge Regression | 0.013378 | 0.942682 | 0.806777 | 167.810455 | 12.954167 | 7.719108 | 4.054009 | 0.076280 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
72 | Decision Tree Regression | 0.135117 | 0.826365 | 0.736421 | 150.486406 | 12.267290 | 7.078912 | 3.777397 | 0.083555 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
73 | SVM Regression | 0.034857 | 0.922160 | 0.790833 | 169.035179 | 13.001353 | 7.613949 | 3.711818 | 0.713855 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
74 | Random Forest Regression | 0.137207 | 0.824368 | 0.748327 | 155.887276 | 12.485483 | 7.293031 | 3.767959 | 0.551865 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
75 | Mondrian Forest Regression | 0.155953 | 0.806457 | 0.741187 | 156.292670 | 12.501707 | 7.256523 | 3.569706 | 1.294329 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
76 | XGBoost Regression | 0.057590 | 0.900439 | 0.780924 | 171.317775 | 13.088842 | 7.567099 | 3.508582 | 28.549180 | 10 | 0.1 | 60.0 | 249 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
77 | Linear Regression | 0.037404 | 0.919726 | 0.787378 | 166.298929 | 12.895694 | 7.572539 | 3.853146 | 0.065805 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
78 | Bayesian Ridge Regression | 0.016999 | 0.939223 | 0.805558 | 167.548174 | 12.944040 | 7.710936 | 4.136503 | 0.077227 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
79 | Decision Tree Regression | 0.135117 | 0.826365 | 0.736421 | 150.486406 | 12.267290 | 7.078912 | 3.777397 | 0.077101 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
80 | SVM Regression | 0.035354 | 0.921685 | 0.789499 | 169.280842 | 13.010797 | 7.604308 | 3.710722 | 0.746246 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
81 | Random Forest Regression | 0.137962 | 0.823647 | 0.748040 | 155.530397 | 12.471183 | 7.288500 | 3.718753 | 0.481519 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
82 | Mondrian Forest Regression | 0.035977 | 0.921090 | 0.796877 | 166.077284 | 12.887098 | 7.656943 | 3.972689 | 1.279077 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
83 | XGBoost Regression | 0.057632 | 0.900399 | 0.780494 | 171.309600 | 13.088529 | 7.561741 | 3.483446 | 58.766617 | 11 | 0.1 | 60.0 | 249 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
84 | Linear Regression | 0.021842 | 0.934595 | 0.800804 | 167.096813 | 12.926593 | 7.653198 | 3.910206 | 0.062698 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
85 | Bayesian Ridge Regression | 0.009262 | 0.946616 | 0.808435 | 168.152782 | 12.967374 | 7.729273 | 4.065888 | 0.093470 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
86 | Decision Tree Regression | 0.135117 | 0.826365 | 0.736421 | 150.486406 | 12.267290 | 7.078912 | 3.777397 | 0.080095 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
87 | SVM Regression | 0.025652 | 0.930955 | 0.796223 | 169.696498 | 13.026761 | 7.640109 | 3.651617 | 0.402736 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
88 | Random Forest Regression | 0.132676 | 0.828698 | 0.749689 | 155.120050 | 12.454720 | 7.289173 | 3.691036 | 0.520683 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
89 | Mondrian Forest Regression | 0.037320 | 0.919807 | 0.791296 | 166.048865 | 12.885995 | 7.601418 | 3.853596 | 1.271431 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
90 | XGBoost Regression | 0.055400 | 0.902532 | 0.781095 | 171.089789 | 13.080130 | 7.559520 | 3.430965 | 25.947325 | 12 | 0.1 | 60.0 | 249 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
91 | Linear Regression | 0.024016 | 0.932518 | 0.797407 | 167.165357 | 12.929244 | 7.630484 | 4.005220 | 0.023515 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
92 | Bayesian Ridge Regression | 0.016118 | 0.940065 | 0.806021 | 167.771282 | 12.952655 | 7.710099 | 3.964757 | 0.072249 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
93 | Decision Tree Regression | 0.158958 | 0.803586 | 0.737991 | 143.906316 | 11.996096 | 7.133695 | 3.614348 | 0.068854 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
94 | SVM Regression | 0.026131 | 0.930498 | 0.794450 | 169.076560 | 13.002944 | 7.615531 | 3.810518 | 0.462222 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
95 | Random Forest Regression | 0.117138 | 0.843544 | 0.759292 | 155.477990 | 12.469081 | 7.353594 | 3.734825 | 0.455932 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
96 | Mondrian Forest Regression | 0.102416 | 0.857610 | 0.755685 | 160.140005 | 12.654644 | 7.313286 | 3.581796 | 1.101429 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
97 | XGBoost Regression | 0.074904 | 0.883896 | 0.776558 | 166.831405 | 12.916323 | 7.516460 | 3.573941 | 22.841846 | 13 | 0.1 | 60.0 | 249 | [rain, RH, wind, temp] |
98 | Linear Regression | 0.001652 | 0.953886 | 0.811116 | 168.211376 | 12.969633 | 7.746065 | 4.040523 | 0.023907 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
99 | Bayesian Ridge Regression | 0.000270 | 0.955207 | 0.811027 | 168.652263 | 12.986619 | 7.747678 | 4.110734 | 0.075341 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
100 | Decision Tree Regression | 0.131897 | 0.829441 | 0.742974 | 144.030310 | 12.001263 | 7.183596 | 3.449529 | 0.065950 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
101 | SVM Regression | 0.014529 | 0.941583 | 0.802994 | 170.240071 | 13.047608 | 7.695820 | 3.739669 | 0.350014 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
102 | Random Forest Regression | 0.103614 | 0.856465 | 0.762878 | 157.832182 | 12.563128 | 7.394534 | 3.826037 | 0.487014 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
103 | Mondrian Forest Regression | 0.071780 | 0.886882 | 0.774400 | 162.871083 | 12.762096 | 7.478546 | 3.665028 | 0.964429 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
104 | XGBoost Regression | 0.035162 | 0.921868 | 0.789230 | 172.662148 | 13.140097 | 7.615553 | 3.319291 | 22.928115 | 14 | 0.1 | 60.0 | 249 | [DMC, FFMC, DC, ISI] |
105 | Linear Regression | 0.044338 | 1.242567 | 0.900983 | 824.888416 | 28.720871 | 12.978586 | 4.832605 | 0.021889 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
106 | Bayesian Ridge Regression | 0.000163 | 1.300004 | 0.921119 | 841.073925 | 29.001275 | 13.106282 | 4.922538 | 0.082161 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
107 | Decision Tree Regression | 0.137184 | 1.121847 | 0.850194 | 762.362259 | 27.610908 | 12.482921 | 4.197569 | 0.084071 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
108 | SVM Regression | 0.021480 | 1.272287 | 0.898976 | 851.993925 | 29.188935 | 12.946297 | 4.318662 | 0.872925 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
109 | Random Forest Regression | 0.124465 | 1.138384 | 0.865012 | 805.258252 | 28.377073 | 12.646089 | 4.483286 | 0.559857 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
110 | Mondrian Forest Regression | 0.011538 | 1.285213 | 0.915453 | 838.750369 | 28.961187 | 13.059733 | 4.986360 | 1.280887 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
111 | XGBoost Regression | 0.019974 | 1.274245 | 0.896553 | 859.701546 | 29.320668 | 12.943308 | 3.736910 | 64.223513 | 15 | 0.1 | 200.0 | 264 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
112 | Linear Regression | 0.043522 | 1.243628 | 0.902616 | 825.444084 | 28.730543 | 12.989507 | 4.813046 | 0.056413 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
113 | Bayesian Ridge Regression | 0.000223 | 1.299925 | 0.921086 | 841.073252 | 29.001263 | 13.106018 | 4.922277 | 0.102148 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
114 | Decision Tree Regression | 0.161389 | 1.090375 | 0.840542 | 751.009259 | 27.404548 | 12.412694 | 3.992360 | 0.080813 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
115 | SVM Regression | 0.023096 | 1.270186 | 0.895812 | 851.819396 | 29.185945 | 12.923075 | 4.365209 | 0.568480 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
116 | Random Forest Regression | 0.116603 | 1.148606 | 0.867246 | 812.139594 | 28.498063 | 12.667611 | 4.411177 | 0.531950 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
117 | Mondrian Forest Regression | 0.021111 | 1.272767 | 0.911819 | 839.012701 | 28.965716 | 13.040354 | 4.920706 | 1.334749 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
118 | XGBoost Regression | 0.046829 | 1.239328 | 0.890655 | 847.157950 | 29.105978 | 12.892733 | 3.984052 | 60.216328 | 16 | 0.1 | 200.0 | 264 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
119 | Linear Regression | 0.031286 | 1.259537 | 0.907801 | 828.681596 | 28.786830 | 13.007898 | 4.901932 | 0.055729 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
120 | Bayesian Ridge Regression | 0.000067 | 1.300128 | 0.921166 | 841.080879 | 29.001394 | 13.106649 | 4.924945 | 0.103690 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
121 | Decision Tree Regression | 0.151676 | 1.103004 | 0.842207 | 777.514167 | 27.883941 | 12.473065 | 3.977827 | 0.079662 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
122 | SVM Regression | 0.013107 | 1.283173 | 0.902748 | 850.769851 | 29.167959 | 12.963377 | 4.268491 | 0.447160 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
123 | Random Forest Regression | 0.111093 | 1.155771 | 0.869270 | 814.045763 | 28.531487 | 12.686967 | 4.505582 | 0.508102 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
124 | Mondrian Forest Regression | 0.024304 | 1.268616 | 0.903907 | 839.085606 | 28.966974 | 12.963267 | 4.821784 | 1.225762 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
125 | XGBoost Regression | 0.046829 | 1.239328 | 0.890655 | 847.157950 | 29.105978 | 12.892733 | 3.984052 | 56.803942 | 17 | 0.1 | 200.0 | 264 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
126 | Linear Regression | 0.011852 | 1.284805 | 0.910193 | 839.139052 | 28.967897 | 13.010916 | 4.762860 | 0.055294 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
127 | Bayesian Ridge Regression | 0.000161 | 1.300007 | 0.921091 | 841.079113 | 29.001364 | 13.106063 | 4.926444 | 0.087328 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
128 | Decision Tree Regression | 0.119361 | 1.145021 | 0.849090 | 774.269446 | 27.825698 | 12.328808 | 4.581949 | 0.065508 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
129 | SVM Regression | 0.001350 | 1.298461 | 0.905539 | 857.412409 | 29.281605 | 12.978542 | 4.192401 | 0.343880 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
130 | Random Forest Regression | 0.070474 | 1.208585 | 0.881578 | 821.662022 | 28.664648 | 12.778480 | 4.825694 | 0.459338 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
131 | Mondrian Forest Regression | 0.012799 | 1.283574 | 0.911338 | 840.099293 | 28.984466 | 13.026499 | 4.994137 | 1.096994 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
132 | XGBoost Regression | 0.027357 | 1.264646 | 0.891559 | 853.089085 | 29.207689 | 12.883130 | 4.099006 | 23.101554 | 18 | 0.1 | 200.0 | 264 | [rain, RH, wind, temp] |
133 | Linear Regression | 0.012399 | 1.284094 | 0.921430 | 833.310761 | 28.867122 | 13.112492 | 4.968500 | 0.060112 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
134 | Bayesian Ridge Regression | 0.000022 | 1.300187 | 0.921200 | 841.082446 | 29.001421 | 13.106914 | 4.923827 | 0.077662 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
135 | Decision Tree Regression | 0.152876 | 1.101444 | 0.847533 | 786.723071 | 28.048584 | 12.669606 | 4.723775 | 0.064137 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
136 | SVM Regression | 0.002315 | 1.297206 | 0.910843 | 853.829141 | 29.220355 | 13.031309 | 4.335234 | 0.346508 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
137 | Random Forest Regression | 0.105530 | 1.163004 | 0.873718 | 813.987023 | 28.530458 | 12.729784 | 4.415391 | 0.482594 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
138 | Mondrian Forest Regression | 0.015954 | 1.279473 | 0.912224 | 839.918826 | 28.981353 | 13.035619 | 4.822362 | 1.080387 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
139 | XGBoost Regression | 0.065722 | 1.214763 | 0.896477 | 824.403761 | 28.712432 | 12.930306 | 4.432251 | 22.962852 | 19 | 0.1 | 200.0 | 264 | [DMC, FFMC, DC, ISI] |
140 | Linear Regression | 0.051044 | 1.485552 | 0.963989 | 7712.190686 | 87.819079 | 22.089165 | 5.476491 | 0.026376 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
141 | Bayesian Ridge Regression | 0.000342 | 1.564925 | 0.984081 | 7777.125498 | 88.188012 | 22.197196 | 5.519435 | 0.097937 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
142 | Decision Tree Regression | 0.165307 | 1.306679 | 0.898707 | 7182.513286 | 84.749710 | 21.393283 | 4.806355 | 0.076900 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
143 | SVM Regression | 0.012985 | 1.545132 | 0.953488 | 7811.222456 | 88.381120 | 21.968736 | 4.538897 | 0.833406 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
144 | Random Forest Regression | 0.124887 | 1.369954 | 0.935135 | 7521.501793 | 86.726592 | 21.721360 | 4.892678 | 0.563898 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
145 | Mondrian Forest Regression | 0.013411 | 1.544466 | 0.978761 | 7767.524357 | 88.133560 | 22.151461 | 5.570809 | 1.367518 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
146 | XGBoost Regression | 0.039838 | 1.503096 | 0.949860 | 7765.098152 | 88.119794 | 21.929010 | 4.212619 | 65.346306 | 20 | 0.1 | 1091.0 | 269 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
147 | Linear Regression | 0.048155 | 1.490076 | 0.966624 | 7717.330135 | 87.848336 | 22.111801 | 5.484811 | 0.055587 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
148 | Bayesian Ridge Regression | 0.000990 | 1.563911 | 0.983860 | 7776.816013 | 88.186258 | 22.195344 | 5.510330 | 0.094431 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
149 | Decision Tree Regression | 0.165307 | 1.306679 | 0.898707 | 7182.513286 | 84.749710 | 21.393283 | 4.806355 | 0.079125 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
150 | SVM Regression | 0.012654 | 1.545651 | 0.953145 | 7812.394846 | 88.387753 | 21.966828 | 4.489747 | 0.714508 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
151 | Random Forest Regression | 0.120388 | 1.376997 | 0.935461 | 7525.504295 | 86.749665 | 21.734064 | 4.949520 | 0.525605 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
152 | Mondrian Forest Regression | 0.019458 | 1.535000 | 0.974267 | 7773.399838 | 88.166886 | 22.125003 | 5.393115 | 1.313755 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
153 | XGBoost Regression | 0.039019 | 1.504377 | 0.952030 | 7766.352683 | 88.126912 | 21.945815 | 4.311559 | 28.235538 | 21 | 0.1 | 1091.0 | 269 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
154 | Linear Regression | 0.031490 | 1.516164 | 0.974128 | 7736.068523 | 87.954923 | 22.140409 | 5.414844 | 0.056916 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
155 | Bayesian Ridge Regression | 0.000039 | 1.565400 | 0.984192 | 7777.282976 | 88.188905 | 22.198135 | 5.515813 | 0.083799 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
156 | Decision Tree Regression | 0.165307 | 1.306679 | 0.898707 | 7182.513286 | 84.749710 | 21.393283 | 4.806355 | 0.070675 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
157 | SVM Regression | 0.000691 | 1.564378 | 0.961385 | 7813.896869 | 88.396249 | 22.016218 | 4.522732 | 0.533139 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
158 | Random Forest Regression | 0.122692 | 1.373391 | 0.934229 | 7515.934280 | 86.694488 | 21.700638 | 4.877561 | 0.489604 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
159 | Mondrian Forest Regression | 0.016126 | 1.540216 | 0.969646 | 7779.171871 | 88.199614 | 22.070958 | 5.425796 | 1.387820 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
160 | XGBoost Regression | 0.039019 | 1.504377 | 0.952030 | 7766.352683 | 88.126912 | 21.945815 | 4.311559 | 40.966699 | 22 | 0.1 | 1091.0 | 269 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
161 | Linear Regression | 0.008769 | 1.551733 | 0.975074 | 7765.936066 | 88.124549 | 22.117167 | 5.374795 | 0.057177 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
162 | Bayesian Ridge Regression | 0.000038 | 1.565400 | 0.984184 | 7777.286964 | 88.188928 | 22.198061 | 5.513711 | 0.080387 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
163 | Decision Tree Regression | 0.151629 | 1.328092 | 0.906292 | 6929.703510 | 83.244841 | 21.227738 | 4.973370 | 0.067078 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
164 | SVM Regression | -0.000646 | 1.566471 | 0.962752 | 7810.100418 | 88.374773 | 22.018133 | 4.613485 | 0.352632 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
165 | Random Forest Regression | 0.074119 | 1.449430 | 0.947284 | 7575.109207 | 87.035103 | 21.811447 | 5.162353 | 0.451577 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
166 | Mondrian Forest Regression | 0.010818 | 1.548525 | 0.975618 | 7771.692877 | 88.157205 | 22.123888 | 5.499732 | 1.124015 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
167 | XGBoost Regression | 0.031162 | 1.516678 | 0.948868 | 7755.787421 | 88.066949 | 21.913847 | 4.309474 | 23.000765 | 23 | 0.1 | 1091.0 | 269 | [rain, RH, wind, temp] |
168 | Linear Regression | 0.015539 | 1.541135 | 0.985738 | 7757.339935 | 88.075762 | 22.219276 | 5.540720 | 0.022336 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
169 | Bayesian Ridge Regression | 0.000021 | 1.565427 | 0.984209 | 7777.293476 | 88.188965 | 22.198278 | 5.516868 | 0.063743 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
170 | Decision Tree Regression | 0.145796 | 1.337222 | 0.913246 | 7137.493008 | 84.483685 | 21.157816 | 4.637093 | 0.069215 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
171 | SVM Regression | -0.009976 | 1.581077 | 0.969144 | 7822.871862 | 88.447000 | 22.081596 | 4.474194 | 0.341351 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
172 | Random Forest Regression | 0.122416 | 1.373824 | 0.936931 | 7541.570391 | 86.842215 | 21.762646 | 4.853987 | 0.464362 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
173 | Mondrian Forest Regression | 0.012300 | 1.546205 | 0.976399 | 7775.138183 | 88.176744 | 22.132888 | 5.409031 | 1.196045 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
174 | XGBoost Regression | 0.031187 | 1.516638 | 0.957523 | 7792.579739 | 88.275590 | 22.001707 | 4.451702 | 23.249308 | 24 | 0.1 | 1091.0 | 269 | [DMC, FFMC, DC, ISI] |
175 | Linear Regression | 0.021804 | 0.771713 | 0.720307 | 171.345080 | 13.089885 | 7.929731 | 4.267013 | 0.062712 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
176 | Bayesian Ridge Regression | 0.000169 | 0.788782 | 0.732664 | 173.302827 | 13.164453 | 8.028876 | 4.471667 | 0.090315 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
177 | Decision Tree Regression | 0.167907 | 0.656451 | 0.656553 | 147.687076 | 12.152657 | 7.321948 | 3.780174 | 0.076738 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
178 | SVM Regression | 0.019605 | 0.773448 | 0.715703 | 173.717876 | 13.180208 | 7.894367 | 4.210846 | 1.112014 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
179 | Random Forest Regression | 0.131270 | 0.685354 | 0.677682 | 160.496742 | 12.668731 | 7.562352 | 3.976736 | 0.495857 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
180 | Mondrian Forest Regression | 0.017552 | 0.775068 | 0.726228 | 171.810005 | 13.107632 | 7.975933 | 4.432691 | 1.599018 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
181 | XGBoost Regression | 0.070682 | 0.733153 | 0.700589 | 172.077914 | 13.117847 | 7.780220 | 3.694690 | 26.602059 | 25 | 1.0 | 60.0 | 223 | [ISI, DMC, Y, temp, rain, month, RH, wind, DC,... |
182 | Linear Regression | 0.017936 | 0.774765 | 0.720468 | 172.142247 | 13.120299 | 7.930705 | 4.280088 | 0.024019 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
183 | Bayesian Ridge Regression | 0.000201 | 0.788756 | 0.732648 | 173.300810 | 13.164377 | 8.028749 | 4.472481 | 0.081795 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
184 | Decision Tree Regression | 0.167907 | 0.656451 | 0.656553 | 147.687076 | 12.152657 | 7.321948 | 3.780174 | 0.082038 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
185 | SVM Regression | 0.017699 | 0.774952 | 0.716670 | 174.084981 | 13.194127 | 7.901726 | 4.227504 | 0.809767 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
186 | Random Forest Regression | 0.128519 | 0.687525 | 0.679060 | 160.766011 | 12.679354 | 7.574277 | 3.920889 | 0.533916 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
187 | Mondrian Forest Regression | 0.020994 | 0.772352 | 0.722178 | 171.381813 | 13.091288 | 7.947184 | 4.138528 | 1.271442 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
188 | XGBoost Regression | 0.040299 | 0.757122 | 0.708168 | 178.058586 | 13.343859 | 7.846690 | 3.543665 | 38.707061 | 26 | 1.0 | 60.0 | 223 | [ISI, DMC, temp, rain, month, RH, wind, DC, FF... |
189 | Linear Regression | 0.007100 | 0.783314 | 0.730838 | 172.137925 | 13.120134 | 8.006119 | 4.462997 | 0.053520 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
190 | Bayesian Ridge Regression | 0.001037 | 0.788097 | 0.732282 | 173.254108 | 13.162603 | 8.025819 | 4.468429 | 0.074020 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
191 | Decision Tree Regression | 0.167907 | 0.656451 | 0.656553 | 147.687076 | 12.152657 | 7.321948 | 3.780174 | 0.070872 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
192 | SVM Regression | 0.012399 | 0.779133 | 0.718900 | 174.505437 | 13.210051 | 7.916737 | 4.210258 | 0.361420 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
193 | Random Forest Regression | 0.129218 | 0.686973 | 0.677464 | 160.407001 | 12.665189 | 7.561658 | 3.863223 | 0.506004 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
194 | Mondrian Forest Regression | 0.029235 | 0.765851 | 0.717979 | 171.171652 | 13.083258 | 7.903659 | 4.383942 | 1.332839 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
195 | XGBoost Regression | 0.069011 | 0.734471 | 0.699951 | 172.188285 | 13.122053 | 7.771634 | 3.791768 | 24.975517 | 27 | 1.0 | 60.0 | 223 | [ISI, DMC, FFMC, rain, RH, wind, DC, temp] |
196 | Linear Regression | 0.013003 | 0.778656 | 0.723346 | 172.518512 | 13.134630 | 7.947875 | 4.450129 | 0.055028 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
197 | Bayesian Ridge Regression | 0.005386 | 0.784666 | 0.730207 | 173.024626 | 13.153883 | 8.009011 | 4.355719 | 0.070859 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
198 | Decision Tree Regression | 0.182543 | 0.644904 | 0.652837 | 139.803260 | 11.823843 | 7.235515 | 4.074115 | 0.072359 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
199 | SVM Regression | 0.008398 | 0.782290 | 0.720317 | 177.380679 | 13.318434 | 7.921627 | 3.971668 | 0.286989 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
200 | Random Forest Regression | 0.109671 | 0.702394 | 0.684503 | 161.953485 | 12.726095 | 7.619626 | 3.852910 | 0.469818 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
201 | Mondrian Forest Regression | 0.020169 | 0.773004 | 0.722417 | 172.058095 | 13.117092 | 7.943566 | 4.267944 | 1.157222 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
202 | XGBoost Regression | 0.059137 | 0.742261 | 0.705472 | 172.874733 | 13.148184 | 7.813816 | 3.852409 | 41.664515 | 28 | 1.0 | 60.0 | 223 | [rain, RH, wind, temp] |
203 | Linear Regression | 0.006492 | 0.783794 | 0.729918 | 173.032446 | 13.154180 | 8.006054 | 4.454165 | 0.060180 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
204 | Bayesian Ridge Regression | 0.000116 | 0.788824 | 0.732684 | 173.306340 | 13.164587 | 8.029037 | 4.471603 | 0.074389 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
205 | Decision Tree Regression | 0.145181 | 0.674380 | 0.668911 | 148.701898 | 12.194339 | 7.405292 | 3.811976 | 0.070708 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
206 | SVM Regression | -0.000425 | 0.789250 | 0.725599 | 177.709172 | 13.330760 | 7.975535 | 4.116303 | 0.312591 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
207 | Random Forest Regression | 0.110977 | 0.701363 | 0.691704 | 161.729107 | 12.717276 | 7.687318 | 3.872826 | 0.471841 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
208 | Mondrian Forest Regression | 0.045665 | 0.752889 | 0.710152 | 169.560380 | 13.021535 | 7.836615 | 4.310977 | 0.980947 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
209 | XGBoost Regression | 0.027890 | 0.766912 | 0.712429 | 178.740916 | 13.369402 | 7.878459 | 3.581193 | 22.170258 | 29 | 1.0 | 60.0 | 223 | [DMC, FFMC, DC, ISI] |
These results are almost as useless as the ones above. However, it seems we can generate submodels that work reasonably well considering the dataset. Let’s visualize the mess:
Let’s try a cell-wise approach as well. The basic requirement for this approach is a minimum of 5 samples per grid cell, with no burned area greater than 500 ha. Here is what we can get out of it.
Let’s avoid commenting on these results ;)
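For reference, the cell selection used for the cell-wise approach could look roughly like the sketch below. This is not the notebook's original code: the function name `select_cells`, its parameters, and the toy data are made up for illustration. It also assumes one particular reading of the requirement, namely that fires above 500 ha are dropped first and the 5-sample minimum is then checked per (X, Y) grid cell.

```python
import pandas as pd


def select_cells(df, min_samples=5, max_area=500.0):
    """Drop fires larger than max_area (ha), then keep only grid cells
    (X, Y) that still have at least min_samples rows.

    Hypothetical helper; the thresholds mirror the text, the order of
    the two steps is an assumption.
    """
    small = df[df["area"] <= max_area]
    # Number of remaining samples in each row's grid cell.
    per_cell = small.groupby(["X", "Y"])["area"].transform("count")
    return small[per_cell >= min_samples]


# Toy data (made up): three grid cells with 6, 3 and 5 fires.
demo = pd.DataFrame({
    "X": [1] * 6 + [2] * 3 + [3] * 5,
    "Y": [1] * 6 + [2] * 3 + [3] * 5,
    "area": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0,
             1.0, 2.0, 3.0,
             1.0, 2.0, 3.0, 4.0, 746.0],
})
kept = select_cells(demo)
```

In the toy data only cell (1, 1) survives: cell (2, 2) has too few fires, and cell (3, 3) falls below five samples once the 746 ha fire is removed. A separate model could then be fitted per remaining cell, for example by iterating over `kept.groupby(["X", "Y"])`.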
Conclusion
We can conclude that this dataset is of absolutely no use for predicting forest fire sizes. The RMSE and MAD scores of the original publication indicate models that are as useless as the results here (results on log-transformed sizes don’t count in the real world!). It would be interesting to get a full 30-year coverage including climate data to see how it turns out. We could also use BEHAVE (Fire Behavior Prediction and Fuel Modeling System) of SAGA-GIS. Or we could treat it as a proper spatio-temporal problem and apply machine learning to it. I covered machine learning algorithms for geospatial applications last year.
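To illustrate why errors on log-transformed sizes say little about errors in hectares, here is a minimal sketch. The burned areas are invented, the `rmse` helper is hypothetical, and the `log1p`/`expm1` pair is assumed as the transform and its inverse. The "model" is simply a constant prediction (the mean of the log-transformed target), so this is a toy comparison and not a reproduction of any result above.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two array-likes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Invented burned areas in ha: mostly small fires plus one large one.
area_true = np.array([0.0, 0.5, 2.0, 10.0, 200.0])

# Assumed transform: ln(area + 1).
log_true = np.log1p(area_true)

# Constant "model": always predict the mean of the log-transformed target.
log_pred = np.full_like(log_true, log_true.mean())

# Error on the log scale looks small ...
rmse_log = rmse(log_true, log_pred)
# ... but after transforming back to hectares the large fire dominates.
rmse_ha = rmse(area_true, np.expm1(log_pred))

print(f"RMSE on log scale: {rmse_log:.3f}")
print(f"RMSE in hectares:  {rmse_ha:.3f}")
```

The same predictions give a far larger error in hectares than on the log scale, because the back-transform stretches the few large fires that the model misses.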
I wrote my master’s thesis on rockfall hazard ratings and risk assessment. I’m almost tempted to write a proper paper on wildfire risk assessment to see what is out there and how it could be improved. I looked around quite a lot, and most of what I found contained very low-quality statistical models (if any at all).