It is time to have a look at another dataset as part of my exploring less-known datasets series.
This time, we’ll look at the Forest Fires dataset by Cortez and Morais (2007).



A few words up front: the first parts cover a classical brute-force data science approach, where only the obtained results are interesting and valuable. The last part covers a manual geospatial approach that draws on a lot of domain knowledge.

Dataset Exploration and Preparation

import time
sys_start = time.time()
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
%matplotlib inline

# not gentlemen-like but it helps to keep the notebook somewhat clean ;)
import warnings
warnings.simplefilter('ignore')



from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from skgarden import MondrianForestRegressor
from sklearn.metrics import r2_score, mean_squared_error, make_scorer, mean_absolute_error, median_absolute_error
import xgboost

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf 
config = tf.ConfigProto()
#config.gpu_options.per_process_gpu_memory_fraction = 0.90
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))


from keras.models import Sequential
from keras.layers import (Input, Dense, BatchNormalization)
from keras import optimizers
from keras import callbacks


from keras import backend as K
K.tensorflow_backend._get_available_gpus()


input_data = pd.read_csv("./data/forestfires.csv")
display(input_data.sample(10))
input_data['month'] = pd.Categorical(input_data['month']).codes
input_data['day'] = pd.Categorical(input_data['day']).codes
display(input_data.describe())
     X  Y month  day  FFMC    DMC     DC   ISI  temp  RH  wind  rain   area
99   3  4   aug  sun  91.4  142.4  601.4  10.6  19.8  39   5.4   0.0   0.00
462  1  4   sep  sun  91.0  276.3  825.1   7.1  14.5  76   7.6   0.0   3.71
8    8  6   sep  tue  91.0  129.5  692.6   7.0  13.1  63   5.4   0.0   0.00
24   7  4   aug  sat  93.5  139.4  594.2  20.3  23.7  32   5.8   0.0   0.00
360  6  5   sep  fri  92.5  122.0  789.7  10.2  18.4  42   2.2   0.0   1.09
127  3  5   sep  fri  93.5  149.3  728.6   8.1  17.2  43   3.1   0.0   0.00
449  7  4   aug  sun  91.6  181.3  613.0   7.6  19.3  61   4.9   0.0   0.00
48   4  4   mar  mon  87.2   23.9   64.7   4.1  11.8  35   1.8   0.0   0.00
219  6  5   mar  mon  90.1   39.7   86.6   6.2  15.2  27   3.1   0.0  31.86
511  8  6   aug  sun  81.6   56.7  665.6   1.9  27.8  35   2.7   0.0   0.00

                X           Y       month         day        FFMC         DMC          DC         ISI        temp          RH        wind        rain        area
count  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000
mean     4.669246    4.299807    5.758221    2.736944   90.644681  110.872340  547.940039    9.021663   18.889168   44.288201    4.017602    0.021663   12.847292
std      2.313778    1.229900    4.373275    1.925061    5.520111   64.046482  248.066192    4.559477    5.806625   16.317469    1.791653    0.295959   63.655818
min      1.000000    2.000000    0.000000    0.000000   18.700000    1.100000    7.900000    0.000000    2.200000   15.000000    0.400000    0.000000    0.000000
25%      3.000000    4.000000    1.000000    1.000000   90.200000   68.600000  437.700000    6.500000   15.500000   33.000000    2.700000    0.000000    0.000000
50%      4.000000    4.000000    6.000000    3.000000   91.600000  108.300000  664.200000    8.400000   19.300000   42.000000    4.000000    0.000000    0.520000
75%      7.000000    5.000000   11.000000    4.000000   92.900000  142.400000  713.900000   10.800000   22.800000   53.000000    4.900000    0.000000    6.570000
max      9.000000    9.000000   11.000000    6.000000   96.200000  291.300000  860.600000   56.100000   33.300000  100.000000    9.400000    6.400000 1090.840000
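One detail of the encoding above is easy to miss: pd.Categorical(...).codes numbers string categories in sorted (alphabetical) order, so the month codes do not follow the calendar. The following sketch emulates that numbering without pandas and contrasts it with a calendar-order mapping. The calendar mapping is my own illustration and not part of the original pipeline, and the sketch assumes that all twelve months occur in the data (the maximum month code of 11 in the summary suggests they do).

```python
# Month abbreviations in calendar order, as used in forestfires.csv.
MONTHS_CALENDAR = ["jan", "feb", "mar", "apr", "may", "jun",
                   "jul", "aug", "sep", "oct", "nov", "dec"]

# Emulate pd.Categorical(...).codes: categories are numbered in sorted order,
# which for strings means alphabetical (apr=0, aug=1, ..., sep=11).
alphabetical_codes = {m: i for i, m in enumerate(sorted(MONTHS_CALENDAR))}

# Hypothetical alternative (not used in the post): calendar order, jan=1 ... dec=12.
calendar_codes = {m: i for i, m in enumerate(MONTHS_CALENDAR, start=1)}

for month in ("mar", "aug", "sep"):
    print(month, alphabetical_codes[month], calendar_codes[month])
```

With the alphabetical codes, a distance between two month values says nothing about how far apart the months are in time, which matters for any model that treats the column as numeric.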

Let’s have a look at box plots of each feature:

That looks tricky! Let’s start out by creating four different datasets, drawing on the earlier findings documented in the forestfires.names file.

# split data into X and y
y = input_data['area'].copy(deep=True)
X = input_data.copy(deep=True)
X.drop(['area'], inplace=True, axis=1)
scaler = MaxAbsScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    shuffle=True,
                                                    test_size=0.25,
                                                    random_state=42)
y_train_log = np.log(y_train+1)
y_test_log = np.log(y_test+1)
X2 = input_data.copy(deep=True)
X2.drop(['area','X','Y','month','day'], inplace=True, axis=1)
scaler2 = MaxAbsScaler()
X2 = scaler2.fit_transform(X2)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2,
                                                    y,
                                                    shuffle=True,
                                                    test_size=0.25,
                                                    random_state=42)
y_train_log2 = np.log(y_train2+1)
y_test_log2 = np.log(y_test2+1)

X3 = input_data.copy(deep=True)
X3.drop(['area','X','Y','month','day','FFMC','DMC','DC','ISI'], inplace=True, axis=1)
scaler3 = MaxAbsScaler()
X3 = scaler3.fit_transform(X3)
X_train3, X_test3, y_train3, y_test3 = train_test_split(X3,
                                                    y,
                                                    shuffle=True,
                                                    test_size=0.25,
                                                    random_state=42)
y_train_log3 = np.log(y_train3+1)
y_test_log3 = np.log(y_test3+1)

datasets = {}
datasets[0] = {'X_train': X_train,
               'X_test' : X_test,
               'y_train': y_train,
               'y_test' : y_test,
               'y_transform_inv' : False,
               'comment' : 'X scaled',
               'dataset' : 0}
datasets[1] = {'X_train': X_train,
               'X_test' : X_test,
               'y_train': y_train_log,
               'y_test' : y_test_log,
               'y_transform_inv' : True,
               'comment' : 'X scaled, y log transformed',
               'dataset' : 1}
datasets[2] = {'X_train': X_train2,
               'X_test' : X_test2,
               'y_train': y_train_log2,
               'y_test' : y_test_log2,
               'y_transform_inv' : True,
               'comment' : 'X scaled and reduced by datetime and loc, y log transformed',
               'dataset' : 2}
datasets[3] = {'X_train': X_train3,
               'X_test' : X_test3,
               'y_train': y_train_log3,
               'y_test' : y_test_log3,
               'y_transform_inv' : True,
               'comment' : 'X scaled and weather only, y log transformed',
               'dataset' : 3}                                                
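The y_transform_inv flag marks datasets whose target was log(y+1) transformed, so predictions have to be mapped back to hectares before computing the "true scale" metrics. A minimal sketch of that round trip using only the standard library follows. The helper names are hypothetical, and clipping negative values to 0 ha is my own assumption rather than something the original pipeline is known to do.

```python
import math

def to_log_scale(area_ha):
    """Forward transform used for the targets above: log(area + 1)."""
    return math.log1p(area_ha)

def to_true_scale(log_value):
    """Inverse transform exp(value) - 1; clipping at 0 ha is an assumption."""
    return max(math.expm1(log_value), 0.0)

# min, median, 75% quantile and max of 'area' from the describe() output above
areas = [0.0, 0.52, 6.57, 1090.84]
logged = [to_log_scale(a) for a in areas]
restored = [to_true_scale(v) for v in logged]

for a, l, r in zip(areas, logged, restored):
    print(f"{a:>8.2f} ha -> {l:.4f} -> {r:.2f} ha")
```

The transform compresses the range from more than 1000 ha down to single digits, which explains why the MSE and MAE columns of the log-transformed datasets cannot be compared directly with those of dataset 0.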


Results of the Brute-Force Approach

Regression type model Predictions R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale Training time dataset
0 Linear Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.1336006349223275, 18.024697070677405, 30.28... 0.005595 9421.177329 21.753111 9421.177329 97.062749 21.753111 8.604523 1.146974 0
1 Bayesian Ridge Regression GridSearchCV(cv=5, error_score='raise-deprecat... [7.4856478304490395, 15.885277931047936, 16.32... -0.000958 9483.270085 21.086756 9483.270085 97.382083 21.086756 9.987340 0.121314 0
2 Decision Tree Regression GridSearchCV(cv=5, error_score='raise-deprecat... [5.467536231884058, 3.5923684210526314, 12.035... 0.104634 8482.866665 20.081024 8482.866665 92.102479 20.081024 5.536407 0.119878 0
3 SVM Regression GridSearchCV(cv=5, error_score='raise-deprecat... [0.8784510179695628, 1.8335317307927035, 1.237... -0.024547 9706.756450 16.566693 9706.756450 98.522873 16.566693 1.453210 8.089113 0
4 Random Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [3.494984220308278, 16.81784269471455, 5.36831... -0.024691 9708.113030 24.982574 9708.113030 98.529757 24.982574 7.167931 0.535538 0
5 Mondrian Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [10.11812391281128, 13.818091583251952, 16.458... 0.001626 9458.788109 21.200896 9458.788109 97.256301 21.200896 10.118124 1.697901 0
6 XGBoost Regression GridSearchCV(cv=5, error_score='raise-deprecat... [5.440169, 5.440169, 16.284695, 5.440169, 13.8... -0.013411 9601.244885 18.742720 9601.244885 97.985942 18.742720 5.440169 40.454583 0
7 Linear Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.7784028274834758, 2.6704760888536603, 3.015... -0.019210 2.026366 1.159967 9674.405235 98.358554 16.754297 2.111086 0.069126 1
8 Bayesian Ridge Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.8727122438654171, 2.5253682396537913, 2.331... -0.008174 2.004424 1.153220 9680.445897 98.389257 16.727040 2.021494 0.126212 1
9 Decision Tree Regression GridSearchCV(cv=5, error_score='raise-deprecat... [3.5235342935362075, 1.503042439312948, 4.8024... -0.038682 2.065078 1.192341 9628.467445 98.124754 16.856583 3.523534 0.102233 1
10 SVM Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.1993260062463706, 1.549642211136983, 1.3196... -0.032809 2.053403 1.113957 9708.228401 98.530343 16.619267 1.340590 170.962216 1
11 Random Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.5753320287968324, 2.8780748919609684, 3.014... 0.013024 1.962278 1.133680 9678.172354 98.377703 16.666083 2.210309 0.559681 1
12 Mondrian Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.9223503878136885, 2.1185169269986996, 2.299... 0.002920 1.982367 1.147979 9687.698154 98.426105 16.708450 2.000706 1.713547 1
13 XGBoost Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.327445, 2.278426, 2.4698129, 1.6745203, 2.0... 0.014851 1.958646 1.128802 9685.235186 98.413592 16.654300 2.031970 38.020106 1
14 Linear Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.4548053379730796, 2.531372206626312, 2.6917... -0.002020 1.992188 1.140820 9685.191742 98.413372 16.705558 2.030815 0.063612 2
15 Bayesian Ridge Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.5774838029414924, 2.6458780883509987, 2.199... 0.000047 1.988080 1.142000 9688.058218 98.427934 16.694051 2.081045 0.091859 2
16 Decision Tree Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.180799874529276, 1.5030424393129471, 4.8024... -0.003088 1.994312 1.135828 9632.247893 98.144016 16.710017 1.645063 0.106734 2
17 SVM Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.221631784574699, 1.2814208845094655, 1.5542... -0.030154 2.048123 1.116163 9710.010402 98.539385 16.614545 1.387431 109.919360 2
18 Random Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.5959264758488882, 2.9487540752348043, 2.588... 0.010478 1.967341 1.134372 9673.312710 98.353001 16.683293 2.100372 0.624302 2
19 Mondrian Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.8782471761387511, 2.0755232547428095, 2.265... 0.003965 1.980289 1.145367 9689.354387 98.434518 16.699032 2.020898 1.575429 2
20 XGBoost Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.4204493, 1.9515479, 2.4139762, 1.4615562, 2... 0.025659 1.937157 1.114734 9682.458130 98.399482 16.609898 1.960499 32.396255 2
21 Linear Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.9919665877611115, 2.565979785661881, 1.9427... 0.013862 1.960613 1.144219 9686.180235 98.418394 16.696826 2.041645 0.063567 3
22 Bayesian Ridge Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.9986848203967886, 1.9994354455602261, 1.998... -0.001235 1.990627 1.151480 9689.514035 98.435329 16.717860 1.999147 0.086282 3
23 Decision Tree Regression GridSearchCV(cv=5, error_score='raise-deprecat... [0.0, 1.959316397576199, 1.959316397576199, 1.... 0.011838 1.964637 1.126846 9686.498863 98.420013 16.560336 1.959316 0.083100 3
24 SVM Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.1942361947655091, 1.2247511587539073, 1.407... -0.019878 2.027694 1.109437 9709.159186 98.535066 16.592333 1.364851 115.218320 3
25 Random Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.6919650113973606, 2.7418197996767755, 2.291... 0.025093 1.938283 1.125163 9681.165271 98.392913 16.615699 2.072220 0.455700 3
26 Mondrian Forest Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.924433519444516, 1.7146863106008867, 1.9136... 0.026622 1.935243 1.129337 9688.899688 98.432209 16.628424 1.988877 1.360185 3
27 XGBoost Regression GridSearchCV(cv=5, error_score='raise-deprecat... [1.417011, 1.9887323, 2.1122983, 1.9887323, 2.... 0.025338 1.937796 1.123950 9687.442751 98.424808 16.627276 1.868815 25.879341 3

That looks awful. Let’s visualize it:

That is a complete failure! Yes, a complete failure. However, we should have a look at the results from the official paper as well. The authors ran 30 runs of a 10-fold cross-validation, i.e. 300 simulations in total. That is close to leave-one-out cross-validation and therefore almost a pure description of the dataset. With this setup, they achieved MAD scores of around 13 - 18 and RMSE scores of 63 - 64. Both metrics express an error in ha. Yes, RMSE is not exact and is susceptible to outliers (high values), though the unit is correct. These results are as useless as the ones we obtained by throwing a brute-force approach at the problem. In fact, due to the single run of a 5-fold cross-validation, the brute-force results may even generalize better than what is presented in the original paper. We have to remember that the median fire size is only 0.52 ha and that at least a quarter of the fires are recorded as 0.00 ha, i.e. smaller than 1/100 ha. That renders both results absolutely pointless!
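To see why RMSE reacts so strongly to a few large fires while a mean absolute error does not, here is a small sketch using the burned areas of the ten sample rows printed earlier. The "always predict 0 ha" baseline is an assumption chosen purely for illustration, and I treat MAD as a plain mean absolute error here.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared deviation."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Burned areas (ha) of the ten sample rows shown at the top of this post.
areas = [0.00, 3.71, 0.00, 0.00, 1.09, 0.00, 0.00, 0.00, 31.86, 0.00]

# Naive baseline (assumption for illustration only): always predict 0 ha.
baseline = [0.0] * len(areas)

mae = mean_absolute_error(areas, baseline)
rmse = root_mean_squared_error(areas, baseline)
print(f"MAE  = {mae:.2f} ha")
print(f"RMSE = {rmse:.2f} ha")

# 30 runs of a 10-fold cross-validation give 30 * 10 model fits per configuration.
n_simulations = 30 * 10
```

A single fire of about 32 ha is enough to push the RMSE to several times the MAE, even though seven of the ten fires are recorded as 0.00 ha.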

Manual Approach

Apparently, this is a problem that still requires natural intelligence ;). Spoiler: TPOT doesn’t perform any better. Let’s see how we can solve this problem using a more hands-on, manual approach.

First, we have to ask ourselves why we want to predict forest fire sizes. The dataset deals with wildfires in the Montesinho Natural Park, which is located at the northern border of Portugal. Contrary to most public/published opinions, I argue that wildfires don’t necessarily destroy ecosystems. They are a part of them, especially in the Mediterranean climate. If we look at this map of the habitats of Quercus suber (cork oak), then we can assume that we find some in the Montesinho Natural Park. Why is this important? Well, this tree is a pyrophyte, meaning that the species is adapted to the wildfires that are part of Mediterranean ecosystems. There are probably more pyrophyte species in this park. Some of these species depend on their fire resistance to reproduce. It’s the survival of the fittest (best adapted).

Let’s find other reasons why wildfires are of any concern here. Well, there are 92 villages in this area (and an airport). Further, the authors claim that most fires are caused by people. To see the potential for disaster, we have to lay the map from the original publication on top of OpenStreetMap or any other map that shows more than squares. Unfortunately, OpenLayers can’t be used to display GeoTIFFs directly without creating map tiles from them. Therefore, we have to make do with a screenshot of the map on top of OSM:

For those who don’t work with geospatial data and maps: I georeferenced the map from the publication and projected it to UTM 29N (based on WGS 84) using QGIS. Therefore, the overlaid map looks deformed. However, this is much closer to reality (a 2D plane wrapped on a (deformed) 3D sphere) than what was/is presented in the paper. An illustrated map is provided by the park.

This shows us a few things. First, the grid cells do not have equal axis scaling, and probably 1/3 to 1/2 of the cells lie outside the natural park. Unfortunately, no real coordinates are provided with the dataset. This is particularly bad because we can’t use topographic features and land cover data to evaluate whether they correlate with the fires. Moreover, the original publication tells us that the data was collected from January 2000 to December 2003. Unfortunately, the data is not ordered sequentially. This is tricky since it contains climate proxies, and we usually use time series of at least 30 years for everything climate related. Otherwise we will detect extreme values that aren’t extreme at all. Considering that many climate factors have periodicities of decades, centuries or even thousands of years, even 30 years are tricky. Furthermore, it might be useful to know how infrastructure and technologies, especially heating, types of electricity supply and road usage, have evolved there over a period of 30 years. Since it is a natural park, we can assume that land cover probably hasn’t changed much. Another land-cover-related aspect would be the estimation of available fuel for wildfires.

We can conclude that we are dealing with a temporal-geospatial problem and not with a simple dataset. Moreover, we can conclude that the dataset is not suitable considering the periodicity of climate variability. That is not all. We will face another challenge: the dataset is about predicting wildfire sizes, not about if/when they occur. Therefore, we don’t have any reference for whether any of the climate indicators is really correlated with fire occurrence or not. Let’s see what we can make of it anyhow.

First, we should see how many wildfires occurred in each grid cell over the period January 2000 to December 2003. We’re going to start with a simple count of wildfires per grid cell.
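A count per grid cell boils down to tallying the (X, Y) pairs. The sketch below does this for the ten sample rows printed at the top of the post, so the numbers are only a toy illustration and not the real distribution. The 9 x 9 grid extent is an assumption derived from the maximum X and Y values in the summary statistics.

```python
from collections import Counter

# (X, Y) grid coordinates of the ten sample rows shown earlier in this post.
sample_cells = [(3, 4), (1, 4), (8, 6), (7, 4), (6, 5),
                (3, 5), (7, 4), (4, 4), (6, 5), (8, 6)]

# Tally the number of fires per grid cell.
fires_per_cell = Counter(sample_cells)

# Print the counts as a grid: one text row per Y, one column per X.
GRID_SIZE = 9  # assumed extent (max X and max Y are both 9)
for y in range(1, GRID_SIZE + 1):
    row = [fires_per_cell.get((x, y), 0) for x in range(1, GRID_SIZE + 1)]
    print(" ".join(str(count) for count in row))
```

On the full dataset the same tally could feed a heat map, and adding the month to the key would extend the count into the time domain.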

Certainly, there is some spatial importance here. Let us see how it looks if we extend it into the time domain as well.

This confirms the assumption that we are dealing with a spatial-temporal problem. Let’s see if we can observe something similar with fire sizes.

Well, it seems like there could be some spatial reason behind fire sizes, for example kind and amount of fuel available (what can burn and how much is there). Let’s extend it over the time domain as well.

Basically, we end up with three options, considering that we are missing so much (potentially) useful data:

  1. How relevant are wildfire sizes anyhow? Wouldn’t it be better to predict small ones with a higher precision and fight them so that they can’t grow? In that case, it should be enough to predict all fires up to a small size (e.g. 20 ha).
  2. How relevant are small fires? Do they cause any harm (see above), or do they simply occur naturally and go out after a few hours anyhow (e.g. due to a lack of fuel and dryness)? In terms of managing firefighting, wouldn’t it be better to focus on the big ones first? In that case, we could try to predict everything that is larger than 0 ha.
  3. Since there is some spatial variability and we lack any real data on what is happening from a causation point of view, we may simply build a separate model for every cell in which fires occurred.

A major challenge remains. We still don’t know anything about environmental conditions for all the cases without wildfires. Hence, no matter how good predictions are, they are practically pointless.

If we want to limit our model to certain fire sizes, then we have to decide what the maximum size of fires is supposed to be for training our model. A basic description of fire sizes yields:

count      517.000000
mean        12.847292
std         63.655818
min          0.000000
25%          0.000000
50%          0.520000
75%          6.570000
max       1090.840000

We learned from brute-forcing a model that a log(y+1) transformation showed some improvement. But what are useful limits in terms of fire sizes? In order to answer this question, we can look at quantile plots.

It could make sense to build models to predict the following ranges of fire sizes:

  • 0 - 5 ha
  • 0.1 - 5 ha
  • 0.1 - 60 ha
  • 0.1 - 1090 ha (this probably causes problems with cross-validation due to a lack of large fires)
  • 0.1 - 200 ha
  • 1 - 60 ha

N.B.: area = 0.00 ha means that a fire had an extent of less than 1/100 ha
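Restricting the data to one of these ranges is a simple filter on the area column. The sketch below applies the six ranges to the burned areas of the ten sample rows from the top of the post. The helper name and the choice of inclusive bounds on both sides are assumptions, since the post does not state how the limits were applied.

```python
def select_by_area(areas, area_min, area_max):
    """Return the indices of fires with area_min <= area <= area_max (ha).

    Inclusive bounds are an assumption, not taken from the original code.
    """
    return [i for i, a in enumerate(areas) if area_min <= a <= area_max]

# Burned areas (ha) of the ten sample rows shown at the top of this post.
areas = [0.00, 3.71, 0.00, 0.00, 1.09, 0.00, 0.00, 0.00, 31.86, 0.00]

# The six ranges from the list above. Note that the largest recorded fire is
# 1090.84 ha, so a real run needs an upper bound of at least that value to
# include it.
AREA_RANGES = [(0.0, 5.0), (0.1, 5.0), (0.1, 60.0),
               (0.1, 1090.0), (0.1, 200.0), (1.0, 60.0)]

subset_sizes = {r: len(select_by_area(areas, *r)) for r in AREA_RANGES}
for (lo, hi), size in subset_sizes.items():
    print(f"{lo:>4} - {hi:>6} ha: {size} fires")
```

Even in this tiny sample, raising the lower limit from 0 to 0.1 ha removes most of the rows, which mirrors the drop from 366 to 118 data points in the results further down.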

So, what features are available and which should we select? Let’s start with a simple correlation matrix and see what is offered.

Anyhow, we have to scale the data. By the way, scaling has only a small impact on the correlation matrix.
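For a purely linear rescaling, such as dividing each column by its maximum absolute value (which is what MaxAbsScaler does), the Pearson correlation should stay the same up to floating-point rounding. The sketch below checks this on the temp and RH values of the ten sample rows from the top of the post. It uses a hand-written Pearson coefficient, so it is an illustration and not the correlation routine used for the matrix.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def max_abs_scale(values):
    """Divide every value by the largest absolute value of the column."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]

# temp (deg C) and RH (%) of the ten sample rows shown earlier in this post.
temp = [19.8, 14.5, 13.1, 23.7, 18.4, 17.2, 19.3, 11.8, 15.2, 27.8]
rh = [39, 76, 63, 32, 42, 43, 61, 35, 27, 35]

r_raw = pearson(temp, rh)
r_scaled = pearson(max_abs_scale(temp), max_abs_scale(rh))
print(f"raw: {r_raw:.6f}  scaled: {r_scaled:.6f}")
```

If a correlation matrix changes noticeably after preprocessing, the cause is more likely a nonlinear step (such as a log transform) or a rank-based correlation measure than the scaling itself.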

X and Y

X and Y are the coordinates, in the local coordinate system, of the cell in which a fire occurred. It might be interesting to know how fires advance through different sectors of the map and how this would be reflected in this dataset.

month and day

This is clear. Unfortunately, no year is provided. A year would have enabled us to treat these fires as a time series. Furthermore, it would have enabled us to use proper weather records.

climate data

There is one problem with the following four features. We don’t know when these features were collected and what they represent. They were sampled on the day a fire occurred. However, it is unclear at what time. Consider an automatic weather station: it provides continuous measurements, so the question remains when the values were extracted. In contrast, manual weather stations rely on 3 - 4 daily measurements at defined hours, which are usually transformed into daily mean values.

We could consider these features as daily averages for simplicity. However, we should remember that the weather (state of the atmosphere at a given time) at the moment a fire is detected is not really useful, since the last xx days or weeks are more important. If it rained a lot in the days before, then it doesn’t matter how hot it is. Try to imagine fires in a natural rainforest: even setting one on fire deliberately (slash-and-burn agriculture) takes a major effort. Considering that we want to predict fire sizes and not just occurrence, we should still keep in mind that we end up predicting correlations, not causation.

temp

temp is the temperature in degrees Celsius.

RH

RH is the relative humidity. For those of you without a climate/engineering background: relative humidity is the ratio of the partial pressure of water vapor to the equilibrium (saturation) pressure of water vapor, which is temperature dependent.
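To make the temperature dependence concrete, here is a small sketch that computes relative humidity from a given vapor pressure. It assumes the Magnus approximation for the saturation pressure over water with one commonly quoted coefficient set (6.112 hPa, 17.62, 243.12 °C). Neither the formula nor the coefficients come from the dataset documentation, and the function names are hypothetical.

```python
import math

def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over water in hPa (Magnus approximation).

    The coefficients are an assumed, commonly quoted set and not taken from
    the dataset or the original paper.
    """
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))

def relative_humidity(vapor_pressure_hpa, temp_c):
    """Relative humidity in percent: actual over saturation vapor pressure."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)

# The same amount of water vapor (10 hPa) at two different temperatures.
rh_cool = relative_humidity(10.0, 15.0)
rh_warm = relative_humidity(10.0, 25.0)
print(f"15 deg C: {rh_cool:.1f} %   25 deg C: {rh_warm:.1f} %")
```

The same air mass therefore shows a much lower RH on a warm afternoon than in the cool morning, which is one more reason why the unknown measurement time of the weather features matters.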

wind

This is the wind speed in km/h. We should keep in mind that wind is a velocity vector and therefore has not only a magnitude (speed) but a direction as well.

rain

Rainfall is given in mm/m2. It is reasonable to assume that this is the total amount of rainfall on the given day.

Derived forest fire features

There are four more features. They are part of the Canadian Forest Fire Danger Rating System.

FFMC

FFMC is the Fine Fuel Moisture Code. It is an estimation of the moisture content of the surface litter to estimate ignition and spread. It requires the last 16 hours of all four climatic parameters.

DMC

DMC is the Duff Moisture Code. It requires a time series of 12 days of rain, relative humidity and temperature and estimates soil moisture conditions (shallow layers).

DC

DC is the Drought Code and requires 52 days of rain and temperature data. It estimates deeper soil moisture content.

ISI

ISI is the Initial Spread Index and estimates spread speed.

It really doesn’t matter how many ranges of fire sizes and feature sets we choose, since we can simply automate it with a few lines of Python code. We’ll use the following datasets for building models.
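As a rough idea of what such automation might look like, the sketch below enumerates every combination of area range and feature set. The range order follows the list above, which is an assumption beyond the first two ranges. The three complete column lists are copied from the results table below, while the two feature sets whose column lists are truncated there are deliberately left unspecified. The labels are mine.

```python
from itertools import product

# Area ranges (ha) in the order of the list above (order is an assumption).
AREA_RANGES = [(0.0, 5.0), (0.1, 5.0), (0.1, 60.0),
               (0.1, 1090.0), (0.1, 200.0), (1.0, 60.0)]

# (label, columns); labels are hypothetical. None marks a column list that is
# truncated in the results table and therefore not filled in here.
FEATURE_SETS = [
    ("feature set 0 (truncated in table)", None),
    ("feature set 1 (truncated in table)", None),
    ("weather and fire indices", ["ISI", "DMC", "FFMC", "rain", "RH", "wind", "DC", "temp"]),
    ("weather only", ["rain", "RH", "wind", "temp"]),
    ("fire indices only", ["DMC", "FFMC", "DC", "ISI"]),
]

# One configuration per (range, feature set) pair: 6 * 5 = 30 datasets.
dataset_configs = [
    {"dataset": idx, "area_min": lo, "area_max": hi, "label": label, "columns": cols}
    for idx, ((lo, hi), (label, cols)) in enumerate(product(AREA_RANGES, FEATURE_SETS))
]

print(len(dataset_configs))
print(dataset_configs[8])
```

With this ordering, the dataset index equals the range index times five plus the feature set index, which matches the dataset, AreaMin, AreaMax and Features columns of the rows shown in the results table.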

Unlike what is shown in common mistakes in data science and machine learning, I’m neglecting cross-validation for the final evaluation on purpose because the datasets would become too small - even for leave-one-out cross-validation. The 30 datasets are of different sizes, and each model is trained using a 5-fold cross-validation. The final model evaluation is done on the full dataset (one of the 30) and is therefore purely descriptive.
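To show how small the folds become, here is a minimal sketch of a 5-fold split for a dataset with 118 data points, the size of the 0.1 - 5 ha subsets. It is a hand-rolled illustration and not the splitter used inside GridSearchCV. Shuffling with a fixed seed is my own choice.

```python
import random

def k_fold_indices(n_samples, n_folds=5, seed=42):
    """Yield (train_indices, test_indices) for each of the n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    # Deal the shuffled indices round-robin into n_folds groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(118, n_folds=5))
for fold_number, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold_number}: {len(train_idx)} train / {len(test_idx)} validation")
```

Each validation fold holds only 23 or 24 fires, so a single large fire in one fold can dominate that fold’s error.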

Let’s run the same pipeline as above on 30 different datasets. Here are the results.

Regression type R2 MSE MAE MSE_true_scale RMSE_true_scale MAE_true_scale MedAE_true_scale Training time dataset AreaMin AreaMax Number of datapoints Features
0 Linear Regression 0.036993 0.289357 0.450349 1.533182 1.238217 0.818386 0.452919 0.761870 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
1 Bayesian Ridge Regression 0.004069 0.299250 0.463300 1.580447 1.257158 0.833270 0.409910 0.069403 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
2 Decision Tree Regression 0.111554 0.266954 0.420469 1.405103 1.185371 0.769177 0.560928 0.074945 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
3 SVM Regression -0.190832 0.357812 0.375489 1.842199 1.357276 0.723560 0.105601 44.626076 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
4 Random Forest Regression 0.106003 0.268621 0.436830 1.451650 1.204845 0.795650 0.418875 0.517976 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
5 Mondrian Forest Regression 0.037346 0.289251 0.454477 1.545040 1.242996 0.821024 0.415502 1.720303 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
6 XGBoost Regression -0.047431 0.314724 0.518575 1.485507 1.218814 0.916338 0.634528 35.860502 0 0.0 5.0 366 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
7 Linear Regression 0.023898 0.293292 0.454742 1.552903 1.246155 0.823643 0.447149 0.070181 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
8 Bayesian Ridge Regression 0.016911 0.295391 0.455017 1.570750 1.253296 0.822787 0.431328 0.077154 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
9 Decision Tree Regression 0.108201 0.267961 0.422561 1.402095 1.184101 0.764242 0.483437 0.082053 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
10 SVM Regression -0.193070 0.358485 0.375906 1.843778 1.357858 0.724022 0.105182 35.784489 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
11 Random Forest Regression 0.104629 0.269034 0.435082 1.455961 1.206632 0.794622 0.423130 0.509909 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
12 Mondrian Forest Regression 0.022123 0.293825 0.457700 1.564101 1.250640 0.825705 0.425543 1.517942 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
13 XGBoost Regression -0.033673 0.310590 0.513916 1.478767 1.216046 0.908795 0.614595 31.098840 1 0.0 5.0 366 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
14 Linear Regression 0.023801 0.293321 0.453771 1.553045 1.246212 0.822133 0.450306 0.021066 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
15 Bayesian Ridge Regression 0.016922 0.295388 0.456077 1.568567 1.252424 0.824238 0.437291 0.074578 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
16 Decision Tree Regression 0.108201 0.267961 0.422561 1.402095 1.184101 0.764242 0.483437 0.072301 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
17 SVM Regression -0.193078 0.358487 0.375907 1.843783 1.357860 0.724023 0.105180 21.134526 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
18 Random Forest Regression 0.104667 0.269023 0.435336 1.455258 1.206341 0.795113 0.437903 0.531732 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
19 Mondrian Forest Regression 0.023784 0.293326 0.455813 1.562702 1.250081 0.823492 0.431456 1.570227 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
20 XGBoost Regression -0.043266 0.313473 0.516000 1.489683 1.220526 0.912180 0.623727 29.676452 2 0.0 5.0 366 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
21 Linear Regression 0.012660 0.296668 0.459248 1.569067 1.252624 0.828054 0.427256 0.055820 3 0.0 5.0 366 [rain, RH, wind, temp]
22 Bayesian Ridge Regression 0.011890 0.296900 0.456970 1.574863 1.254936 0.824941 0.417176 0.077074 3 0.0 5.0 366 [rain, RH, wind, temp]
23 Decision Tree Regression 0.065432 0.280812 0.436527 1.475400 1.214660 0.784589 0.402813 0.085352 3 0.0 5.0 366 [rain, RH, wind, temp]
24 SVM Regression -0.192637 0.358354 0.375813 1.843471 1.357745 0.723919 0.105273 1.244197 3 0.0 5.0 366 [rain, RH, wind, temp]
25 Random Forest Regression 0.073131 0.278499 0.445301 1.491058 1.221089 0.806817 0.424915 0.467741 3 0.0 5.0 366 [rain, RH, wind, temp]
26 Mondrian Forest Regression 0.056244 0.283573 0.448705 1.516553 1.231484 0.810572 0.404195 1.314378 3 0.0 5.0 366 [rain, RH, wind, temp]
27 XGBoost Regression -0.032702 0.310298 0.509423 1.498286 1.224045 0.901386 0.578400 24.760447 3 0.0 5.0 366 [rain, RH, wind, temp]
28 Linear Regression 0.012981 0.296572 0.458688 1.567481 1.251991 0.828100 0.456725 0.056864 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
29 Bayesian Ridge Regression 0.012313 0.296773 0.456124 1.573468 1.254379 0.824565 0.441740 0.069708 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
30 Decision Tree Regression 0.108875 0.267758 0.425162 1.376610 1.173290 0.768924 0.481253 0.070310 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
31 SVM Regression -0.193078 0.358487 0.375906 1.843783 1.357860 0.724022 0.105172 1.407259 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
32 Random Forest Regression 0.081713 0.275920 0.439929 1.479050 1.216162 0.801920 0.452215 0.483249 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
33 Mondrian Forest Regression 0.018068 0.295043 0.457961 1.565101 1.251040 0.826676 0.457056 1.084028 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
34 XGBoost Regression -0.063390 0.319519 0.522166 1.503408 1.226135 0.922153 0.634684 25.179769 4 0.0 5.0 366 [DMC, FFMC, DC, ISI]
35 Linear Regression 0.139219 0.151556 0.320090 1.387809 1.178053 0.945395 0.866292 0.056266 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
36 Bayesian Ridge Regression 0.003063 0.175529 0.346762 1.653474 1.285875 1.030543 0.903256 0.095309 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
37 Decision Tree Regression 0.272846 0.128029 0.290674 1.231867 1.109895 0.870912 0.720000 0.071607 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
38 SVM Regression 0.097159 0.158961 0.322195 1.452612 1.205244 0.955651 0.888855 0.408338 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
39 Random Forest Regression 0.236863 0.134364 0.302131 1.286515 1.134247 0.901620 0.877673 0.440848 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
40 Mondrian Forest Regression 0.864586 0.023842 0.114294 0.256671 0.506627 0.349479 0.219568 0.934590 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
41 XGBoost Regression 0.166479 0.146756 0.318951 1.447718 1.203212 0.952517 0.902110 23.067257 5 0.1 5.0 118 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
42 Linear Regression 0.099758 0.158504 0.328253 1.474158 1.214149 0.972062 0.869027 0.055260 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
43 Bayesian Ridge Regression 0.002672 0.175597 0.346800 1.654228 1.286168 1.030652 0.906024 0.102510 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
44 Decision Tree Regression 0.272846 0.128029 0.290674 1.231867 1.109895 0.870912 0.720000 0.067233 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
45 SVM Regression 0.070061 0.163732 0.324603 1.512059 1.229658 0.963866 0.846885 0.465582 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
46 Random Forest Regression 0.220092 0.137317 0.304992 1.309569 1.144364 0.908550 0.882586 0.436600 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
47 Mondrian Forest Regression 0.132683 0.152707 0.323161 1.444799 1.201998 0.960841 0.862463 1.195543 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
48 XGBoost Regression 0.120455 0.154860 0.327494 1.545141 1.243037 0.976201 0.873858 22.625245 6 0.1 5.0 118 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
49 Linear Regression 0.097521 0.158897 0.328588 1.476999 1.215319 0.972613 0.850691 0.053982 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
50 Bayesian Ridge Regression 0.039062 0.169190 0.337692 1.592360 1.261888 1.003815 0.889502 0.087339 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
51 Decision Tree Regression 0.272846 0.128029 0.290674 1.231867 1.109895 0.870912 0.720000 0.069422 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
52 SVM Regression 0.054713 0.166435 0.325338 1.542318 1.241901 0.965052 0.799573 0.303247 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
53 Random Forest Regression 0.203007 0.140325 0.307177 1.329921 1.153222 0.915026 0.867277 0.468100 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
54 Mondrian Forest Regression 0.079131 0.162135 0.329406 1.520738 1.233182 0.977887 0.886207 0.988298 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
55 XGBoost Regression 0.112866 0.156196 0.328001 1.555300 1.247117 0.978134 0.890496 22.145435 7 0.1 5.0 118 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
56 Linear Regression 0.039947 0.169034 0.339276 1.581008 1.257381 1.007331 0.935088 0.018930 8 0.1 5.0 118 [rain, RH, wind, temp]
57 Bayesian Ridge Regression 0.029330 0.170904 0.339598 1.607543 1.267889 1.009331 0.908884 0.066081 8 0.1 5.0 118 [rain, RH, wind, temp]
58 Decision Tree Regression 0.168795 0.146349 0.314138 1.347306 1.160735 0.926528 0.771988 0.062309 8 0.1 5.0 118 [rain, RH, wind, temp]
59 SVM Regression 0.033985 0.170084 0.336051 1.607412 1.267837 0.998987 0.911783 0.243271 8 0.1 5.0 118 [rain, RH, wind, temp]
60 Random Forest Regression 0.142468 0.150984 0.320372 1.420241 1.191739 0.951535 0.906098 0.411116 8 0.1 5.0 118 [rain, RH, wind, temp]
61 Mondrian Forest Regression 0.081758 0.161673 0.328507 1.521735 1.233586 0.975767 0.859526 0.913395 8 0.1 5.0 118 [rain, RH, wind, temp]
62 XGBoost Regression 0.083912 0.161294 0.328899 1.562243 1.249897 0.978451 0.881312 21.515475 8 0.1 5.0 118 [rain, RH, wind, temp]
63 Linear Regression 0.059140 0.165655 0.335228 1.543789 1.242493 0.995253 0.929790 0.056756 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
64 Bayesian Ridge Regression 0.000105 0.176049 0.347195 1.657844 1.287573 1.031789 0.905061 0.081088 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
65 Decision Tree Regression 0.216521 0.137945 0.304689 1.333948 1.154967 0.917257 0.750000 0.063691 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
66 SVM Regression 0.023198 0.171983 0.333757 1.552367 1.245940 0.989684 0.822640 0.233162 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
67 Random Forest Regression 0.173489 0.145522 0.318531 1.383005 1.176012 0.948364 0.870869 0.412509 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
68 Mondrian Forest Regression 0.131467 0.152921 0.324794 1.446059 1.202522 0.965480 0.881857 0.834365 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
69 XGBoost Regression 0.084505 0.161189 0.335594 1.609262 1.268567 0.999189 0.882940 21.365956 9 0.1 5.0 118 [DMC, FFMC, DC, ISI]
70 Linear Regression 0.038983 0.918217 0.786867 165.906788 12.880481 7.571652 3.935535 0.020302 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
71 Bayesian Ridge Regression 0.013378 0.942682 0.806777 167.810455 12.954167 7.719108 4.054009 0.076280 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
72 Decision Tree Regression 0.135117 0.826365 0.736421 150.486406 12.267290 7.078912 3.777397 0.083555 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
73 SVM Regression 0.034857 0.922160 0.790833 169.035179 13.001353 7.613949 3.711818 0.713855 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
74 Random Forest Regression 0.137207 0.824368 0.748327 155.887276 12.485483 7.293031 3.767959 0.551865 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
75 Mondrian Forest Regression 0.155953 0.806457 0.741187 156.292670 12.501707 7.256523 3.569706 1.294329 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
76 XGBoost Regression 0.057590 0.900439 0.780924 171.317775 13.088842 7.567099 3.508582 28.549180 10 0.1 60.0 249 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
77 Linear Regression 0.037404 0.919726 0.787378 166.298929 12.895694 7.572539 3.853146 0.065805 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
78 Bayesian Ridge Regression 0.016999 0.939223 0.805558 167.548174 12.944040 7.710936 4.136503 0.077227 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
79 Decision Tree Regression 0.135117 0.826365 0.736421 150.486406 12.267290 7.078912 3.777397 0.077101 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
80 SVM Regression 0.035354 0.921685 0.789499 169.280842 13.010797 7.604308 3.710722 0.746246 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
81 Random Forest Regression 0.137962 0.823647 0.748040 155.530397 12.471183 7.288500 3.718753 0.481519 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
82 Mondrian Forest Regression 0.035977 0.921090 0.796877 166.077284 12.887098 7.656943 3.972689 1.279077 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
83 XGBoost Regression 0.057632 0.900399 0.780494 171.309600 13.088529 7.561741 3.483446 58.766617 11 0.1 60.0 249 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
84 Linear Regression 0.021842 0.934595 0.800804 167.096813 12.926593 7.653198 3.910206 0.062698 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
85 Bayesian Ridge Regression 0.009262 0.946616 0.808435 168.152782 12.967374 7.729273 4.065888 0.093470 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
86 Decision Tree Regression 0.135117 0.826365 0.736421 150.486406 12.267290 7.078912 3.777397 0.080095 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
87 SVM Regression 0.025652 0.930955 0.796223 169.696498 13.026761 7.640109 3.651617 0.402736 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
88 Random Forest Regression 0.132676 0.828698 0.749689 155.120050 12.454720 7.289173 3.691036 0.520683 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
89 Mondrian Forest Regression 0.037320 0.919807 0.791296 166.048865 12.885995 7.601418 3.853596 1.271431 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
90 XGBoost Regression 0.055400 0.902532 0.781095 171.089789 13.080130 7.559520 3.430965 25.947325 12 0.1 60.0 249 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
91 Linear Regression 0.024016 0.932518 0.797407 167.165357 12.929244 7.630484 4.005220 0.023515 13 0.1 60.0 249 [rain, RH, wind, temp]
92 Bayesian Ridge Regression 0.016118 0.940065 0.806021 167.771282 12.952655 7.710099 3.964757 0.072249 13 0.1 60.0 249 [rain, RH, wind, temp]
93 Decision Tree Regression 0.158958 0.803586 0.737991 143.906316 11.996096 7.133695 3.614348 0.068854 13 0.1 60.0 249 [rain, RH, wind, temp]
94 SVM Regression 0.026131 0.930498 0.794450 169.076560 13.002944 7.615531 3.810518 0.462222 13 0.1 60.0 249 [rain, RH, wind, temp]
95 Random Forest Regression 0.117138 0.843544 0.759292 155.477990 12.469081 7.353594 3.734825 0.455932 13 0.1 60.0 249 [rain, RH, wind, temp]
96 Mondrian Forest Regression 0.102416 0.857610 0.755685 160.140005 12.654644 7.313286 3.581796 1.101429 13 0.1 60.0 249 [rain, RH, wind, temp]
97 XGBoost Regression 0.074904 0.883896 0.776558 166.831405 12.916323 7.516460 3.573941 22.841846 13 0.1 60.0 249 [rain, RH, wind, temp]
98 Linear Regression 0.001652 0.953886 0.811116 168.211376 12.969633 7.746065 4.040523 0.023907 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
99 Bayesian Ridge Regression 0.000270 0.955207 0.811027 168.652263 12.986619 7.747678 4.110734 0.075341 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
100 Decision Tree Regression 0.131897 0.829441 0.742974 144.030310 12.001263 7.183596 3.449529 0.065950 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
101 SVM Regression 0.014529 0.941583 0.802994 170.240071 13.047608 7.695820 3.739669 0.350014 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
102 Random Forest Regression 0.103614 0.856465 0.762878 157.832182 12.563128 7.394534 3.826037 0.487014 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
103 Mondrian Forest Regression 0.071780 0.886882 0.774400 162.871083 12.762096 7.478546 3.665028 0.964429 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
104 XGBoost Regression 0.035162 0.921868 0.789230 172.662148 13.140097 7.615553 3.319291 22.928115 14 0.1 60.0 249 [DMC, FFMC, DC, ISI]
105 Linear Regression 0.044338 1.242567 0.900983 824.888416 28.720871 12.978586 4.832605 0.021889 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
106 Bayesian Ridge Regression 0.000163 1.300004 0.921119 841.073925 29.001275 13.106282 4.922538 0.082161 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
107 Decision Tree Regression 0.137184 1.121847 0.850194 762.362259 27.610908 12.482921 4.197569 0.084071 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
108 SVM Regression 0.021480 1.272287 0.898976 851.993925 29.188935 12.946297 4.318662 0.872925 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
109 Random Forest Regression 0.124465 1.138384 0.865012 805.258252 28.377073 12.646089 4.483286 0.559857 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
110 Mondrian Forest Regression 0.011538 1.285213 0.915453 838.750369 28.961187 13.059733 4.986360 1.280887 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
111 XGBoost Regression 0.019974 1.274245 0.896553 859.701546 29.320668 12.943308 3.736910 64.223513 15 0.1 200.0 264 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
112 Linear Regression 0.043522 1.243628 0.902616 825.444084 28.730543 12.989507 4.813046 0.056413 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
113 Bayesian Ridge Regression 0.000223 1.299925 0.921086 841.073252 29.001263 13.106018 4.922277 0.102148 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
114 Decision Tree Regression 0.161389 1.090375 0.840542 751.009259 27.404548 12.412694 3.992360 0.080813 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
115 SVM Regression 0.023096 1.270186 0.895812 851.819396 29.185945 12.923075 4.365209 0.568480 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
116 Random Forest Regression 0.116603 1.148606 0.867246 812.139594 28.498063 12.667611 4.411177 0.531950 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
117 Mondrian Forest Regression 0.021111 1.272767 0.911819 839.012701 28.965716 13.040354 4.920706 1.334749 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
118 XGBoost Regression 0.046829 1.239328 0.890655 847.157950 29.105978 12.892733 3.984052 60.216328 16 0.1 200.0 264 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
119 Linear Regression 0.031286 1.259537 0.907801 828.681596 28.786830 13.007898 4.901932 0.055729 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
120 Bayesian Ridge Regression 0.000067 1.300128 0.921166 841.080879 29.001394 13.106649 4.924945 0.103690 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
121 Decision Tree Regression 0.151676 1.103004 0.842207 777.514167 27.883941 12.473065 3.977827 0.079662 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
122 SVM Regression 0.013107 1.283173 0.902748 850.769851 29.167959 12.963377 4.268491 0.447160 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
123 Random Forest Regression 0.111093 1.155771 0.869270 814.045763 28.531487 12.686967 4.505582 0.508102 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
124 Mondrian Forest Regression 0.024304 1.268616 0.903907 839.085606 28.966974 12.963267 4.821784 1.225762 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
125 XGBoost Regression 0.046829 1.239328 0.890655 847.157950 29.105978 12.892733 3.984052 56.803942 17 0.1 200.0 264 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
126 Linear Regression 0.011852 1.284805 0.910193 839.139052 28.967897 13.010916 4.762860 0.055294 18 0.1 200.0 264 [rain, RH, wind, temp]
127 Bayesian Ridge Regression 0.000161 1.300007 0.921091 841.079113 29.001364 13.106063 4.926444 0.087328 18 0.1 200.0 264 [rain, RH, wind, temp]
128 Decision Tree Regression 0.119361 1.145021 0.849090 774.269446 27.825698 12.328808 4.581949 0.065508 18 0.1 200.0 264 [rain, RH, wind, temp]
129 SVM Regression 0.001350 1.298461 0.905539 857.412409 29.281605 12.978542 4.192401 0.343880 18 0.1 200.0 264 [rain, RH, wind, temp]
130 Random Forest Regression 0.070474 1.208585 0.881578 821.662022 28.664648 12.778480 4.825694 0.459338 18 0.1 200.0 264 [rain, RH, wind, temp]
131 Mondrian Forest Regression 0.012799 1.283574 0.911338 840.099293 28.984466 13.026499 4.994137 1.096994 18 0.1 200.0 264 [rain, RH, wind, temp]
132 XGBoost Regression 0.027357 1.264646 0.891559 853.089085 29.207689 12.883130 4.099006 23.101554 18 0.1 200.0 264 [rain, RH, wind, temp]
133 Linear Regression 0.012399 1.284094 0.921430 833.310761 28.867122 13.112492 4.968500 0.060112 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
134 Bayesian Ridge Regression 0.000022 1.300187 0.921200 841.082446 29.001421 13.106914 4.923827 0.077662 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
135 Decision Tree Regression 0.152876 1.101444 0.847533 786.723071 28.048584 12.669606 4.723775 0.064137 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
136 SVM Regression 0.002315 1.297206 0.910843 853.829141 29.220355 13.031309 4.335234 0.346508 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
137 Random Forest Regression 0.105530 1.163004 0.873718 813.987023 28.530458 12.729784 4.415391 0.482594 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
138 Mondrian Forest Regression 0.015954 1.279473 0.912224 839.918826 28.981353 13.035619 4.822362 1.080387 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
139 XGBoost Regression 0.065722 1.214763 0.896477 824.403761 28.712432 12.930306 4.432251 22.962852 19 0.1 200.0 264 [DMC, FFMC, DC, ISI]
140 Linear Regression 0.051044 1.485552 0.963989 7712.190686 87.819079 22.089165 5.476491 0.026376 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
141 Bayesian Ridge Regression 0.000342 1.564925 0.984081 7777.125498 88.188012 22.197196 5.519435 0.097937 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
142 Decision Tree Regression 0.165307 1.306679 0.898707 7182.513286 84.749710 21.393283 4.806355 0.076900 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
143 SVM Regression 0.012985 1.545132 0.953488 7811.222456 88.381120 21.968736 4.538897 0.833406 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
144 Random Forest Regression 0.124887 1.369954 0.935135 7521.501793 86.726592 21.721360 4.892678 0.563898 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
145 Mondrian Forest Regression 0.013411 1.544466 0.978761 7767.524357 88.133560 22.151461 5.570809 1.367518 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
146 XGBoost Regression 0.039838 1.503096 0.949860 7765.098152 88.119794 21.929010 4.212619 65.346306 20 0.1 1091.0 269 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
147 Linear Regression 0.048155 1.490076 0.966624 7717.330135 87.848336 22.111801 5.484811 0.055587 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
148 Bayesian Ridge Regression 0.000990 1.563911 0.983860 7776.816013 88.186258 22.195344 5.510330 0.094431 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
149 Decision Tree Regression 0.165307 1.306679 0.898707 7182.513286 84.749710 21.393283 4.806355 0.079125 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
150 SVM Regression 0.012654 1.545651 0.953145 7812.394846 88.387753 21.966828 4.489747 0.714508 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
151 Random Forest Regression 0.120388 1.376997 0.935461 7525.504295 86.749665 21.734064 4.949520 0.525605 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
152 Mondrian Forest Regression 0.019458 1.535000 0.974267 7773.399838 88.166886 22.125003 5.393115 1.313755 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
153 XGBoost Regression 0.039019 1.504377 0.952030 7766.352683 88.126912 21.945815 4.311559 28.235538 21 0.1 1091.0 269 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
154 Linear Regression 0.031490 1.516164 0.974128 7736.068523 87.954923 22.140409 5.414844 0.056916 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
155 Bayesian Ridge Regression 0.000039 1.565400 0.984192 7777.282976 88.188905 22.198135 5.515813 0.083799 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
156 Decision Tree Regression 0.165307 1.306679 0.898707 7182.513286 84.749710 21.393283 4.806355 0.070675 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
157 SVM Regression 0.000691 1.564378 0.961385 7813.896869 88.396249 22.016218 4.522732 0.533139 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
158 Random Forest Regression 0.122692 1.373391 0.934229 7515.934280 86.694488 21.700638 4.877561 0.489604 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
159 Mondrian Forest Regression 0.016126 1.540216 0.969646 7779.171871 88.199614 22.070958 5.425796 1.387820 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
160 XGBoost Regression 0.039019 1.504377 0.952030 7766.352683 88.126912 21.945815 4.311559 40.966699 22 0.1 1091.0 269 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
161 Linear Regression 0.008769 1.551733 0.975074 7765.936066 88.124549 22.117167 5.374795 0.057177 23 0.1 1091.0 269 [rain, RH, wind, temp]
162 Bayesian Ridge Regression 0.000038 1.565400 0.984184 7777.286964 88.188928 22.198061 5.513711 0.080387 23 0.1 1091.0 269 [rain, RH, wind, temp]
163 Decision Tree Regression 0.151629 1.328092 0.906292 6929.703510 83.244841 21.227738 4.973370 0.067078 23 0.1 1091.0 269 [rain, RH, wind, temp]
164 SVM Regression -0.000646 1.566471 0.962752 7810.100418 88.374773 22.018133 4.613485 0.352632 23 0.1 1091.0 269 [rain, RH, wind, temp]
165 Random Forest Regression 0.074119 1.449430 0.947284 7575.109207 87.035103 21.811447 5.162353 0.451577 23 0.1 1091.0 269 [rain, RH, wind, temp]
166 Mondrian Forest Regression 0.010818 1.548525 0.975618 7771.692877 88.157205 22.123888 5.499732 1.124015 23 0.1 1091.0 269 [rain, RH, wind, temp]
167 XGBoost Regression 0.031162 1.516678 0.948868 7755.787421 88.066949 21.913847 4.309474 23.000765 23 0.1 1091.0 269 [rain, RH, wind, temp]
168 Linear Regression 0.015539 1.541135 0.985738 7757.339935 88.075762 22.219276 5.540720 0.022336 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
169 Bayesian Ridge Regression 0.000021 1.565427 0.984209 7777.293476 88.188965 22.198278 5.516868 0.063743 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
170 Decision Tree Regression 0.145796 1.337222 0.913246 7137.493008 84.483685 21.157816 4.637093 0.069215 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
171 SVM Regression -0.009976 1.581077 0.969144 7822.871862 88.447000 22.081596 4.474194 0.341351 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
172 Random Forest Regression 0.122416 1.373824 0.936931 7541.570391 86.842215 21.762646 4.853987 0.464362 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
173 Mondrian Forest Regression 0.012300 1.546205 0.976399 7775.138183 88.176744 22.132888 5.409031 1.196045 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
174 XGBoost Regression 0.031187 1.516638 0.957523 7792.579739 88.275590 22.001707 4.451702 23.249308 24 0.1 1091.0 269 [DMC, FFMC, DC, ISI]
175 Linear Regression 0.021804 0.771713 0.720307 171.345080 13.089885 7.929731 4.267013 0.062712 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
176 Bayesian Ridge Regression 0.000169 0.788782 0.732664 173.302827 13.164453 8.028876 4.471667 0.090315 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
177 Decision Tree Regression 0.167907 0.656451 0.656553 147.687076 12.152657 7.321948 3.780174 0.076738 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
178 SVM Regression 0.019605 0.773448 0.715703 173.717876 13.180208 7.894367 4.210846 1.112014 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
179 Random Forest Regression 0.131270 0.685354 0.677682 160.496742 12.668731 7.562352 3.976736 0.495857 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
180 Mondrian Forest Regression 0.017552 0.775068 0.726228 171.810005 13.107632 7.975933 4.432691 1.599018 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
181 XGBoost Regression 0.070682 0.733153 0.700589 172.077914 13.117847 7.780220 3.694690 26.602059 25 1.0 60.0 223 [ISI, DMC, Y, temp, rain, month, RH, wind, DC,...
182 Linear Regression 0.017936 0.774765 0.720468 172.142247 13.120299 7.930705 4.280088 0.024019 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
183 Bayesian Ridge Regression 0.000201 0.788756 0.732648 173.300810 13.164377 8.028749 4.472481 0.081795 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
184 Decision Tree Regression 0.167907 0.656451 0.656553 147.687076 12.152657 7.321948 3.780174 0.082038 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
185 SVM Regression 0.017699 0.774952 0.716670 174.084981 13.194127 7.901726 4.227504 0.809767 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
186 Random Forest Regression 0.128519 0.687525 0.679060 160.766011 12.679354 7.574277 3.920889 0.533916 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
187 Mondrian Forest Regression 0.020994 0.772352 0.722178 171.381813 13.091288 7.947184 4.138528 1.271442 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
188 XGBoost Regression 0.040299 0.757122 0.708168 178.058586 13.343859 7.846690 3.543665 38.707061 26 1.0 60.0 223 [ISI, DMC, temp, rain, month, RH, wind, DC, FF...
189 Linear Regression 0.007100 0.783314 0.730838 172.137925 13.120134 8.006119 4.462997 0.053520 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
190 Bayesian Ridge Regression 0.001037 0.788097 0.732282 173.254108 13.162603 8.025819 4.468429 0.074020 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
191 Decision Tree Regression 0.167907 0.656451 0.656553 147.687076 12.152657 7.321948 3.780174 0.070872 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
192 SVM Regression 0.012399 0.779133 0.718900 174.505437 13.210051 7.916737 4.210258 0.361420 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
193 Random Forest Regression 0.129218 0.686973 0.677464 160.407001 12.665189 7.561658 3.863223 0.506004 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
194 Mondrian Forest Regression 0.029235 0.765851 0.717979 171.171652 13.083258 7.903659 4.383942 1.332839 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
195 XGBoost Regression 0.069011 0.734471 0.699951 172.188285 13.122053 7.771634 3.791768 24.975517 27 1.0 60.0 223 [ISI, DMC, FFMC, rain, RH, wind, DC, temp]
196 Linear Regression 0.013003 0.778656 0.723346 172.518512 13.134630 7.947875 4.450129 0.055028 28 1.0 60.0 223 [rain, RH, wind, temp]
197 Bayesian Ridge Regression 0.005386 0.784666 0.730207 173.024626 13.153883 8.009011 4.355719 0.070859 28 1.0 60.0 223 [rain, RH, wind, temp]
198 Decision Tree Regression 0.182543 0.644904 0.652837 139.803260 11.823843 7.235515 4.074115 0.072359 28 1.0 60.0 223 [rain, RH, wind, temp]
199 SVM Regression 0.008398 0.782290 0.720317 177.380679 13.318434 7.921627 3.971668 0.286989 28 1.0 60.0 223 [rain, RH, wind, temp]
200 Random Forest Regression 0.109671 0.702394 0.684503 161.953485 12.726095 7.619626 3.852910 0.469818 28 1.0 60.0 223 [rain, RH, wind, temp]
201 Mondrian Forest Regression 0.020169 0.773004 0.722417 172.058095 13.117092 7.943566 4.267944 1.157222 28 1.0 60.0 223 [rain, RH, wind, temp]
202 XGBoost Regression 0.059137 0.742261 0.705472 172.874733 13.148184 7.813816 3.852409 41.664515 28 1.0 60.0 223 [rain, RH, wind, temp]
203 Linear Regression 0.006492 0.783794 0.729918 173.032446 13.154180 8.006054 4.454165 0.060180 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]
204 Bayesian Ridge Regression 0.000116 0.788824 0.732684 173.306340 13.164587 8.029037 4.471603 0.074389 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]
205 Decision Tree Regression 0.145181 0.674380 0.668911 148.701898 12.194339 7.405292 3.811976 0.070708 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]
206 SVM Regression -0.000425 0.789250 0.725599 177.709172 13.330760 7.975535 4.116303 0.312591 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]
207 Random Forest Regression 0.110977 0.701363 0.691704 161.729107 12.717276 7.687318 3.872826 0.471841 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]
208 Mondrian Forest Regression 0.045665 0.752889 0.710152 169.560380 13.021535 7.836615 4.310977 0.980947 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]
209 XGBoost Regression 0.027890 0.766912 0.712429 178.740916 13.369402 7.878459 3.581193 22.170258 29 1.0 60.0 223 [DMC, FFMC, DC, ISI]

These results are almost as useless as the ones above. However, it seems like we can generate submodels that work reasonably well considering the dataset. Let’s visualize the mess:

Let’s try a cell-wise approach as well. The basic requirement for this approach is that there is a minimum of 5 samples per grid cell, with no area greater than 500 ha. Here is what we can get out of it.
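The cell selection itself is not shown in this post, so here is only a minimal sketch of how such a filter might look. It assumes that fires above the area cap are dropped first and that a cell is then kept if at least 5 samples remain. The function name `select_cells` and the toy data are made up for illustration; only the column names `X`, `Y` and `area` come from the dataset.

```python
import pandas as pd

def select_cells(df, min_samples=5, max_area=500.0):
    """Return {(X, Y): sub-frame} for grid cells meeting both requirements.

    Assumption: fires larger than max_area are removed before counting,
    rather than disqualifying the whole cell.
    """
    # Drop fires above the area cap first ...
    capped = df[df["area"] <= max_area]
    # ... then keep only the grid cells with enough remaining samples.
    return {
        cell: group
        for cell, group in capped.groupby(["X", "Y"])
        if len(group) >= min_samples
    }

# Tiny made-up frame (NOT the real data) to show the behaviour.
toy = pd.DataFrame({
    "X": [1] * 6 + [2] * 3,
    "Y": [1] * 6 + [2] * 3,
    "area": [0.0, 1.2, 3.4, 600.0, 5.0, 7.7, 0.5, 0.9, 2.0],
})
cells = select_cells(toy)
```

With the toy frame, one cell survives: it starts with six fires, loses the 600 ha one, and keeps five. The other cell has only three samples and is dropped. Each surviving sub-frame could then be fed to the same regressors as before.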

Let’s avoid commenting on these results ;)

Conclusion

We can conclude that this dataset is of absolutely no use for predicting forest fire sizes. The RMSE and MAD scores of the original publication indicate models that are as useless as the results here (results on log-transformed sizes don’t count in the real world!). It would be interesting to get full 30-year coverage including climate data to see how it turns out. We could also use BEHAVE (Fire Behavior Prediction and Fuel Modeling System) of SAGA-GIS. Or we could treat it as a proper spatio-temporal problem and apply machine learning to it. I covered machine learning algorithms for geospatial applications last year.
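To make the remark about log-transformed sizes concrete, here is a small sketch with made-up numbers. It assumes a log(1 + area) transform and a hypothetical model that always predicts the mean of the log target. Neither the data nor the model comes from this notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up burned areas in hectares, heavily skewed like real fire sizes.
area_true = np.array([0.0, 0.5, 2.0, 10.0, 200.0])

# Assumed transform: log(1 + area).
log_true = np.log1p(area_true)

# Hypothetical baseline that always predicts the mean of the log target.
log_pred = np.full_like(log_true, log_true.mean())

# Error on the log scale looks harmless ...
rmse_log = rmse(log_true, log_pred)
# ... but back-transformed to hectares, the large fire dominates the error.
rmse_ha = rmse(area_true, np.expm1(log_pred))

print(f"RMSE on log scale: {rmse_log:.3f}")
print(f"RMSE in hectares:  {rmse_ha:.3f}")
```

The log transform compresses exactly the large fires that matter most, so a score on that scale says little about how far off a prediction is in hectares.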

I wrote my master’s thesis on rockfall hazard ratings and risk assessment. I’m almost tempted to write a proper paper on wildfire risk assessment to see what is out there and how it could be improved. I looked around quite a lot, and most of what I found contained very low-quality statistical models (if any at all).