Nhess 2021 299 ATC1


Assessing the importance of conditioning factor selection in Landslide Susceptibility for Belluno province (Veneto Region, NE Italy)

Sansar Raj Meena1,2 *, Silvia Puliero1, Kushanav Bhuyan1,2, Mario Floris1, Filippo Catani1

1 Department of Geosciences, University of Padova, Padova, Italy.
2 Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, Netherlands.

* Corresponding author Email: [email protected]

Abstract

In the domain of landslide risk science, landslide susceptibility mapping (LSM) is very important as it helps spatially identify potential landslide-prone regions. This study used a statistical ensemble model (Frequency Ratio and Evidence Belief Function) and two machine learning (ML) models (Random Forest and XG-Boost) for LSM in the Belluno province (Veneto Region, NE Italy). We evaluated the importance of the conditioning factors (features) in the overall prediction capabilities of the statistical and ML algorithms. By a trial-and-error method, we eliminated the least "important" features using a common threshold. We found that removing the least "important" features does not impact the overall accuracy of the LSM for any of the three models. Based on the results of our study, the most commonly available features, for example the topographic features, yield comparable results after the least "important" ones are removed. This confirms that the requirement for the important conditioning factor maps can be assessed based on the physiography of the region. Based on the analysis of the three models, the most commonly available feature data can be useful for carrying out LSM at regional scale, eliminating the least available ones in most use cases due to data scarcity. Identifying LSMs at regional scale has implications for understanding landslide phenomena in the region and for post-event relief and recovery measures, planning disaster risk reduction and mitigation, and evaluating potentially affected areas.

1. Introduction

Landslides are one of the most frequently occurring natural disasters and cause significant human casualties and infrastructure destruction. They are triggered by natural and man-made events such as earthquakes, volcanic eruptions, heavy rains, extreme winds, and unsustainable construction activities such as informal unplanned settlement development and cutting of roads along slopes (Glade et al., 2006; van Westen et al., 2008). Extreme meteorological events such as the Vaia storm of 2018 triggered landslides and debris flows and destroyed critical infrastructure in the northern parts of Italy (Boretto et al., 2021). As reported by Gariano et al. (2021), in the 50 years between 1969 and 2018 landslides posed a severe threat to the Italian population. Approximately 1500 of the 8100 municipalities in Italy have faced landslides with fatalities. Between 1990 and 1999, 263 people were killed by landslides. Rossi et al. (2019) estimated that approximately 2500 people were killed between 1945 and 1990. Moreover, predictive modelling of the Italian population at risk from landslides (Rossi et al., 2019), based on data acquired between 1861 and 2015, shows a strong tendency of risk to the population, emphasizing the necessity of landslide risk studies.

Therefore, to assess landslide risk and plan suitable risk mitigation measures, it is crucial to analyse the significance of landslide studies, particularly Landslide Susceptibility Mapping (LSM). LSM is an essential tool that incorporates the potential landslide locations (Senouci et al., 2021). The probability of a landslide occurring in a particular region owing to the effects of several causative factors is referred to as landslide susceptibility. LSM is an essential step towards landslide risk management and helps in effective mapping of the spatial distribution of probable landslide manifestations (Dai et al., 2002). In the past, researchers have used a range of models to assess landslide susceptibility using technologies such as Earth Observation (EO) and Geographic Information Systems (GIS). The extraction and analysis of slope movements have been going on since the early 1970s (Brabb et al., 1972) and are still among the most important tools to perform LSM (Ercanoglu and Gokceoglu, 2002; Chacón et al., 2006; Guzzetti et al., 2006; Castellanos Abella and Van Westen, 2008; Floris et al., 2011; Catani et al., 2013; Pham et al., 2015; Reichenbach et al., 2018; Youssef and Pourghasemi, 2021; Liu et al., 2021). Traditional methods such as the expert-based Analytical Hierarchy Process (AHP), multivariate statistics, and the data-driven Frequency Ratio (FR) have been employed for landslide susceptibility for many years, with satisfactory results (Pradhan, 2010; Castellanos Abella and Van Westen, 2008; Komac, 2006). A use case of such approaches is given by Floris et al. (2011), who applied traditional LSM methods (FR) for mapping landslide susceptibility in a case study in the Veneto Region, Italy. Afterwards, with the development of new approaches, susceptibility modelling has advanced beyond these traditional approaches. Presently, two approaches, (1) statistical and (2) machine learning, are practised in LSM to investigate the landslide predisposing factors and to map the geographical distribution of landslide processes. Reichenbach et al. (2018) classified landslide susceptibility models into six main groups: (1) classical statistics, (2) index-based, (3) machine learning, (4) multi-criteria analysis, (5) neural networks, and (6) others. Reichenbach et al. (2018) also showed that before 1995 only five models were used for LSM, whereas in recent times 19 other models were investigated and yielded good results. The first five model groups mentioned above accounted for more than 50 per cent of the methods used in landslide susceptibility studies. Recent work by Stanley et al. (2021) emphasized the importance of data-driven methods in global LSM, trained to report landslide spatial occurrences between 2015 and 2018. The first version of the Landslide Hazard Assessment for Situational Awareness (LHASA), from their work for NASA, reported landslide occurrences with a decision tree model that first defines the intensity of one week of rainfall. LHASA version 2 used the data-driven XG-Boost model, adding two dynamically varying factors: snow and soil moisture. However, despite advances in LSM, feature importance, that is, the importance of the causative conditioning factors in the prediction capability of a model, is not discussed enough. The need to increase our control over model sensitivity to changes in system parameters, including those induced by anthropogenic and climate-change dynamics, is becoming a key factor in the implementation of truly efficient LSM for risk mitigation purposes. The Vaia windstorm of 2018 (Forzieri et al., 2020), as a typical extreme weather event, may easily escape traditional statistical prediction schemes and therefore represents a challenging test for exploring the sensitivity of the various LSM models to changing factors and conditions.

One goal of this research is to look into the relative changes in LSM accuracy when the least "important" conditioning factors are removed. Feature selection in LSM is an approach for reducing the number of landslide conditioning factors to improve model performance and reduce computational time. The purpose of this approach is to find the optimal set of conditioning factors that provides the best fit for the model and yields more accurate predictions. Micheletti et al. (2014) emphasized the importance of feature selection in LSM and discussed the use of Machine Learning (ML) models such as Support Vector Machine (SVM), Random Forest (RF), and AdaBoost for LSM, as well as the significance of associated features within the ML models for feature importance. However, their study did not consider geological and meteorological features like lithology, land use, and rainfall intensity for either LSM or feature selection. Liu et al. (2021) showed an improvement in the predictive capability of the so-called Feature Selected Machine Learning (FS-ML) model but also remarked that the same conditioning factors may contribute differently in different ML models. In this study, we investigate the prediction capability of the model after removing conditioning factors as a feature selection approach to improve LSM accuracy. This contrasts with what has been done in the literature, for example by Liu et al. (2021), who assess conditioning factor importance before prediction of the susceptibility using approaches like multi-collinearity analysis and the variance inflation factor. The identification of the most crucial features can help in monitoring the effect of extreme events (such as Vaia) on changes in the evolution of landslide hazard. This has implications for observing the influence of extreme events on crucial factors and for comprehending changes in the evolution of hazard.

We present a study in the province of Belluno (Veneto Region, NE Italy) comparing the conditioning factor importance of statistical and ML models for LSM before the Vaia storm event. The LSM results are validated using the IFFI landslide inventory data to test the models' prediction capability with and without certain factors. We also investigate whether many of the conditioning factors are crucial for LSM, since in many regions of the world the same data or factor maps might not be available.

2. Study area and Data

2.1 Study area

The area of the Belluno Province (Veneto Region, NE Italy) is part of the tectonic unit of the Southern Alps. The territory covers 3,672 km², stretching from north to south between the Dolomite Alps and the Venetian Prealps, with elevations ranging from 42 to 3325 m above mean sea level. From a geological point of view, the Dolomite Alps comprise the Hercynian crystalline basement, consisting of micaschists and phyllites intruded by the Permian ignimbrites (Doglioni, 1990; Schönborn, 1999). These Paleozoic units mainly outcrop in the NE and central-West sectors. The Middle-Upper Triassic includes carbonate, volcanic and dolomitic formations. In particular, the Upper Triassic Main Dolomite covers 14% of the whole province. Jurassic-Cretaceous limestones and marls are mostly located between the Valsugana and Belluno thrusts (Sauro et al., 2013). Moreover, in the Belluno valley and in the southern part of the area, Cenozoic sediments (flysch and molasse) and Quaternary glacial, alluvial and colluvial deposits are largely present. The Venetian Prealps, instead, are characterized by Jurassic-Cretaceous sedimentary cover, such as layered limestones and dolomites with cherts (Compagnoni et al., 2005; Corò et al., 2015). Because of its morphological characteristics, the study area is affected by slope instability, which covers an area of 165 km², corresponding to 6% of the province (Baglioni et al., 2006). Most of the landslide phenomena are located in the NW (Upper basin of the Cordevole River) and SE (Alpago district) sectors of the province (Figure 1). The dominant landslide types are slides (47%), rapid flows (20%), slow flows (12%), and shallow soil slips (7%) (Iadanza et al., 2021). The climate of the province of Belluno is continental. The mean annual temperature recorded in the period 1961–1990 is 7°C and the mean precipitation is 1284 mm/year (Desiato et al., 2005), with two peaks in spring and autumn. In the last 27 years, temperature and rainfall intensity in the study area have increased due to climatic changes, leading to more frequent extreme meteorological conditions (ARPAV, Agenzia Regionale per la Prevenzione e Protezione Ambientale del Veneto, 2021).

2.2 Landslide inventory data

The inventory of landslide phenomena in Italy (IFFI), compiled by the Italian Institute for Environmental Protection and Research (ISPRA) and the Regions and Autonomous Provinces, was used in this study (Trigila et al., 2010). The IFFI Project was financed in 1997. Since 2005, the catalogue has been available online; it consists of point features indicating the scarps of the landslides and polygon features delineating the instabilities. The archive stores the main attributes of the landslides, such as morphometry, type of movement, rate, involved material, induced damages and mitigation measures. The inventory currently holds 620,808 landslides collected from historical documents, field surveys and aerial photointerpretation, covering an area of 23,700 km², which corresponds to 7.9% of the Italian territory (Trigila and Iadanza, 2018). In the Belluno province, the IFFI inventory consists of 5934 landslide points for events that occurred before 2006 (Baglioni et al., 2006).

Figure 1: a) Location of the study area and landslides collected by the IFFI (Inventory of Landslide Phenomena in Italy) project; b) field photographs after the Vaia event.

2.3 Landslide conditioning factors

Based on the regional environmental characteristics of the study area and the scientific literature, fourteen landslide conditioning factors were selected: (i) topographical factors such as elevation, slope angle, slope aspect, topographical wetness index (TWI), topographical position index (TPI), topographical roughness index (TRI), profile curvature, and plan curvature; (ii) hydrological factors (distance to drainage, precipitation); (iii) geological factors (lithology); (iv) anthropogenic factors (distance to roads); and (v) environmental factors such as the Normalized Difference Vegetation Index (NDVI) and landcover (see figure 2). A freely accessible digital elevation model (DEM) with a spatial resolution of 25 metres, downloaded from the Veneto Region cartographic portal (https://idt2.regione.veneto.it), was used to derive the topographical layers. Refer to table 1 for a detailed description of the conditioning factors. Land cover, lithology, road network and drainage maps were downloaded from the same portal. Rainfall data were downloaded from the web site of the Regional Agency for the Environmental Prevention and Protection of Veneto (ARPAV: https://www.arpa.veneto.it/). We resampled the conditioning factor maps to 25 metre pixels for the analysis.

Table 1: Description of the conditioning factors for landslide occurrences.

| Sl No. | Conditioning Factor | Data Range | Description/Justification |
|---|---|---|---|
| 1 | Elevation | 42 m to 3325 m | The geomorphological and geological processes are affected by elevation (Raja et al., 2017). It has an impact on topographic characteristics, which contribute to spatial differences in many landform processes, as well as the distribution of vegetation. |
| 2 | Slope | Flat areas to very high slopes up to 86.48° | Slope is a derivative of the DEM and can cause slope failure (Pham et al., 2018). Landforms with a higher slope angle are usually more susceptible to collapse, which is closely correlated to landslides. |
| 3 | Aspect | North (0 degrees) to North (360 degrees) | Aspect, which describes the slope direction, is correlated with other geo-environmental factors and is a crucial factor for LSM (Dahal et al., 2008). The slope direction to a degree dictates the frequency of landslides. |
| 4 | Topographic wetness index | -2.12 to 20.06 | The influence of topography on the location and amount of saturated runoff source areas is an essential conditioning factor (Pourghasemi et al., 2012). TWI measures the amount of accumulated water and the distribution of soil moisture at a location. Higher TWI values can relate to higher chances of landslide occurrence. |
| 5 | Topographic Position Index | -1143.68 to 243.84 | The topographic position index (TPI) shows the difference between the elevation of a point and its surroundings defined by a specified radius. Lower values represent the plausibility of features lower than the surroundings, thus possibly relating to higher odds of landslide occurrence. |
| 6 | Topographic Roughness Index | 0 to 1077.30 | The Topographic Roughness Index (TRI) calculates the difference in elevation between adjacent pixels in a DEM, which depicts the terrain fluctuation (Riley et al., 1999). As the slope of a landscape moves, the TRI decreases, relating to slope movement. |
| 7 | Profile Curvature | Concave, Flat, Convex | The driving and resisting forces within a landslide in the slope direction are affected by profile curvature. |
| 8 | Plan Curvature | Concave, Flat, Convex | The direction of landslide movement is controlled by the plan curvature, which regulates the convergence or divergence of landslide material (Dury, 1972; Meten et al., 2015). |
| 9 | Drainage | 0 to 400 | Drainage transports water, which induces material saturation, culminating in landslides in valleys (Shahabi and Hashim, 2015). |
| 10 | Rainfall (mm/month) | 84 to 1198.05 | Precipitation characteristics shift with climatic conditions and geographical characteristics, resulting in significant temporal and geographical variations in rainfall quantity and intensity. This can lead to the triggering of landslides across large areas but also in specific smaller areas. |
| 11 | Lithology | Volcanites, Pre-Permian, metamorphic, sequence Morainic, Gravels, etc. | The geological strength indices, failure susceptibility, and permeability of lithological units differ (Yalcin and Bulut 2006), and changes in the stress-strain behaviour of the rock strata can be caused by lithological unit variation. Slope failure typically occurs on a slope with low shear strength. |
| 12 | Distance to Roads | 0 to 200 | Roads are a crucial manmade element impacting the occurrence of landslides because of road clear-cutting and construction activities (Dunning et al., 2009). |
| 13 | Landcover | Rock, Forest, Urban cover etc. | Because land cover may influence the hydrological functioning of slopes, rainfall partitioning, infiltration properties, and runoff, as well as the soil shear strength, different land cover types may affect slope stability. Land cover can be utilized to describe the region's vastly dismembered zones and the likelihood of landslide activities. |
| 14 | NDVI | -0.66 to 0.66 | NDVI is important in assessing the amount of vegetation cover, which can be interpreted to understand the strength of the slope and the landslide occurrences. The NDVI reflects the inhibitory effect of vegetation on landslide occurrence (Huang et al., 2020). |


Figure 2: Maps of the conditioning factors used in this study: (A) Elevation, (B) Slope, (C) Aspect, (D) Topographical wetness index, (E) Topographical position index, (F) Topographical roughness index, (G) Profile curvature, (H) Plan curvature, (I) Distance to drainage networks, (J) Rainfall monthly average (1994-2020) mm, (K) Lithology, (L) Distance to road network, (M) Landcover, (N) NDVI.

3. Methodology

We propose an approach that helps assess the importance of the relationship between the conditioning factors and the output post-prediction, which can then be refined by removing the less "important" factors throughout the statistical and ML models. As stated previously, the study applies sensitivity analysis to understand the relative importance of the conditioning factors as a preliminary step towards improving the modelling of a space-time changing parameter in LSM methods. The apparent reality is not as simple as using a certain model that gives the highest LSM accuracy and using the derived maps for disaster risk management and mitigation measures. Therefore, it is important to test the effects of the features and their relative importance in LSM. In this study, the LSM was obtained by combining the IFFI landslide inventory and the conditioning factors through the statistical FR-EBF ensemble and the ML models, i.e. Random Forest and XG-Boost (Figure 3). The following sub-sections define the statistical and ML models for LSM.

Figure 3: Overview of the conceptual workflow of methodology for landslide susceptibility assessment.

3.1 Statistical approach

3.1.1 Ensemble Frequency Ratio - Evidence Belief Function

In landslide susceptibility studies, the frequency ratio (FR) model is often applied. It is a straightforward evaluation method that calculates the likelihood of landslide occurrence and non-occurrence for each conditioning factor (Lee, 2013; Mondal and Maiti, 2013; Shahabi et al., 2014). For each landslide conditioning factor, the FR is a probabilistic model based on observed correlations between landslide distribution and related parameters (Lea Tien Tay 2014). The model depicts the relationship between spatial locations and the factors that determine the occurrence of landslides in a specific area. The correlation between a spatial phenomenon and factor classes can be found through FR, which is very helpful for geospatial analysis (Mahalingam et al. 2016; Meena et al. 2019b). Figure 3 gives an overview of the methodology employed in this study.

FR weights can be computed using the ratios of landslide inventory points of all classes within each factor. The landslide inventory points are overlaid with the conditioning factors to obtain the ratio of the area of each factor class to the total area of the study region. The FR weights are then obtained by dividing the landslide occurrence ratio in a class by the area ratio of that class (Demir et al. 2012).
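The FR calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `frequency_ratio`, the class names and all counts and areas are hypothetical values chosen only to show the arithmetic.

```python
def frequency_ratio(landslide_counts, class_areas):
    """Return the frequency ratio of each class of one conditioning factor.

    FR = (landslides in class / all landslides) / (class area / total area)
    """
    total_landslides = sum(landslide_counts.values())
    total_area = sum(class_areas.values())
    ratios = {}
    for name, area in class_areas.items():
        landslide_share = landslide_counts.get(name, 0) / total_landslides
        area_share = area / total_area
        ratios[name] = landslide_share / area_share
    return ratios


# Hypothetical landslide counts and areas (km2) for three slope classes;
# these numbers are NOT taken from the study.
counts = {"gentle": 10, "moderate": 30, "steep": 60}
areas = {"gentle": 500.0, "moderate": 300.0, "steep": 200.0}
fr = frequency_ratio(counts, areas)
```

In this toy case the "moderate" class holds 30% of the landslides on 30% of the area, so its FR is 1; classes with FR above 1 host proportionally more landslides than their share of the area.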

Using Eq. 1, the Landslide Susceptibility Index (LSI) was computed by summing the values of each factor ratio (Lee, 2013):

$$\mathrm{LSI} = \sum \mathrm{FR} \qquad \text{(Eq. 1)}$$

$$
\begin{aligned}
\mathrm{LSI} ={} & (\mathrm{DEM} \cdot w_i) + (\text{Slope} \cdot w_i) + (\text{Aspect} \cdot w_i) + (\text{Topographic Wetness Index} \cdot w_i) \\
& + (\text{Topographic Roughness Index} \cdot w_i) + (\text{Topographic Position Index} \cdot w_i) + (\text{Distance to road} \cdot w_i) \\
& + (\text{Distance to drainage} \cdot w_i) + (\text{Land Cover} \cdot w_i) + (\text{Lithology} \cdot w_i) + (\mathrm{NDVI} \cdot w_i) \\
& + (\text{Rainfall} \cdot w_i) + (\text{Profile Curvature} \cdot w_i) + (\text{Plan Curvature} \cdot w_i)
\end{aligned}
$$

where LSI is the landslide susceptibility index, FR is the frequency ratio of every factor type or class, and $w_i$ is the weight of each conditioning factor. An FR value of 1 in the relationship analysis implies that the density of landslides in a specific class is proportionate to the size of the class in the map; an LSI value of 1 is an average value. Higher LSI values suggest a stronger spatial correlation between landslides and each class of the related factor, whereas lower LSI values imply a weaker correlation. In a nutshell, a greater LSI value represents higher landslide susceptibility and vice versa. We integrated the LSI results with predictor values derived from Evidence Belief Functions (EBF). The EBF uses the conditioning factors defined by FR as the input data. Eq. 2 was applied to the rating of every spatial factor with the training dataset.
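As a small illustration of the weighted sum behind the LSI, the sketch below evaluates it for a single pixel. It assumes that each factor contributes the FR value of the class the pixel falls into, multiplied by that factor's weight; the function name, the three factors and all numbers are hypothetical.

```python
def landslide_susceptibility_index(class_fr, factor_weights):
    """Weighted sum of the FR values of the classes one pixel falls into."""
    return sum(class_fr[name] * factor_weights[name] for name in class_fr)


# Hypothetical FR values of the classes covering one pixel, and hypothetical
# factor weights; neither is taken from the study.
pixel_fr = {"slope": 1.8, "lithology": 1.2, "ndvi": 0.5}
weights = {"slope": 2.0, "lithology": 1.0, "ndvi": 0.5}
lsi = landslide_susceptibility_index(pixel_fr, weights)
```

Applied to every pixel of the stacked factor maps, the same sum yields the LSI raster that is later classified into susceptibility classes.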

$$\mathrm{PR} = \frac{SA_{\max} - SA_{\min}}{\left(SA_{\max} - SA_{\min}\right)_{\min}} \qquad \text{(Eq. 2)}$$

where SA is the indicator of Spatial Association between spatial factors and landslides, and PR is the Prediction Rate. The absolute difference between the maximum and the minimum SA values of a factor is divided by the lowest such absolute difference among all factors (Table 2). Pairwise comparison of the PR values of the slope failure predictors yielded the pairwise rating matrix of the predictor rating. The eigenvectors of the matrix were calculated by normalising each column's pairwise result. The eigenvalue was calculated by dividing each pairwise importance rate in a column by the total of the pairwise importance rates in that column. The fractional predictor is obtained by averaging the eigenvectors across a row of the matrix. We used the PR values for assigning weights of the factors for susceptibility analysis.
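The prediction rate of Eq. 2 can be sketched as follows. The sketch assumes that the spatial association values of a factor are its per-class belief values and that the reference spread in the denominator is the smallest spread among the factors supplied; the function name and the input values are hypothetical.

```python
def predictor_rates(sa_by_factor):
    """Prediction rate per factor: its SA spread over the smallest spread."""
    spreads = {name: max(values) - min(values)
               for name, values in sa_by_factor.items()}
    reference = min(spreads.values())
    if reference == 0:
        raise ValueError("a factor with zero spread cannot be the reference")
    return {name: spread / reference for name, spread in spreads.items()}


# Hypothetical per-class spatial association values for three factors;
# these are NOT the values reported in Table 2.
sa = {
    "factor_a": [0.1, 0.3, 0.6],
    "factor_b": [0.2, 0.3, 0.5],
    "factor_c": [0.3, 0.3, 0.4],
}
pr = predictor_rates(sa)
```

A factor whose classes differ strongly in spatial association receives a high rate, while a factor whose classes all behave alike receives a rate close to the reference.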

3.2 Machine learning models

3.2.1 Random Forest model

Random Forest (RF), introduced by Breiman (2001), is based on the fundamental concept of the "wisdom of crowds" applied to multiple decision trees, and has been used in remote sensing research for a variety of applications (Melville et al., 2018). RF creates many deep decision trees using the training data and can overcome the overfitting problem, which mostly results from complex datasets, better than single decision trees. Each RF decision tree gives a prediction, and the predictions are aggregated according to the votes from each tree, leading to the generation of the susceptibility map (see figure 4). Since RF has shown impressive performance for classification purposes, it is regarded as one of the most efficient non-parametric ensemble models (Chen et al., 2017). Based on the advantages listed above, the RF model is used to assess landslide susceptibility. The landslide inventory and the conditioning factors are divided into training and testing data, as seen in figure 4. Using the bagging technique, the training data is divided into training subsets, generally about one-third of the total training samples. A decision tree is created for each training subset, and the votes of the trees are combined to output the landslide susceptibility.
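The bagging and voting idea can be sketched with a toy ensemble. This is deliberately not a full Random Forest: each "tree" is a one-split decision stump, there is no random feature subsampling, and the training data, function names and parameters (`n_trees`, `subset_size`, `seed`) are hypothetical. It only shows how resampled subsets and per-tree votes produce a susceptibility score.

```python
import random


def fit_stump(samples):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    n_features = len(samples[0][0])
    for j in range(n_features):
        for x, _ in samples:
            t = x[j]
            # The stump predicts "landslide" when the feature exceeds t
            errors = sum((xi[j] > t) != bool(yi) for xi, yi in samples)
            if best is None or errors < best[0]:
                best = (errors, j, t)
    return best[1], best[2]


def fit_bagged_stumps(samples, n_trees=25, subset_size=None, seed=0):
    """Train one stump per subset drawn with replacement (bagging)."""
    rng = random.Random(seed)
    size = subset_size or len(samples)
    return [fit_stump(rng.choices(samples, k=size)) for _ in range(n_trees)]


def susceptibility(stumps, x):
    """Fraction of stumps voting 'landslide' for feature vector x."""
    votes = sum(x[j] > t for j, t in stumps)
    return votes / len(stumps)


# Hypothetical training data: (slope in degrees, NDVI) -> landslide label
train = [((5, 0.6), 0), ((8, 0.5), 0), ((12, 0.55), 0), ((15, 0.4), 0),
         ((30, 0.3), 1), ((35, 0.2), 1), ((40, 0.25), 1), ((45, 0.1), 1)]
forest = fit_bagged_stumps(train)
steep_score = susceptibility(forest, (42, 0.2))
flat_score = susceptibility(forest, (6, 0.6))
```

The share of votes plays the role of the susceptibility value: a steep, sparsely vegetated pixel collects far more "landslide" votes than a flat, vegetated one.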

Figure 4: Conceptual diagram of the Random Forest model.

3.2.2 XG-Boost model

Extreme gradient boosting, commonly known as XG-Boost, is an optimized gradient boosting algorithm designed for speed and performance, in which boosting ensembles are used to generate a prediction model (Sahin, 2020). The core idea of a boosting algorithm is to combine weak learners to improve accuracy (Can et al., 2021), meaning that different models with lower susceptibility accuracies are “boosted” by combining them to achieve a higher ensemble susceptibility accuracy. The model is known for its fast training speed for classification tasks. In this study, we adjust the XG-Boost algorithm with training parameters such as learning rate, subsample ratio, maximum depth of the tree and others. It uses boosting techniques to reduce overfitting problems and improve accuracy (figure 5). The training data is divided into subsets, which are then trained using a tree ensemble model. This means that the weights derived from each model training on landslide instances in the area are added, and the test set is then predicted with the average landslide susceptibility scores of the ensemble models.
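The principle of combining weak learners can be illustrated with a bare-bones gradient boosting loop. This sketch is plain boosting for squared error with one-split stumps on a single feature; it is not the XG-Boost algorithm itself (no regularisation, subsampling or second-order terms), and the data, function names and parameter values are hypothetical.

```python
def fit_residual_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature for squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # keep both sides of the split non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_residual_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_mean if x <= t else right_mean for t, left_mean, right_mean in stumps)


# Hypothetical slope angles (degrees) and landslide labels
slopes = [5, 10, 15, 20, 30, 35, 40, 45]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(slopes, labels)
steep_score = predict(model, 45)
flat_score = predict(model, 5)
```

Each stump on its own is a weak predictor, but the scaled sum of many stumps fitted to successive residuals approaches the labels; the learning rate controls how much each round contributes.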

Figure 5: Training and testing procedure of the XG-Boost model.

3.3 Feature selection algorithms

The goal of feature selection is to remove the least important conditioning factors in order to aid the discovery of acceptable conditions for training the models and to increase generalisability in landslide prediction. This selection helps eliminate the irrelevant (less important) conditioning factors to obtain optimal prediction accuracy (Micheletti et al., 2014). For the statistical model, we used the class weights obtained from the frequency ratio as input for generating the predictor rate from the FR-EBF model, which gives the final weights of the conditioning factors. We then used the predictor rate weights to select the suitable features.

To select the right set of factors for both RF and XG-Boost, we use the in-built impurity-based feature importance algorithm, which is computed on the training set (refer to feature selection in figure 3). Based on the results of the feature selection algorithms, given as ranks of the conditioning factors for each model, the most important factors are selected to investigate the improvement of model performance in terms of the accuracy obtained. With this, we can understand which of the conditioning factors played the most important roles in giving the highest accuracy for each ML model. Thus, we can comment on whether certain factors are impactful in performing LSM with ML models. Besides, the comparison of the resulting important features of the different models can be interpreted to highlight the respective strengths of the models and allows drawing better conclusions about the robustness of the relevant features for landslide predictions.
340

341 4. Results

342 4.1 Statistical model

343 The class weights were derived from data driven FR model and the final weights of the factors

344 were derived by using predictor rate from evidence belief function given in Table 2. The class

345 and factor weights were calculated using equations 1 and 2. The final weights of landslide

346 conditioning factors were calculated using an ensemble of FR-EBF, and then utilised to create

347 the final LSM. Because there is no common approach for identifying landslide susceptibility

348 classes in the final LSM, we normalised the findings to 0 to 100 for uniformity and

349 comparability. Using a quantile natural breaks classification, which separates the values into

350 groups with an equalrandom number of values, the resultant LSM was classified into

351 five classes: very low, low, moderate, high, and very high, as shown in figure 7 .(Chung and

352 Fabbri, 2003). This method of classification gives a better distribution of values in each class

353 than common approaches such as natural breaks, which can result in certain classes having

354 limited or excessive data.
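The normalisation and five-class quantile classification can be sketched as follows. This is a minimal illustration in plain Python, assuming the susceptibility values are available as a flat list; the function names and the sample values are hypothetical, and a real workflow would operate on raster arrays in a GIS.

```python
CLASS_LABELS = ["very low", "low", "moderate", "high", "very high"]


def normalise_0_100(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def quantile_classes(values, labels=CLASS_LABELS):
    """Assign each value to a class so that classes hold (nearly) equal counts.

    Values are ranked, and the rank decides the class. Tied values may fall
    into neighbouring classes because the split is made on rank, not on value.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [None] * n
    for rank, idx in enumerate(order):
        classes[idx] = labels[rank * len(labels) // n]
    return classes


# Hypothetical raw susceptibility values for ten pixels.
raw = [0.2, 1.4, 0.9, 3.1, 2.2, 0.5, 2.8, 1.1, 1.9, 3.5]
scores = normalise_0_100(raw)
classes = quantile_classes(scores)
for score, label in zip(scores, classes):
    print(f"{score:6.1f}  {label}")
```

With ten values and five classes, each class receives exactly two pixels, which is the equal-count property that distinguishes quantile classification from natural breaks.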

In terms of the feature importance shown in figure 6 and Table 2 (normalized weights), and following a trial-and-error approach, factors (features) below the threshold of 0.30 were discarded because they made little difference to predicting landslide occurrences in the study area. Five conditioning factors with coefficient values lower than 0.30 were therefore dropped, and the area under the curve (AUC) remained similar to the value obtained with all 14 factors.


[Figure 6 plot: importance coefficient values (vertical axis, 0 to 0.8) for FR-EBF with 14 features]

Figure 6: Feature importance of the statistical model

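Table 2: Frequency ratio values for spatial factors class weighting and EBF coefficients for predictor rates (PR) based on degrees of spatial associations.

| Factors and classes | Bel | Min | Max | [Max-Min] | Predictor Rate | FR Weights | Normalized weights |
|---|---|---|---|---|---|---|---|
| Elevation | | 0.07 | 0.24 | 0.17 | 0.73 | | |
| <430 | 0.07 | | | | | 0.50 | 0.06 |
| 430 - 700 | 0.15 | | | | | 1.13 | 0.20 |
| 700 - 1000 | 0.13 | | | | | 0.96 | 0.19 |
| 1000 - 1500 | 0.12 | | | | | 0.86 | 0.15 |
| 1500 - 1900 | 0.11 | | | | | 0.81 | 0.12 |
| 1900 - 2300 | 0.24 | | | | | 1.72 | 0.17 |
| >2300 | 0.18 | | | | | 1.31 | 0.12 |
| Profile Curvature | | 0.00 | 0.53 | 0.53 | 2.30 | | |
| Concave | 0.53 | | | | | 1.05 | 0.40 |
| Flat | 0.00 | | | | | 0.00 | 0.30 |
| Convex | 0.47 | | | | | 0.95 | 0.30 |
| Plan Curvature | | 0.00 | 0.52 | 0.52 | 2.26 | | |
| Concave | 0.52 | | | | | 1.03 | 0.35 |
| Flat | 0.00 | | | | | 0.00 | 0.33 |
| Convex | 0.48 | | | | | 0.97 | 0.32 |
| Slope | | 0.14 | 0.25 | 0.11 | 0.48 | | |
| <10 | 0.14 | | | | | 0.70 | 0.14 |
| 10 - 20 | 0.23 | | | | | 1.11 | 0.22 |
| 20 - 30 | 0.25 | | | | | 1.25 | 0.27 |
| 30 - 40 | 0.20 | | | | | 0.99 | 0.20 |
| >40 | 0.17 | | | | | 0.86 | 0.17 |
| Distance from drainage | | 0.02 | 0.36 | 0.34 | 1.49 | | |
| 0 - 100 | 0.36 | | | | | 1.15 | 0.28 |
| 100 - 200 | 0.30 | | | | | 0.97 | 0.19 |
| 200 - 300 | 0.23 | | | | | 0.74 | 0.12 |
| 300 - 400 | 0.10 | | | | | 0.31 | 0.07 |
| >400 | 0.02 | | | | | 0.06 | 0.34 |
| Distance from roads | | 0.08 | 0.24 | 0.15 | 0.67 | | |
| 0 - 50 | 0.36 | | | | | 1.15 | 0.27 |
| 50 - 100 | 0.30 | | | | | 0.97 | 0.19 |
| 100 - 150 | 0.23 | | | | | 0.74 | 0.17 |
| 150 - 200 | 0.10 | | | | | 0.31 | 0.16 |
| >200 | 0.02 | | | | | 0.06 | 0.13 |
| Landcover | | 0.01 | 0.24 | 0.23 | 2.98 | | |
| Urban | 0.17 | | | | | 1.48 | 0.17 |
| Rocks | 0.10 | | | | | 0.90 | 0.09 |
| Arable | 0.01 | | | | | 0.07 | 0.01 |
| Permanent cultivation | 0.10 | | | | | 0.92 | 0.13 |
| Forest | 0.11 | | | | | 0.95 | 0.11 |
| Grassland | 0.24 | | | | | 2.11 | 0.14 |
| Shrubland | 0.04 | | | | | 0.37 | 0.04 |
| Sparse vegetation | 0.12 | | | | | 1.08 | 0.21 |
| Water body | 0.12 | | | | | 1.05 | 0.09 |
| TWI | | 0.17 | 0.25 | 0.08 | 1.00 | | |
| -2.12 - 1.52 | 0.19 | | | | | 1.01 | 0.20 |
| 1.52 - 3.35 | 0.20 | | | | | 1.04 | 0.20 |
| 3.35 - 5.70 | 0.18 | | | | | 0.92 | 0.18 |
| 5.70 - 9.62 | 0.17 | | | | | 0.90 | 0.18 |
| 9.62 - 20.06 | 0.25 | | | | | 1.30 | 0.24 |
| TPI | | 0.00 | 0.31 | 0.31 | 1.35 | | |
| -1143.68 - -202.34 | 0.00 | | | | | 0.00 | 0.00 |
| -202.34 - -17.33 | 0.18 | | | | | 0.74 | 0.21 |
| -17.33 - -1.01 | 0.26 | | | | | 1.06 | 0.27 |
| -1.01 - 20.75 | 0.24 | | | | | 0.98 | 0.26 |
| 20.75 - 243.84 | 0.31 | | | | | 1.24 | 0.27 |
| TRI | | 0.00 | 0.34 | 0.34 | 1.47 | | |
| 0 - 4.22 | 0.22 | | | | | 0.73 | 0.23 |
| 4.22 - 21.1 | 0.34 | | | | | 1.11 | 0.35 |
| 21.12 - 46.47 | 0.25 | | | | | 0.82 | 0.22 |
| 46.47 - 257.70 | 0.20 | | | | | 0.65 | 0.20 |
| 257.70 - 1077.30 | 0.00 | | | | | 0.00 | 0.00 |
| Rainfall intensity | | 0.00 | 0.81 | 0.81 | 3.54 | | |
| 84 - 110.83 | 0.81 | | | | | 11.29 | 0.32 |
| 110.83 - 127.38 | 0.08 | | | | | 1.15 | 0.27 |
| 127.38 - 140.80 | 0.05 | | | | | 0.70 | 0.15 |
| 140.80 - 157.35 | 0.06 | | | | | 0.81 | 0.19 |
| 157.35 - 198.05 | 0.00 | | | | | 0.00 | 0.06 |
| NDVI | | 0.14 | 0.25 | 0.11 | 0.48 | | |
| -0.66 - 0.15 | 0.14 | | | | | 0.70 | 0.13 |
| 0.15 - 0.34 | 0.22 | | | | | 1.13 | 0.21 |
| 0.34 - 0.52 | 0.25 | | | | | 1.26 | 0.25 |
| 0.52 - 0.66 | 0.21 | | | | | 1.07 | 0.21 |
| 0.66 - 0.99 | 0.18 | | | | | 0.89 | 0.20 |
| Aspect | | 0.05 | 0.15 | 0.09 | 0.41 | | |
| Flat (-1) | 0.11 | | | | | 1.02 | 0.10 |
| North (0-22.5) | 0.08 | | | | | 0.75 | 0.07 |
| Northeast (22.5-67.5) | 0.09 | | | | | 0.84 | 0.09 |
| East (67.5-112.5) | 0.11 | | | | | 1.08 | 0.11 |
| Southeast (112.5-157.5) | 0.14 | | | | | 1.31 | 0.14 |
| South (157.5-202.5) | 0.15 | | | | | 1.40 | 0.14 |
| Southwest (202.5-247.5) | 0.14 | | | | | 1.33 | 0.14 |
| West (247.5-292.5) | 0.08 | | | | | 0.76 | 0.09 |
| Northwest (292.5-337.5) | 0.05 | | | | | 0.50 | 0.07 |
| North (337.5-360) | 0.06 | | | | | 0.58 | 0.06 |
| Lithology | | 0.04 | 0.26 | 0.22 | 2.84 | | |
| Volcanites | 0.26 | | | | | 3.45 | 0.16 |
| Pre-Permian metamorphic sequence | 0.11 | | | | | 1.50 | 0.11 |
| Morainic | 0.06 | | | | | 0.85 | 0.15 |
| Gravels | 0.04 | | | | | 0.52 | 0.04 |
| Mix of alluvial deposits | 0.05 | | | | | 0.70 | 0.03 |
| Conglomerate | 0.21 | | | | | 2.84 | 0.21 |
| Limestone and dolomitic limestone | 0.13 | | | | | 1.76 | 0.16 |
| Calcareous shales | 0.08 | | | | | 1.04 | 0.08 |
| Shales and gypsums | 0.06 | | | | | 0.76 | 0.07 |
| Alternation of marls and sandstones | 0.07 | | | | | 0.91 | 0.06 |
| Water body | 0.22 | | | | | 2.97 | 0.00 |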

Figure 7: Landslide susceptibility maps derived using the ensemble of FR-EBF approaches for (A) 14 landslide features and (B) 9 landslide features (black square represents the enlarged area).

4.2 Machine learning models

The LSM was generated from the conditioning factor data: the models learnt the information in the feature maps, which helped identify areas of susceptibility. The final results of the ML models in generating the LSM are given in Table 3. The AUC scores of RF and XG-Boost are close to each other, indicating similar predictive skill of the two models. Based on the information in Table 3, the number of pixels in the moderate susceptibility class is higher in the XG-Boost model than in the RF model. Visually, the results show more susceptible areas near the landslide features (figures 8 and 9).

For both RF and XG-Boost, the AUC obtained after eliminating the features of lower importance is similar to the AUC obtained with all features. As discussed in section 3.3, the feature importance for the ML models is computed using the impurity feature importance algorithm, which assesses the relative relevance of the conditioning factors for the prediction of landslides in terms of accuracy. As seen in figure 10, Landcover, Profile Curvature, Plan Curvature, TWI and TPI have the lowest values for the RF model. We examined various cut-off values for choosing the "important" conditioning factors and, after much trial-and-error, chose a threshold of 0.03; any factor above this value was considered "important" for landslide susceptibility. Hence, in figure 8, the five factors mentioned above are removed, giving an AUC of 0.906, which is slightly higher than the AUC obtained without removing the five factors (0.902, as seen in Table 3).

The same procedure was repeated for the XG-Boost model. Despite removing the lower-valued conditioning factors of Profile Curvature, TPI, and Plan Curvature, the AUC score remained similar (Table 3). Slope and Distance to Roads had a much bigger impact on the RF model than on the XG-Boost model. On the other hand, Lithology played a bigger role in estimating landslide occurrences in the XG-Boost model. These observations are discussed further in the discussion section.

Table 3: Overall table with AUC results for landslide susceptibility of Belluno.

| No. | Model | AUC |
|---|---|---|
| 1 | FR-EBF 14 features | 0.836 |
| 2 | FR-EBF 9 features | 0.834 |
| 3 | RF 14 features | 0.902 |
| 4 | RF 9 features | 0.906 |
| 5 | XG-Boost 14 features | 0.910 |
| 6 | XG-Boost 10 features | 0.907 |

Figure 8: LSMs derived using the Random Forest approach for (A) 14 landslide features and (B) 9 landslide features (black square represents the enlarged area).

Figure 9: LSMs derived using the XG-Boost approach for (A) 14 landslide features and (B) 9 landslide features (black square represents the enlarged area).

[Figure 10 plot: importance coefficient values (vertical axis, 0.05 to 0.25) for RF with 14 features and XG-Boost with 14 features]

Figure 10: Feature importance of the RF and XG-Boost models.

5. Accuracy Assessment

Accuracy assessment is crucial in producing quality LSMs for natural hazards, where the information presented in the map is beneficial for planners (Goetz et al., 2015). A number of accuracy assessment approaches may be used to assess the quality of the LSMs. We compare the landslide inventory data to the resultant maps derived using the FR-EBF ensemble and the machine learning RF and XG-Boost models. The efficiency of any model for LSM is calculated by comparing the inventory data to the produced maps, which reflects whether the models in use can accurately forecast which areas are susceptible to landslides (Pourghasemi et al., 2018). The results were tested using 30% of the landslide occurrences. Testing for this study was done using the Receiver Operating Characteristics (ROC) and the Relative Landslide Density (R-Index) approaches.
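Holding out 30% of the landslide occurrences for testing can be sketched with a seeded random split. This is only an illustration in plain Python: the function name `split_inventory`, the seed, and the inventory size of 1000 are hypothetical, and the study does not state how its split was drawn.

```python
import random


def split_inventory(landslide_ids, test_fraction=0.30, seed=42):
    """Randomly split landslide identifiers into training and test subsets."""
    ids = list(landslide_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical inventory of 1000 landslide identifiers.
train_ids, test_ids = split_inventory(range(1000))
print(len(train_ids), "training landslides,", len(test_ids), "test landslides")
```

Fixing the seed keeps the same landslides in the test subset across runs, so that accuracy values for different feature sets are compared on identical data.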

5.1 Receiver Operating Characteristics (ROC)

The test dataset was used to corroborate the six resultant LSMs from the statistical and machine learning models using the receiver operating characteristics (ROC) approach. The ROC approach evaluates the true positive rate (TPR) and the false positive rate (FPR) of the LSMs (Ghorbanzadeh et al., 2018; Linden, 2006). True positives are pixels that are correctly labeled as high susceptibility according to the landslide validation data, whereas false positives are pixels that are incorrectly labeled. ROC curves are created by plotting the TPR against the FPR. The accuracy of the generated LSMs is determined by the AUC, which shows whether there were more correctly labeled pixels than incorrectly labeled pixels. Greater AUC values suggest a more accurate susceptibility map, and vice versa. The susceptibility map is meaningful if the AUC is close to one. A map with an AUC of 0.5 is considered insignificant, since such a result could be obtained by chance (Baird, 2013).
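The construction of the ROC curve and its AUC can be sketched in plain Python. The example assumes binary labels (1 for landslide, 0 for non-landslide) and a susceptibility score per sample; the function name `roc_auc` and the sample data are hypothetical, and the study's own AUC values were not necessarily computed this way.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule.

    The decision threshold is swept from the highest score to the lowest;
    at each step the true positive rate and false positive rate are updated.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both landslide and non-landslide samples are required")

    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        # Samples sharing one score are processed together so ties are fair.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / positives, fp / negatives
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return auc


# Hypothetical test samples: three landslides and three non-landslides.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = roc_auc(example_labels, example_scores)
print(round(example_auc, 3))
```

In this example eight of the nine landslide/non-landslide pairs are ranked correctly, so the AUC equals 8/9. A model that assigns the same score to every sample yields 0.5, the chance level mentioned above.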

Figure 11 shows the accuracy values obtained using the ROC technique for the statistical FR-EBF approach and the machine learning approaches of RF and XG-Boost. XG-Boost shows the most accurate results with an AUC value of 0.91, followed by RF with 0.906 and FR-EBF with 0.836 (refer to Table 3). These results are good, as the values are close to one. The FR-EBF ensemble shows lower AUC values than the machine learning-based XG-Boost and Random Forest. Machine learning results may differ because the models used landslide and non-landslide features as training data, whereas FR-EBF results are derived solely from landslide data. The results may also differ depending on the geographical location and the selection of landslide conditioning factors.

Figure 11: ROC curves for testing the performance of the statistical and machine learning models for LSM in Belluno province, Italy.

5.2 Relative Landslide Density (R-Index)

The relative landslide density index (R-index) was also used to assess the accuracy of the resulting LSMs. Equation (4) is used to compute the R-index:

$$R = \frac{N_i / n_i}{\sum_i (N_i / n_i)} \times 100 \qquad \text{(Eq. 4)}$$

where Ni is the percentage of landslides in each susceptibility class and ni is the percentage of land area in each susceptibility class. Table 4 shows the six landslide susceptibility maps classified into five susceptibility classes using the quantile classification approach. In comparison to the RF and FR-EBF models, the XG-Boost models with 14 and 10 features have a higher R-index for the very high susceptibility class. FR-EBF has a higher R-index for the high susceptibility class than the other two approaches, and XG-Boost has the lowest. In addition, the R-index of FR-EBF is higher for the very low susceptibility class. Table 4 lists the R-index values for each susceptibility class in FR-EBF, RF, and XG-Boost, and figure 12 plots the same values.
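As a worked check, the R-index can be computed as the share of landslides in a class divided by the share of area in that class, rescaled so that the five classes sum to 100. The sketch below is plain Python with a hypothetical function name `r_index`; the input percentages are the FR-EBF 14-feature values listed in Table 4, and the rounded output matches the R-index column reported there.

```python
def r_index(landslide_pct, area_pct):
    """Relative landslide density per class, scaled so all classes sum to 100."""
    ratios = [ls / area for ls, area in zip(landslide_pct, area_pct)]
    total = sum(ratios)
    return [100.0 * r / total for r in ratios]


# FR-EBF with 14 features, classes from very low to very high (Table 4).
area_pct = [9.28, 15.85, 24.90, 28.50, 21.48]
landslide_pct = [2.71, 9.66, 17.40, 25.99, 44.24]

r_values = r_index(landslide_pct, area_pct)
rounded = [round(r) for r in r_values]
print(rounded)  # [6, 13, 15, 20, 45]
```

A class with a large share of landslides on a small share of the area receives a high R-index, which is why the very high class dominates in all six maps.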

Table 4: R-indices for the FR-EBF, RF, and XG-Boost models' landslide susceptibility mappings (LSMs).

| Validation methods | Susceptibility class | Number of pixels | Area (km²) | Area (%) (ni) | Number of landslides | Landslide (%) (Ni) | R-index |
|---|---|---|---|---|---|---|---|
| FR-EBF-14 Features | Very Low | 21875 | 334248750 | 9.28 | 48 | 2.71 | 6 |
| | Low | 90000 | 570760000 | 15.85 | 171 | 9.66 | 13 |
| | Moderate | 165000 | 896709375 | 24.90 | 308 | 17.40 | 15 |
| | High | 263750 | 1026578125 | 28.50 | 460 | 25.99 | 20 |
| | Very High | 444375 | 773585000 | 21.48 | 783 | 44.24 | 45 |
| FR-EBF-9 Features | Very Low | 19375 | 323332500 | 8.98 | 38 | 2.15 | 5 |
| | Low | 91875 | 541371875 | 15.03 | 179 | 10.11 | 15 |
| | Moderate | 153125 | 894758125 | 24.84 | 289 | 16.33 | 15 |
| | High | 276875 | 1041846875 | 28.93 | 480 | 27.12 | 21 |
| | Very High | 443750 | 800571875 | 22.23 | 784 | 44.29 | 44 |
| RF-14 Features | Very Low | 6875 | 682346250 | 18.94 | 11 | 0.62 | 1 |
| | Low | 34375 | 658375000 | 18.28 | 55 | 3.11 | 4 |
| | Moderate | 75625 | 619031875 | 17.19 | 122 | 6.89 | 9 |
| | High | 159375 | 749470625 | 20.81 | 264 | 14.92 | 17 |
| | Very High | 712500 | 892657500 | 24.78 | 1318 | 74.46 | 69 |
| RF-9 Features | Very Low | 7500 | 735246875 | 20.41 | 12 | 0.68 | 1 |
| | Low | 30000 | 632679375 | 17.57 | 48 | 2.71 | 4 |
| | Moderate | 75000 | 581844375 | 16.15 | 120 | 6.78 | 10 |
| | High | 147500 | 692276250 | 19.22 | 245 | 13.84 | 17 |
| | Very High | 729375 | 959834375 | 26.65 | 1345 | 75.99 | 68 |
| XG-Boost-14 Features | Very Low | 11250 | 1076978750 | 29.90 | 18 | 1.02 | 1 |
| | Low | 6875 | 330045625 | 9.16 | 11 | 0.62 | 3 |
| | Moderate | 11875 | 278243750 | 7.72 | 19 | 1.07 | 5 |
| | High | 11250 | 352568125 | 9.79 | 18 | 1.02 | 4 |
| | Very High | 947500 | 1564045000 | 43.42 | 1704 | 96.27 | 87 |
| XG-Boost-10 Features | Very Low | 12500 | 1094226250 | 30.38 | 20 | 1.13 | 1 |
| | Low | 7500 | 297782500 | 8.27 | 12 | 0.68 | 3 |
| | Moderate | 8125 | 242914375 | 6.74 | 13 | 0.73 | 4 |
| | High | 15625 | 314181875 | 8.72 | 25 | 1.41 | 7 |
| | Very High | 945000 | 1652776250 | 45.89 | 1700 | 96.05 | 84 |

6. Discussion

Landslides are very dynamic in nature, meaning that their behaviour, movement, and spatial distribution change over space and time. Therefore, it is important to analyse the significance of the conditioning factors that lead to landslide occurrences. Assessing the relevance of the conditioning features for LSM is essential to establish which features had the biggest impact on the prediction of landslide occurrences. Not all conditioning factor maps are available globally, or sometimes even locally, for reasons such as data-sharing restrictions, data unavailability, and erroneous data structure; it is therefore worthwhile to understand which of the available conditioning factors play an important role in LSM and could be available for most use cases. For example, topographical features derived from digital elevation models, such as Elevation, Slope, Aspect, Plan curvature, Profile curvature, TWI, TPI, and TRI, are available almost globally because of missions such as the Shuttle Radar Topography Mission (SRTM). Other features, such as distance to roads and drainage networks, that might have a direct or indirect influence on the occurrence of landslides can also be easily accessed through numerous open-source platforms. However, conditioning factor maps of rainfall data derived from rain gauge stations are not easily accessible. In this study, we used fourteen features for landslide susceptibility assessment and carried out the feature importance test of the conditioning factors for the traditional statistical ensemble model of FR-EBF and the machine learning models of RF and XG-Boost. The feature selection approach of the statistical model depends on the landslide data and its relation to each feature and its classes. On the other hand, feature selection and importance for the machine learning models depend on the landslide and non-landslide samples that are used to train the models. We used the in-built impurity feature importance algorithm to assess the importance of the features during the model training phases. Based on our literature review, there is no standard threshold value for discarding or selecting features for LSM. In this study, we used a trial-and-error approach to determine the thresholds for selecting the conditioning factors: 0.30 for the statistical model and 0.03 for the ML models.

The feature importance algorithms used in this study are different; however, the importance of the features is similar in both the statistical and machine learning algorithms (see figures 6 and 10). Looking at the enlarged regions in figures 7, 8, and 9, we observe few differences despite removing the least important features. This observation can be linked to the low impact of the least important factors on the overall LSM results. Furthermore, several factors determine the importance of features for carrying out LSM, such as (1) the completeness and quality of the landslide inventory dataset used for analysis, and (2) the mapping scale of feature maps like landcover, lithology, or other geological features. If the spatial locations of landslides in an inventory do not represent the ground truth, the landslide input data can have a negative impact on feature selection. Most importantly, the type of landslide inventory data, such as landslides mapped as points or as polygons, also affects the feature selection algorithms. The sampling methodology is also important: there are various ways to use landslides in carrying out susceptibility assessment; many studies have used a 70-30 ratio and others have used random sampling or K-fold sampling methods (Merghadi et al., 2018; Chen et al., 2018). One of the most important observations from this study concerns the exclusion of the "least important factors" in the context of LSM. Despite the removal of certain conditioning factors, we still obtain comparable results after removing them. This observation suggests that employing only the important conditioning factors, which can be obtained for most use cases, is enough for LSM.

The use of landslide samples along with non-landslide samples can affect the landslide feature importance, as can be seen in the results of this study. In the case of the statistical model, one of the reasons for the lower AUC performance can be attributed to the absence of non-landslide samples. As the model was trained only with landslide samples, its ability to discriminate between non-landslide and landslide pixels is affected, and it therefore predicts landslide occurrences over potentially non-landslide locations. For this reason, the statistical model exhibits a homogeneous distribution of predicted landslide pixels (see figure 7). We used landslide and non-landslide samples for training the ML models, which show results that differ from those of the statistical ensemble model (see figures 8 and 9). The statistical model results show a more homogeneous distribution of landslide susceptibility classes, whereas the machine learning results make evident that the non-landslide samples have a greater impact on the final landslide susceptibility results.

7. Conclusions

In the current state-of-the-art approaches for LSM, the contemporary literature lays emphasis on the advent of different models for improving the accuracy of landslide susceptibility against the test data. This study instead investigated how the conditioning factors affect the overall prediction of landslides in Belluno province, northeast Italy. An important aim was to identify whether removing the "least important" conditioning factors in the modelling process affects the performance in predicting new, unknown landslides.

ML models require conditioning factors as input for LSM; investigating the importance of the features (conditioning factors) can provide a better understanding of landslide occurrences with respect to the available conditioning factor maps for LSM. This study indicates that various models behave differently with different features: the same features that are important in one model can be the least important (even null) in other models. Therefore, this study gives new insights into the use of already available conditioning factor maps, without spending resources on generating other conditioning factor maps that might otherwise not be available, thus suggesting a streamlined acquisition of data and modelling of landslide occurrences for future events.

In this study we also concluded that the landslide and non-landslide samples affect the feature importance, especially in the ML models, as these models use inputs in the form of landslide and non-landslide samples. In contrast, the statistical model used only landslide samples. It was therefore found to be crucial to maintain a balance between the two data samples to avoid overfitting or underfitting. This study illustrates that feature selection is a very important step in carrying out LSMs. We found differences in the final LSMs derived from the statistical and ML models, which are attributed to the above-mentioned sample selection techniques.

This research highlights the importance of post-training feature importance algorithms for LSM. This approach can also be used to assess the susceptibility to other natural hazards. The results can indicate whether certain conditioning factors can be discarded while modelling landslide occurrences. In many parts of the globe, data are scarce; with the ability to model landslides without relying on all the conventional factors, we can still predict landslides spatially over a given region. There are certain drawbacks: (1) the same factor maps will not be available everywhere, (2) factors that are least important in one region might not behave the same way in other regions of the world, and (3) model capability changes with respect to different regions. Even so, the resulting susceptibility maps can still give quality information for local emergency relief measures, planning of disaster risk reduction, mitigation, and evaluation of potentially affected areas.

Funding: This research was funded by the Veneto Region, VAIA-LANDslides project, Research Unit UNIPD-GEO, Principal Investigator Mario Floris.

References:

Baglioni, A., Tosoni, D., De Marco, P., and Arziliero, L.: Analisi del dissesto da frana in Veneto, 2006.
Baird, C.: Comparison of Risk Assessment Instruments in Juvenile Justice, 2013.
Boretto, G., Crema, S., Marchi, L., Monegato, G., Arziliero, L., and Cavalli, M.: Assessing the effect of the Vaia storm on sediment source areas and connectivity storm in the Liera catchment (Dolomites), Copernicus Meetings, 2021.
Brabb, E. E., Pampeyan, E. H., and Bonilla, M. G.: Landslide susceptibility in San Mateo County, California, Reston, VA, Report 360, 1972.
Breiman, L.: Random Forests, Machine Learning, 45, 5-32, 10.1023/A:1010933404324, 2001.
Can, R., Kocaman, S., and Gokceoglu, C.: A Comprehensive Assessment of XGBoost Algorithm for Landslide Susceptibility Mapping in the Upper Basin of Ataturk Dam, Turkey, Applied Sciences, 11, 4993, 2021.
Castellanos Abella, E. A., and Van Westen, C. J.: Qualitative landslide susceptibility assessment by multicriteria analysis: A case study from San Antonio del Sur, Guantánamo, Cuba, Geomorphology, 94, 453-466, 10.1016/j.geomorph.2006.10.038, 2008.
Catani, F., Lagomarsino, D., Segoni, S., and Tofani, V.: Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues, Natural Hazards and Earth System Sciences, 13, 2815-2831, 10.5194/nhess-13-2815-2013, 2013.
Chacón, J., Irigaray, C., Fernández, T., and El Hamdouni, R.: Engineering geology maps: landslides and geographical information systems, Bulletin of Engineering Geology and the Environment, 65, 341-411, 10.1007/s10064-006-0064-z, 2006.
Chen, T., Trinder, J. C., and Niu, R.: Object-oriented landslide mapping using ZY-3 satellite imagery, random forest and mathematical morphology, for the Three-Gorges Reservoir, China, Remote sensing, 9, 333, 2017.
Chen, W., Peng, J. B., Hong, H. Y., Shahabi, H., Pradhan, B., Liu, J. Z., Zhu, A. X., Pei, X. J., and Duan, Z.: Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China, Science of the Total Environment, 626, 1121-1135, 10.1016/j.scitotenv.2018.01.124, 2018.
Chung, C.-J. F., and Fabbri, A. G.: Validation of Spatial Prediction Models for Landslide Hazard Mapping, Natural Hazards, 30, 451-472, 10.1023/B:NHAZ.0000007172.62651.2b, 2003.
Compagnoni, B., Galluzzo, F., Bonomo, R., and Tacchia, D.: Carta geologica d'Italia, Dipartimento difesa del suolo, 2005.
Corò, D., Galgaro, A., Fontana, A., and Carton, A.: A regional rockfall database: the Eastern Alps test site, Environmental Earth Sciences, 74, 1731-1742, 10.1007/s12665-015-4181-5, 2015.
Dahal, R. K., Hasegawa, S., Nonomura, A., Yamanaka, M., Masuda, T., and Nishino, K.: GIS-based weights-of-evidence modelling of rainfall-induced landslides in small catchments for landslide susceptibility mapping, Environmental Geology, 54, 311-324, 10.1007/s00254-007-0818-3, 2008.

Dai, F. C., Lee, C. F., and Ngai, Y. Y.: Landslide risk assessment and management: an overview, Engineering Geology, 64, 65-87, https://doi.org/10.1016/S0013-7952(01)00093-X, 2002.
Desiato, F., Lena, F., Baffo, F., Suatoni, B., Toreti, A., di Ecologia Agraria, U. C., and Romagna, A. E.: Indicatori del clima in Italia, APAT, Roma, 2005.
Doglioni, C.: Thrust tectonics examples from the Venetian Alps, 1990.
Dunning, S., Massey, C., and Rosser, N.: Structural and geomorphological features of landslides in the Bhutan Himalaya derived from terrestrial laser scanning, Geomorphology, 103, 17-29, 2009.
Dury, G.: Hillslope form and Process. M.A. Carson and M.J. Kirkby, 1972. Cambridge University Press, London, vii + 475 pp., £ 6.60, 1972.
Ercanoglu, M., and Gokceoglu, C.: Assessment of landslide susceptibility for a landslide-prone area (north of Yenice, NW Turkey) by fuzzy approach, Environmental Geology, 41, 720-730, 10.1007/s00254-001-0454-2, 2002.
Floris, M., Iafelice, M., Squarzoni, C., Zorzi, L., De Agostini, A., and Genevois, R.: Using online databases for landslide susceptibility assessment: an example from the Veneto Region (northeastern Italy), Nat. Hazards Earth Syst. Sci., 11, 1915-1925, 10.5194/nhess-11-1915-2011, 2011.
Forzieri, G., Pecchi, M., Girardello, M., Mauri, A., Klaus, M., Nikolov, C., Rüetschi, M., Gardiner, B., Tomaštík, J., Small, D., Nistor, C., Jonikavicius, D., Spinoni, J., Feyen, L., Giannetti, F., Comino, R., Wolynski, A., Pirotti, F., Maistrelli, F., Savulescu, I., Wurpillot-Lucas, S., Karlsson, S., Zieba-Kulawik, K., Strejczek-Jazwinska, P., Mokroš, M., Franz, S., Krejci, L., Haidu, I., Nilsson, M., Wezyk, P., Catani, F., Chen, Y. Y., Luyssaert, S., Chirici, G., Cescatti, A., and Beck, P. S. A.: A spatially explicit database of wind disturbances in European forests over the period 2000–2018, Earth Syst. Sci. Data, 12, 257-276, 10.5194/essd-12-257-2020, 2020.
Gariano, S. L., Verini Supplizi, G., Ardizzone, F., Salvati, P., Bianchi, C., Morbidelli, R., and Saltalippi, C.: Long-term analysis of rainfall-induced landslides in Umbria, central Italy, Natural Hazards, 106, 2207-2225, 10.1007/s11069-021-04539-6, 2021.
Ghorbanzadeh, O., Rostamzadeh, H., Blaschke, T., Gholaminia, K., and Aryal, J.: A new GIS-based data mining technique using an adaptive neuro-fuzzy inference system (ANFIS) and k-fold cross-validation approach for land subsidence susceptibility mapping, Natural Hazards, 94, 497-517, 10.1007/s11069-018-3449-y, 2018.
Glade, T., Anderson, M. G., and Crozier, M. J.: Landslide hazard and risk, John Wiley & Sons, 2006.
Goetz, J. N., Brenning, A., Petschko, H., and Leopold, P.: Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling, Computers & Geosciences, 81, 1-11, 10.1016/j.cageo.2015.04.007, 2015.
Guzzetti, F., Reichenbach, P., Ardizzone, F., Cardinali, M., and Galli, M.: Estimating the quality of landslide susceptibility models, Geomorphology, 81, 166-184, 10.1016/j.geomorph.2006.04.007, 2006.
Huang, F., Chen, J., Du, Z., Yao, C., Huang, J., Jiang, Q., Chang, Z., and Li, S.: Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models, ISPRS International Journal of Geo-Information, 9, 377, 2020.
Iadanza, C., Trigila, A., Starace, P., Dragoni, A., Biondo, T., and Roccisano, M.: IdroGEO: A Collaborative Web Mapping Application Based on REST API Services and Open Data on Landslides and Floods in Italy, ISPRS International Journal of Geo-Information, 10, 89, 2021.
Komac, M.: A landslide susceptibility model using the Analytical Hierarchy Process method and multivariate statistics in perialpine Slovenia, Geomorphology, 74, 17-28, 10.1016/j.geomorph.2005.07.005, 2006.

49
679 Lee, S.: Landslide detection and susceptibility mapping in the Sagimakri area, Korea using
680 KOMPSAT-1 and weight of evidence technique, Environmental Earth Sciences, 70, 3197-
681 3215, 10.1007/s12665-013-2385-0, 2013.
682 Linden, A.: Measuring diagnostic and predictive accuracy in disease management: an
683 introduction to receiver operating characteristic (ROC) analysis, Journal of evaluation in
684 clinical practice, 12, 132-139, 2006.
685 Liu, L.-L., Yang, C., and Wang, X.-M.: Landslide susceptibility assessment using feature
686 selection-based machine learning models, Geomechanics and Engineering, 25, 1-16, 2021.
687 Melville, B., Lucieer, A., and Aryal, J.: Object-based random forest classification of Landsat
688 ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities
689 in Tasmania, Australia, International journal of applied earth observation and geoinformation,
690 66, 46-55, 2018.
691 Merghadi, A., Abderrahmane, B., and Tien Bui, D.: Landslide Susceptibility Assessment at
692 Mila Basin (Algeria): A Comparative Assessment of Prediction Capability of Advanced
693 Machine Learning Methods, ISPRS International Journal of Geo-Information, 7,
694 10.3390/ijgi7070268, 2018.
695 Meten, M., Prakash Bhandary, N., and Yatabe, R.: Effect of Landslide Factor Combinations on
696 the Prediction Accuracy of Landslide Susceptibility Maps in the Blue Nile Gorge of Central
697 Ethiopia, Geoenvironmental Disasters, 2, 9, 10.1186/s40677-015-0016-7, 2015.
698 Micheletti, N., Foresti, L., Robert, S., Leuenberger, M., Pedrazzini, A., Jaboyedoff, M., and
699 Kanevski, M.: Machine Learning Feature Selection Methods for Landslide Susceptibility
700 Mapping, Mathematical Geosciences, 46, 33-57, 10.1007/s11004-013-9511-0, 2014.
701 Mondal, S., and Maiti, R.: Integrating the analytical hierarchy process (AHP) and the frequency
702 ratio (FR) model in landslide susceptibility mapping of Shiv-khola watershed, Darjeeling
703 Himalaya, International Journal of Disaster Risk Science, 4, 200-212, 2013.
704 Pham, B. T., Tien Bui, D., Pourghasemi, H. R., Indra, P., and Dholakia, M. B.: Landslide
705 susceptibility assesssment in the Uttarakhand area (India) using GIS: a comparison study of
706 prediction capability of naïve bayes, multilayer perceptron neural networks, and functional
707 trees methods, Theoretical and Applied Climatology, 128, 255-273, 10.1007/s00704-015-
708 1702-9, 2015.
709 Pham, B. T., Tien Bui, D., and Prakash, I.: Bagging based Support Vector Machines for spatial
710 prediction of landslides, Environmental Earth Sciences, 77, 146, 10.1007/s12665-018-7268-y,
711 2018.
712 Pourghasemi, H. R., Pradhan, B., and Gokceoglu, C.: Application of fuzzy logic and analytical
713 hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran, Natural
714 Hazards, 63, 965-996, 2012.
715 Pradhan, B.: Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy
716 logic and multivariate logistic regression approaches, Journal of the Indian Society of Remote
717 Sensing, 38, 301-320, 10.1007/s12524-010-0020-z, 2010.
718 Raja, N. B., Çiçek, I., Türkoğlu, N., Aydin, O., and Kawasaki, A.: Landslide susceptibility
719 mapping of the Sera River Basin using logistic regression model, Natural Hazards, 85, 1323-
720 1346, 10.1007/s11069-016-2591-7, 2017.
721 Reichenbach, P., Rossi, M., Malamud, B. D., Mihir, M., and Guzzetti, F.: A review of
722 statistically-based landslide susceptibility models, Earth-Science Reviews, 180, 60-91,
723 10.1016/j.earscirev.2018.03.001, 2018.
724 Riley, S. J., DeGloria, S. D., and Elliot, R.: Index that quantifies topographic heterogeneity,
725 Intermountain Journal of Sciences, 5, 23-27, 1999.
726 Rossi, M., Guzzetti, F., Salvati, P., Donnini, M., Napolitano, E., and Bianchi, C.: A predictive
727 model of societal landslide risk in Italy, Earth-Science Reviews, 196, 102849,
728 https://doi.org/10.1016/j.earscirev.2019.04.021, 2019.

729 Sahin, E. K.: Assessing the predictive capability of ensemble tree methods for landslide
730 susceptibility mapping using XGBoost, gradient boosting machine, and random forest, SN
731 Applied Sciences, 2, 1308, 10.1007/s42452-020-3060-1, 2020.
732 Sauro, F., Zampieri, D., and Filipponi, M.: Development of a deep karst system within a
733 transpressional structure of the Dolomites in north-east Italy, Geomorphology, 184, 51-63,
734 https://doi.org/10.1016/j.geomorph.2012.11.014, 2013.
735 Schönborn, G.: Balancing cross sections with kinematic constraints: The Dolomites (northern
736 Italy), Tectonics, 18, 527-545, 1999.
737 Senouci, R., Taibi, N.-E., Teodoro, A. C., Duarte, L., Mansour, H., and Yahia Meddah, R.:
738 GIS-Based Expert Knowledge for Landslide Susceptibility Mapping (LSM): Case of
739 Mostaganem Coast District, West of Algeria, Sustainability, 13, 630, 2021.
740 Shahabi, H., Khezri, S., Ahmad, B. B., and Hashim, M.: Landslide susceptibility mapping at
741 central Zab basin, Iran: a comparison between analytical hierarchy process, frequency ratio and
742 logistic regression models, Catena, 115, 55-70, 2014.
743 Shahabi, H., and Hashim, M.: Landslide susceptibility mapping using GIS-based statistical
744 models and Remote sensing data in tropical environment, Scientific Reports, 5, 9899, 2015.
745 Stanley, T. A., Kirschbaum, D. B., Benz, G., Emberson, R. A., Amatya, P. M., Medwedeff,
746 W., and Clark, M. K.: Data-Driven Landslide Nowcasting at the Global Scale, Frontiers in
747 Earth Science, 9, 10.3389/feart.2021.640043, 2021.
748 Trigila, A., Iadanza, C., and Spizzichino, D.: Quality assessment of the Italian Landslide
749 Inventory using GIS processing, Landslides, 7, 455-470, 10.1007/s10346-010-0213-0, 2010.
750 Trigila, A., and Iadanza, C.: Landslides and floods in Italy: hazard and risk indicators -
751 Summary Report 2018, 2018.
752 van Westen, C. J., Castellanos, E., and Kuriakose, S. L.: Spatial data for landslide
753 susceptibility, hazard, and vulnerability assessment: An overview, Engineering Geology, 102,
754 112-131, 10.1016/j.enggeo.2008.03.010, 2008.
755 Youssef, A. M., and Pourghasemi, H. R.: Landslide susceptibility mapping using machine
756 learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi
757 Arabia, Geoscience Frontiers, 12, 639-655, 10.1016/j.gsf.2020.05.010, 2021.