Environmental Modelling and Software 119 (2019) 285-304

A review of artificial neural network models for ambient air pollution prediction
Sheen Mclean Cabaneros a,∗, John Kaiser Calautit b, Ben Richard Hughes a
a Department of Mechanical and Aerospace Engineering, University of Strathclyde, Glasgow, United Kingdom
b Department of Architecture and Built Environment, The University of Nottingham, United Kingdom
ARTICLE INFO
Keywords: Air pollution; Artificial neural networks; Multilayer perceptron; Forecasting; Backpropagation algorithm
ABSTRACT
Research activity in the field of air pollution forecasting using artificial neural networks (ANNs) has increased dramatically in recent years. However, the development of ANN models entails a degree of uncertainty given the black-box nature of ANNs. In this paper, a protocol by Maier et al. (2010) for ANN model development is presented and applied to assess journal papers dealing with air pollution forecasting using ANN models. The majority of the reviewed works are aimed at the long-term forecasting of outdoor PM10, PM2.5, oxides of nitrogen and ozone. The vast majority of the identified works relied almost exclusively on meteorological and source emissions predictors. Furthermore, ad-hoc approaches are found to be predominantly used for determining optimal model predictors, appropriate data subsets and the optimal model structure. Multilayer perceptron and ensemble-type models are predominantly implemented. Overall, the findings highlight the need for systematic protocols for developing powerful ANN models.
∗ Corresponding author. Level 8, James Weir Building, 75 Montrose St, Glasgow, G1 1XJ, United Kingdom.
E-mail address: [email protected] (S.M. Cabaneros).
https://doi.org/10.1016/j.envsoft.2019.06.014
Received 25 June 2018; Received in revised form 25 June 2019; Accepted 26 June 2019
Available online 30 June 2019
1364-8152/ © 2019 Published by Elsevier Ltd.
1. Introduction
There has been growing interest in recent years in the use of artificial neural networks (ANNs) for predicting and forecasting ambient air pollution. As poor air quality in urban areas has been linked to chronic diseases and premature deaths among vulnerable members of the public (Organisation for Economic Co-operation and Development, 2016; World Health Organization, 2016), policy-makers and urban planners face a growing demand to provide rapid and parsimonious solutions that mitigate the effects of air pollution (Baklanov et al., 2007; Moustris et al., 2010). In recent years, ANNs have been successfully implemented in many short- and long-term forecasting applications (Biancofiore et al., 2017a; Cabaneros et al., 2017; Coman et al., 2008; Ibarra-Berastegi et al., 2008; Lightstone et al., 2017; Rahimi, 2017). Furthermore, more practitioners are turning to data-driven approaches such as ANNs as alternatives to traditional deterministic or physics-based approaches, e.g. the Urban Airshed Model (UAM) (Chang and Cardelino, 2000), the Weather Research and Forecasting Model with Chemistry (WRF/Chem) (Chuang et al., 2011) and the Community Multiscale Air Quality model (CMAQ) (Mueller and Mallard, 2011). This is because deterministic approaches are sensitive to several factors, including the scale and quality of the parameters involved; they are also computationally expensive and depend on large databases of input parameters, some of which may not be available (Dutot et al., 2007a,b; Elangasinghe et al., 2014; Hrust et al., 2009; P. Jiang et al., 2017; Sun et al., 2013). In contrast, the use of ANNs does not require an in-depth understanding of the dynamics between air pollution concentration levels and other explanatory variables. Lastly, powerful and less complicated computing tools capable of developing and implementing ANNs and their training algorithms have become more widely available to the public in recent years (Gardner and Dorling, 1998; Sharma et al., 2005).
However, despite their success in many applications, the development of ANN models and the interpretation of their results raise certain issues. Data-driven models are generally problem-specific, so a one-size-fits-all approach to building them is not available (Gardner and Dorling, 1998; Guoqiang Zhang and Patuwo, 1998; Hagan et al., 1995; Maier and Dandy, 2000). Nonetheless, several authors have argued that more general and consistent protocols outlining the complete building process of ANN models should still be established (Galelli et al., 2014; Jakeman et al., 2006; Maier and Dandy, 2000; Maier et al., 2010; Wu et al., 2014).
Jakeman et al. (2006) outlined ten basic steps for developing and evaluating environmental models, and argued that modellers should provide enough information to describe and justify the choice of model parameters and the model's development and evaluation. Maier and Dandy (2000) emphasised that the lack of a comprehensive guide to ANN model building makes it difficult for future modellers to draw meaningful comparisons between existing models. After reviewing and analysing 43 papers dealing with the use of ANNs in predicting and forecasting water resources variables against the modelling methodology they suggested, the authors reported that many of the papers carried out the modelling process incorrectly. Maier et al. (2010) encountered the same methodological concerns after reviewing 210 papers, published from 1999 to 2007, dealing with the prediction of water resource variables. Furthermore, Wu et al. (2014) argued that a justification for the use of particular methods and parameter values during the ANN model building process should also be provided to increase confidence in the model results.
Consequently, papers dealing with the use of ANNs in air pollution prediction and forecasting are surveyed here and assessed against the modelling protocols suggested by Wu et al. (2014). The latest developments in modelling air pollutant concentration levels using ANNs are also examined. Only papers dealing with the modelling of outdoor air pollution, especially in urban and industrial areas, are considered, although the threats attributed to indoor air pollution cannot be neglected either (Kotzias et al., 2009; Symonds et al., 2016). Despite the popularity of feedforward ANNs, especially the Multilayer Perceptron (MLP) network, in many forecasting and prediction applications (Gardner and Dorling, 1998; Shahraiyni and Sodoudi, 2016; Sharma et al., 2005), other forms of ANN models are also examined. Readers are assumed throughout this work to be familiar with the concepts and terminology related to ANNs. Detailed discussions of the subject matter can be found in other papers and textbooks (see Bishop, 1995; Gardner and Dorling, 1998; Hagan et al., 1995; Hornik et al., 1989; Samarasinghe, 2006).
The remainder of this paper is organised as follows. Section 2 describes how the reviewed papers were selected and gives an overview of research activity in the use of ANNs for the prediction and forecasting of ambient air pollutant concentrations from 2001 to 2019. Section 3 briefly outlines the ANN model development protocols; a comprehensive discussion of each step can be found elsewhere (see Galelli et al., 2014; Humphrey et al., 2017; Jakeman et al., 2006; Maier and Dandy, 2000). The section also provides the taxonomies of available options at the various steps of this process suggested by Maier and Dandy (2000), against which a total of 139 papers are then examined. Conclusions are provided in Section 4. Finally, recommendations for future research are given in Section 5.
2. Overview
The papers selected in this work are taken from the following international peer-reviewed journals: Air Quality, Atmosphere & Health; Atmosphere; Atmospheric Environment; Atmospheric Pollution Research; Building and Environment; Chemosphere; Clean – Soil Air Water; Ecological Modelling; Ecological Processes; Engineering Applications of Artificial Intelligence; Engineering Computations; Environmental Forensics; Environmental Modelling & Assessment; Environmental Modelling and Software; Environmental Monitoring and Assessment; Environmental Pollution; Environmental Science & Policy; Environmental Science and Pollution Research; Environmental Technology & Innovation; Evolving Systems; Expert Systems with Applications; Frontiers of Earth Science; Geophysical Research Letters; IEEE Access; IEEE Transactions on Intelligent Transportation Systems; International Journal of Environmental Research and Public Health; International Journal of Environmental Studies; Journal of Environmental Engineering and Science; Journal of Environmental Management; Journal of Environmental Protection and Ecology; Journal of the Air & Waste Management Association; Knowledge-Based Systems; Mathematical Geosciences; Neural Computing & Applications; Neurocomputing; Pure and Applied Geophysics; Science of the Total Environment; Sensors; Soft Computing; Sustainable Cities and Society; Urban Climate; and Water, Air, & Soil Pollution.
The papers were identified through the StarPlus Library catalogue of the University of Sheffield, England (The University of Sheffield, 2017), ScienceDirect (2017), ProQuest (2017), and the IEEE Xplore® Digital Library (IEEE Xplore, 2017). Search terms included “air pollution forecasting”, “air pollution modelling”, “artificial neural networks”, “ANN”, “multilayer perceptron”, “RNN”, “LSTM”, “NARX” and “machine learning”, in different combinations drawn from previous review papers and standalone articles. The process was repeated until the citation trails stopped (see Fig. 1). Furthermore, the reference lists of the selected research articles were examined to identify further references.
Fig. 1. Graphical representation of the methodology used for selecting the papers reviewed.
The authors selected papers published from January 2001 to February 2019. Only papers dealing with the forecasting of outdoor air pollutants were considered, although indoor pollutants pose an equally serious threat, given that an average person spends more than 90% of their time indoors (Ashmore et al., 2001). Additionally, papers whose ANN models failed to outperform, or provide similar results to, alternative techniques were not selected. Finally, papers published as conference proceedings were manually removed from the initial list, resulting in 139 peer-reviewed articles. The key details of the selected articles, e.g. the authors, year of publication, study location, and air pollutant(s) examined, are presented in Table 1.
The distribution of articles by year of publication is given in Fig. 2. The number of published articles reporting the use of ANN-based air pollution forecasting tools has grown since 2001, with almost 50% of the identified papers published since 2015 alone. This can be explained by emerging computing technologies tailored to the development of ANN models that were not easily accessible in the past; that is, faster and more powerful computing tools capable of running ANN training algorithms and processing big data have only recently become available (IEEE Spectrum, 2018).
The number of papers in which each air pollutant variable was considered is shown in Fig. 3. The results reveal that airborne particulate matter with an aerodynamic diameter smaller than 10 μm (PM10) and 2.5 μm (PM2.5), oxides of nitrogen (e.g. NO2, NO and NOx), and ozone are the most examined variables among the papers identified. Particulate matter was studied in 87 of the 139 papers reviewed; almost 50% of these dealt with PM10 and almost 45% with PM2.5. These are followed by the oxides of nitrogen (51 papers) and ozone (44 papers). The modelling of CO and SO2 was also examined by a considerable number of papers, namely 18 and 23 papers, respectively. It is also worth noting that at least a third of the papers identified
Table 1
Details of papers reviewed.
Authors (year) | Location(s) | Air pollutants examined
Kolehmainen et al. (2001) | Stockholm, Sweden | NO2
Perez and Trier (2001) | Santiago, Chile | NO; NO2
Chelani et al. (2002) | Delhi, India | SO2
Abdul-Wahab and Al-Alawi (2002) | Khaldiya, Kuwait | O3
Kukkonen et al. (2003) | Helsinki, Finland | NO2
Lu et al. (2003) | Hong Kong | RSP
Wang et al. (2003) | Mong Kok, Hong Kong | RSP
Hasham et al. (2004) | Edmonton, Canada | NOx
Heo and Kim (2004) | Seoul, Korea | O3
Jiang et al. (2004) | Shanghai, China | TSP; SO2; NOx
Niska et al. (2004) | Helsinki, Finland | NO2
Nunnari (2004) | Syracuse, Italy | SO2
Olcese and Toselli (2004) | Cordoba, Argentina | ?
Chelani et al. (2005) | Kolkata, India | NO2
Hooyberghs et al. (2005) | Belgium | PM10
Niska et al. (2005) | Helsinki, Finland | NO2; PM2.5
Ordieres et al. (2005) | Ciudad Juarez and El Paso, Mexico | PM2.5
Agirre-Basurko et al. (2006) | Bilbao, Spain | O3; NO2
Grivas and Chaloulakou (2006) | Athens, Greece | PM10
Nagendra and Khare (2006) | New Delhi, India | NO2
Schlink et al. (2006) | Several EU countries | O3
Slini et al. (2006) | Thessaloniki, Greece | PM10
Brunelli et al. (2007) | Palermo, Italy | SO2; O3; PM10; NO2; CO
Dutot et al. (2007a,b) | Orleans, France | O3
Osowski and Garanty (2007) | Warsaw, Poland | CO; NO2; SO2; dust
Sousa et al. (2007) | Porto, Portugal | O3
Al-Alawi et al. (2008) | Kuwait | O3
Coman et al. (2008) | Paris, France | O3
Díaz-Robles et al. (2008) | Temuco, Chile | PM10
Ibarra-Berastegi et al. (2008) | Bilbao, Spain | SO2; CO; NO2; NO; O3
Martín et al. (2008) | Algeciras, Spain | CO
Perez and Salini (2008) | Santiago, Chile | PM2.5
Solaiman et al. (2008) | Ontario, Canada | O3
Zito et al. (2008) | Leicestershire, UK | CO; NO2
Ettouney et al. (2009) | Jahra, Kuwait | O3
Galatioto & Zito (2009) | Palermo, Italy | CO; C6H6
Hrust et al. (2009) | Zagreb, Croatia | NO2; O3; CO; PM10
Juhos et al. (2009) | Szeged, Hungary | NO; NO2
Pisoni et al. (2009) | Milan, Italy | O3
Tsai et al. (2009) | Taiwan | O3
Demir et al. (2010) | Istanbul, Turkey | PM10
Inal (2010) | Istanbul, Turkey | O3
Jain and Khare (2010) | Delhi City, India | CO
Kurt and Oktay (2010) | Istanbul, Turkey | SO2; CO; PM10
Mahapatra (2010) | New Delhi, India | O3
Moustris et al. (2010) | Athens, Greece | ERPI (NO2; CO; SO2; O3)
Pires et al. (2010) | Oporto, Portugal | O3
Feng et al. (2011) | Beijing, China | O3
Paschalidou et al. (2011) | 4 cities in Cyprus | PM10
Prakash et al. (2011) | New Delhi, India | CO; NO2; NO; O3; SO2; PM2.5
Vlachogianni et al. (2011) | Thessaloniki, Greece and Helsinki, Finland | PM10; NOx
Voukantsis et al. (2011) | Thessaloniki, Greece and Helsinki, Finland | PM10; PM2.5
Barrón-Adame et al. (2012) | Salamanca, Mexico | SO2
Chattopadhyay and Chattopadhyay (2012) | Kolkata, India | O3
Fernando et al. (2012) | Phoenix, Arizona | PM10
Perez (2012) | Santiago, Chile | PM10
Singh et al. (2012) | Lucknow, India | RSPM; SO2; NO2
Siwek and Osowski (2012) | Warsaw, Poland | PM10
Antanasijevic et al. (2013) | 26 EU countries | PM10
Arhami et al. (2013) | Tehran, Iran | CO; NOx; NO; NO2; O3
Gennaro et al. (2013) | Northeast Spain | PM10
Moustris et al. (2013) | Greater Athens Area, Greece | PM10
Papaleonidas and Iliadis (2013) | Athens, Greece | O3
Russo et al. (2013) | Lisbon, Portugal | NO2
Ul-Saufie et al. (2013) | Negeri Sembilan, Malaysia | PM10
Yan and Jian (2013) | Hangzhou, China | PM10; PM2.5
Zhang et al. (2013) | Taiyuan, China | PM10
Azid et al. (2014) | Malaysia | API
Elangasinghe et al. (2014) | Auckland, New Zealand | NO2
He et al. (2014) | Mong Kok, Hong Kong | PM10; PM1
Luna (2014) | Rio de Janeiro, Brazil | O3
Özdemir and Taner (2014) | Kocaeli, Turkey | PM10
Russo and Soares (2014) | Lisbon, Portugal | PM10
Zhou et al. (2014) | Xi'an Province, China | PM2.5
Alam and McNabola (2015) | Vienna, Austria | PM10
Biancofiore et al. (2015) | Pescara, Italy | O3
Cortina-Januchs et al. (2015) | Salamanca, Mexico | PM10
Dunea et al. (2015) | Oltenia, Romania | O3; PM10; PM2.5
Dursun and Taylan (2015) | Konya City, Turkey | SO2
Feng et al. (2015) | Jing-Jin-Ji area, China | PM2.5
Mishra and Goyal (2015) | Agra, India | NO2
Russo et al. (2015) | Lisbon, Portugal | PM10
Santos & Fernández-Olmo (2015) | Cantabria Region, Spain | As; Cd; Ni; Pb
Zhu et al. (2015) | Chongqing, China | NOx
Zou et al. (2015) | Texas, USA | PM2.5
Abderrahim et al. (2016) | Algiers, Algeria | PM10
Bai et al. (2016) | Chongqing, China | PM10; SO2; NO2
Catalano et al. (2016) | London, United Kingdom | NO2
Chelalli et al. (2016) | Algiers, Algeria | PM10
Ding et al. (2016) | Hong Kong | NO2; NOx; O3; SO2; PM2.5
Durao et al. (2016) | Sines, Portugal | O3
He et al. (2016) | Lanzhou, China | SO2; NO2; PM10
Hoshyaripour et al. (2016) | Sao Paulo, Brazil | O3
Li et al. (2016) | Beijing, China | PM2.5
Li et al. (2017) | China | PM2.5
Lightstone et al. (2017) | United States of America | PM2.5
Mao et al. (2017) | Eastern China | PM2.5
Peng et al. (2017) | Canada | O3; PM10; NO2
Rahimi (2017) | Tabriz, Iran | NOx; NO2
Stamenković et al. (2017) | 17 EU countries, USA, China, Japan, Russia and India | NOx
Taylan (2017) | Jeddah, Saudi Arabia | O3
Yeganeh et al. (2017) | Queensland, Australia | PM2.5
Zhang and Ding (2017) | Hong Kong | NO2; NOx; O3; PM2.5; SO2
Alimissis et al. (2018) | Athens, Greece | NO2; NO; O3; CO; SO2
Antanasijević et al. (2018) | 26 EU countries | SOx; NOx; NH3; NMVOC; PM10
Dotse et al. (2018) | Brunei Darussalam | PM10
Franceschi et al. (2018) | Bogota, Colombia | PM2.5; PM10
Freeman et al. (2018) | Kuwait | O3
Gao et al. (2018) | Jinan, China | O3
Huang and Kuo (2018) | Beijing, China | PM2.5
Jiang et al. (2018) | Beijing, China | PM2.5; SO2; NO2; CO; O3
Li and Zhu (2018) | China | PM2.5; PM10; CO
Nidzgorska-Lencewicz (2018) | Tricity Agglomeration, Poland | PM10
Pak et al. (2018) | Beijing, China | O3
Radojević et al. (2018) | Belgrade, Serbia | SO2; NOx
Tzanis et al. (2019) | Attica, Greece | PM2.5; PM10
Ventura et al. (2019) | Rio de Janeiro, Brazil | PM2.5
Wang and Song (2018) | Beijing, China | CO; NO2; SO2; O3; PM10; PM2.5
Yeganeh et al. (2018) | Queensland, Australia | NO2
Zhu et al. (2018) | China | PM2.5
Bai et al. (2019) | Beijing, China | PM2.5
Liu et al. (2019) | Beijing, China | PM2.5; SO2; NO2; CO
Qi et al. (2019) | Jing-Jin-Ji, China | PM2.5
Qin et al. (2019) | Shanghai, China | PM2.5
delved into more than one air pollutant variable.
Fig. 2. Distribution of papers by year of publication.
Fig. 3. Distribution of papers by air pollutant variables predicted.
The distribution of the time steps used for the modelling variables is shown in Fig. 4. The hourly time step was used in 69 of the 139 papers reviewed, followed by daily (50 papers), yearly (4 papers) and 5-min (4 papers) time steps. Several other time steps, including 2-hourly, 4-hourly, 1-min and 30-min, were utilised by other papers. However, the time steps of the modelling variables are commonly determined by the sampling periods of the instruments used to measure the pollutant species and meteorological data at a monitoring station. Consequently, some identified papers pre-processed their collected data via averaging and linear interpolation techniques to create model datasets with consistent time steps.
Fig. 4. Number of occurrences of the various time steps used.
Fig. 5 shows the number of instances in which various forecast lengths have been used by the identified papers. Unfortunately, 43 of the 139 papers did not explicitly describe the forecast length utilised in their model development process. This ambiguity poses a challenge, as implicitly described parameter settings can cast doubt on the results for readers and future modellers. Of the remaining papers, short-term forecasting (forecast length = 1) was carried out 68 times. Long-term forecasting (forecast length > 1) was done 171 times, of which +24, +48 and +72 ahead forecasts were done 25, 7 and 4 times, respectively. Prediction (forecast length = 0) was carried out by 48 papers. It is also worth emphasising that most of the papers identified considered multiple forecast lengths.
Fig. 5. Number of occurrences of the various forecast lengths used.
3. Methods used for ANN model development
The design of ANN models is often regarded as more of an art than a science due to the lack of a clear-cut method for implementing each model development step (Guoqiang Zhang and Patuwo, 1998). Regardless, several guidelines are available in the literature to provide future modellers with a systematic way of developing ANN models. The model development process is divided into eight main steps: (1) data collection, (2) data pre-processing, (3) selection of input variables (or predictors), (4) data splitting, (5) selection of model architecture, (6) determination of model structure, (7) model training, and (8) model validation. The ANN model-building protocol used in this review is based on those presented by Jakeman et al. (2006) and Maier et al. (2010).
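To make the forecast-length terminology of Section 2 concrete, the sketch below frames a series as a supervised learning problem for an ANN: the predictors from the last `n_lags` time steps up to time t form one input vector, and the target is the pollutant concentration at t + h. With hourly data, h = 0 corresponds to prediction, h = 1 to one-step-ahead forecasting and h = 24 to a day-ahead forecast. The function name `make_supervised`, the lag window and the toy values are illustrative assumptions, not the setup of any reviewed paper.

```python
import numpy as np

def make_supervised(X, y, n_lags, horizon):
    """Build (inputs, targets) pairs for a feedforward ANN.

    X: predictor values per time step, shape (T,) or (T, n_features)
    y: pollutant concentration per time step, shape (T,)
    n_lags: number of time steps (up to and including t) used as inputs
    horizon: forecast length h; 0 = prediction, 1 = one step ahead, ...
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D series as a single predictor
    inputs, targets = [], []
    # t is the last time step whose predictor values are known
    for t in range(n_lags - 1, len(y) - horizon):
        window = X[t - n_lags + 1 : t + 1]  # shape (n_lags, n_features)
        inputs.append(window.ravel())       # flatten into one input vector
        targets.append(y[t + horizon])      # concentration h steps later
    return np.array(inputs), np.array(targets)

# Toy hourly series (hypothetical values): a predictor and a pollutant.
temperature = np.arange(10)
no2 = np.arange(10) * 10.0
inp, tgt = make_supervised(temperature, no2, n_lags=3, horizon=1)
```

Here the first input vector holds the predictor values for hours 0 to 2 and its target is the concentration at hour 3; increasing `horizon` shortens the usable record by one sample per extra step, which is one practical cost of long-term forecasting.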
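Section 2 notes that some papers averaged and linearly interpolated their raw records to obtain consistent time steps. A minimal NumPy sketch of that idea is shown below, assuming readings are time-stamped in minutes from the start of the record and an hourly model time step is wanted; the function name `to_hourly` is hypothetical. Libraries such as pandas provide `resample` and `interpolate` methods for the same purpose.

```python
import numpy as np

def to_hourly(times_min, values, n_hours):
    """Average irregular readings into hourly bins, then fill empty bins.

    times_min: sample times in minutes from the start of the record
    values: measured concentrations at those times
    n_hours: number of hourly bins to produce
    """
    times_min = np.asarray(times_min, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = (times_min // 60).astype(int)  # hour index of each reading
    keep = (bins >= 0) & (bins < n_hours)
    sums = np.bincount(bins[keep], weights=values[keep], minlength=n_hours)
    counts = np.bincount(bins[keep], minlength=n_hours)
    has_data = counts > 0
    if not has_data.any():
        raise ValueError("no readings fall inside the requested hours")
    hourly = np.full(n_hours, np.nan)
    hourly[has_data] = sums[has_data] / counts[has_data]  # averaging step
    # Linear interpolation across hours without readings; np.interp holds
    # the first/last available value for hours outside the observed range.
    idx = np.arange(n_hours)
    hourly[~has_data] = np.interp(idx[~has_data], idx[has_data], hourly[has_data])
    return hourly

# Two readings in hour 0, none in hour 1, one in hour 2 (hypothetical data).
hourly_no2 = to_hourly([0, 30, 130], [10.0, 20.0, 35.0], n_hours=4)
```

Hour 0 becomes the mean of its two readings (15), hour 1 is interpolated between its neighbours (25), and hour 3 simply repeats the last available value, a simplification that should be reported if it is used.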
3.1. Collection of data
3.1.1. Introduction
Since black-box models such as ANN models are data-dependent, it is generally difficult to incorporate prior knowledge into them. Consequently, the performance of ANN models relies primarily on the type and form of the data used to train them.
One important requirement in data selection is that the data span the full range of the input space for which the network will be utilised, as black-box models do not extrapolate well (Hagan et al., 1995). In air pollution forecasting applications, the use of predictors covering a period of a year or more has been highly recommended. This ensures that seasonal factors, which have been identified as strongly influencing air pollution levels, are taken into account (Arhami et al., 2013; Colls, 2001; Kumar et al., 2017). For instance, many studies have utilised datasets covering a one-year period, as the data tend to be roughly periodic after a year (Bai et al., 2019; Catalano et al., 2016; Coman et al., 2008; Fernando et al., 2012; Kurt and Oktay, 2010; Mishra and Goyal, 2016).
Fig. 6. Number of occurrences of the various lengths of input data collected.
Furthermore, the selection of predictor types plays an important role in model performance, since air quality is a complex function of meteorology, emissions and other parameters (Colls, 2001). A plethora of predictor types have been considered in previous environmental modelling applications. In this paper, they are categorised as meteorological, emissions, traffic, and others. Meteorological parameters refer to the variables that characterise atmospheric conditions. Meteorological variables, especially wind speed, wind direction, relative humidity and atmospheric turbulence, have been found to strongly influence the dispersion and concentration of several air pollutants, including O3, NO2, PM10 and PM2.5 (Colls, 2001; Dominick et al., 2012; Kumar et al., 2017; Peng et al., 2017). Emissions data primarily refer to primary and secondary air pollutants in urban environments. They are also considered important predictors, as they are highly correlated with other air pollutants (Colls, 2001; World Health Organization, 2018). Finally, traffic data refer to information that characterises traffic behaviour. This includes traffic flow density, speed, occupancy degree, queue length and travel time, which are typically monitored on roads close to air quality stations. The use of traffic parameters has also been suggested, as traffic plays a significant role in the formation of several roadside air pollutants (Agirre-Basurko et al., 2006; Catalano et al., 2016; Colls, 2001; Galatioto and Zito, 2009; World Health Organization, 2018). Uncategorised parameters that were also considered by the identified papers include satellite, land-use and economic variables.
Fig. 7. Number of occurrences of the various sets of data collected, where MET and EMISSIONS denote the use of meteorological and air pollutant variables only, respectively; and W/SATELLITE and W/TRAFFIC denote the use of satellite and traffic data along with other variables, respectively.
3.1.2. Results
Fig. 6 reveals the distribution of input data lengths utilised by the identified papers. The majority of the papers utilised input data covering more than one year. In detail, training data with lengths of one to three years were used 60 times, while those longer than three years were used 43 times. Furthermore, data covering less than six months were used 22 times, while only 8 studies utilised data with lengths between six months and one year. However, 6 of the identified studies did not provide details regarding input data length.
Fig. 7 shows the number of instances in which a given set of predictors was utilised by the identified papers. Of the 139 papers reviewed, the use of both meteorological and pollutant emissions variables was observed on 90 occasions. The most common meteorological variables utilised include temperature, relative humidity and wind-related measurements. In contrast, meteorological data alone were used 9 times, while emissions data alone were used 13 times. Additionally, traffic data were used together with other variables by 17 papers. Data based on satellite-derived imagery also appeared in 4 papers; for instance, satellite-derived aerosol optical depth (AOD) variables were explored on several occasions for forecasting PM2.5 (Mao et al., 2017; Wen et al., 2019; Yeganeh et al., 2017). Finally, land-use, economic and stability predictors were utilised by only a few identified papers.
3.1.3. Conclusion
The findings indicate that the majority of the identified studies utilised data covering more than one year for ANN model development. This is considered good modelling practice, as ANN models developed on a sufficient length of data can possess greater generalisation ability. However, this practice does not fully address other prevailing modelling issues, including the imbalanced data problem, which occurs when peak or rare network target values are under-represented. The problem is typically encountered by ANN models dealing with the prediction of high or peak pollution episodes (Bai et al., 2019; Fernando et al., 2012; Gong and Ordieres-Meré, 2016). To address the imbalanced data problem, several techniques from the field of data mining can be explored: data re-sampling (Drummond and Holte, 2003; Zhao et al., 2016), cost-sensitive learning (Fontes et al., 2014; Tsai et al., 2009), algorithm modifications (López et al., 2012), and the Synthetic Minority Over-Sampling Technique (SMOTE) (Chawla et al., 2002). With regard to the predictor types utilised, the results reveal that meteorological and pollutant emissions data are the most commonly used predictors in the identified studies. However, training with many predictors should be handled carefully, as the dimension of the input data space greatly influences the network complexity of the resulting model (Hagan et al., 1995). Furthermore, less distinction can be observed between non-linear models such as ANN models and linear statistical models when a large number of predictors are used. This is because the combination of a large number of non-linear processes tends to linearise the overall mechanism of a developed non-linear model (Ibarra-Berastegi et al., 2008). This observation holds at least for the prediction of NO2, O3, SO2 and CO in urban environments (Chelani et al., 2002; Guardani et al., 1999; Kao and Huang, 2000; Kolehmainen et al., 2001; Schlink et al., 2006; Slini et al., 2006). Consequently, this highlights the need for careful implementation of predictor selection techniques (see Section 3.3).
3.2. Data pre-processing
3.2.1. Introduction
Another important step in the development of ANN models is data pre-processing. This step refers to preliminary techniques that aim to improve the representation of the collected data. Two popular data pre-processing techniques in the field of air pollution modelling are normalisation and missing data imputation (see Fig. 8).
Normalisation is used to ensure that all predictors fall in a similar range. This is an important step in model development, as inputs with large values disproportionately mask the impact of inputs with smaller ones. The step should also be taken to match the range of the predictors to that of the transfer function of the hidden layer (see Section 3.5). There are two popular categories of normalisation, namely range scaling and standardisation. Under range scaling, the predictors are transformed such that their minimum and maximum values are mapped to 0 and 1, or to -1 and 1, respectively. In contrast, standardisation converts a variable into a new variable with zero mean and unit standard deviation. Finally, it is worth noting that another common data pre-processing technique is feature extraction, where the dimension of the original input data space is reduced to avoid redundancy. One popular feature extraction method is principal component analysis (PCA). In this paper, the combined implementation of PCA and ANN models is considered a data-intensive hybrid modelling approach; hence, PCA is not discussed here and is instead described in more detail in Section 3.3.
Fig. 8. Taxonomy of data normalisation procedures.
Data imputation, on the other hand, is the process that addresses the issue of missing data, a problem repeatedly encountered in air quality modelling applications (Junninen et al., 2004). Missing data can result from many factors, such as insufficient sampling, measurement errors or faults in data acquisition. One of the simplest ways to address this issue is to substitute missing values with the mean of the entire dataset. However, this practice is highly discouraged, as it can disrupt the inherent structure of the original dataset, potentially degrading the performance of a model. Another popular approach to dealing with missing data is the list-wise or pair-wise deletion of predictors with missing data. Missing data imputation techniques are categorised as univariate, multivariate, nearest neighbour, and hybrids of these approaches (Junninen et al., 2004; Plaia and Bondì, 2006). Univariate methods include linear (LIN), spline and nearest neighbour (NN) interpolation. Multivariate methods include regression-based imputation, nearest neighbour interpolation, self-organising maps (SOMs) and MLP. Additionally, some modellers address the issue of missing data by deletion, in which predictors with a large fraction of missing values are left out. A more detailed discussion of these techniques can be found in Junninen et al. (2004) and Plaia and Bondì (2006). The taxonomy of these methods is given in Fig. 9.
Fig. 9. Taxonomy of missing data imputation approaches.
3.2.2. Results
The majority of the studies identified did not provide sufficient details describing the methods they utilised for data normalisation. Of those that did, the standard normalisation scheme was implemented 43 times (see Fig. 10). There are 9 cases where the input data were adjusted to have zero mean and unit variance. However, there were only a small number of instances (5 times) where other normalisation methods were used.
Fig. 11 shows the number of occurrences of the various approaches to missing data imputation. It should be noted that, of the 139 papers reviewed, only 34 provided details about missing data. Of those that disclosed such information, the majority carried out the deletion of predictors with missing data. Under univariate methods, nearest neighbour interpolation was used 5 times, while linear interpolation was utilised only 4 times. There are 6
instances where a combination of the above techniques was used, with distinct methods employed to address specific gap lengths.
Fig. 10. Number of occurrences of various data normalisation techniques.
Fig. 11. Number of occurrences of various missing data imputation techniques.
3.2.3. Conclusion
The majority of the papers identified did not provide sufficient information regarding the data normalisation and imputation methods used before model training. With regard to data normalisation, one possible explanation for this omission is the growing number of neural network model building platforms that perform data pre-processing by default. For instance, the MATLAB neural network toolbox normalises input values by default through the mapminmax function (The MathWorks, 2017). However, this seemingly trivial yet essential ANN development step should be clearly described to assist future modellers. Of the papers that described the process of data normalisation, simple range scaling was commonly adopted. Other methods for pre-processing data for ANN development can be found in the literature (Bowden et al., 2003). On the other hand, the majority of the identified papers resorted to deleting variables with missing values. This result highlights some potential issues. Although the deletion approach appears to be quick and practical, it may not be an option for studies with a very limited amount of data or for cases where collecting additional data is very expensive. In such cases, modellers should make full use of any available data, even if it is incomplete. Among the papers that did not report on missing data, some may have collected datasets without missing values, while others simply did not disclose the imputation techniques they implemented when they encountered missing data. Nevertheless, it is still considered good modelling practice to thoroughly discuss the use of imputation techniques for the repeatability and reproducibility of results (Wu et al., 2014). Available state-of-the-art imputation methods include single imputation (Plaia and Bondì, 2006), regression-based imputation using the EM algorithm (Dempster et al., 1977; Schneider, 2001), the known data regression (KDR) method (Folch-Fortuny et al., 2015), the vector autoregressive model-imputation (VAR-IM) algorithm (Bashir and Wei, 2017) and Bayesian compressive sensing (BCS) imputation methods (Williams et al., 2018).
3.3. Selection of predictors
3.3.1. Introduction
Choosing the most suitable ANN model predictors for a prediction problem is a nontrivial task. ANN models make no prior assumption regarding the distribution of the predictors involved or the underlying physical dynamics between predictors and target variables (Gardner and Dorling, 1998). As such, the robustness of an ANN model depends heavily on the form and manner in which predictors are fed into the model. Including too many correlated and extraneous predictors results in more network connections, which leads to overfitting (Hagan et al., 1995). On the other hand, the absence of relevant explanatory variables prevents the model from correctly approximating the underlying dynamics between predictors and target variables (Maier and Dandy, 2000).
There are several approaches to selecting the most significant predictors of a given model (see Fig. 12). They are divided into two categories, namely, model-free and model-based approaches (Maier et al., 2010). Model-free approaches perform input selection without relying on the performance of the developed ANN models. In other words, the process is undertaken before the ANN models are trained. Model-free approaches can be further divided into two categories: ad-hoc and analytical. The selection of model predictors in an arbitrary manner or based on domain knowledge falls under the ad-hoc approach. In contrast, the analytical approach uses a statistical measure of dependence between model predictors and target variables. This is mostly carried out through cross-correlation. However, this analytical approach can only detect linear dependence, which can lead to the omission of relevant predictors that are associated with the target variables in a non-linear manner (Samarasinghe, 2006). Model-based approaches, on the other hand, perform input selection by determining the effect of a candidate predictor on the overall model performance. As pointed out by Maier et al. (2010), this approach has several downsides. Firstly, it is time-consuming, as a number of ANN models need to be developed. Furthermore, it does not clearly measure the impact of the selected predictors on model performance, as the latter is also a function of several network parameters, e.g. the number of hidden layer nodes. A popular example of a model-based approach is the stepwise selection of inputs, in which predictors are iteratively added (e.g. forward selection) or removed (e.g. backward elimination) based on model performance. An ad-hoc approach can also be used, in which arbitrary combinations of model predictors are tested. A global approach can also be implemented, in which a global optimisation algorithm selects the combination of predictors that maximises model performance. Finally, an approach based on sensitivity analysis can be undertaken, in which plots of the sensitivity of the target variables to each predictor are examined.
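The analytical, model-free approach described above can be sketched as ranking candidate predictors by the absolute Pearson correlation with the target. The minimal illustration below uses plain Python and hypothetical synthetic data. The predictor names `linear_driver`, `quadratic_driver` and `noise` are invented for the example. It also illustrates the limitation noted above: a predictor that drives the target through a symmetric non-linear (here quadratic) relationship receives a near-zero correlation and would be discarded by a correlation-based screen.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(candidates, target):
    """Model-free, analytical screening: rank predictors by |r| with the target."""
    scores = {name: abs(pearson_r(values, target)) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical synthetic data (illustration only): the target responds linearly
# to `linear_driver`, quadratically to `quadratic_driver`, and not at all to `noise`.
rng = random.Random(42)
n = 500
linear_driver = [rng.uniform(-1.0, 1.0) for _ in range(n)]
quadratic_driver = [rng.uniform(-1.0, 1.0) for _ in range(n)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
target = [2.0 * a + 3.0 * b * b + rng.gauss(0.0, 0.1)
          for a, b in zip(linear_driver, quadratic_driver)]

ranking = rank_by_correlation(
    {"linear_driver": linear_driver, "quadratic_driver": quadratic_driver, "noise": noise},
    target,
)
for name, score in ranking:
    print(f"{name:<16} |r| = {score:.2f}")

# The quadratic driver clearly matters, yet its linear |r| is close to zero.
# Correlating the target with its square instead exposes the dependence.
squared_check = abs(pearson_r([b * b for b in quadratic_driver], target))
print(f"|r| with quadratic_driver squared = {squared_check:.2f}")
```

A model-based alternative, such as forward stepwise selection, would instead retrain a network for every candidate subset. That is why model-based approaches are costlier, but it also lets them pick up non-linear relevance.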
Fig. 12. Taxonomy of approaches to determining optimal model predictors.
3.3.2. Results
The number of times various input selection approaches have been used is shown in Fig. 13. Model-free approaches were implemented 99 times, compared with the 40 occasions on which model-based approaches were used. Of the model-free approaches considered, ad-hoc methods were the most widely implemented, with applications in 82 papers, followed by linear approaches, especially correlation analysis, which was used in 13 papers. A non-linear method was employed only 4 times. In 13 of the 40 times a model-based approach was implemented, the process was carried out in an ad-hoc manner. A stepwise method was used 10 times, while global search approaches were implemented 7 times.
Fig. 13. Number of occurrences of various input selection methods.
3.3.3. Conclusion
The results of the review highlight the need for greater attention to the implementation of predictor selection. Although the selection of predictors depends on the specific problem, a systematic approach should be encouraged among modellers to reduce bias and increase the repeatability of the performance of data-driven models in general. Ad-hoc approaches to predictor selection, either model-based or model-free, were used in almost 70% of the identified papers. Furthermore, linear analytical model-free approaches were also widely implemented. This contradicts the rationale for using non-linear models such as ANNs to approximate the typically non-linear dynamics between air pollutants and predictors. It indicates the need to further examine the use of non-linear approaches in selecting predictors. Guidelines for determining the most suitable predictor selection technique for a given problem specification can be found in Galelli et al. (2014). The proposed guidelines were evaluated using a framework with three components. The first is a wide range of datasets whose properties reflect those of real-world environmental observations. The second is a set of assessment criteria selected to highlight algorithm suitability for different problem specifications. The third is a website for sharing data, algorithms and results (Galelli et al., 2014).
3.4. Data splitting
3.4.1. Introduction
The division of data is another essential step in the development of ANN models. It is carried out by splitting the available data into three subsets, namely, the training, validation and test sets. The training subset is used for computing the gradient and calibrating the network weights and biases. The validation subset, on the other hand, is used to stop network training before overfitting takes place. Specifically, the error on the validation subset is monitored during training. When this error begins to increase for several iterations, training is stopped, and the weights and biases that yielded the minimum validation error are used as the final trained network weights and biases. Hence, the division of data is an essential modelling step to avoid the problem of overfitting. In overfitting, the network tends to memorise the data in the training subset but is unable to generalise to new situations, e.g. unseen data (Hagan et al., 1995). Lastly, the test subset is used to determine the generalisation ability of the developed model. That is, the error on the test subset is used to compare the predictive performance of different models. Note, however, that there are aspects of model performance other than predictive validity. The three aspects of model validity are fully covered in Section 3.8.
Data splitting approaches can be categorised as either supervised or unsupervised (see Fig. 14). Supervised approaches divide the input data into three subsets while taking into consideration the statistical properties of each subset. Unsupervised approaches, on the other hand, do not explicitly take the statistical properties of the data subsets into account, and only stratified unsupervised approaches attempt to ensure that the statistical properties of the subsets are similar (Maier et al., 2010). For instance, a SOM can be used to cluster the available data and to allocate data samples from each cluster to the training, testing and validation subsets. This ensures that patterns from different regions of the multivariate predictor-output space are represented in each subset. In the random unsupervised approach, the data are randomly divided into their respective subsets. This approach may introduce uncertainty into the model results, as the data in one of the sets may be biased towards extreme or uncommon events (Gardner and Dorling, 1998). Alternatively, a v-fold cross-validation
approach can be implemented. This method randomly divides the dataset into v independent subsets. One of the v subsets is selected as the test set, while the remaining v-1 subsets are used for model calibration. The process is repeated several times until a pre-specified criterion is met. In the physics-based approach, the data are split into different classes according to knowledge of the underlying physical processes. In the ad-hoc approach, the data allocated to the training, validation and testing sets are selected in an ad-hoc manner. One popular example is to allocate the first N observations to the training set, the next group of observations to the validation set, and the final group to the testing set.
Fig. 14. Number of occurrences of various input selection methods.
3.4.2. Results
A few of the identified papers did not discuss the process of data division explicitly. Of those that mentioned the data division process, the results indicate that only unsupervised data splitting methods were implemented (see Fig. 15). Specifically, the ad-hoc method was implemented 79 times, while random data division was used 40 times. The v-fold cross-validation method was implemented 8 times. Finally, only a few papers employed unsupervised stratified techniques.
Fig. 15. Number of occurrences of various unsupervised data splitting methods.
3.4.3. Conclusion
The majority of the identified papers performed data division in an ad-hoc manner. This may lead to uncertainties regarding the quality and repeatability of results, as ANN models fed with different data splits are likely to yield different calibrated network weights and biases. Consequently, such a practice leads to different model performances (Maier et al., 2010). Systematic approaches for the optimal division of data for ANN models are available. Examples include genetic algorithms (GAs) and SOMs (Bowden et al., 2002) and the modified Kennard-Stone algorithm (Saptoro et al., 2012). Future modellers may also look into the benchmarking approach proposed by Wu et al. (2013) for comparing different data splitting methods. The authors highlighted the importance of finding a data division method that provides consistent validation errors that are representative of the predictive errors obtained over the full range of the available data.
3.5. Selection of model architecture
3.5.1. Introduction
In this paper, model architecture refers to the overall structure of an ANN and the manner in which information flows from one layer to another. The taxonomy of ANN model architectures is shown in Fig. 16.
Fig. 16. Taxonomy of model architectures.
Two of the most popular network architectures for prediction and function approximation are feedforward and recurrent networks (Hagan et al., 1995). In a feedforward network, information moves from the input layer to the output layer in a single direction. The multilayer perceptron (MLP) is one of the most widely used feedforward ANN types for non-linear function approximation tasks (Shahraiyni and Sodoudi, 2016). Other examples of feedforward ANNs include radial basis function networks (RBFs), general regression neural networks (GRNNs), Ward neural networks (WNN) and extreme learning machines (ELM).
In contrast to feedforward ANNs, a recurrent neural network (RNN) allows feedback. In other words, some output neurons are connected to the neurons of the preceding layers, which can improve the capacity of RNNs to learn (Hagan et al., 1995). One popular example of this type is the Elman network (Biancofiore et al., 2017b; X. Feng et al., 2015; Peng et al., 2017). More sophisticated forms of RNNs, including long short-term memory (LSTM) networks, have also been receiving more attention from modellers in recent years. These architectures aim to address the well-known vanishing gradient problem from which conventional RNNs suffer (Freeman et al., 2018). Hagan et al. (1995) and Samarasinghe (2006) argued that recurrent networks are potentially more effective because their feedback mechanism improves their capacity to learn.
Finally, the application of hybrid ANN models has been highlighted in recent years (Makridakis et al., 2018). The ensemble modelling approach has been argued to capitalise on the strengths and overcome the weaknesses of the individual models involved (Chen et al., 2008; Shahraiyni et al., 2015; Sharma et al., 2005). In this paper, hybrid models were categorised into three classes: data-intensive, model-intensive and technique-intensive (Maier et al., 2010). A data-intensive approach attempts to classify the data with respect to various dynamics that depend on the problem specifications or on criteria set by the modeller. Separate models are then developed for each identified class. Examples of data-intensive hybrid ANN models include the use of ANNs together with techniques such as PCA, k-means clustering, ensemble empirical mode decomposition (EEMD) and wavelet decomposition. A model-intensive approach employs different models for different sub-components of the overall physical system and then aggregates the responses calculated by the different models. Several model-intensive ANN models fall into this class. One is fuzzy-neuro networks, e.g. the hybrid of a feedforward ANN and fuzzy systems. Another is multiple restricted Boltzmann machine (RBM) layers combined with a back-propagation (BP) layer. A third is LSTM combined with convolutional neural networks (CNNs). Lastly, a technique-intensive approach is one in which an ANN is combined with a different technique with the purpose of developing an ensemble approach that exploits the advantages offered by different
techniques. Common examples of this type include combinations of MLP and support vector machine (SVM), of stacked auto-encoders (SAE) and a learning regression (LR) layer, and non-linear autoregressive ANNs with exogenous inputs (NARX-ANN).
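The one-directional information flow of a feedforward MLP can be made concrete with a minimal sketch of a single forward pass through one hidden layer. Each hidden node applies a sigmoid transfer function to the weighted sum of its inputs plus a bias. The output node applies the identity function, which is the combination this review reports as most common for regression tasks. The weights below are hypothetical fixed numbers chosen for illustration, not trained values, and the inputs are assumed to be already normalised. The sketch also checks that replacing the sigmoid with the identity function collapses the network into a single linear model. This is why a non-linear hidden-layer transfer function is needed to capture non-linear dynamics.

```python
import math


def logistic(z):
    """Logistic sigmoid transfer function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    """Linear (identity) transfer function: output is unbounded."""
    return z


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out,
                hidden_fn=logistic, out_fn=identity):
    """One forward pass of a single-hidden-layer MLP with one output node.

    x        : predictor values (assumed already normalised)
    w_hidden : one weight list per hidden node, each of length len(x)
    b_hidden : one bias per hidden node
    w_out    : one output weight per hidden node
    b_out    : output-node bias
    """
    # Information flows in one direction only: inputs -> hidden layer -> output.
    hidden = [hidden_fn(sum(w * xi for w, xi in zip(weights, x)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    return out_fn(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights for 3 predictors and 2 hidden nodes (illustration only).
w_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.7]
b_out = 0.05
x = [0.2, 0.6, 0.9]  # hypothetical scaled predictor values

y_logistic = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
y_tanh = mlp_forward(x, w_hidden, b_hidden, w_out, b_out, hidden_fn=math.tanh)
print(y_logistic, y_tanh)

# With identity transfer functions in every layer, the network is equivalent to
# one linear model: y = sum_i w_eff[i] * x[i] + b_eff.
w_eff = [sum(w_out[j] * w_hidden[j][i] for j in range(len(w_out))) for i in range(len(x))]
b_eff = sum(wo * bh for wo, bh in zip(w_out, b_hidden)) + b_out
y_linear_net = mlp_forward(x, w_hidden, b_hidden, w_out, b_out, hidden_fn=identity)
y_linear_model = sum(w * xi for w, xi in zip(w_eff, x)) + b_eff
assert abs(y_linear_net - y_linear_model) < 1e-12
```

Training, discussed in Section 3.7, would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out`. The forward pass itself stays the same.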
Fig. 17. Number of occurrences of various model architectures.
Network architecture also refers to the way information is transmitted from the input nodes to the nodes of the succeeding layers in a network. Consequently, the selection of appropriate transfer (or activation) functions plays an important role. These functions map the weighted sum of the inputs arriving at a node to that node's output. The superposition of different transfer functions determines the ability of an ANN to approximate different input-output dynamics. The selection of an appropriate type of transfer function depends on the nature of the modelling task (Hagan et al., 1995). It also depends on the network layer in which the function is to be used. In air pollution modelling applications, sigmoid transfer functions are commonly used in the hidden layer, as they are non-linear and easily differentiable (Gardner and Dorling, 1998). A sigmoid function has a graph that looks like a stretched 'S' and yields values between either 0 and 1 or -1 and 1. Such characteristics enable an ANN to approximate virtually any non-linear and complex relationship between predictors and target variables. Two of the most popular sigmoid functions are the logistic sigmoid and hyperbolic tangent functions (Bishop, 1995). On the other hand, the linear (identity) function is found to be the most appropriate transfer function in the output layer for prediction or regression applications, as it yields unbounded estimates (Hagan et al., 1995). It is worth mentioning
that feedforward ANNs with linear transfer functions and without hidden nodes are equivalent to linear statistical models.
3.5.2. Results
Fig. 17 illustrates the number of times various model architectures have been used in the identified papers. MLP models were found to be the most commonly used model architecture, implemented in 78 papers. Of these papers, linear statistical models were used as benchmarks and were outperformed by the MLP models 15 times. The number of studies in which alternative network architectures were employed was reasonably uniform, ranging from 4 to 13. Additionally, the number of papers that employ hybrid ANN modelling approaches in ambient air pollution forecasting tasks has grown in recent years (see Fig. 18). The majority of the 45 implemented hybrid models fall under the data-intensive type. The results also reveal the popular use of deep neural networks such as LSTM models. LSTM models were employed 10 times, five of which were coupled with other modelling techniques. With regard to the selection of transfer functions, almost 30% of the papers reviewed did not provide details concerning their use of transfer functions. Among those that did provide this information, the logistic sigmoid function was predominantly used in the hidden layer nodes, followed by the hyperbolic tangent function. In the output nodes, the identity function was widely used, followed by the logistic sigmoid. The results also reveal very few instances in which the Gaussian function was used.
Fig. 18. Distribution of papers employing hybrid approaches by year of publication.
3.5.3. Conclusion
Much effort has been directed towards evaluating existing ANN architectures, especially the MLP model, in air pollution forecasting applications. Furthermore, there is growing interest in the development of novel and more sophisticated network architectures. This can be explained by the increasing number of easily accessible computing tools that are able to run complicated algorithms rapidly. Modellers have also adopted hybrid modelling approaches, given their high potential to improve the performance of plain ANN models (Chen et al., 2008; Sharma et al., 2005). However, Maier et al. (2010) remarked that because of the wide variety of modelling approaches available, it is not possible to draw any conclusions as to which network architecture should be employed for a specific forecasting task. The authors suggested that this requires further investigation. A generic framework for developing hybrid process and data-driven models of salinity in river systems, which is also applicable to other environmental engineering tasks, can be found in Hunter et al. (2018). In this framework, the most suitable sub-models are developed for each sub-process of a specific problem of interest. The choice is based on model purpose, the degree of process understanding and data availability. The selected sub-models are then combined to form the hybrid model. This review also indicates that the identified papers selected the transfer functions in both the hidden and output layers appropriately. However, the results here are not conclusive, as the success of transfer functions still depends on the structure of the dataset on which the network is trained. Hence, this aspect of model development should also be further examined.
3.6. Determination of model structure
3.6.1. Introduction
Another important step in the ANN model development process is the determination of the number of layers and the number of nodes in each layer. The input layer is where the predictors are fed into the network, while the output layer is where the final network results are calculated. Hence, the numbers of input and output nodes depend on the numbers of predictors and target variables, respectively. Lastly, any layer other than the input and output layers is called a hidden layer. It is in the hidden layer that the underlying dynamics between predictors and target variables are captured. With a sufficient number of nodes in the hidden layer, an ANN can approximate almost any function (Bishop, 1995; Samarasinghe, 2006). ANNs may have one or more hidden layers. It has been shown that too many hidden layers and neurons can lead to overfitting, while too few can lead to underfitting (Hagan et al., 1995; Samarasinghe, 2006). Both issues are reported to affect the generalisation ability of a model, leading to poor prediction accuracy. In addition, the hidden layer parameters are case-specific, as they depend on the data complexity of a specific application (Gardner and Dorling, 1998). A general method for determining the optimal number of hidden layers and neurons remains unknown, which contributes to the initial difficulty of ANN model building. As a result, different approaches have been employed to address this uncertainty.
The methods for determining the optimal ANN model structure can be classified into three types, namely, global, stepwise trial-and-error and ad-hoc (see Fig. 19). In the first approach, the optimal numbers of hidden layers and nodes are determined using global methods inspired by natural processes, e.g. GA, particle swarm optimisation and simulated annealing. Using this approach, it is possible to optimise the network weights and biases and the numbers of hidden layers and nodes simultaneously. If implemented properly, global methods are likely to result in the best ANN structure and parameters. However, they are found to be computationally expensive (Maier et al., 2010). In the stepwise trial-and-error approach, a basic ANN structure is first assumed and then modified with each trial, with the objective of achieving a structure that is neither too complex nor too simple. The stepwise approach can be further split into two types, one based on pruning algorithms and the other based on constructive approaches. Lastly, ad-hoc approaches can be used to determine an optimal network structure, in which the number of hidden nodes is determined without adhering to strict pruning or constructive techniques. One ad-hoc approach is based purely on trial and error. The partial ad-hoc approach combines trial and error with an empirical formula that provides upper and/or lower bounds on the number of hidden nodes. The last ad-hoc approach is based on the experience or intuition of the modeller.
3.6.2. Results
As can be seen in Fig. 20, an ad-hoc approach to determining the structure of ANN models was by far the most popular, with 132
applications. It is worth noting that a number of the identified papers (13) implemented an approach that combines trial and error with empirical rules to determine the optimal number of hidden nodes. Of the structured approaches, constructive stepwise approaches were implemented 5 times, whereas global approaches were used only twice.
Fig. 19. Taxonomy of model structure.
Fig. 20. Distribution of papers employing hybrid approaches by year of publication.
3.6.3. Conclusion
Despite the important role network structure plays in determining the desired relationship between model predictors and outputs, little effort has been directed towards this area of the ANN model development process. This is evident in this review, where the majority of the identified studies employed an ad-hoc approach to determining an appropriate ANN model structure. Constructive, stepwise model building approaches have seen some adoption, but global optimisation methods have received little attention. As such, this stage requires further attention in the future. Future modellers may look into an objective approach based on the Bayesian model selection (BMS) method for ANNs, described in Kingston et al. (2008). This method compares models of varying complexity in order to select the most appropriate ANN structure. The approach uses Markov chain Monte Carlo posterior simulations to estimate the evidence in favour of competing models. The authors remarked that it provides a simple and objective method for selecting the ANN model with optimal complexity when used in conjunction with the Bayesian training procedure developed by Kingston et al. (2005a,b).
3.7. Model training
3.7.1. Introduction
Before ANN models are employed for forecasting applications, they must be trained, or calibrated. Training an ANN is the process of calibrating the connection weights between the interconnected nodes of the network. It is through the connection weights and node biases that an ANN is able to approximate complex non-linear mappings from the nodes of the input layer to the network outputs. Training an ANN is typically carried out in a supervised manner. Before training, the network weights and biases are usually initialised. Initial weight values are typically selected randomly from a uniform distribution (Hagan et al., 1995). During training, the network is repeatedly presented with the desired network response for each input pattern. The network weights and biases are calibrated until the target outcome is met, e.g. an acceptable difference between the desired and actual outputs.
Network calibration methods generally belong to either local or global optimisation approaches (see Fig. 21). Local methods usually work on gradient information and are therefore prone to becoming trapped in local optima if the error surface is rugged. However, these methods are generally computationally efficient. Gradient methods can be further sub-divided into first-order methods, e.g. back-propagation, and second-order methods, e.g. Newton's method and the conjugate gradient method. When using back-propagation, suitable values of the network training parameters, such as the learning rate and momentum term, also need to be set first. Suitable values for these parameters are also case-specific. Nonetheless, a few empirical formulas for finding them are available in the literature (Gardner and Dorling, 1998; Hagan et al., 1995; Samarasinghe, 2006). Global optimisation methods, such as genetic algorithms, have an increased ability to find global optima in the error surface, although generally at the expense of computational efficiency. Alternatively, stochastic calibration methods can be used to account for parameter uncertainty. These approaches obtain distributions of the model parameters rather than a single parameter vector. This has the advantage that prediction limits can be obtained. Bayesian methods are commonly used to achieve this (Bishop, 1995; Maier et al., 2010).
3.7.2. Results
The results in Fig. 22 indicate that deterministic local calibration techniques were the most common, accounting for 90 of the 104 techniques cited by the identified papers. Of those techniques, 42 fall under first-order approaches and 35 fall under second-order approaches. Additionally, 9 studies used global techniques and 3 employed stochastic techniques.
3.7.3. Conclusion
First-order local search procedures, such as the backpropagation algorithm, were primarily used by the identified papers, although
second-order methods were also used extensively to improve the computational efficiency of ANN calibration. However, few studies have investigated whether global optimisation techniques improve the predictive ability of ANN models, and this area is worthy of further exploration. Some studies that employ global optimisation techniques during ANN calibration can be found in the literature (Antanasijević et al., 2018; Y. Feng et al., 2011; Lu et al., 2003; Niska et al., 2004; Pires et al., 2010). A model calibration technique that accounts for the relative contributions of model predictors in generating the target variable could also be further examined. Kingston et al. (2005a,b) argued that quantifying the relative input contributions can facilitate the assessment of the trained ANN based on the estimated predictor-target variable relationship. Finally, future works could also investigate, in a rapid and continuous manner, the effect of the input patterns included in the calibration set on the performance of the resulting ANN models. Bowden et al. (2012) proposed an approach based on SOMs that can identify uncharacteristic data patterns presented to ANN models.
Fig. 21. Taxonomy of model calibration techniques.
Fig. 22. Number of occurrences of various training methods.
3.8. Model validation
3.8.1. Introduction
A wide range of statistical performance indices should be employed to quantify the performance of the developed ANN models. ANN model performance is usually assessed using a quantitative error metric. However, ANN models should be assessed not only on their predictive error but also on their ability to capture the underlying dynamics between predictors and target variables (Kingston et al., 2005a,b). As such, three aspects of model validity need to be considered when assessing the performance of ANN models, or of data-driven models generally. These are replicative validity, predictive validity and structural validity (Gass, 1983; Humphrey et al., 2017) (see Fig. 23).
Fig. 23. Taxonomy of performance evaluation techniques.
Metrics assessing replicative validity ensure that a developed model correctly approximates both the observed data and the data used in previous ANN model building steps (Gass, 1983). Popular methods in this category include means and variances, minima and maxima, analysis of variance, goodness-of-fit testing, regression and correlation analysis, and confidence interval construction (Wu et al., 2014). On the other hand, metrics dealing with predictive validity examine the performance of ANN models in approximating unseen or independent data. A taxonomy of the commonly used metrics in this category is given in Fig. 24.
Squared errors are based on the squares of the differences between actual and modelled output values. Common examples include the mean squared error (MSE), the sum of squared errors (SSE) and the root mean square error (RMSE). Absolute errors are based on the absolute differences between actual and modelled outputs. Metrics of this type include the mean sum of absolute deviations (MSAD) and the total sum of absolute deviations (TSAD). Relative errors express the differences between actual and modelled outputs relative to the magnitude of the actual outputs. Common examples of this type include
average absolute relative error (AARE), the normalized root mean square error (NRMSE) and the normalized mean bias error (NMBE). Correlation-based measures quantify the degree of linear association between actual and modelled outputs. One common example is the Pearson correlation coefficient (r). Other metrics include information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which consider model complexity in addition to model error. Finally, metrics dealing with structural validity ensure that a model is plausible when compared with a priori knowledge of the system behaviour that the resulting model is intended to reflect (Wu et al., 2014). That is, a model is structurally valid if it not only correctly approximates the observed data but also reflects the way in which the real system is understood to operate to produce such data characteristics. Structural validity metrics also include measures of the uncertainty of ANN model results. Methods that measure structural validity include sensitivity analysis (Mount et al., 2013), overall connection weights (Olden and Jackson, 2002), and measures of generalisation (Razavi and Tolson, 2011). For a more detailed discussion of model validation, the reader is referred to Dawson et al. (2007) and Humphrey et al. (2017).

Fig. 24. Taxonomy of predictive performance evaluation techniques.
Fig. 25. Number of occurrences of the various aspects of validity considered.
Fig. 26. Number of occurrences of the various replicative model validity metrics considered.
Fig. 27. Number of occurrences of the various predictive model validity metrics considered.

3.8.2. Results
The results depicted in Fig. 25 indicate that a range of performance criteria was used in most studies. Predictive validity measures were used predominantly, while replicative and structural validity were considered 18 and 10 times, respectively. The number of times different replicative validity metrics were used is shown in Fig. 26. In those papers where replicative validity was considered, the mean, standard deviation, minimum and maximum of the observed and estimated values in the calibration stage were compared. The Kolmogorov-Smirnov test was used to evaluate the statistical significance of the model training results at a significance level of 0.05. The number of times various predictive performance metrics were used to validate the models is given in Fig. 27. Correlation and squared error metrics were widely used (163 and 118 times, respectively), and measures based on absolute and relative errors were also employed extensively. Furthermore, visual inspection was used in the majority of the papers identified (see Fig. 28). Plots comparing actual and predicted values were shown most often (44 times), followed by scatter plots (36 times) and error histograms and surfaces (9 times). The number of times
various model structural performance metrics were used is shown in Fig. 29. In the 10 occurrences where structural validity was considered, sensitivity analysis was conducted only 6 times and uncertainty in model predictions was quantified only 3 times. A skill score metric, which determines model performance relative to a reference model, was used only once.

Fig. 28. Number of occurrences of the various visual inspection methods used.
Fig. 29. Number of occurrences of the various structural model validity metrics considered.

3.8.3. Conclusion
Review of the 139 papers has indicated that a range of performance criteria, mainly those that assess the predictive performance of models, was used predominantly by the identified papers. While this observation suggests good modelling practice, the replicative and structural validity of the developed models were generally ignored. Given the black-box nature of ANN models, the predictive aspect of model validation is not sufficient to assess fully the ability of the developed models to capture the underlying dynamics between predictors and target variables. Hence, future works should incorporate these additional aspects of validation to provide a wider assessment of model performance. A full discussion of the comprehensive validity assessment of ANN models for prediction tasks can be found in Humphrey et al. (2017).

4. Conclusions
Over the period between January 2001 and February 2019, research activity in the field of forecasting and prediction of ambient air pollution variables using ANNs has increased dramatically. Many journal papers have been identified in the covered period. This is despite the fact that a restricted journal list was considered and that the review was limited to predicting air pollutant variables in urban and suburban environments. Even within the period covered by this paper, the number of papers published increased in the later years, with an average of 10 papers per year from 2010 to 2018. This can largely be explained by the increased availability of software packages that enable modellers to build and train ANNs relatively easily and quickly (Gardner and Dorling, 1998; Sharma et al., 2005).
Among the identified papers, the primary application area has been the forecasting and prediction of outdoor PM10, PM2.5, oxides of nitrogen, and ozone. This is consistent with the findings of the review by Yetilmezsoy et al. (2011), which surveyed artificial-intelligence-based techniques for modelling air pollution. This focus can be explained by the growing necessity to alleviate the adverse effects of these key air pollutants on human health by means of early warning and preventive measures (World Health Organization, 2013). Attention was also directed towards other variables such as carbon monoxide (CO) and sulphur dioxide (SO2). Nevertheless, there is a need to broaden the application areas of ANN models to other air pollutant variables. ANNs would seem to be ideally suited to modelling complex relationships between environmental variables given their universal function approximation capability (Gardner and Dorling, 1998; Hornik et al., 1989; Maier et al., 2010). Additionally, the vast majority of identified papers tested the effectiveness of ANN models in the long-term forecasting of air pollution. This highlights the growing demand for parsimonious and effective early-warning systems capable of providing accurate forecasts to the public, urban planners, and decision-makers.
The majority of the identified studies used datasets spanning at least a one-year period, which is considered sufficient to reveal the annual cyclic pattern of air pollution levels. This is generally considered good practice. However, this period may not be sufficient in cases where extreme air pollution concentrations are not well represented. Pre-processing methods that address the imbalanced-data problem may be beneficial in this situation. As for the initial collection of predictors, the papers reviewed predominantly considered meteorological and pollutant emissions variables, with meteorological variables receiving greater emphasis. This is consistent with the findings of a review by Shahraiyni and Sodoudi (2016) on the prediction of ambient PM10 levels using statistical models. The use of several predictors of various types is encouraged in order to extract inherent associations between predictors and target variables. However, this should be handled carefully, as modelling with ANNs typically requires careful consideration of the trade-off between model complexity and performance (Bishop, 1995; Hagan et al., 1995; Samarasinghe, 2006). The use of too many predictors can reduce the nonlinear mechanisms of ANNs to merely linear ones, defeating the entire point of utilising ANNs for air pollution prediction in the first place. Furthermore, parsimonious models are typically encouraged in real-world pollution forecasting tasks where data is limited. This highlights the need to build ANN models that are powerful enough to reveal inherent features from a small number of predictors.
The majority of the identified papers failed to provide details regarding the implementation of data pre-processing techniques. This can cast doubt for future modellers, especially those who attempt to repeat the methods and reproduce the findings of these papers. Moreover, most of the identified papers did not handle missing data well, relying instead on the list-wise deletion of predictors with missing data. The effective use of data normalisation techniques should
be the focus of future research. Another aspect of ANN model development that requires further attention is the implementation of predictor selection, as the majority of the identified papers selected their model predictors in an ad-hoc fashion. More attention should be directed towards model-based global methods for predictor selection, given their current limitations and the difficulty of implementing them.
Similar to the findings of other reviews (Gardner and Dorling, 1998; Maier and Dandy, 2000; Maier et al., 2010; Wu et al., 2014), the vast majority of the identified papers performed data division in an ad-hoc fashion. It has been pointed out that this popular approach can introduce uncertainty, especially when assessing model performance, as the resulting network can only generalise within the range of data inputs on which it was trained (Gardner and Dorling, 1998). Alternative data division methods should be examined and incorporated in future research efforts. With regard to model architecture selection, this review found that the identified papers predominantly used feedforward networks, most of which were MLPs. While MLPs are still the dominant ANN architecture in this review, they were also predominantly used as a benchmark against which to compare alternative architectures. This finding is similar to those of several previous reviews (Gardner and Dorling, 1998; Guoqiang Zhang and Patuwo, 1998; Sharma et al., 2005; Yetilmezsoy et al., 2011). MLPs were also used extensively in the prediction and forecasting of water quality and quantity variables (Maier and Dandy, 2000; Maier et al., 2010; Wu et al., 2014). On the other hand, there was a significant amount of experimentation with other types of architectures. These included other feedforward architectures, such as generalised regression neural networks and radial basis function networks, as well as recurrent networks and, most importantly, different types of hybrid network architectures. It is also worth noting that the use of hybrid ANN model architectures has increased in the past decade. The development of hybrid ANN model architectures is important progress, as it emphasises that "ANNs have a role to play not only as an alternative to traditional modelling approaches, but also as a complementary modelling tool that can be used to improve the performance of existing approaches" (Maier et al., 2010).
The findings of this review also suggest that more effort should be directed towards the use of global optimisation methods in determining the optimal model structure, as the majority of the identified papers used ad-hoc approaches, primarily trial and error. This trend may not be beneficial to future modellers in the field of air pollution forecasting, as ad-hoc methods offer limited repeatability due to their case-specific nature. Consequently, there is a need for a more established and systematic protocol for identifying optimal model structures, one that caters for a wide range of predictor-output dynamics. First-order local search procedures, such as the backpropagation algorithm, were primarily used, although second-order methods were also used extensively in order to improve the computational efficiency of ANN calibration. However, studies investigating the potential benefits of using global optimisation techniques to improve the predictive ability of ANN models are rather limited, which is an area worthy of further exploration. In addition, although some work was done on the incorporation of parameter uncertainty into ANN model calibration, this also presents an area of future research. Finally, the majority of the papers reviewed utilised different performance metrics in terms of the predictive validity of the models. However, further efforts should be made to examine other aspects of model performance such as replicative validity and structural validity.

5. Recommendations for future work
Based on the review of 139 journal papers published between January 2001 and February 2019 that deal with the forecasting and prediction of ambient air pollution variables using ANNs, the following recommendations for future research are made:

1) The steps of the overall ANN model development process should be viewed as interconnected. That is, the links between each modelling stage and the justification for the component chosen for each stage should be stated explicitly. There is a tendency among modellers to regard one modelling stage as having no influence on the others. As such, some specific aspects that need further emphasis are the following:
a) the influence of the length of the training data on the overall ANN model performance;
b) the influence of the selected predictors and the model structure used on model complexity and overall ANN model performance;
c) the links between the selected data normalisation scheme and the transfer function used; and
d) the influence of the adopted initialisation schemes for the weights, biases and other training parameters on the overall ANN model performance.
2) More emphasis should be given to the trade-off between the sophistication of the adopted model structure and architecture and the overall model performance. Modellers tend to consider more sophisticated models without first examining whether much simpler ones can perform the modelling task. Sometimes, model complexity undermines the overall performance of an ANN model. Through a lack of parsimony, it can also undermine the model's potential to be deployed in real-world forecasting tasks.
3) Aside from the replicative, predictive and structural aspects of model validity, future modellers should also assess a developed ANN model in terms of computational cost and running time. These are essential factors if models are to be practical in real-world modelling tasks. Additionally, uncertainty analysis of model results should be investigated more in future works. For ANN models to be fully implementable in real-world applications, future efforts should be made towards the quantification of uncertainty in model results (Arhami et al., 2013; Borrego et al., 2008).
4) More information regarding the software or computing environment employed in ANN model development should be provided. The built-in functions or libraries that perform some or all steps in building ANN models often differ, in minor or major ways, between software packages. As such, one modelling approach should be compared with another in the same modelling environment to avoid bias. Additionally, all software settings used in building ANN models should be explicitly disclosed to avoid ambiguities and increase the repeatability of results. Maier et al. (2010) also advised the use of open-access data sets in order to enable a better comparison of ANN development methods across studies.
5) Although each modelling task utilising data-driven models such as ANNs is problem-specific, modellers should take the adoption of the available protocols for building ANN models more seriously in the future to ensure good modelling practice (Wu et al., 2014).
6) A complete theoretical understanding of the principles behind the ANN modelling paradigm should be expected among modellers in order for the field of environmental modelling using ANNs to advance (Gardner and Dorling, 1998). The emergence of many easy-to-use computing platforms supporting ANN model building tasks may cause some modellers to exploit the black-box nature of ANNs, making the overall process more of an art than a science.

Acknowledgements
The authors would like to acknowledge the financial support provided by the British Council, Philippines (through grant number 261810845) and the Commission on Higher Education of the Republic of the Philippines that has enabled the production of this work.
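The predictive-validity metrics surveyed in Section 3.8.1 can be computed directly from paired observed and modelled series. The sketch below is illustrative only and does not reproduce the procedure of any reviewed paper. The function name `predictive_metrics` is hypothetical. Several definitions are assumptions, because usage varies across studies:
- MSAD is taken as the mean of the absolute deviations and TSAD as their total.
- NRMSE and NMBE are normalised by the mean of the observations; normalising by the observed range is also common.
- AIC and BIC use the least-squares forms n ln(SSE/n) + 2k and n ln(SSE/n) + k ln(n), where k is the number of trainable parameters.
- The skill score is 1 - MSE_model/MSE_ref for a user-supplied reference forecast, such as persistence.

```python
import math
from statistics import mean


def predictive_metrics(obs, pred, n_params=None, ref_pred=None):
    """Predictive-validity metrics for paired observed/modelled series.

    Illustrative sketch (hypothetical helper): NRMSE and NMBE are normalised
    by the observed mean, and AIC/BIC use least-squares forms with additive
    constants dropped.
    """
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("obs and pred must have the same length (>= 2)")
    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]  # modelled minus actual
    obs_mean = mean(obs)

    # Squared errors
    sse = sum(e * e for e in errors)
    mse = sse / n
    rmse = math.sqrt(mse)
    # Absolute errors
    tsad = sum(abs(e) for e in errors)
    msad = tsad / n
    # Relative errors (AARE is undefined if any observation is zero)
    aare = sum(abs(e) / abs(o) for e, o in zip(errors, obs)) / n
    nrmse = rmse / obs_mean
    nmbe = sum(errors) / (n * obs_mean)  # > 0 means over-prediction on average

    # Correlation: Pearson r between actual and modelled outputs
    pred_mean = mean(pred)
    cov = sum((o - obs_mean) * (p - pred_mean) for o, p in zip(obs, pred))
    var_o = sum((o - obs_mean) ** 2 for o in obs)
    var_p = sum((p - pred_mean) ** 2 for p in pred)
    r = cov / math.sqrt(var_o * var_p)

    metrics = {"SSE": sse, "MSE": mse, "RMSE": rmse, "TSAD": tsad,
               "MSAD": msad, "AARE": aare, "NRMSE": nrmse, "NMBE": nmbe,
               "r": r}

    # Information criteria penalise the number of trainable parameters k
    if n_params is not None and sse > 0:
        metrics["AIC"] = n * math.log(sse / n) + 2 * n_params
        metrics["BIC"] = n * math.log(sse / n) + n_params * math.log(n)

    # Skill score against a reference forecast (e.g. persistence)
    if ref_pred is not None:
        mse_ref = sum((p - o) ** 2 for o, p in zip(obs, ref_pred)) / n
        metrics["SkillScore"] = 1.0 - mse / mse_ref
    return metrics
```

For example, `predictive_metrics([10, 20, 30, 40], [12, 18, 33, 37])` gives SSE = 26, MSE = 6.5, TSAD = 10 and MSAD = 2.5. Because AARE divides by each observation, series that contain zero concentrations would need a different relative-error definition.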
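Section 3.8.2 reports two replicative-validity checks. The first compares the mean, standard deviation, minimum and maximum of the observed and estimated calibration values. The second applies a Kolmogorov-Smirnov test at a significance level of 0.05. Both can be sketched in plain Python as follows. This is an illustrative sketch and does not follow the procedure of any specific reviewed paper. The helper names `summary_stats` and `ks_two_sample` are hypothetical. The test also relies on an assumption: instead of an exact p-value, it uses the large-sample critical value c(alpha) * sqrt((n + m) / (n m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2).

```python
import math
from statistics import mean, stdev


def summary_stats(values):
    """Mean, sample standard deviation, minimum and maximum of a series."""
    return {"mean": mean(values), "std": stdev(values),
            "min": min(values), "max": max(values)}


def ks_two_sample(sample_a, sample_b, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test with the large-sample critical value.

    Returns (D, D_crit, reject), where D is the largest vertical gap between
    the two empirical CDFs and reject is True when D exceeds D_crit.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Step through the pooled values; consume ties in both samples together
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so we can stop.
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    d_crit = c_alpha * math.sqrt((n + m) / (n * m))
    return d, d_crit, d > d_crit
```

Applied to the observed and modelled calibration series, `summary_stats(observed)` and `summary_stats(modelled)` give the side-by-side comparison described above. `ks_two_sample(observed, modelled)` then flags a significant distributional mismatch at the 0.05 level. Because the critical value is asymptotic, it is only approximate for short series. Where SciPy is available, `scipy.stats.ks_2samp` reports a p-value instead.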
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.envsoft.2019.06.014.

References
Abderrahim, H., Chellali, M.R., Hamou, A., 2016. Forecasting PM10 in Algiers: efficacy of multilayer perceptron networks. Environ. Sci. Pollut. Control Ser. 23 (2), 1634–1641. https://doi.org/10.1007/s11356-015-5406-6.
Abdul-Wahab, S., Al-Alawi, S., 2002. Assessment and prediction of tropospheric ozone concentration levels using artificial neural networks. Environ. Model. Softw 17 (3), 219–228. https://doi.org/10.1016/S1364-8152(01)00077-9.
Agirre-Basurko, E., Ibarra-Berastegi, G., Madariaga, I., 2006. Regression and multilayer perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area. Environ. Model. Softw 21 (4), 430–446. https://doi.org/10.1016/j.envsoft.2004.07.008.
Al-Alawi, S.M., Abdul-Wahab, S.A., Bakheit, C.S., 2008. Combining principal component regression and artificial neural networks for more accurate predictions of ground-level ozone. Environ. Model. Softw 23 (4), 396–403. https://doi.org/10.1016/j.envsoft.2006.08.007.
Alam, M.S., McNabola, A., 2015. Exploring the modeling of spatiotemporal variations in ambient air pollution within the land use regression framework: estimation of PM10 concentrations on a daily basis. J. Air Waste Manag. Assoc. 65 (5), 628–640. https://doi.org/10.1080/10962247.2015.1006377.
Alimissis, A., Philippopoulos, K., Tzanis, C.G., Deligiorgi, D., 2018. Spatial estimation of urban air pollution with the use of artificial neural network models. Atmos. Environ. 191, 205–213. https://doi.org/10.1016/J.ATMOSENV.2018.07.058.
Antanasijević, D., Pocajt, V., Perić-Grujić, A., Ristić, M., 2018. Multiple-input–multiple-output general regression neural networks model for the simultaneous estimation of traffic-related air pollutant emissions. Atmosph. Pollut. Res. 9 (2), 388–397. https://doi.org/10.1016/j.apr.2017.10.011.
Antanasijević, D.Z., Pocajt, V.V., Povrenović, D.S., Ristić, M.Đ., Perić-Grujić, A.A., 2013. PM10 emission forecasting using artificial neural networks and genetic algorithm input variable optimization. Sci. Total Environ. 443, 511–519. https://doi.org/10.1016/j.scitotenv.2012.10.110.
Arhami, M., Kamali, N., Rajabi, M.M., 2013. Predicting hourly air pollutant levels using artificial neural networks coupled with uncertainty analysis by Monte Carlo simulations. Environ. Sci. Pollut. Control Ser. 20 (7), 4777–4789. https://doi.org/10.1007/s11356-012-1451-6.
Ashmore, M.R., Dimitroulopoulou, C., Byrne, M.A., Kinnersley, R.P., 2001. Modelling of indoor exposure to nitrogen dioxide in the UK. Atmos. Environ. 35, 269–279.
Azid, A., Juahir, H., Toriman, M.E., 2014. Prediction of the level of air pollution using principal component analysis and artificial neural network techniques: a case study in Malaysia. https://doi.org/10.1007/s11270-014-2063-1.
Bai, Y., Li, Y., Wang, X., Xie, J., Li, C., 2016. Air pollutants concentrations forecasting using back propagation neural network based on wavelet decomposition with meteorological conditions. Atmosph. Pollut. Res. 7 (3), 557–566. https://doi.org/10.1016/j.apr.2016.01.004.
Bai, Y., Zeng, B., Li, C., Zhang, J., 2019. An ensemble long short-term memory neural network for hourly PM2.5 concentration forecasting. Chemosphere. https://doi.org/10.1016/j.chemosphere.2019.01.121.
Baklanov, A., Hänninen, O., Slørdal, L.H., Kukkonen, J., Bjergene, N., Fay, B., et al., 2007. Integrated systems for forecasting urban meteorology, air pollution and population exposure. Atmos. Chem. Phys. 7, 855–874. Retrieved from www.atmos-chem-phys.net/7/855/2007/.
Barrón-Adame, J.M., Cortina-Januchs, M.G., Vega-Corona, A., Andina, D., 2012. Unsupervised system to classify SO2 pollutant concentrations in Salamanca, Mexico. Expert Syst. Appl. 39, 107–116. https://doi.org/10.1016/j.eswa.2011.05.083.
Bashir, F., Wei, H.-L., 2017. Handling missing data in multivariate time series using a vector autoregressive model-imputation (VAR-IM) algorithm. Neurocomputing 276, 23–30. https://doi.org/10.1016/j.neucom.2017.03.097.
Biancofiore, F., Busilacchio, M., Verdecchia, M., Tomassetti, B., Aruffo, E., Bianco, S., et al., 2017a. Recursive neural network model for analysis and forecast of PM10 and PM2.5. Atmosph. Pollut. Res. 10, 8–15. https://doi.org/10.1016/j.apr.2016.12.014.
Biancofiore, F., Busilacchio, M., Verdecchia, M., Tomassetti, B., Aruffo, E., Bianco, S., et al., 2017b. Recursive neural network model for analysis and forecast of PM10 and PM2.5. Atmosph. Pollut. Res. 8 (4), 652–659. https://doi.org/10.1016/j.apr.2016.12.014.
Biancofiore, F., Verdecchia, M., Di, P., Tomassetti, B., Aruffo, E., Busilacchio, M., et al., 2015. Analysis of surface ozone using a recurrent neural network. Sci. Total Environ. 514, 379–387. https://doi.org/10.1016/j.scitotenv.2015.01.106.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc, New York, NY, USA.
Borrego, C., Monteiro, A., Ferreira, J., Miranda, A.I., Costa, A.M., Carvalho, A.C., Lopes, M., 2008. Procedures for estimation of modelling uncertainty in air quality assessment. Environ. Int. 34 (5), 613–620. https://doi.org/10.1016/j.envint.2007.12.005.
Bowden, G.J., Dandy, G.C., Maier, H.R., 2003. Data transformation for neural network models in water resources applications. Retrieved from https://iwaponline.com/jh/article-pdf/5/4/245/392619/245.pdf.
Bowden, G.J., Maier, H.R., Dandy, G.C., 2002. Optimal division of data for neural network models in water resources applications. Water Resour. Res. 38 (2) 2-1-2–11. https://doi.org/10.1029/2001WR000266.
Bowden, G.J., Maier, H.R., Dandy, G.C., 2012. Real-time deployment of artificial neural network forecasting models: understanding the range of applicability. Water Resour. Res. 48 (10), 1–16. https://doi.org/10.1029/2012WR011984.
Brunelli, U., Piazza, V., Pignato, L., Sorbello, F., Vitabile, S., 2007. Two-days ahead prediction of daily maximum concentrations of SO2, O3, PM10, NO2, CO in the urban area of Palermo, Italy. Atmos. Environ. 41 (14), 2967–2995. https://doi.org/10.1016/j.atmosenv.2006.12.013.
Cabaneros, S.M.S., Calautit, J.K.S., Hughes, B.R., 2017. Hybrid artificial neural network models for effective prediction and mitigation of urban roadside NO2 pollution. Energy Procedia 142, 3524–3530. https://doi.org/10.1016/J.EGYPRO.2017.12.240.
Catalano, M., Galatioto, F., Bell, M., Namdeo, A., Bergantino, A.S., 2016. Improving the prediction of air pollution peak episodes generated by urban transport networks. Environ. Sci. Policy 60, 69–83. https://doi.org/10.1016/j.envsci.2016.03.008.
Chang, M.E., Cardelino, C., 2000. Application of the Urban Airshed Model to forecasting next-day peak ozone concentrations in Atlanta, Georgia. J. Air Waste Manag. Assoc. 50 (11), 2010–2024. 1995. https://doi.org/10.1080/10473289.2000.10464219.
Chattopadhyay, S., Chattopadhyay, G., 2012. Modeling and prediction of monthly total ozone concentrations by use of an artificial neural network based on principal component analysis. Pure Appl. Geophys. 169, 1891–1908. https://doi.org/10.1007/s00024-011-0437-5.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16. Retrieved from https://arxiv.org/pdf/1106.1813.pdf.
Chelani, A.B., Chalapati Rao, C., Phadke, K., Hasan, M., 2002. Prediction of sulphur dioxide concentration using artificial neural networks. Environ. Model. Softw 17 (2), 159–166. https://doi.org/10.1016/S1364-8152(01)00061-5.
Chelani, A.B., Singh, R.N., Devotta, S., 2005. Nonlinear dynamical characterization and prediction of ambient nitrogen dioxide concentration. Water Air Soil Pollut. 166 (1–4), 121–138. https://doi.org/10.1007/s11270-005-7384-7.
Chellali, M.R., Abderrahim, H., Hamou, A., Nebatti, A., Janovec, J., 2016. Artificial neural network models for prediction of daily fine particulate matter concentrations in Algiers. Environ. Sci. Pollut. Control Ser. 14008–14017. https://doi.org/10.1007/s11356-016-6565-9.
Chen, S.H., Jakeman, A.J., Norton, J.P., 2008. Artificial Intelligence techniques: an introduction to their use for modelling environmental systems. Math. Comput. Simulat. 78 (2–3), 379–400. https://doi.org/10.1016/j.matcom.2008.01.028.
Chuang, M.T., Zhang, Y., Kang, D., 2011. Application of WRF/Chem-MADRID for real-time air quality forecasting over the Southeastern United States. Atmos. Environ. 45 (34), 6241–6250. https://doi.org/10.1016/j.atmosenv.2011.06.071.
Colls, J., 2001. Air Pollution, second ed., vol. 29. Spon Press, West 35th Street, New York, NY 10001.
Coman, A., Ionescu, A., Candau, Y., 2008. Hourly ozone prediction for a 24-h horizon using neural networks. Environ. Model. Softw 23 (12), 1407–1421. https://doi.org/10.1016/J.ENVSOFT.2008.04.004.
Cortina-Januchs, M.G., Dominguez, J.Q., Corona, A.V., Andina, D., 2015. Development of a model for forecasting of PM10 concentrations in Salamanca, Mexico. Atmosph. Pollut. Res. 6 (4), 626–634. https://doi.org/10.5094/APR.2015.071.
Dawson, C.W., Abrahart, R.J., See, L.M., 2007. HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ. Model. Softw 22 (7), 1034–1052. https://doi.org/10.1016/J.ENVSOFT.2006.06.008.
Demir, G., Ozcan, K., Ucan, O.N., Bayat, C., 2010. An Artificial Neural Network (ANN) based model for short-term predictions of daily mean PM10 concentrations. J. Environ. Protect. Ecol. 11 (3), 1163–1171.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39. Retrieved from http://web.mit.edu/6.435/www/Dempster77.pdf.
Díaz-Robles, L.A., Ortega, J.C., Fu, J.S., Reed, G.D., Chow, J.C., Watson, J.G., Moncada-Herrera, J.A., 2008. A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: the case of Temuco, Chile. Atmos. Environ. 42 (35), 8331–8340. https://doi.org/10.1016/j.atmosenv.2008.07.020.
Ding, W., Zhang, J., Leung, Y., 2016. Prediction of air pollutant concentration based on sparse response back-propagation training feedforward neural networks. Environ. Sci. Pollut. Control Ser. 19481–19494. https://doi.org/10.1007/s11356-016-7149-4.
Dominick, D., Talib Latif, M., Juahir, H., Aris, A.Z., Zain, S., 2012. An assessment of influence of meteorological factors on PM10 and NO2 at selected stations in Malaysia. Sustain. Environ. Res. 22 (5), 305–315.
Dotse, S.Q., Petra, M.I., Dagar, L., De Silva, L.C., 2018. Application of computational intelligence techniques to forecast daily PM10 exceedances in Brunei Darussalam. Atmosph. Pollut. Res. 9 (2), 358–368. https://doi.org/10.1016/j.apr.2017.11.004.
Drummond, C., Holte, R.C., 2003. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. Retrieved from https://pdfs.semanticscholar.org/144b/bbafe2f0876c23295019b6e380c9fe4feda3.pdf.
Dunea, D., Pohoata, A., Iordache, S., 2015. Using wavelet–feedforward neural networks to improve air pollution forecasting in urban environments. Environ. Monit. Assess. 187 (7). https://doi.org/10.1007/s10661-015-4697-x.
Durao, R.M., Mendes, M.T., Joao Pereira, M., 2016. Forecasting O3 levels in industrial area surroundings up to 24 h in advance, combining classification trees and MLP models. Atmosph. Pollut. Res. 7 (6), 961–970. https://doi.org/10.1016/j.apr.2016.05.008.
Dursun, S., Taylan, F.K.O., 2015. Modelling sulphur dioxide levels of Konya city using artificial intelligent related to ozone, nitrogen dioxide and meteorological factors. Int. J. Environ. Sci. Technol. 12 (12), 3915–3928. https://doi.org/10.1007/s13762-015-0821-2.
Dutot, A.-L., Rynkiewicz, J., Steiner, F.E., Rude, J., 2007a. A 24-h forecast of ozone peaks and exceedance levels using neural classifiers and weather predictions. Environ. Model. Softw 22 (9), 1261–1269. https://doi.org/10.1016/J.ENVSOFT.2006.08.002.
Dutot, A.L., Rynkiewicz, J., Steiner, F.E., Rude, J., 2007b. A 24-h forecast of ozone peaks and exceedance levels using neural classifiers and weather predictions. Environ. Model. Softw 22 (9), 1261–1269. https://doi.org/10.1016/j.envsoft.2006.08.002.
Elangasinghe, M.A., Singhal, N., Dirks, K.N., Salmond, J.A., 2014. Development of an ANN–based air pollution forecasting system with explicit knowledge through sensitivity analysis. Atmosph. Pollut. Res. 5 (4), 696–708. https://doi.org/10.5094/APR.2014.079.
Ettouney, R.S., Wahab, S.A., Elkilani, A.S., 2009. Emissions inventory, ISCST, and neural network modelling of air pollution in Kuwait. Int. J. Environ. Stud. 7233. https://doi.org/10.1080/00207230902859929.
Feng, X., Li, Q., Zhu, Y., Hou, J., Jin, L., Wang, J., 2015. Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmos. Environ. 107, 118–128. https://doi.org/10.1016/j.atmosenv.2015.02.030.
artificial neural network model and its application for quantifying impact factors of urban air quality. Water Air Soil Pollut. 227 (7), 235. https://doi.org/10.1007/s11270-016-2930-z.
Heo, J., Kim, D., 2004. A new method of ozone forecasting using fuzzy expert and neural network systems 325. pp. 221–237. https://doi.org/10.1016/j.scitotenv.2003.11.009.
Hooyberghs, J., Mensink, C., Dumont, G., Fierens, F., Brasseur, O., 2005. A neural network forecast for daily average PM10 concentrations in Belgium. Atmos. Environ. 39 (18), 3279–3289. https://doi.org/10.1016/j.atmosenv.2005.01.050.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Network. 2 (5), 359–366. https://doi.org/10.1016/0893-6080(89)90020-8.
Hoshyaripour, G., Brasseur, G., Andrade, M.F., Gavidia-Calder, M., Bouarar, I., Ynoue, R.Y., 2016. Prediction of ground-level ozone concentration in São Paulo, Brazil: deterministic versus statistic models. Atmos. Environ. 145, 365–375. https://doi.org/10.1016/j.atmosenv.2016.09.061.
Hrust, L., Klaić, Z.B., Križan, J., Antonić, O., Hercog, P., 2009. Neural network forecasting of air pollutants hourly concentrations using optimised temporal averages of meteorological variables and pollutant concentrations. Atmos. Environ. 43 (35), 5588–5596. https://doi.org/10.1016/J.ATMOSENV.2009.07.048.
Huang, C.J., Kuo, P.H., 2018. A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors 18 (7). https://doi.org/10.3390/s18072220.
Feng, Y., Zhang, W., Sun, D., Zhang, L., 2011. Ozone concentration forecast method based Humphrey, G.B., Maier, H.R., Wu, W., Mount, N.J., Dandy, G.C., Abrahart, R.J., Dawson,
on genetic algorithm optimized back propagation neural networks and support vector C.W., 2017. Improved validation framework and R-package for artificial neural
machine data classi fi cation. Atmos. Environ. 45 (11), 1979–1985. https://doi.org/ network models. Environ. Model. Softw 92, 82–106. https://doi.org/10.1016/J.
10.1016/j.atmosenv.2011.01.022. ENVSOFT.2017.01.023.
Fernando, H.J.S., Mammarella, M.C., Grandoni, G., Fedele, P., Di Marco, R., Dimitrova, Hunter, J.M., Maier, H.R., Gibbs, M.S., Foale, E.R., Grosvenor, N.A., Harders, N.P.,
R., Hyde, P., 2012. Forecasting PM10 in metropolitan areas: efficacy of neural net- Kikuchi-Miller, T.C., 2018. Framework for developing hybrid process-driven, artifi-
works. Environ. Pollut. 163, 62–67. https://doi.org/10.1016/j.envpol.2011.12.018. cial neural network and regression models for salinity prediction in river systems.
Folch-Fortuny, A., Arteaga, F., Ferrer, A., 2015. PCA model building with missing data: Hydrol. Earth Syst. Sci. 22 (5), 2987–3006. https://doi.org/10.5194/hess-22-2987-
new proposals and a comparative study. Chemometr. Intell. Lab. Syst. 146, 77–88. 2018.
https://doi.org/10.1016/J.CHEMOLAB.2015.05.006. Ibarra-Berastegi, G., Elias, A., Barona, A., Saenz, J., Ezcurra, A., Diaz de Argandoña, J.,
Fontes, T., Silva, L.M., Silva, M.P., Barros, N., Carvalho, A.C., 2014. Can artificial neural 2008. From diagnosis to prognosis for forecasting air pollution using neural net-
networks be used to predict the origin of ozone episodes? Sci. Total Environ. works: air pollution monitoring in Bilbao. Environ. Model. Softw 23 (5), 622–637.
488–489, 197–207. https://doi.org/10.1016/J.SCITOTENV.2014.04.077. https://doi.org/10.1016/J.ENVSOFT.2007.09.003.
Franceschi, F., Cobo, M., Figueredo, M., 2018. Discovering relationships and forecasting IEEE Spectrum, 2018. The 2018 top programming languages - IEEE Spectrum. Retrieved
PM10and PM2.5concentrations in bogotá, Colombia, using artificial neural networks, May 10, 2019, from. https://spectrum.ieee.org/at-work/innovation/the-2018-top-
principal component analysis, and k-means clustering. Atmosph. Pollut. Res programming-languages.
February), 0–1. https://doi.org/10.1016/j.apr.2018.02.006. IEEE Xplore, 2017. IEEE Xplore Digital Library. Retrieved. http://ieeexplore.ieee.org/
Freeman, B.S., Taylor, G., Gharabaghi, B., Thé, J., 2018. Forecasting air quality time Xplore/home.jsp, Accessed date: 27 October 2017.
series using deep learning. J. Air Waste Manag. Assoc. 68 (8), 866–886. https://doi. Inal, F., 2010. Artificial neural network prediction of tropospheric ozone concentrations
org/10.1080/10962247.2018.1459956. in istanbul, Turkey. Clean. - Soil, Air, Water 38 (10), 897–908. https://doi.org/10.
Galatioto, F., Zito, P., 2009. Traffic parameters estimation to predict road side pollutant 1002/clen.201000138.
concentrations using neural networks. Environmental modeling & assessment Jain, S., Khare, M., 2010. Adaptive Neuro-Fuzzy Modeling for Prediction of Ambient CO
365–374. https://doi.org/10.1007/s10666-007-9129-z. Concentration at Urban Intersections and Roadways, vols 203–212. https://doi.org/
Galelli, S., Humphrey, G.B., Maier, H.R., Castelletti, A., Dandy, G.C., Gibbs, M.S., 2014. 10.1007/s11869-010-0073-8.
An evaluation framework for input variable selection algorithms for environmental Jakeman, A.J., Letcher, R.A., Norton, J.P., 2006. Ten iterative steps in development and
data-driven models. Environ. Model. Softw 62, 33–51. https://doi.org/10.1016/J. evaluation of environmental models. Environ. Model. Softw 21 (5), 602–614. https://
ENVSOFT.2014.08.015. doi.org/10.1016/J.ENVSOFT.2006.01.004.
Gao, M., Yin, L., Ning, J., 2018. Artificial neural network model for ozone concentration Jiang, D., Zhang, Y., Hu, X., Zeng, Y., Tan, J., Shao, D., 2004. Progress in developing an
estimation and Monte Carlo analysis. https://doi.org/10.1016/j.atmosenv.2018.03. ANN model for air pollution index forecast. Atmos. Environ. 38, 7055–7064. https://
027. doi.org/10.1016/j.atmosenv.2003.10.066.
Gardner, M., Dorling, S., 1998. Artificial neural networks (the multilayer perceptron)—a Jiang, P., Dong, Q., Li, P., 2017. A novel hybrid strategy for PM2.5 concentration analysis
review of applications in the atmospheric sciences. Atmos. Environ. 32 (14–15), and prediction. J. Environ. Manag. 196, 443–457. https://doi.org/10.1016/j.
2627–2636. https://doi.org/10.1016/S1352-2310(97)00447-0. jenvman.2017.03.046.
Gass, S.I., 1983. Feature article-decision-aiding models: validation, assessment, and re- Jiang, P., Li, C., Li, R., Yang, H., 2018. An innovative hybrid air pollution early-warning
lated issues for policy analysis. https://doi.org/10.1287/opre.31.4.603. system based on pollutants forecasting and Extenics evaluation. Knowl. Based Syst.
Gennaro, G. De, Trizio, L., Di, A., Pey, J., Pérez, N., Cusack, M., et al., 2013. Science of the https://doi.org/10.1016/J.KNOSYS.2018.10.036.
Total Environment Neural network model for the prediction of PM 10 daily con- Juhos, I., Makra, L., Tóth, B., 2009. The behaviour of the multi-layer perceptron and the
centrations in two sites in the Western Mediterranean. Sci. Total Environ. 463–464, support vector regression learning methods in the prediction of NO and NO2 con-
875–883. The. https://doi.org/10.1016/j.scitotenv.2013.06.093. centrations in Szeged, Hungary. Neural Comput. Appl. 18 (2), 193–205. https://doi.
Gong, B., Ordieres-Meré, J., 2016. Prediction of daily maximum ozone threshold ex- org/10.1007/s00521-007-0171-1.
ceedances by preprocessing and ensemble artificial intelligence techniques: case Junninen, H., Niska, H., Tuppurainen, K., Ruuskanen, J., 2004. Methods for imputation of
study of Hong Kong. Environ. Model. Softw 84, 290–303. https://doi.org/10.1016/j. missing values in air quality data sets 38. pp. 2895–2907. https://doi.org/10.1016/j.
envsoft.2016.06.020. atmosenv.2004.02.026.
Grivas, G., Chaloulakou, A.Ã., 2006. Artificial neural network models for prediction of PM Kao, J.J., Huang, S.S., 2000. Forecasts using neural network versus box-jenkins metho-
10 hourly concentrations , in the Greater Area of Athens , Greece. Atmos. Environ. 40, dology for ambient air quality monitoring data. J. Air Waste Manag. Assoc. 50 (2),
1216–1229. https://doi.org/10.1016/j.atmosenv.2005.10.036. 219–226. https://doi.org/10.1080/10473289.2000.10463997.
Guardani, R., Nascimento, C.A.O., Guardani, M.L.G., Martins, M.H.R.B., Romano, J., Kingston, G.B., Lambert, M.F., Maier, H.R., 2005a. Bayesian training of artificial neural
1999. Journal of the air & Waste management association study of atmospheric ozone networks used for water resources modeling. Water Resour. Res. 41 (12), 1–11.
formation by means of a neural network-based model study of atmospheric ozone https://doi.org/10.1029/2005WR004152.
formation by means of a neural network-based model. J. Air Waste Manag. Assoc. 49, Kingston, G.B., Maier, H.R., Lambert, M.F., 2005b. Calibration and validation of neural
316–323. https://doi.org/10.1080/10473289.1999.10463806. networks to ensure physically plausible hydrological modeling. J. Hydrol. 314 (1–4),
Guoqiang Zhang, B., Eddy Patuwo, M.Y.H., 1998. Forecasting with artificial neural net- 158–176. https://doi.org/10.1016/J.JHYDROL.2005.03.013.
works: the state of the art. Int. J. Forecast. 14, 35–62. https://doi.org/10.1016/ Kingston, G.B., Maier, H.R., Lambert, M.F., 2008. Bayesian model selection applied to
S0169-2070(97)00044-7. artificial neural networks used for water resources modeling. Water Resour. Res. 44
Hagan, M.T., Demuth, H.B., Beale, M.H., 1995. Neural network design. Boston (4). https://doi.org/10.1029/2007WR006155.
Massachusetts PWS 2. pp. 734. https://doi.org/10.1007/1-84628-303-5. Kolehmainen, M., Martikainen, H., Ruuskanen, J., 2001. Neural networks and periodic
Hasham, F.A., Kindzierski, W.B., Stanley, S.J., 2004. Modeling of hourly NO x con- components used in air quality forecasting. Atmos. Environ. 35 (5), 815–825. https://
centrations using artificial neural networks 1. J. Environ. Eng. Sci. 3 (x), 111–119. doi.org/10.1016/S1352-2310(00)00385-X.
https://doi.org/10.1139/S03-084. Kotzias, D., Geiss, O., Tirendi, S., Barrero-Moreno, J., Reina, V., Gotti, A., et al., 2009.
He, H., Lu, W.-Z., Xue, Y., 2014. Prediction of particulate matter at street level using Public buildings, schools and kindergartensthe european indoor air monitoring and
artificial neural networks coupling with chaotic particle swarm optimization algo- exposure assessment (airmex) study. Eur. Comm. Joint Res. Centr. Inst. Health
rithm. Build. Environ. 78, 111–117. https://doi.org/10.1016/J.BUILDENV.2014.04. Consum. Protect. 18 (5), 670–681.
011. Kukkonen, J., Partanen, L., Karppinen, A., Ruuskanen, J., Junninen, H., Kolehmainen, M.,
He, J., Yu, Y., Xie, Y., Mao, H., Wu, L., Liu, N., Zhao, S., 2016. Numerical model-based et al., 2003. Extensive evaluation of neural network models for the prediction of NO2
302
and PM10 concentrations, compared with a deterministic modelling system and Kolehmainen, M., 2005. Evaluation of an integrated modelling system containing a
measurements in central Helsinki. Atmos. Environ. 37 (32), 4539–4550. https://doi. multi-layer perceptron model and the numerical weather prediction model HIRLAM
org/10.1016/S1352-2310(03)00583-1. for the forecasting of urban airborne pollutant concentrations. Atmos. Environ. 39
Kumar, N., Middey, A., Rao, P.S., 2017. Prediction and examination of seasonal variation (35), 6524–6536. https://doi.org/10.1016/j.atmosenv.2005.07.035.
of ozone with meteorological parameter through artificial neural network at NEERI, Nunnari, G., 2004. Modelling air pollution time-series by using wavelet functions and
Nagpur, India. Urban Clim. 20 (2), 148–167. https://doi.org/10.1016/j.uclim.2017. genetic algorithms. Soft Comput. 8, 173–178. https://doi.org/10.1007/s00500-002-
04.003. 0260-0.
Kurt, A., Oktay, A.B., 2010. Forecasting air pollutant indicator levels with geographic Olcese, L.E., Toselli, B.M., 2004. A method to estimate emission rates from industrial
models 3 days in advance using neural networks. Expert Syst. Appl. 37 (12), stacks based on neural networks. Chemosphere 57, 691–696. https://doi.org/10.
7986–7992. https://doi.org/10.1016/j.eswa.2010.05.093. 1016/j.chemosphere.2004.07.045.
Li, C., Zhu, Z., 2018. Research and application of a novel hybrid air quality early-warning Olden, J.D., Jackson, D.A., 2002. Illuminating the “black box”: a randomization approach
system: a case study in China. Sci. Total Environ. 626, 1421–1438. https://doi.org/ for understanding variable contributions in artificial neural networks. Ecol. Model.
10.1016/j.scitotenv.2018.01.195. 154 (1–2), 135–150. https://doi.org/10.1016/S0304-3800(02)00064-9.
Li, T., Shen, H., Yuan, Q., Zhang, X., Zhang, L., 2017. Estimating ground-level PM2.5 by Ordieres, J.B., Vergara, E.P., Capuz, R.S., Salazar, R.E., 2005. Neural network prediction
fusing satellite and station observations: a geo-intelligent deep learning approach. model for fine particulate matter (PM2.5) on the US–Mexico border in El Paso (Texas)
Geophys. Res. Lett. 44 (23) 11,985-11,993. https://doi.org/10.1002/ and Ciudad Juárez (Chihuahua). Environ. Model. Softw 20 (5), 547–559. https://doi.
2017GL075710. org/10.1016/J.ENVSOFT.2004.03.010.
Li, X., Peng, L., Hu, Y., Shao, J., Chi, T., 2016. Deep learning architecture for air quality Organisation for Economic Co-operation and Development, 2016. Policy Highlights - the
predictions. Environ. Sci. Pollut. Control Ser. 23 (22), 22408–22417. https://doi.org/ Economic Consequences of Outdoor Air Pollution.
10.1007/s11356-016-7812-9. Osowski, S., Garanty, K., 2007. Forecasting of the daily meteorological pollution using
Lightstone, S.D., Moshary, F., Gross, B., 2017. Comparing CMAQ forecasts with a neural wavelets and support vector machine. Eng. Appl. Artif. Intell. 20 (6), 745–755.
network forecast model for PM2.5 in New York. Atmosphere 8 (9). https://doi.org/ https://doi.org/10.1016/j.engappai.2006.10.008.
10.3390/atmos8090161. Özdemir, U., Taner, S., 2014. Impacts of meteorological factors on PM 10 : artificial
Liu, H., Wu, H., Lv, X., Ren, Z., Liu, M., Li, Y., Shi, H., 2019. An intelligent hybrid model neural networks ( ANN ) and multiple linear regression ( MLR ) approaches con-
for air pollutant concentrations forecasting: case of Beijing in China Keywords: air tributed articles impacts of meteorological factors on PM 10 : artificial neural net-
pollutant concentrations forecasting Empirical wavelet transform Multi-agent evo- works ( ANN ) and multiple linear regr. Environmental Forensics. 5922. https://doi.
lutionary genetic algorithm Nonlinear auto regressive models w. Sustain. Cities Soc. org/10.1080/15275922.2014.950774.
https://doi.org/10.1016/j.scs.2019.101471. Pak, U., Kim, C., Ryu, U., Sok, K., Pak, S., 2018. A hybrid model based on convolutional
López, V., Fernández, A., Moreno-Torres, J.G., Herrera, F., 2012. Analysis of preproces- neural networks and long short-term memory for ozone concentration prediction. Air
sing vs. cost-sensitive learning for imbalanced classification. Open problems on in- Qual. Atmosph. Health 11 (8), 883–895. https://doi.org/10.1007/s11869-018-
trinsic data characteristics. Expert Syst. Appl. 39 (7), 6585–6608. https://doi.org/10. 0585-1.
1016/J.ESWA.2011.12.043. Papaleonidas, A., Iliadis, L., 2013. Neurocomputing Techniques to Dynamically Forecast
Lu, W.Z., Wang, W.J., Wang, X.K., Xu, Z.B., T Leung, A.Y., 2003. Using improved neural Spatiotemporal Air Pollution Data. pp. 221–233. https://doi.org/10.1007/s12530-
network model to analyze rsp, NOx and NO2 levels in urban air in mong kok, Hong 013-9078-5.
Kong. Environ. Monit. Assess. 87 (2), 235–254. Paschalidou, A.K., Karakitsios, S., Kleanthous, S., Kassomenos, P.A., 2011. Forecasting
Luna, A.S., 2014. Prediction of ozone concentration in tropospheric levels using arti fi cial hourly PM10 concentration in Cyprus through artificial neural networks and multiple
neural networks and support vector machine at Rio de. Atmos. Environ. 98, 98–104. regression models: implications to local environmental management. Environ. Sci.
https://doi.org/10.1016/j.atmosenv.2014.08.060. Pollut. Control Ser. 18 (2), 316–327. https://doi.org/10.1007/s11356-010-0375-2.
Mahapatra, A., 2010. Prediction of daily ground-level ozone concentration. Environ. Peng, H., Lima, A.R., Teakles, A., Jin, J., Cannon, A.J., Hsieh, W.W., 2017. Evaluating
Monit. Assess. 170, 159–170. https://doi.org/10.1007/s10661-009-1223-z. hourly air quality forecasting in Canada with nonlinear updatable machine learning
Maier, H.R., Dandy, G.C., 2000. Neural networks for the prediction and forecasting of methods. Air Qual. Atmosph. Health 10 (2), 195–211. https://doi.org/10.1007/
water resources variables: a review of modelling issues and applications. Environ. s11869-016-0414-3.
Model. Softw 15 (1), 101–124. https://doi.org/10.1016/S1364-8152(99)00007-9. Perez, P., Trier, A., 2001. Prediction of NO and NO 2 concentrations near a street with
Maier, H.R., Jain, A., Dandy, G.C., Sudheer, K.P., 2010. Methods used for the develop- heavy traf c in Santiago. Chile. Atmos. Environ. 35 (10), 1783–1789.
ment of neural networks for the prediction of water resource variables in river sys- Perez, Patricio, 2012. Combined model for PM10 forecasting in a large city. Atmos.
tems: current status and future directions. Environ. Model. Softw 25 (8), 891–909. Environ. 60, 271–276. https://doi.org/10.1016/j.atmosenv.2012.06.024.
https://doi.org/10.1016/J.ENVSOFT.2010.02.003. Perez, Patricio, Salini, G., 2008. PM2.5forecasting in a large city: comparison of three
Makridakis, S., Spiliotis, E., Assimakopoulos, V., 2018. The M4 Competition: results, methods. Atmos. Environ. 42 (35), 8219–8224. https://doi.org/10.1016/j.atmosenv.
findings, conclusion and way forward. Int. J. Forecast. 34 (4), 802–808. https://doi. 2008.07.035.
org/10.1016/j.ijforecast.2018.06.001. Pires, J.C.M., Ferraz, M.C.M.A., Pereira, M.C., Martins, F.G., Martins, F.G., 2010.
Mao, X., Shen, T., Feng, X., 2017. Prediction of hourly ground-level PM2.5concentrations Atmospheric Pollution Research Evolutionary procedure based model to predict
3 days in advance using neural networks with satellite data in eastern China. ground – level ozone concentrations. Atmosph. Pollut. Res. 1 (4), 215–219. https://
Atmosph. Pollut. Res. 8 (6), 1005–1015. https://doi.org/10.1016/j.apr.2017.04.002. doi.org/10.5094/APR.2010.028.
Martín, M.L., Turias, I.J., González, F.J., Galindo, P.L., Trujillo, F.J., Puntonet, C.G., Pisoni, E., Farina, M., Carnevale, C., Piroddi, L., 2009. Forecasting peak air pollution
Gorriz, J.M., 2008. Prediction of CO maximum ground level concentrations in the Bay levels using NARX models. Eng. Appl. Artif. Intell. 22 (4–5), 593–602. https://doi.
of Algeciras, Spain using artificial neural networks. Chemosphere 70 (7), 1190–1195. org/10.1016/j.engappai.2009.04.002.
https://doi.org/10.1016/j.chemosphere.2007.08.039. Plaia, A.Ã., Bondı, A.L., 2006. Single imputation method of missing values in environ-
Mishra, D., Goyal, P., 2015. Development of artificial intelligence based NO2 forecasting mental pollution data sets. Atmos. Environ. 40, 7316–7330. https://doi.org/10.
models at Taj Mahal, Agra. Atmosph. Pollut. Res. 6 (1), 99–106. https://doi.org/10. 1016/j.atmosenv.2006.06.040.
5094/APR.2015.012. Prakash, A., Kumar, U., Kumar, K., Jain, V.K., 2011. A wavelet-based neural network
Mishra, D., Goyal, P., 2016. Neuro-Fuzzy approach to forecasting Ozone Episodes over the model to predict ambient air pollutants' concentration. Environ. Model. Assess. 16
urban area of Delhi, India. Environ. Technol. Innov. 5, 83–94. https://doi.org/10. (5), 503–517. https://doi.org/10.1007/s10666-011-9270-6.
1016/J.ETI.2016.01.001. ProQuest, 2017. ProQuest | Databases, EBooks and Technology for Research. Retrieved.
Mount, N.J., Dawson, C.W., Abrahart, R.J., 2013. Legitimising data-driven models: ex- http://www.proquest.com/, Accessed date: 27 October 2017.
emplification of a new data-driven mechanistic modelling framework. Hydrol. Earth Qi, Y., Li, Q., Karimian, H., Liu, D., 2019. A hybrid model for spatiotemporal forecasting
Syst. Sci. 17, 2827–2843. https://doi.org/10.5194/hess-17-2827-2013. of PM 2.5 based on graph convolutional neural network and long short-term memory.
Moustris, K.P., Larissi, I.K., Nastos, P.T., Koukouletsos, K.V., Paliatsos, A.G., 2013. Sci. Total Environ. 664, 1–10. https://doi.org/10.1016/j.scitotenv.2019.01.333.
Development and application of artificial neural network modeling in forecasting PM Qin, D., Yu, J., Zou, G., Yong, R., Zhao, Q., Zhang, B., 2019. A novel combined prediction
10 levels in a mediterranean city. https://doi.org/10.1007/s11270-013-1634-x. scheme based on CNN and LSTM for urban PM 2.5 concentration. IEEE Access 7,
Moustris, Konstantinos P., Ziomas, I.C., Paliatsos, A.G., 2010. 3-day-ahead forecasting of 20050–20059. https://doi.org/10.1109/ACCESS.2019.2897028.
regional pollution index for the pollutants NO2, CO, SO2, and O3 using artificial Radojević, D., Antanasijević, D., Perić-Grujić, A., Ristić, M., Pocajt, V., 2018. The sig-
neural networks in athens, Greece. Water Air Soil Pollut. 209 (1–4), 29–43. https:// nificance of periodic parameters for ANN modeling of daily SO2 and NOx con-
doi.org/10.1007/s11270-009-0179-5. centrations: a case study of Belgrade, Serbia. Atmospheric Pollution Research.
Mueller, S.F., Mallard, J.W., 2011. Contributions of natural emissions to ozone and PM2.5 https://doi.org/10.1016/J.APR.2018.11.004.
as simulated by the community multiscale air quality (CMAQ) model. Environ. Sci. Rahimi, A., 2017. Short-term prediction of NO2 and NO x concentrations using multilayer
Technol. 45 (11), 4817–4823. https://doi.org/10.1021/es103645m. perceptron neural network: a case study of Tabriz, Iran. Ecol. Process. 6 (1), 4.
Nagendra, S.M.S., Khare, M., 2006. Artificial neural network approach for modelling https://doi.org/10.1186/s13717-016-0069-x.
nitrogen dioxide dispersion from vehicular exhaust emissions. Ecol. Model. 190 Razavi, S., Tolson, B.A., 2011. A new formulation for feedforward neural networks. IEEE
(1–2), 99–115. https://doi.org/10.1016/j.ecolmodel.2005.01.062. Trans. Neural Netw. 22 (10), 1588–1598. https://doi.org/10.1109/TNN.2011.
Nidzgorska-Lencewicz, J., 2018. Application of artificial neural networks in the predic- 2163169.
tion of PM10levels in thewinter months: a case study in the Tricity Agglomeration, Russo, A., Lind, P.G., Raischel, F., Trigo, R., Mendes, M., 2015. Neural network forecast of
Poland. Atmosphere 9 (6). https://doi.org/10.3390/atmos9060203. daily pollution concentration using optimal meteorological data at synoptic and local
Niska, H., Hiltunen, T., Karppinen, A., Ruuskanen, J., Kolehmainen, M., 2004. Evolving scales. Atmosph. Pollut. Res. 6 (3), 540–549. https://doi.org/10.5094/APR.2015.
the neural network model for forecasting air pollution time series. Eng. Appl. Artif. 060.
Intell. 17 (2), 159–167. https://doi.org/10.1016/J.ENGAPPAI.2004.02.002. Russo, A., Raischel, F., Lind, P.G., 2013. Air quality prediction using optimal neural
Niska, H., Rantam, M., Hiltunen, T., Karppinen, A., Kukkonen, J., Ruuskanen, J., networks with stochastic variables. Atmos. Environ. 79, 822–830. https://doi.org/10.
303
1016/j.atmosenv.2013.07.072. Winters models. Air Quality, Atmosphere & Health. https://doi.org/10.1007/s11869-

Russo, A., Soares, A.O., 2014. Hybrid model for urban air pollution forecasting: a sto- 018-00660-x.
chastic spatio-temporal approach. Math. Geosci. 46 (1), 75–93. https://doi.org/10. Vlachogianni, A., Kassomenos, P., Karppinen, A., Karakitsios, S., Kukkonen, J., 2011.
1007/s11004-013-9483-0. Science of the Total Environment Evaluation of a multiple regression model for the
Samarasinghe, S., 2006. Neural Networks for Applied Sciences and Engineering. Taylor & forecasting of the concentrations of NO x and PM 10 in Athens and Helsinki. Sci.
Francis. https://doi.org/10.1017/CBO9781107415324.004. Total Environ. 409 (8), 1559–1571. https://doi.org/10.1016/j.scitotenv.2010.12.
Santos, G., Fernández-olmo, I., 2015. Estimation of PM 10 -bound as , Cd , Ni and Pb 040.
levels by means of statistical Modelling : PLSR and ANN approaches. Water, Air, & Voukantsis, D., Karatzas, K., Kukkonen, J., Räsänen, T., 2011. Science of the Total
Soil Pollution. . https://doi.org/10.1007/s11270-015-2526-z. Environment Intercomparison of air quality data using principal component analysis ,
Saptoro, A., Tadé, M.O., Vuthaluru, H., 2012. Chemical product and process modeling a and forecasting of PM 10 and PM 2 . 5 concentrations using arti fi cial neural net-
modified kennard-stone algorithm for optimal division of data for developing artifi- works , in Thessaloniki and Helsinki. Sci. Total Environ. 409 (7), 1266–1276. https://
cial neural network models a modified kennard-stone algorithm for optimal division doi.org/10.1016/j.scitotenv.2010.12.039.
of data for developing artificial neural network. Chem. Prod. Process Model. 7 (1), Wang, J., Song, G., 2018. A deep spatial-temporal ensemble model for air quality pre-
13. https://doi.org/10.1515/1934-2659.1645. diction. Neurocomputing 314, 198–206. https://doi.org/10.1016/J.NEUCOM.2018.
Schlink, U., Herbarth, O., Richter, M., Dorling, S., Nunnari, G., Cawley, G., Pelikan, E., 06.049.
2006. Statistical models to assess the health effects and to forecast ground-level Wang, W., Xu, Z., Weizhen Lu, J., 2003. Three improved neural network models for air
ozone. Environ. Model. Softw 21 (4), 547–558. https://doi.org/10.1016/J.ENVSOFT. quality forecasting. Eng. Comput. 20 (2), 192–210. https://doi.org/10.1108/
2004.12.002. 02644400310465317.
Schneider, T., 2001. Analysis of incomplete climate data: estimation of mean values and Wen, C., Liu, S., Yao, X., Peng, L., Li, X., Hu, Y., Chi, T., 2019. A novel spatiotemporal
covariance matrices and imputation of missing values. J. Clim Retrieved from. convolutional long short-term neural network for air pollution prediction. Sci. Total
https://journals.ametsoc.org/doi/pdf/10.1175/1520-0442%282001%29014% Environ. 654, 1091–1099. https://doi.org/10.1016/j.scitotenv.2018.11.086.
3C0853%3AAOICDE%3E2.0.CO%3B2. Williams, D.A., Nelsen, B., Berrett, C., Williams, G.P., Moon, T.K., 2018. A comparison of
ScienceDirect, 2017. ScienceDirect.com | Science, Health and Medical Journals, Full Text data imputation methods using Bayesian compressive sensing and Empirical Mode
Articles and Books. Retrieved. http://www.sciencedirect.com/, Accessed date: 27 Decomposition for environmental temperature data. https://doi.org/10.1016/j.
October 2017. envsoft.2018.01.012.
Shahraiyni, H.T., Sodoudi, S., 2016. Statistical modeling approaches for pm10 prediction World Health Organization, 2013. Review of Evidence on Health Aspects of Air Pollution
in urban areas; A review of 21st-century studies. Atmosphere 7 (2), 10–13. https:// – REVIHAAP Project, vol 309 World Health Organization Retrieved from. http://
doi.org/10.3390/atmos7020015. www.euro.who.int/en/health-topics/environment-and-health/air-quality/
Shahraiyni, H.T., Sodoudi, S., Kerschbaumer, A., Cubasch, U., 2015. New technique for publications/2013/review-of-evidence-on-health-aspects-of-air-pollution-revihaap-
ranking of air pollution monitoring stations in the urban areas based upon spatial project-final-technical-report.
representativity (Case study: PM monitoring stations in Berlin). Aerosol Air Qual. Res. World Health Organization, 2016. WHO | Ambient (Outdoor) Air Quality and Health.
15 (2), 743–748. https://doi.org/10.4209/aaqr.2014.12.0317. Retrieved. http://www.who.int/mediacentre/factsheets/fs313/en/, Accessed date:
Sharma, N., Chaudhry, K.K., Rao, C.V.C., 2005. Vehicular pollution modeling using ar- 14 September 2017.
tificial neural network technique: a review. J. Sci. Ind. Res. (India) 64 (September), 637–647.
Singh, K.P., Gupta, S., Kumar, A., Shukla, S.P., 2012. Linear and nonlinear modeling approaches for urban air quality prediction. Sci. Total Environ. 426, 244–255. https://doi.org/10.1016/j.scitotenv.2012.03.076.
Siwek, K., Osowski, S., 2012. Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors. Eng. Appl. Artif. Intell. 25 (6), 1246–1258. https://doi.org/10.1016/j.engappai.2011.10.013.
Slini, T., Kaprara, A., Karatzas, K., Moussiopoulos, N., 2006. PM10 forecasting for Thessaloniki, Greece. Environ. Model. Softw 21 (4), 559–565. https://doi.org/10.1016/J.ENVSOFT.2004.06.011.
Solaiman, T.A., Coulibaly, P., Kanaroglou, P., 2008. Ground-level ozone forecasting using data-driven methods. Air Qual. Atmosph. Health 1 (4), 179–193. https://doi.org/10.1007/s11869-008-0023-x.
Sousa, S.I.V., Martins, F.G., Alvim-Ferraz, M.C.M., Pereira, M.C., 2007. Multiple linear regression and artificial neural networks based on principal components to predict ozone concentrations. Environ. Model. Softw 22 (1), 97–103. https://doi.org/10.1016/J.ENVSOFT.2005.12.002.
Stamenković, L.J., Antanasijević, D.Z., Ristić, M., Perić-Grujić, A.A., Pocajt, V.V., 2017. Prediction of nitrogen oxides emissions at the national level based on optimized artificial neural network model. Air Qual. Atmosph. Health 10 (1), 15–23. https://doi.org/10.1007/s11869-016-0403-6.
Sun, W., Zhang, H., Palazoglu, A., Singh, A., Zhang, W., Liu, S., 2013. Prediction of 24-hour-average PM2.5 concentrations using a hidden Markov model with different emission distributions in Northern California. Sci. Total Environ. 443, 93–103. https://doi.org/10.1016/J.SCITOTENV.2012.10.070.
Symonds, P., Taylor, J., Chalabi, Z., Mavrogianni, A., Davies, M., Hamilton, I., et al., 2016. Development of an England-wide indoor overheating and air pollution model using artificial neural networks. J. Build. Perform. Simul. 1493 (June), 1–14. https://doi.org/10.1080/19401493.2016.1166265.
Taylan, O., 2017. Modelling and analysis of ozone concentration by artificial intelligent techniques for estimating air quality. Atmos. Environ. 150, 356–365. https://doi.org/10.1016/j.atmosenv.2016.11.030.
The MathWorks, Inc., 2017. MATLAB Documentation - MathWorks United Kingdom. Retrieved. https://uk.mathworks.com/help/matlab/, Accessed date: 30 October 2017.
The University of Sheffield, 2017. StarPlus. Retrieved April 3, 2017, from https://find.shef.ac.uk/primo_library/libweb/action/search.do?vid=SFD_VU2&samlLogin=true&dscnt=0&dstmp=1556889134776&fromLogin=true.
Tsai, C., Chang, L., Chiang, H., 2009. Forecasting of ozone episode days by cost-sensitive neural network methods. Sci. Total Environ. 407 (6), 2124–2135. https://doi.org/10.1016/j.scitotenv.2008.12.007.
Tzanis, C.G., Alimissis, A., Philippopoulos, K., Deligiorgi, D., 2019. Applying linear and nonlinear models for the estimation of particulate matter variability. Environ. Pollut. 246, 89–98. https://doi.org/10.1016/J.ENVPOL.2018.11.080.
Ul-Saufie, A.Z., Yahaya, A.S., Ramli, N.A., Rosaida, N., Hamid, H.A., 2013. Future daily PM10 concentrations prediction by combining regression models and feedforward backpropagation models with principle component analysis (PCA). Atmos. Environ. 77, 621–630. https://doi.org/10.1016/J.ATMOSENV.2013.05.017.
Ventura, L.M.B., de Oliveira Pinto, F., Soares, L.M., Luna, A.S., Gioda, A., 2019. Forecast of daily PM2.5 concentrations applying artificial neural networks and Holt-Winters models. Air Qual. Atmosph. Health. https://doi.org/10.1007/s11869-
World Health Organization, 2018. WHO | Ambient Air Pollution. Retrieved. http://www.who.int/airpollution/ambient/en/, Accessed date: 10 September 2018.
Wu, W., Dandy, G.C., Maier, H.R., 2014. Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environ. Model. Softw 54, 108–127. https://doi.org/10.1016/J.ENVSOFT.2013.12.016.
Wu, W., May, R.J., Maier, H.R., Dandy, G.C., 2013. A benchmarking approach for comparing data splitting methods for modeling water resources parameters using artificial neural networks. Water Resour. Res. 49 (11), 7598–7614. https://doi.org/10.1002/2012WR012713.
Yan, K., Jian, L., 2013. Identification of significant factors for air pollution levels using a neural network based knowledge discovery system. Neurocomputing 99, 564–569. https://doi.org/10.1016/j.neucom.2012.06.003.
Yeganeh, B., Hewson, M.G., Clifford, S., Knibbs, L.D., Morawska, L., 2017. A satellite-based model for estimating PM2.5 concentration in a sparsely populated environment using soft computing techniques. Environ. Model. Softw 88, 84–92. https://doi.org/10.1016/J.ENVSOFT.2016.11.017.
Yeganeh, B., Hewson, M.G., Clifford, S., Tavassoli, A., Knibbs, L.D., Morawska, L., 2018. Estimating the spatiotemporal variation of NO2 concentration using an adaptive neuro-fuzzy inference system. Environ. Model. Softw 100 (2), 222–235. https://doi.org/10.1016/j.envsoft.2017.11.031.
Yetilmezsoy, K., Ozkaya, B., Cakmakci, M., 2011. Artificial intelligence-based prediction models for environmental engineering. Neural Netw. World 21 (3), 193–218.
Zhang, H., Liu, Y., Shi, R., Yao, Q., 2013. Evaluation of PM10 forecasting based on the artificial neural network model and intake fraction in an urban area: a case study in Taiyuan City. J. Air Waste Manag. Assoc. https://doi.org/10.1080/10962247.2012.755940.
Zhang, J., Ding, W., 2017. Prediction of air pollutants concentration based on an extreme learning machine: the case of Hong Kong. Int. J. Environ. Res. Public Health 14 (2), 1–19. https://doi.org/10.3390/ijerph14020114.
Zhao, Y., Shrivastava, A.K., Tsui, K.L., 2016. Imbalanced classification by learning hidden data structure. IIE Trans. 48 (7), 614–628. https://doi.org/10.1080/0740817X.2015.1110269.
Zhou, Q., Jiang, H., Wang, J., Zhou, J., 2014. A hybrid model for PM2.5 forecasting based on ensemble empirical mode decomposition and a general regression neural network. Sci. Total Environ. 496, 264–274. https://doi.org/10.1016/J.SCITOTENV.2014.07.051.
Zhu, G., Zhang, P., Tshukudu, T., Yin, J., Fan, G., Zheng, X., 2015. Forecasting traffic-related nitrogen oxides within a street canyon by combining a genetic algorithm-back propagation artificial neural network and parametric models. Atmosph. Pollut. Res. 6 (6), 1087–1097. https://doi.org/10.1016/j.apr.2015.06.006.
Zhu, S., Lian, X., Wei, L., Che, J., Shen, X., Yang, L., et al., 2018. PM2.5 forecasting using SVR with PSOGSA algorithm based on CEEMD, GRNN and GCA considering meteorological factors. Atmos. Environ. 183, 20–32. https://doi.org/10.1016/J.ATMOSENV.2018.04.004.
Zito, P., Chen, H., Bell, M.C., 2008. Predicting real-time roadside CO and NO2 concentrations using neural networks. IEEE Trans. Intell. Transp. Syst. 9 (3), 514–522. https://doi.org/10.1109/TITS.2008.928259.
Zou, B., Wang, M., Wan, N., Wilson, J.G., Fang, X., Tang, Y., 2015. Spatial Modeling of PM2.5 Concentrations with a Multifactoral Radial Basis Function Neural Network. pp. 10395–10404. https://doi.org/10.1007/s11356-015-4380-3.