Calibration of Low-Cost Particle Sensors by Using Machine-Learning Method

Conference Paper · October 2018


DOI: 10.1109/APCCAS.2018.8605619



2018 IEEE Asia Pacific Conference on Circuits and Systems


Chen-Chia Chen, Chih-Ting Kuo, Ssu-Ying Chen, Chih-Hsing Lin, Jin-Ju Chue, Yi-Jie Hsieh, Chun-Wen Cheng,
Chieh-Ming Wu, and Chun-Ming Huang*
National Chip Implementation Center
National Applied Research Laboratories
Hsinchu, Taiwan
*e-mail: [email protected]

Abstract: The measurement of particulate matter (PM) mass concentration by low-cost PM sensors is strongly influenced by environmental factors such as humidity, temperature, wind speed, and wind direction. In this study, we developed a machine-learning-based calibration method for low-cost light-scattering PM sensors. A Feedforward Neural Network (FNN) was used to compensate for the effect of environmental factors on the PM measurements. Experimental data were collected from 20 March to 6 May 2018 in central Taiwan and used to train and evaluate the calibration model. Before calibration, the PM2.5 mass concentrations of the low-cost PM sensors had the lowest R-squared (R²) value of all configurations, 0.618 ± 0.033, against the Environmental Protection Agency (EPA) approved federal equivalent method (FEM) instrument (BAM-1020, Met One Instruments). After calibration with the FNN model, the PM2.5 mass concentrations of the low-cost PM sensors showed the highest linearity, with an R² value of 0.905 ± 0.013 against the BAM-1020. These results demonstrate that machine-learning methods can be used to calibrate low-cost PM sensors and improve their accuracy.

Keywords: particulate matter, low-cost sensor, calibration, machine learning, artificial neural network

978-1-5386-8240-1/18/$31.00 ©2018 IEEE

I. INTRODUCTION
Particulate matter (PM) air pollution can have a significant impact on human health: long-term exposure to particulate matter can cause asthma and respiratory inflammation, impair lung function, and even promote cancer.[1] Therefore, most countries strictly and continuously monitor ambient airborne particulates to protect human health. Particulate matter is classified as PM10, PM2.5, or PM1 according to aerodynamic diameters of less than 10 µm, 2.5 µm, and 1 µm, respectively.[2] The national ambient air quality standards (NAAQS) for PM are specified in terms of PM mass concentration (e.g., of PM2.5), measured by either a federal reference method (FRM) or a federal equivalent method (FEM).[3] For example, the Beta Attenuation Mass Monitor (BAM-1020, Met One Instruments) and the tapered element oscillating microbalance (TEOM® 1405-F, Thermo Scientific) are U.S. Environmental Protection Agency (EPA) FEM monitoring systems that automatically measure and record PM mass concentration. However, these instruments cost tens of thousands of dollars and require dedicated infrastructure and trained personnel to operate.[4] Within the last ten years, low-cost PM sensors have been increasingly used to measure PM mass concentration. Although low-cost PM sensors cannot replace expensive FRM/FEM instruments, they have created new opportunities for providing PM information with high spatial and temporal resolution and for personal health applications.[5]
However, most low-cost PM sensors have not been thoroughly evaluated with standardized calibration protocols. To improve the accuracy of PM mass concentration measurements by low-cost PM sensors, it is necessary to establish quantitative calibration relationships against an FRM/FEM approved reference instrument. This is usually done in a controlled laboratory environment or by co-locating the sensors alongside a reference instrument in the field.[2,6] However, most low-cost PM sensors have been found to be weakly correlated (R² < 0.3) with a tapered element oscillating microbalance (TEOM)[7], because low-cost optical PM sensors are very sensitive to environmental factors in the field, such as humidity, temperature, wind speed, and wind direction.
In this study, we aim to improve calibration strategies for low-cost PM sensors using three calibration models (Linear Regression, Support Vector Machine (SVM), and Feedforward Neural Network (FNN)), which, to our knowledge, have not previously been applied to low-cost PM sensor calibration. To ensure the robustness of the calibration models, we developed and validated them with four low-cost PM sensors against an FEM approved reference instrument. Furthermore, the study was conducted in the field over a 48-day period (20 March to 6 May 2018) covering a wide range of meteorological conditions.

II. EXPERIMENTS
Experiments were carried out near Tunghai University in Taichung City (24.181838, 120.595857). Our system (Figure 1) and a monitoring station operated by the Environmental Protection Bureau collected data simultaneously. The station is equipped with meteorological sensors (temperature, relative humidity, wind speed, and wind direction) and reference PM instruments (BAM-1020, Met One Instruments) for PM2.5 and PM10. These reference measurements were used for low-cost PM sensor calibration and data validation.
(A) System Architecture
The proposed PM data logger consists of four low-cost PM sensor nodes, an LTE gateway (MOXA G3470A), a wireless access point (MOXA AWK3131), and a UDOO computer system (UDOO QUAD). First, the UDOO system and the four low-cost PM sensor nodes actively connect
to the access point, and the UDOO system then checks the connection status of each low-cost PM sensor node. Once a TCP connection is established, the UDOO system sends the read command (0xA3 0x00) to each PM sensor node and waits for that node's response data. Each PM sensor node sends data once every 5 minutes, an interval controlled by the node itself. The UDOO system forwards the data to the remote server through the LTE gateway.
The low-cost PM sensor node (Figure 1, right) includes a microprocessor, a PM2.5 sensor (PMS7003, Plantower), and a digital relative humidity sensor with temperature output (HTU21D, Measurement Specialties); it is also equipped with wireless modules for data transmission. In the sensor node architecture, the microprocessor is in charge of collecting the sensor data and passing them to the wireless module. The PM2.5 sensor is a low-cost light-scattering sensor with a UART interface, which can detect particles with diameters from 0.3 µm to 10 µm.

Figure 1. Photographs of the custom-designed PM data logger (left) and the low-cost PM sensor node (right).

(B) Prediction Methods
Linear Regression
Linear regression is a simple but important method for prediction. The simplest form is the equation of a straight line:

$$ y = a x + b \tag{1} $$

where $a$ is the slope and $b$ is the y-intercept. To fit a linear equation to data with multiple variables, as in our case (PM mass concentration, temperature, humidity, etc.), the equation becomes:

$$ y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b \tag{2} $$

where $n$ is the number of variables. The LinearRegression model provided in scikit-learn offers an easy way to obtain the coefficients $a_i$ ($i = 1, 2, \dots, n$) and the intercept $b$. Linear regression is computationally cheap and works well when the relationship is close to linear. When the relationship is far from linear, no choice of $a_i$ fits the data well, so the model output can deviate substantially from the target.

Support Vector Machine
Support Vector Machines (SVMs) are a popular method in machine learning. They can be used for classification, regression, and distribution estimation. Suppose we have a large amount of observation data in which the observed PM mass concentrations range from 0 to 200. Because the observed values are always non-negative integers, the estimation model can be built by simply classifying the input data into the classes 0 to 200.
Given a training set of observation data and PM mass concentration pairs $(x_i, y_i)$, $i = 1, 2, \dots, l$, where $x_i \in \mathbb{R}^n$ and $y \in \{1, -1\}^l$, the SVM solves the following optimization problem:[8]

$$ \min_{w,\,b,\,\xi} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \tag{3} $$
$$ \text{subject to} \quad y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 . $$

The function $\phi$ maps the training vector $x_i$ into a higher (possibly infinite) dimensional space, and the SVM finds a linear hyperplane with the maximum separating margin in this higher-dimensional space. $C > 0$ is the penalty parameter of the error term. In our case, we assume the problem can be solved with a linear approach, so we use the linear kernel:

$$ K(x_i, x_j) \equiv \phi(x_i)^{T} \phi(x_j) = x_i^{T} x_j \tag{4} $$

Feedforward Neural Network (FNN)
Given a large amount of observation data, using a neural network to predict subsequent data is a feasible approach. In our case, the neural network model consists of two hidden layers, each containing 512 neurons, as shown in Figure 2.

Figure 2. An illustration of a Feedforward Neural Network with two hidden layers.

Given a training set of observation data and PM mass concentration pairs $(x_i, y_i)$, $i = 1, 2, \dots, l$, the goal is to compute the network parameters so that the predicted output $\hat{y}_i$ is, statistically, as close as possible to the actual output $y_i$. Performance can be measured by the mean squared error:

$$ \mathrm{MSE} = \frac{1}{l} \sum_{i=1}^{l} \left( \hat{y}_i - y_i \right)^2 \tag{5} $$

All layers use Keras 'normal' as the kernel initializer and 'relu' as the activation function, and 'adam' is used as the optimizer. PM2.5 values are rounded to integers after interpolation; all other values use single-precision floating point for computation.
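The polling cycle in the System Architecture subsection (connect, send the read command 0xA3 0x00 to each node, wait for its reply) can be sketched in Python as below. This is a minimal sketch, not the authors' UDOO software: the helper names `poll_node`, `poll_all`, and `fake_node`, the 5 s timeout, and the 4096-byte read size are hypothetical, and because the paper does not give the reply format, the payload is returned unparsed. A local `socket.socketpair()` stands in for a real node so the example runs without hardware.

```python
import socket
import threading

# Read command the UDOO system sends to each node (value taken from the text).
READ_CMD = bytes([0xA3, 0x00])


def poll_node(sock: socket.socket, timeout_s: float = 5.0, max_bytes: int = 4096) -> bytes:
    """Send the read command on a connected TCP socket and return the raw reply.

    Hypothetical helper: the reply format of the real nodes is not specified,
    so the payload is returned as raw bytes.
    """
    sock.settimeout(timeout_s)
    sock.sendall(READ_CMD)
    return sock.recv(max_bytes)


def poll_all(node_socks):
    """Poll every connected node once, one after another, and collect replies."""
    return [poll_node(s) for s in node_socks]


def fake_node(sock: socket.socket, payload: bytes) -> None:
    """Stand-in for a sensor node: answer one read command with `payload`."""
    if sock.recv(len(READ_CMD)) == READ_CMD:
        sock.sendall(payload)


if __name__ == "__main__":
    gateway_end, node_end = socket.socketpair()
    t = threading.Thread(target=fake_node, args=(node_end, b"example-payload"))
    t.start()
    print(poll_all([gateway_end]))
    t.join()
    gateway_end.close()
    node_end.close()
```

Polling the nodes one at a time mirrors the text, where the UDOO system sends the command to each node and waits for its response. Forwarding the replies to the remote server over the LTE gateway is not shown.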
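The node's microprocessor reads the PMS7003 over its UART interface. The sketch below decodes one sensor frame. It assumes the 32-byte layout we understand the Plantower PMS7003 datasheet to specify: start bytes 0x42 0x4D, a length field, thirteen big-endian 16-bit data words, and a 16-bit checksum equal to the sum of all preceding bytes. Check this layout against the datasheet before relying on it. The paper does not say which PM2.5 field (standard-particle or atmospheric) was used, and the function and field names are hypothetical.

```python
import struct

START = b"\x42\x4d"  # assumed frame start characters
FRAME_LEN = 32       # assumed total frame size in bytes
FIELDS = (
    "pm1_0_cf1", "pm2_5_cf1", "pm10_cf1",  # µg/m³, standard particle (CF=1)
    "pm1_0_atm", "pm2_5_atm", "pm10_atm",  # µg/m³, atmospheric environment
    "n_0_3", "n_0_5", "n_1_0", "n_2_5", "n_5_0", "n_10",  # counts per 0.1 L of air
    "reserved",
)


def parse_pms7003_frame(frame: bytes) -> dict:
    """Decode one frame into named 16-bit fields, validating length and checksum."""
    if len(frame) != FRAME_LEN or frame[:2] != START:
        raise ValueError("not a PMS7003 frame")
    # Big-endian: frame length, 13 data words, checksum (30 bytes after START).
    length, *data, checksum = struct.unpack(">H13HH", frame[2:])
    if length != 2 * 13 + 2:
        raise ValueError(f"unexpected frame length {length}")
    # Checksum is the 16-bit sum of every byte that precedes it.
    if checksum != sum(frame[:-2]) & 0xFFFF:
        raise ValueError("checksum mismatch")
    return dict(zip(FIELDS, data))


def build_pms7003_frame(values) -> bytes:
    """Hypothetical test helper: encode 13 data words into a valid frame."""
    body = START + struct.pack(">H13H", 2 * 13 + 2, *values)
    return body + struct.pack(">H", sum(body) & 0xFFFF)
```

Rejecting frames with a bad checksum at the node keeps corrupted UART reads out of the training data; the paper only says invalid samples were eliminated, not how they were detected.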
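Equation (2) can be fitted by ordinary least squares. The sketch below uses NumPy's `np.linalg.lstsq` with an appended column of ones, so the intercept b is estimated together with the slopes a_i. It illustrates the computation, not the authors' code (the paper uses scikit-learn's LinearRegression). The function names and the example input columns are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares for y = a_1 x_1 + ... + a_n x_n + b (Eq. 2).

    X has shape (samples, n); returns (a, b) with a of shape (n,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A column of ones lets lstsq estimate the intercept b with the slopes.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_linear(X, a, b):
    """Apply the fitted linear calibration to new inputs."""
    return np.asarray(X, dtype=float) @ a + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))  # hypothetical columns: raw PM, temperature, humidity
    y = X @ np.array([0.8, -0.3, 0.1]) + 2.0
    print(fit_linear(X, y))
```

With scikit-learn, `LinearRegression().fit(X, y)` gives the same least-squares solution, exposed as `coef_` (the a_i) and `intercept_` (b).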
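The classification view of PM estimation described in the SVM subsection needs integer class labels. The sketch below makes two assumptions the paper does not state: values are rounded to the nearest integer, and out-of-range values are clipped to the stated 0 to 200 range. `pm_to_class` is a hypothetical helper.

```python
import numpy as np

PM_MIN, PM_MAX = 0, 200  # class range described in the text


def pm_to_class(pm):
    """Turn PM mass concentrations (µg/m³) into integer class labels 0..200.

    Rounding (NumPy rounds halves to even) and clipping are assumptions.
    """
    pm = np.asarray(pm, dtype=float)
    return np.clip(np.rint(pm), PM_MIN, PM_MAX).astype(int)


if __name__ == "__main__":
    print(pm_to_class([-1.2, 12.4, 12.6, 250.0]))
```

A classifier can only output labels it has seen during training. A test sample whose true concentration falls into a class absent from the training data therefore cannot be predicted exactly, which is the SVM limitation discussed in Section III.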
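Equation (3) with the linear kernel of Eq. (4) (that is, φ(x) = x) is equivalent to minimizing ½‖w‖² + C Σ_i max(0, 1 − y_i(wᵀx_i + b)), because at the optimum each slack ξ_i equals its hinge term. The sketch below minimizes this primal objective with plain full-batch subgradient descent on a binary problem with y_i ∈ {1, −1}. The solver is a swapped-in illustration, not the LIBSVM-style dual solver described in [8]. The learning rate, epoch count, and function names are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=1e-3, epochs=2000):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w.x_i + b))).

    This is Eq. (3) with the linear kernel of Eq. (4), solved by
    full-batch subgradient descent. Labels y must be +1 or -1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples with xi_i > 0 (inside margin or misclassified)
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_svm(X, w, b):
    """Sign of the decision function w.x + b, as +1 / -1."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```

For the multi-class setting in the text (classes 0 to 200), scikit-learn's `SVC(kernel="linear", C=...)` wraps LIBSVM and combines binary problems of this form with a one-vs-one scheme.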
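The network described in the FNN subsection (two hidden layers of 512 neurons, ReLU activations, random-normal kernel initialization) can be written out as a NumPy forward pass, together with the MSE of Eq. (5). This is a sketch under assumptions, not the authors' Keras model. The standard deviation of 0.05 is assumed to match the default of Keras 2's `RandomNormal`, which the 'normal' alias referred to. Biases start at zero, a single output unit is assumed, ReLU is applied to every layer as the text states, and training with Adam is omitted. All names are hypothetical.

```python
import numpy as np


def init_fnn(n_inputs, hidden=(512, 512), stddev=0.05, seed=0):
    """Random-normal weights and zero biases for n_inputs -> 512 -> 512 -> 1."""
    rng = np.random.default_rng(seed)
    sizes = (n_inputs, *hidden, 1)
    return [
        (rng.normal(0.0, stddev, (m, k)).astype(np.float32),  # single precision
         np.zeros(k, dtype=np.float32))
        for m, k in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, X):
    """Forward pass; returns one prediction per input row."""
    h = np.asarray(X, dtype=np.float32)
    for W, b in params:
        h = np.maximum(h @ W + b, 0.0)  # ReLU on every layer
    return h[:, 0]


def mse(y_hat, y):
    """Mean squared error, Eq. (5)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((y_hat - y) ** 2))


def n_params(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)
```

With five inputs (temperature, humidity, PM, wind speed, wind direction), this architecture has 266,241 trainable parameters, 262,656 of them in the connection between the two hidden layers. With three inputs it has 265,217, so the extra wind inputs add little model size.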
(C) Training Dataset
The raw sensor data cover 2018/03/20 to 2018/05/06, with one sample every 5 minutes. The hourly averaged reference data (ground truth) were obtained from the Environmental Protection Bureau (EPB). To increase the size of the dataset, we interpolated the EPB data to 12 samples per hour using the cubic spline interpolator in SciPy (SciPy.org), that is, one interpolated sample every 5 minutes, matching the sensor sampling interval. In total, we obtained 13,809 training samples after eliminating invalid samples.
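The upsampling of the hourly EPB reference data to 12 samples per hour can be sketched with `scipy.interpolate.CubicSpline`. The paper names SciPy's cubic spline interpolator but not the specific class or boundary condition. Using `CubicSpline` with its default 'not-a-knot' condition is therefore an assumption, as is the hypothetical helper `upsample_hourly`. Time is given in hours since an arbitrary start.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def upsample_hourly(hours, values, samples_per_hour=12, round_to_int=True):
    """Cubic-spline interpolation of hourly reference data onto a finer grid.

    With samples_per_hour=12 the output has one sample every 5 minutes.
    `hours` must be strictly increasing.
    """
    hours = np.asarray(hours, dtype=float)
    spline = CubicSpline(hours, np.asarray(values, dtype=float))
    n = int(round((hours[-1] - hours[0]) * samples_per_hour)) + 1
    t = hours[0] + np.arange(n) / samples_per_hour  # integer steps avoid float drift
    v = spline(t)
    if round_to_int:  # the text rounds PM2.5 to integers after interpolation
        v = np.rint(v)
    return t, v


if __name__ == "__main__":
    t, v = upsample_hourly([0, 1, 2, 3], [10, 12, 11, 15])
    print(len(t), v[:13])
```

A cubic spline passes through every hourly value but can overshoot between them, so interpolated values are not guaranteed to stay within the range of the hourly data.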
III. RESULTS AND DISCUSSION
Part of the raw data (PM mass concentration, humidity, and temperature) from a sensor node is shown in Figure 3; this data series spans 20 March to 6 May 2018. Daily temperature and relative humidity cycles are clearly visible in Figure 3. Because optical PM sensors are known to be sensitive to temperature and relative humidity, these readings need to be calibrated, as described below. Figure 4 presents, graphically and statistically, the pairwise correlations between the PM2.5 mass concentrations of all the low-cost sensors. The PMS7003 sensors were well correlated with each other (R² ranging from 0.86 to 0.97).

Figure 3. Time series plots of (top) PM concentration, (middle) humidity, and (bottom) temperature with a five-minute resolution in the field (2018/3/20 to 2018/5/6).

Figure 4. Pairwise correlation between PM2.5 mass concentrations (µg/m³) of the uncalibrated low-cost PM sensors. Upper-right section: fitting equation and R² of linear regression models using the ordinary least squares (OLS) method. Lower-left section: linear regression lines superimposed on pairwise plots.

The dataset is split into two parts: 12,000 samples for training the model and the remainder for verification. The training data are shuffled before being fed to the model so that the neural network cannot learn from the particular order of the data. In the first run, we used only temperature, humidity, and PM mass concentration as input parameters. In the second run, we added two more input parameters (wind speed and wind direction). Scatter plots of the original (uncalibrated) data and of the data calibrated via Linear Regression, SVM, and FNN are shown in Figure 5.
The linear regression model only solves a multivariate linear equation to fit the data, so its lower calibration accuracy on strongly nonlinear, noisy input (such as the PM10 data used here) is expected, although it is the fastest of the three methods. SVMs do not directly provide probability estimates, and as used here the SVM classifier can only assign inputs to classes already known to the model. In our case, the SVM classifies the testing dataset into classes seen in the training dataset; if the testing dataset contains extreme or entirely different data, the SVM model still assigns them to the known classes, which lowers its accuracy. The results show that the proposed two-hidden-layer feedforward neural network implemented in Keras is quite effective for data calibration, especially when more and better-correlated input data are available. We chose 512 neurons per hidden layer as a trade-off between performance and computation: in our experiments, FNN models with more neurons generally produced better results but required more training time.
Table 1 uses the R-squared (R²) of the linear regression model as the indicator of model fit. The feedforward neural network model that included relative humidity (RH %), temperature (°C), PM mass concentration (µg/m³), wind speed, and wind direction best predicted BAM-1020 PM mass concentrations from the uncalibrated raw PM mass concentrations.
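The training/verification split described in Section III can be sketched as follows. The paper does not say whether the 12,000 training samples were chosen chronologically or at random. This minimal sketch assumes a chronological split followed by shuffling of the training portion only, which matches the wording most literally. `split_then_shuffle` is a hypothetical name.

```python
import numpy as np


def split_then_shuffle(X, y, n_train=12_000, seed=0):
    """First n_train samples for training (shuffled), the rest for verification."""
    X = np.asarray(X)
    y = np.asarray(y)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_va, y_va = X[n_train:], y[n_train:]
    # One shared permutation keeps each input row paired with its reference value.
    perm = np.random.default_rng(seed).permutation(len(X_tr))
    return X_tr[perm], y_tr[perm], X_va, y_va
```

With the 13,809 samples reported in Section II, this leaves 1,809 samples for verification. Keras's `Model.fit` also reshuffles the training data every epoch by default (`shuffle=True`).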
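Figure 4 and Table 1 both report the R² of an ordinary-least-squares line. For a simple linear fit with an intercept, this R² equals the squared Pearson correlation, so it does not depend on which series is treated as the predictor. The NumPy sketch below uses hypothetical function names. Applied to calibrated sensor output versus BAM-1020 readings, it yields the kind of value listed in Table 1. Applied to pairs of sensors, it yields a matrix like the one summarized in Figure 4.

```python
import numpy as np


def r2_linear_fit(x, y):
    """R² of the OLS line y ≈ a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)  # slope, intercept
    ss_res = np.sum((y - (a * x + b)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pairwise_r2(series):
    """Symmetric matrix of pairwise R² values for equally long sensor series."""
    k = len(series)
    out = np.ones((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            out[i, j] = out[j, i] = r2_linear_fit(series[i], series[j])
    return out
```

Because R² here is symmetric, only the upper triangle needs computing, which is also why Figure 4 can show fit statistics in one half of the plot matrix and scatter plots in the other.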

Table 1. Comparison of calibration models using the R² of the linear regression model

Input data                      Model               R² (PM10)       R² (PM2.5)
PM                              w/o calibration     0.234 ± 0.032   0.618 ± 0.033
Temperature, Humidity, PM       Linear Regression   0.373 ± 0.037   0.701 ± 0.039
                                SVM                 0.486 ± 0.051   0.762 ± 0.043
                                FNN                 0.705 ± 0.034   0.839 ± 0.036
Temperature, Humidity, PM,      Linear Regression   0.467 ± 0.020   0.728 ± 0.030
Wind Speed, Wind Direction      SVM                 0.756 ± 0.032   0.857 ± 0.019
                                FNN                 0.882 ± 0.029   0.905 ± 0.012

Figure 5. Scatter plots of (a) the original PM sensor data and of the PM sensor data calibrated via (b) Linear Regression, (c) SVM, and (d) FNN, against the BAM-1020.

IV. CONCLUSIONS
This study demonstrated that a feedforward neural network model with two hidden layers can be used to calibrate low-cost PM sensors. The accuracy of low-cost PM sensors can be further improved with additional on-field calibration and validation. Therefore, future work will collect more than one year of field data and prepare datasets obtained from a range of environments. We will also study how other data types (such as SOx, NOx, and O3) affect the accuracy of low-cost PM sensors calibrated by machine-learning methods.

ACKNOWLEDGMENT
The authors would like to thank Prof. Ta-Chih Hsiao of the Graduate Institute of Environmental Engineering, National Central University, for providing the raw data from the Environmental Protection Bureau stations used in this study.

REFERENCES
[1] Y. F. Xing, Y. H. Xu, M. H. Shi, and Y. X. Lian, “The impact of PM2.5 on the human respiratory system,” J. Thorac. Dis., vol. 8, no. 1, pp. E69–E74, Jan. 2016.
[2] Y. Wang, J. Li, H. Jing, Q. Zhang, J. K. Jiang, and P. Biswas, “Laboratory evaluation and calibration of three low-cost particle sensors for particulate matter measurement,” Aerosol Science and Technology, vol. 49, pp. 1063–1077, 2015.
[3] C. A. Noble, R. W. Vanderpool, T. M. Peters, F. F. McElroy, D. B. Gemmill, and R. W. Wiener, “Federal reference and equivalent methods for measuring fine particulate matter,” Aerosol Science and Technology, vol. 34, no. 5, pp. 457–464, 2001.
[4] A. L. Clements, W. G. Griswold, A. RS, J. E. Johnston, M. M. Herting, J. Thorson, A. Collier-Oxandale, and M. Hannigan, “Low-cost air quality monitoring tools: From research to practice (a workshop summary),” Sensors, vol. 17, p. 2478, 2017.
[5] C. Lin, N. Masey, H. Wu, M. Jackson, D. J. Carruthers, S. Reis, R. M. Doherty, I. J. Beverland, and M. R. Heal, “Practical field calibration of portable monitors for mobile measurements of multiple air pollutants,” Atmosphere, vol. 8, p. 231, 2017.
[6] Y.-Q. Lin and Y.-C. Lin, “The improvement of spatial-temporal PM2.5 resolution in Taiwan by using data assimilation method,” in 19th EGU General Assembly (EGU2017), Vienna, Austria, 23–28 Apr. 2017, p. 3425.
[7] K. K. Johnson, M. H. Bergin, A. G. Russell, and G. S. W. Hagler, “Field test of several low-cost particulate matter sensors in high and low concentration urban environments,” Aerosol and Air Quality Research, vol. 18, pp. 565–578, 2018.
[8] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification.”
[9] I. W. Sandberg, J. T. Lo, C. L. Fancourt, J. C. Principe, S. Katagiri, and S. Haykin, Nonlinear Dynamical Systems: Feedforward Neural Network Perspectives, 2001, pp. 1–16.
