
Introducing Fuzzy Layers for Deep Learning

Stanton R. Price
U.S. Army Engineer Research and Development Center
Vicksburg, MS, USA
[email protected]

Steven R. Price
Department of Electrical Engineering
Mississippi College
Clinton, MS, USA
[email protected]

Derek T. Anderson
Department of Electrical Engineering and Computer Science
University of Missouri
Columbia, MO, USA
[email protected]

Abstract—Many state-of-the-art technologies developed in recent years have been influenced by machine learning to some extent. Most popular at the time of this writing are artificial intelligence methodologies that fall under the umbrella of deep learning. Deep learning has been shown across many applications to be extremely powerful and capable of handling problems that possess great complexity and difficulty. In this work, we introduce a new layer to deep learning: the fuzzy layer. Traditionally, the network architecture of neural networks is composed of an input layer, some combination of hidden layers, and an output layer. We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies, such as the Choquet and Sugeno fuzzy integrals. To date, fuzzy approaches taken to deep learning have been through the application of various fusion strategies at the decision level to aggregate outputs from state-of-the-art pre-trained models, e.g., AlexNet, VGG16, GoogLeNet, Inception-v3, ResNet-18, etc. While these strategies have been shown to improve accuracy performance for image classification tasks, none have explored the use of fuzzified intermediate, or hidden, layers. Herein, we present a new deep learning strategy that incorporates fuzzy strategies into the deep learning architecture, focused on the application of semantic segmentation using per-pixel classification. Experiments are conducted on a benchmark data set as well as a data set collected via an unmanned aerial system at a U.S. Army test site for the task of automatic road segmentation, and preliminary results are promising.

Index Terms—fuzzy layers, deep learning, fuzzy neural nets, semantic segmentation, fuzzy measure, fuzzy integrals

This work was partially supported under the Maneuver in Complex Environments R&D program to support the U.S. Army ERDC. This effort is also based on work supported by the Defense Threat Reduction Agency/Joint Improvised-Threat Defeat Organization (DTRA/JIDO). Any use of trade names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Permission to publish was granted by Director, Geotechnical and Structures Laboratory, U.S. Army ERDC. Approved for public release; distribution is unlimited.

I. INTRODUCTION

Artificial intelligence (AI) has emerged over the past decade as one of the most promising technologies for advancing mankind in a multitude of ways, from medicine discovery and disease diagnostics to autonomous vehicles, semantic segmentation, and personal assistants. For many years there has been a desire to create computer algorithms/machines that are able to replace or assist humans in signal understanding for tasks such as automatic buried explosive hazard detection, vehicle navigation, object recognition, and object tracking. AI has grown to loosely encompass a number of state-of-the-art technologies across many fields, e.g., pattern recognition, machine learning (ML), neural networks (NNs), computational intelligence, evolutionary computation, and so on. Recently, much buzz has surrounded deep learning (DL) for its ability to provide desirable results for a number of different applications. AI has been shown to have great potential for finding optimized solutions to various problems across multiple domains. This technology has been heavily researched for computer vision applications [1], [2], speech translations [3], [4], and optimization tasks [5], [6]. Herein, the focus is on an extremely relevant and popular branch of AI: deep learning. DL has achieved much success in recent years on computer vision applications and has benefited from the surge of attention being given to advancing its theories to be extremely generalizable for many problems. Part of DL's recent resurgence is that its network architecture is well suited for processing on powerful, highly parallelized GPUs. This has allowed extremely complex and deep network architectures to be developed that were infeasible to implement on older compute technologies.

Fusion is a powerful technique used to combine information from different sources. These sources could be different extracted features, decisions, sensor outputs, etc., as well as different combinations thereof. Fusion methodologies often strive to improve system performance by combining information in a beneficial way that gives the system more discriminatory power in some form. This could be through the realization of more robust features that generalize well from one domain to another. For the task of classification, fusion could be used to combine multiple decision makers, e.g., classifiers, to improve overall accuracy performance. Fusion is a very rich and powerful technique that, when implemented appropriately, can lead to major algorithm improvements. Fusion is commonly associated with fuzzy techniques, as fuzzy logic [7], [8] naturally lends itself to gracefully considering data with varying degrees of belief. Ensemble approaches [9] are also commonly used for fusion tasks. Generally, most fusion strategies attempt to properly assign weights that encode the significance, or importance, of the different information sources, and these weights are the driving mechanism behind the fused result. Historically, weights used when fusing multiple information sources are either human-derived or found via an optimization function/strategy such as group lasso [10], [11]. However, there has been little-to-no research done on utilizing DL for optimizing fusion performance. Herein, we propose a new strategy to explore the potential benefits of combining DL with fuzzy-based fusion techniques. Specifically, we introduce fuzzy layers into the DL architecture.



The remainder of this work is organized as follows. In Section II, related works are presented that explore using fusion strategies to improve the classification performance of the outputs from different state-of-the-art DL models (fusion strategies and DL have been compartmentalized in their utilization). Fuzzy layers are introduced in Section III, along with the intuition behind this new strategy and their integration into the DL architecture. Experiments and results are detailed in Section IV, and in Section V, we conclude the paper.

II. RELATED WORK

As noted previously, DL is becoming the standard approach for classification tasks. However, the performances exhibited by DL classifiers are often the result of exhaustive evaluation of hyperparameters and various network architectures for the particular data set used in the work. Fusion techniques can help alleviate this comprehensive evaluation by combining the classification outputs from multiple DL classifiers and thus taking advantage of different DL classifiers' strengths. That is, if the strengths of multiple classifiers can be appropriately fused, then finding the optimal solution may not require finding the ideal parameters and architecture for a particular data set.

Recently, fusion strategies have been employed that aggregate pre-trained models for improved classification performance. The appropriate fusion strategy largely depends on the format of the classifier outputs. Traditionally, a classifier output consists of a definitive label (i.e., a hard decision), and typically, majority voting is the fusion strategy implemented. However, if the classifier can generate soft membership measures (e.g., fuzzy measures), the fusion strategy implemented can vary greatly [12]–[15].

Fusion strategies, notably those associated with fuzzy measures (FMs), are conventionally applied at the decision level to aggregate outputs and improve performance. For example, DL classifier outputs were fused in [16]–[18] to improve classification performance in remote sensing applications by deriving FMs for each class and then fusing the measures with the classifiers' outputs through either the Sugeno or Choquet integral (ChI) [19]. Still, fusion strategies occurring at the input level can also benefit classification performance.

Rather than attempt to perform fusion at either the output or feature level, it is the aim of this work to incorporate fusion techniques (utilizing FMs) within the architecture of a DL classification system. While efforts have developed techniques applying fuzzy sets and fuzzy inference systems in NNs, application of fuzzy strategies concerning DL architectures is limited [20]. One recent approach to implementing fuzzy sets in DL evaluated the use of Sugeno fuzzy inference systems as the node in the hidden layer of an NN, and the concept could therefore be extended to DL architectures [21].

III. METHODOLOGY

In this section, we introduce our proposed fuzzy layer to incorporate fuzzy-based fusion approaches directly into the DL architecture. First, we briefly discuss the problem being considered herein: semantic image segmentation using DL. Semantic segmentation is the process of assigning class labels (e.g., car, tank, road, building, etc.) to each pixel in an image. DL for semantic segmentation is most commonly implemented in what can be separated into two parts: (1) a standard CNN network with the exception of an output layer and (2) an up-sampling CNN network on the back half that produces a per-pixel class label output. Zeiler et al. introduced deconvolution networks in [22] for the task of visualizing learned CNNs to help bridge the gap in understanding what a CNN has learned. Therein, Zeiler et al. defined a deconvolution network that attempts to reconstruct a feature map identifying what a CNN has learned through unpooling, rectification, and filtering. In [23], Noh et al. modified this approach for semantic segmentation by using DL to learn the up-sampling, or deconvolution, filters rather than inverting (via transposing) the corresponding filters learned in the convolution network. Herein, we implement a similar approach to Noh et al., utilizing DL strategies to learn the up-sampling filters rather than performing true deconvolution to reconstruct the feature maps at each layer. Additionally, this work is focused strictly on road segmentation, i.e., each pixel is labeled as either road or non-road. We represent the architecture of our learned model as f(x, γ), where γ denotes the parameters learned by the network such that the error for an example x_i given its ground truth y_i is minimized; training can be described as

$$\hat{\gamma} = \arg\min_{\gamma} \sum_{i=1}^{N} L\big(f(x_i, \gamma),\, y_i\big), \qquad (1)$$

where N is the number of samples in the training data set and L is the softmax (cross-entropy) loss.
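To make Eq. (1) concrete, below is a minimal NumPy sketch of the per-pixel softmax (cross-entropy) loss; the function and variable names are ours for illustration and are not from the authors' implementation.

    import numpy as np

    def softmax_cross_entropy(logits, labels):
        # Per-pixel softmax (cross-entropy) loss, as in Eq. (1).
        # logits: (H, W, C) raw network outputs for one image x_i.
        # labels: (H, W) integer class labels (ground truth y_i).
        # Numerically stable log-softmax over the class axis.
        z = logits - logits.max(axis=-1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        h, w = labels.shape
        # Negative log-likelihood of the true class at each pixel.
        nll = -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
        return nll.sum()

    # Example: two-class (road / non-road) logits for a 4x4 "image".
    rng = np.random.default_rng(0)
    loss = softmax_cross_entropy(rng.normal(size=(4, 4, 2)),
                                 rng.integers(0, 2, size=(4, 4)))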
As this paper is focused on the introduction of a new fuzzy layer that can be injected directly into the DL architecture, a single fixed network architecture is not presented. Rather, we explore different use cases of the fuzzy layers at different points throughout the network architecture. For comparison, the fuzzy layers are utilized either in the down-sampling (convolution network), the up-sampling ("deconvolution" network), or both sections of the semantic segmentation network. To maintain consistency in our exploration, a template network architecture was used such that the only change in the network architecture was the inclusion or removal of one or more fuzzy layers. The details of the architecture template used are given in Table I, with '*' denoting the points at which a fuzzy layer might be included in the architecture. Note: it is not required that a fuzzy layer be incorporated after a rectified linear unit (ReLU); this occurs in the results presented herein to maintain more consistency across experiments in this exploratory work. It would have been equally valid to implement a fuzzy layer after any convolution or pooling layer (referencing layers utilized in this architecture).
The best way(s) to implement fuzzy layers within the DL architecture is an open question and one that requires additional research. Technically, a fuzzy layer can be implemented anywhere in the DL architecture as long as it follows the input layer and precedes the output layer.

TABLE I
Template architecture in detail. The '*' marks locations in the architecture where a fuzzy layer might be included herein (this is not a restriction). Nf and Ncl represent the number of fused outputs at that layer and the number of classes, respectively.

Name         Kernel Size   Stride   Output Size
input data   -             -        512 × 512 × 3
conv1_1      5×5           1        512 × 512 × 64
conv1_2      5×5           1        512 × 512 × 64
relu1        -             -        512 × 512 × 64
*            -             -        512 × 512 × Nf
pool1        2×2           2        256 × 256 × 64
conv2_1      5×5           1        256 × 256 × 64
relu2        -             -        256 × 256 × 64
*            -             -        256 × 256 × Nf
pool2        2×2           2        128 × 128 × 64
conv3_1      5×5           1        128 × 128 × 64
relu3        -             -        128 × 128 × 64
*            -             -        128 × 128 × Nf
up-conv1     6×6           2        256 × 256 × 30
relu4        -             -        256 × 256 × 30
*            -             -        256 × 256 × Nf
up-conv2     6×6           2        512 × 512 × 30
relu5        -             -        512 × 512 × 30
*            -             -        512 × 512 × Nf
output       -             -        512 × 512 × Ncl
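As a sanity check on the sizes in Table I, the short sketch below propagates the spatial dimension through the template. The padding values are our assumption (the paper does not state them), chosen so the sizes agree with the table; ReLU and fuzzy layers leave the spatial size unchanged.

    def conv_out(n, k, s, p):
        # Output size of a standard convolution/pooling layer.
        return (n + 2 * p - k) // s + 1

    def upconv_out(n, k, s, p):
        # Output size of a transposed ("up") convolution layer.
        return (n - 1) * s - 2 * p + k

    n = 512                            # input data: 512 x 512 x 3
    n = conv_out(n, k=5, s=1, p=2)     # conv1_1: 512 (padding 2 keeps size)
    n = conv_out(n, k=5, s=1, p=2)     # conv1_2: 512
    n = conv_out(n, k=2, s=2, p=0)     # pool1:   256
    n = conv_out(n, k=5, s=1, p=2)     # conv2_1: 256
    n = conv_out(n, k=2, s=2, p=0)     # pool2:   128
    n = conv_out(n, k=5, s=1, p=2)     # conv3_1: 128
    n = upconv_out(n, k=6, s=2, p=2)   # up-conv1: 256
    n = upconv_out(n, k=6, s=2, p=2)   # up-conv2: 512
    assert n == 512                    # matches the output row of Table I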
A. Fuzzy Layer

Theoretically, the fuzzy layer can encompass any fuzzy aggregation strategy desired. Herein, we focus on the Choquet integral as the means for fusion. Let X = {x_1, ..., x_N} be N sources, e.g., sensor, human, or algorithm. In general, an aggregation function is a mapping of data from our N sources, denoted by h(x_i) ∈ R, to data, f(h(x_1), ..., h(x_N), Θ) ∈ R, where Θ are the parameters of f. The ChI is a nonlinear aggregation function parameterized by the FM. FMs are often used to encode the (possibly subjective) worth of different subsets of information sources. Thus, the ChI parameterized by the FM provides a way to combine the information encoded in the FM with the (objective) evidence or support of some hypothesis, e.g., sensor data, algorithm outputs, expert opinions, etc. The FM and ChI are defined as follows.

Definition 1. (Fuzzy Measure) For a finite set of N information sources, X, the FM is a set-valued function g : 2^X → [0, 1] with the following conditions:
1) (Boundary Conditions) g(∅) = 0 and g(X) = 1;
2) (Monotonicity) If A, B ⊆ X with A ⊆ B, then g(A) ≤ g(B).
Note, if X is an infinite set, there is a third condition guaranteeing continuity.
Definition 2. (Choquet Integral) For a finite set of N information sources, X, FM g, and partial support function h : X → [0, 1], the ChI is

$$\int h \circ g = \sum_{i=1}^{N} w_i\, h(x_{\pi(i)}), \qquad (2)$$

where $w_i = G_{\pi(i)} - G_{\pi(i-1)}$, $G_{\pi(i)} = g(\{x_{\pi(1)}, \ldots, x_{\pi(i)}\})$, $G_{\pi(0)} = 0$, $h(x_i)$ is the strength in the hypothesis from source $x_i$, and $\pi(i)$ is a sorting on X such that $h(x_{\pi(1)}) \geq \ldots \geq h(x_{\pi(N)})$.
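Eq. (2) transcribes almost line-for-line into NumPy. The sketch below is an illustrative implementation of the ChI for an FM stored as a dict over subsets of source indices (continuing the example above); it is not the authors' code.

    import numpy as np

    def choquet_integral(h, g):
        # h: length-N array of partial support values h(x_i).
        # g: dict mapping frozensets of source indices to [0, 1].
        pi = np.argsort(h)[::-1]        # sort so h(x_pi(1)) >= ... >= h(x_pi(N))
        G_prev, result = 0.0, 0.0
        for i in range(len(h)):
            A = frozenset(pi[: i + 1])  # {x_pi(1), ..., x_pi(i)}
            w = g[A] - G_prev           # w_i = G_pi(i) - G_pi(i-1)
            result += w * h[pi[i]]
            G_prev = g[A]
        return result

    g = {frozenset(): 0.0, frozenset({0}): 0.4,
         frozenset({1}): 0.7, frozenset({0, 1}): 1.0}
    h = np.array([0.9, 0.3])
    # w_1 = g({0}) = 0.4 and w_2 = 1 - 0.4 = 0.6, so ChI = 0.4*0.9 + 0.6*0.3.
    assert np.isclose(choquet_integral(h, g), 0.54)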
The FM can be obtained in a number of ways: human defined, quadratic program, learning algorithm, S-decomposable measure (e.g., Sugeno λ-fuzzy measure), etc. Herein, we define the FM via five known ordered weighted average (OWA) operators and one random (but valid) OWA operator. Specifically, the more well-known operators used are max, min, average, soft max, and soft min. The top five sources (i.e., convolution/deconvolution filter outputs) were sorted based on their entropy value and fused via the ChI. Therefore, the fuzzy layer accepts the output from the previous layer as its input, sorts the images (sources) by some metric (entropy herein), and performs the ChI for each of the defined FMs, resulting in six fused outputs (we have six different FMs) that are passed on to the next layer in the network. An example of a potential fuzzy layer implementing the ChI as its aggregation method is shown in Figure 1, and a code sketch follows.

Fig. 1. Illustration of the fuzzy layer. In this example, the layer feeding into the fuzzy layer is a convolution layer. The feature maps are passed as inputs to the fuzzy layer where they are then sorted, as required for the ChI, based on some metric (entropy used herein). The ChI is computed for six different FMs, producing six ChI fused resultant images. These six images are then passed on to the next layer in the network.
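The following is a simplified sketch of this forward pass, under the assumption that each FM is induced by an OWA operator, in which case the ChI reduces to an ordered weighted average of the per-pixel sorted values. The soft max/soft min weights and the histogram entropy estimator are our illustrative choices; the paper does not specify them.

    import numpy as np

    def entropy(fmap, bins=64):
        # Shannon entropy of a feature map's value histogram.
        counts, _ = np.histogram(fmap, bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def fuzzy_layer(feature_maps, k=5):
        # feature_maps: (N, H, W). Returns (6, H, W), one fused map per FM.
        # Keep the top-k maps by entropy (k = 5 herein).
        order = np.argsort([entropy(f) for f in feature_maps])[::-1][:k]
        x = np.sort(feature_maps[order], axis=0)[::-1]  # descending per pixel
        owa = np.stack([
            np.eye(k)[0],                               # max
            np.eye(k)[k - 1],                           # min
            np.full(k, 1.0 / k),                        # average
            np.array([0.6, 0.2, 0.1, 0.06, 0.04]),      # soft max (assumed)
            np.array([0.04, 0.06, 0.1, 0.2, 0.6]),      # soft min (assumed)
            np.random.default_rng(0).dirichlet(np.ones(k)),  # random valid OWA
        ])
        return np.einsum('fk,khw->fhw', owa, x)         # ChI per FM and pixel

    fused = fuzzy_layer(np.random.default_rng(1).normal(size=(64, 32, 32)))
    assert fused.shape == (6, 32, 32)  # e.g., 64 feature maps fused to 6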
B. Why Have Fuzzy Layers?

As the architecture of DL continues to grow more complex, there is a need to help alleviate the ill-conditioning that is prone to occur during learning due to the weights approaching zero. Additionally, a well-known problem when training deep networks is the internal covariate shift problem [24]. This makes the network difficult to optimize because the input distribution of each layer changes over iterations during training, with the changes in distribution being amplified through propagation across layers. While there are other approaches that seek to help with this (e.g., batch normalization [24]), fusion poses itself as a viable aid for this problem. One example of the potential benefit is that fusion can take tens or hundreds of inputs (outputs of previous layers) and condense that information into a fraction of the images. For example, if the output of a convolution, ReLU, or pooling layer had 256 feature maps, a fuzzy layer could be utilized to fuse these 256 feature maps down to some arbitrary reduced number of feature maps, e.g., 30, that capture relevant information in unique ways from all, or some subset, of the 256 feature maps (dependent on the FMs used as well as the metric used for sorting). Thus, this alone has two potential major benefits: (1) reduced model complexity and (2) improved utilization of the information learned at the previous layer in the network.
IV. EXPERIMENTS & RESULTS

This section first describes the dataset and implementation details. Next, we present and analyze the results for various network configurations as we investigate the implementation of fuzzy layers.
A. Dataset
The dataset was collected from a UAS with a mounted MAPIR Survey2 RGB camera. Specifically, the sensor used is a Sony Exmor IMX206 16MP RGB sensor, which produces a 24-bit 4,608×3,456 pixel RGB image. The UAS was flown at an altitude of approximately 60 meters above the ground. The dataset was captured by flying the UAS in a grid-like pattern over an area of interest at a U.S. Army test site. The dataset used in this work comes from a single flight over this area, which contains 252 images, 20 of which were selected as training data. The imagery was scaled to 512×512 pixels using bilinear interpolation to make it more digestible by the DL algorithms implemented on the computer system used herein. As is common when training DL algorithms (and ML algorithms in general), data augmentation strategies are employed, in part to increase the amount of training data available during learning, but also to lead to more robust solutions. Herein, each image from the training data set is rotated 90°, 180°, and 270° to provide a total of 80 images used for training (example shown in Figure 2; a sketch of the augmentation follows). Finally, the image road mask for each of the 252 instances was annotated by hand (see Figure 3).

Fig. 2. Example of data set augmentation used (image rotation). Starting at the top left and going clockwise: 0°, 90°, 180°, 270°.

Fig. 3. Road masks shown (right column) for two sample images.
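A minimal sketch of this augmentation (our code, not the authors'):

    import numpy as np

    def augment(image, mask):
        # Rotate each training image and its road mask by 0, 90, 180,
        # and 270 degrees, yielding four samples per original image.
        return [(np.rot90(image, k, axes=(0, 1)), np.rot90(mask, k))
                for k in range(4)]

    image = np.zeros((512, 512, 3), dtype=np.uint8)
    mask = np.zeros((512, 512), dtype=np.uint8)
    assert len(augment(image, mask)) == 4  # 20 training images -> 80 total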
B. Implementation Details

We based the template network architecture shown in Table I on the VGG-16 framework [25]. There are modifications to the number of convolution layers and filters used throughout the network; however, the VGG-16 framework served as the motivation behind the defined template architecture.
Initially, we implemented standard stochastic gradient descent with momentum for optimization but achieved poor results. The Adam algorithm [26] provided the best optimization performance on this dataset and network design and was used for all experiments reported, with the initial learning rate, gradient decay rate, and squared gradient decay weight set to 0.1, 0.9, and 0.999, respectively. Dropout [27] is used after pooling with a dropout rate of 50%.
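For reference, a single Adam [26] update with the settings stated above; this is a generic sketch of the published algorithm, not the authors' training code.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        # One Adam update with the hyperparameters used herein: initial
        # learning rate 0.1, gradient decay 0.9, squared gradient decay 0.999.
        m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias corrections (t >= 1)
        v_hat = v / (1 - beta2 ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v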
C. Evaluation

To measure network classification accuracy, we utilize an evaluation protocol based on Intersection over Union (IoU) between the ground-truth and predicted segmentations. We report the mean and standard deviation of the IoU scores over all test images for each approach investigated; a sketch of the metric follows.
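A minimal sketch of the IoU score for the binary road masks used herein (our code for illustration); the reported numbers are the mean and standard deviation of this score over the test images.

    import numpy as np

    def iou(pred, truth):
        # Intersection over Union between binary prediction and ground truth.
        pred, truth = pred.astype(bool), truth.astype(bool)
        union = np.logical_or(pred, truth).sum()
        if union == 0:
            return 1.0  # both masks empty
        return np.logical_and(pred, truth).sum() / union

    # scores = [iou(p, t) for p, t in zip(predictions, ground_truths)]
    # report: np.mean(scores), np.std(scores)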
For clarity, we denote the different experiments (i.e., different architecture configurations) as follows:
• baseline – no fuzzy layers;
• conv-FLs – fuzzy layers are implemented after 'relu1', 'relu2', and 'relu3' in the convolution network (down-sampling half);
• deconv-FLs – fuzzy layers are implemented after 'relu4' and 'relu5' in the "deconvolution" network (up-sampling half);
• conv-FLs+deconv-FLs – fuzzy layers are implemented after every ReLU.
The quantitative results for these different architectures are presented in Table II.

TABLE II
Evaluation results on the test dataset for road segmentation.

Method                 Mean     Std. Dev.
baseline               78.22%   12.3%
conv-FLs               62.43%   21.5%
deconv-FLs             80.79%   14.8%
conv-FLs+deconv-FLs    68.76%   20.7%

From these preliminary results, we see that the inclusion of fuzzy layers shows promise for improving DL performance (in terms of accuracy). In particular, these results indicate that fuzzy layers are better utilized in the deconvolution phase of the architecture. Example feature maps randomly selected from one instance at each layer (the ReLU output is omitted in the deconvolution network for compactness) are shown in Figure 4. Looking specifically at the feature maps denoted as 'fuzzyLayer1' and 'fuzzyLayer2', we see evidence of the fuzzy layers' aggregation strategy accumulating evidence of road information. We note that 'deconv-FLs' performs only approximately 2% better than the baseline method, while having a slightly higher standard deviation. Nevertheless, this helps show the fuzzy layers' potential for improving classification performance. It is our conjecture that, for this problem, applying the fuzzy layers during the convolution stage (results shown as 'conv-FLs') results in the loss of too much information from prior layers (after each ReLU, we summarize 64 filters down to six; this is likely too extreme for such an early stage of learning). Hence, we see a noticeable drop in performance for both experiments that include fuzzy layers during the convolution stage ('conv-FLs' and 'conv-FLs+deconv-FLs'). However, there are a number of factors that could lead to improved performance during the convolution phase, e.g., an increased number of FMs, a different metric for sorting, or a different fuzzy aggregation method. It should also be noted that the inclusion of the fuzzy layers had minimal impact on training time (total training time increased by seconds to a few minutes at most).

Fig. 4. Example feature maps and final segmentation for a randomly selected image. The feature maps shown were randomly chosen at each layer.

V. CONCLUSION

We proposed a new layer to be used for DL: the fuzzy layer. The proposed fuzzy layer is extremely flexible, capable of implementing any fuzzy aggregation method desired, as well as capable of being included anywhere in the network architecture, depending on the desired behavior of the fuzzy layer. This work was focused on the introduction and early exploration of the fuzzy layer, and additional research is needed to further advance the fuzzy layer for DL.
For example, future work should consider investigating the metric used for sorting the information sources and its effect on accuracy performance. Future work is also planned to investigate how the FM should be defined for aggregating via fuzzy integrals. Additionally, where are fuzzy layers best utilized in the network architecture (problem dependent; however, can general guidance be developed)? These are but a few questions that need to be addressed for the fuzzy layer and its implementation.
REFERENCES

[1] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[3] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[4] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[5] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[6] J. Kennedy, "Particle swarm optimization," in Encyclopedia of Machine Learning. Springer, 2011, pp. 760–766.
[7] L. A. Zadeh, "A fuzzy-algorithmic approach to the definition of complex or imprecise concepts," in Systems Theory in the Social Sciences. Springer, 1976, pp. 202–282.
[8] T. J. Ross, Fuzzy Logic with Engineering Applications. John Wiley & Sons, 2005.
[9] C. Zhang and Y. Ma, Ensemble Machine Learning: Methods and Applications. Springer, 2012.
[10] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
[11] L. Meier, S. Van De Geer, and P. Bühlmann, "The group lasso for logistic regression," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 70, no. 1, pp. 53–71, 2008.
[12] D. Ruta and B. Gabrys, "An overview of classifier fusion methods," Computing and Information Systems, vol. 7, no. 1, pp. 1–10, 2000.
[13] Z. Liu, Q. Pan, J. Dezert, J. Han, and Y. He, "Classifier fusion with contextual reliability evaluation," IEEE Transactions on Cybernetics, vol. 48, no. 5, pp. 1605–1618, May 2018.
[14] L. I. Kuncheva, J. C. Bezdek, and R. P. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[15] N. J. Pizzi and W. Pedrycz, "Aggregating multiple classification results using fuzzy integration and stochastic feature selection," International Journal of Approximate Reasoning, vol. 51, no. 8, pp. 883–894, 2010.
[16] G. J. Scott, R. A. Marcum, C. H. Davis, and T. W. Nivin, "Fusion of deep convolutional neural networks for land cover classification of high-resolution imagery," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 9, pp. 1638–1642, 2017.
[17] G. J. Scott, K. C. Hagan, R. A. Marcum, J. A. Hurt, D. T. Anderson, and C. H. Davis, "Enhanced fusion of deep neural networks for classification of benchmark high-resolution image data sets," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 9, pp. 1451–1455, 2018.
[18] D. T. Anderson, G. J. Scott, M. A. Islam, B. Murray, and R. Marcum, "Fuzzy Choquet integration of deep convolutional neural networks for remote sensing," in Computational Intelligence for Pattern Recognition. Springer, 2018, pp. 1–28.
[19] J. M. Keller, D. B. Fogel, and D. Liu, Fundamentals of Computational Intelligence: Neural Networks, Fuzzy Systems, and Evolutionary Computation. John Wiley & Sons, 2016.
[20] J. J. Buckley and Y. Hayashi, "Fuzzy neural networks: A survey," Fuzzy Sets and Systems, vol. 66, no. 1, pp. 1–13, 1994.
[21] S. Rajurkar and N. K. Verma, "Developing deep fuzzy network with Takagi-Sugeno fuzzy inference system," in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2017, pp. 1–6.
[22] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision. Springer, 2014, pp. 818–833.
[23] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520–1528.
[24] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[25] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[26] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[27] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
