
Facial Expression Recognition via Learning Deep Sparse Autoencoders (DSAE)

Nianyin Zeng, Hong Zhang, Baoye Song, Weibo Liu, Yurong Li
Elsevier - Neurocomputing (2017)

Presented by: Md. Rashid Abid, 1018052031


Motivation
• Intensive Criminal Investigation
• Virtual Reality
• Intelligent Tutoring Systems
• Health Care
• Data-Driven Animation
Sparse Autoencoders
SAE

• In the encoding stage, the input feature x_i is mapped to the hidden
representation h_i = sigm(W1 x_i + b1)
• W1 and b1 represent the weight matrix and the bias between the input and
hidden layers
• sigm(x) = 1 / (1 + e^(−x)) denotes the logistic sigmoid function
• In the decoding stage, the hidden representation h_i is mapped back to the
reconstruction of the input features, denoted by x̂_i = sigm(W2 h_i + b2)
• W2 and b2 denote the weight matrix and the bias between the hidden and
output layers
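A minimal NumPy sketch of the encoding and decoding stages described above; the variable names mirror the symbols on this slide, and the shapes are illustrative assumptions:

import numpy as np

def sigm(z):
    # Logistic sigmoid: sigm(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    # Encoding stage: h = sigm(W1 x + b1)
    return sigm(W1 @ x + b1)

def decode(h, W2, b2):
    # Decoding stage: x_hat = sigm(W2 h + b2)
    return sigm(W2 @ h + b2)

# Toy shapes: R = 64 input features, m = 100 hidden nodes
rng = np.random.default_rng(0)
x = rng.random(64)
W1, b1 = rng.normal(scale=0.01, size=(100, 64)), np.zeros(100)
W2, b2 = rng.normal(scale=0.01, size=(64, 100)), np.zeros(64)
x_hat = decode(encode(x, W1, b1), W2, b2)   # reconstruction of x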
SAE

• Loss / reconstruction error of a single input: L(x_i, x̂_i) = ||x_i − x̂_i||²
• Reconstruction error (objective function) of the SAE, with the sparsity
penalty added:
J_sparse = (1/N) Σ_i ||x_i − x̂_i||² + β Σ_{j=1}^{m} KL(ρ || ρ̂_j)
• Number of hidden nodes = m, sparsity penalty term weight = β, sparsity
parameter = ρ
• Average activation of hidden node j: ρ̂_j = (1/N) Σ_i h_j(x_i)
• KL = Kullback–Leibler divergence:
KL(ρ || ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j))

SAE
• The purpose of training the SAE is to find appropriate parameters
that minimize the objective function
• Backpropagation and the L-BFGS (limited-memory Broyden–Fletcher–
Goldfarb–Shanno) algorithm are used to train the model
• Selecting a suitable number of hidden nodes is more important
than the learning algorithm or the depth of the model
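A runnable sketch of this training procedure: gradients come from backpropagation and the optimization is handed to SciPy's L-BFGS implementation. The hyperparameters, initialization scale, and toy data are assumptions for illustration only:

import numpy as np
from scipy.optimize import minimize

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, R, m):
    # Split the flat parameter vector into W1, b1, W2, b2
    i = 0
    W1 = theta[i:i + m * R].reshape(m, R); i += m * R
    b1 = theta[i:i + m]; i += m
    W2 = theta[i:i + R * m].reshape(R, m); i += R * m
    b2 = theta[i:i + R]
    return W1, b1, W2, b2

def loss_and_grad(theta, X, m, beta=3.0, rho=0.05):
    N, R = X.shape
    W1, b1, W2, b2 = unpack(theta, R, m)
    H = sigm(X @ W1.T + b1)              # (N, m) hidden activations
    X_hat = sigm(H @ W2.T + b2)          # (N, R) reconstructions
    rho_hat = H.mean(axis=0)
    loss = (np.sum((X_hat - X) ** 2) / (2 * N)
            + beta * np.sum(rho * np.log(rho / rho_hat)
                            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))
    # Backpropagation of the reconstruction and sparsity terms
    d_out = (X_hat - X) * X_hat * (1 - X_hat) / N
    d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / N
    d_hid = (d_out @ W2 + d_sparse) * H * (1 - H)
    grad = np.concatenate([(d_hid.T @ X).ravel(), d_hid.sum(axis=0),
                           (d_out.T @ H).ravel(), d_out.sum(axis=0)])
    return loss, grad

rng = np.random.default_rng(0)
X = rng.random((200, 64))                # toy stand-in for the input features
m = 100                                  # number of hidden nodes
theta0 = rng.normal(scale=0.01, size=m * 64 + m + 64 * m + 64)
res = minimize(loss_and_grad, theta0, args=(X, m),
               method='L-BFGS-B', jac=True, options={'maxiter': 50})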
Deep Sparse Autoencoders (DSAE)
DSAE

• W_L is the weight of the last layer; W_{i,k} and b_{i,k} denote the weight
and bias of the k-th layer
• y_i represents the label of x_i
• L = number of layers, J = objective function
• η = learning rate
• R = dimension of the input features
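A minimal sketch of the stacked architecture's forward pass; during fine-tuning each parameter would then be updated by gradient descent with learning rate η (W ← W − η ∂J/∂W). The layer sizes and random values below are placeholders for the greedily pre-trained SAE weights:

import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsae_forward(x, hidden_layers, W_L, b_L):
    # hidden_layers: list of (W_k, b_k) pairs from the pre-trained SAEs
    h = x
    for W_k, b_k in hidden_layers:
        h = sigm(W_k @ h + b_k)
    return W_L @ h + b_L        # last-layer output used for classification

# Toy shapes: R = 64 input dims, two hidden layers, 8 output classes
rng = np.random.default_rng(0)
R = 64
hidden_layers = [(rng.normal(scale=0.01, size=(100, R)), np.zeros(100)),
                 (rng.normal(scale=0.01, size=(100, 100)), np.zeros(100))]
W_L, b_L = rng.normal(scale=0.01, size=(8, 100)), np.zeros(8)
scores = dsae_forward(rng.random(R), hidden_layers, W_L, b_L)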
Methodology
• First, it is crucial to locate the accurate positions of dense facial
landmarks with a face alignment method; the typical active
appearance model (AAM) is adopted in this paper.
• Based on the characteristics of facial expressions, we choose 51
landmarks of the inner face because of their high accuracy and
reliability.
• Then, we extract descriptors from the patches centered at the
landmarks, and the high-dimensional feature is composed by
concatenating all descriptors.
• We utilize three different descriptors:
- Histogram of oriented gradients (HOG)
- Local binary patterns (LBP)
- Gray value
• However, the high-dimensional feature poses great challenges for the
subsequent training, computation, and storage.
• We need to compress the feature to make it practical and efficient to
apply; therefore, a linear dimension-reduction method, principal
component analysis (PCA), is utilized.
• Finally, the feature compressed by PCA is used as the input to the
deep sparse autoencoders; a sketch of this pipeline follows.
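A sketch of this pipeline using scikit-image and scikit-learn. The AAM landmark fitting is stubbed out with a few fixed stand-in coordinates (a real pipeline would use all 51 AAM landmarks), and the patch size, HOG/LBP settings, and PCA dimension are illustrative assumptions rather than the paper's values:

import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.decomposition import PCA

def patch_descriptor(patch):
    # HOG + LBP histogram + raw gray values for one landmark patch
    h = hog(patch, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
    lbp = local_binary_pattern(patch, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    gray = patch.ravel() / 255.0
    return np.concatenate([h, lbp_hist, gray])

def image_feature(img, landmarks, half=16):
    # Concatenate the descriptors of all patches centered at the landmarks
    descs = [patch_descriptor(img[y - half:y + half, x - half:x + half])
             for (x, y) in landmarks]
    return np.concatenate(descs)

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(100, 128, 128)).astype(np.uint8)
# Nine fixed stand-in points; a real pipeline uses the 51 AAM landmarks
landmarks = [(x, y) for x in (32, 64, 96) for y in (32, 64, 96)]
X = np.stack([image_feature(img, landmarks) for img in imgs])
X_compressed = PCA(n_components=50).fit_transform(X)  # DSAE input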
Methodology

• AAM = Active Appearance Model
• HOG = Histogram of Oriented Gradients
Experiments
Datasets
• Extended Cohn-Kanade (CK+) dataset
• 8 emotion states:
- Anger
- Contempt
- Disgust
- Fear
- Happy
- Sadness
- Surprise
- Neutral
Datasets
• 1635 non-duplicated images
• 123 subjects
• 180 anger, 72 contempt, 236 disgust, 100 fear, 276 happy,
112 sadness, 332 surprise and 327 neutral images
• Leave-one-subject-out cross-validation (sketched below)
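A sketch of the leave-one-subject-out protocol with scikit-learn's LeaveOneGroupOut: each subject is held out once while the model trains on the rest. A logistic-regression classifier stands in for the DSAE, and the data are random placeholders:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((60, 50))                    # stand-in PCA features
y = rng.integers(0, 8, size=60)             # stand-in emotion labels
subject_ids = rng.integers(0, 10, size=60)  # stand-in subject labels

accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject_ids):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accs.append(clf.score(X[test_idx], y[test_idx]))
print(f"mean LOSO accuracy: {np.mean(accs):.3f}")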
8-Class Emotion Sets
• One input, two hidden, and one output layer
• 100 hidden nodes in each hidden layer
• Learning rate 0.05
• 50 and 100 iterations in the pre-training
and fine-tuning stages, respectively
• Mini-batch size for both stages = 12
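The setup above, collected into a small configuration dictionary such as a training script might consume; input_dim is a placeholder for the PCA output dimension (not stated on this slide), and the 0.05 value is read here as the learning rate:

input_dim = 50   # illustrative: whatever dimension the PCA step outputs
n_classes = 8

config = {
    "layer_sizes": [input_dim, 100, 100, n_classes],  # input, 2 hidden, output
    "learning_rate": 0.05,       # the 0.05 parameter listed above
    "pretrain_iters": 50,        # iterations in the pre-training stage
    "finetune_iters": 100,       # iterations in the fine-tuning stage
    "batch_size": 12,            # mini-batch size for both stages
}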
Results
Evaluation
• HOG is a good local feature descriptor
Confusion Matrix of 7-Class Expression Recognition
• All seven expressions are distinguished with high accuracy.
• Happy, Disgust and Surprise achieve excellent performance
owing to their distinctive features in the eye and mouth regions.
• Meanwhile, the expressions of Anger and Contempt give
satisfactory results, although they are easily confused with Sadness.
• In addition, the Anger expression is easily misclassified as Disgust.
• The accuracy of the Fear and Sadness expressions is slightly
poorer than that of the other expressions because the nuances of
these expressions are subtle in both shape and appearance features.
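A sketch of how such a confusion matrix is computed with scikit-learn; the labels follow the 7-class setup, and y_true/y_pred are random placeholders for the LOSO ground truth and DSAE predictions:

import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["Anger", "Contempt", "Disgust", "Fear",
          "Happy", "Sadness", "Surprise"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=300)
# Simulated predictions: correct roughly 90% of the time
y_pred = np.where(rng.random(300) < 0.9,
                  y_true, rng.integers(0, 7, size=300))

cm = confusion_matrix(y_true, y_pred)
# Row-normalize so each row gives per-class recognition rates in percent
cm_pct = 100 * cm / cm.sum(axis=1, keepdims=True)
print(labels)
print(np.round(cm_pct, 2))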
Confusion Matrix of 8-Class Facial Expression Recognition
• Recognition accuracy drops slightly after adding the Neutral
expression.
• All expressions are easily confused with the Neutral expression:
the error rates reach up to 7.22%, 13.89%, 7.20%, 10% and 13.39%
for the expressions of Anger, Contempt, Disgust, Fear and Sadness,
respectively.
• The expressions of Happy and Surprise achieve better performance.
• The expressions of Anger, Disgust and Neutral also reach
satisfactory accuracy.
• The accuracy percentages of the Contempt, Fear and Sadness
expressions are lower than those of the others.
Conclusions
• A high-dimensional feature is introduced to facial expression
recognition because it contains accurate and comprehensive
information about emotions.
• The high-dimensional feature is a combination of:
- Facial geometric features
- Appearance features
• A DSAE-based deep learning framework is established for facial
expression recognition with high accuracy by learning robust and
discriminative features from the data set.
• The presented DSAE-based approach is successfully applied to
distinguish different facial expressions on the CK+ database.
• This work focuses on 7-class and 8-class (including Neutral) facial
expression recognition.
• The results show that the DSAE-based approach outperforms three
other state-of-the-art approaches for 7-class recognition by as much
as 3.17%, 4.09%, and 7.41%, respectively, and achieves good
performance with satisfactory accuracy for 8-class recognition.
