
Facial Expression Recognition via Learning Deep Sparse Autoencoders (DSAE)

Nianyin Zeng, Hong Zhang, Baoye Song, Weibo Liu, Yurong Li
Elsevier - Neurocomputing (2017)

Presented by: Md. Rashid Abid, 1018052031


Motivation
• Intensive Criminal Investigation
• Virtual Reality
• Intelligent Tutoring Systems
• Health Care
• Data-Driven Animation
Sparse Autoencoders
SAE

• In the encoding stage, the input feature x_i is mapped to the hidden
representation h_i = sigm(W1 x_i + b1)
• W1 and b1 represent the weight matrix and the bias between the input and
hidden layers
• sigm(x) = 1 / (1 + e^(−x)) denotes the logistic sigmoid function
• In the decoding stage, the hidden representation h_i is mapped back to the
reconstruction of the input features, denoted by x̂_i = sigm(W2 h_i + b2)
• W2 and b2 denote the weight matrix and the bias between the hidden and
output layers
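A minimal NumPy sketch of the encoding and decoding stages described above; the variable names mirror the symbols on this slide, and the shapes are illustrative assumptions:

import numpy as np

def sigm(z):
    # Logistic sigmoid: sigm(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    # Encoding stage: h = sigm(W1 x + b1)
    return sigm(W1 @ x + b1)

def decode(h, W2, b2):
    # Decoding stage: x_hat = sigm(W2 h + b2)
    return sigm(W2 @ h + b2)

# Toy shapes: R = 64 input features, m = 100 hidden nodes
rng = np.random.default_rng(0)
x = rng.random(64)
W1, b1 = rng.normal(scale=0.01, size=(100, 64)), np.zeros(100)
W2, b2 = rng.normal(scale=0.01, size=(64, 100)), np.zeros(64)
x_hat = decode(encode(x, W1, b1), W2, b2)   # reconstruction of x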
SAE

• Loss / reconstruction error of a single input: L(x_i, x̂_i) = ||x_i − x̂_i||²
• Reconstruction error (objective function) of the SAE, with the sparsity
penalty added:
J_sparse = (1/N) Σ_i ||x_i − x̂_i||² + β Σ_{j=1}^{m} KL(ρ || ρ̂_j)
• Number of hidden nodes = m, sparsity penalty term weight = β, sparsity
parameter = ρ
• Average activation of hidden node j: ρ̂_j = (1/N) Σ_i h_j(x_i)
• KL = Kullback–Leibler divergence:
KL(ρ || ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j))

SAE
• The purpose of training the SAE is to find appropriate parameters
that minimize the objective function
• Backpropagation and the L-BFGS (limited-memory Broyden–Fletcher–
Goldfarb–Shanno) algorithm are used to train the model
• Selecting a suitable number of hidden nodes is more important
than the learning algorithm or the depth of the model
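A runnable sketch of this training procedure: gradients come from backpropagation and the optimization is handed to SciPy's L-BFGS implementation. The hyperparameters, initialization scale, and toy data are assumptions for illustration only:

import numpy as np
from scipy.optimize import minimize

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, R, m):
    # Split the flat parameter vector into W1, b1, W2, b2
    i = 0
    W1 = theta[i:i + m * R].reshape(m, R); i += m * R
    b1 = theta[i:i + m]; i += m
    W2 = theta[i:i + R * m].reshape(R, m); i += R * m
    b2 = theta[i:i + R]
    return W1, b1, W2, b2

def loss_and_grad(theta, X, m, beta=3.0, rho=0.05):
    N, R = X.shape
    W1, b1, W2, b2 = unpack(theta, R, m)
    H = sigm(X @ W1.T + b1)              # (N, m) hidden activations
    X_hat = sigm(H @ W2.T + b2)          # (N, R) reconstructions
    rho_hat = H.mean(axis=0)
    loss = (np.sum((X_hat - X) ** 2) / (2 * N)
            + beta * np.sum(rho * np.log(rho / rho_hat)
                            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))
    # Backpropagation of the reconstruction and sparsity terms
    d_out = (X_hat - X) * X_hat * (1 - X_hat) / N
    d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / N
    d_hid = (d_out @ W2 + d_sparse) * H * (1 - H)
    grad = np.concatenate([(d_hid.T @ X).ravel(), d_hid.sum(axis=0),
                           (d_out.T @ H).ravel(), d_out.sum(axis=0)])
    return loss, grad

rng = np.random.default_rng(0)
X = rng.random((200, 64))                # toy stand-in for the input features
m = 100                                  # number of hidden nodes
theta0 = rng.normal(scale=0.01, size=m * 64 + m + 64 * m + 64)
res = minimize(loss_and_grad, theta0, args=(X, m),
               method='L-BFGS-B', jac=True, options={'maxiter': 50})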
Deep Sparse Autoencoders (DSAE)
DSAE

• W_L is the weight of the last layer; W_{i,k} and b_{i,k} denote the weight
and bias of the k-th layer
• y_i represents the label of x_i
• L = number of layers, J = objective function
• η = learning rate
• R = dimension of the input features
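A minimal sketch of the stacked architecture's forward pass; during fine-tuning each parameter would then be updated by gradient descent with learning rate η (W ← W − η ∂J/∂W). The layer sizes and random values below are placeholders for the greedily pre-trained SAE weights:

import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsae_forward(x, hidden_layers, W_L, b_L):
    # hidden_layers: list of (W_k, b_k) pairs from the pre-trained SAEs
    h = x
    for W_k, b_k in hidden_layers:
        h = sigm(W_k @ h + b_k)
    return W_L @ h + b_L        # last-layer output used for classification

# Toy shapes: R = 64 input dims, two hidden layers, 8 output classes
rng = np.random.default_rng(0)
R = 64
hidden_layers = [(rng.normal(scale=0.01, size=(100, R)), np.zeros(100)),
                 (rng.normal(scale=0.01, size=(100, 100)), np.zeros(100))]
W_L, b_L = rng.normal(scale=0.01, size=(8, 100)), np.zeros(8)
scores = dsae_forward(rng.random(R), hidden_layers, W_L, b_L)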
Methodology
• First, it is crucial to locate the accurate positions of dense facial
landmarks with a face alignment method; the typical active
appearance model (AAM) is adopted in this paper.
• Based on the characteristics of facial expressions, we choose 51
landmarks of the inner face because of their high accuracy and
reliability.
• Then, we extract descriptors from the patches centered at the
landmarks, and the high-dimensional feature is composed by
concatenating all descriptors.
• We utilize three different descriptors:
- Histogram of oriented gradients (HOG)
- Local binary patterns (LBP)
- Gray value
• However, the high-dimensional feature poses great challenges for the
subsequent training, computation, and storage.
• We need to compress the feature to make it practical and efficient to
apply; therefore, a linear dimension-reduction method, principal
component analysis (PCA), is utilized.
• Finally, the feature compressed by PCA is used as the input to the
deep sparse autoencoders; a sketch of this pipeline follows.
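A sketch of this pipeline using scikit-image and scikit-learn. The AAM landmark fitting is stubbed out with a few fixed stand-in coordinates (a real pipeline would use all 51 AAM landmarks), and the patch size, HOG/LBP settings, and PCA dimension are illustrative assumptions rather than the paper's values:

import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.decomposition import PCA

def patch_descriptor(patch):
    # HOG + LBP histogram + raw gray values for one landmark patch
    h = hog(patch, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
    lbp = local_binary_pattern(patch, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    gray = patch.ravel() / 255.0
    return np.concatenate([h, lbp_hist, gray])

def image_feature(img, landmarks, half=16):
    # Concatenate the descriptors of all patches centered at the landmarks
    descs = [patch_descriptor(img[y - half:y + half, x - half:x + half])
             for (x, y) in landmarks]
    return np.concatenate(descs)

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(100, 128, 128)).astype(np.uint8)
# Nine fixed stand-in points; a real pipeline uses the 51 AAM landmarks
landmarks = [(x, y) for x in (32, 64, 96) for y in (32, 64, 96)]
X = np.stack([image_feature(img, landmarks) for img in imgs])
X_compressed = PCA(n_components=50).fit_transform(X)  # DSAE input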
Methodology

• AAM = Active Appearance Model
• HOG = Histogram of Oriented Gradients
Experiments
Datasets
• Extended Cohn-Kanade (CK+) dataset
• 8 emotion states:
- Anger
- Contempt
- Disgust
- Fear
- Happy
- Sadness
- Surprise
- Neutral
Datasets
• 1635 non-duplicated images
• 123 subjects
• 180 anger, 72 contempt, 236 disgust, 100 fear, 276 happy,
112 sadness, 332 surprise and 327 neutral images
• Leave-one-subject-out cross-validation (sketched below)
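A sketch of the leave-one-subject-out protocol with scikit-learn's LeaveOneGroupOut: each subject is held out once while the model trains on the rest. A logistic-regression classifier stands in for the DSAE, and the data are random placeholders:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((60, 50))                    # stand-in PCA features
y = rng.integers(0, 8, size=60)             # stand-in emotion labels
subject_ids = rng.integers(0, 10, size=60)  # stand-in subject labels

accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject_ids):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accs.append(clf.score(X[test_idx], y[test_idx]))
print(f"mean LOSO accuracy: {np.mean(accs):.3f}")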
8-Class Emotion Sets
• One input, two hidden, and one output layer
• 100 hidden nodes in each hidden layer
• Learning rate 0.05
• 50 and 100 iterations in the pre-training
and fine-tuning stages, respectively
• Mini-batch size for both stages = 12
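The setup above, collected into a small configuration dictionary such as a training script might consume; input_dim is a placeholder for the PCA output dimension (not stated on this slide), and the 0.05 value is read here as the learning rate:

input_dim = 50   # illustrative: whatever dimension the PCA step outputs
n_classes = 8

config = {
    "layer_sizes": [input_dim, 100, 100, n_classes],  # input, 2 hidden, output
    "learning_rate": 0.05,       # the 0.05 parameter listed above
    "pretrain_iters": 50,        # iterations in the pre-training stage
    "finetune_iters": 100,       # iterations in the fine-tuning stage
    "batch_size": 12,            # mini-batch size for both stages
}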
Results
Evaluation
• HOG is a good local feature descriptor
Confusion Matrix of 7-Class Expression Recognition
• All seven expressions are distinguished with high accuracy.
• Happy, Disgust and Surprise achieve excellent performance
owing to their distinctive features in the eye and mouth regions.
• Meanwhile, the expressions of Anger and Contempt give
satisfactory results, although they are easily confused with Sadness.
• In addition, the Anger expression is easily misclassified as Disgust.
• The accuracy of the Fear and Sadness expressions is slightly
poorer than that of the other expressions because the nuances of
these expressions are subtle in both shape and appearance features.
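A sketch of how such a confusion matrix is computed with scikit-learn; the labels follow the 7-class setup, and y_true/y_pred are random placeholders for the LOSO ground truth and DSAE predictions:

import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["Anger", "Contempt", "Disgust", "Fear",
          "Happy", "Sadness", "Surprise"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=300)
# Simulated predictions: correct roughly 90% of the time
y_pred = np.where(rng.random(300) < 0.9,
                  y_true, rng.integers(0, 7, size=300))

cm = confusion_matrix(y_true, y_pred)
# Row-normalize so each row gives per-class recognition rates in percent
cm_pct = 100 * cm / cm.sum(axis=1, keepdims=True)
print(labels)
print(np.round(cm_pct, 2))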
Confusion Matrix of 8-Class Facial Expression Recognition
• Recognition accuracy drops slightly after adding the Neutral
expression.
• All expressions are easily confused with the Neutral expression:
the error rates reach up to 7.22%, 13.89%, 7.20%, 10% and 13.39%
for the expressions of Anger, Contempt, Disgust, Fear and Sadness,
respectively.
• The expressions of Happy and Surprise achieve better performance.
• The expressions of Anger, Disgust and Neutral also reach
satisfactory accuracy.
• The accuracy percentages of the Contempt, Fear and Sadness
expressions are lower than those of the others.
Conclusions
• A high-dimensional feature is introduced to facial expression
recognition because it contains accurate and comprehensive
information about emotions.
• The high-dimensional feature is a combination of:
- Facial geometric features
- Appearance features
• A DSAE-based deep learning framework is established for facial
expression recognition with high accuracy by learning robust and
discriminative features from the data set.
• The presented DSAE-based approach is successfully applied to
distinguish different facial expressions on the CK+ database.
• This work focuses on 7-class and 8-class (including Neutral) facial
expression recognition.
• The results show that the DSAE-based approach outperforms three
other state-of-the-art approaches for 7-class recognition by as much
as 3.17%, 4.09%, and 7.41%, respectively, and achieves good
performance with satisfactory accuracy for 8-class recognition.
