
FACIAL EMOTION RECOGNITION: A HOLISTIC REVIEW

ADITYA SINGH (), ANURAG YADAV (),

ARYAN MANDAL (), ARYAN KUMAR LATIYAN ()

STUDENTS GCET, B-TECH(EEE)

AKTU (ABDUL KALAM TECHNICAL UNIVERSITY), GREATER NOIDA


ABSTRACT- Facial emotion recognition is a field advancing rapidly with every passing day, exploring and strengthening surveillance, security, healthcare and numerous other critical infrastructures. Its underlying mechanism, applications, and burgeoning prominence in contemporary society are highly valuable. This research paper gives a precise overview of the current state of the field of emotion recognition and detection, and of the future need for responsible deployment of facial recognition in the digital age.

KEYWORDS- Convolutional Neural Network (CNN), Multi-Task CNN (MTCNN), Long Short-Term Memory (LSTM), Facial Emotion Recognition (FER), Action Unit (AU), AdaBoost, Deep Neural Network (DNN), Haar Cascades, Artificial Neural Network (ANN), Facial Animation Parameters (FAPs), Time Delay Neural Network (TDNN), Holistic Method, Feature-Based Method, Hybrid Method.

1. INTRODUCTION

In an increasingly interconnected world driven by technological innovation, the ability to accurately perceive and interpret human emotions plays a pivotal role in human-computer interaction, psychology, and various other domains. Facial emotion recognition, a subfield of computer vision and artificial intelligence, has emerged as a crucial area of study with far-reaching applications. It holds the promise of enhancing human-machine communication, revolutionizing marketing strategies, improving mental health diagnosis, and even contributing to the development of emotionally intelligent robots.

Facial emotion recognition involves the automated detection and interpretation of emotional states from facial expressions, providing valuable insights into an individual's emotional well-being and intentions. The significance of this technology is underscored by its potential to bridge communication gaps, facilitate empathetic interactions, and offer insights into human behaviour that were once elusive.

This research paper embarks on a journey through the landscape of facial emotion recognition, seeking to provide a comprehensive overview of its evolution, methodologies, challenges, and promising applications. By delving into the historical context, current state-of-the-art techniques, and future directions of this field, the paper aims to shed light on the rapid progress and interdisciplinary nature of facial emotion recognition research.

We begin by exploring the fundamental principles underlying human facial expressions and emotions, recognizing the role they play in human social interaction and communication. We then review the evolution of facial emotion recognition, from early manual methods to the contemporary machine learning and deep learning approaches that have revolutionized the field.

Next, we delve into the methodologies and technologies that enable the automatic recognition of emotions from facial expressions, encompassing the various data sources, feature extraction techniques, and machine learning algorithms commonly employed to achieve high levels of accuracy in emotion detection.

However, the journey of facial emotion recognition is not without its challenges. Ethical concerns surrounding privacy, bias, and the responsible use of this technology loom large. We explore these ethical considerations and discuss the measures taken to address them.

2. HISTORICAL EVOLUTION

Current research and public interest in facial expression recognition stem from a rich history. The scientific study of emotion is thought to have begun in the 19th century with Charles Darwin's The Expression of the Emotions in Man and Animals (originally published in 1872) and G.-B. Duchenne de Boulogne's The Mechanism of Human Facial Expression (originally published in 1862) (Mayne & Bonanno, 2001). These early works focused on the important role of facial displays in emotional life and introduced the theory that emotions may be understood as biologically based reflex behaviours serving adaptive functions. The Darwinian theory, that emotions serve to aid in
survival and that facial expressions and other physiological responses serve to communicate intentions, was firmly rooted in the view of emotions as catalysts for physiological action.

More recent theorists "have begun to systematically link specific emotions to social functions" [1]. For example, Lazarus' (1991) theory of emotion emphasized the role of individual appraisal (such as how the impact of an event is evaluated in terms of self-concept and relationships) on the experience of emotion. How emotions are differentiated has been a "prominent recurrent question" [2]. Izard's (1977, 1991) differential emotions theory posits that emotions have distinct neural substrates and facial configurations.

Facial expression recognition has been the topic of several special issues in academic journals [3], including Behaviour Modification (Singh & Ellis, 1998). In this issue, Singh and Ellis presented several articles that provided research data on the FER ability of individuals with different clinical conditions. According to Singh and Ellis, understanding why some people have difficulty correctly recognizing the six basic facial expressions of emotion at a socially acceptable level is vital to helping them learn to interpret facial expressions accurately. The ultimate goal is, of course, that we not only treat the underlying clinical disorders but also improve the quality of people's lives by enhancing their ability to engage fully in the human experience.

Ekman (1992, 1993), known for decades of facial expression and emotion research, has made a case for the existence of basic emotions (e.g., fear, joy, sadness, anger, disgust, and surprise), which are recognizable in facial expressions. A popular television series, Lie to Me: The Truth is Written All Over our Faces, is based on Ekman's scientific study of human facial expressions (www.fox.com/lietome). This TV hit follows a scientist who studies facial expressions to uncover lies and truth in difficult legal cases. Ekman's work has also received attention from other media sources such as Time Magazine, and he has several mainstream books in publication. Thus, the growing interest in the structural and functional properties of facial expressions, as well as in individuals' ability to recognize facial expressions, is being demonstrated in mainstream popular culture and seems to parallel research interests in the topic.

2.1 DEVELOPMENT OF FACIAL EXPRESSION RECOGNITION

To examine this increasingly recognized topic more deeply, it is fitting to begin with the research on how facial expression recognition is thought to develop in childhood and throughout the life span. The role of culture, class, gender and cognitive ability in FER is also important to review. The following sections provide an overview of the literature on these topics.

Facial expression recognition ability tends to follow a developmental path, increasing in accuracy through experiences with others and cognitive development. The ability to identify emotions from facial expressions begins in infancy, and the ability to attach labels to basic emotions begins for most children by age 18 months (Bretherton, McNew, & Beeghly-Smith, 1981). Findings from cross-sectional studies have suggested that the recognition of certain emotions (happy, sad, and angry) improves to near-adult level by age 5 years. Although the ability to distinguish more sophisticated expressions (e.g., disgust and surprise) appears to develop later, most children are able to identify and label the basic emotions of happy and angry by approximately 3 years of age [4].

The precise mechanisms involved in the development and processes of facial expression recognition ability are unclear and continue to be the subject of much research. However, there is evidence for the importance of both early childhood experiences [5][6] and the development of emotion-processing neural systems [7] in the development of facial expression recognition abilities. How individuals process nonverbal emotion expressions, and how this processing may affect social interaction and behaviour, has been of increasing research interest since the 1990s [6][8][9].


3. METHODOLOGY

A convolutional neural network (CNN) is the most popular way of analyzing images. A CNN differs from a multi-layer perceptron (MLP) in that it has hidden layers called convolutional layers. The proposed method is based on a two-level CNN framework. The first level performs background removal and is used before extracting emotions from an image. Here, a conventional CNN module is used to extract the primary expressional vector (EV) [17]. The EV is generated by tracking down relevant facial points of importance and is directly related to changes in expression. It is obtained by applying a basic perceptron unit to a background-removed face image. The proposed FERC model also has a non-convolutional perceptron layer as the last stage. Each convolutional layer receives the input data (or image), transforms it, and outputs it to the next level; this transformation is a convolution operation. All the convolutional layers used are capable of pattern detection, and four filters were used within each convolutional layer. The input image fed to the first-part CNN (used for background removal) generally consists of shapes, edges, textures, and objects along with the face [18].

FIGURE 1 [15]

Edge detector, circle detector, and corner detector filters are used at the start of convolutional layer 1. Once the face has been detected, the second-part CNN captures facial features such as eyes, ears, lips, nose, and cheeks; the edge detection filters used in this layer are shown in Fig. 3a. The second-part CNN consists of layers with a 3 × 3 kernel matrix, e.g., [0.25, 0.17, 0.9; 0.89, 0.36, 0.63; 0.7, 0.24, 0.82]. These values are initially selected between 0 and 1 and are then optimized for EV detection based on the ground truth in the supervised training dataset, using minimum error decoding to optimize the filter values. Once a filter is tuned by supervised learning, it is applied to the background-removed face (i.e., the output image of the first-part CNN) to detect the different facial parts (e.g., eyes, lips, nose, ears).
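As an illustrative sketch (not the authors' implementation), the per-layer convolutional transformation described above can be written as follows, using the example 3 × 3 kernel quoted in the text:

```python
# Example 3x3 kernel quoted in the text; in FERC these values are
# initialized in [0, 1] and then tuned by supervised learning.
KERNEL = [[0.25, 0.17, 0.90],
          [0.89, 0.36, 0.63],
          [0.70, 0.24, 0.82]]

def conv2d_valid(image, kernel):
    """2-D convolution (cross-correlation form, 'valid' padding,
    stride 1), as performed by each convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1       # output height
    ow = len(image[0]) - kw + 1    # output width
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]
```

Applying four such filters within a layer, as the text describes, simply produces four feature maps from the same input image.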

FIGURE 3

FIGURE 2 [16]

FERC is a novel technique for facial emotion recognition that uses a two-part CNN. FERC achieved an accuracy of 96% on a dataset of 10,000 images and good results on a larger dataset of 750,000 images. FERC could be used in a variety of applications, such as predictive learning and lie detection. If the input to FERC is a video, the frame with the maximum aggregated sum of white pixels after Canny edge detection is selected as the input, since this frame has the most detail.
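The frame-selection step can be sketched as below. This is a minimal illustration, not the FERC code: a simple gradient-magnitude threshold stands in for Canny edge detection (in practice something like OpenCV's `cv2.Canny` would be used), and the threshold value is an arbitrary assumption.

```python
def edge_pixel_count(frame, thresh=0.5):
    """Count 'white' edge pixels in a grayscale frame using a simple
    forward-difference gradient-magnitude threshold (a stand-in for
    Canny edge detection)."""
    h, w = len(frame), len(frame[0])
    count = 0
    for i in range(h - 1):
        for j in range(w - 1):
            gx = frame[i][j + 1] - frame[i][j]   # horizontal gradient
            gy = frame[i + 1][j] - frame[i][j]   # vertical gradient
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                count += 1
    return count

def select_key_frame(frames):
    """Return the index of the video frame with the most edge pixels,
    i.e. the frame with the most detail."""
    return max(range(len(frames)),
               key=lambda k: edge_pixel_count(frames[k]))
```

Given a list of grayscale frames, `select_key_frame` picks the most detailed one to feed to the first-part CNN.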
To generate the EV matrix, 24 facial features are extracted in all. The EV feature vector is simply the set of normalized Euclidean distances between each pair of face parts.

To extract human body parts from the input image, a skin tone detection algorithm is applied; which algorithm is used depends on the type of input image. If the image is grayscale, the Hough transform is used. The background removal CNN (first-part CNN) uses two input features.
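The EV construction from pairwise distances can be sketched as follows. The landmark coordinates and the normalization by the largest pairwise distance are illustrative assumptions, since the text does not specify them:

```python
import math
from itertools import combinations

def expression_vector(landmarks):
    """Build an EV-style feature vector: the Euclidean distance between
    every pair of facial landmarks, normalized to (0, 1] by the largest
    distance (the normalization scheme here is an assumption)."""
    dists = [math.dist(p, q) for p, q in combinations(landmarks, 2)]
    largest = max(dists)
    return [d / largest for d in dists]
```

With the 24 facial features mentioned in the text, this yields C(24, 2) = 276 pairwise distances per face.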

3D face recognition has become an active research topic in recent years due to its greater recognition accuracy and robustness compared to 2D face recognition. 3D face recognition uses 3D face data for training and testing purposes. Compared to 2D images, 3D faces contain richer geometric information, which can provide more discriminative features and help face recognition systems overcome the inherent defects and drawbacks of 2D face recognition [19]. One of the main challenges in 3D face recognition is the acquisition of 3D training images [20].

Conventional face recognition technology performs well in stable, well-lit conditions, but it struggles when conditions change: when the lighting is different, or when a person's face is in a different pose or making a different expression.

Unlike computers, our brains are remarkably good at recognizing faces despite such changes; we recognize faces even after a long time, or when a face looks somewhat different.

In short, IR images have qualities that make it easier to recognize faces even when conditions such as expression or lighting change. We wanted to see whether infrared (IR) face images, which are less affected by factors such as face direction and expression, can compensate for the loss of specific visual details such as edges, skin patterns, and texture. To do this, we used a classification algorithm based on Eigenfaces [10][11][13][14].

We tested this on a database of 40 different faces in various poses and expressions, as shown in Fig. 4. Each face had 2-4 training images (96 in total) and 5-10 test images (250 in total). We chose the most important Eigenfaces by setting a threshold of 99%, which gave us a set of 78 Eigenfaces.

FIGURE 4

The key findings of this study indicate that IR face images are less affected by changes in pose or facial expression. They also allow for a straightforward method of detecting facial features. The paper explores various aspects of face recognition using IR images.

4. CONCLUSIONS

This paper has surveyed the essential contributing factors in the field of Facial Expression Recognition over the past decade, its current state, and its future prospects. The various processes discussed are CNN (convolutional neural networks), FERC (facial emotion recognition using CNNs), 3D face recognition and IR (infrared) imaging, which contribute towards recognition, in some cases depending on each other and in others operating independently. The crucial emphasis lies in refining these techniques to align with the standards of a precise science, rather than relying solely on the accumulation of empirical knowledge. Real-time processing, facial changes over long time spans or due to makeup, and varying lighting conditions are the problems currently faced in Facial Expression Recognition. To overcome these problems, combining other modalities such as speech and body language can enhance accuracy, but also introduces additional complexity. We therefore hope that our research helps future researchers in this field.

5. REFERENCES

[1] Keltner, D., & Haidt, J. (2001). Social functions of emotions. In T. Mayne & G. A. Bonanno (Eds.)

[2] Frijda, N. H. (2004). The psychologists' point of view. In M. Lewis & J. M. Haviland-Jones (Eds.)

[3] Happé, F. (2004). Editorial: Introduction to special section on face processing. Journal of Child Psychology and Psychiatry.

[4] Izard, C. E., & Harris, P. (1995). Emotional development and developmental psychopathology. In D. Cicchetti & D. J. Cohen (Eds.), Developmental psychopathology, Vol. 1: Theory and method (pp. 467–503). New York: John Wiley & Sons.

[5] Gibb, B. E., Schofield, C. A., & Coles, M. E. (2009). Reported history of childhood abuse and young adults' information-processing biases for facial display of emotion. Child Maltreatment, 14, 148–156.

[6] Pollak, S. D., & Sinha, P. (2002). Effects of early experience on children's recognition of facial displays of emotion. Developmental Psychology, 38, 784–791.

[7] Herba, C. M., Landau, S., Russell, T., Ecker, C., & Phillips, M. L. (2006). The development of emotion processing in children: Effects of age, emotion, and intensity. Journal of Child Psychology and Psychiatry, 47, 1098–1106.

[8] Crick, N. R., & Dodge, K. A. (1994). A review and reformulation of social information-processing mechanisms and children's social adjustment. Psychological Bulletin, 115, 74–101.

[9] Maxim, L. A., & Nowicki, S. J., Jr. (2003). Developmental associations between nonverbal ability and social competence. Philosophy, Sociology and Psychology, 2(10), 745–758.

[10] Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience.

[11] Kirby, M., & Sirovich, L. (1990). Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Phillips, P., Wechsler, H., Huang, J., & Rauss, P. (1998). The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing.

[13] Cutler, R. (1996). Face recognition using infrared images and eigenfaces. http://research.microsoft.com/~rcutler/face/face.html

[14] Wilder, J., Phillips, P. J., Jiang, C., & Wiener, S. Comparison of visible and infra-red imagery for face recognition. Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, Killington.

[15] Wikipedia.com, "Image recognition process" algorithm.

[16] Hansen, C. A., "Face Recognition", Institute for Computer Science, University of Tromsø, Norway.

[17] LeCun, Y., "CNN", AT&T Bell Laboratories.

[18] Mehendale, N., "FERC", Springer Nature Switzerland AG, 2020.

[19] Jing, Y., Lu, X., & Gao, S., "3D Face Recognition".

[20] Li, H., Huang, D., Morvan, J.-M., Wang, Y., & Chen, L.
