This document provides a holistic review of facial emotion recognition. It begins by exploring the fundamental principles of human facial expressions and emotions. Next, it reviews the evolution of the field from early manual methods to contemporary machine learning and deep learning approaches. The document then delves into current methodologies and technologies used for automatic emotion recognition from facial expressions, including data sources, feature extraction, and machine learning algorithms. Finally, it explores ethical challenges around privacy, bias, and responsible use of this technology.
ABSTRACT - Facial emotion recognition is a field undergoing rapid enhancement with every passing day, exploring and advancing surveillance, security, healthcare, and numerous other critical infrastructures. Its underlying mechanisms, applications, and burgeoning prominence in contemporary society are highly valuable. This research paper gives a precise overview of the current state of the field of emotion recognition and detection, and of the future need for responsible deployment of facial recognition in the digital age.

We begin by exploring the fundamental principles underlying human facial expressions and emotions, recognizing the role they play in human social interaction and communication. We then review the evolution of facial emotion recognition, from early manual methods to the contemporary machine learning and deep learning approaches that have revolutionized the field. Next, we delve into the methodologies and technologies that enable the automatic recognition of emotions from facial expressions, encompassing the various data sources, feature extraction techniques, and machine learning algorithms that are commonly employed to achieve high levels of accuracy in emotion detection. However, the journey of facial emotion recognition is not without its challenges: ethical concerns surrounding privacy, bias, and the responsible use of this technology loom large. We explore these ethical considerations and discuss the measures taken to address them.

KEYWORDS - Convolutional Neural Network (CNN), Multi-Task CNN (MTCNN), Long Short-Term Memory (LSTM), Facial Emotion Recognition (FER), Action Unit (AU), AdaBoost, Deep Neural Network (DNN), Haar Cascades, Artificial Neural Network (ANN), Facial Animation Parameters (FAPs), Time Delay Neural Network (TDNN), Holistic Method, Feature-Based Method, Hybrid Method.

1. INTRODUCTION

In an increasingly interconnected world driven by technological innovation, the ability to accurately perceive and interpret human emotions plays a pivotal role in human-computer interaction, psychology, and various other domains. Facial emotion recognition, a subfield of computer vision and artificial intelligence, has emerged as a crucial area of study with far-reaching applications. It holds the promise of enhancing human-machine communication, revolutionizing marketing strategies, improving mental health diagnosis, and even contributing to the development of emotionally intelligent robots.

Facial emotion recognition involves the automated detection and interpretation of emotional states from facial expressions, providing valuable insights into an individual's emotional well-being and intentions. The significance of this technology is underscored by its potential to bridge communication gaps, facilitate empathetic interactions, and offer insights into human behaviour that were once elusive.

This research paper embarks on a journey through the landscape of facial emotion recognition, seeking to provide a comprehensive overview of its evolution, methodologies, challenges, and promising applications. By delving into the historical context, current state-of-the-art techniques, and future directions in this field, the paper aims to shed light on the rapid progress and interdisciplinary nature of facial emotion recognition research.

2. HISTORICAL EVOLUTION

Current research and public interest in facial expression recognition stems from a rich history. Scientific study and understanding of emotion is thought to have begun in the 19th century with Charles Darwin's The Expression of the Emotions in Man and Animals (originally published in 1872) and G.-B. Duchenne de Boulogne's The Mechanism of Human Facial Expression (originally published in 1862) (Mayne & Bonanno, 2001). These early works focused on the important role of facial displays in emotional life and introduced the theory that emotions may be understood as biologically based reflex behaviours serving adaptive functions. The Darwinian theory, that emotions serve to aid survival and that facial expressions and other physiological responses serve to communicate intentions, was firmly rooted in the view of emotions as catalysts for physiological action.

More recent theorists "have begun to systematically link specific emotions to social functions" [1]. For example, Lazarus' (1991) theory of emotion emphasized the role of individual appraisal (such as how the impact of an event is evaluated in terms of self-concept and relationships) on the experience of emotion. How emotions are differentiated has been a "prominent recurrent question" [2]. Izard's (1977, 1991) differential emotions theory posits that emotions have distinct neural substrates and facial configurations.

Facial expression recognition has been the topic of special issues in academic journals [3], including Behaviour Modification (Singh & Ellis, 1998). In this issue, Singh and Ellis presented several articles that provided research data on the FER ability of individuals with different clinical conditions. According to Singh and Ellis, understanding why some people have difficulty correctly recognizing the six basic facial expressions of emotion at a socially acceptable level is vital to helping them learn to interpret facial expressions accurately. The ultimate goal being, of course, that we not only treat the underlying clinical disorders but also improve the quality of people's lives by enhancing their ability to engage fully in the human experience.

Ekman (1992, 1993), known for decades of facial expression and emotion research, has made a case for the existence of basic emotions (e.g., fear, joy, sadness, anger, disgust, and surprise), which are recognizable in facial expressions. A popular television series, Lie to Me: The Truth is Written All Over our Faces, is based on Ekman's scientific study of human facial expressions (www.fox.com/lietome). This TV hit follows a scientist who studies facial expressions to uncover lies and truth in difficult legal cases. Ekman's work has also received attention from other media sources such as Time Magazine, and he has several mainstream books in publication. Thus, the growing interest in the structural and functional properties of facial expressions, as well as in individuals' ability to recognize facial expressions, is being demonstrated in mainstream popular culture and seems to parallel research interests in the topic.

2.1 DEVELOPMENT OF FACIAL EXPRESSION RECOGNITION

To examine this increasingly recognized topic more deeply, it is fitting to begin with research on how facial expression recognition is thought to develop in childhood and throughout the life span. The role of culture, class, gender, and cognitive ability in FER is also important to review. The following paragraphs provide an overview of the literature on these topics.

Facial expression recognition ability tends to follow a developmental path, increasing in accuracy through experiences with others and cognitive development. The ability to identify emotions from facial expressions begins in infancy, and the ability to attach labels to basic emotions begins for most children by age 18 months (Bretherton, McNew, & Beeghly-Smith, 1981). Findings from cross-sectional studies have suggested that the recognition of certain emotions (happy, sad, and angry) improves to near-adult level by age 5 years. Although the ability to distinguish more sophisticated expressions (e.g., disgust and surprise) appears to develop later, most children are able to identify and label the basic emotions of happy and angry by approximately 3 years of age [4].

The precise mechanisms involved in the development and processes of facial expression recognition ability are unclear and continue to be the subject of much research. However, there is evidence for the importance of both early childhood experiences [5][6] and the development of emotion-processing neural systems [7] in the development of facial expression recognition abilities. How individuals process nonverbal emotion expressions, and how this processing may affect social interaction and behaviour, has been of increasing research interest since the 1990s [6][8][9].
3. METHODOLOGY
Convolutional neural network (CNN) is the most popular way of analysing images. A CNN differs from a multi-layer perceptron (MLP) in that it has hidden layers called convolutional layers. The proposed method is based on a two-level CNN framework. The first level recommended is background removal, used to extract emotions from an image. Here, a conventional CNN network module is used to extract the primary expressional vector (EV) [17]. The expressional vector (EV) is generated by tracking down relevant facial points of importance, and it is directly related to changes in expression. The EV is obtained using a basic perceptron unit applied to a background-removed face image. In the proposed FERC model, there is also a non-convolutional perceptron layer as the last stage. Each of the convolutional layers receives the input data (or image), transforms it, and then outputs it to the next level; this transformation is the convolution operation.

FIGURE 1 [15]

All the convolutional layers used are capable of pattern detection. Within each convolutional layer, four filters were used. The input image fed to the first-part CNN (used for background removal) generally consists of shapes, edges, textures, and objects along with the face [18]. The edge detector, circle detector, and corner detector filters are used at the start of convolutional layer 1 (the edge detection filters used in this layer are shown in Fig. 3a). Once the face has been detected, the second-part CNN filters catch facial features such as eyes, ears, lips, nose, and cheeks. The second-part CNN consists of layers with a 3 × 3 kernel matrix, e.g., [0.25, 0.17, 0.9; 0.89, 0.36, 0.63; 0.7, 0.24, 0.82]. These numbers are initially selected between 0 and 1 and are then optimized for EV detection based on the ground truth in the supervised training dataset; here, minimum error decoding was used to optimize the filter values. Once a filter is tuned by supervised learning, it is applied to the background-removed face (i.e., the output image of the first-part CNN) for detection of the different facial parts (e.g., eyes, lips, nose, ears).
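As a concrete illustration, the quoted 3 × 3 kernel can be slid over an image to produce a feature map, which is the transformation each convolutional layer applies. The sketch below is a minimal NumPy implementation (written as cross-correlation, the usual convention in CNN frameworks); it is illustrative only, not the paper's actual training code.

```python
import numpy as np

# The example 3x3 kernel quoted in the text, with entries in [0, 1].
KERNEL = np.array([[0.25, 0.17, 0.90],
                   [0.89, 0.36, 0.63],
                   [0.70, 0.24, 0.82]])

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the
    feature map, as a single convolutional-layer filter would."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Elementwise product of the kernel with the image patch.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 5x5 input yields a 3x3 feature map in valid mode.
fmap = conv2d_valid(np.ones((5, 5)), KERNEL)
```

In practice a layer applies several such filters in parallel (the paper uses four per layer), stacking the resulting feature maps.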
FIGURE 3
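The EV matrix described in this section is built from normalized Euclidean distances between detected face parts. The following minimal sketch shows one way such an EV-style vector could be computed; the landmark names and coordinates here are hypothetical, and the actual FERC pipeline derives its 24 features from the CNN-detected facial parts.

```python
import numpy as np

def expressional_vector(landmarks):
    """EV-style feature vector: normalized Euclidean distances
    between every pair of facial landmark points.

    landmarks: sequence of (x, y) positions for detected face parts
    (eyes, nose, lips, ...). Names and count are illustrative.
    """
    pts = np.asarray(landmarks, dtype=float)
    k = len(pts)
    dists = np.array([np.linalg.norm(pts[i] - pts[j])
                      for i in range(k) for j in range(i + 1, k)])
    # Normalize by the largest distance so the vector is
    # invariant to the overall scale of the face in the image.
    return dists / dists.max()

# Example with 4 hypothetical landmarks:
# left eye, right eye, nose tip, mouth centre.
ev = expressional_vector([(30, 40), (70, 40), (50, 60), (50, 85)])
```

Because the distances shift as the face parts move, a vector of this form changes directly with expression, which is the property the EV exploits.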
FIGURE 2 [16]

To generate the EV matrix, in all 24 various facial features are extracted. The EV feature vector is nothing but the values of the normalized Euclidean distance between each pair of face parts.

FERC is a novel technique for facial emotion recognition that uses a two-part CNN. FERC achieved an accuracy of 96% on a dataset of 10,000 images and good results on a larger dataset of 750,000 images. FERC could be used in a variety of applications, such as predictive learning and lie detection. If the input to FERC is a video, then the frame with the maximum aggregated sum of white pixels after Canny edge detection is selected as the input, since this frame has the most detail.

To extract human body parts from the input image, a skin tone detection algorithm is applied. The skin tone detection algorithm depends on the type of input image: if the image is grayscale, then the Hough transform is used. The background removal CNN (first-part CNN) uses two input features.
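The frame-selection rule for video input can be sketched as follows. To keep the example dependency-free, a simple gradient-magnitude edge map stands in for Canny edge detection; with OpenCV one would compute the edge map with `cv2.Canny` instead. The threshold value and frame shapes are illustrative assumptions.

```python
import numpy as np

def edge_map(gray, threshold=0.2):
    """Crude binary edge map: mark pixels whose gradient magnitude
    exceeds a threshold. A stand-in for Canny edge detection."""
    gy, gx = np.gradient(gray.astype(float))
    return (np.hypot(gx, gy) > threshold).astype(np.uint8)

def select_frame(frames):
    """Return the frame with the most 'white' (edge) pixels, i.e.
    the frame with the most detail, per the rule described above."""
    scores = [edge_map(f).sum() for f in frames]
    return frames[int(np.argmax(scores))]

# Example: a flat frame vs. a frame containing a sharp step edge.
flat = np.zeros((8, 8))
detailed = np.zeros((8, 8))
detailed[:, 4:] = 1.0
best = select_frame([flat, detailed])
```

The flat frame produces no edge pixels, so the frame with the step edge is selected as the more detailed input.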
3D face recognition has become an active research topic in recent years due to its greater recognition accuracy and robustness compared to 2D face recognition. 3D face recognition uses 3D face data for training and testing purposes. Compared to 2D images, 3D faces contain richer geometric information, which can provide more discriminative features and help face recognition systems overcome the inherent defects and drawbacks of 2D face recognition [19]. One of the main challenges for 3D face recognition is the acquisition of 3D training images [20].

Regular face recognition technology is very good at recognizing faces in well-lit, stable conditions, but it struggles when conditions change, such as when the lighting is different, or when a person's face is in a different position or making a different expression. Unlike computers, our brains are remarkably good at recognizing faces even when they change a little; we recognize faces even after a long time or when the face looks somewhat different.

In simple terms, IR images have special qualities that make it easier to recognize faces even when conditions such as expression or lighting change. We wanted to see whether infrared (IR) face images, which are less affected by factors like face direction and expression, can compensate for the loss of specific visual details like edges, skin patterns, and texture. To do this, we used a classification algorithm based on Eigenfaces [10][11][13][14].

We tested this on a database of 40 different faces in various positions and expressions, as shown in fig. 4. Each face had 2-4 training images (96 in total) and 5-10 test images (250 in total). We chose the most important Eigenfaces by setting a threshold of 99%, which gave us a set of 78 Eigenfaces.

FIGURE 4

The key findings of this study indicate that IR face images are less affected by changes in pose or facial expression. They also allow for a straightforward method of detecting facial features. The paper explores various aspects of face recognition using IR images.

4. CONCLUSIONS

The essential contributing factors in the field of Facial Expression Recognition during the past decade, its current state, and its future prospects have been surveyed in this paper. The various processes discussed are CNN (convolutional neural networks), FERC (facial emotion recognition using a two-part CNN), 3D face recognition, and IR (infrared) imaging, which contribute towards recognition either in dependence on each other or, in some cases, independently. The crucial emphasis lies in refining these techniques to align with the standards of a precise science, rather than relying solely on the accumulation of empirical knowledge. Real-time processing, facial changes over long time spans or due to makeup, and lighting conditions are the problems currently faced in Facial Expression Recognition. To overcome these problems, combining other modalities such as speech and body language can enhance accuracy, but it also introduces additional complexity. Therefore, we hope that our research helps future researchers in this field.

5. REFERENCES

[1] Keltner, D., & Haidt, J. (2001). Social functions of emotions. In T. Mayne & G. A. Bonanno (Eds.).

[2] Frijda, N. H. (2004). The psychologists' point of view. In M. Lewis & J. M. Haviland-Jones (Eds.).

[3] Happé, F. (2004). Editorial: Introduction to special section on face processing. Journal of Child Psychology and Psychiatry.

[4] Izard, C. E., & Harris, P. (1995). Emotional development and developmental psychopathology. In D. Cicchetti & D. J. Cohen (Eds.), Developmental psychopathology, Vol. 1: Theory and method (pp. 467–503). New York: John Wiley & Sons.

[5] Gibb, B. E., Schofield, C. A., & Coles, M. E. (2009). Reported history of childhood abuse and young adults' information-processing biases for facial displays of emotion. Child Maltreatment, 14, 148–156.

[6] Pollak, S. D., & Sinha, P. (2002). Effects of early experience on children's recognition of facial displays of emotion. Developmental Psychology, 38, 784–791.

[7] Herba, C. M., Landau, S., Russell, T., Ecker, C., & Phillips, M. L. (2006). The development of emotion processing in children: Effects of age, emotion, and intensity. Journal of Child Psychology and Psychiatry, 47, 1098–1106.

[8] Crick, N. R., & Dodge, K. A. (1994). A review and reformulation of social information-processing mechanisms and children's social adjustment. Psychological Bulletin, 115, 74–101.

[9] Maxim, L. A., & Nowicki, S. J., Jr. (2003). Developmental associations between nonverbal ability and social competence. Philosophy, Sociology and Psychology, 2(10), 745–758.

[10] M. Turk and A. Pentland (1991). Eigenfaces for recognition. J. Cognitive Neuroscience.

[11] M. Kirby and L. Sirovich (1990). Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] P. Phillips, H. Wechsler, J. Huang, and P. Rauss (1998). The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing.

[13] R. Cutler (1996). Face recognition using infrared images and eigenfaces. http://research.microsoft.com/~rcutler/face/face.html

[14] J. Wilder, P. J. Phillips, C. Jiang, and S. Wiener. Comparison of visible and infra-red imagery for face recognition. Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, Killington.

[15] Wikipedia.com, "Image recognition process" algorithm.

[16] C. A. Hansen, "Face Recognition", Institute for Computer Science, University of Tromsø, Norway.

[17] Yann LeCun, "CNN", AT&T Bell Laboratories.

[18] Ninad Mehendale, "FERC", Springer Nature Switzerland AG, 2020.

[19] Yaping Jing, Xuequan Lu, and Shang Gao, "3D Face Recognition".

[20] Huibin Li, Di Huang, Jean-Marie Morvan, Yunhong Wang, and Liming Chen.