Emotion Recognition in Real-Time Video Using Temporal-Aware Multi-Value Restricted Boltzmann Machines (TA-MvRBM)
Keywords:
Facial expression recognition, multi-scale CNN, adaptive fusion, attention mechanism, real-time processing, unconstrained environments, deep learning, computer vision, affective computingAbstract
This paper introduces an innovative Adaptive Multi-Scale Convolutional Neural Network (AMS-CNN) architecture for real-time facial expression recognition in unconstrained environments. Traditional facial expression recognition systems often struggle with variations in scale, illumination, and pose commonly encountered in real-world scenarios. Our proposed AMS-CNN addresses these challenges through a novel multi-scale feature extraction mechanism that adapts to different facial expression intensities and environmental conditions. The architecture incorporates three parallel convolutional pathways operating at different scales, coupled with an adaptive fusion module that dynamically weights features based on their discriminative power. Additionally, we introduce a Context-Aware Attention Mechanism (CAAM) that focuses on salient facial regions while suppressing noise from background and irrelevant facial areas. Extensive experiments on four benchmark datasets (FER2013, CK+, JAFFE, and SFEW) demonstrate superior performance, achieving 94.2% accuracy on FER2013 and 96.8% on CK+, representing improvements of 3.7% and 2.4% respectively over state-of-the-art methods. The system maintains real-time processing capabilities with 30 FPS on standard hardware while exhibiting robust performance under challenging conditions including partial occlusions, varying illumination, and head pose variations up to 45 degrees. Our contributions include the novel AMS-CNN architecture, the adaptive fusion strategy, comprehensive evaluation protocols, and a publicly available implementation framework for research reproducibility.











