Description
Transformer-based architectures have recently advanced the state of the art in a number of machine learning applications, and speech emotion recognition (SER) has been one of their successful uses in the audio domain. However, previous studies have not assessed the impact of pre-training data and model size on downstream performance, and generalization, robustness, fairness, and efficiency have received scant consideration. This contribution conducts a comprehensive examination of these aspects on several pre-trained variants of wav2vec 2.0 and HuBERT, which we fine-tune on the MSP-Podcast dimensions arousal, dominance, and valence; we also examine cross-corpus generalization using IEMOCAP and MOSI. This project implements emotion classification on a voice dataset stored as .wav files. The sound waves are preprocessed with data augmentation, noise removal, and spectrogram conversion, and features are extracted for each emotion class (happy, sad, and so on). The data is split in an 8:2 train/test ratio, a classification algorithm such as a CNN is applied, predictions determine which class each sample belongs to, and finally a performance analysis of the deep learning model is produced.
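The preprocessing and splitting steps described above (augment the waveform, convert it to a spectrogram, split 8:2) could be sketched as follows. This is a minimal illustration assuming NumPy/SciPy only; the helper names (`add_noise`, `to_spectrogram`, `train_test_split_80_20`) are hypothetical, and the CNN classifier itself would be built on top of these arrays with a framework such as Keras or PyTorch:

```python
import numpy as np
from scipy.signal import spectrogram

def add_noise(wave, snr_db=20.0, rng=None):
    """Data augmentation (illustrative): mix in white noise at a given SNR."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)

def to_spectrogram(wave, sr=16000):
    """Convert a waveform to a log-magnitude spectrogram (the CNN input)."""
    _, _, sxx = spectrogram(wave, fs=sr, nperseg=512, noverlap=256)
    return np.log1p(sxx)

def train_test_split_80_20(features, labels, rng=None):
    """Shuffle and split features/labels with the project's 8:2 ratio."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(features))
    cut = int(0.8 * len(features))
    return (features[idx[:cut]], labels[idx[:cut]],
            features[idx[cut:]], labels[idx[cut:]])

# Toy example: ten one-second synthetic "utterances" with two emotion classes.
sr = 16000
waves = np.stack([np.sin(2 * np.pi * (200 + 50 * k) * np.arange(sr) / sr)
                  for k in range(10)])
labels = np.array([k % 2 for k in range(10)])  # e.g. 0 = happy, 1 = sad

specs = np.stack([to_spectrogram(add_noise(w), sr) for w in waves])
x_tr, y_tr, x_te, y_te = train_test_split_80_20(specs, labels)
print(x_tr.shape[0], x_te.shape[0])  # 8 training examples, 2 test examples
```

A real run would replace the synthetic sine waves with waveforms loaded from the .wav dataset (e.g. via `scipy.io.wavfile.read`) and feed `x_tr`/`x_te` to the CNN for training and performance analysis.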