There have been various attempts to recognize emotions from images or videos. It is a well-known application offered through cloud APIs, and its outputs (e.g., "joy 95%") often become a talking point on social media.
The paper linked below shows that, when recognizing emotions from images, performance improves by reflecting not only facial expressions but also body movements. Body cues are said to be especially helpful in cases that are ambiguous from the facial expression alone (for example, a joyful situation in which the face is almost indistinguishable from a crying face). The improvement also differs by emotion: Happiness is recognized much more accurately from the facial expression, whereas for Fear the reported accuracy was 42% when recognized from the facial expression versus 98% when recognized from body movement. Although the tested dataset is small and built around specific situations, the authors suggest that a model considering both face and body movement could reach a human-level recognition rate on this task.
In addition, I think that incorporating the audio signal as well, that is, the result of extracting emotions from the voice, would yield an even more meaningful result.
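One simple way to combine face, body, and voice predictions is late fusion: run a separate classifier per modality and take a weighted average of their probability vectors. The sketch below is only an illustration of that idea, not the method from the linked paper; the emotion labels, probabilities, and weights are all made-up values.

```python
# Minimal late-fusion sketch for multimodal emotion recognition.
# NOTE: all labels, probabilities, and weights here are hypothetical,
# chosen only to mimic the "face is unsure about Fear, body is confident"
# situation described in the text.

EMOTIONS = ["happiness", "fear", "sadness", "anger"]

def fuse(probs_by_modality, weights):
    """Weighted average of per-modality probability vectors, renormalized."""
    n = len(EMOTIONS)
    fused = [0.0] * n
    for modality, probs in probs_by_modality.items():
        w = weights[modality]
        for i, p in enumerate(probs):
            fused[i] += w * p
    total = sum(fused)
    return [v / total for v in fused]

# Face model is ambiguous between fear and sadness; body and voice are not.
face  = [0.10, 0.42, 0.40, 0.08]
body  = [0.01, 0.95, 0.02, 0.02]
voice = [0.05, 0.70, 0.20, 0.05]

fused = fuse(
    {"face": face, "body": body, "voice": voice},
    {"face": 0.3, "body": 0.5, "voice": 0.2},
)
prediction = EMOTIONS[fused.index(max(fused))]
print(prediction)  # fear
```

With these made-up numbers, the confident body and voice signals pull the final decision to "fear" even though the face alone is nearly a tie, which is exactly the kind of ambiguous case where the paper reports the biggest gains.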