Diagnosis of mental health disorders through the evaluation of facial expression with AI

Researchers in Germany have developed a method for identifying mental disorders based on facial expressions interpreted by computer vision.

The new approach can not only distinguish between affected and unaffected subjects, but can also correctly distinguish depression from schizophrenia, and gauge the degree to which the patient is currently affected by the illness.

The researchers have provided a composite image representing the control group for their tests (on the left in the image below) and the patients suffering from mental disorders (right). The identities of various people are mixed in the depictions, with none of the images depicting a particular individual:

Source: https://arxiv.org/pdf/2208.01369.pdf

People with affective disorders tend to have raised eyebrows, leaden gazes, swollen faces, and hang-dog mouth expressions. To protect patient privacy, these composite images are the only ones available to support the new work.

Until now, facial affect recognition has been used primarily as a potential tool for basic diagnosis. Instead, the new approach offers a possible method of assessing a patient’s progress throughout treatment, or else (potentially, although the paper does not suggest it) in their own home environment for outpatient follow-up.

The paper states*:

‘Going beyond the automatic diagnosis of depression in affective computing, which has been developed in previous studies, we show that the measurable affective state estimated by computer vision contains much more information than the pure categorical classification.’

Researchers have named this technique Optoelectronic encephalography (OEG), a completely passive method of inferring mental status using facial image analysis rather than topical sensors or beam-based medical imaging technologies.

The authors conclude that OEG could be not merely a secondary aid to diagnosis and treatment but, in the long term, a potential replacement for certain evaluative parts of the treatment process, and that it could reduce the time needed for patient follow-up and initial diagnosis. They note:

‘In general, machine-predicted outcomes show better correlations compared to questionnaires based solely on clinical observer rating and are also objective. The relatively short measurement period of a few minutes for computer vision approaches is also noteworthy, while hours are sometimes required for clinical interviews.’

However, the authors wish to emphasize that patient care in this field is a multimodal quest, with many other indicators of the patient’s condition to consider besides their facial expressions, and that it is too early to consider that such a system could completely replace traditional approaches to mental disorders. Nonetheless, they see OEG as a promising complementary technology, particularly as a method of classifying the effects of pharmaceutical treatment on a patient’s prescribed regimen.

The paper is titled The Face of Affective Disorders, and comes from eight researchers from a wide range of institutions in the public and private medical research sector.


(The new paper deals mainly with the various theories and methods that are currently popular in diagnosing patients with mental disorders, with less attention than usual to the actual technologies and processes used in the tests and various experiments.)

Data collection was carried out at the Aachen University Hospital, with 100 gender-balanced patients and a control group of 50 unaffected people. The patients included 35 people suffering from schizophrenia and 65 people suffering from depression.

For the patient portion of the test group, the first measurements were taken at the time of initial hospitalization, and the second before discharge from hospital, spanning an average interval of 12 weeks. Control group participants were recruited arbitrarily from the local population, with their own induction and “discharge” mirroring those of the actual patients.

Indeed, the most important ‘ground truth’ for such an experiment must be the diagnosis obtained by standard and approved methods, and this was the case for the OEG trials.

However, the data collection stage yielded additional data more suitable for machine interpretation: interviews lasting an average of 90 minutes were captured in three phases using a Logitech c270 consumer webcam running at 25 fps.
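As a rough sense of scale, the figures above imply a large number of frames per participant. The following is a back-of-the-envelope sketch only: it assumes the webcam held a constant 25 fps for the full average duration with no dropped frames, and the function name is illustrative rather than anything from the paper.

```python
# Back-of-the-envelope frame count for one recorded session.
# Assumption: a constant 25 fps for the whole (average) 90 minutes, no dropped frames.
FPS = 25
AVERAGE_SESSION_MINUTES = 90


def frames_captured(minutes: float, fps: int = FPS) -> int:
    """Number of video frames recorded in `minutes` at `fps` frames per second."""
    return int(minutes * 60 * fps)


total_frames = frames_captured(AVERAGE_SESSION_MINUTES)
print(total_frames)  # 135000
```

In other words, an average session would yield on the order of 135,000 frames before any filtering.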

The first session comprised a standard Hamilton interview (based on research originating around 1960), as would normally be given at the time of admission. In the second phase, unusually, the patients (and their counterparts in the control group) were shown videos of a series of facial expressions and asked to imitate each of them, while stating their own estimation of their mental condition at that moment, including emotional state and intensity. This phase lasted about ten minutes.

In the third and final phase, participants were shown 96 videos of actors, each lasting just over ten seconds, apparently recounting intense emotional experiences. Participants were then asked to rate the emotion and intensity depicted in the videos, as well as their own corresponding feelings. This phase lasted about 15 minutes.


To arrive at the mean average of the captured faces (see the first image, above), emotional landmarks were captured with the EmoNet framework. Subsequently, the correspondence between the shape of each face and the shape of the mean (averaged) face was determined through piecewise affine transformation.
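The averaging and per-part affine alignment described above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the landmark coordinates are invented toy values, the function names are hypothetical, and only a single triangular 'part' is shown, whereas a real face would be split into many triangles over its landmarks.

```python
import numpy as np


def mean_shape(shapes: np.ndarray) -> np.ndarray:
    """Average landmark positions over faces; `shapes` is (n_faces, n_points, 2)."""
    return shapes.mean(axis=0)


def triangle_affine(src_tri: np.ndarray, dst_tri: np.ndarray) -> np.ndarray:
    """Return the 2x3 affine matrix mapping the 3 vertices of src_tri onto dst_tri."""
    # Homogeneous source coordinates: each row is (x, y, 1).
    src_h = np.hstack([src_tri, np.ones((3, 1))])
    # Solve src_h @ X = dst_tri for the 3x2 matrix X, then transpose to 2x3.
    return np.linalg.solve(src_h, dst_tri).T


def apply_affine(matrix: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    return points @ matrix[:, :2].T + matrix[:, 2]


# Toy landmark sets for two faces (hypothetical coordinates, three landmarks each).
shapes = np.array([
    [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]],
    [[2.0, 2.0], [8.0, 2.0], [2.0, 6.0]],
])
target = mean_shape(shapes)                 # the averaged shape
warp = triangle_affine(shapes[0], target)   # one affine map for this triangle
warped = apply_affine(warp, shapes[0])      # first face's landmarks moved onto the mean
```

Each triangle gets its own affine map, which is what makes the overall warp 'piecewise': pixels inside a triangle follow that triangle's transformation.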

Dimensional emotion recognition and gaze prediction were performed on each landmark segment identified in the previous stage.

At this point, audio-based emotion inference has indicated that a teachable moment has arrived in the patient’s mental state, and the task is to capture the corresponding facial image and develop that dimension and domain of their affective state.

(In the video above, we see the authors’ work on the dimensional emotion recognition technologies used by the researchers for the new work.)

The geodesic shape of the material was calculated for each frame of the data, and Singular Value Decomposition (SVD) reduction applied. The resulting time series data was finally modeled as a VAR (vector autoregressive) process, and then reduced further via SVD before MAP adaptation.
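To make the idea of reducing per-frame features with SVD and then modeling the result as a time series more concrete, here is a minimal sketch on synthetic data. It assumes a first-order vector autoregression fitted by ordinary least squares; the paper's model order, feature definitions, and MAP adaptation step are not reproduced, and all names and numbers below are illustrative.

```python
import numpy as np


def svd_reduce(frames: np.ndarray, k: int) -> np.ndarray:
    """Project centred per-frame features (n_frames, n_features) onto the top-k right singular vectors."""
    centred = frames - frames.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


def fit_var1(series: np.ndarray) -> np.ndarray:
    """Least-squares estimate of A in x[t] = A @ x[t-1] + noise, for a (n_steps, dim) series."""
    past, present = series[:-1], series[1:]
    # lstsq solves past @ A.T ~= present in the least-squares sense.
    a_transposed, *_ = np.linalg.lstsq(past, present, rcond=None)
    return a_transposed.T


# Synthetic stand-in for per-frame data: a 2-D VAR(1) process with a known matrix.
rng = np.random.default_rng(0)
true_a = np.array([[0.8, 0.1], [-0.2, 0.5]])
latent = np.zeros((2000, 2))
for t in range(1, len(latent)):
    latent[t] = true_a @ latent[t - 1] + rng.normal(scale=0.1, size=2)

estimated_a = fit_var1(latent)          # should be close to true_a

# Embed the 2-D process in 6 observed features, then recover a 2-D series via SVD.
mixing = rng.normal(size=(2, 6))
frames = latent @ mixing
reduced = svd_reduce(frames, k=2)       # shape (2000, 2)
reduced_a = fit_var1(reduced)           # VAR(1) dynamics in the reduced coordinates
```

The reduced series lives in a rotated coordinate system, so `reduced_a` will generally differ from `true_a` entry by entry even though it describes the same dynamics.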

Workflow for the geodesic reduction process.

The valence and arousal values from the EmoNet network were also processed in a similar way, with VAR modeling and sequence kernel computation.


As explained above, the new work is primarily a medical research paper rather than a standard computer vision presentation, and we refer the reader to the paper itself for detailed coverage of the various OEG experiments performed by the researchers.

However, to summarize a selection of them:

Signs of affective disorder

Here 40 participants (not from the control or patient group) were asked to rate the assessed mean faces (see above) with respect to a series of questions, without being informed of the context of the data. The questions were:

What is the gender of the two faces?
Do the faces have an attractive appearance?
Are these people trustworthy?
How do you assess the capacity for action of these people?
What is the emotion of the two faces?
What is the appearance of the skin of the two faces?
What is the impression of the gaze?
Do both faces have drooping corners of the mouth?
Do both faces have raised eyebrows?
Are these people clinical patients?

The researchers found that these blind evaluations correlated with the recorded state of the processed data:

Box plot results for the ‘mean face’ survey.

Clinical Evaluation

To measure the usefulness of OEG in the initial evaluation, the researchers first evaluated how effective the standard clinical evaluation is on its own, measuring the levels of improvement between induction and the second phase (at which point the patient is usually receiving medication-based treatment).

The researchers concluded that the status and severity of symptoms could be well assessed with this method, achieving a correlation of 0.82. However, an accurate diagnosis of schizophrenia or depression proved more challenging, as the standard method only scored -0.03 at this early stage.

The authors comment:

‘In essence, the patient’s status can be determined relatively well using the usual questionnaires. However, that is essentially all that can be concluded from it. It is not indicated if someone is depressed or rather schizophrenic. The same applies to response to treatment.’

The results of the machine process were able to obtain higher scores in this problem area and comparable scores for the initial patient evaluation aspect:

Higher numbers are better. On the left, interview-based standard assessment accuracy results across four phases of the test architecture; on the right, machine-based results.

Diagnosis of the disorder

Distinguishing depression from schizophrenia through static facial images is not a trivial matter. With cross-validation, the machine process was able to obtain highly accurate scores in the various phases of the tests.
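For readers unfamiliar with cross-validation, the sketch below shows the general idea: the data is split into folds, and each fold is held out in turn while a classifier is fitted on the rest. Everything here is an assumption for illustration. The features are synthetic random numbers, the 65/35 split merely echoes the group sizes reported above, and the nearest-centroid classifier is a deliberately simple stand-in, not the method used in the paper.

```python
import numpy as np


def k_fold_indices(n_samples: int, k: int, rng: np.random.Generator):
    """Yield (train_idx, test_idx) pairs for k shuffled, non-overlapping folds."""
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row the label of the closest class mean."""
    labels = np.unique(train_y)
    centroids = np.stack([train_x[train_y == label].mean(axis=0) for label in labels])
    distances = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return labels[distances.argmin(axis=1)]


def cross_validated_accuracy(x, y, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    rng = np.random.default_rng(seed)
    scores = []
    for train, test in k_fold_indices(len(y), k, rng):
        predicted = nearest_centroid_predict(x[train], y[train], x[test])
        scores.append(float((predicted == y[test]).mean()))
    return float(np.mean(scores))


# Synthetic, well-separated feature vectors for two hypothetical groups.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=1.0, size=(65, 4))
group_b = rng.normal(loc=4.0, scale=1.0, size=(35, 4))
features = np.vstack([group_a, group_b])
labels = np.array([0] * 65 + [1] * 35)

accuracy = cross_validated_accuracy(features, labels)
```

Because every sample is scored only by a model that never saw it during fitting, the averaged accuracy is a fairer estimate of performance on new subjects than accuracy measured on the training data.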

In other experiments, the researchers were able to demonstrate evidence that OEG can perceive patient improvement through pharmacological treatment and general treatment of the disorder:

‘Causal inference on empirical background knowledge from data collection adjusted pharmacological treatment to observe a return to physiological regulation of facial dynamics. Such a return could not be observed during the clinical prescription.

‘At the moment, it is unclear whether such a machine-based recommendation would actually result in significantly greater success of therapy. Especially since it is known what side effects drugs can have over a long period of time.

‘Nevertheless, [these kinds] of patient-tailored approaches would break down the barriers of the common categorical classification scheme that is still predominantly used in daily life.’

* My conversion of the authors’ inline citations to hyperlinks.

First published on August 3, 2022.
