With just a short signal from the eye, the human brain can instantaneously pick a face out of a crowd, then recognize in that face any number of facial expressions--from a smile to a frown to a yawn.
But getting a computer to distinguish a human face from a basketball, a balloon or a kewpie doll has long stymied computer scientists.
Now, though, advances by psychologists and computer scientists have allowed a pair of competing research teams to design software that can not only recognize a human face, but can also precisely read its expression, detecting subtle changes in facial features.
If their designs are successful, a variety of researchers and clinicians could apply the tool in far-reaching ways. Researchers of the human condition could use it to help infer people's emotions. Clinicians could screen for possible psychopathologies. And movie-makers could easily program life-like cartoon expressions.
"If successful, these computer systems will provide automatic measurement of the face," says University of California, San Francisco, facial expression expert Paul Ekman, PhD, who has collaborated with both teams. "They'll be capable of doing everything from helping with basic research to monitoring air-traffic controllers for fatigue to market research on people's responses to advertising. Eventually they may even lead to interactive personal computers that can evaluate your mood."
Neither of the two computer programs--one being developed through the University of Pittsburgh and Carnegie Mellon University, the other by a team at the Salk Institute in La Jolla, Calif.--would be possible without the elaborate facial action coding system known as FACS. Developed by Ekman and Wallace Friesen, PhD, of the Langley Porter Neuropsychiatric Institute in San Francisco, the FACS system allows expert coders to manually measure facial expressions by breaking them down into component movements of individual facial muscles.
And though the two teams of psychologists and computer scientists are using different strategies to develop their software, both are essentially teaching computers to recognize facial muscle movements the same way FACS coders do, but far more quickly. While it takes an expert about an hour to code one minute of videotape, it takes the computers just five minutes and, eventually, they'll be able to code faces in real time. So far, both programs have proven as accurate as human experts in recognizing a subset of standard facial actions.
To determine which version of software works best, the two programs will go head-to-head this winter in a contest that several federal agencies have expressed interest in funding. Observers say it is likely that a combination of both programs will ultimately be part of a computerized facial expression reader. And that device could be available in just a couple of years, says Terrence Sejnowski, PhD, head of the Salk team and director of the institute's computational neurobiology laboratory.
Just the FACS
To understand the software, one needs to first understand the FACS system, which Ekman and Friesen developed in 1976 as an objective technique for measuring facial movements.
When people make faces--whether spontaneous expressions or deliberate contortions--they engage muscles around the eyes, mouth, nose and forehead. With FACS, Ekman and Friesen detailed which muscles move during which facial expressions.
For example, during a spontaneous smile, the corners of the mouth lift up through movement of a muscle called zygomaticus major, and the eyes crinkle, causing "crow's feet," through contraction of the orbicularis oculi muscle. FACS assigns each muscle movement an "action unit" number, so a smile is described as AU12--representing an uplifted mouth--plus AU6--representing crinkled eyes.
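To make the coding idea concrete, here is a minimal Python sketch of how an action-unit code might be represented and expanded. It is only an illustration, not either team's software: the table holds just the two action units named above, and the names `ACTION_UNITS`, `parse_code` and `describe` are hypothetical.

```python
# Hypothetical lookup table. Only the two action units named in the
# article are included; the full FACS defines many more.
ACTION_UNITS = {
    6: "crinkled eyes (orbicularis oculi)",
    12: "uplifted mouth corners (zygomaticus major)",
}


def parse_code(code):
    """Turn a string such as 'AU12+AU6' into a sorted list of AU numbers."""
    units = []
    for part in code.split("+"):
        part = part.strip().upper()
        if not part.startswith("AU"):
            raise ValueError(f"not an action unit: {part!r}")
        units.append(int(part[2:]))
    return sorted(units)


def describe(code):
    """List the muscle movements behind a code, without naming any emotion."""
    return [f"AU{n}: {ACTION_UNITS[n]}" for n in parse_code(code)]


if __name__ == "__main__":
    for line in describe("AU12+AU6"):
        print(line)
```

Note that the sketch returns descriptions of movement only. Attaching a label such as "joy" to a combination would be a separate interpretive step.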
In all, Ekman and Friesen identified 46 distinct action units. In a theory still being debated by psychologists (see article, page 44), the two propose that specific combinations of action units represent prototypic expressions of emotion--such as joy, anger and fear. However, emotion labels, per se, are not a part of FACS. Instead, it is a purely descriptive system, providing a code for muscle movement alone.
"FACS is still a subjective method," says psychologist Jeffrey Cohn, PhD, a main collaborator on the Pittsburgh team. "But it's rigorously based on description of facial motion. Therefore, it provides a ground truth that can be used in expression recognition."
The goal of the computer programs, says Cohn, is to recognize specific action units, so that even when a face is performing multiple actions, the computer can tease them apart. It's also critical to design a system that detects expressions on any face, he adds, regardless of a person's gender, race or age. For him and his colleagues, that means "training" the computer with a database of videotaped facial expressions from more than 200 people of different racial and ethnic groups.
So far, their program has correctly identified 15 action units and action-unit combinations (see Psychophysiology, Vol. 36, p. 35-43 and Journal of Robotics and Autonomous Systems, in press).
The Salk group has had equally good success using images of 12 individual action units, many of which are difficult to discriminate from each other (see Psychophysiology, Vol. 36, p. 35-43 and IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, p. 974-989). And they're in the process of analyzing more complex expressions provided to them by Cohn and his colleagues.
"The hardest and most time-consuming part of all this work is collecting a database of images that is diverse enough and big enough to train the computer," says Sejnowski.
The ultimate 'face' off
Even though the two teams are competing, they have begun pooling resources to speed up progress. Meanwhile, in the planned competition, each program will attempt to analyze the expressions from the same set of videos provided by Ekman of the University of California, San Francisco.
They hope to have the results next fall.
"It's a healthy competition," says Cohn. "Our goal is a way of improving the methodology rather than coming out with a decision about whether one or the other technique is the way to go. We can learn a lot about our methods by working on a common data set."