VOLUME 29, NUMBER 8 - August 1998

The latest techno tool: essay-grading computers

Forget spell and grammar check. New psychologist-designed software can grade the whole essay.

By Bridget Murray
Monitor staff

Last fall, Peter Foltz, PhD, assigned his undergraduates an essay on word recognition. But Foltz and his teaching assistants didn't grade the bulk of the essays.

Instead, students in his psycholinguistics class at New Mexico State University opted to let a computer do the grading. They simply submitted their essays to a web site. Less than 30 seconds later, the computer, aided by software Foltz helped to develop, popped back a grade and feedback.

Perhaps students viewed the computer grader as less fallible than a professor, Foltz theorizes. Most likely, though, they relished the computer's offer to let them revise their essays for a better grade, he says. "[The software] was useful because it pointed out what you missed, giving you several chances to develop your essay," says senior psychology major Monica Talachy, a student who took Foltz's class. And instead of taking several days to grade the paper, it yielded immediate feedback, says Karl Bean, another senior who took the class.

"Right away you could correct your mistakes, add in missing items and submit the essay again," says Bean.

Known as the "Intelligent Essay Assessor," the software judges the thoroughness of an essay's content by examining the meaning of the information it contains. The strategy is based on a form of artificial intelligence called "latent semantic analysis," an approach originated by psychologist Thomas Landauer, PhD, of the University of Colorado (UC) at Boulder. Foltz and Darrell Laham, a UC psychology doctoral student, helped Landauer develop the approach.

"The software looks for semantic similarities, which are associations between words and concepts," says Foltz. "If the concept is 'the doctor operated on the patient' and the student writes 'the surgeon wielded a scalpel,' the program would find them semantically similar."

The software grades consistently, whereas professors can grow weary or make mistakes, say its developers. It can serve as tutor and tester, they say. Besides helping students practice writing and improve their essays, it enables essay grading in large-scale testing, they argue: in introductory college classes, for example, or in standardized testing for entrance to professional schools.

"It's ideal for essay responses to factual questions," says Landauer, who claims the essay assessor is a stronger measure of expression and knowledge retrieval than multiple choice.

"Everyone thinks it's important for students to express themselves in words, and this software may allow us to test for that instead of using multiple choice," he says.

Many educators oppose computerized assessment of writing, however. Some doubt a computer can judge an argument's cogency or grasp linguistic nuances the same way people can. Others worry that it stifles spontaneity and creativity, encouraging regurgitation of facts at a moment when education seeks to shed "drill-and-grill" approaches.

Computer in the grader's seat

But, the developers argue, their software mimics human judgment. While in the past computerized writing assessors merely checked for word length and mechanics, this assessor checks for topic- and question-specific learning, says Landauer. To grade a set of essays, the software must itself "learn" about the topic by reading between 50,000 and 10 million words from online texts.

The software also uses one or more "gold-standard essays," written by professors or other experts on the topic, or other student essays already graded by the instructor, as the guidelines for its grading, says Foltz. In addition, he says, grades awarded by the software and human graders "are surprisingly well correlated." In a recent study now in press, Landauer found high levels of agreement on the way he, his teaching assistants and the computer graded 500 introductory psychology students' essays. Landauer is so impressed with the software's reliability, he uses it to ensure that he and his teaching assistants grade students' papers consistently.

Does it stifle creativity?

Dazzled as they are by Intelligent Essay Assessor's unswerving judgments, some educators and students worry that it hobbles creativity and overlooks writing quality. Foltz's former students Karl Bean and Monica Talachy say it worked for focused questions, but they doubt that it's fit for English class.

"Are you going to make your essay any more readable or entertaining if a computer is the only audience? Probably not," says Bean.

Certainly, the Intelligent Essay Assessor has its limits, say officials at the Educational Testing Service (ETS). But they are finalizing new software that they claim takes computerized essay grading to the next level: judging the caliber of an essay's structure and arguments as well as its inclusion of key concepts.

"We look at argument structure and syntax in addition to semantics," says psychologist Larry Frase, PhD, executive director of the division of cognitive and instructional science at ETS. "We've taken advice out of books on good writing (varying sentence length, writing for clarity and readability) and put it into the software."

The new ETS software, known as "e-rater," looks for cues signifying the path of a writer's argument, says Karen Kukich, a senior research scientist and head of the natural language processing group at ETS. It awards writers points for phrases such as "first, second and third," "by comparison," and "for example," all phrases indicating transition or essay development, she says.
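
The cue-phrase idea Kukich describes can be sketched in a few lines of Python. The article does not say how e-rater detects or weights these phrases, so the counting rule and the function name below are hypothetical; only the phrases themselves come from her examples.

```python
import re

# Transition phrases quoted in the article; the list is illustrative, not e-rater's.
CUE_PHRASES = ["first", "second", "third", "by comparison", "for example"]


def count_cue_phrases(essay):
    """Count whole-phrase, case-insensitive occurrences of each cue phrase."""
    text = essay.lower()
    return {
        phrase: len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        for phrase in CUE_PHRASES
    }


essay = (
    "First, the theory explains recall. Second, it predicts errors. "
    "For example, priming speeds recognition."
)
counts = count_cue_phrases(essay)
print(counts)
print(sum(counts.values()))
```

The word-boundary match keeps a word such as "firsthand" from counting as "first." How such counts would feed into a score is left open here, since the article gives no formula.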

E-rater's agreement with human graders is "exceptionally high," says Kukich, between 87 percent and 94 percent, and ETS may eventually use it to help grade essays for the Graduate Management Admission Test, Graduate Record Examination and Test of English as a Foreign Language. However, the idea of using the technology in standardized testing raises red flags for many educators. One of them is Samuel Cameron, PhD, of Beaver College, chief faculty consultant for the psychology Advanced Placement (AP) exams. He balks at using software to grade AP essays. Human graders use a rubric, or scoring criteria, as a useful guideline for grading AP essays, but a computer would struggle to think outside that guideline, Cameron argues.

"A computer would be very literal in its interpretations, unable to make inferences," he says. "There are all sorts of ambiguities in writing due to students' varying use of grammar and different words and expressions. A person could get through the nuances, but a computer probably couldn't."

Cameron also worries that a computer would overlook the value of innovative ideas.

"The program might work in 50 percent to 60 percent of cases, but there will always be those creative answers it won't get," he says. "In psychology there are often multiple ways to get a satisfactory answer."

Cameron also thinks essay-grading software would escalate students' complaints about the way that their essays are graded. Students would argue that the computer missed their point or misunderstood their meaning, he predicts. Also raising concerns is psychologist Neil Lutsky, PhD, president-elect of APA's Div. 2 (Society for the Teaching of Psychology). Lutsky hesitates to criticize software he hasn't used, but he says it calls into question the very function of student essays.

"If the idea of essay writing is to stimulate more active, higher-order thinking rather than direct reflection of the material just learned, we need to question whether this technology is helping us do that," says Lutsky.

Not to worry

The software developers acknowledge educators' concerns about their essay assessors' effects on creativity and reflective thinking. But, they say, their software isn't meant to judge creative or sophisticated writing. Rather, it is geared for expository essays on factual topics: papers describing how a psychologist's theory works, for example, or relating the structure of the human heart, says Landauer.

"It's meant for topics about which the chances of saying something new and creative are small," says Landauer. "We're not pretending you can use this to score essays that are supposed to be creative about a new topic."

Also, Foltz argues, the program does allow for some creativity. It identifies essays that are unlike essays it has graded previously, leaving them to a human grader to check. The program isn't meant to dole out grades to students, but to provide information that helps the instructor assign a grade, Landauer says.
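
The routing Foltz describes, in which unfamiliar essays go to a human grader, can be sketched in Python under loud assumptions. The sketch compares essays by plain word overlap rather than by latent semantic analysis, and the 0.3 threshold, the sample essays and the function names are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs in a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count bags; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_grade(essay, graded_essays, threshold=0.3):
    """Return (grade, similarity) of the closest graded essay.

    The grade is None when nothing is similar enough, signalling that the
    essay should be left for a human grader. The threshold is an assumption.
    """
    essay_bag = bag_of_words(essay)
    best_grade, best_sim = None, 0.0
    for text, grade in graded_essays:
        sim = cosine(essay_bag, bag_of_words(text))
        if sim > best_sim:
            best_grade, best_sim = grade, sim
    if best_sim < threshold:
        return None, best_sim
    return best_grade, best_sim


# Hypothetical essays already graded by an instructor.
graded = [
    ("word recognition depends on context and word frequency", "A"),
    ("reading is hard", "C"),
]

familiar = suggest_grade("word recognition depends on word frequency", graded)
unusual = suggest_grade("my summer vacation was fun", graded)
print(familiar)
print(unusual)
```

Returning a suggestion rather than a final grade matches Landauer's point that the program is meant to inform the instructor, not to replace the instructor's judgment.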

Psychology professor Diane Halpern, PhD, president of Div. 2, agrees that the software could be a useful tool. She views it as a way to encourage writing at the undergraduate level.

"It could be a valuable assessment technique for short-answer essay questions in large introductory courses," says Halpern, chair of the psychology department at California State University, San Bernardino. "By reducing the grading load on professors, it could free us up to ask and grade more open-ended questions."





© PsycNET 2008 American Psychological Association