Taking a look at the problems with SPOTs

End-of-the-semester course evaluations are supposed to reflect a teacher’s abilities, but often just report students’ personal biases

Courtesy of Pixabay

Joe Pye, Senior Writer
February 2, 2018

Teaching evaluations should give solid feedback from students to department heads on whether an instructor is worth continued employment — but that’s not always what happens.

From low response rates to gender, ethnicity, and age biases, teaching evaluations like FAU’s Student Perception of Teaching survey have a long way to go before they’re a concrete measurement of a teacher’s abilities.

Flawed from the start

On average, just 58 percent of FAU’s 30,000 students take the Student Perception of Teaching evaluations, said James Capp, assistant provost for Academic Operations and Planning. This is down from the 66 percent response rate previously seen with the original paper format. (See SPOTs FAQs for a brief history on the teaching evaluation.)

Some researchers, like Richard Freishtat, director of the University of California Berkeley Center for Teaching and Learning, argue that any response rate under 66 percent is “utterly useless.” He added that any evaluation with a response rate under 100 percent should be interpreted with caution.

Teaching evaluations researcher Philip B. Stark, a UC Berkeley statistics professor, agrees with Freishtat, saying the current response rate is an “imperfect census.”

FAU even admits the SPOT survey has limitations, but still views it as a valuable tool.

“Despite the limitations of the SPOT forms, the department acknowledges the validity of student input as one part of a holistic approach to the evaluation of teaching,” according to FAU’s annual evaluation criteria.

Yet tenure and promotion committees at FAU consider the responses to SPOT’s sixth question, “Rate your instructor’s overall teaching effectiveness in this course,” as part of a faculty member’s review.

A SPOT evaluation is what is known as a Likert scale survey — where a student is asked to select how much they agree or disagree with a statement.

Freishtat feels FAU, along with other universities, uses this system because it’s easy to administer. It’s a quick way to get a number to use as a mean score that promotion committees can then use to compare faculty members in the department.
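
To make the idea of a mean score concrete, here is a minimal sketch of how such a number, and the response rate behind it, could be computed for a single course. The 1-to-5 scale, the ratings, and the class size are all made up for illustration; they are not FAU data and may not match how FAU actually calculates its scores.

```python
from statistics import mean

# Hypothetical ratings on an assumed 1-5 agreement scale,
# one entry per student who actually filled out the survey
ratings = [5, 4, 4, 2, 5, 3, 1, 4]
enrolled = 20  # hypothetical class size

# The single number a committee might use to compare instructors
mean_score = mean(ratings)

# Share of enrolled students whose opinions are reflected in that number
response_rate = len(ratings) / enrolled

print(f"Mean score: {mean_score:.2f}")
print(f"Response rate: {response_rate:.0%}")
```

In this made-up case the mean of 3.50 rests on the answers of 40 percent of the class, which is the kind of gap the researchers warn about.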

More often than not, he believes universities rely on these evaluations too much. And sometimes, they’re the only source that is taken into consideration at all when judging a teacher’s performance.

SPOTs’ current response rate, on average, is 58 percent. Teaching evaluations researcher Richard Freishtat said any response rate under 66 percent is “utterly useless.” For the following semesters, the percentage listed is the share of courses that had a response rate over 66 percent. Because fewer than a third of courses cleared that bar in any semester, Freishtat would consider the majority of 2016-17’s SPOT evaluations “useless.”

The average of these four semesters is 25 percent.

Summer 2016: 22 percent
Fall 2016: 27 percent
Spring 2017: 30 percent
Summer 2017: 21 percent
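
The 25 percent figure is a simple, unweighted average of the four semester percentages listed above. A quick check in Python, using only those numbers (the variable names are illustrative):

```python
# Share of courses with a response rate over 66 percent, per semester
share_above_66 = {
    "Summer 2016": 22,
    "Fall 2016": 27,
    "Spring 2017": 30,
    "Summer 2017": 21,
}

# Unweighted mean: each semester counts equally,
# regardless of how many courses were offered in it
average = sum(share_above_66.values()) / len(share_above_66)

print(f"Average: {average:.0f} percent")
```

Because each semester counts equally here, the result could differ somewhat from an average weighted by the number of courses offered each term; the course counts needed for that are not given in this story.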

Teaching evaluations researcher Philip B. Stark. Illustration by Ivan Benavides

Are teaching evaluations biased?

Research suggests yes.

Both Freishtat and Stark agree that end-of-the-semester teaching evaluations don’t evaluate teaching at all. Instead, they measure gender, ethnic, and student satisfaction biases.

Students often write in the comment sections about an instructor’s accent and ethnicity as reasons for their poor scores, according to the two researchers. Meanwhile, women are criticized over their looks and personalities, as well as their gender.

A 2014 study published in the journal “Innovative Higher Education” showed that sexism is prevalent throughout teaching evaluations.

The study involved an online class that was split into four groups, with a male and a female professor each teaching two groups. The professors then switched their online identities, with the male instructor assuming the identity of the female instructor and vice versa. The female professor, who was posing as a male teacher, received higher marks, while the male professor, who was posing as a female teacher, received lower ones.

Students also used stereotypical gendered terms to describe both professors in their comments. They described the instructor they believed to be female as “motherly,” and the one they believed to be male as “brilliant” or “funny.”

This sexism can carry over to SPOTs and ultimately affect the future of teachers at FAU. If female teachers are unfairly critiqued because of their gender, that could influence a department review committee’s decision on whether to promote them.

On top of this, students’ responses often contradict each other: what one student perceives as a flaw in the instructor’s performance, another sees as a strength.

And because these surveys are anonymous, students tend to lose any filter they’d have if their names were attached, sometimes leading to openly prejudiced comments.

Faculty are torn on how to handle these evaluations.

UC Berkeley has been hesitant to make changes with its Student Evaluations of Teaching. Currently only half of the campus has switched to an online version, Freishtat says.

When the school’s Faculty Senate voted on the change, many were doubtful.

At FAU, by contrast, the majority of the Faculty Senate was in favor of the change despite the low response rate, with only a small minority concerned, according to former Faculty Union President Robert Zoeller. The Senate meets to discuss general educational policy four times every semester.

Former Faculty Union President Robert Zoeller. Illustration by Ivan Benavides

Zoeller said that he raised questions when the Senate voted to move SPOTs online in 2014.

“I remember saying to the Senate, ‘Just cause it’s electronic, doesn’t make it better.’ Ya’ know, it’s sexier and fancier, but is it really better?” Zoeller asked.

When they were administered on paper, SPOT evaluations had undergone a process called “validation.” This is where research is conducted to determine if a survey does what it says it does. But when the survey was shortened, altered, and moved online, it did not undergo this process.

Zoeller again questioned the Senate in 2015 when it voted in favor of decreasing SPOTs’ number of questions from 21 to six. He feels that the survey is no longer a valid tool.

“They seemed to be in a hurry by my impression,” Zoeller said. “It’s kind of important, why are we not making sure that we don’t have a valid instrument? A number of people agreed, but when it came to the vote it went through overwhelmingly.”

Freishtat has been studying student evaluations of teachers at UC Berkeley for five years now, and he said he doesn’t believe there’s a way to quantitatively measure whether an instructor is an effective teacher.

The researcher said that students’ perceptions of teachers are just one small piece of the puzzle that would make up a sufficient tool to measure an instructor’s teaching effectiveness.

“We have no single objective quantifiable measure to say, ‘This student learned this much, in this class, because of this teacher,’” he said. “That doesn’t exist, nor do I know that it can exist.”

The largest class to have a 100 percent response rate had only 29 students. The fall and spring semesters have courses with up to 568 students.

Summer 2016: 1.9 percent of undergraduate classes had a 100 percent student response rate.

Fall 2016: 0.98 percent of undergraduate classes had a 100 percent student response rate.

Spring 2017: 1.4 percent of undergraduate classes had a 100 percent student response rate.

Summer 2017: 1.2 percent of undergraduate classes had a 100 percent student response rate.

Joe Pye is the SPOTs special issue writer for the University Press. For information regarding this or other stories, email [email protected] or tweet him @jpeg3189.
