student evaluations

What we now know about student evaluations is much more depressing than you thought

Many in the education sector believe that Student Evaluations of Teaching (SETs) exhibit a gender bias. There have been decades of research into whether this is the case; however, the results are often inconclusive. A recent large study at UNSW, drawing on over 500,000 survey results across seven years, found evidence of bias against teachers identifying as women and those from non-English-speaking backgrounds, but other studies have been inconclusive or have found no gender bias at all.

This disagreement can sometimes be explained by looking more closely at the research design of each study. For example, qualitative studies that look beyond the scores achieved by teachers tend to suggest that gender may lead students to reward different kinds of behaviour in male- and female-identified teachers, including in my own field, political science.

Given this disagreement, our team of researchers decided to undertake a new study focussed on the comments that students wrote in their evaluations. In our paper, 'Gendered mundanities: gender bias in student evaluations of teaching in political science', we looked at all the evaluations in the School of Political Science and International Studies at the University of Queensland from 2015 to 2018, and examined students' answers to the two standard qualitative questions in the SETs: 'What aspects of this teacher's approach best helped your learning?' and 'What would you have liked this teacher to have done differently?'

The University has an internal procedure for removing egregiously offensive comments from the surveys before they are passed on to staff. This matters, since evidence from other Australian universities shows that some allow such comments to reach staff, with a significant negative impact on staff wellbeing and safety at work. In the dataset we worked with, only 0.15% of comments had been redacted, a very small proportion.

It is important to note that the numerical scores in these evaluations were high. They also showed no evidence of gender bias, insofar as there was no statistically significant difference between the scores achieved by male- and female-identified teachers. This enabled us to ask whether the exact same set of data, which showed male- and female-identified teachers achieving similarly high teaching scores, might produce a different result under a qualitative research design. We undertook a qualitative content analysis of students' answers to the two open-ended questions.

Our first finding was that male- and female-identified students evaluated female-identified teachers in similar ways, but evaluated male-identified teachers in different ways. This implies that gender is doing some work, because otherwise the results would be similar for both groups of teachers. So we needed to look further to find out what kind of work it was doing.

We looked more closely at the comments about female-identified teachers (who had achieved high numerical scores on the evaluations), and found that the terms most prominently associated with them were: approachable, questions, discussion, helpful, encouraged, input, time, friendly, ideas and feedback. Male- and female-identified students evaluated female-identified teachers consistently. This led to our second finding: when students comment on what they find most helpful about the teaching they receive, the traits most rewarded in female-identified teachers are those related to stereotypically gendered expectations of women. Female-identified teachers were described as helping students' learning when they were approachable, encouraged questions and discussion, allowed for student input, gave their time, were friendly, and gave more feedback outside class time. These activities are time-consuming and emotionally burdensome.

We also looked more closely at the comments about male-identified teachers (who had also achieved high numerical scores on the evaluations), and found greater variability in how students evaluated them. Male-identified students evaluated male-identified teachers with a focus on the terms knowledge, knowledgeable, inspiring, excellent, theoretical, passionate and best. Female-identified students evaluated male-identified teachers with a focus on funny, knows and fun. Both male- and female-identified students evaluated male-identified teachers based on their enthusiasm, passion and teaching style. This led to our third finding: the traits most commonly associated with male-identified teachers are likely to be related to stereotypically masculine expectations. These are traits such as being knowledgeable, theoretical, engaging and passionate. Notably, exhibiting these traits is unlikely to require time beyond normal teaching preparation, or to constitute additional, burdensome emotional labour.

Overall, our study showed that analysis of students' comments can, and does, reveal a gender bias that may be invisible when one focusses solely on the scores achieved. We showed that the ways in which gender bias presents itself can be mundane. We termed these 'gendered mundanities': harmful expectations of gendered behaviour that are invisible because of their everyday nature. The patterns we identified constituted regular reminders of what behaviour is required of male-identified and female-identified teachers if they are to be seen by students as good at their teaching role.

This means that SETs may be rewarding female and male staff for behaviours that conform to gender stereotypes. It may also mean that female and male staff are rewarded for behaviours that differ in how much time and energy they leave for other activities, including, of course, research.

SETs clearly do not measure only the quality of teaching performance: they interact in gendered ways with students' expectations of their male and female teachers. Universities still need to evaluate teaching performance, but they need to find a range of ways to do so, and to be attentive to the gendered mundanities of students' expectations of their teachers when they do.

Katharine Gelber is a Professor in the School of Political Science and International Studies at the University of Queensland, a Fellow of the Academy of Social Sciences Australia, and a former ARC Future Fellow (2012-2015). 

Students love to complain about women and people of colour – their teachers. Here’s what happens next.


Any minute now, your university students will get an email with a link. That link leads to one of the most dire tools for assessing performance in universities: evaluations of course content and teaching quality.

These evaluations are meant to provide feedback to enhance course design and teaching methods. However, research spanning several decades has shown that, regardless of the questions asked, the factors influencing students' responses have little to do with either the course or the quality of teaching.

They are instead shaped by student demographics, prejudice towards the teaching academic, and biases shaped by the classroom and university setting.

Despite the clear flaws in the data that student evaluations collect, universities continue to use this data as a measure of an academic's teaching performance. Evaluation results influence whether contract and sessional staff are hired on a continuing basis, whether existing staff are promoted, and whether academics are fired or managed out during restructures.

This is a flawed method of evaluating people, and it raises the question of why the sector continues to use student evaluations. The negative impact is complicated further by the fact that evaluations affect different groups of academics to different degrees. The groups most affected are the very groups the academy says it values, hopes to protect, and whose careers it claims to want to foster.

I recently completed a study in which I reviewed the findings of existing research on student evaluations of courses and teaching. The paper, 'Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching', found that across studies covering more than 1,000,000 student evaluations, women are clearly at a disadvantage compared with men.

Studies suggest the disadvantage varies in size and is highly dependent on disciplinary area, student demographics and other factors, but across the board, women are judged more harshly than men. At the extreme, this means women are more likely than men to fail evaluations, and researchers have routinely cited examples of more capable and higher-performing women receiving lower scores than their less capable male counterparts. These results predictably mean that women fare worse in job applications and promotions, and they have been cited as a reason why women are underrepresented in the professoriate and hold fewer leadership positions.

The same is true of factors such as race, gender, sexual identity, disability, language and other marginalising characteristics. Studies in different locations across more than two decades of solid research consistently find that academics who are not white, English-speaking men aged roughly 35 to 50, whom students perceive as able-bodied and heterosexual, receive lower evaluation results in some form. These negative effects are also cumulative: a woman will receive lower results, as will a person with a visible disability, so a woman with a visible disability is likely to be judged especially harshly in the evaluations of her course and teaching.

What also cannot be ignored is that, because most existing data comes from large-scale quantitative surveys, researchers have repeatedly noted that academics who are disabled, identify as LGBTIQA+, are people of colour, are refugees or immigrants, or belong to other marginalised groups are so underrepresented in the higher education sector that their numbers are too small to form a valid sample.

At the broadest level, multiple studies show that evaluation results can be affected by disciplinary area and assessment type. Several studies have shown that academics in the sciences and related fields receive lower evaluation results than their counterparts in the humanities and social sciences. Similarly, academics whose courses use essays and presentations for assessment have been found to fare better than those who rely on exams.

Institutional factors that have nothing to do with the class, or with the academic teaching it, have also been cited as reasons academics receive lower evaluation scores. Lower results can be given because of class scheduling, class location, classroom design, classroom cleanliness, library facilities and even the food options available on campus, all of which are beyond the control of the academic teaching the class.

Universities rarely give official explanations of why they continue to carry out student evaluations when these are so flawed and so prejudiced against the sector's most vulnerable groups. Existing studies suggest that universities need data about course content, teaching quality and student satisfaction, and that student evaluations are the most cost- and time-effective way of gathering it. In the past, the lack of evidence about evaluations may have been enough to convince institutions that an imperfect method of data collection was still acceptable because it yielded data quickly and easily.

Considering what we know in 2021, time and cost effectiveness are not good enough reasons to continue a flawed practice that so blatantly discriminates against the sector’s women and those from marginalised groups.

Dr Troy Heffernan is Lecturer in Leadership at La Trobe University. His research examines higher education administration and policy with a particular focus on investigating the inequities that persist in the sector.