Over the last few decades, a considerable body of research, commentary, criticism, and concern has focused on the reliability and validity of student ratings of instruction (SRI). Despite impressive research support for SRI reliability and validity, unsubstantiated claims of potential biases continue to flourish and to be believed by faculty. Every so often, a study claims to have identified factors proven to bias SRI results, even though a large body of research evidence refutes these allegations.
Bias of SRI validity "exists when a student, teacher, or course characteristic affects the evaluations made, either positively or negatively, but is unrelated to any criteria of good teaching, such as increased student learning" (Centra, 2003, p. 498). Thus, in order to establish a factor as biasing SRI validity, it should satisfy two conditions:
1. It should show significant relationships with, or have effect on, results of SRIs; and
2. It should not relate to quality of teaching or to promoting student learning.
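The two conditions can be read as a simple decision rule. The sketch below is only an illustration of that reading, not a procedure taken from the studies cited here: the function names, the correlation threshold of 0.3 (a crude stand-in for a proper significance test), and the data are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def satisfies_bias_conditions(factor, ratings, learning, threshold=0.3):
    """Apply the two conditions to one candidate factor.

    Condition 1: the factor is related to the ratings.
    Condition 2: the factor is NOT related to student learning.
    `threshold` is an assumed cut-off, not a value from the literature.
    """
    related_to_ratings = abs(pearson_r(factor, ratings)) >= threshold
    related_to_learning = abs(pearson_r(factor, learning)) >= threshold
    return related_to_ratings and not related_to_learning


# Hypothetical per-course data, invented for illustration only.
factor = [1, 2, 3, 4, 5]
ratings = [2, 4, 6, 8, 10]
learning_unrelated = [3, 1, 4, 1, 3]
learning_related = [1, 2, 3, 4, 5]

print(satisfies_bias_conditions(factor, ratings, learning_unrelated))
print(satisfies_bias_conditions(factor, ratings, learning_related))
```

Under this reading, a factor that correlates with ratings but also with learning, as the text later argues for instructor expressiveness, fails the second condition and is not counted as a bias.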
Some extraneous factors that have been accused of biasing SRI results are: for the instructor, academic rank, teaching experience, personal characteristics, physical attractiveness, research productivity, and having an Asian accent; for the student, year in college and personality; for the instructor and student, age, gender, race, ethnicity, nationality, and other diversity issues; and for the course, the time of day it is offered, the number of rows in the classroom, and the length of class meetings. However, established research shows consistently (see Chapter 4) that none of these factors satisfies the two conditions above (Centra, 2008; Remedios & Lieberman, 2008). In some cases, the studies identifying these biasing factors may have significant deficiencies.
This chapter examines relationships between student ratings and several factors that teachers can control. Faculty who believe that these factors can be manipulated to increase instructor ratings while not improving teaching or learning (that is, that they bias SRI validity) may adopt undesired teaching behaviors that would lead to damaging consequences.
Does instructor popularity/expressiveness/enthusiasm bias SRI validity?
Many faculty members believe that SRIs measure instructor expressiveness or style rather than the substance or content of teaching. They argue that "Most student rating schemes are nothing more than a popularity contest with the warm, friendly, humorous instructor emerging as the winner every time" (Aleamoni, 1999, p. 154).
Indeed, interesting and engaging presentations are highly correlated with student ratings of Overall Teaching (Feldman, 2007; Hativa, 1999, 2008; Hativa, Barak, & Simhi, 2001), but they are by no means the sole explanation of high ratings. Aleamoni (1999) found that students praised instructors for their warm, friendly, humorous manner in the classroom but frankly criticized them if their courses were not well organized or their methods of stimulating students to learn were deficient.
Expressive instructors may also receive higher ratings because their expressiveness stimulates and maintains student attention and thus helps students learn. Furthermore, expressiveness includes a range of specific behaviors related to good lecturing, such as speaking emphatically, using humor, and moving around during the lecture. Trained observers have found that highly rated faculty exhibit these behaviors more frequently than other faculty (Murray, 2007).
In sum, even if the factor popularity/expressiveness/enthusiasm is found to correlate strongly with student ratings, research evidence indicates that it tends to enhance learning and therefore cannot be considered a biasing factor (Cashin, 1995; Gravestock & Gregor-Greenleaf, 2008).
Does perceived course difficulty/workload bias SRI validity?
In almost all SRI studies, the level of course difficulty or workload is established not by direct measurement of actual difficulty or workload, but rather by student ratings of relevant questionnaire items. Thus, the accurate title for the factor "course difficulty/workload" should be "perceived course difficulty" or "perceived workload".
Many faculty members are concerned that perceived course difficulty/workload substantially affects student ratings and thus biases SRI results. They believe that there is an inverse relationship between difficulty/workload and ratings on Overall Teaching, that is, the easier and the less demanding the course, the higher the rating on Overall Teaching. Nonetheless, the large majority of studies that examined this issue found almost zero (or very small and non-significant) correlations between course difficulty/workload and teacher ratings, that is, almost no relationship (Cohen, 1981; Marsh & Dunkin, 1997).
In summary, perceived course difficulty/workload shows almost no relationship, or only a weak relationship, with teacher ratings, so it does not bias SRI results.
Do expected grades bias SRI validity?
Many faculty members strongly believe that students tend to rate them more highly when they expect to receive good grades, and that low ratings might reflect students' retribution for low grades (Aleamoni, 1999; Beran, Violato, Kline, & Frideres, 2005). Marsh (1987) found that over two thirds of faculty members hold this belief.
We should note that "grades" in SRI studies usually refer to expected grades, that is, those that students expect to receive based on their performance up to the day of SRI administration, and sometimes on the instructor's cues or on rumors from students of previous offerings of the same course. "Grades" usually do not refer to actual grades, because teacher ratings are generally administered during the last few weeks of the term, before students take the final exam and receive their actual grades.
A few studies (Greenwald & Gillmore, 1997; Wachtel, 1998) indeed found a direct relationship between expectations of high grades and positive teacher evaluations. They interpreted this as a clear indication that students reward instructors for lenient grading by increasing their ratings, and thus that grading leniency may bias SRI results. A different possible interpretation is that even if some positive expected grade-SRI correlations are identified, they may reflect a positive effect of expecting high grades on encouraging students to work harder and learn more. Students with these expectations may learn better and would rate the course and teacher highly (Centra, 2003; Feldman, 2007; Marsh & Roche, 2000; Wachtel, 1998). However, the large majority of studies on this issue (e.g., Abrami, 2001; Centra, 2003; Marsh & Roche, 2000; Theall, Franklin, & Ludlow, 1990) deny the existence of such relationships, showing that correlations between expected grades and instructor ratings are very small, almost zero.
Altogether, student expected grades and grading leniency are not biasing factors of SRI validity.
Do actual grades bias SRI results?
Because SRIs are usually administered before the final exams take place and grades are assigned, only a few studies have examined relationships between students' ratings of their teacher and their actual final grades in that course. In studies that did use actual grades, the researchers gathered them at some later point in time. These studies show a very small and non-significant positive association between average class grades and teacher ratings.
Similar to the discussion above on expected grades, even if a small positive association is identified in some studies, it would not indicate bias of the ratings' validity. Positive associations may well indicate that good teachers, those who help their students learn the most and consequently do well on course exams, tend to be rated highly by their students. In this case, both the higher grades of students and the higher ratings of teachers are well deserved (Feldman, 1976).
All in all, student actual grades do not bias SRI validity.
Can students be manipulated to give faculty higher ratings?
The most popular beliefs among faculty are that they can "buy" higher ratings by lowering course requirements, that is, that "bribing" students by entertaining them, watering down the course material, reducing difficulty/workload, and giving undeserved high grades will translate into higher student ratings (Franklin & Theall, 1991; Heckert, Latier, Ringwald-Burton, & Drazen, 2006; Marsh & Roche, 2000). Of the large number of faculty SRI-related beliefs, these are probably the most potentially damaging, because they may lead faculty to resort to counter-productive teaching strategies. Faculty may be tempted to grade higher and to lower the level of difficulty/workload in order to receive higher ratings from students (Centra, 2003). This, in turn, may lead to grade inflation and to a decline in the amount of effort that students put into their courses. The ultimate consequence could be the "dumbing down" of college education (Greenwald & Gillmore, 1997). Here are some examples of dumbing down course content:
Building the subject slowly from the bottom up, giving lots of examples in class, dropping topics from the syllabus when convenient, and using homework problems as "models" for exam problems (Zucker, 2010, p. 821).
There is some evidence that these damaging behaviors have already been adopted by certain instructors:
This performance measurement has lead to both unethical grade inflation and coursework deflation as faculty try to entertain students rather than educating them... instructors ease grading, inflate grades and deflate course work when SET data is used for faculty evaluation purposes. By inflating grades, easing grades, and deflating coursework, an instructor games the system and thus, is more likely to receive positive evaluations (Crumbley et al., 2010, pp. 187-8).
Although many sources have discounted the likelihood of grade inflation resulting from instructors trying to "buy" better student ratings of instruction, many faculty members still believe that there is widespread manipulation of grades (Franklin & Theall, 1991, p. 1). There is a negative ethical aspect to manipulating situations and students in order to raise ratings. However, one cannot blame SRIs if the real problem is unethical teacher behavior.
The irony is that most of these manipulative behaviors have not proven effective in raising teacher ratings, yet instructors continue to try them. The assumption underlying these behaviors is that many students strive for high grades with easy course demands and a low workload. However, research does not support the generality of this belief. On the contrary, there is almost a consensus among experts refuting faculty beliefs about all types of manipulation, as explained next.
Can manipulating difficulty/workload level increase faculty ratings?
Workloads that students perceive as excessive may indeed negatively affect their learning. Overloaded students may develop feelings of stress and failure and adopt unhelpful learning strategies. However, contrary to faculty beliefs that decreasing difficulty/workload will increase their ratings, research evidence shows that if success is too easily achieved as a result of an overly light workload, students may lose interest and devalue such learning. Courses demanding the least amount of work tend to receive lower, rather than higher, ratings. Students tend to value learning and achievement more highly when they involve a substantial degree of challenge and commitment and require investing time and effort (Marsh & Roche, 2000). Students seem to appreciate a workload that is of the right magnitude but is still sensible and presents a challenge. Courses for which students indicated that the level of difficulty or the workload was appropriate, or that they had expended more effort, were rated higher than other courses. The more effort expended, the higher students perceived the value of the course (Heckert et al., 2006; Marsh & Roche, 2000).
Can manipulating grades' level increase faculty ratings?
Students are not as likely to be positively affected if an ineffective teacher seems to be trying to buy good ratings with easy grades. In fact, the attempt may boomerang (McKeachie, 1997). McKeachie gives as an example a faculty member whose grades were the highest in his department but who received the lowest student ratings. Assigning undeserved higher grades to students may have a negative effect on instructor ratings (Abrami, Dickens, Perry, & Leventhal, 1980). However, the assumption that giving higher grades can raise ratings may be correct if the instructor can convince students that they have learned more than is typical and therefore deserve the higher grades.
Abrami, P. C. (2001). Improving judgments about teaching effectiveness using teacher rating forms. In M. Theall, P. C. Abrami & L. A. Mets (Eds.), The student ratings debate: Are they valid? How can we best use them? New directions for institutional research (Vol. 109, pp. 59-87). San Francisco: Jossey-Bass.
Abrami, P. C., Dickens, W. J., Perry, R. P., & Leventhal, L. (1980). Do teacher standards for assigning grades affect student evaluations of instruction? Journal of Educational Psychology, 72, 107-118.
Aleamoni, L. M. (1999). Student rating myths versus research facts from 1924 to 1998. Journal of Personnel Evaluation in Education, 13(2), 153-166.
Beran, T., Violato, C., Kline, D., & Frideres, J. (2005). The utility of student ratings of instruction for students, faculty, and administrators: A "consequential validity" study. Canadian Journal of Higher Education, 35(2), 49-70.
Cashin, W. E. (1995). Student ratings of teaching: The research revisited. IDEA Paper No. 32. Manhattan, KS: Kansas State University Center for Faculty Evaluation & Development.
Centra, J. A. (2003). Will teachers receive higher student evaluations by giving higher grades and less course work? Research in Higher Education, 44(5), 495-518.
Centra, J. A. (2008). Differences in student ratings of instruction: Is it bias? Paper presented at the 88th annual meeting of the American Educational Research Association, New York.
Cohen, P. A. (1981). Student-ratings of instruction and student-achievement - a meta-analysis of multisection validity studies. Review of Educational Research, 51(3), 281-309.
Crumbley, D. L., Flinn, R. E., & Reichelt, K. J. (2010). What is ethical about grade inflation and coursework deflation? Journal of Academic Ethics, 8(3), 187-197.
Feldman, K. A. (1976). Grades and college students' evaluations of their courses and teachers. Research in Higher Education, 4(1), 69-111.
Feldman, K. A. (2007). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 93-143). Dordrecht, The Netherlands: Springer.
Franklin, J., & Theall, M. (1991). Grade inflation and student ratings: A closer look. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Gravestock, P., & Gregor-Greenleaf, E. (2008). Student course evaluations: Research, models and trends. Toronto: Higher Education Quality Council of Ontario, from http://www.heqco.ca/SiteCollectionDocuments/Student Course Evaluations.pdf
Greenwald, A. G., & Gillmore, G. M. (1997). No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology, 89(4), 743-751.
Hativa, N. (1999). Towards a conceptual framework of dimensions of effective instruction: The role of high-intermediate-and low-inference teaching behaviors. Instructional Evaluation and Faculty Development, 18, 3-10. Retrieved from http://www.aera.net/Default.aspx?menu_id=168&id=914
Hativa, N. (2008). Lecturing for effective learning: Disc 1--Making lessons interesting. Sterling, VA: Stylus.
Hativa, N., Barak, R., & Simhi, E. (2001). Exemplary university teachers: Knowledge and beliefs regarding effective teaching dimensions and strategies. Journal of Higher Education, 72(6), 699-729.
Heckert, T. M., Latier, A., Ringwald-Burton, A., & Drazen, C. (2006). Relations among student effort, perceived class difficulty appropriateness, and student evaluations of teaching: Is it possible to "buy" better evaluations through lenient grading? College Student Journal, 40(3), 588.
Marsh, H. W. (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11(3), 253-388.
Marsh, H. W., & Dunkin, M. J. (1997). Students' evaluations of university teaching: A multidimensional perspective. In R. P. Perry & J. C. Smart (Eds.), Effective teaching in higher education: Research and practice (pp. 241-313). New York: Agathon Press.
Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on students' evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology, 92(1), 202.
McKeachie, W. J. (1997). Student ratings: The validity of use. American Psychologist, 52(11), 1218-1225. Retrieved from http://psycnet.apa.org/journals/amp/52/11/1218.pdf
Remedios, R., & Lieberman, D. A. (2008). I liked your course because you taught me well: The influence of grades, workload, expectations and goals on students' evaluations of teaching. British Educational Research Journal, 34(1), 91-115.
Theall, M., Franklin, J., & Ludlow, L. (1990). Attributions and retributions: The locus of student ratings and perceptions of performance. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.
Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment & Evaluation in Higher Education, 23(2), 191-212.