Over the last few decades, a considerable body of
research, commentary, criticism, and concerns has focused on issues related to
the reliability and validity of student ratings of instruction (SRI). Despite
impressive research support of SRI reliability and validity, unsubstantiated
claims of potential biases continue to flourish and to be believed by faculty.
Every so often, a study claims to have identified factors proven to bias SRI
results, even though a large body of research evidence refutes such claims.
Bias of SRI validity "exists when a student, teacher, or
course characteristic affects the evaluations made, either positively or
negatively, but is unrelated to any criteria of good teaching, such as
increased student learning" (Centra, 2003, p. 498). Thus, in order to establish
a factor as biasing SRI validity, it should satisfy two conditions:
1. It should show significant relationships with, or have
an effect on, results of SRIs; and
2. It should not relate to quality of teaching or to
promoting student learning.
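The two conditions can be sketched as a simple check on class-level data. This is only an illustration under stated assumptions, not a procedure taken from the studies cited here: the function names, the use of Pearson correlation, and the cutoff of 0.3 are all hypothetical, and a real analysis would test statistical significance and control for other variables.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def looks_like_bias(factor, ratings, learning, threshold=0.3):
    """Apply the two conditions to one candidate factor.

    factor, ratings, learning: one value per class section.
    threshold: hypothetical cutoff for a "meaningful" correlation
    (an assumption made for this sketch only).
    """
    # Condition 1: the factor is related to SRI results.
    related_to_ratings = abs(pearson_r(factor, ratings)) >= threshold
    # Condition 2: the factor is unrelated to student learning.
    unrelated_to_learning = abs(pearson_r(factor, learning)) < threshold
    return related_to_ratings and unrelated_to_learning
```

Under this sketch, a factor such as expressiveness that correlates with ratings and also with a measure of learning would fail the second condition and so would not count as a bias.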
Some extraneous factors that have been accused of biasing
SRI results are: for the instructor: academic rank, teaching experience,
personal characteristics, physical attractiveness, research productivity, and
having an Asian accent; for the student: year in college and personality; for
the instructor and student: age, gender, race, ethnicity, nationality, and
other diversity issues; for the course: the time of day it is offered, the
number of rows in the classroom, and the length of class meetings. However,
established research shows quite consistently (see Chapter 4) that none of these
factors satisfies the two conditions stated above (Centra, 2008; Remedios &
Lieberman, 2008). In some cases, the studies identifying these biasing factors
may have significant deficiencies.
This chapter examines relationships between student
ratings and several factors that teachers can control. Faculty who believe that
these factors can be manipulated to increase instructor ratings while not
improving teaching or learning (that is, that they bias SRI validity) may adopt
undesired teaching behaviors that would lead to damaging consequences.
Does instructor popularity/expressiveness/enthusiasm bias
SRI validity?
Many faculty members believe that SRIs measure instructor
expressiveness or style rather than the substance or content of teaching. They
argue that "Most student rating schemes are nothing more than a popularity
contest with the warm, friendly, humorous instructor emerging as the winner
every time" (Aleamoni, 1999, p. 154).
Indeed, interesting and engaging presentations are highly
correlated with student ratings of Overall Teaching (Feldman, 2007; Hativa,
1999; 2008; Hativa, Barak & Simhi, 2001) but they are by no means the sole
explanation of high ratings. Aleamoni (1999) found that students praised
instructors for their warm, friendly, humorous manner in the classroom but
frankly criticized them if their courses were not well organized or their
methods of stimulating students to learn were deficient.
Expressive instructors may also receive higher ratings
because their expressiveness stimulates and maintains student attention and
thus helps students learn. Furthermore, expressiveness includes a range of
specific behaviors related to good lecturing, such as speaking emphatically,
using humor, and moving around during the lecture. Trained observers have found
that highly rated faculty exhibit these behaviors more frequently than other
faculty (Murray, 2007).
In sum, even if the factor popularity/expressiveness/enthusiasm
is found to correlate strongly with student ratings, research evidence indicates
that it tends to enhance learning and therefore cannot be considered a biasing
factor (Cashin, 1995; Gravestock & Gregor-Greenleaf, 2008).
Does perceived course difficulty/workload bias SRI
validity?
In almost all SRI studies, the level of course difficulty
or workload is established not by direct measurement of actual difficulty or
workload, but rather by student ratings of relevant questionnaire items. Thus,
the accurate title for the factor "course difficulty/workload" should be
"perceived course difficulty" or "perceived workload".
Many faculty members are concerned that perceived course
difficulty/workload substantially affects student ratings and thus biases SRI
results. They believe that there is an inverse relationship between
difficulty/workload and ratings on Overall Teaching, that is, the easier and
less demanding the course, the higher the rating on Overall Teaching.
Nonetheless, the large majority of studies that examined this issue found
almost zero (or very small and non-significant) correlations between course
difficulty/workload and teacher ratings, that is, almost no relationship
(Cohen, 1981; Marsh & Dunkin, 1997).
In summary, perceived course difficulty/workload shows
almost no relationship, or at most a weak one, with teacher ratings, so
it does not bias SRI results.
Do expected grades bias SRI validity?
Many faculty members strongly believe that students tend
to rate them more highly when they expect to receive good grades, and that low
ratings might reflect students' retribution for low grades (Aleamoni, 1999;
Beran, Violato, Kline, & Frideres, 2005). Marsh (1987) found that over two
thirds of faculty members hold this belief.
We should note that "grades" in SRI studies usually refer
to expected grades: those that students expect to receive based on their
performance up to the day of SRI administration, and sometimes on the instructor's
cues or on rumors from students of previous offerings of the same course.
"Grades" usually do not refer to actual grades because teacher ratings are
generally administered during the last few weeks of the term, before students
take the final exam and receive their actual grades.
A few studies (Greenwald & Gillmore, 1997; Wachtel,
1998) indeed found a direct relationship between expectations of high grades
and positive teacher evaluations. They interpreted this as a clear indication
that students reward instructors for lenient grading by increasing their
ratings, and thus that grading leniency may bias SRI results. A different
possible interpretation is that even if some positive expected grade-SRI
correlations are identified, they may reflect a positive effect of expecting
high grades on encouraging students to work harder and learn more. Students
with these expectations may learn better and would rate the course and teacher
highly (Centra, 2003; Feldman, 2007; Marsh & Roche, 2000; Wachtel, 1998).
However, the large majority of studies on this issue (e.g., Abrami, 2001;
Centra, 2003; Marsh & Roche, 2000; Theall, Franklin, & Ludlow, 1990)
deny the existence of such relationships, showing that correlations between
expected grades and instructor ratings are very small, almost zero.
Altogether, student expected grades and grading leniency
are not biasing factors of SRI validity.
Do actual grades bias SRI results?
Because SRIs are usually administered before the final
exams take place and grades are assigned, only a few studies have examined
relationships between students' ratings of their teacher and their actual final
grades in that course. In studies that did use actual grades, the researchers
gathered them at some later point in time. These studies show a very small and
non-significant positive association between average class grades and teacher
ratings.
As in the discussion above on expected grades, even
if a small positive association is identified in some studies, it would not
indicate bias of the ratings' validity. Positive associations may well indicate
that good teachers, those who help their students learn the most and
consequently do well on course exams, tend to be rated highly by their
students. In this case, both the higher grades of students and the higher
ratings of teachers are well deserved (Feldman, 1976).
All in all, student actual grades do not bias SRI
validity.
Can students be manipulated to give faculty higher
ratings?
The most popular beliefs among faculty are that they can
"buy" higher ratings by lowering course requirements, that is, that "bribing"
students by entertaining them, watering down the course material, reducing
difficulty/workload, and giving undeserved high grades will translate into
higher student ratings (Franklin & Theall, 1991; Heckert, Latier,
Ringwald-Burton, & Drazen, 2006; Marsh & Roche, 2000). Of the large
number of faculty SRI-related beliefs, these are probably the most potentially
damaging, because they may lead faculty to resort to counter-productive
teaching strategies. Faculty may be tempted to grade higher and to lower the
level of difficulty/workload in order to receive higher ratings from students
(Centra, 2003). This, in turn, may lead to grade inflation and to a decline in
the amount of effort that students put into their courses. The ultimate
consequence could be the "dumbing down" of college education (Greenwald &
Gillmore, 1997). Here are some examples of dumbing down course content:
Building the subject slowly from the bottom up, giving
lots of examples in class, dropping topics from the syllabus when convenient,
and using homework problems as "models" for exam problems
(Zucker, 2010, p. 821).
There is some evidence that these damaging behaviors have
already been adopted by certain instructors:
This performance measurement has lead to both unethical
grade inflation and coursework deflation as faculty try to entertain students
rather than educating them ... instructors ease grading,
inflate grades and deflate course work when SET data is used for faculty
evaluation purposes. By inflating grades, easing grades,
and deflating coursework, an instructor games the system and thus, is more
likely to receive positive evaluations (Crumbley et al.,
2010, pp. 187-8).
Although many sources have discounted the likelihood of
grade inflation resulting from instructors trying to "buy" better student
ratings of instruction, many faculty members still believe that there is
widespread manipulation of grades (Franklin & Theall, 1991, p. 1). There is
a negative ethical aspect to manipulating situations and students in order to
raise ratings. However, one cannot blame SRIs if the real problem is
unethical teacher behavior.
The irony is that most of these manipulative
behaviors have not proven effective in raising teacher ratings,
but instructors nonetheless continue to try them. The assumption
underlying these behaviors is that many students strive for high grades with
easy course demands and a low workload. However, research does not support the
generality of this belief. On the contrary, there is almost a consensus
among experts refuting faculty beliefs about all types of manipulation, as
explained next.
Can manipulating difficulty/workload level increase
faculty ratings?
Workloads that students perceive as excessive may indeed
negatively affect their learning. Overloaded students may develop feelings of
stress and failure and adopt unhelpful learning strategies. However, contrary
to faculty beliefs that decreasing difficulty/workload will increase their
ratings, research evidence shows that if success is too easily achieved as a
result of an overly light workload, students may lose interest and devalue such
learning. Courses demanding the least amount of work tend to receive lower,
rather than higher ratings. Students tend to value learning and achievement
more highly when they involve a substantial degree of challenge and commitment
and require investing time and effort (Marsh & Roche, 2000). Students seem
to appreciate a workload that is of the right magnitude: one that is sensible
yet still presents a challenge. Courses for which students indicated that the level
of difficulty or the workload was appropriate, or that they had expended more
effort, were rated higher than other courses. The more effort expended, the higher students
perceived the value of the course (Heckert et al., 2006; Marsh & Roche,
2000).
Can manipulating grade levels increase faculty ratings?
Students are not as likely to be positively affected if
an ineffective teacher seems to be trying to buy good ratings with easy grades.
In fact, the attempt may boomerang (McKeachie, 1997). McKeachie cites the
example of a faculty member whose grades were the highest in his department but
who received the lowest student ratings. Assigning undeserved higher grades to
students may have a negative effect on instructor ratings (Abrami, Dickens, Perry,
& Leventhal, 1980). However, the assumption that giving higher grades can
raise ratings may be correct if the instructor can convince students that they
have learned more than is typical and therefore they deserve the higher grades.
Abrami, P. C. (2001). Improving judgments about teaching
effectiveness using teacher rating forms. In M. Theall, P. C. Abrami & L.
A. Mets (Eds.), The student ratings debate: Are they valid? How can we best use
them? New directions for institutional research (Vol. 109, pp. 59-87). San
Francisco: Jossey-Bass.
Abrami, P. C., Dickens, W. J., Perry, R. P., &
Leventhal, L. (1980). Do teacher standards for assigning grades affect student
evaluations of instruction? Journal of Educational Psychology, 72, 107-118.
Aleamoni, L. M. (1999). Student rating myths versus
research facts from 1924 to 1998. Journal of Personnel Evaluation in Education,
13(2), 153-166.
Beran, T., Violato, C., Kline, D., & Frideres, J.
(2005). The utility of student ratings of instruction for students, faculty,
and administrators: A "consequential validity" study. Canadian Journal of
Higher Education, 35(2), 49-70.
Cashin, W. E. (1995). Student ratings of teaching: The
research revisited. IDEA Paper No. 32. Manhattan, KS: Kansas State University
Center for Faculty Evaluation & Development.
Centra, J. A. (2003). Will teachers receive higher
student evaluations by giving higher grades and less course work? Research in
Higher Education, 44(5), 495-518.
Centra, J. A. (2008). Differences in student ratings of
instruction: Is it bias? Paper presented at the 88th annual meeting of the
American Educational Research Association, New York.
Cohen, P. A. (1981). Student-ratings of instruction and
student-achievement - a meta-analysis of multisection validity studies. Review
of Educational Research, 51(3), 281-309.
Crumbley, D. L., Flinn, R. E., & Reichelt, K. J.
(2010). What is ethical about grade inflation and coursework deflation? Journal
of Academic Ethics, 8(3), 187-197.
Feldman, K. A. (1976). Grades and college students'
evaluations of their courses and teachers. Research in Higher Education, 4(1),
69-111.
Feldman, K. A. (2007). Identifying exemplary teachers and
teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart
(Eds.), The scholarship of teaching and learning in higher education: An
evidence-based perspective (pp. 93-143). Dordrecht, The Netherlands: Springer.
Franklin, J., & Theall, M. (1991). Grade inflation
and student ratings: A closer look. Paper presented at the annual meeting of
the American Educational Research Association, Chicago, IL.
Gravestock, P., & Gregor-Greenleaf, E. (2008).
Student course evaluations: Research, models and trends. Toronto: Higher
Education Quality Council of Ontario. Retrieved from http://www.heqco.ca/SiteCollectionDocuments/Student Course Evaluations.pdf
Greenwald, A. G., & Gillmore, G. M. (1997). No pain,
no gain? The importance of measuring course workload in student ratings of
instruction. Journal of Educational Psychology, 89(4), 743-751.
Hativa, N. (1999). Towards a conceptual framework of
dimensions of effective instruction: The role of high-, intermediate-, and
low-inference teaching behaviors. Instructional Evaluation and Faculty
Development, 18, 3-10. Retrieved from http://www.aera.net/Default.aspx?menu_id=168&id=914
Hativa, N. (2008). Lecturing for effective learning: Disc
1--Making lessons interesting. Sterling, VA: Stylus.
Hativa, N., Barak, R., & Simhi, E. (2001). Exemplary
university teachers: Knowledge and beliefs regarding effective teaching
dimensions and strategies. Journal of Higher Education, 72(6), 699-729.
Heckert, T. M., Latier, A., Ringwald-Burton, A., &
Drazen, C. (2006). Relations among student effort, perceived class difficulty appropriateness,
and student evaluations of teaching: Is it possible to "buy" better
evaluations through lenient grading? College Student Journal, 40(3), 588.
Marsh, H. W. (1987). Students' evaluations of university
teaching: Research findings, methodological issues, and directions for future
research. International Journal of Educational Research, 11(3), 253-388.
Marsh, H. W., & Dunkin, M. J. (1997). Students'
evaluations of university teaching: A multidimensional perspective. In R. P.
Perry & J. C. Smart (Eds.), Effective teaching in higher education:
Research and practice (pp. 241-313). New York: Agathon Press.
Marsh, H. W., & Roche, L. A. (2000). Effects of
grading leniency and low workload on students' evaluations of teaching: Popular
myth, bias, validity, or innocent bystanders? Journal of Educational
Psychology, 92(1), 202.
McKeachie, W. J. (1997). Student ratings: The validity of
use. American Psychologist, 52(11), 1218-1225. Retrieved from http://psycnet.apa.org/journals/amp/52/11/1218.pdf
Remedios, R., & Lieberman, D. A. (2008). I liked your
course because you taught me well: The influence of grades, workload,
expectations and goals on students' evaluations of teaching. British
Educational Research Journal, 34(1), 91-115.
Theall, M., Franklin, J., & Ludlow, L. (1990).
Attributions and retributions: The locus of student ratings and perceptions of
performance. Paper presented at the annual meeting of the American Educational
Research Association, Boston, MA.
Wachtel, H. K. (1998). Student evaluation of college
teaching effectiveness: A brief review. Assessment & Evaluation in Higher
Education, 23(2), 191-212.