Submitted: 20 September 2025
Posted: 22 September 2025
Abstract
Keywords:
1. Introduction
1.1. Background
1.2. Problem Statement
2. Theoretical and Conceptual Framework
2.1. Desirable Difficulties Theory:
2.2. Campbell’s Law and Goodhart’s Law:
2.3. Multitask Principal–Agent Theory:
Here, S denotes satisfaction-visible effort, L denotes learning-invisible effort, W_S is the institutional weight placed on SETs, and the α parameters capture the marginal disutility of each type of effort. We calibrate W_S = 0.20 using the promotion weighting simulated above and derive first-order conditions showing a 23% reallocation of effort toward S when W_S rises from 0.05 to 0.20.

3. Methodology and Stakeholder-Aligned Triangulation
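The multitask logic of Section 2.3 can be illustrated with a small numerical sketch. Because the paper's own utility specification is not reproduced here, the functional form below is an assumption chosen for transparency: the instructor maximizes U(S, L) = W_S·S + (1 − W_S)·L − (α_S/2)S² − (α_L/2)L², where the residual weight 1 − W_S is assumed to reward learning-invisible effort and α_S, α_L (a_s, a_l in the code) are hypothetical disutility parameters. Setting both partial derivatives to zero gives S* = W_S/α_S and L* = (1 − W_S)/α_L, so a higher SET weight mechanically shifts effort from L toward S. The helper names optimal_effort and shift_toward_S are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effort:
    """Optimal effort bundle under the ASSUMED payoff (not the paper's exact model)."""

    S: float  # satisfaction-visible effort
    L: float  # learning-invisible effort

    @property
    def share_S(self) -> float:
        """Fraction of total effort devoted to satisfaction-visible work."""
        return self.S / (self.S + self.L)


def optimal_effort(w_s: float, a_s: float = 1.0, a_l: float = 1.0) -> Effort:
    """Solve the first-order conditions of the hypothetical payoff
    U(S, L) = w_s*S + (1 - w_s)*L - (a_s/2)*S**2 - (a_l/2)*L**2.

    dU/dS = w_s - a_s*S = 0        ->  S* = w_s / a_s
    dU/dL = (1 - w_s) - a_l*L = 0  ->  L* = (1 - w_s) / a_l
    U is strictly concave for a_s, a_l > 0, so these conditions give the maximum.
    """
    if not 0.0 <= w_s <= 1.0:
        raise ValueError("w_s must lie in [0, 1]")
    if a_s <= 0.0 or a_l <= 0.0:
        raise ValueError("disutility parameters must be positive")
    return Effort(S=w_s / a_s, L=(1.0 - w_s) / a_l)


def shift_toward_S(w_low: float, w_high: float,
                   a_s: float = 1.0, a_l: float = 1.0) -> float:
    """Change in the share of effort on S, in percentage points, as W_S rises."""
    before = optimal_effort(w_low, a_s, a_l).share_S
    after = optimal_effort(w_high, a_s, a_l).share_S
    return 100.0 * (after - before)


if __name__ == "__main__":
    for w in (0.05, 0.20):
        e = optimal_effort(w)
        print(f"W_S={w:.2f}: S*={e.S:.3f}  L*={e.L:.3f}  share_S={e.share_S:.3f}")
    print(f"shift toward S: {shift_toward_S(0.05, 0.20):.1f} percentage points")
```

With equal disutility parameters the share of effort on S simply equals W_S, so raising W_S from 0.05 to 0.20 moves it by 15 percentage points; unequal α values change the magnitude. The sketch is not intended to reproduce the 23% reallocation reported in Section 2.3, which depends on the authors' calibrated specification. Separable quadratic costs are also a simplification: Holmström and Milgrom's analysis lets the two efforts compete through a shared cost of total effort, which strengthens the substitution away from L.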
3.1. Systematic Review and Meta-Analysis:
3.2. Reanalysis of Quasi-Experimental Datasets:
3.3. Psychometric Audit of SET Instruments:
3.4. Text Analysis of Qualitative Feedback:
3.5. External Outcome Triangulation:
4. Findings and Discussion
4.1. SETs and Long-Term Learning Outcomes (RQ1)
4.1.1. Weak or Zero Correlation with Learning:
4.1.2. Evidence from Quasi-Experiments:
4.1.3. Student Perceptions vs. Actual Learning:
4.1.4. Alternate Measures of Teaching Effectiveness:
4.2. Incentives and Behavioral Distortions under SET-Driven Evaluation (RQ2)
4.2.1. Grade Inflation and Leniency Bias:
4.2.2. Teaching to the Test (or to the Evaluation):
4.2.3. Erosion of Desirable Difficulties:
4.2.4. Bias Amplification and Faculty Impact:
4.2.5. Summary of RQ2:
5. Conclusion and Recommendations
References
- ABET. (2025–2026). Criteria for accrediting engineering programs. https://www.abet.org/accreditation/accreditation-criteria/criteria-for-accrediting-engineering-programs-2025-2026/.
- Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431–441. [CrossRef]
- American Sociological Association (ASA). (2019, February 13). Statement on student evaluations of teaching. https://www.asanet.org/wp-content/uploads/asa_statement_on_student_evaluations_of_teaching_feb132020.pdf.
- Banas, J. A., Dunbar, N., Rodriguez, D., & Liu, S.-J. (2011). A review of humor in educational settings: Four decades of research. Communication Education, 60(1), 115–144. [CrossRef]
- Basow, S. A., & Martin, J. L. (2012). Bias in student evaluations. College Teaching, 60(1), 21–27. https://ldr.lafayette.edu/concern/publications/fb4948799.
- Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. Gernsbacher, R. Pew, L. Hough, & J. Pomerantz (Eds.), Psychology and the real world (pp. 56–64). Worth. https://www.scirp.org/reference/referencespapers?referenceid=3200676.
- Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research, 1–11. [CrossRef]
- Braga, M., Paccagnella, M., & Pellizzari, M. (2014). Evaluating students' evaluations of professors. IZA Discussion Papers, No. 5620. https://www.iza.org/publications/dp/5620/evaluating-students-evaluations-of-professors.
- Bryant, J., Comisky, P. W., Crane, J. S., & Zillmann, D. (1980). Relationship between college teachers’ use of humor in the classroom and students’ evaluations of their teachers. Journal of Educational Psychology, 72(4), 511–519. [CrossRef]
- Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. [CrossRef]
- Carpenter, S. K., Wilford, M. M., Kornell, N., & Mullaney, K. M. (2013). Appearances can be deceiving: Instructor fluency increases perceptions of learning without increasing actual learning. Psychonomic Bulletin & Review, 20, 1350–1356. [CrossRef]
- Carrell, S. E., & West, J. E. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118(3), 409–432. [CrossRef]
- Cohen, P. A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51(3), 281–309. [CrossRef]
- Dahabreh, I. J., Robertson, S. E., Petito, L. C., Hernán, M. A., & Steingrimsson, J. A. (2023). Efficient and robust methods for causally interpretable meta-analysis: Transporting inferences from multiple randomized trials to a target population. Biometrics, 79(2), 1057–1072. [CrossRef]
- Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257. [CrossRef]
- Dik, B. J., & Duffy, R. D. (2009). Calling and vocation in career psychology: A pathway to purpose. Journal of Career Assessment, 17(3), 331–341. [CrossRef]
- Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315(7109), 629–634. [CrossRef]
- European Association for Quality Assurance in Higher Education. (2015). Standards and guidelines for quality assurance in the European Higher Education Area (ESG). https://ehea.info/media.ehea.info/file/2015_Yerevan/72/7/European_Standards_and_Guidelines_for_Quality_Assurance_in_the_EHEA_2015_MC_613727.pdf.
- Feldman, K. A. (1989). The association between student ratings of specific instructional dimensions and student achievement: Refining and extending the synthesis of data from multisection validity studies. Research in Higher Education, 30, 583–645. [CrossRef]
- Flaherty, C. (2019, September 9). Sociologists and more than a dozen other professional groups speak out against student evaluations of teaching. Inside Higher Ed. https://www.insidehighered.com/news/2019/09/10/sociologists-and-more-dozen-other-professional-groups-speak-out-against-student.
- Friedland, S. (2025, September 11). ‘A for All’: Emory College faculty grapple with grade inflation. The Emory Wheel. Retrieved from https://www.emorywheel.com/article/2025/09/a-for-all-emory-college-faculty-grapple-with-grade-inflation.
- Gallup, Inc., & Purdue University. (2014). Great jobs, great lives: The Gallup-Purdue Index report. https://www.purdue.edu/uns/images/2014/gpi-alumnireport14.pdf.
- Geraghty, T. (2024, July 19). Goodhart’s law, Campbell’s law, and the Cobra Effect. Psych Safety. https://psychsafety.com/goodharts-law-campbells-law-and-the-cobra-effect/.
- Gilbert, R. O., & Gilbert, D. R. (2025). Student evaluations of teaching do not reflect student learning: An observational study. BMC Medical Education, 25, 313. [CrossRef]
- Goodhart, C. A. E. (1984). Problems of monetary management: The UK experience. In Monetary theory and practice: The UK experience (pp. 91–121). London: Palgrave Macmillan. [CrossRef]
- Gray, K. (2024, December 9). What are employers looking for when reviewing college students’ resumes? National Association of Colleges and Employers. https://www.naceweb.org/talent-acquisition/candidate-selection/what-are-employers-looking-for-when-reviewing-college-students-resumes.
- Hartung, J. (1999). An alternative method for meta-analysis. Biometrical Journal, 41(8), 901–916.
- Hirsch, A. (2025, March 27). What if student evaluations measured what actually matters? Center for Innovative Teaching and Learning. Retrieved from https://citl.news.niu.edu/2025/03/27/what-if-student-evaluations-measured-what-actually-matters.
- Holmström, B., & Milgrom, P. (1991). Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, & Organization, 7(1), 24–52. Retrieved from https://people.duke.edu/~qc2/BA532/1991%20JLEO%20Holmstrom%20Milgrom.pdf.
- Huemer, M. (2001). Student evaluations: A critical review. Unpublished manuscript. https://spot.colorado.edu/~huemer/papers/sef.htm.
- Indiana University Center for Postsecondary Research. (2024). NSSE 2024 annual results: Engagement insights. National Survey of Student Engagement. https://nsse.indiana.edu/nsse/reports-data/nsse-overview.html.
- IntHout, J., Ioannidis, J. P. A., Rovers, M. M., & Goeman, J. J. (2016). Plea for routinely presenting prediction intervals in meta-analysis. BMJ Open, 6(7), e010247. [CrossRef]
- Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710. [CrossRef]
- Kogan, V., Genetin, B., Chen, J., & Kalish, A. (2022, January 5). Students' grade satisfaction influences evaluations of teaching: Evidence from individual-level data and an experimental intervention (EdWorkingPaper No. 22-513). Annenberg Institute at Brown University. [CrossRef]
- Kornell, N. (2013, May 31). Do the best professors get the worst ratings? Psychology Today. https://www.psychologytoday.com/us/blog/everybody-is-stupid-except-you/201305/do-the-best-professors-get-the-worst-ratings.
- MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303. [CrossRef]
- Marsh, H. W., & Roche, L. A. (1997). Making students’ evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52(11), 1187–1197. [CrossRef]
- Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. [CrossRef]
- Naftulin, D. H., Ware, J. E., Jr., & Donnelly, F. A. (1973). The Doctor Fox lecture: A paradigm of educational seduction. Journal of Medical Education, 48(7), 630–635. [CrossRef]
- National Association of Colleges and Employers. (2025a, January). Job Outlook 2025. https://www.naceweb.org/docs/default-source/default-document-library/2025/publication/research-report/2025-nace-job-outlook-jan-2025.pdf.
- National Association of Colleges and Employers. (2025b, January 13). The gap in perceptions of new grads’ competency proficiency and resources to shrink it. https://www.naceweb.org/career-readiness/competencies/the-gap-in-perceptions-of-new-grads-competency-proficiency-and-resources-to-shrink-it.
- Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8(2), 157–159. [CrossRef]
- Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., … & Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. [CrossRef]
- Radavoi, C. N., Quadrelli, C., & Collins, P. (2025). Moral responsibility for grade inflation: Where does it lie? Journal of Academic Ethics. [CrossRef]
- Sacks, P. (1996). Generation X goes to college: An eye-opening account of teaching in post-modern America. Open Court. https://www.petersacks.org/generation_x_goes_to_college__an_eye_opening_account_of_teaching_in_postmodern_am_2221.htm.
- Sangwa, S., & Mutabazi, P. (2025). Mission-Driven Learning Theory: Ordering Knowledge and Competence to Life Mission. Preprints. [CrossRef]
- Sixbert, S., Titus, L., Simeon, N., & Placide, M. (2025). Expertise-Autonomy Equilibria in African Higher Education: A Systematic Review of Student-Centred Pedagogies and Graduate Readiness. International Journal of Research and Innovation in Social Science (IJRISS), 9(03), 5419–5432. [CrossRef]
- Sparks, S. D. (2011, April 26). Studies find 'desirable difficulties' help students learn. Education Week. https://www.edweek.org/leadership/studies-find-desirable-difficulties-help-students-learn/2011/04.
- Stark, P. B., & Freishtat, R. (2014). An evaluation of course evaluations. ScienceOpen Research. https://www.scienceopen.com/hosted-document?doi=10.14293/S2199-1006.1.SOR-EDU.AOFRQA.v1.
- Sterne, J. A. C., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54(10), 1046–1055. [CrossRef]
- Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. [CrossRef]
- Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. https://www.semanticscholar.org/paper/A-Handbook-on-the-Theory-and-Methods-of-Item-(DIF)-Zumbo/7f88fb0ad98645582665532600d7c46406fa2db6.



Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).