Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Low Inter-rater Reliability of a High Stakes Assessment of Teacher Candidates
Received: 14 August 2021 / Approved: 16 August 2021 / Online: 16 August 2021 (10:51:52 CEST)
A peer-reviewed article of this Preprint also exists.
Journal reference: Educ. Sci. 2021
DOI: 10.3390/educsci11100648
Abstract
The Performance Assessment for California Teachers (PACT) is a high-stakes summative assessment designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was .17 (poor strength of agreement). IRR estimates ranged from -.29 (worse than expected by chance) to .54 (moderate strength of agreement); all fell below the .70 standard for consensus agreement. Follow-up interviews with 10 evaluators revealed possible reasons for the low IRR, such as departures from the established PACT scoring protocol and lack of, or inconsistent, use of a scoring aid document. Evaluators reported difficulties scoring the materials that candidates submitted, particularly the use of Academic Language. Cognitive Task Analysis (CTA) is suggested as a method to improve IRR in the PACT and in other teacher performance assessments such as the edTPA.
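Cohen's weighted kappa, the IRR statistic used in the abstract, penalizes rater disagreements by their distance on the ordinal rubric scale and corrects for chance agreement from the raters' marginal distributions. As a rough illustration only (not the authors' code; linear weights are an assumption, since the abstract does not state the weighting scheme used), a minimal sketch:

```python
def linear_weighted_kappa(r1, r2, categories):
    """Cohen's linear-weighted kappa for two raters' ordinal scores.

    r1, r2     -- parallel lists of scores, one pair per rated candidate
    categories -- ordered list of possible rubric levels, e.g. [1, 2, 3, 4]
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint counts: obs[i][j] = times rater 1 gave level i, rater 2 level j.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1

    # Marginal counts for each rater, used to build chance-expected counts.
    m1 = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    m2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: 0 on the diagonal, 1 at maximal distance.
    w = lambda i, j: abs(i - j) / (k - 1)

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * m1[i] * m2[j] / n for i in range(k) for j in range(k))
    return 1 - observed / expected
```

With this definition, perfect agreement yields 1.0, chance-level agreement yields 0, and systematically opposed ratings yield negative values, matching the abstract's interpretation of -.29 as "worse than expected by chance".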
Keywords
Inter-rater reliability; preservice teacher performance assessment; PACT; edTPA; weighted kappa; cognitive task analysis; qualitative; quantitative
Subject
SOCIAL SCIENCES, Education Studies
Copyright: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.