Preprint Article, Version 1 (not peer-reviewed); preserved in Portico

Low Inter-rater Reliability of a High Stakes Assessment of Teacher Candidates

Version 1 : Received: 14 August 2021 / Approved: 16 August 2021 / Online: 16 August 2021 (10:51:52 CEST)

A peer-reviewed article of this Preprint also exists.

Journal reference: Educ. Sci. 2021
DOI: 10.3390/educsci11100648


The Performance Assessment for California Teachers (PACT) is a high-stakes summative assessment designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was .17 (poor strength of agreement). IRR estimates ranged from -.29 (worse than expected by chance) to .54 (moderate strength of agreement); all fell below the .70 standard for consensus agreement. Follow-up interviews with 10 evaluators revealed possible reasons for the low IRR, such as departures from the established PACT scoring protocol and absent or inconsistent use of a scoring aid document. Evaluators reported difficulty scoring the materials candidates submitted, particularly with respect to Academic Language. Cognitive Task Analysis (CTA) is suggested as a method to improve IRR in the PACT and in other teacher performance assessments such as the edTPA.
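As an illustration of the statistic reported above, Cohen's weighted kappa for two raters can be computed in plain Python. The ratings below are invented for illustration, not the study's data, and linear disagreement weights are an assumption (the abstract does not state which weighting scheme was used).

```python
def weighted_kappa(r1, r2, categories):
    """Cohen's weighted kappa for two raters on an ordinal scale,
    using linear disagreement weights w[i][j] = |i - j|."""
    n = len(r1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    # Observed joint distribution of the two raters' scores.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n

    # Marginal distributions for each rater.
    p1 = [sum(obs[i]) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Linear weights penalize disagreements in proportion to their
    # distance on the scale (1-vs-4 counts more than 2-vs-3).
    w = [[abs(i - j) for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp

# Hypothetical scores from two evaluators on a 1-4 rubric.
rater_a = [1, 2, 2, 3, 4, 3, 2, 1, 3, 4]
rater_b = [1, 3, 2, 2, 4, 3, 3, 2, 3, 3]
kappa = weighted_kappa(rater_a, rater_b, categories=[1, 2, 3, 4])
print(round(kappa, 2))  # → 0.51
```

A value near 1 indicates near-perfect agreement, 0 indicates agreement no better than chance, and negative values (as in the study's -.29 low end) indicate agreement worse than chance.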


Keywords: Inter-rater reliability; preservice teacher performance assessment; PACT; edTPA; weighted kappa; cognitive task analysis; qualitative; quantitative


Subject: SOCIAL SCIENCES, Education Studies
