Preprint Article, Version 1 (not peer-reviewed); preserved in Portico

Low Inter-rater Reliability of a High Stakes Assessment of Teacher Candidates

Version 1 : Received: 14 August 2021 / Approved: 16 August 2021 / Online: 16 August 2021 (10:51:52 CEST)

A peer-reviewed article of this Preprint also exists.

Journal reference: Educ. Sci. 2021
DOI: 10.3390/educsci11100648


The Performance Assessment for California Teachers (PACT) is a high-stakes summative assessment designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was .17 (poor strength of agreement). IRR estimates ranged from -.29 (worse than expected by chance) to .54 (moderate strength of agreement); all fell below the .70 standard for consensus agreement. Follow-up interviews with 10 evaluators revealed possible reasons for the low IRR, such as departures from the established PACT scoring protocol and absent or inconsistent use of a scoring aid document. Evaluators reported difficulty scoring the materials candidates submitted, particularly with respect to Academic Language. Cognitive Task Analysis (CTA) is suggested as a method to improve IRR in the PACT and in other teacher performance assessments such as the edTPA.
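As an illustration of the statistic reported above, Cohen's weighted kappa for two raters can be computed in plain Python. The ratings below are invented for illustration, not the study's data, and linear disagreement weights are an assumption (the abstract does not state which weighting scheme was used).

```python
def weighted_kappa(r1, r2, categories):
    """Cohen's weighted kappa for two raters on an ordinal scale,
    using linear disagreement weights w[i][j] = |i - j|."""
    n = len(r1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    # Observed joint distribution of the two raters' scores.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n

    # Marginal distributions for each rater.
    p1 = [sum(obs[i]) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Linear weights penalize disagreements in proportion to their
    # distance on the scale (1-vs-4 counts more than 2-vs-3).
    w = [[abs(i - j) for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp

# Hypothetical scores from two evaluators on a 1-4 rubric.
rater_a = [1, 2, 2, 3, 4, 3, 2, 1, 3, 4]
rater_b = [1, 3, 2, 2, 4, 3, 3, 2, 3, 3]
kappa = weighted_kappa(rater_a, rater_b, categories=[1, 2, 3, 4])
print(round(kappa, 2))  # → 0.51
```

A value near 1 indicates near-perfect agreement, 0 indicates agreement no better than chance, and negative values (as in the study's -.29 low end) indicate agreement worse than chance.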


Keywords: Inter-rater reliability; preservice teacher performance assessment; PACT; edTPA; weighted kappa; cognitive task analysis; qualitative; quantitative


Subject: SOCIAL SCIENCES, Education Studies
