ERIC - Search Results

Showing all 6 results

Investigating Constructed-Response Scoring over Time: The Effects of Study Design on Trend Rescore Statistics. Research Report. ETS RR-22-15

Peer reviewed

Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022

When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…

Descriptors: Item Response Theory, Test Construction, Scoring, Testing

Factors that Influence Fast Mapping in Children Exposed to Spanish and English

Peer reviewed

Alt, Mary; Meyers, Christina; Figueroa, Cecilia – Journal of Speech, Language, and Hearing Research, 2013

Purpose: The purpose of this study was to determine whether children exposed to 2 languages would benefit from the phonotactic probability cues of a single language in the same way as monolingual peers and to determine whether crosslinguistic influence would be present in a fast-mapping task. Method: Two groups of typically developing children…

Descriptors: Regression (Statistics), Spanish, Cues, Task Analysis

Estimating Rater Agreement in 2 x 2 Tables: Correction for Chance and Intraclass Correlation.

Peer reviewed

Blackman, Nicole J-M.; Koval, John J. – Applied Psychological Measurement, 1993

Four indexes of agreement between ratings of a person that correct for chance and are interpretable as intraclass correlation coefficients for different analysis of variance models are investigated. Relationships among the estimators are established for finite samples, and the equivalence of these estimators in large samples is demonstrated. (SLD)

Descriptors: Analysis of Variance, Equations (Mathematics), Estimation (Mathematics), Interrater Reliability

The Angoff Cutoff Score Method: The Impact of Frame-of-Reference Rater Training.

Peer reviewed

Fehrmann, Melinda L.; And Others – Educational and Psychological Measurement, 1991

Two frame-of-reference rater training approaches were compared for effects on reliability and accuracy of cutoff scores generated by 21 raters using Angoff methods on tests taken by 155 undergraduates. Both approaches result in higher interrater reliability and more accuracy than does a non-frame-of-reference method. (SLD)

Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Higher Education

Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.

O'Neill, Thomas R.; Lunz, Mary E. – 1996

To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…

Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level

An Empirical Comparison of Judgmental Approaches to Standard Setting Procedures.

Rock, D. A.; And Others – 1980

An experiment was designed that varied cutting score procedures, instructions, and types of judges in order to address the following questions concerning the Real Estate Licensing Examination: (1) Will the cutting score levels produced by groups of judges from differing backgrounds (academicians vs. practitioners vs. lawyers) using the same method…

Descriptors: Competence, Content Analysis, Criterion Referenced Tests, Cutting Scores