Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 22 |
| Since 2017 (last 10 years) | 59 |
| Since 2007 (last 20 years) | 160 |
Descriptor
| Scoring | 247 |
| Validity | 247 |
| Reliability | 90 |
| Evaluation Methods | 47 |
| Comparative Analysis | 40 |
| Scores | 37 |
| Writing Evaluation | 37 |
| Correlation | 36 |
| Student Evaluation | 35 |
| Computer Assisted Testing | 34 |
| Foreign Countries | 32 |
Author
| Williamson, David M. | 7 |
| Bejar, Isaac I. | 5 |
| Attali, Yigal | 4 |
| Forthmann, Boris | 3 |
| Jaeger, Richard M. | 3 |
| Mercer, Sterett H. | 3 |
| Ramineni, Chaitanya | 3 |
| Bauer, Malcolm I. | 2 |
| Bell, Courtney A. | 2 |
| Borko, Hilda | 2 |
| Breyer, F. Jay | 2 |
Education Level
| Higher Education | 35 |
| Postsecondary Education | 28 |
| Elementary Education | 22 |
| Secondary Education | 21 |
| Elementary Secondary Education | 15 |
| Middle Schools | 13 |
| High Schools | 9 |
| Junior High Schools | 8 |
| Grade 8 | 7 |
| Grade 4 | 6 |
| Grade 6 | 6 |
Location
| California | 7 |
| United States | 5 |
| Australia | 4 |
| China | 4 |
| Kentucky | 4 |
| New York | 4 |
| Turkey | 4 |
| United Kingdom (England) | 4 |
| Canada | 3 |
| Japan | 3 |
| Colorado | 2 |
Laws, Policies, & Programs
| Every Student Succeeds Act… | 3 |
| No Child Left Behind Act 2001 | 3 |
| Elementary and Secondary… | 1 |
| Elementary and Secondary… | 1 |
| Kentucky Education Reform Act… | 1 |
Matt Homer – Advances in Health Sciences Education, 2024
Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In…
Descriptors: Examiners, Scoring, Validity, Cutting Scores
Ferrara, Steve; Qunbar, Saed – Journal of Educational Measurement, 2022
In this article, we argue that automated scoring engines should be transparent and construct relevant--that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and…
Descriptors: Artificial Intelligence, Scoring, Essays, Automation
Beisemann, Marie; Forthmann, Boris; Bürkner, Paul-Christian; Holling, Heinz – Journal of Creative Behavior, 2020
The Remote Associates Test (RAT; Mednick, 1962; Mednick & Mednick, 1967) is a commonly employed test of creative convergent thinking. The RAT is scored with a dichotomous scoring, scoring correct answers as 1 and all other answers as 0. Based on recent research into the information processing underlying RAT performance, we argued that the…
Descriptors: Psychometrics, Scoring, Tests, Semantics
Wise, Steven; Kuhfeld, Megan – Applied Measurement in Education, 2021
Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not…
Descriptors: Scoring, Achievement Tests, Identification, Validity
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays.…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
Conti, Gary J. – Journal of Education and Learning, 2023
The use of personality inventories has been limited because of their cost and length. To overcome these limitations, this study created the Personality Identity Estimator (PIE), an easy-to-use inventory to estimate personality types that can be used at no cost. PIE is a categorical inventory containing 12 items with 3 items for each of the 4…
Descriptors: Personality Measures, Personality Traits, Validity, Reliability
Barry O'Sullivan – Language Assessment Quarterly, 2023
This paper highlights as issues of concern the rapid changes in technology and the tendency to report on partial validation efforts where the work is not identified as forming part of a larger validation project. With close human supervision emerging technologies can have a significant and positive impact on language testing. While technology…
Descriptors: Technology Uses in Education, Computer Assisted Testing, Language Tests, Supervision
Alyson Burnett; Katlyn Lee Milless; Michelle Bennett; Whitney Kozakowski; Sonia Alves; Christine Ross – Regional Educational Laboratory Mid-Atlantic, 2024
This study analyzed Pennsylvania School Climate Survey data from students and staff in the 2021/22 school year to assess the validity and reliability of the elementary school student version of the survey; approaches to scoring the survey in individual schools at all grade levels; and perceptions of school climate across student, staff, and school…
Descriptors: Educational Environment, Decision Making, Surveys, Validity
Doewes, Afrizal; Pechenizkiy, Mykola – International Educational Data Mining Society, 2021
Scoring essays is generally an exhausting and time-consuming task for teachers. Automated Essay Scoring (AES) facilitates the scoring process to be faster and more consistent. The most logical way to assess the performance of an automated scorer is by measuring the score agreement with the human raters. However, we provide empirical evidence that…
Descriptors: Man Machine Systems, Automation, Computer Assisted Testing, Scoring
Rodgers, Wendy J.; Morris-Mathews, Hannah; Romig, John Elwood; Bettini, Elizabeth – Review of Educational Research, 2022
Classroom observation research plays an important role in policy, practice, and scholarship for students with disabilities. When interpreting results of observation studies, it is important to consider the validity evidence provided by researchers and how that speaks to the intended use of those results. In this literature synthesis, we used…
Descriptors: Special Education, Validity, Classroom Research, Students with Disabilities
Davies, Ben; Alcock, Lara; Jones, Ian – Educational Studies in Mathematics, 2020
Proof is central to mathematics and has drawn substantial attention from the mathematics education community. Yet, valid and reliable measures of proof comprehension remain rare. In this article, we present a study investigating proof comprehension via students' summaries of a given proof. These summaries were evaluated by expert judges making…
Descriptors: Mathematical Logic, Mathematics Skills, Comprehension, Reliability
Paul Deane; Duanli Yan; Katherine Castellano; Yigal Attali; Michelle Lamar; Mo Zhang; Ian Blood; James V. Bruno; Chen Li; Wenju Cui; Chunyi Ruan; Colleen Appel; Kofi James; Rodolfo Long; Farah Qureshi – ETS Research Report Series, 2024
This paper presents a multidimensional model of variation in writing quality, register, and genre in student essays, trained and tested via confirmatory factor analysis of 1.37 million essay submissions to ETS' digital writing service, Criterion®. The model was also validated with several other corpora, which indicated that it provides a…
Descriptors: Writing (Composition), Essays, Models, Elementary School Students
Roduta Roberts, Mary; Gotch, Chad M.; Cook, Megan; Werther, Karin; Chao, Iris C. I. – Measurement: Interdisciplinary Research and Perspectives, 2022
Performance-based assessment is a common approach to assess the development and acquisition of practice competencies among health professions students. Judgments related to the quality of performance are typically operationalized as ratings against success criteria specified within a rubric. The extent to which the rubric is understood,…
Descriptors: Protocol Analysis, Scoring Rubrics, Interviews, Performance Based Assessment
Curran, Patrick J.; Georgeson, A. R.; Bauer, Daniel J.; Hussong, Andrea M. – International Journal of Behavioral Development, 2021
Conducting valid and reliable empirical research in the prevention sciences is an inherently difficult and challenging task. Chief among these is the need to obtain numerical scores of underlying theoretical constructs for use in subsequent analysis. This challenge is further exacerbated by the increasingly common need to consider multiple…
Descriptors: Psychometrics, Scoring, Prevention, Scores
Evaluating an Explicit Instruction Teacher Observation Protocol through a Validity Argument Approach
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Journal of Experimental Education, 2022
In this study, we examined the scoring and generalizability assumptions of an explicit instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Education, Classroom Observation Techniques, Validity

