Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 10 |
| Since 2007 (last 20 years) | 24 |
Descriptor
| Interrater Reliability | 46 |
| Scoring | 46 |
| Test Validity | 46 |
| Test Reliability | 30 |
| Test Construction | 22 |
| Language Tests | 12 |
| Evaluation Methods | 11 |
| Psychometrics | 11 |
| Test Items | 11 |
| Student Evaluation | 10 |
| Correlation | 9 |
Author
| Anna-Maria Fall | 2 |
| Bejar, Isaac I. | 2 |
| Beula M. Magimairaj | 2 |
| Brydges, Ryan | 2 |
| Greg Roberts | 2 |
| Philip Capin | 2 |
| Ronald B. Gillam | 2 |
| Sandra L. Gillam | 2 |
| Sharon Vaughn | 2 |
| Alderson, J. Charles | 1 |
| Alverez de Santizo, Myrna… | 1 |
Audience
| Researchers | 3 |
| Practitioners | 2 |
| Administrators | 1 |
| Teachers | 1 |
Laws, Policies, & Programs
| Individuals with Disabilities… | 1 |
| No Child Left Behind Act 2001 | 1 |
| Race to the Top | 1 |
Lynsey Joohyun Lee – ProQuest LLC, 2021
Reliability and validity are two important topics that have been studied for many decades in the educational measurement field, including discussions of Writing Studies' subfield of writing assessment, since the establishment of the College Entrance Exam Board [CEEB] in 1899 (Huot et al., 2010). In recent years, scholarly conversations of fairness…
Descriptors: Writing Evaluation, Test Validity, Test Reliability, Case Studies
Safak, Pinar; Cakmak, Salih; Karakoc, Tamer; Aydin O'Dwyer, Pinar – European Journal of Educational Research, 2021
This study aimed to develop a valid and reliable instrument that measures the functional vision of students with low vision. Thus, an assessment tool and performance activities were developed for three vision skill groups (near vision skills, distance vision skills, and visual field) that include functional vision skills. The universe was 1485…
Descriptors: Foreign Countries, Vision Tests, Diagnostic Tests, Vision
Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022
Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…
Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments
Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022
Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…
Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments
Tavares, Walter; Brydges, Ryan; Myre, Paul; Prpic, Jason; Turner, Linda; Yelle, Richard; Huiskamp, Maud – Advances in Health Sciences Education, 2018
Assessment of clinical competence is complex and inference based. Trustworthy and defensible assessment processes must have favourable evidence of validity, particularly where decisions are considered high stakes. We aimed to organize, collect and interpret validity evidence for a high stakes simulation based assessment strategy for certifying…
Descriptors: Competence, Simulation, Allied Health Personnel, Certification
Davis, Larry; Norris, John – ETS Research Report Series, 2021
The elicited imitation task (EIT), in which language learners listen to a series of spoken sentences and repeat each one verbatim, is a commonly used measure of language proficiency in second language acquisition research. The "TOEFL® Essentials"™ test includes an EIT as a holistic measure of speaking proficiency, referred to as the…
Descriptors: Task Analysis, Language Proficiency, Speech Communication, Language Tests
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Maxwell, Bruce; Boon, Helen; Tanchuk, Nicolas; Rauwerda, Bryan – Journal of Moral Education, 2021
This article documents the adaptation, piloting and validation of a measure of teachers' ethical sensitivity. To create the test, we modified a measure from dentistry drawing on literature in teacher professional ethics and drew on the expertise of professional ethics scholars and practitioners. Based on the results of Rasch analysis combined with…
Descriptors: Ethics, Moral Values, Scores, Teacher Education Programs
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Hatala, Rose; Cook, David A.; Brydges, Ryan; Hawkins, Richard – Advances in Health Sciences Education, 2015
In order to construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS), based on Kane's framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected…
Descriptors: Measures (Individuals), Test Validity, Surgery, Skills
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014
Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…
Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests
Greenberg, Kathleen Puglisi – Teaching of Psychology, 2012
The scoring instrument described in this article is based on a deconstruction of the seven sections of an American Psychological Association (APA)-style empirical research report into a set of learning outcomes divided into content-, expression-, and format-related categories. A double-weighting scheme used to score the report yields a final grade…
Descriptors: Scoring, Research Reports, Grading, Outcome Measures
Dietel, Ron – Phi Delta Kappan, 2011
Two tests intended to measure student achievement of the Common Core State Standards will face intense scrutiny, but the test makers say they will include performance assessments and other items that are not multiple-choice questions. Incorporating performance items on these tests will raise issues of scoring, cost, and validity.
Descriptors: Student Evaluation, State Standards, Test Construction, Intellectual Property
Heldsinger, Sandra A.; Humphry, Stephen M. – Educational Research, 2013
Background: Many in education argue for the importance of incorporating teacher judgements in the assessment and reporting of student performance. Advocates of such an approach are cognisant, though, that obtaining a satisfactory level of consistency in teacher judgements poses a challenge. Purpose: This study investigates the extent to which the…
Descriptors: Evaluation Methods, Student Evaluation, Teacher Attitudes, Comparative Analysis

