Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 4 |
| Since 2007 (last 20 years) | 20 |
Descriptor
| Comparative Analysis | 42 |
| Educational Assessment | 42 |
| Reliability | 18 |
| Test Reliability | 16 |
| Scores | 14 |
| Foreign Countries | 12 |
| Evaluation Methods | 11 |
| Student Evaluation | 11 |
| Test Validity | 10 |
| Interrater Reliability | 9 |
| Validity | 9 |
Author
| Allan S. Cohen | 1 |
| Anderson, Ronald E. | 1 |
| Banta, Trudy W. | 1 |
| Black, Aime | 1 |
| Bulut, Okan | 1 |
| Chan, David W. | 1 |
| Chen, Shin-Feng | 1 |
| Chiu, Pui Chi | 1 |
| Chou, Ching-Ting | 1 |
| Collier, Chris | 1 |
| Crowley, Susan L. | 1 |
Education Level
| Elementary Education | 7 |
| Higher Education | 6 |
| Postsecondary Education | 6 |
| Elementary Secondary Education | 5 |
| Grade 4 | 4 |
| Grade 8 | 3 |
| Intermediate Grades | 3 |
| Middle Schools | 3 |
| Grade 5 | 2 |
| Grade 6 | 2 |
| Junior High Schools | 2 |
Audience
| Researchers | 1 |
Location
| United Kingdom | 3 |
| United States | 3 |
| Canada | 2 |
| Netherlands | 2 |
| Portugal | 2 |
| Taiwan | 2 |
| United Kingdom (England) | 2 |
| Asia | 1 |
| Australia | 1 |
| Brazil | 1 |
| California | 1 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
| Program for International… | 3 |
| National Assessment of… | 2 |
| Childrens Depression Inventory | 1 |
| College Student Experiences… | 1 |
| International Association for… | 1 |
| Progress in International… | 1 |
| Trends in International… | 1 |
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019
Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and a NR model for selections of incorrect…
Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although International Association for the Evaluation of Educational Achievement-pioneered international large-scale assessment (ILSA) of education is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis
Davis, Andrew – Ethics and Education, 2015
PISA claims that it can extend its reach from its current core subjects of Reading, Science, Maths and problem-solving. Yet given the requirement for high levels of reliability for PISA, especially in the light of its current high stakes character, proposed widening of its subject coverage cannot embrace some important aspects of the social and…
Descriptors: International Assessment, High Stakes Tests, Reliability, Academic Achievement
Bulut, Okan – ProQuest LLC, 2013
The importance of subscores in educational and psychological assessments is undeniable. Subscores yield diagnostic information that can be used for determining how each examinee's abilities/skills vary over different content domains. One of the most common criticisms about reporting and using subscores is insufficient reliability of subscores.…
Descriptors: Item Response Theory, Simulation, Correlation, Reliability
Pollitt, Alastair – International Journal of Technology and Design Education, 2012
Historically speaking, students were judged long before they were marked. The tradition of marking, or scoring, pieces of work students offer for assessment is little more than two centuries old, and was introduced mainly to cope with specific problems arising from the growth in the numbers graduating from universities as the industrial revolution…
Descriptors: Holistic Evaluation, Educational Assessment, Evaluation Methods, Educational History
Chiu, Pui Chi – ProQuest LLC, 2012
This study examines student growth on mathematics and reading assessments across academic years (Spring 2006 through Spring 2009) using three different growth models: hierarchical linear model (HLM), value-added model (VAM), and student growth percentile model (SGP). Comparisons across these three growth models were conducted to investigate the…
Descriptors: Longitudinal Studies, Mathematics Tests, Reading Tests, Educational Assessment
Wang, Jing-Ru; Chen, Shin-Feng; Fang, I.; Chou, Ching-Ting – International Journal of Science Education, 2014
This study used a Chinese-language version of the Index of Science Reading Awareness to explore the science reading metacognition and comprehension of Taiwanese students. Structural equation modelling results confirmed the underlying model comprised three clusters of metacognitive knowledge: beliefs and confidence in science reading, knowledge of…
Descriptors: Foreign Countries, Metacognition, Reading Comprehension, Science Instruction
Hsieh, Mingchuan – Language Assessment Quarterly, 2013
The Yes/No Angoff and Bookmark methods for setting standards on educational assessments are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…
Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests
Oliver, Linda – ProQuest LLC, 2013
Accountability for student achievement is required by legislation and demanded by the public. Testing is the method of choice for determining student achievement and for informing teachers, parents and students about what students know and still need to learn. State tests that meet the demands of federal legislation have far-reaching consequences…
Descriptors: Educational Assessment, High Stakes Tests, Mathematics Tests, Grade 8
Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D. – International Journal of Testing, 2012
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…
Descriptors: Foreign Students, Test Bias, Speech Communication, Effect Size
Black, Aime – ProQuest LLC, 2012
Student achievement to reward or sanction schools. These unadjusted accountability indicators do not account for differences in student or school characteristics that contribute to variations in assessment results. Since the "Coleman Report" (1966), a guiding principle in accountability design has been that educational outcomes data…
Descriptors: Least Squares Statistics, Regression (Statistics), Accountability, Educational Indicators
Young, John W.; Holtzman, Steven; Steinberg, Jonathan – Educational Testing Service, 2011
In this research investigation of score comparability for language minority students (English language learners [ELLs] and former English language learners), we examined 3 indicators of score comparability (reliability, internal test structure, and differential item functioning) for 4th and 8th grade students who took the NCLB-mandated content…
Descriptors: Language Minorities, Second Language Learning, Grade 8, Minority Group Students
Timmerman, Briana E. Crotwell; Strickland, Denise C.; Johnson, Robert L.; Payne, John R. – Assessment & Evaluation in Higher Education, 2011
We developed a rubric for measuring students' ability to reason and write scientifically. The Rubric for Science Writing (Rubric) was tested in a variety of undergraduate biology laboratory courses (total n = 142 laboratory reports) using science graduate students (teaching assistants) as raters. Generalisability analysis indicates that the Rubric…
Descriptors: Graduate Students, Science Laboratories, Biology, Writing Skills

