ERIC - Search Results

Showing 1 to 15 of 42 results

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Distractor Analysis for Multiple-Choice Tests: An Empirical Study with International Language Assessment Data. Research Report. ETS RR-19-39

Peer reviewed

Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019

Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and a NR model for selections of incorrect…

Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests

Reliability and Validity of International Large-Scale Assessment: Understanding IEA's Comparative Studies of Student Achievement. IEA Research for Education. Volume 10

Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020

Although International Association for the Evaluation of Educational Achievement-pioneered international large-scale assessment (ILSA) of education is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…

Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis

Is It Really Possible to Test All Educationally Significant Achievements with High Levels of Reliability?

Peer reviewed

Davis, Andrew – Ethics and Education, 2015

PISA claims that it can extend its reach from its current core subjects of Reading, Science, Maths and problem-solving. Yet given the requirement for high levels of reliability for PISA, especially in the light of its current high stakes character, proposed widening of its subject coverage cannot embrace some important aspects of the social and…

Descriptors: International Assessment, High Stakes Tests, Reliability, Academic Achievement

Between-Person and Within-Person Subscore Reliability: Comparison of Unidimensional and Multidimensional IRT Models

Bulut, Okan – ProQuest LLC, 2013

The importance of subscores in educational and psychological assessments is undeniable. Subscores yield diagnostic information that can be used for determining how each examinee's abilities/skills vary over different content domains. One of the most common criticisms about reporting and using subscores is insufficient reliability of subscores.…

Descriptors: Item Response Theory, Simulation, Correlation, Reliability

Comparative Judgement for Assessment

Peer reviewed

Pollitt, Alastair – International Journal of Technology and Design Education, 2012

Historically speaking, students were judged long before they were marked. The tradition of marking, or scoring, pieces of work students offer for assessment is little more than two centuries old, and was introduced mainly to cope with specific problems arising from the growth in the numbers graduating from universities as the industrial revolution…

Descriptors: Holistic Evaluation, Educational Assessment, Evaluation Methods, Educational History

A Longitudinal Study on State Mathematics and Reading Assessments: Comparisons of Growth Models on Students' Achievement Scores

Chiu, Pui Chi – ProQuest LLC, 2012

This study examines student growth on mathematics and reading assessments across academic years (Spring 2006 through Spring 2009) using three different growth models: hierarchical linear model (HLM), value-added model (VAM), and student growth percentile model (SGP). Comparisons across these three growth models were conducted to investigate the…

Descriptors: Longitudinal Studies, Mathematics Tests, Reading Tests, Educational Assessment

Comparison of Taiwanese and Canadian Students' Metacognitive Awareness of Science Reading, Text, and Strategies

Peer reviewed

Wang, Jing-Ru; Chen, Shin-Feng; Fang, I.; Chou, Ching-Ting – International Journal of Science Education, 2014

This study used a Chinese-language version of the Index of Science Reading Awareness to explore the science reading metacognition and comprehension of Taiwanese students. Structural equation modelling results confirmed the underlying model comprised three clusters of metacognitive knowledge: beliefs and confidence in science reading, knowledge of…

Descriptors: Foreign Countries, Metacognition, Reading Comprehension, Science Instruction

Comparing Yes/No Angoff and Bookmark Standard Setting Methods in the Context of English Assessment

Peer reviewed

Hsieh, Mingchuan – Language Assessment Quarterly, 2013

The Yes/No Angoff and Bookmark methods for setting standards on educational assessments are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…

Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests

A Study Comparing the Results of a School System's Quarterly Assessments and the State High-Stakes Test in Eighth-Grade Mathematics

Oliver, Linda – ProQuest LLC, 2013

Accountability for student achievement is required by legislation and demanded by the public. Testing is the method of choice for determining student achievement and for informing teachers, parents and students about what students know and still need to learn. State tests that meet the demands of federal legislation have far-reaching consequences…

Descriptors: Educational Assessment, High Stakes Tests, Mathematics Tests, Grade 8

Methodologies for Investigating Item- and Test-Level Measurement Equivalence in International Large-Scale Assessments

Peer reviewed

Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D. – International Journal of Testing, 2012

In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…

Descriptors: Foreign Students, Test Bias, Speech Communication, Effect Size

A Comparison of Value-Added, Ordinary Least Squares Regression, and the California STAR Accountability Indicators

Black, Aime – ProQuest LLC, 2012

Student achievement to reward or sanction schools. These unadjusted accountability indicators do not account for differences in student or school characteristics that contribute to variations in assessment results. Since the "Coleman Report" (1966), a guiding principle in accountability design has been that educational outcomes data…

Descriptors: Least Squares Statistics, Regression (Statistics), Accountability, Educational Indicators

Score Comparability for Language Minority Students on the Content Assessments Used by Two States. Research Report. ETS RR-11-27

Young, John W.; Holtzman, Steven; Steinberg, Jonathan – Educational Testing Service, 2011

In this research investigation of score comparability for language minority students (English language learners [ELLs] and former English language learners), we examined 3 indicators of score comparability (reliability, internal test structure, and differential item functioning) for 4th and 8th grade students who took the NCLB-mandated content…

Descriptors: Language Minorities, Second Language Learning, Grade 8, Minority Group Students

Development of a "Universal" Rubric for Assessing Undergraduates' Scientific Reasoning Skills Using Scientific Writing

Peer reviewed

Timmerman, Briana E. Crotwell; Strickland, Denise C.; Johnson, Robert L.; Payne, John R. – Assessment & Evaluation in Higher Education, 2011

We developed a rubric for measuring students' ability to reason and write scientifically. The Rubric for Science Writing (Rubric) was tested in a variety of undergraduate biology laboratory courses (total n = 142 laboratory reports) using science graduate students (teaching assistants) as raters. Generalisability analysis indicates that the Rubric…

Descriptors: Graduate Students, Science Laboratories, Biology, Writing Skills
