Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 3 |
| Since 2007 (last 20 years) | 32 |
Author
| Author | Count |
| --- | --- |
| Abdulrahman Alshammari | 1 |
| Adkins, Deborah | 1 |
| Alonzo, Alicia C. | 1 |
| Baker, Beverly A. | 1 |
| Baldwin, Su G. | 1 |
| Buffolino, Judy | 1 |
| Camilli, Gregory | 1 |
| Cech, Scott J. | 1 |
| Charles, Jennifer E. | 1 |
| Childs, Ruth A. | 1 |
| Clauser, Brian E. | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 29 |
| Reports - Research | 12 |
| Opinion Papers | 9 |
| Reports - Evaluative | 8 |
| Reports - Descriptive | 2 |
| Dissertations/Theses -… | 1 |
| Numerical/Quantitative Data | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Elementary Secondary Education | 14 |
| Higher Education | 4 |
| Elementary Education | 3 |
| Postsecondary Education | 3 |
| Adult Education | 1 |
| Grade 3 | 1 |
| Grade 4 | 1 |
| Grade 5 | 1 |
| Junior High Schools | 1 |
| Secondary Education | 1 |
Location
| Location | Count |
| --- | --- |
| Arizona | 1 |
| California | 1 |
| Canada | 1 |
| Colorado | 1 |
| Delaware | 1 |
| Denmark | 1 |
| Germany | 1 |
| Ghana | 1 |
| Idaho | 1 |
| Illinois | 1 |
| Indiana | 1 |
Laws, Policies, & Programs
| Law/Policy/Program | Count |
| --- | --- |
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
| Assessment/Survey | Count |
| --- | --- |
| Advanced Placement… | 1 |
| Stanford Achievement Tests | 1 |
Abdulrahman Alshammari – ProQuest LLC, 2024
A critical component of modern software development practices, particularly continuous integration (CI), is halting development activity in response to test failures, which then require further investigation and debugging. As software changes, regression testing becomes vital to verify that new code does not affect existing functionality.…
Descriptors: Computer Software, Programming, Coding, Test Reliability
Mücahit Öztürk – Open Praxis, 2024
This study examined the problems that pre-service teachers face in the online assessment process and their suggestions for solutions to these problems. The participants were 136 pre-service teachers who have been experiencing online assessment for a long time and who took the Foundations of Open and Distance Learning course. This research is a…
Descriptors: Foreign Countries, Preservice Teacher Education, Preservice Teachers, Distance Education
Sinharay, Sandip – Applied Measurement in Education, 2017
Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H[superscript T]" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…
Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis
Orrill, Chandra Hawley; Kim, Ok-Kyeong; Peters, Susan A.; Lischka, Alyson E.; Jong, Cindy; Sanchez, Wendy B.; Eli, Jennifer A. – Mathematics Teacher Education and Development, 2015
Developing and writing assessment items that measure teachers' knowledge is an intricate and complex undertaking. In this paper, we begin with an overview of what is known about measuring teacher knowledge. We then highlight the challenges inherent in creating assessment items that focus specifically on measuring teachers' specialised knowledge…
Descriptors: Specialization, Knowledge Base for Teaching, Educational Strategies, Testing Problems
Gergen, Kenneth J.; Dixon-Román, Ezekiel J. – Teachers College Record, 2014
In the present offering we challenge the presumption that the educational testing of students provides objective information about such students. This presumption largely rests on an empiricist account of science. In light of mounting criticism, however, empiricist foundationalism has given way to a social epistemology. From this standpoint,…
Descriptors: Epistemology, Educational Testing, Test Validity, Evaluation Utilization
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Mori, Kazuo; Uchida, Akitoshi – Research in Education, 2012
Longitudinal change in the average Z scores for four groups of pupils sorted by quartiles was examined for its stability over three years. The data, collected from 1998 to 2009, were obtained from nine cohorts of Japanese junior high school pupils totaling 1,962 subjects. They showed illusory declines among the mid-range pupils but improvements…
Descriptors: Foreign Countries, Junior High School Students, Cohort Analysis, Evaluation Problems
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
Makransky, Guido; Glas, Cees A. W. – International Journal of Testing, 2013
Cognitive ability tests are widely used in organizations around the world because they have high predictive validity in selection contexts. Although these tests typically measure several subdomains, testing is usually carried out for a single subdomain at a time. This can be ineffective when the subdomains assessed are highly correlated. This…
Descriptors: Foreign Countries, Cognitive Ability, Adaptive Testing, Feedback (Response)
Penfield, Randall D.; Gattamorta, Karina; Childs, Ruth A. – Educational Measurement: Issues and Practice, 2009
Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item-level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has…
Descriptors: Test Bias, Test Items, Evaluation Methods, Scores
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
de La Torre, Jimmy; Karelitz, Tzur M. – Journal of Educational Measurement, 2009
Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM-based data with a linear attribute structure. The study utilizes a procedure…
Descriptors: Simulation, Item Response Theory, Psychometrics, Evaluation Methods
Cech, Scott J. – Education Week, 2008
There's a war of sorts going on within the normally staid assessment industry, and it's a war over the definition of a type of assessment that many educators understand in only a sketchy fashion. Formative assessments, also known as "classroom assessments," are in some ways easier to define by what they are not. They're not like the long,…
Descriptors: Formative Evaluation, Testing, Evaluation Problems, Testing Problems
Harmon, Oskar R.; Lambrinos, James; Buffolino, Judy – Online Journal of Distance Learning Administration, 2010
Many consider online courses to be an inferior alternative to traditional face-to-face (f2f) courses because exam cheating is thought to occur more often in online courses. This study examines how the assessment design in online courses contributes to this perception. Following a literature review, the assessment design in a sample of online…
Descriptors: Electronic Learning, Student Attitudes, Cheating, Online Courses
Baker, Beverly A. – Assessing Writing, 2010
In high-stakes writing assessments, rater training in the use of a rating scale does not eliminate variability in grade attribution. This realisation has been accompanied by research that explores possible sources of rater variability, such as rater background or rating scale type. However, there has been little consideration thus far of…
Descriptors: Foreign Countries, Writing Evaluation, Writing Tests, Testing