Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
Pearson product-moment correlation coefficient between item g and test score X, known as item-test or item-total correlation ("Rit"), and item-rest correlation ("Rir") are two of the most used classical estimators for item discrimination power (IDP). Both "Rit" and "Rir" underestimate IDP caused by the…
Descriptors: Correlation, Test Items, Scores, Difficulty Level
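As a rough illustration of the two estimators named in the abstract above, the sketch below computes "Rit" (item score against total score) and "Rir" (item score against the total with that item removed) for dichotomously scored responses. The function names and the tiny data set are hypothetical and are not taken from the article.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_and_rest(responses, g):
    """Return (Rit, Rir) for item index g.

    responses: one list of 0/1 item scores per examinee (hypothetical layout).
    """
    item = [row[g] for row in responses]
    total = [sum(row) for row in responses]
    # The rest score excludes the item itself from the total.
    rest = [t - i for t, i in zip(total, item)]
    return pearson(item, total), pearson(item, rest)


# Made-up responses: 5 examinees, 3 items.
responses = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
rit, rir = item_total_and_rest(responses, 0)
print(round(rit, 3), round(rir, 3))
```

In this example Rit comes out larger than Rir, because the item contributes to its own total score.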
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
The Mantel-Haenszel delta difference (MH D-DIF) and the standardized proportion difference (STD P-DIF) are two observed-score methods that have been used to assess differential item functioning (DIF) at Educational Testing Service since the early 1990s. Latent-variable approaches to assessing measurement invariance at the item level have been…
Descriptors: Test Bias, Educational Testing, Statistical Analysis, Item Response Theory
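For readers unfamiliar with MH D-DIF, here is a minimal sketch of the usual textbook form: a Mantel-Haenszel common odds ratio pooled over matched score levels, rescaled to the delta metric as -2.35 times its natural log. The tuple layout and the counts are hypothetical, and the report itself may apply refinements not shown here.

```python
import math


def mh_d_dif(strata):
    """MH D-DIF from per-score-level 2x2 counts.

    strata: iterable of (ref_right, ref_wrong, focal_right, focal_wrong),
    one tuple per matched total-score level (hypothetical layout).
    """
    num = den = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        num += ref_right * focal_wrong / n
        den += ref_wrong * focal_right / n
    alpha_mh = num / den  # common odds ratio across score levels
    return -2.35 * math.log(alpha_mh)


# Made-up counts for two score levels; the item favors the reference group.
print(round(mh_d_dif([(20, 10, 10, 20), (30, 5, 20, 10)]), 3))
```

Under this sign convention, negative values indicate an item that is harder for the focal group than for matched reference-group examinees.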
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
Kelley's Discrimination Index (DI) is a simple and robust, classical non-parametric short-cut to estimate the item discrimination power (IDP) in the practical educational settings. Unlike item-total correlation, DI can reach the ultimate values of +1 and -1, and it is stable against the outliers. Because of the computational easiness, DI is…
Descriptors: Test Items, Computation, Item Analysis, Nonparametric Statistics
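A minimal sketch of an upper-lower discrimination index in the spirit of Kelley's DI: examinees are ranked by total score, and the proportion answering the item correctly in the bottom group is subtracted from the proportion in the top group. The 27% group size is the classical convention and is an assumption here, as are the function name and the data; the abstract does not state the cut used.

```python
def kelley_di(responses, g, fraction=0.27):
    """Upper-lower discrimination index for item index g.

    responses: one list of 0/1 item scores per examinee (hypothetical layout).
    fraction: share of examinees in each extreme group (assumed 27%).
    Ties in total score are broken by input order, which is arbitrary.
    """
    ranked = sorted(responses, key=sum)
    n = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(row[g] for row in upper) / n
    p_lower = sum(row[g] for row in lower) / n
    return p_upper - p_lower


# Made-up data: 10 examinees, 4 items.
responses = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0],
    [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0],
    [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0],
]
print(kelley_di(responses, 0))
```

Because the index is a difference of two proportions, it can reach +1 or -1 when the extreme groups answer the item in opposite ways, which is the property the abstract contrasts with item-total correlation.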
Quaigrain, Kennedy; Arhin, Ato Kwamina – Cogent Education, 2017
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
Descriptors: Item Analysis, Teacher Developed Materials, Test Reliability, Educational Assessment
Sinharay, Sandip – Grantee Submission, 2019
Benefiting from item preknowledge (e.g., McLeod, Lewis, & Thissen, 2003) is a major type of fraudulent behavior during educational assessments. This paper suggests a new statistic that can be used for detecting the examinees who may have benefitted from item preknowledge using their response times. The statistic quantifies the difference in…
Descriptors: Test Items, Cheating, Reaction Time, Identification
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
Boyd, Donald; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James – Journal of Educational and Behavioral Statistics, 2013
Test-based accountability as well as value-added assessments and much experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet, we know little regarding fundamental properties of these tests, an important example being the extent of measurement error and its implications for…
Descriptors: Accountability, Educational Research, Educational Testing, Error of Measurement
Zwick, Rebecca; Ye, Lei; Isham, Steven – ETS Research Report Series, 2013
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. Although it is often assumed that refinement of the matching criterion always provides more accurate DIF results, the actual situation proves to be more complex. To explore the effectiveness of refinement, we…
Descriptors: Test Bias, Statistical Analysis, Simulation, Educational Testing
Lai, Cheng-Fei; Irvin, P. Shawn; Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2012
In this technical report, we present the results of a reliability study of the third-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
Descriptors: Grade 3, Curriculum Based Assessment, Educational Testing, Testing Programs
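The easyCBM reports listed here mention split-half reliability without giving details in the abstracts. One common way to compute it, sketched below, is to correlate odd-item and even-item half scores and step the result up with the Spearman-Brown formula. The odd/even split, the step-up correction, the function names, and the data are all assumptions for illustration, not a description of the reports' procedure.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(responses):
    """Odd-even split-half correlation with Spearman-Brown correction."""
    half_a = [sum(row[0::2]) for row in responses]  # items at positions 0, 2, ...
    half_b = [sum(row[1::2]) for row in responses]  # items at positions 1, 3, ...
    r_half = pearson(half_a, half_b)
    # Spearman-Brown: projected reliability of the full-length test.
    return 2 * r_half / (1 + r_half)


# Made-up responses: 5 examinees, 4 items.
responses = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0]]
print(round(split_half_reliability(responses), 3))
```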
Park, Bitnara Jasmine; Irvin, P. Shawn; Lai, Cheng-Fei; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2012
In this technical report, we present the results of a reliability study of the fifth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
Descriptors: Grade 5, Curriculum Based Assessment, Educational Testing, Testing Programs
Park, Bitnara Jasmine; Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Tindal, Gerald – Behavioral Research and Teaching, 2012
In this technical report, we present the results of a reliability study of the fourth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
Descriptors: Grade 4, Curriculum Based Assessment, Educational Testing, Testing Programs
Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Lai, Cheng-Fei; Tindal, Gerald – Behavioral Research and Teaching, 2012
In this technical report, we present the results of a reliability study of the sixth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
Descriptors: Grade 6, Grade 3, Curriculum Based Assessment, Educational Testing
Liu, Hsin-min – ProQuest LLC, 2014
One of the fundamental problems in language testing is the lack of adequate generalizability between what a test is measuring and what fulfills the learners' real world language use needs. It is important to recognize that no matter how precise a test measures a construct, if the way that a construct is defined and the way that test tasks are…
Descriptors: Reading Tests, Language Tests, Task Analysis, Generalizability Theory
Han, Kyung T. – Practical Assessment, Research & Evaluation, 2012
For several decades, the "three-parameter logistic model" (3PLM) has been the dominant choice for practitioners in the field of educational measurement for modeling examinees' response data from multiple-choice (MC) items. Past studies, however, have pointed out that the c-parameter of 3PLM should not be interpreted as a guessing…
Descriptors: Statistical Analysis, Models, Multiple Choice Tests, Guessing (Tests)
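For reference, the 3PLM item response function in its standard form can be sketched as follows. The parameter names a, b, and c follow common usage; the optional scaling constant of 1.7 is omitted, and the parameter values are made up. The comment on c describes it only as a lower asymptote, in line with the abstract's caution against reading it as a guessing probability.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote of the curve (not necessarily a guessing rate).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Made-up item parameters.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_3pl(theta, a=1.0, b=0.0, c=0.2), 3))
```

When theta equals b, the probability is halfway between c and 1; at very low ability it approaches c rather than 0.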
Gill, Brian; Bruch, Julie; Booker, Kevin – Regional Educational Laboratory Mid-Atlantic, 2013
States are increasingly interested in including measures of student achievement growth, or "value-added," in evaluating teachers. Annual state assessments, however, which are the typical measure of student growth, usually cover only reading and math teachers and only in grades 4-8. These state assessments thus cannot…
Descriptors: Teacher Evaluation, Teacher Competencies, Evaluation Methods, Educational Testing

