Publication Date
In 2025 | 7 |
Since 2024 | 16 |
Since 2021 (last 5 years) | 45 |
Since 2016 (last 10 years) | 130 |
Since 2006 (last 20 years) | 399 |
Descriptor
Test Theory | 1165 |
Test Items | 262 |
Test Reliability | 252 |
Test Construction | 246 |
Test Validity | 245 |
Psychometrics | 182 |
Scores | 176 |
Item Response Theory | 167 |
Foreign Countries | 160 |
Item Analysis | 141 |
Statistical Analysis | 134 |
Location
United States | 17 |
United Kingdom (England) | 15 |
Canada | 14 |
Australia | 13 |
Turkey | 12 |
Sweden | 8 |
United Kingdom | 8 |
Netherlands | 7 |
Texas | 7 |
New York | 6 |
Taiwan | 6 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 4 |
Elementary and Secondary… | 3 |
Individuals with Disabilities… | 3 |
Gerhard Tutz; Pascal Jordan – Journal of Educational and Behavioral Statistics, 2024
A general framework of latent trait item response models for continuous responses is given. In contrast to classical test theory (CTT) models, which traditionally distinguish between true scores and error scores, the responses are clearly linked to latent traits. It is shown that CTT models can be derived as special cases, but the model class is…
Descriptors: Item Response Theory, Responses, Scores, Models
Kentaro Fukushima; Nao Uchida; Kensuke Okada – Journal of Educational and Behavioral Statistics, 2025
Diagnostic tests are typically administered in a multiple-choice (MC) format due to their advantages of objectivity and time efficiency. The MC-deterministic input, noisy "and" gate (DINA) family of models, a representative class of cognitive diagnostic models for MC items, efficiently and parsimoniously estimates the mastery profiles of…
Descriptors: Diagnostic Tests, Cognitive Measurement, Multiple Choice Tests, Educational Assessment
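For readers unfamiliar with the DINA family, the sketch below computes response probabilities under the standard binary DINA model. It is not the MC-DINA extension the abstract refers to. Here η_ij = ∏_k α_ik^(q_jk) indicates whether person i has mastered every attribute item j requires, and P(X_ij = 1) = (1 − s_j)^η_ij · g_j^(1 − η_ij). NumPy is assumed, and the function name, Q-matrix and parameter values are hypothetical.

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """P(correct) under the standard (binary) DINA model.

    alpha: (n_persons, n_attributes) 0/1 attribute-mastery profiles
    q:     (n_items, n_attributes) 0/1 Q-matrix
    slip, guess: (n_items,) slip and guessing parameters
    """
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    # eta[i, j] is True when person i has mastered every attribute item j requires
    eta = (alpha @ q.T) == q.sum(axis=1)
    # masters succeed unless they slip; non-masters succeed only by guessing
    return np.where(eta, 1.0 - np.asarray(slip, float), np.asarray(guess, float))


q = [[1, 0], [0, 1], [1, 1]]      # hypothetical Q-matrix: 3 items, 2 attributes
alpha = [[1, 0], [1, 1]]          # person 0 masters attribute 1 only; person 1 masters both
slip = [0.1, 0.2, 0.1]
guess = [0.2, 0.25, 0.1]
p = dina_prob(alpha, q, slip, guess)
print(p)  # rows: [0.9, 0.25, 0.1] and [0.9, 0.8, 0.9]
```

Person 0 lacks the second attribute, so only item 1 is answered at the "master" rate; the other two items fall back to their guessing parameters.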
Kylie Gorney; Sandip Sinharay – Educational and Psychological Measurement, 2025
Test-takers, policymakers, teachers, and institutions are increasingly demanding that testing programs provide more detailed feedback regarding test performance. As a result, there has been a growing interest in the reporting of subscores that potentially provide such detailed feedback. Haberman developed a method based on classical test theory…
Descriptors: Scores, Test Theory, Test Items, Testing
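The abstract is cut off before describing Haberman's method. As a rough sketch under classical test theory assumptions: the commonly cited criterion says a subscore has added value when its proportional reduction in mean squared error (PRMSE) for predicting the true subscore exceeds that of the total score. The code assumes coefficient alpha as the subscore reliability and uncorrelated measurement errors. The function names and simulated data are hypothetical, and this is not the procedure Gorney and Sinharay propose.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_persons, n_items) score matrix."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def prmse_comparison(sub_items, rest_items):
    """Haberman-style check: does the observed subscore predict the true
    subscore better than the observed total score does?"""
    s = np.asarray(sub_items, float).sum(axis=1)    # observed subscore
    r = np.asarray(rest_items, float).sum(axis=1)   # total score minus subscore
    x = s + r                                       # observed total score
    rel_s = cronbach_alpha(sub_items)               # assumption: alpha as subscore reliability
    var_tau = rel_s * s.var(ddof=1)                 # estimated true-subscore variance
    # With uncorrelated errors, Cov(tau_s, X) = Var(tau_s) + Cov(S, R)
    cov_tau_x = var_tau + np.cov(s, r)[0, 1]
    prmse_s = rel_s                                 # PRMSE of the subscore equals its reliability
    prmse_x = cov_tau_x**2 / (var_tau * x.var(ddof=1))
    return prmse_s, prmse_x, prmse_s > prmse_x


# Hypothetical unidimensional data: 1000 persons, 12 binary items, first 4 form the subscore
rng = np.random.default_rng(0)
theta = rng.normal(size=1000)
b = np.linspace(-1.5, 1.5, 12)
probs = 1 / (1 + np.exp(-(theta[:, None] - b)))
items = (rng.random((1000, 12)) < probs).astype(int)
prmse_s, prmse_x, added_value = prmse_comparison(items[:, :4], items[:, 4:])
print(prmse_s, prmse_x, added_value)
```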
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2022
A testlet comprises a set of items based on a common stimulus. When testlets are used in a test, the local independence assumption may be violated, and in this case it would not be appropriate to apply traditional item response theory models to tests that include testlets. When testlets are discussed, one of the most…
Descriptors: Test Items, Test Theory, Models, Sample Size
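One common way to check whether items sharing a stimulus violate local independence is Yen's Q3 statistic: the correlation, across persons, between item residuals after fitting a unidimensional IRT model. This is a swapped-in diagnostic, not the method of the study above. The sketch assumes NumPy, simulates a hypothetical testlet effect, and uses the generating abilities and difficulties instead of estimating them.

```python
import numpy as np

def yen_q3(responses, fitted_probs):
    """Yen's Q3: correlations between item residuals x_ij - P_ij.
    Large positive values flag locally dependent item pairs."""
    resid = np.asarray(responses, float) - np.asarray(fitted_probs, float)
    return np.corrcoef(resid, rowvar=False)   # columns (items) are the variables


# Hypothetical simulation: items 0-2 share a stimulus, items 3-5 are independent
rng = np.random.default_rng(1)
n, k = 2000, 6
theta = rng.normal(size=n)
b = np.linspace(-1, 1, k)
gamma = rng.normal(size=n)                    # person-specific testlet effect
in_testlet = np.array([1, 1, 1, 0, 0, 0])
true_logit = theta[:, None] - b + gamma[:, None] * in_testlet
x = (rng.random((n, k)) < 1 / (1 + np.exp(-true_logit))).astype(int)

# Residuals from a plain Rasch model that ignores the testlet
# (true theta and b are used to skip estimation, purely for illustration)
fitted = 1 / (1 + np.exp(-(theta[:, None] - b)))
q3 = yen_q3(x, fitted)
print(np.round(q3, 2))
```

Pairs inside the testlet should show clearly positive Q3 values, while pairs of independent items should stay near zero.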
Michelle D. Lazarus; Mahbub Sarkar; Claire Palermo; Sze-Ee Soh; Melanie K. Farlie – Anatomical Sciences Education, 2025
Worldviews influence research--from design to interpretation and reporting. Historically, psychometrics has been predominantly situated within a positivist paradigm, while social research has often aligned with interpretivist or critical paradigms. However, emerging perspectives in the philosophy-of-science are challenging this rigid alignment,…
Descriptors: World Views, Psychometrics, Allied Health Occupations Education, Educational Research
Nathaniel Owen; Ananda Senel – Review of Education, 2025
Transparency in high-stakes English language assessment has become crucial for ensuring fairness and maintaining assessment validity in language testing. However, our understanding of how transparency is conceptualised and implemented remains fragmented, particularly in relation to stakeholder experiences and technological innovations. This study…
Descriptors: Accountability, High Stakes Tests, Language Tests, Computer Assisted Testing
Osman Tat; Abdullah Faruk Kilic – Turkish Online Journal of Distance Education, 2024
The widespread availability of internet access in daily life has resulted in a greater acceptance of online assessment methods. E-assessment platforms offer various features such as randomizing questions and answers, utilizing extensive question banks, setting time limits, and managing access during online exams. Electronic assessment enables…
Descriptors: Test Construction, Test Validity, Test Reliability, Anxiety
Eray Selçuk; Ergül Demir – International Journal of Assessment Tools in Education, 2024
This research aims to compare the ability and item parameter estimates of item response theory obtained with maximum likelihood and Bayesian approaches under different Monte Carlo simulation conditions. For this purpose, depending on changes in the prior distribution type, sample size, test length, and logistic model, the ability and item…
Descriptors: Item Response Theory, Item Analysis, Test Items, Simulation
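The contrast between the two estimation approaches can be sketched for ability estimation alone, assuming the 2PL item parameters are already known. The study also estimated item parameters, which this sketch does not. Maximum likelihood is found by grid search. The Bayesian estimate is the expected a posteriori (EAP) mean under a standard normal prior, computed by quadrature over the same range. All item parameters and function names are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_lik(grid, x, a, b):
    """Log-likelihood of response vector x at each theta in grid."""
    p = p_2pl(grid[:, None], a, b)                 # shape (len(grid), n_items)
    return (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)

def mle_theta(x, a, b, grid=np.linspace(-4, 4, 8001)):
    """Maximum likelihood estimate by brute-force grid search."""
    return grid[np.argmax(log_lik(grid, x, a, b))]

def eap_theta(x, a, b, grid=np.linspace(-4, 4, 161)):
    """EAP estimate and posterior SD under a standard normal prior."""
    ll = log_lik(grid, x, a, b)
    post = np.exp(-0.5 * grid**2 + ll - ll.max())  # unnormalised posterior
    post /= post.sum()
    mean = np.sum(grid * post)
    sd = np.sqrt(np.sum((grid - mean) ** 2 * post))
    return mean, sd


# Hypothetical item parameters for a 5-item test
a = np.array([1.0, 1.5, 0.8, 1.2, 2.0])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

mixed = np.array([1, 1, 1, 0, 0])
perfect = np.ones(5)
print("mixed:   ML", mle_theta(mixed, a, b), "EAP", eap_theta(mixed, a, b))
print("perfect: ML", mle_theta(perfect, a, b), "EAP", eap_theta(perfect, a, b))
```

For the all-correct pattern the likelihood keeps rising with θ, so the grid search simply returns the grid boundary (no finite ML estimate exists), whereas the prior keeps the EAP estimate finite.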
Stemler, Steven E.; Naples, Adam – Practical Assessment, Research & Evaluation, 2021
When students receive the same score on a test, does that mean they know the same amount about the topic? The answer to this question is more complex than it may first appear. This paper compares classical and modern test theories in terms of how they estimate student ability. Crucial distinctions between the aims of Rasch Measurement and IRT are…
Descriptors: Item Response Theory, Test Theory, Ability, Computation
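A small numerical illustration of the question the abstract opens with: two hypothetical students each answer three of five items correctly. Under the Rasch model the sum score is a sufficient statistic, so both receive the same maximum likelihood ability estimate. Under a 2PL model the sufficient statistic is the discrimination-weighted score Σ a_j x_j. The student who succeeded on the more discriminating items (here, by assumption, the harder ones) therefore receives a higher estimate. Item parameters are hypothetical, and the grid-search estimator is a simplification.

```python
import numpy as np

GRID = np.linspace(-4, 4, 8001)

def mle_theta(x, a, b):
    """Grid-search ML ability estimate under a 2PL model
    (the Rasch model is the special case a = 1 for every item)."""
    p = 1.0 / (1.0 + np.exp(-a * (GRID[:, None] - b)))
    ll = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)
    return GRID[np.argmax(ll)]


b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])       # hypothetical difficulties
rasch_a = np.ones(5)
twopl_a = np.array([0.5, 0.8, 1.0, 1.5, 2.0])   # hypothetical discriminations

easy_three = np.array([1, 1, 1, 0, 0])          # sum score 3: easiest items correct
hard_three = np.array([0, 0, 1, 1, 1])          # sum score 3: hardest items correct

for name, disc in [("Rasch", rasch_a), ("2PL", twopl_a)]:
    print(name, mle_theta(easy_three, disc, b), mle_theta(hard_three, disc, b))
```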
Bruno D. Zumbo – International Journal of Assessment Tools in Education, 2023
In line with the journal volume's theme, this essay considers lessons from the past and visions for the future of test validity. In the first part of the essay, a description of historical trends in test validity since the early 1900s leads to the natural question of whether the discipline has progressed in its definition and description of test…
Descriptors: Test Theory, Test Validity, True Scores, Definitions
Arce, Alvaro J.; Young, Michael J. – International Journal of Testing, 2022
The paper argues that contemporary test validity theory places the consequences of testing on the lives of all college applicants at the back of the test validation argument. It introduces the notion of test efficacy as a process for gathering evidence on claims about the consequences of testing for all college applicants that can be traced back to validity.…
Descriptors: Test Validity, Test Theory, College Applicants, College Entrance Examinations
Daniel M. Settlage; Jim R. Wollscheid – Journal of the Scholarship of Teaching and Learning, 2024
The examination of the testing mode effect has received increased attention as higher education has shifted to remote testing during the COVID-19 pandemic. We believe the testing mode effect consists of four components: the ability to physically write on the test, the method of answer recording, the proctoring/testing environment, and the effect…
Descriptors: College Students, Macroeconomics, Tests, Answer Sheets
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
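To make the comparison in the abstract concrete, the sketch below computes one agreement coefficient (Cohen's kappa for two raters) and the generalizability coefficients for a crossed persons × raters design with one score per cell. Variance components come from the usual two-way ANOVA expected mean squares. This is not the authors' analysis: the ratings and function names are hypothetical, and negative variance estimates are simply set to zero.

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters assigning nominal categories."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_obs = np.mean(r1 == r2)
    # chance agreement from each rater's marginal category proportions
    p_exp = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
    return (p_obs - p_exp) / (1 - p_exp)

def g_study_pxr(scores):
    """Crossed persons x raters G study, one score per cell, random effects."""
    x = np.asarray(scores, float)
    n_p, n_r = x.shape
    grand = x.mean()
    row, col = x.mean(axis=1), x.mean(axis=0)
    ss_p = n_r * np.sum((row - grand) ** 2)
    ss_r = n_p * np.sum((col - grand) ** 2)
    ss_res = np.sum((x - row[:, None] - col[None, :] + grand) ** 2)
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    # variance components from expected mean squares; negatives set to 0
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    var_pr_e = ms_res
    return {
        "var_p": var_p, "var_r": var_r, "var_pr_e": var_pr_e,
        # relative (norm-referenced) G coefficient for the mean over n_r raters
        "g": var_p / (var_p + var_pr_e / n_r),
        # absolute (criterion-referenced) dependability coefficient
        "phi": var_p / (var_p + (var_r + var_pr_e) / n_r),
    }


ratings = [[4, 4, 5], [2, 3, 2], [5, 5, 5], [1, 2, 1], [3, 3, 4]]   # hypothetical scores
print(cohens_kappa([r[0] for r in ratings], [r[1] for r in ratings]))
print(g_study_pxr(ratings))
```

When the persons component is not truncated, the relative G coefficient here reduces to (MS_p − MS_res)/MS_p, the same quantity as the consistency-type intraclass correlation for averaged ratings.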
Chakrabartty, Satyendra Nath – International Journal of Psychology and Educational Studies, 2021
The paper proposes new measures of the difficulty and discrimination values of binary items and of tests consisting of such items, and finds their relationships, including estimation of the test error variance and thereby the test reliability as per its definition, using cosine similarities. The measures use the entire data. The difficulty value of a test and of an item is defined…
Descriptors: Test Items, Difficulty Level, Scores, Test Reliability
Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023
We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed that, for unidimensional scales, (a) all indices except omega h performed similarly well under most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small…
Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length
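The relationship between alpha and omega total can be shown at the population level without any estimation. Assuming a one-factor model with uncorrelated errors, the sketch builds the implied covariance matrix from hypothetical loadings and computes both indices. With equal loadings (tau-equivalence) they coincide; with unequal loadings alpha falls below omega. The other indices in the study (omega RT, omega h, GLB, coefficient H) are not implemented here.

```python
import numpy as np

def implied_cov(loadings, uniquenesses):
    """Covariance matrix implied by a one-factor model with uncorrelated errors."""
    lam = np.asarray(loadings, float)
    return np.outer(lam, lam) + np.diag(uniquenesses)

def alpha_from_cov(cov):
    """Coefficient alpha computed from an item covariance matrix."""
    cov = np.asarray(cov, float)
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def omega_total(loadings, uniquenesses):
    """McDonald's omega total for a one-factor model."""
    common = np.sum(loadings) ** 2
    return common / (common + np.sum(uniquenesses))


# Hypothetical congeneric scale: unequal standardized loadings
lam = np.array([0.3, 0.5, 0.7, 0.9])
psi = 1 - lam**2
print(alpha_from_cov(implied_cov(lam, psi)), omega_total(lam, psi))  # alpha ≈ 0.677, omega ≈ 0.709

# Tau-equivalent scale: equal loadings, where alpha and omega coincide
lam_eq = np.full(4, 0.7)
psi_eq = 1 - lam_eq**2
print(alpha_from_cov(implied_cov(lam_eq, psi_eq)), omega_total(lam_eq, psi_eq))
```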