| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 4 |
| Descriptor | Records |
| --- | --- |
| Item Analysis | 41 |
| Test Reliability | 41 |
| Testing Problems | 41 |
| Test Validity | 21 |
| Test Items | 17 |
| Test Construction | 15 |
| Multiple Choice Tests | 8 |
| Response Style (Tests) | 8 |
| Test Interpretation | 8 |
| Error of Measurement | 7 |
| Scoring | 7 |
| Education Level | Records |
| --- | --- |
| Elementary Secondary Education | 1 |
| Audience | Records |
| --- | --- |
| Practitioners | 2 |
| Researchers | 2 |
| Teachers | 1 |
| Location | Records |
| --- | --- |
| Indonesia | 1 |
Djiwandono, Patrisius Istiarto; Ginting, Daniel – Language Education & Assessment, 2025
The teaching of English as a foreign language in Indonesia has a long history, and it is always important to ask whether the assessment of students' language skills has been valid and reliable. A screening of many articles in several prominent databases reveals that a number of evaluation studies have been done by Indonesian scholars in the…
Descriptors: Foreign Countries, Language Tests, English (Second Language), Second Language Learning
Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2014
A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Descriptors: Test Items, Test Bias, Simulation, Hypothesis Testing
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Peer reviewed
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Item-discrimination indices are numbers calculated from test data that are used in assessing the effectiveness of individual test questions. This article asserts that the indices are so unreliable as to suggest that countless good questions may have been discarded over the years. It considers how the indices, and hence overall test reliability,…
Descriptors: Guessing (Tests), Item Analysis, Test Reliability, Testing Problems
Peer reviewed
van den Wollenberg, Arnold L. – Psychometrika, 1982
Presently available test statistics for the Rasch model are shown to be insensitive to violations of the assumption of test unidimensionality. Two new statistics are presented. One is similar to available statistics, but with some improvements; the other addresses the problem of insensitivity to unidimensionality. (Author/JKS)
Descriptors: Item Analysis, Latent Trait Theory, Statistics, Test Reliability
Peer reviewed
Kuncel, Ruth Boutin; Fiske, Donald W. – Educational and Psychological Measurement, 1974
Four hypotheses regarding stability of response process and response in personality testing are tested and supported. (RC)
Descriptors: College Students, Item Analysis, Personality Measures, Response Style (Tests)
Peer reviewed
Andrulis, Richard S.; And Others – Educational and Psychological Measurement, 1978
The effects of repeaters (testees included in both administrations of two forms of a test) on the test equating process are examined. It is shown that repeaters do affect test equating and tend to lower the cutoff point for passing the test. (JKS)
Descriptors: Cutting Scores, Equated Scores, Item Analysis, Scoring
Strickland, Guy – 1970
This report summarizes the findings of Jackson and Lahadern who used a revised form of the Student Opinion Poll (SOP) and a questionnaire to study the intercorrelations of attitudes and achievement. The study found that: (1) first graders have attitudes toward school work but these attitudes were not differentiated toward specific school subjects;…
Descriptors: Achievement, Attitudes, Evaluation, Item Analysis
Peer reviewed
Rusch, Reuben; Steiner, Judith – Journal of Experimental Education, 1979
The Selected Marker Tests were examined for scoring problems and internal consistency and were administered orally to sixth and seventh graders. Scoring problems were discovered and changes were suggested. The problem was found to be item reliability rather than interrater reliability. (Author/MH)
Descriptors: Cognitive Tests, Elementary Education, Item Analysis, Problem Solving
Peer reviewed
Barnette, J. Jackson; And Others – Educational Research Quarterly, 1978
The DELPHI procedure requires respondents to reply to several questionnaire iterations with subsequent rounds containing previous round feedback. This study investigated the methodology (response rates, effects of feedback) and offered evidence that large-scale DELPHI surveys are not as advantageous as has previously been indicated. Suggestions…
Descriptors: Feedback, Item Analysis, Measurement Techniques, Predictive Measurement
Peer reviewed
Whitely, Susan E. – Journal of Educational Measurement, 1977
A debate concerning specific issues and the general usefulness of the Rasch latent trait test model is continued. Methods of estimation, necessary sample size, and the applicability of the model are discussed. (JKS)
Descriptors: Error of Measurement, Item Analysis, Mathematical Models, Measurement
Peer reviewed
Wright, Benjamin D. – Journal of Educational Measurement, 1977
Statements made in a previous article of this journal concerning the Rasch latent trait test model are questioned. Methods of estimation, necessary sample sizes, several formulae, and the general usefulness of the Rasch model are discussed. (JKS)
Descriptors: Computers, Error of Measurement, Item Analysis, Mathematical Models
Miller, Harry G.; Williams, Reed G. – Educational Technology, 1973
Descriptors: Content Analysis, Item Analysis, Measurement Techniques, Multiple Choice Tests
Linn, Robert – 1978
A series of studies on conceptual and design problems in competency-based measurement is described. The concept of validity within the context of criterion-referenced measurement is reviewed. The authors believe validation should be viewed as a process rather than an end product: the process of marshalling evidence to support…
Descriptors: Criterion Referenced Tests, Item Analysis, Item Sampling, Test Bias

