| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 3 |
| Since 2007 (last 20 years) | 8 |
| Descriptor | Records |
| --- | --- |
| Scoring Formulas | 41 |
| Test Items | 41 |
| Test Reliability | 17 |
| Difficulty Level | 15 |
| Multiple Choice Tests | 15 |
| Item Analysis | 14 |
| Guessing (Tests) | 13 |
| Higher Education | 13 |
| Test Construction | 12 |
| Testing Problems | 10 |
| Scoring | 8 |
| Author | Records |
| --- | --- |
| Plake, Barbara S. | 3 |
| Angoff, William H. | 2 |
| Huynh, Huynh | 2 |
| Schrader, William B. | 2 |
| Weiss, David J. | 2 |
| Aiken, Lewis R. | 1 |
| Alliegro, Marissa C. | 1 |
| Bejar, Isaac I. | 1 |
| Brennan, Robert L. | 1 |
| Brinzer, Raymond J. | 1 |
| Bulut, Okan | 1 |
| Publication Type | Records |
| --- | --- |
| Reports - Research | 41 |
| Journal Articles | 18 |
| Speeches/Meeting Papers | 13 |
| Collected Works - General | 1 |
| Information Analyses | 1 |
| Reports - Evaluative | 1 |
| Tests/Questionnaires | 1 |
| Education Level | Records |
| --- | --- |
| Higher Education | 3 |
| Postsecondary Education | 3 |
| Secondary Education | 1 |
| Audience | Records |
| --- | --- |
| Researchers | 4 |
Yun, Young Ho; Kim, Yaeji; Sim, Jin A.; Choi, Soo Hyuk; Lim, Cheolil; Kang, Joon-ho – Journal of School Health, 2018
Background: The objective of this study was to develop the School Health Score Card (SHSC) and validate its psychometric properties. Methods: The development of the SHSC questionnaire included 3 phases: item generation, construction of domains and items, and field testing with validation. To assess the instrument's reliability and validity, we…
Descriptors: School Health Services, Psychometrics, Test Construction, Test Validity
Morgan, Grant B.; Moore, Courtney A.; Floyd, Harlee S. – Journal of Psychoeducational Assessment, 2018
Although content validity--how well each item of an instrument represents the construct being measured--is foundational in the development of an instrument, statistical validity is also important to the decisions that are made based on the instrument. The primary purpose of this study is to demonstrate how simulation studies can be used to assist…
Descriptors: Simulation, Decision Making, Test Construction, Validity
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin – Review of Educational Research, 2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment that remains in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education focused, specifically, on the development, analysis, and use of the incorrect options, which are also…
Descriptors: Multiple Choice Tests, Difficulty Level, Accuracy, Error Patterns
Zechner, Klaus; Chen, Lei; Davis, Larry; Evanini, Keelan; Lee, Chong Min; Leong, Chee Wee; Wang, Xinhao; Yoon, Su-Youn – ETS Research Report Series, 2015
This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses of a pilot administration of the Test of English-for-Teaching (TEFT™) within the ELTeach™ framework. The test consists of items for all four language modalities:…
Descriptors: Scoring, Scoring Formulas, Speech Communication, Task Analysis
Jancarík, Antonín; Kostelecká, Yvona – Electronic Journal of e-Learning, 2015
Electronic testing has become a regular part of online courses. Most learning management systems offer a wide range of tools that can be used in electronic tests. With respect to time demands, the most efficient tools are those that allow automatic assessment. The presented paper focuses on one of these tools: matching questions in which one…
Descriptors: Online Courses, Computer Assisted Testing, Test Items, Scoring Formulas
Buri, John R.; Cromett, Cristina E.; Post, Maria C.; Landis, Anna Marie; Alliegro, Marissa C. – Online Submission, 2015
Rationale is presented for the derivation of a new measure of stressful life events for use with students [Negative Life Events Scale for Students (NLESS)]. Ten stressful life events questionnaires were reviewed, and the more than 600 items mentioned in these scales were culled based on the following criteria: (a) only long-term and unpleasant…
Descriptors: Experience, Social Indicators, Stress Variables, Affective Measures
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
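For reference, the three-parameter logistic (3PL) model at issue in this exchange adds a lower asymptote ("guessing") parameter to the logistic item response function that Rasch analysis omits; a standard statement of the model (given here as background, not quoted from the article) is
$$P_i(\theta) = c_i + (1 - c_i)\,\frac{\exp\big(a_i(\theta - b_i)\big)}{1 + \exp\big(a_i(\theta - b_i)\big)},$$
where $a_i$ is the discrimination, $b_i$ the difficulty, and $c_i$ the lower asymptote of item $i$; the Rasch model constrains $c_i = 0$ and holds $a_i$ constant across items.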
Taskinen, Päivi H.; Steimel, Jochen; Gräfe, Linda; Engell, Sebastian; Frey, Andreas – Peabody Journal of Education, 2015
This study examined students' competencies in engineering education at the university level. First, we developed a competency model in one specific field of engineering: process dynamics and control. Then, the theoretical model was used as a frame to construct test items to measure students' competencies comprehensively. In the empirical…
Descriptors: Models, Engineering Education, Test Items, Outcome Measures
Peer reviewed: Oltman, Phillip K.; Stricker, Lawrence J. – Language Testing, 1990
A recent multidimensional scaling analysis of the Test of English-as-a-Foreign-Language (TOEFL) item response data identified clusters of items in the test sections that, being more homogeneous than their parent sections, might be better for diagnostic use. The analysis was repeated using different scoring techniques. Results diverged only for…
Descriptors: English (Second Language), Item Analysis, Language Tests, Scaling
Peer reviewed: Willson, Victor L. – Educational and Psychological Measurement, 1982
The Serlin-Kaiser procedure is used to complete a principal components solution for scoring weights for all options of a given item. Coefficient alpha is maximized for a given multiple choice test. (Author/GK)
Descriptors: Analysis of Covariance, Factor Analysis, Multiple Choice Tests, Scoring Formulas
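For context, coefficient alpha for a total score formed from $k$ item (or option-weighted) scores is conventionally defined as
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),$$
where $\sigma_i^{2}$ is the variance of component $i$ and $\sigma_X^{2}$ the variance of the total score; the option weights described in the abstract are chosen so that this quantity is as large as possible.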
Brinzer, Raymond J. – 1979
The problem engendered by the Matching Familiar Figures (MFF) Test is one of instrument integrity (II). II is delimited by validity, reliability, and utility of MFF as a measure of the reflective-impulsive construct. Validity, reliability and utility of construct assessment may be improved by utilizing: (1) a prototypic scoring model that will…
Descriptors: Conceptual Tempo, Difficulty Level, Item Analysis, Research Methodology
Peer reviewed: Dorans, Neil J. – Journal of Educational Measurement, 1986
The analytical decomposition demonstrates how the effects of item characteristics, test properties, individual examinee responses, and rounding rules combine to produce the item deletion effect on the equating/scaling function and candidate scores. The empirical portion of the report illustrates the effects of item deletion on reported score…
Descriptors: Difficulty Level, Equated Scores, Item Analysis, Latent Trait Theory
Peer reviewed: Huynh, Huynh – Journal of Educational Statistics, 1986
Under the assumptions of classical measurement theory and the condition of normality, a formula is derived for the reliability of composite scores. The formula represents an extension of the Spearman-Brown formula to the case of truncated data. (Author/JAZ)
Descriptors: Computer Simulation, Error of Measurement, Expectancy Tables, Scoring Formulas
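For reference, the classical Spearman-Brown formula that this result extends predicts the reliability of a test lengthened by a factor of $k$ from the reliability $\rho$ of the original test (standard form, not reproduced from the article):
$$\rho_{k} = \frac{k\,\rho}{1 + (k-1)\,\rho}.$$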
Hutchinson, T. P. – 1984
One means of learning about the processes operating in a multiple choice test is to include some test items, called nonsense items, which have no correct answer. This paper compares two versions of a mathematical model of test performance to interpret test data that includes both genuine and nonsense items. One formula is based on the usual…
Descriptors: Foreign Countries, Guessing (Tests), Mathematical Models, Multiple Choice Tests
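As general background (this is the standard blind-guessing correction, not necessarily either formula compared in the paper), scoring models that treat an item whose answer is unknown to the examinee as a random choice among its $m$ options lead to the corrected score
$$S = R - \frac{W}{m-1},$$
where $R$ and $W$ are the numbers of right and wrong answers; nonsense items with no correct answer provide one way to check whether responses behave as such a model predicts.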
Peer reviewed: Aiken, Lewis R. – Educational and Psychological Measurement, 1980
Procedures for computing content validity and consistency reliability coefficients and determining the statistical significance of these coefficients are described. Procedures employing the multinomial probability distribution for small samples and normal curve probability estimates for large samples can be used where judgments are made on…
Descriptors: Computer Programs, Measurement Techniques, Probability, Questionnaires
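A commonly cited form of the content-validity coefficient associated with this work (stated here as background rather than quoted from the article) has $n$ judges rate an item on a $c$-point scale and computes
$$V = \frac{\sum_{i=1}^{n}(r_i - l)}{n\,(c - 1)},$$
where $r_i$ is judge $i$'s rating and $l$ is the lowest possible rating; $V$ ranges from 0 to 1, and its significance can be assessed with the small-sample multinomial or large-sample normal-approximation procedures the abstract describes.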
