| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 3 |
| Since 2007 (last 20 years) | 8 |
| Descriptor | Records |
| --- | --- |
| Scoring Formulas | 41 |
| Test Items | 41 |
| Test Reliability | 17 |
| Difficulty Level | 15 |
| Multiple Choice Tests | 15 |
| Item Analysis | 14 |
| Guessing (Tests) | 13 |
| Higher Education | 13 |
| Test Construction | 12 |
| Testing Problems | 10 |
| Scoring | 8 |
| Author | Records |
| --- | --- |
| Plake, Barbara S. | 3 |
| Angoff, William H. | 2 |
| Huynh, Huynh | 2 |
| Schrader, William B. | 2 |
| Weiss, David J. | 2 |
| Aiken, Lewis R. | 1 |
| Alliegro, Marissa C. | 1 |
| Bejar, Isaac I. | 1 |
| Brennan, Robert L. | 1 |
| Brinzer, Raymond J. | 1 |
| Bulut, Okan | 1 |
| Publication Type | Records |
| --- | --- |
| Reports - Research | 41 |
| Journal Articles | 18 |
| Speeches/Meeting Papers | 13 |
| Collected Works - General | 1 |
| Information Analyses | 1 |
| Reports - Evaluative | 1 |
| Tests/Questionnaires | 1 |
| Education Level | Records |
| --- | --- |
| Higher Education | 3 |
| Postsecondary Education | 3 |
| Secondary Education | 1 |
| Audience | Records |
| --- | --- |
| Researchers | 4 |
Yun, Young Ho; Kim, Yaeji; Sim, Jin A.; Choi, Soo Hyuk; Lim, Cheolil; Kang, Joon-ho – Journal of School Health, 2018
Background: The objective of this study was to develop the School Health Score Card (SHSC) and validate its psychometric properties. Methods: The development of the SHSC questionnaire included 3 phases: item generation, construction of domains and items, and field testing with validation. To assess the instrument's reliability and validity, we…
Descriptors: School Health Services, Psychometrics, Test Construction, Test Validity
Morgan, Grant B.; Moore, Courtney A.; Floyd, Harlee S. – Journal of Psychoeducational Assessment, 2018
Although content validity--how well each item of an instrument represents the construct being measured--is foundational in the development of an instrument, statistical validity is also important to the decisions that are made based on the instrument. The primary purpose of this study is to demonstrate how simulation studies can be used to assist…
Descriptors: Simulation, Decision Making, Test Construction, Validity
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin – Review of Educational Research, 2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment that remains in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education focused, specifically, on the development, analysis, and use of the incorrect options, which are also…
Descriptors: Multiple Choice Tests, Difficulty Level, Accuracy, Error Patterns
Zechner, Klaus; Chen, Lei; Davis, Larry; Evanini, Keelan; Lee, Chong Min; Leong, Chee Wee; Wang, Xinhao; Yoon, Su-Youn – ETS Research Report Series, 2015
This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses of a pilot administration of the Test of English-for-Teaching (TEFT™) within the ELTeach™ framework. The test consists of items for all four language modalities:…
Descriptors: Scoring, Scoring Formulas, Speech Communication, Task Analysis
Jancarík, Antonín; Kostelecká, Yvona – Electronic Journal of e-Learning, 2015
Electronic testing has become a regular part of online courses. Most learning management systems offer a wide range of tools that can be used in electronic tests. With respect to time demands, the most efficient tools are those that allow automatic assessment. The presented paper focuses on one of these tools: matching questions in which one…
Descriptors: Online Courses, Computer Assisted Testing, Test Items, Scoring Formulas
Buri, John R.; Cromett, Cristina E.; Post, Maria C.; Landis, Anna Marie; Alliegro, Marissa C. – Online Submission, 2015
Rationale is presented for the derivation of a new measure of stressful life events for use with students [Negative Life Events Scale for Students (NLESS)]. Ten stressful life events questionnaires were reviewed, and the more than 600 items mentioned in these scales were culled based on the following criteria: (a) only long-term and unpleasant…
Descriptors: Experience, Social Indicators, Stress Variables, Affective Measures
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
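For reference, the three-parameter logistic (3PL) model at issue in this exchange adds a lower asymptote ("guessing") parameter to the logistic item response function that Rasch analysis omits; a standard statement of the model (given here as background, not quoted from the article) is
$$P_i(\theta) = c_i + (1 - c_i)\,\frac{\exp\big(a_i(\theta - b_i)\big)}{1 + \exp\big(a_i(\theta - b_i)\big)},$$
where $a_i$ is the discrimination, $b_i$ the difficulty, and $c_i$ the lower asymptote of item $i$; the Rasch model constrains $c_i = 0$ and holds $a_i$ constant across items.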
Taskinen, Päivi H.; Steimel, Jochen; Gräfe, Linda; Engell, Sebastian; Frey, Andreas – Peabody Journal of Education, 2015
This study examined students' competencies in engineering education at the university level. First, we developed a competency model in one specific field of engineering: process dynamics and control. Then, the theoretical model was used as a frame to construct test items to measure students' competencies comprehensively. In the empirical…
Descriptors: Models, Engineering Education, Test Items, Outcome Measures
Peer reviewed: Oltman, Phillip K.; Stricker, Lawrence J. – Language Testing, 1990
A recent multidimensional scaling analysis of the Test of English-as-a-Foreign-Language (TOEFL) item response data identified clusters of items in the test sections that, being more homogeneous than their parent sections, might be better for diagnostic use. The analysis was repeated using different scoring techniques. Results diverged only for…
Descriptors: English (Second Language), Item Analysis, Language Tests, Scaling
Peer reviewed: Willson, Victor L. – Educational and Psychological Measurement, 1982
The Serlin-Kaiser procedure is used to complete a principal components solution for scoring weights for all options of a given item. Coefficient alpha is maximized for a given multiple choice test. (Author/GK)
Descriptors: Analysis of Covariance, Factor Analysis, Multiple Choice Tests, Scoring Formulas
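For context, coefficient alpha for a total score formed from $k$ item (or option-weighted) scores is conventionally defined as
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),$$
where $\sigma_i^{2}$ is the variance of component $i$ and $\sigma_X^{2}$ the variance of the total score; the option weights described in the abstract are chosen so that this quantity is as large as possible.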
Brinzer, Raymond J. – 1979
The problem engendered by the Matching Familiar Figures (MFF) Test is one of instrument integrity (II). II is delimited by validity, reliability, and utility of MFF as a measure of the reflective-impulsive construct. Validity, reliability and utility of construct assessment may be improved by utilizing: (1) a prototypic scoring model that will…
Descriptors: Conceptual Tempo, Difficulty Level, Item Analysis, Research Methodology
Peer reviewed: Dorans, Neil J. – Journal of Educational Measurement, 1986
The analytical decomposition demonstrates how the effects of item characteristics, test properties, individual examinee responses, and rounding rules combine to produce the item deletion effect on the equating/scaling function and candidate scores. The empirical portion of the report illustrates the effects of item deletion on reported score…
Descriptors: Difficulty Level, Equated Scores, Item Analysis, Latent Trait Theory
Peer reviewed: Huynh, Huynh – Journal of Educational Statistics, 1986
Under the assumptions of classical measurement theory and the condition of normality, a formula is derived for the reliability of composite scores. The formula represents an extension of the Spearman-Brown formula to the case of truncated data. (Author/JAZ)
Descriptors: Computer Simulation, Error of Measurement, Expectancy Tables, Scoring Formulas
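For reference, the classical Spearman-Brown formula that this result extends predicts the reliability of a test lengthened by a factor of $k$ from the reliability $\rho$ of the original test (standard form, not reproduced from the article):
$$\rho_{k} = \frac{k\,\rho}{1 + (k-1)\,\rho}.$$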
Hutchinson, T. P. – 1984
One means of learning about the processes operating in a multiple choice test is to include some test items, called nonsense items, which have no correct answer. This paper compares two versions of a mathematical model of test performance to interpret test data that includes both genuine and nonsense items. One formula is based on the usual…
Descriptors: Foreign Countries, Guessing (Tests), Mathematical Models, Multiple Choice Tests
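As general background (this is the standard blind-guessing correction, not necessarily either formula compared in the paper), scoring models that treat an item whose answer is unknown to the examinee as a random choice among its $m$ options lead to the corrected score
$$S = R - \frac{W}{m-1},$$
where $R$ and $W$ are the numbers of right and wrong answers; nonsense items with no correct answer provide one way to check whether responses behave as such a model predicts.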
Peer reviewed: Aiken, Lewis R. – Educational and Psychological Measurement, 1980
Procedures for computing content validity and consistency reliability coefficients and determining the statistical significance of these coefficients are described. Procedures employing the multinomial probability distribution for small samples and normal curve probability estimates for large samples can be used where judgments are made on…
Descriptors: Computer Programs, Measurement Techniques, Probability, Questionnaires
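A commonly cited form of the content-validity coefficient associated with this work (stated here as background rather than quoted from the article) has $n$ judges rate an item on a $c$-point scale and computes
$$V = \frac{\sum_{i=1}^{n}(r_i - l)}{n\,(c - 1)},$$
where $r_i$ is judge $i$'s rating and $l$ is the lowest possible rating; $V$ ranges from 0 to 1, and its significance can be assessed with the small-sample multinomial or large-sample normal-approximation procedures the abstract describes.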
