Sayin, Ayfer; Sata, Mehmet – International Journal of Assessment Tools in Education, 2022
The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined within the scope of the study. 84 Turkish teacher candidates…
Descriptors: Foreign Countries, Item Response Theory, Evaluators, Expertise
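The Rasch analysis used in this study is built on the one-parameter logistic model. A minimal, generic sketch of that model (not the authors' specification, which also incorporates rater facets):

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability of a correct response
    given person ability `theta` and item difficulty `b`, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability of success is exactly 0.5;
# each logit of advantage raises it along the logistic curve.
p_equal = rasch_p(0.0, 0.0)   # 0.5
p_above = rasch_p(1.0, 0.0)   # ~0.73
```

Many-facet extensions, as used for rater-scored items, simply add a rater-severity term inside the logit.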
Kim, Stella Y.; Lee, Won-Chan – Applied Measurement in Education, 2019
This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study…
Descriptors: Classification, Reliability, Accuracy, Test Format
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
Yarnell, Jordy B.; Pfeiffer, Steven I. – Journal of Psychoeducational Assessment, 2015
The present study examined the psychometric equivalence of administering a computer-based version of the Gifted Rating Scale (GRS) compared with the traditional paper-and-pencil GRS-School Form (GRS-S). The GRS-S is a teacher-completed rating scale used in gifted assessment. The GRS-Electronic Form provides an alternative method of administering…
Descriptors: Gifted, Psychometrics, Rating Scales, Computer Assisted Testing
Moses, Tim – ETS Research Report Series, 2013
The purpose of this report is to review ETS psychometric contributions that focus on test scores. Two major sections review contributions based on assessing test scores' measurement characteristics and other contributions about using test scores as predictors in correlational and regression relationships. A further section reviews additional…
Descriptors: Psychometrics, Scores, Correlation, Regression (Statistics)
Hoffman, Lesa; Templin, Jonathan; Rice, Mabel L. – Journal of Speech, Language, and Hearing Research, 2012
Purpose: The present work describes how vocabulary ability as assessed by 3 different forms of the Peabody Picture Vocabulary Test (PPVT; Dunn & Dunn, 1997) can be placed on a common latent metric through item response theory (IRT) modeling, by which valid comparisons of ability between samples or over time can then be made. Method: Responses…
Descriptors: Item Response Theory, Test Format, Vocabulary, Comparative Analysis
Weatherly, Jeffrey N.; Derenne, Adam; Terrell, Heather K. – Psychological Record, 2011
Several measures of delay discounting have been shown to be reliable over periods of up to 3 months. In the present study, 115 participants completed a fill-in-the-blank (FITB) delay-discounting task on sets of 5 different commodities, 12 weeks apart. Results showed that discounting rates were not well described by a hyperbolic function but were…
Descriptors: Delay of Gratification, Reliability, Test Format, Measures (Individuals)
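The hyperbolic function the abstract refers to is Mazur's V = A / (1 + kD), where V is the subjective value of amount A delayed by D. A minimal sketch of fitting the discount rate k by grid search (illustrative only, not the authors' procedure):

```python
import numpy as np

def hyperbolic(delay, k, amount=100.0):
    """Mazur's hyperbolic discounting: subjective value of `amount`
    received after `delay` time units, with discount rate `k`."""
    return amount / (1.0 + k * delay)

def fit_k(delays, values, amount=100.0):
    """Least-squares estimate of k over a log-spaced candidate grid
    (a scipy-free sketch; a real analysis would use nonlinear regression)."""
    candidates = np.logspace(-4, 1, 2000)
    sse = [np.sum((values - hyperbolic(delays, k, amount)) ** 2)
           for k in candidates]
    return float(candidates[int(np.argmin(sse))])

delays = np.array([1, 7, 30, 90, 180, 365], dtype=float)  # days
values = hyperbolic(delays, k=0.05)                       # noise-free demo data
k_hat = fit_k(delays, values)
```

Larger fitted k means steeper discounting; test-retest stability of k is what reliability studies of these tasks examine.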
Lee, HyeSun; Winke, Paula – Language Testing, 2013
We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed…
Descriptors: Academic Ability, Stakeholders, Reliability, Listening Comprehension Tests
Vassar, Matt – Social Indicators Research, 2008
The purpose of the present study was to meta-analytically investigate the score reliability for the Satisfaction With Life Scale. Four-hundred and sixteen articles using the measure were located through electronic database searches and then separated to identify studies which had calculated reliability estimates from their own data. Sixty-two…
Descriptors: Test Format, Life Satisfaction, Reliability, Measures (Individuals)
Jang, Yoonhee; Wixted, John T.; Huber, David E. – Journal of Experimental Psychology: General, 2009
The current study compared 3 models of recognition memory in their ability to generalize across yes/no and 2-alternative forced-choice (2AFC) testing. The unequal-variance signal-detection model assumes a continuous memory strength process. The dual-process signal-detection model adds a thresholdlike recollection process to a continuous…
Descriptors: Test Format, Familiarity, Testing, Criteria
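For readers unfamiliar with the framework, the equal-variance signal-detection model links the two test formats through the sensitivity index d'. A minimal sketch (function names are illustrative):

```python
from statistics import NormalDist

def dprime(hit_rate, fa_rate):
    """Equal-variance signal-detection sensitivity from a yes/no task:
    d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def pc_2afc(d):
    """Predicted two-alternative forced-choice proportion correct for the
    same d' under the equal-variance model: Phi(d' / sqrt(2))."""
    return NormalDist().cdf(d / 2 ** 0.5)

d = dprime(0.80, 0.30)      # ~1.37
predicted = pc_2afc(d)      # ~0.83
```

Generalizing a single d' across yes/no and 2AFC data is exactly the kind of cross-format test the study applies to competing memory models.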
Peer reviewed: Ellison, Stephanie; Fisher, Anne G.; Duran, Leslie – Journal of Applied Measurement, 2001
Evaluated the alternate forms reliability of new versus old tasks of the Assessment of Motor and Process Skills (AMPS) (A. Fisher, 1993). Participants were 44 persons from the AMPS database. Results support good alternate forms reliability of the motor and process ability measures and suggest that the newly calibrated tasks can be used reliably in…
Descriptors: Adults, Evaluation Methods, Psychomotor Skills, Reliability
Papanastasiou, Elena C.; Reckase, Mark D. – International Journal of Testing, 2007
Because of the increased popularity of computerized adaptive testing (CAT), many admissions tests, as well as certification and licensure examinations, have been transformed from their paper-and-pencil versions to computerized adaptive versions. A major difference between paper-and-pencil tests and CAT from an examinee's point of view is that in…
Descriptors: Simulation, Adaptive Testing, Computer Assisted Testing, Test Items
Peer reviewed: Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
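To illustrate the bias Feldt examines (this is a generic demonstration, not his congeneric coefficient), a sketch comparing coefficient alpha computed over individual items with alpha computed over testlet totals, on simulated data where items within a testlet share a passage effect:

```python
import numpy as np

def cronbach_alpha(parts):
    """Coefficient alpha for an examinees-by-parts score matrix
    (parts may be single items or testlet totals)."""
    parts = np.asarray(parts, dtype=float)
    k = parts.shape[1]
    part_vars = parts.var(axis=0, ddof=1)
    total_var = parts.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - part_vars.sum() / total_var)

# 8 items in 4 testlets of 2; the shared testlet effect violates the local
# independence that item-level alpha assumes.
rng = np.random.default_rng(42)
n = 500
ability = rng.normal(size=(n, 1))
testlet_effect = np.repeat(rng.normal(size=(n, 4)), 2, axis=1)
items = ability + testlet_effect + rng.normal(size=(n, 8))
testlet_totals = items.reshape(n, 4, 2).sum(axis=2)

alpha_items = cronbach_alpha(items)            # inflated by dependence
alpha_testlets = cronbach_alpha(testlet_totals)  # testlet-based alpha
```

Treating the testlet as the unit of analysis absorbs the shared passage variance, so the testlet-based coefficient comes out lower and closer to the test's actual reliability.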
Pommerich, Mary – Journal of Technology, Learning, and Assessment, 2007
Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Scores
Peer reviewed: Maggi, Stefania – International Journal of Testing, 2001
Developed an Italian version of the Self-Description Questionnaire (SDQ-III) and studied the reliability and factorial validity of this translated instrument. Results show that the translated version has psychometric properties similar to those of the original English version. (SLD)
Descriptors: Factor Structure, Foreign Countries, Psychometrics, Reliability

