Sayin, Ayfer; Sata, Mehmet – International Journal of Assessment Tools in Education, 2022
The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined within the scope of the study. 84 Turkish teacher candidates…
Descriptors: Foreign Countries, Item Response Theory, Evaluators, Expertise
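The Rasch analysis used in this study is built on the one-parameter logistic model. A minimal, generic sketch of that model (not the authors' specification, which also incorporates rater facets):

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability of a correct response
    given person ability `theta` and item difficulty `b`, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability of success is exactly 0.5;
# each logit of advantage raises it along the logistic curve.
p_equal = rasch_p(0.0, 0.0)   # 0.5
p_above = rasch_p(1.0, 0.0)   # ~0.73
```

Many-facet extensions, as used for rater-scored items, simply add a rater-severity term inside the logit.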
Kim, Stella Y.; Lee, Won-Chan – Applied Measurement in Education, 2019
This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study…
Descriptors: Classification, Reliability, Accuracy, Test Format
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
Yarnell, Jordy B.; Pfeiffer, Steven I. – Journal of Psychoeducational Assessment, 2015
The present study examined the psychometric equivalence of administering a computer-based version of the Gifted Rating Scale (GRS) compared with the traditional paper-and-pencil GRS-School Form (GRS-S). The GRS-S is a teacher-completed rating scale used in gifted assessment. The GRS-Electronic Form provides an alternative method of administering…
Descriptors: Gifted, Psychometrics, Rating Scales, Computer Assisted Testing
Moses, Tim – ETS Research Report Series, 2013
The purpose of this report is to review ETS psychometric contributions that focus on test scores. Two major sections review contributions based on assessing test scores' measurement characteristics and other contributions about using test scores as predictors in correlational and regression relationships. A further section reviews additional…
Descriptors: Psychometrics, Scores, Correlation, Regression (Statistics)
Hoffman, Lesa; Templin, Jonathan; Rice, Mabel L. – Journal of Speech, Language, and Hearing Research, 2012
Purpose: The present work describes how vocabulary ability as assessed by 3 different forms of the Peabody Picture Vocabulary Test (PPVT; Dunn & Dunn, 1997) can be placed on a common latent metric through item response theory (IRT) modeling, by which valid comparisons of ability between samples or over time can then be made. Method: Responses…
Descriptors: Item Response Theory, Test Format, Vocabulary, Comparative Analysis
Weatherly, Jeffrey N.; Derenne, Adam; Terrell, Heather K. – Psychological Record, 2011
Several measures of delay discounting have been shown to be reliable over periods of up to 3 months. In the present study, 115 participants completed a fill-in-the-blank (FITB) delay-discounting task on sets of 5 different commodities, 12 weeks apart. Results showed that discounting rates were not well described by a hyperbolic function but were…
Descriptors: Delay of Gratification, Reliability, Test Format, Measures (Individuals)
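The hyperbolic function the abstract refers to is Mazur's V = A / (1 + kD), where V is the subjective value of amount A delayed by D. A minimal sketch of fitting the discount rate k by grid search (illustrative only, not the authors' procedure):

```python
import numpy as np

def hyperbolic(delay, k, amount=100.0):
    """Mazur's hyperbolic discounting: subjective value of `amount`
    received after `delay` time units, with discount rate `k`."""
    return amount / (1.0 + k * delay)

def fit_k(delays, values, amount=100.0):
    """Least-squares estimate of k over a log-spaced candidate grid
    (a scipy-free sketch; a real analysis would use nonlinear regression)."""
    candidates = np.logspace(-4, 1, 2000)
    sse = [np.sum((values - hyperbolic(delays, k, amount)) ** 2)
           for k in candidates]
    return float(candidates[int(np.argmin(sse))])

delays = np.array([1, 7, 30, 90, 180, 365], dtype=float)  # days
values = hyperbolic(delays, k=0.05)                       # noise-free demo data
k_hat = fit_k(delays, values)
```

Larger fitted k means steeper discounting; test-retest stability of k is what reliability studies of these tasks examine.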
Lee, HyeSun; Winke, Paula – Language Testing, 2013
We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed…
Descriptors: Academic Ability, Stakeholders, Reliability, Listening Comprehension Tests
Vassar, Matt – Social Indicators Research, 2008
The purpose of the present study was to meta-analytically investigate the score reliability for the Satisfaction With Life Scale. Four-hundred and sixteen articles using the measure were located through electronic database searches and then separated to identify studies which had calculated reliability estimates from their own data. Sixty-two…
Descriptors: Test Format, Life Satisfaction, Reliability, Measures (Individuals)
Jang, Yoonhee; Wixted, John T.; Huber, David E. – Journal of Experimental Psychology: General, 2009
The current study compared 3 models of recognition memory in their ability to generalize across yes/no and 2-alternative forced-choice (2AFC) testing. The unequal-variance signal-detection model assumes a continuous memory strength process. The dual-process signal-detection model adds a thresholdlike recollection process to a continuous…
Descriptors: Test Format, Familiarity, Testing, Criteria
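For readers unfamiliar with the framework, the equal-variance signal-detection model links the two test formats through the sensitivity index d'. A minimal sketch (function names are illustrative):

```python
from statistics import NormalDist

def dprime(hit_rate, fa_rate):
    """Equal-variance signal-detection sensitivity from a yes/no task:
    d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def pc_2afc(d):
    """Predicted two-alternative forced-choice proportion correct for the
    same d' under the equal-variance model: Phi(d' / sqrt(2))."""
    return NormalDist().cdf(d / 2 ** 0.5)

d = dprime(0.80, 0.30)      # ~1.37
predicted = pc_2afc(d)      # ~0.83
```

Generalizing a single d' across yes/no and 2AFC data is exactly the kind of cross-format test the study applies to competing memory models.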
Peer reviewed: Ellison, Stephanie; Fisher, Anne G.; Duran, Leslie – Journal of Applied Measurement, 2001
Evaluated the alternate forms reliability of new versus old tasks of the Assessment of Motor and Process Skills (AMPS) (A. Fisher, 1993). Participants were 44 persons from the AMPS database. Results support good alternate forms reliability of the motor and process ability measures and suggest that the newly calibrated tasks can be used reliably in…
Descriptors: Adults, Evaluation Methods, Psychomotor Skills, Reliability
Papanastasiou, Elena C.; Reckase, Mark D. – International Journal of Testing, 2007
Because of the increased popularity of computerized adaptive testing (CAT), many admissions tests, as well as certification and licensure examinations, have been transformed from their paper-and-pencil versions to computerized adaptive versions. A major difference between paper-and-pencil tests and CAT from an examinee's point of view is that in…
Descriptors: Simulation, Adaptive Testing, Computer Assisted Testing, Test Items
Peer reviewed: Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
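To illustrate the bias Feldt examines (this is a generic demonstration, not his congeneric coefficient), a sketch comparing coefficient alpha computed over individual items with alpha computed over testlet totals, on simulated data where items within a testlet share a passage effect:

```python
import numpy as np

def cronbach_alpha(parts):
    """Coefficient alpha for an examinees-by-parts score matrix
    (parts may be single items or testlet totals)."""
    parts = np.asarray(parts, dtype=float)
    k = parts.shape[1]
    part_vars = parts.var(axis=0, ddof=1)
    total_var = parts.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - part_vars.sum() / total_var)

# 8 items in 4 testlets of 2; the shared testlet effect violates the local
# independence that item-level alpha assumes.
rng = np.random.default_rng(42)
n = 500
ability = rng.normal(size=(n, 1))
testlet_effect = np.repeat(rng.normal(size=(n, 4)), 2, axis=1)
items = ability + testlet_effect + rng.normal(size=(n, 8))
testlet_totals = items.reshape(n, 4, 2).sum(axis=2)

alpha_items = cronbach_alpha(items)            # inflated by dependence
alpha_testlets = cronbach_alpha(testlet_totals)  # testlet-based alpha
```

Treating the testlet as the unit of analysis absorbs the shared passage variance, so the testlet-based coefficient comes out lower and closer to the test's actual reliability.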
Pommerich, Mary – Journal of Technology, Learning, and Assessment, 2007
Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Scores
Peer reviewed: Maggi, Stefania – International Journal of Testing, 2001
Developed an Italian version of the Self-Description Questionnaire (SDQ-III) and studied the reliability and factorial validity of this translated instrument. Results show that the translated version has psychometric properties similar to those of the original English version. (SLD)
Descriptors: Factor Structure, Foreign Countries, Psychometrics, Reliability

