| Publication Date | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 197 |
| Since 2022 (last 5 years) | 1067 |
| Since 2017 (last 10 years) | 2577 |
| Since 2007 (last 20 years) | 4938 |
| Audience | Results |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
| Location | Results |
| --- | --- |
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
| What Works Clearinghouse Rating | Results |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Gal Kaldes; Jason Braasch; Erica Kessler – Grantee Submission, 2025
Purpose: College placement assessments often overlook multilingual learners' full linguistic abilities and literacy engagement, as standardized tests primarily assess English proficiency rather than how students interact with academic texts. Directed Self-Placement (DSP) offers an alternative approach through self-assessment, with some models…
Descriptors: Placement Tests, Student Placement, College Students, Multilingualism
Lucia M. Reyes; Michael A. Cook; Steven M. Ross – Center for Research and Reform in Education, 2025
In March 2025, brightwheel, a San Francisco-based educational technology company, partnered with the Center for Research and Reform in Education (CRRE) at Johns Hopkins University to test brightwheel's product, the Experience Assessment. The assessment was designed to provide early childhood educators with an objective and systematic way to…
Descriptors: Psychometrics, Educational Technology, Early Childhood Education, Young Children
Lu, Ru; Kim, Sooyeon – ETS Research Report Series, 2021
This study evaluated the impact of subgroup weighting for equating through a common-item anchor. We used data from a single test form to create two research forms for which the equating relationship was known. The results showed that equating was most accurate when the new form and reference form samples were weighted to be similar to the target…
Descriptors: Equated Scores, Weighted Scores, Raw Scores, Test Items
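To make the weighting idea concrete, here is a rough sketch, not the study's procedure: each examinee is weighted so that their sample's subgroup mix matches a target population, and the weighted moments then drive a mean-sigma linear conversion. The subgroup labels, target proportions, and synthetic scores are assumptions for illustration.

```python
# Minimal sketch: weight two samples toward a common target population,
# then link scores with a mean-sigma linear conversion. Illustrative only;
# an operational equating would also exploit the common-item anchor.
import numpy as np

def population_weights(groups, target_props):
    """Weight = target proportion / sample proportion of the examinee's subgroup."""
    groups = np.asarray(groups)
    sample_props = {g: np.mean(groups == g) for g in target_props}
    return np.array([target_props[g] / sample_props[g] for g in groups])

def weighted_moments(x, w):
    mu = np.average(x, weights=w)
    sd = np.sqrt(np.average((x - mu) ** 2, weights=w))
    return mu, sd

rng = np.random.default_rng(1)
# New-form and reference-form samples with different subgroup mixes
new_scores = rng.normal(48, 10, 500)
new_groups = rng.choice(["A", "B"], 500, p=[0.7, 0.3])
ref_scores = rng.normal(52, 11, 500)
ref_groups = rng.choice(["A", "B"], 500, p=[0.4, 0.6])
target = {"A": 0.5, "B": 0.5}   # hypothetical target population composition

mu_n, sd_n = weighted_moments(new_scores, population_weights(new_groups, target))
mu_r, sd_r = weighted_moments(ref_scores, population_weights(ref_groups, target))

# Mean-sigma linear conversion (assumes the weighted groups are equivalent)
equate = lambda x: mu_r + (sd_r / sd_n) * (x - mu_n)
print(equate(np.array([40, 50, 60])))
```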
Gu, Zhengguo; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational and Behavioral Statistics, 2021
Clinical, medical, and health psychologists use difference scores obtained from pretest-posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorders, or addiction. Reliability of difference scores is important for interpreting observed…
Descriptors: Test Reliability, Scores, Pretests Posttests, Computation
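For orientation, the classical test theory expression for the reliability of a difference score D = X − Y that this literature usually builds on (the article's own computations may differ):

```latex
% Reliability of a difference score D = X - Y (classical test theory):
\rho_{DD'} =
  \frac{\sigma_X^2\,\rho_{XX'} + \sigma_Y^2\,\rho_{YY'}
        - 2\,\rho_{XY}\,\sigma_X\sigma_Y}
       {\sigma_X^2 + \sigma_Y^2 - 2\,\rho_{XY}\,\sigma_X\sigma_Y}
```

Reliability of D drops as the pretest-posttest correlation \(\rho_{XY}\) rises, which is why difference scores can be unreliable even when each test is reliable on its own.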
Colombi, Roberto; Giordano, Sabrina; Tutz, Gerhard – Journal of Educational and Behavioral Statistics, 2021
A mixture of logit models is proposed that discriminates between responses to rating questions that are affected by a tendency to prefer middle or extremes of the scale regardless of the content of the item (response styles) and purely content-driven preferences. Explanatory variables are used to characterize the content-driven way of answering as…
Descriptors: Rating Scales, Response Style (Tests), Test Items, Models
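A minimal sketch of the kind of two-component mixture involved, assuming a 5-point scale, an adjacent-category logit for the content-driven component, and an illustrative style distribution concentrated on the midpoint and extremes; all parameter values are hypothetical, not the authors' specification.

```python
# Two-component mixture for a 5-point rating item: with probability pi the
# response reflects a style (middle/extreme preference), otherwise a
# content-driven adjacent-category logit. Sketch only.
import numpy as np

K = 5
style_probs = np.array([0.30, 0.05, 0.30, 0.05, 0.30])  # extremes + midpoint

def content_probs(theta, thresholds):
    """Adjacent-category logit: log(p[k+1]/p[k]) = theta - thresholds[k]."""
    logits = np.concatenate([[0.0], np.cumsum(theta - thresholds)])
    p = np.exp(logits - logits.max())
    return p / p.sum()

def mixture_loglik(responses, thetas, thresholds, pi):
    ll = 0.0
    for y, th in zip(responses, thetas):
        p = pi * style_probs + (1 - pi) * content_probs(th, thresholds)
        ll += np.log(p[y])
    return ll

rng = np.random.default_rng(0)
thetas = rng.normal(size=200)
thresholds = np.array([-1.0, -0.3, 0.3, 1.0])
responses = [rng.choice(K, p=0.2 * style_probs + 0.8 * content_probs(t, thresholds))
             for t in thetas]
print(mixture_loglik(responses, thetas, thresholds, pi=0.2))
```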
Sanrey, Camille; Bressoux, Pascal; Lima, Laurent; Pansu, Pascal – British Journal of Educational Psychology, 2021
Background: In academic contexts, teachers' judgements are central to instruction and have many consequences for students' self-perceptions. Understanding the cognitive biases that may exist in teachers' judgements is thus of central importance. Aims: This paper presents two studies in which we aimed to investigate the presence of a halo effect in…
Descriptors: Evaluative Thinking, Teachers, Bias, Student Evaluation
Baldwin, Peter; Yaneva, Victoria; Mee, Janet; Clauser, Brian E.; Ha, Le An – Journal of Educational Measurement, 2021
In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information-retrieval-based automatic question-answering system finds an item challenging, and (c) through dense word representations (word embeddings). Using a random…
Descriptors: Natural Language Processing, Prediction, Item Response Theory, Reaction Time
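A hedged sketch of the general pipeline the abstract describes: regressing an item difficulty measure on numeric text features with a random forest. The synthetic feature matrix below stands in for the article's 113 linguistic features, question-answering measures, and word embeddings.

```python
# Illustrative only: predict an item-difficulty value from numeric text
# features with a random forest, evaluated by cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_items, n_features = 300, 20
X = rng.normal(size=(n_items, n_features))   # stand-ins for text features
difficulty = X[:, 0] * 0.8 - X[:, 1] * 0.5 + rng.normal(scale=0.5, size=n_items)

model = RandomForestRegressor(n_estimators=500, random_state=0)
scores = cross_val_score(model, X, difficulty, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean().round(3))
```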
Andersson, Björn; Xin, Tao – Journal of Educational and Behavioral Statistics, 2021
The estimation of high-dimensional latent regression item response theory (IRT) models is difficult because of the need to approximate integrals in the likelihood function. Proposed solutions in the literature include using stochastic approximations, adaptive quadrature, and Laplace approximations. We propose using a second-order Laplace…
Descriptors: Item Response Theory, Computation, Regression (Statistics), Statistical Bias
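For intuition, a one-dimensional Laplace approximation of a marginal-likelihood integral, checked against quadrature. The 2PL example is an assumption made for illustration; the article's second-order method for high-dimensional latent regression is considerably more involved.

```python
# Laplace approximation: integral of exp(f(theta)) d(theta) is approximately
# exp(f(m)) * sqrt(2*pi / -f''(m)), where m is the mode of f.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

def f(theta):
    """Log integrand: 2PL log-likelihood of 3 correct answers + N(0,1) prior."""
    a, b = 1.2, np.array([-0.5, 0.0, 0.8])
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.sum(np.log(p)) - 0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

mode = minimize_scalar(lambda t: -f(t)).x
h = 1e-4
curvature = (f(mode + h) - 2 * f(mode) + f(mode - h)) / h**2  # f''(mode) < 0
laplace = np.exp(f(mode)) * np.sqrt(2 * np.pi / -curvature)
exact, _ = quad(lambda t: np.exp(f(t)), -8, 8)
print(f"Laplace: {laplace:.6f}  quadrature: {exact:.6f}")
```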
Peralta, Yadira; Aguilar-Rodriguez, Adriana; González Dávila, Osiel; Miranda, Alfonso – Journal of Psychoeducational Assessment, 2021
According to the literature, the use of the Berkeley Puppet Interview (BPI) to measure Big Five personality traits in children provides reliable and valid scores. However, the implementation of the BPI could be costly, especially when working with large sample sizes. Big Five self-reports were collected from 1118 Mexican children aged 7-8 years…
Descriptors: Personality Measures, Children, Test Reliability, Foreign Countries
Myszkowski, Nils; Storme, Martin – Journal of Creative Behavior, 2021
Fluency tasks are among the most common item formats for the assessment of certain cognitive abilities, such as verbal fluency or divergent thinking. A typical approach to the psychometric modeling of such tasks (e.g., "Intelligence," 2016, 57, 25) is the Rasch Poisson Counts Model (RPCM; "Probabilistic models for some intelligence…
Descriptors: Creative Thinking, Cognitive Measurement, Test Items, Difficulty Level
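A minimal sketch of the RPCM itself, assuming log-rates additive in a person parameter and a task parameter; the data are simulated, and the fit uses an off-the-shelf Poisson GLM rather than specialized IRT software.

```python
# Rasch Poisson Counts Model: the count for person i on fluency task j is
# Poisson with log-rate theta_i + delta_j. Simulate, then recover the task
# parameters with a Poisson GLM (task 1 as reference). Sketch only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_persons, n_items = 60, 4
theta = rng.normal(1.0, 0.5, n_persons)      # person fluency
delta = np.array([0.0, 0.3, -0.2, 0.5])      # task easiness (first fixed at 0)
counts = rng.poisson(np.exp(theta[:, None] + delta[None, :]))

# Design matrix: person dummies + task dummies (no intercept for identifiability)
person_idx = np.repeat(np.arange(n_persons), n_items)
item_idx = np.tile(np.arange(n_items), n_persons)
X = np.zeros((n_persons * n_items, n_persons + n_items - 1))
X[np.arange(len(person_idx)), person_idx] = 1.0
for j in range(1, n_items):
    X[item_idx == j, n_persons + j - 1] = 1.0

fit = sm.GLM(counts.ravel(), X, family=sm.families.Poisson()).fit()
print("estimated task easiness:", fit.params[n_persons:].round(2))  # ~ delta[1:]
```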
Hall, Matthew L.; Reidies, Jess A. – Journal of Deaf Studies and Deaf Education, 2021
We tested the utility of two standardized measures of receptive skills in American Sign Language (ASL) in hearing adults who are novice signers: the ASL Comprehension Test (ASL-CT; Hauser, P. C., Paludneviciene, R., Riddle, W., Kurz, K. B., Emmorey, K., & Contreras, J. (2016). American Sign Language Comprehension Test: A tool for sign language…
Descriptors: American Sign Language, Receptive Language, Novices, Adults
Ha, Hung Tan – Language Testing in Asia, 2021
The Listening Vocabulary Levels Test (LVLT), created by McLean et al. (Language Teaching Research, 19, 741-760, 2015), filled an important gap in the field of second language assessment by introducing an instrument for the measurement of phonological vocabulary knowledge. However, few attempts have been made to provide further validity evidence for the…
Descriptors: Vocabulary, Vietnamese, Test Validity, Test Items
Sinharay, Sandip – Grantee Submission, 2021
Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman-Pearson lemma (e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices of Levine and Drasgow (1988) and is the most powerful statistic for detecting item…
Descriptors: Robustness (Statistics), Hypothesis Testing, Statistics, Test Items
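A rough sketch of such a likelihood-ratio statistic under a 2PL model, assuming a fixed "preknowledge" success probability on the known compromised items; the item parameters and responses are illustrative, not from the paper.

```python
# Likelihood-ratio (Neyman-Pearson) statistic for preknowledge on a known
# compromised subset: compare the response likelihood under a preknowledge
# success probability with the 2PL likelihood at the ability estimate.
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lr_statistic(u, theta_hat, a, b, p_know=0.95):
    """Sum over compromised items of log P(u|preknowledge) - log P(u|2PL)."""
    p0 = p_2pl(theta_hat, a, b)                 # null: honest responding
    ll0 = u * np.log(p0) + (1 - u) * np.log(1 - p0)
    ll1 = u * np.log(p_know) + (1 - u) * np.log(1 - p_know)
    return np.sum(ll1 - ll0)

a = np.array([1.0, 1.3, 0.8, 1.1])
b = np.array([0.8, 1.2, 1.5, 0.9])              # hard compromised items
u_suspect = np.array([1, 1, 1, 1])              # all correct despite low ability
print(lr_statistic(u_suspect, theta_hat=-0.5, a=a, b=b))  # large value => flag
```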
Chen, Yunxiao; Lee, Yi-Hsuan; Li, Xiaoou – Journal of Educational and Behavioral Statistics, 2022
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric…
Descriptors: Standardized Tests, Test Items, Test Validity, Scores
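As a simple stand-in for the sequential monitoring idea, a one-sided CUSUM on an item's proportion correct across administrations; the reference values, threshold, and simulated drift are assumptions, not the authors' procedure.

```python
# One-sided CUSUM over administrations: accumulate standardized upward
# deviations of the item's proportion correct and signal when the sum
# crosses a threshold h. Sketch only.
import numpy as np

def cusum_flags(p_obs, p_ref, sd_ref, k=0.5, h=4.0):
    """Yield (time, CUSUM value, alarm) for each administration."""
    s, out = 0.0, []
    for t, p in enumerate(p_obs):
        z = (p - p_ref) / sd_ref          # standardized deviation
        s = max(0.0, s + z - k)           # one-sided CUSUM recursion
        out.append((t, s, s > h))
    return out

rng = np.random.default_rng(3)
p_obs = np.concatenate([rng.normal(0.55, 0.02, 15),    # stable period
                        rng.normal(0.68, 0.02, 10)])   # abrupt change (e.g., leak)
for t, s, alarm in cusum_flags(p_obs, p_ref=0.55, sd_ref=0.02):
    if alarm:
        print(f"change signalled at administration {t}, CUSUM = {s:.1f}")
        break
```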
Cooperman, Allison W.; Weiss, David J.; Wang, Chun – Educational and Psychological Measurement, 2022
Adaptive measurement of change (AMC) is a psychometric method for measuring intra-individual change on one or more latent traits across testing occasions. Three hypothesis tests (a Z test, a likelihood ratio test, and a score ratio index) have demonstrated desirable statistical properties in this context, including low false positive rates and high…
Descriptors: Error of Measurement, Psychometrics, Hypothesis Testing, Simulation
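A minimal sketch of the Z test mentioned here, comparing two occasion-level ability estimates against their standard errors; the numbers are illustrative, not from the article.

```python
# Z test for intra-individual change between two testing occasions:
# Z = (theta2 - theta1) / sqrt(se1^2 + se2^2), with a two-sided p-value.
import numpy as np
from scipy.stats import norm

def amc_z_test(theta1, se1, theta2, se2):
    z = (theta2 - theta1) / np.sqrt(se1**2 + se2**2)
    return z, 2 * norm.sf(abs(z))

z, p = amc_z_test(theta1=-0.20, se1=0.30, theta2=0.75, se2=0.28)
print(f"Z = {z:.2f}, p = {p:.4f}")  # small p => reliable individual change
```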

