Publication Date
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 8 |
| Since 2017 (last 10 years) | 13 |
| Since 2007 (last 20 years) | 17 |
Descriptor
| Test Items | 48 |
| Test Validity | 37 |
| Test Construction | 21 |
| Test Reliability | 15 |
| Elementary Secondary Education | 11 |
| Literature Reviews | 11 |
| Achievement Tests | 9 |
| Psychometrics | 9 |
| Scores | 9 |
| Content Validity | 8 |
| Item Analysis | 8 |
Author
| Cawthon, Stephanie | 2 |
| Diamond, Esther E. | 2 |
| Downing, Steven M. | 2 |
| Haladyna, Thomas M. | 2 |
| Hambleton, Ronald K. | 2 |
| Leppo, Rachel | 2 |
| Aiken, Lewis R. | 1 |
| Barry, Margot | 1 |
| Beck, Klaus | 1 |
| Ben-Porath, Yossef S. | 1 |
| Benson, Jeri | 1 |
Publication Type
| Information Analyses | 48 |
| Journal Articles | 31 |
| Reports - Research | 12 |
| Speeches/Meeting Papers | 11 |
| Reports - Evaluative | 6 |
| Opinion Papers | 5 |
| Guides - Non-Classroom | 1 |
| Reports - Descriptive | 1 |
Education Level
| Elementary Secondary Education | 2 |
| Higher Education | 2 |
| Postsecondary Education | 2 |
| Adult Education | 1 |
Audience
| Researchers | 7 |
| Practitioners | 3 |
| Teachers | 1 |
Location
| Australia | 1 |
| Indonesia | 1 |
| Minnesota | 1 |
| Spain | 1 |
| United Kingdom | 1 |
Assessments and Surveys
| General Educational… | 1 |
| Graduate Record Examinations | 1 |
| Minnesota Multiphasic… | 1 |
| SAT (College Admission Test) | 1 |
| Wechsler Intelligence Scale… | 1 |
Patrisius Istiarto Djiwandono; Daniel Ginting – Language Education & Assessment, 2025
The teaching of English as a foreign language in Indonesia has a long history, and it is always important to ask whether the assessment of the students' language skills has been valid and reliable. A screening of many articles in several prominent databases reveals that a number of evaluation studies have been done by Indonesian scholars in the…
Descriptors: Foreign Countries, Language Tests, English (Second Language), Second Language Learning
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Camilla M. McMahon; Maryellen Brunson McClain; Savannah Wells; Sophia Thompson; Jeffrey D. Shahidullah – Journal of Autism and Developmental Disorders, 2025
Purpose: The goal of the current study was to conduct a substantive validity review of four autism knowledge assessments with prior psychometric support (Gillespie-Lynch in J Autism and Dev Disord 45(8):2553-2566, 2015; Harrison in J Autism and Dev Disord 47(10):3281-3295, 2017; McClain in J Autism and Dev Disord 50(3):998-1006, 2020; McMahon…
Descriptors: Measures (Individuals), Psychometrics, Test Items, Accuracy
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
Ella Anghel; Lale Khorramdel; Matthias von Davier – Large-scale Assessments in Education, 2024
As the use of process data in large-scale educational assessments is becoming more common, it is clear that data on examinees' test-taking behaviors can illuminate their performance, and can have crucial ramifications concerning assessments' validity. A thorough review of the literature in the field may inform researchers and practitioners of…
Descriptors: Educational Assessment, Test Validity, Test Items, Reaction Time
Ekaterina Sudina – Studies in Second Language Acquisition, 2023
As survey research in second language acquisition grows in popularity, the adherence to best practices associated with questionnaire quality is critical for a better understanding of factors that influence second language (L2) development. To ensure that a self-report scale targets the construct of interest and does it consistently and accurately,…
Descriptors: Second Language Learning, Language Acquisition, Measures (Individuals), Test Reliability
Rosa, Claudio D.; Collado, Silvia; Larson, Lincoln R. – Journal of Environmental Education, 2022
The New Ecological Paradigm (NEP) scale adapted for use with children (NEP-C) is one of the most frequently used measures of children's environmental beliefs. Though widely utilized, the limitations of the NEP-C instrument are often overlooked. Based on a systematic synthesis of existing literature examining the NEP-C, we argue that the scale…
Descriptors: Attitude Measures, Children, Environment, Beliefs
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Villarreal, Victor – Journal of Psychoeducational Assessment, 2019
The "Rating Scale of Impairment" (RSI; Goldstein & Naglieri, 2016b) is a norm-referenced measure of functional impairment. The RSI measures impairment in six domains, as well as overall impairment, based in part on the International Classification of Functioning, Disability, and Health. Functional impairment, as defined by the ICF…
Descriptors: Rating Scales, Norm Referenced Tests, Disabilities, Test Construction
Beck, Klaus – Frontline Learning Research, 2020
Many test developers try to ensure the content validity of their tests by having external experts review the items, e.g. in terms of relevance, difficulty, or clarity. Although this approach is widely accepted, a closer look reveals several pitfalls that need to be avoided if experts' advice is to be truly helpful. The purpose of this paper is to…
Descriptors: Content Validity, Psychological Testing, Educational Testing, Student Evaluation
Jay Parkes – Journal of Faculty Development, 2021
Brief multiple-choice question workshops are a prevalent part of the faculty development landscape. But do they work? Studies have documented that faculty member-written multiple-choice questions (fMCQs) are frequently flawed and do not live up to quality standards. Poor fMCQs have real consequences for students beyond annoyance. Fourteen studies…
Descriptors: Teacher Workshops, Multiple Choice Tests, Faculty Development, Program Effectiveness
Barry, Margot; Egan, Arlene – International Review of Education, 2018
Adult learners are attracted to learning opportunities (e.g. course offers) which seem promising in terms of allowing them to match their choices to their own perceived predispositions. To find out more about their personal learning style, some adult learners may fill in a questionnaire designed by researchers who aim (and claim) to enable both…
Descriptors: Adult Learning, Cognitive Style, Adult Education, Interest Research
Carrió-Pastor, María Luisa; Martín Marchante, Beatriz – International Journal of English Studies, 2018
The work at hand is part of a wider study the aim of which was to determine what kind of factors influence failures in pragmatic items of an online adaptive placement test taken by a group of 34 Spanish students in their first year at university. A preceding analysis (Carrió & Martín, 2016) showed the type of personal factors, such as lack of…
Descriptors: Pragmatics, Test Items, Language Tests, College Freshmen
Walsh, Kerryann; Rassafiani, Mehdi; Mathews, Ben; Farrell, Ann; Butler, Des – Journal of Child Sexual Abuse, 2010
This paper details a systematic literature review identifying problems in extant research relating to teachers' attitudes toward reporting child sexual abuse and offers a model for new attitude scale development and testing. Scale development comprised a five-phase process grounded in contemporary attitude theories, including (a) developing the…
Descriptors: Sexual Abuse, Child Abuse, Focus Groups, Content Validity
Cawthon, Stephanie; Leppo, Rachel – American Annals of the Deaf, 2013
The authors conducted a qualitative meta-analysis of the research on assessment accommodations for students who are deaf or hard of hearing. There were 16 identified studies that analyzed the impact of factors related to student performance on academic assessments across different educational settings, content areas, and types of assessment…
Descriptors: Testing Accommodations, Academic Achievement, Deafness, Hearing Impairments

