Akhtar, Hanif – International Association for Development of the Information Society, 2022
When examinees perceive a test as low stakes, it is logical to assume that some of them will not put out their maximum effort. This condition makes the validity of the test results more complicated. Although many studies have investigated motivational fluctuation across tests during a testing session, only a small number of studies have…
Descriptors: Intelligence Tests, Student Motivation, Test Validity, Student Attitudes
Brigid Garvin – ProQuest LLC, 2021
Autism Spectrum Disorder (ASD) is diagnosed using the same criteria for males and females (e.g., DSM-5, ICD-10). Our understanding of ASD, including its etiology, symptom presentation, and prevalence has evolved significantly over time motivating several changes to the diagnostic criteria and the tools with which symptoms are measured. One aspect…
Descriptors: Preschool Children, Autism Spectrum Disorders, Diagnostic Tests, Observation
Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023
This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…
Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Ismail, Fouzul Kareema Mohamed; Zubairi, Ainol Madziah Bt. – English Language Teaching, 2022
This paper presents the findings of a study that intended to seek the content validity (CV) evidence of an instrument to measure the reading ability of university students in Sri Lanka. The reading passages and items were adapted from CEFR aligned Learning Resource Network (LRN) materials. The items were designed based on the cognitive processing…
Descriptors: Foreign Countries, Test Items, Content Validity, Reading Tests
Mardiana – Eurasian Journal of Applied Linguistics, 2023
Written inquiries, which are more frequent and have less of a focus on complex thinking, are issues at school. Students are not taught how to respond to questions found in High-Level Thinking Skills (HOTS) tests, hence, their thinking abilities are generally weak. The issue for teachers is that neither they nor anyone else has been able to create…
Descriptors: Skill Development, Thinking Skills, Check Lists, Models
Bokander, Lars; Bylund, Emanuel – Language Learning, 2020
Over the past decade, the LLAMA language aptitude test battery has come to play an increasingly important role as an instrument in research on individual differences in language development. However, a potentially serious problem that has been pointed out by several scholars is that the LLAMA has not yet been carefully validated. We addressed this…
Descriptors: Item Analysis, Language Tests, Test Items, Individual Differences
Parry, James R. – Online Submission, 2020
This paper presents research and provides a method to ensure that parallel assessments that are generated from a large test-item database maintain equitable difficulty and content coverage each time the assessment is presented. To maintain fairness and validity it is important that all instances of an assessment, that is intended to test the…
Descriptors: Culture Fair Tests, Difficulty Level, Test Items, Test Validity
Kimmia Lyon; Jessica B. Koslouski; Sandra M. Chafouleas; Amy M. Briesch; Jacqueline M. Caemmerer – Grantee Submission, 2025
Existing educational assessments have typically been developed without appropriate attention to the intended and unintended consequences of measure implementation and interpretation. We are developing the Expanding Screening to Support Youth (ESSY) Whole Child Screener using a mixed methods approach that attends to the intended and unintended…
Descriptors: Student Attitudes, Screening Tests, Validity, Grade 3
Kimmia Lyon; Jessica B. Koslouski; Sandra M. Chafouleas; Amy M. Briesch; Jacqueline M. Caemmerer – School Mental Health, 2025
Existing educational assessments have typically been developed without appropriate attention to the intended and unintended consequences of measure implementation and interpretation. We are developing the Expanding Screening to Support Youth (ESSY) Whole Child Screener using a mixed methods approach that attends to the intended and unintended…
Descriptors: Student Attitudes, Screening Tests, Validity, Grade 3
Pablo Robles-García; Stuart McLean; Jeffrey Stewart; Ji-young Shin; Claudia Helena Sánchez-Gutiérrez – Language Assessment Quarterly, 2024
Recent literature in the field of L2 vocabulary assessment has advocated for the development of written receptive vocabulary tests such as Vocabulary Levels Tests (VLTs) that use: (a) meaning-recall item formats, (b) a minimum of 40 item counts per 1,000-frequency band to improve level estimates, and (c) lemmas (not word-families) as the lexical…
Descriptors: Spanish, Test Validity, Test Construction, Vocabulary Development
Laliyo, Lukman Abdul Rauf; Hamdi, Syukrul; Pikoli, Masrid; Abdullah, Romario; Panigoro, Citra – European Journal of Educational Research, 2021
One of the issues that hinder the students' learning progress is the inability to construct an epistemological explanation of a scientific phenomenon. Four-tier multiple-choice (hereinafter, 4TMC) instrument and Partial-Credit Model were employed to elaborate on the diagnosis process of the aforementioned problem. This study was to develop and…
Descriptors: Learning Processes, Multiple Choice Tests, Models, Test Items
Thapelo Ncube Whitfield – ProQuest LLC, 2021
Student Experience surveys are used to measure student attitudes towards their campus as well as to initiate conversations for institutional change. Validity evidence to support the interpretations of these surveys' results, however, is lacking. The first purpose of this study was to compare three Differential Item Functioning (DIF) methods on…
Descriptors: College Students, Student Surveys, Student Experience, Student Attitudes
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
Patrisius Istiarto Djiwandono; Daniel Ginting – Language Education & Assessment, 2025
The teaching of English as a foreign language in Indonesia has a long history, and it is always important to ask whether the assessment of the students' language skills has been valid and reliable. A screening of many articles in several prominent databases reveals that a number of evaluation studies have been done by Indonesian scholars in the…
Descriptors: Foreign Countries, Language Tests, English (Second Language), Second Language Learning

