Publication Date
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 26 |
| Since 2017 (last 10 years) | 50 |
| Since 2007 (last 20 years) | 85 |
Descriptor
| Test Format | 243 |
| Test Reliability | 243 |
| Test Validity | 243 |
| Test Construction | 91 |
| Test Items | 60 |
| Testing | 50 |
| Test Interpretation | 43 |
| Higher Education | 41 |
| Language Tests | 38 |
| Standardized Tests | 38 |
| Foreign Countries | 37 |
Education Level
| Higher Education | 33 |
| Postsecondary Education | 27 |
| Secondary Education | 18 |
| Elementary Education | 17 |
| Middle Schools | 13 |
| Junior High Schools | 12 |
| Grade 8 | 9 |
| High Schools | 8 |
| Grade 4 | 6 |
| Grade 5 | 6 |
| Grade 7 | 6 |
Audience
| Practitioners | 22 |
| Administrators | 15 |
| Teachers | 14 |
| Researchers | 4 |
| Community | 1 |
| Policymakers | 1 |
| Students | 1 |
| Support Staff | 1 |
Location
| New York | 8 |
| Canada | 3 |
| Israel | 3 |
| Turkey | 3 |
| Georgia | 2 |
| Germany | 2 |
| Indonesia | 2 |
| Iran | 2 |
| Japan | 2 |
| Netherlands | 2 |
| Singapore | 2 |
Laws, Policies, & Programs
| Individuals with Disabilities… | 1 |
| Job Training Partnership Act… | 1 |
| No Child Left Behind Act 2001 | 1 |
| Pell Grant Program | 1 |
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Natalja Menold; Vera Toepoel – Sociological Methods & Research, 2024
Research on mixed devices in web surveys is in its infancy. Using a randomized experiment, we investigated device effects (desktop PC, tablet and mobile phone) for six response formats and four different numbers of scale points. N = 5,077 members of an online access panel participated in the experiment. An exact test of measurement invariance and…
Descriptors: Online Surveys, Handheld Devices, Telecommunications, Test Reliability
Susan K. Johnsen – Gifted Child Today, 2024
The author provides a checklist for educators who are selecting technically adequate tests for identifying and referring students for gifted education services and programs. The checklist includes questions related to how the test was normed, reliability and validity studies as well as questions related to types of scores, administration, and…
Descriptors: Test Selection, Academically Gifted, Gifted Education, Test Validity
Andrew S. Cale; Elizabeth R. Agosto; Brenda Kucha Anak Ganeng; Megan E. Kruskie; Margaret A. McNulty; Kyle A. Robertson; Cecelia J. Vetter; Sabrina C. Woods; Md. Nazmul Karim; Adam B. Wilson – Anatomical Sciences Education, 2025
To keep pace with medicine's unpredictable changes, medical trainees must learn to accurately monitor and evaluate themselves via metacognition (i.e., thinking about thinking). The Metacognitive Awareness Inventory (MAI) can assess and guide the metacognitive development of trainees. This study summarizes existing psychometric evidence and…
Descriptors: Meta Analysis, Psychometrics, Metacognition, Measures (Individuals)
Judy R. Wilkerson; W. Steve Lang; LaSonya Moore – Journal of Research in Education, 2025
The DAATS (Dispositions Assessments Aligned with Teacher Standards) battery is a series of five instruments of different item types that measure teachers' consistency with the critical dispositions embedded in the InTASC Standards. The purpose of this study was to continue a 20-year research project on the development and implementation of…
Descriptors: Educational Assessment, National Standards, Teacher Evaluation, Teacher Competencies
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability
Cui, Ying; Chen, Fu; Lutsyk, Alina; Leighton, Jacqueline P.; Cutumisu, Maria – Assessment in Education: Principles, Policy & Practice, 2023
With the exponential increase in the volume of data available in the 21st century, data literacy skills have become vitally important in workplaces and everyday life. This paper provides a systematic review of available data literacy assessments targeted at different audiences and educational levels. The results can help researchers and…
Descriptors: Data, Information Literacy, 21st Century Skills, Competence
Celeste Combrinck – SAGE Open, 2024
We have less time and focus than ever before, while the demand for attention is increasing. Therefore, it is no surprise that when answering questionnaires, we often choose to strongly agree or be neutral, producing problematic and unusable data. The current study investigated forced-choice (ipsative) format compared to the same questions on a…
Descriptors: Likert Scales, Test Format, Surveys, Design
Conoyer, Sarah J.; Wagner, Kyle B.; Janssen, Kristen K.; Jewell, Jeremy D.; McKenney, Elizabeth L. W. – Assessment for Effective Intervention, 2023
As content literacy intervention is expanded in schools, data-based decision-making practices need to also advance, especially in the areas of science. Vocabulary-matching curriculum-based measures (VM-CBM) may allow educators to identify students needing additional support in science vocabulary to assist with using and comprehending disciplinary…
Descriptors: Curriculum Based Assessment, Elementary School Science, Vocabulary, Benchmarking
Muhammed Parviz; Masoud Azizi – Discover Education, 2025
This article offers a critical review of the Ministry of Science, Research, and Technology English Proficiency Test (MSRT), a high-stakes exam required for postgraduate graduation, scholarships, and certain employment positions in Iran. Despite its widespread use, the design and implementation of the MSRT raise concerns about its validity and…
Descriptors: Language Tests, Language Proficiency, English (Second Language), Second Language Learning
Laura A. Outhwaite; Pirjo Aunio; Jaimie Ka Yu Leung; Jo Van Herwegen – Educational Psychology Review, 2024
Successful early mathematical development is vital to children's later education, employment, and wellbeing outcomes. However, established measurement tools are infrequently used to (i) assess children's mathematical skills and (ii) identify children with or at-risk of mathematical learning difficulties. In response, this pre-registered systematic…
Descriptors: Mathematics Tests, Screening Tests, Mathematics Skills, At Risk Students
Cobern, William W.; Adams, Betty A. J. – International Journal of Assessment Tools in Education, 2020
What follows is a practical guide for establishing the validity of a survey for research purposes. The motivation for providing this guide is our observation that researchers, not necessarily being survey researchers per se, but wanting to use a survey method, lack a concise resource on validity. There is far more to know about surveys and survey…
Descriptors: Surveys, Test Validity, Test Construction, Test Items
Duru, Erdinc; Ozgungor, Sevgi; Yildirim, Ozen; Duatepe-Paksu, Asuman; Duru, Sibel – International Journal of Assessment Tools in Education, 2022
The aim of this study is to develop a valid and reliable measurement tool that measures the critical thinking skills of university students. The Pamukkale Critical Thinking Skills Scale was developed as two separate forms: multiple-choice and open-ended. The validity and reliability studies of the multiple-choice form were constructed on two different…
Descriptors: Critical Thinking, Cognitive Measurement, Test Validity, Test Reliability
McLeod, Melissa; Cheng, Liying – Language Assessment Quarterly, 2023
The Canadian English Language Proficiency Index Program (CELPIP) Test was designed for immigration and citizenship in Canada. CELPIP is a computer-based English-language proficiency test which covers all four skills. This test review provides a description of the test and its construct, tasks, and delivery. Then, it appraises CELPIP for…
Descriptors: Language Tests, Language Proficiency, English (Second Language), Second Language Learning
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of open-book extended duration versus closed book time-limited format on reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30xfree-response short answer to a question, SAQ) and end-of-year paper (4xSAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format

