Publication Date
| In 2026 | 0 |
| Since 2025 | 5 |
| Since 2022 (last 5 years) | 45 |
| Since 2017 (last 10 years) | 91 |
| Since 2007 (last 20 years) | 144 |
Descriptor
| Test Format | 418 |
| Test Reliability | 418 |
| Test Validity | 243 |
| Test Construction | 135 |
| Test Items | 119 |
| Higher Education | 88 |
| Multiple Choice Tests | 68 |
| Foreign Countries | 67 |
| Testing | 65 |
| Test Interpretation | 61 |
| Comparative Analysis | 57 |
| More ▼ | |
Source
Author
Publication Type
Education Level
Audience
| Practitioners | 33 |
| Teachers | 23 |
| Administrators | 18 |
| Researchers | 12 |
| Community | 1 |
| Counselors | 1 |
| Policymakers | 1 |
| Students | 1 |
| Support Staff | 1 |
Location
| New York | 9 |
| Turkey | 8 |
| California | 7 |
| Canada | 6 |
| Japan | 6 |
| Germany | 4 |
| United Kingdom | 4 |
| Georgia | 3 |
| Israel | 3 |
| France | 2 |
| Indonesia | 2 |
| More ▼ | |
Laws, Policies, & Programs
| Individuals with Disabilities… | 1 |
| Job Training Partnership Act… | 1 |
| No Child Left Behind Act 2001 | 1 |
| Pell Grant Program | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
Mustafa Ilhan; Nese Güler; Gülsen Tasdelen Teker; Ömer Ergenekon – International Journal of Assessment Tools in Education, 2024
This study aimed to examine the effects of reverse items created with different strategies on psychometric properties and respondents' scale scores. To this end, three versions of a 10-item scale in the research were developed: 10 positive items were integrated in the first form (Form-P) and five positive and five reverse items in the other two…
Descriptors: Test Items, Psychometrics, Scores, Measures (Individuals)
Natalja Menold; Vera Toepoel – Sociological Methods & Research, 2024
Research on mixed devices in web surveys is in its infancy. Using a randomized experiment, we investigated device effects (desktop PC, tablet and mobile phone) for six response formats and four different numbers of scale points. N = 5,077 members of an online access panel participated in the experiment. An exact test of measurement invariance and…
Descriptors: Online Surveys, Handheld Devices, Telecommunications, Test Reliability
Herwin, Herwin; Pristiwaluyo, Triyanto; Ruslan, Ruslan; Dahalan, Shakila Che – Cypriot Journal of Educational Sciences, 2022
The application of multiple-choice tests often does not consider the scoring technique and the number of choices. The study aims at describing the effect of the scoring technique and numerous options towards the reliability of multiple-choice objective tests on social subjects in elementary school. The study is quantitative research with…
Descriptors: Scoring, Multiple Choice Tests, Test Reliability, Elementary School Students
Dambha, Tasneem; Swanepoel, De Wet; Mahomed-Asmail, Faheema; De Sousa, Karina C.; Graham, Marien A.; Smits, Cas – Journal of Speech, Language, and Hearing Research, 2022
Purpose: This study compared the test characteristics, test-retest reliability, and test efficiency of three novel digits-in-noise (DIN) test procedures to a conventional antiphasic 23-trial adaptive DIN (D23). Method: One hundred twenty participants with an average age of 42 years (SD = 19) were included. Participants were tested and retested…
Descriptors: Auditory Tests, Screening Tests, Efficiency, Test Format
Susan K. Johnsen – Gifted Child Today, 2024
The author provides a checklist for educators who are selecting technically adequate tests for identifying and referring students for gifted education services and programs. The checklist includes questions related to how the test was normed, reliability and validity studies as well as questions related to types of scores, administration, and…
Descriptors: Test Selection, Academically Gifted, Gifted Education, Test Validity
Tugra Karademir Coskun; Ayfer Alper – Digital Education Review, 2024
This study aims to examine the potential differences between teacher evaluations and artificial intelligence (AI) tool-based assessment systems in university examinations. The research has evaluated a wide spectrum of exams including numerical and verbal course exams, exams with different assessment styles (project, test exam, traditional exam),…
Descriptors: Artificial Intelligence, Visual Aids, Video Technology, Tests
Practical Considerations in Choosing an Anchor Test Form for Equating under the Random Groups Design
Cui, Zhongmin; He, Yong – Measurement: Interdisciplinary Research and Perspectives, 2023
Careful considerations are necessary when there is a need to choose an anchor test form from a list of old test forms for equating under the random groups design. The choice of the anchor form potentially affects the accuracy of equated scores on new test forms. Few guidelines, however, can be found in the literature on choosing the anchor form.…
Descriptors: Test Format, Equated Scores, Best Practices, Test Construction
Constructing a Roadmap to Measure the Quality of Business Assessments Aimed at Curriculum Management
Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023
Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers, a roadmap with which to gather evidence to improve the quality of open-ended assessments. Based on statistical…
Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education
Andrew S. Cale; Elizabeth R. Agosto; Brenda Kucha Anak Ganeng; Megan E. Kruskie; Margaret A. McNulty; Kyle A. Robertson; Cecelia J. Vetter; Sabrina C. Woods; Md. Nazmul Karim; Adam B. Wilson – Anatomical Sciences Education, 2025
To keep pace with medicine's unpredictable changes, medical trainees must learn to accurately monitor and evaluate themselves via metacognition (i.e., thinking about thinking). The Metacognitive Awareness Inventory (MAI) can assess and guide the metacognitive development of trainees. This study summarizes existing psychometric evidence and…
Descriptors: Meta Analysis, Psychometrics, Metacognition, Measures (Individuals)
Sen, Sedat – Creativity Research Journal, 2022
The purpose of this study was to estimate the overall reliability values for the scores produced by Runco Ideational Behavior Scale (RIBS) and explore the variability of RIBS score reliability across studies. To achieve this, a reliability generalization meta-analysis was carried out using the 86 Cronbach's alpha estimates obtained from 77 studies…
Descriptors: Generalization, Creativity, Meta Analysis, Higher Education
VanDerHeyden, Amanda M.; Codding, Robin; Solomon, Benjamin G. – Remedial and Special Education, 2023
Computer-based curriculum-based measurement (CBM) is a relatively common practice, but surprisingly few studies have examined the reliability of computer-based CBM. This study sought to examine the reliability of CBM administered via paper/pencil versus the computer. Twenty-one of 25 students in two third-grade classes (N = 21) participated in two…
Descriptors: Curriculum Based Assessment, Computer Assisted Testing, Test Format, Grade 3
Judy R. Wilkerson; W. Steve Lang; LaSonya Moore – Journal of Research in Education, 2025
The DAATS (Dispositions Assessments Aligned with Teacher Standards) battery is a series of five instruments of different item types that measure teachers' consistency with the critical dispositions embedded in the InTASC Standards. The purpose of this study was to continue a 20-year research project on the development and implementation of…
Descriptors: Educational Assessment, National Standards, Teacher Evaluation, Teacher Competencies
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability

Peer reviewed
Direct link
