Publication Date
| Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 18 |
| Since 2022 (last 5 years) | 120 |
| Since 2017 (last 10 years) | 262 |
| Since 2007 (last 20 years) | 435 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Test Format | 956 |
| Test Items | 956 |
| Test Construction | 363 |
| Multiple Choice Tests | 260 |
| Foreign Countries | 227 |
| Difficulty Level | 199 |
| Higher Education | 179 |
| Computer Assisted Testing | 160 |
| Item Response Theory | 151 |
| Item Analysis | 149 |
| Scores | 146 |
Audience
| Audience | Records |
| --- | --- |
| Practitioners | 62 |
| Teachers | 47 |
| Researchers | 32 |
| Students | 15 |
| Administrators | 13 |
| Parents | 6 |
| Policymakers | 5 |
| Community | 1 |
| Counselors | 1 |
Location
| Location | Records |
| --- | --- |
| Turkey | 27 |
| Canada | 15 |
| Germany | 15 |
| Australia | 13 |
| Israel | 13 |
| Japan | 12 |
| Netherlands | 10 |
| United Kingdom | 10 |
| United States | 9 |
| Arizona | 6 |
| Iran | 6 |
Laws, Policies, & Programs
| Law, policy, or program | Records |
| --- | --- |
| Individuals with Disabilities… | 2 |
| No Child Left Behind Act 2001 | 2 |
| Elementary and Secondary… | 1 |
| Head Start | 1 |
| Job Training Partnership Act… | 1 |
| Perkins Loan Program | 1 |
Moon, Jung Aa; Sinharay, Sandip; Keehner, Madeleine; Katz, Irvin R. – International Journal of Testing, 2020
The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants' tendency to respond to an item was affected by the presence of a grid and variations of…
Descriptors: Computer Assisted Testing, Multiple Choice Tests, Test Wiseness, Psychometrics
Remizova, Alisa; Rudnev, Maksim – International Journal of Social Research Methodology, 2020
The justifiability scale (JS) is widely used to measure individual and country differences in moral attitudes. However, the validity of the instrument has barely been assessed. The current study addressed the concurrent and content validity of four popular JS items (justifiability of homosexuality, suicide, prostitution, and euthanasia). A sample…
Descriptors: Moral Values, Content Validity, Attitude Measures, Foreign Countries
Bulut, Okan; Bulut, Hatice Cigdem; Cormier, Damien C.; Ilgun Dibek, Munevver; Sahin Kursad, Merve – Educational Assessment, 2023
Some statewide testing programs allow students to receive corrective feedback and revise their answers during testing. Despite its pedagogical benefits, the effects of providing revision opportunities remain unknown in the context of alternate assessments. Therefore, this study examined student data from a large-scale alternate assessment that…
Descriptors: Error Correction, Alternative Assessment, Feedback (Response), Multiple Choice Tests
Stoeckel, Tim; Ishii, Tomoko – Vocabulary Learning and Instruction, 2024
In an upcoming coverage-comprehension study, we plan to assess learners' meaning-recall knowledge of words as they occur in the study's reading passage. As several meaning-recall test formats exist, the purpose of this small-scale study (N = 10) was to determine which of three formats was most similar to a criterion interview regarding mean score…
Descriptors: Vocabulary Development, Language Tests, Second Language Learning, Classification
Beiting-Parrish, Magdalen – ProQuest LLC, 2022
The following is a five-chapter dissertation surrounding the use of text mining techniques for better understanding the language of mathematics items from standardized tests to improve linguistic equity of these items to support assessment of English Language Learners. Introduction: The dissertation begins with an overview of the problem that…
Descriptors: Mathematics Tests, Test Items, Item Analysis, Standardized Tests
Opstad, Leiv – Athens Journal of Education, 2021
The discussion of whether multiple-choice questions can replace the traditional exam with essays and constructed questions in introductory courses has only just started in Norway. There is no easy answer. The findings depend on the pattern of the questions, so one must be careful in drawing conclusions. This research explores a…
Descriptors: Multiple Choice Tests, Essay Tests, Introductory Courses, Foreign Countries
Sutadji, Eddy; Susilo, Herawati; Wibawa, Aji Prasetya; Jabari, Nidal A. M.; Rohmad, Syaiful Nur – Education Sciences, 2021
Assessment methods are important for producing qualified graduates who are ready to face the real world. Authentic assessment is considered the most effective method to achieve this. Authentic assessment is often applied universally. However, there is a difference between natural sciences and social sciences. If it is used for different…
Descriptors: Performance Based Assessment, Natural Sciences, Social Sciences, College Faculty
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018
The final step of the typical process of developing educational and psychological tests is to place the selected test items in a formatted form. The step involves the grouping and ordering of the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…
Descriptors: Programming, Automation, Test Items, Test Format
Arslan, Burcu; Jiang, Yang; Keehner, Madeleine; Gong, Tao; Katz, Irvin R.; Yan, Fred – Educational Measurement: Issues and Practice, 2020
Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts' professional judgments and design constraints, rather…
Descriptors: Test Items, Computer Assisted Testing, Test Format, Decision Making
Kurnaz-Adibatmaz, Fatma Betül; Yildiz, Hüseyin – Journal of Theoretical Educational Science, 2020
In this study, logistic regression and Lord's chi-square methods were used to detect items showing differential item functioning (DIF). The study used the Peabody Picture Vocabulary Test (PPVT). The original form of the PPVT includes four options. Three different forms (A, B, and C) were created by removing one of the distractors in turn. The original form of PPVT was…
Descriptors: Item Analysis, Test Items, Vocabulary, Verbal Ability
Öztürk, Nagihan Boztunç – Universal Journal of Educational Research, 2019
This study examines how the length and characteristics of the routing module in different panel designs affect measurement precision. Six routing module lengths, nine routing module characteristics, and two panel designs are considered. At the end of the study, the effects of conditions on…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Length, Test Format
NWEA, 2022
This technical report documents the processes and procedures employed by NWEA® to build and support the English MAP® Reading Fluency™ assessments administered during the 2020-2021 school year. It is written for measurement professionals and administrators to help evaluate the quality of MAP Reading Fluency. The seven sections of this report: (1)…
Descriptors: Achievement Tests, Reading Tests, Reading Achievement, Reading Fluency
Martin, Jeffrey – Vocabulary Learning and Instruction, 2022
The functioning of a vocabulary testing instrument rests in part on the test-taking actions made possible for examinees by item format, an aspect of test development that warrants consideration in second-language vocabulary research. For example, although iterations of the written receptive vocabulary levels test (VLT) have integrated improvements…
Descriptors: Test Wiseness, Vocabulary, Vocabulary Development, Second Language Learning
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability