Menold, Natalja; Raykov, Tenko – Educational and Psychological Measurement, 2016
This article examines the possible dependency of composite reliability on presentation format of the elements of a multi-item measuring instrument. Using empirical data and a recent method for interval estimation of group differences in reliability, we demonstrate that the reliability of an instrument need not be the same when polarity of the…
Descriptors: Test Reliability, Test Format, Test Items, Differences
Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F. – Applied Measurement in Education, 2016
The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Descriptors: Item Response Theory, Equated Scores, Test Format, Testing Programs
Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2016
A common suggestion made in the psychometric literature for fixed-length classification tests is that one should design tests so that they have maximum information at the cut score. Designing tests in this way is believed to maximize the classification accuracy and consistency of the assessment. This article uses simulated examples to illustrate…
Descriptors: Cutting Scores, Psychometrics, Test Construction, Classification
Lewis, Kendra M.; Ewers, Timothy; Miller, JoLynn C.; Bird, Marianne; Borba, John; Hill, Russell D.; Rea-Keywood, Jeannette; Shelstad, Nancy; Trzesniewski, Kali – Journal of Extension, 2018
Research on retention in the 4-H youth development program has consistently shown that one of the primary indicators for youths' dropping out of 4-H is being a first-year member. Extension 4-H professionals from California, Idaho, Wyoming, and New Jersey formed a team to study this issue. Our team surveyed first-year members and their…
Descriptors: Youth Programs, Academic Persistence, School Holding Power, Dropout Research
Martin-Raugh, Michelle P.; Anguiano-Carrasco, Cristina; Jackson, Teresa; Brenneman, Meghan W.; Carney, Lauren; Barnwell, Patrick; Kochert, Jonathan – International Journal of Testing, 2018
Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents and simply ask test takers to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far…
Descriptors: Test Format, Test Reliability, Test Validity, Predictive Validity
Liu, Yuming; Robin, Frédéric; Yoo, Hanwook; Manna, Venessa – ETS Research Report Series, 2018
The "GRE"® Psychology test is an achievement test that measures core knowledge in 12 content domains that represent the courses commonly offered at the undergraduate level. Currently, a total score and 2 subscores, experimental and social, are reported to test takers as well as graduate institutions. However, the American Psychological…
Descriptors: College Entrance Examinations, Graduate Study, Psychological Testing, Scores
Sinharay, Sandip – Grantee Submission, 2018
Tatsuoka (1984) suggested several extended caution indices and their standardized versions that have been used as person-fit statistics by researchers such as Drasgow, Levine, and McLaughlin (1987), Glas and Meijer (2003), and Molenaar and Hoijtink (1990). However, these indices are only defined for tests with dichotomous items. This paper extends…
Descriptors: Test Format, Goodness of Fit, Item Response Theory, Error Patterns
Wang, Ling – Journal of Educational Multimedia and Hypermedia, 2021
Running records is an important reading assessment for diagnosing early readers' needs in diverse instructional settings across grade levels. This study develops an innovative app to help teachers administer running records assessment and investigates teachers' perceptions of its functionality and usability in practical classrooms. The app offers…
Descriptors: Miscue Analysis, Reading Comprehension, Reading Tests, Computer Software
Impact of Background Noise Fluctuation and Reverberation on Response Time in a Speech Reception Task
Prodi, Nicola; Visentin, Chiara – Journal of Speech, Language, and Hearing Research, 2019
Purpose: This study examines the effects of reverberation and noise fluctuation on the response time (RT) to the auditory stimuli in a speech reception task. Method: The speech reception task was presented to 76 young adults with normal hearing in 3 simulated listening conditions (1 anechoic, 2 reverberant). Speechlike stationary and fluctuating…
Descriptors: Acoustics, Reaction Time, Auditory Stimuli, Speech Communication
Steedle, Jeffrey T.; Morrison, Kristin M. – Educational Assessment, 2019
Assessment items are commonly field tested prior to operational use to observe statistical item properties such as difficulty. Item parameter estimates from field testing may be used to assign scores via pre-equating or computer adaptive designs. This study examined differences between item difficulty estimates based on field test and operational…
Descriptors: Field Tests, Test Items, Statistics, Difficulty Level
Neiro, Jakke; Johansson, Niko – LUMAT: International Journal on Math, Science and Technology Education, 2020
The history and evolution of science assessment remains poorly known, especially in the context of the exam question contents. Here we analyze the Finnish matriculation examination in biology from the 1920s to 1960s to understand how the exam has evolved in both its knowledge content and educational form. Each question was classified according to…
Descriptors: Foreign Countries, Biology, Test Content, Test Format
Shar, Kelli; Russ, Rosemary S.; Laverty, James T. – Physical Review Physics Education Research, 2020
Assessments are usually thought of as ways for instructors to get information from students. In this work, we flip this perspective and explore how assessments communicate information to students. Specifically, we consider how assessments may provide information about what faculty and/or researchers think it means to know and do physics, i.e.,…
Descriptors: Epistemology, Science Instruction, Physics, Science Tests
Nakata, Tatsuya – Studies in Second Language Acquisition, 2017
Although research shows that repetition increases second language vocabulary learning, only a few studies have examined the long-term effects of increasing retrieval frequency in one learning session. With this in mind, the present study examined the effects of within-session repeated retrieval on vocabulary learning. The study is original in…
Descriptors: Repetition, Second Language Learning, Vocabulary Development, English
Höhne, Jan Karem; Schlosser, Stephan; Krebs, Dagmar – Field Methods, 2017
Measuring attitudes and opinions employing agree/disagree (A/D) questions is a common method in social research because it appears to be possible to measure different constructs with identical response scales. However, theoretical considerations suggest that A/D questions require considerable cognitive processing. Item-specific (IS) questions,…
Descriptors: Online Surveys, Test Format, Test Items, Difficulty Level
Liu, Chen-Wei; Wang, Wen-Chung – Journal of Educational Measurement, 2017
The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using…
Descriptors: Item Response Theory, Models, Maximum Likelihood Statistics, Data Analysis