Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 4 |
| Since 2007 (last 20 years) | 4 |
Descriptor
| Interrater Reliability | 12 |
| Test Format | 12 |
| Test Items | 12 |
| Language Tests | 5 |
| Test Construction | 4 |
| English (Second Language) | 3 |
| Foreign Countries | 3 |
| Rating Scales | 3 |
| Test Reliability | 3 |
| Comparative Analysis | 2 |
| Construct Validity | 2 |
Source
| Applied Measurement in Education | 2 |
| Journal of Communication Studies | 1 |
| Language Education & Assessment | 1 |
| Online Submission | 1 |
| Pegem Journal of Education and Instruction | 1 |
| Theory and Research in Education | 1 |
Author
| Alderson, J. Charles | 1 |
| Boldt, R. F. | 1 |
| Boyer, Michelle | 1 |
| Braem, Penny Boyes | 1 |
| Chang, Lei | 1 |
| Curren, Randall R. | 1 |
| Downing, Steven M. | 1 |
| Ebling, Sarah | 1 |
| Edward Paul Getman | 1 |
| Haladyna, Thomas M. | 1 |
| Haug, Tobias | 1 |
Location
| Netherlands | 1 |
| Switzerland | 1 |
Assessments and Surveys
| Test of English as a Foreign Language | 1 |
Kaharu, Sarintan N.; Mansyur, Jusman – Pegem Journal of Education and Instruction, 2021
This study aims to develop a test that can be used to explore mental models and representation patterns of objects in liquids. The test was developed by adapting Reeves's Development Model in several stages: determining the orientation and test segments; an initial survey; preparation of the initial draft; try-out;…
Descriptors: Test Construction, Schemata (Cognition), Scientific Concepts, Water
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
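To make the kind of rater-by-item comparison this abstract describes concrete, here is a minimal sketch, not the authors' procedure: several hypothetical automated scoring engines are compared with a human rater item by item using quadratic weighted kappa and then ranked by mean agreement. The engine names and all score data are invented.

```python
# Illustrative sketch only: rank hypothetical automated raters by their
# item-level agreement (quadratic weighted kappa) with a human rater.
import numpy as np

def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic weighted kappa between two integer score vectors (0..n_categories-1)."""
    a, b = np.asarray(a), np.asarray(b)
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Expected agreement matrix under independence of the two raters' marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

rng = np.random.default_rng(0)
n_items, n_responses, n_cats = 5, 200, 4
human = rng.integers(0, n_cats, size=(n_items, n_responses))
# Three hypothetical engines: noisy copies of the human scores
raters = {name: np.clip(human + rng.integers(-1, 2, human.shape), 0, n_cats - 1)
          for name in ["engine_A", "engine_B", "engine_C"]}

mean_kappa = {name: np.mean([quadratic_weighted_kappa(human[i], scores[i], n_cats)
                             for i in range(n_items)])
              for name, scores in raters.items()}
for name, k in sorted(mean_kappa.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean QWK across items = {k:.3f}")
```

Other ranking choices (median kappa, per-item win counts) can reorder the engines, which is the kind of procedure dependence the abstract points to.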
Haug, Tobias; Ebling, Sarah; Braem, Penny Boyes; Tissi, Katja; Sidler-Miserez, Sandra – Language Education & Assessment, 2019
In German Switzerland the learning and assessment of Swiss German Sign Language ("Deutschschweizerische Gebärdensprache," DSGS) takes place in different contexts, for example, in tertiary education or in continuous education courses. By way of the still ongoing implementation of the Common European Framework of Reference for DSGS,…
Descriptors: German, Sign Language, Language Tests, Test Items
Edward Paul Getman – Online Submission, 2020
Despite calls for engaging assessments targeting young language learners (YLLs) between 8 and 13 years old, what makes assessment tasks engaging and how such task characteristics affect measurement quality have not been well studied empirically. Furthermore, there has been a dearth of validity research about technology-enhanced speaking tests for…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Learner Engagement
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
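As a rough illustration of the consistency issue raised in this abstract, and not the paper's actual checking method, the sketch below inverts a Rasch item response model: each judged success probability implies a borderline ability for that item, and a wide spread of implied abilities signals judgments that do not refer to the same borderline performance level. The item difficulties and ratings are invented.

```python
# Under a Rasch model, a judged success probability p_i for item i implies a
# borderline ability theta_i = b_i + ln(p_i / (1 - p_i)). Consistent judgments
# would imply (roughly) the same theta for every item.
import math

item_difficulty = [-1.2, -0.4, 0.3, 0.9, 1.6]        # assumed Rasch b_i (logits)
judged_probability = [0.85, 0.70, 0.55, 0.52, 0.20]  # one judge's Angoff-style ratings

implied_theta = [b + math.log(p / (1 - p))
                 for b, p in zip(item_difficulty, judged_probability)]
mean_theta = sum(implied_theta) / len(implied_theta)
spread = max(implied_theta) - min(implied_theta)

for b, p, t in zip(item_difficulty, judged_probability, implied_theta):
    print(f"b = {b:+.1f}  p = {p:.2f}  implied borderline theta = {t:+.2f}")
print(f"mean implied theta = {mean_theta:+.2f}, range = {spread:.2f} logits")
```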
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
Boldt, R. F. – 1992
The Test of Spoken English (TSE) is an internationally administered instrument for assessing nonnative speakers' proficiency in speaking English. The research foundation of the TSE examination described in its manual refers to two sources of variation other than the achievement being measured: interrater reliability and internal consistency.…
Descriptors: Adults, Analysis of Variance, Interrater Reliability, Language Proficiency
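For readers unfamiliar with the two sources of variation named in this abstract, the sketch below computes illustrative versions of both from synthetic data: interrater reliability as the correlation between two raters' total scores, and internal consistency as Cronbach's alpha over item scores. It is a toy example under assumed data, not the analysis in the report.

```python
# Illustrative sketch: two reliability quantities from invented test data.
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_items = 50, 12
true_ability = rng.normal(0, 1, n_examinees)
# Item-level scores = examinee ("achievement") component plus item noise
items = true_ability[:, None] + rng.normal(0, 0.8, (n_examinees, n_items))
# Two raters scoring the same performances, each with their own error
rater_1 = items.sum(axis=1) + rng.normal(0, 1.5, n_examinees)
rater_2 = items.sum(axis=1) + rng.normal(0, 1.5, n_examinees)

# Interrater reliability: correlation between the raters' total scores
interrater_r = np.corrcoef(rater_1, rater_2)[0, 1]

# Internal consistency: Cronbach's alpha over the item scores
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
cronbach_alpha = (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

print(f"interrater correlation = {interrater_r:.2f}")
print(f"Cronbach's alpha       = {cronbach_alpha:.2f}")
```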
Alderson, J. Charles; And Others – 1995
The guide is intended for teachers who must construct language tests and for other professionals who may need to construct, evaluate, or use the results of language tests. Most examples are drawn from the field of English-as-a-Second-Language instruction in the United Kingdom, but the principles and practices described may be applied to the…
Descriptors: Educational Trends, English (Second Language), Interrater Reliability, Language Tests
Curren, Randall R. – Theory and Research in Education, 2004
This article addresses the capacity of high stakes tests to measure the most significant kinds of learning. It begins by examining a set of philosophical arguments pertaining to construct validity and alleged conceptual obstacles to attributing specific knowledge and skills to learners. The arguments invoke philosophical doctrines of holism and…
Descriptors: Test Items, Educational Testing, Construct Validity, High Stakes Tests
Kreeft, Henk; Sanders, Piet – 1983
In the Dutch national examinations, reading comprehension tests are used for all languages. For the native language, reading comprehension is tested with reading passages and related questions to which the test-taker provides his own response, not choosing from a group of alternatives. One problem encountered in testing with these items is…
Descriptors: Dutch, Evaluation Methods, Evaluators, Foreign Countries
Nakamura, Yuji – Journal of Communication Studies, 1997
This study investigated the effects of three aspects of language testing (test task, familiarity with an interviewer, and test method) on both tester and tested. Data were drawn from several previous studies by the researcher. Concerning test task, data were analyzed for the type of topic students wanted most to talk about or preferred not to talk…
Descriptors: Behavior Patterns, Comparative Analysis, English (Second Language), Interrater Reliability
Lunz, Mary E.; And Others – 1989
A method for understanding and controlling the multiple facets of an oral examination (OE) or other judge-intermediated examination is presented and illustrated. This study focused on determining the extent to which the facets model (FM) analysis constructs meaningful variables for each facet of an OE involving protocols, examiners, and…
Descriptors: Computer Software, Difficulty Level, Evaluators, Examiners
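The facets model referred to in this entry is commonly written, for examinee n, protocol i, judge j, and rating step k, as log(P_k / P_(k-1)) = B_n - D_i - C_j - F_k. The sketch below simply evaluates the rating-category probabilities this form implies for made-up parameter values; it is an illustration of the model, not the study's analysis.

```python
# Illustrative many-facet Rasch (rating scale) category probabilities.
import numpy as np

def facets_category_probs(ability, protocol_difficulty, judge_severity, thresholds):
    """P(category 0..K) given person ability, protocol difficulty, judge severity,
    and rating-scale step thresholds F_k."""
    eta = ability - protocol_difficulty - judge_severity
    # Category k gets exponent sum_{h<=k} (eta - F_h); category 0 gets 0
    exponents = np.concatenate(([0.0], np.cumsum(eta - np.asarray(thresholds))))
    probs = np.exp(exponents - exponents.max())  # numerically stable softmax
    return probs / probs.sum()

thresholds = [-1.0, 0.0, 1.0]  # F_k for a 4-category (0-3) rating scale, invented
p = facets_category_probs(ability=0.5, protocol_difficulty=-0.2,
                          judge_severity=0.8, thresholds=thresholds)
for k, pk in enumerate(p):
    print(f"P(rating = {k}) = {pk:.3f}")
# A more severe judge (larger severity) shifts probability toward lower ratings,
# which is how judge effects are separated from examinee ability in such models.
```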
