Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 4 |
| Since 2007 (last 20 years) | 4 |
Descriptor
| Interrater Reliability | 12 |
| Test Format | 12 |
| Test Items | 12 |
| Language Tests | 5 |
| Test Construction | 4 |
| English (Second Language) | 3 |
| Foreign Countries | 3 |
| Rating Scales | 3 |
| Test Reliability | 3 |
| Comparative Analysis | 2 |
| Construct Validity | 2 |
Source
| Applied Measurement in Education | 2 |
| Journal of Communication Studies | 1 |
| Language Education & Assessment | 1 |
| Online Submission | 1 |
| Pegem Journal of Education and Instruction | 1 |
| Theory and Research in Education | 1 |
Author
| Alderson, J. Charles | 1 |
| Boldt, R. F. | 1 |
| Boyer, Michelle | 1 |
| Braem, Penny Boyes | 1 |
| Chang, Lei | 1 |
| Curren, Randall R. | 1 |
| Downing, Steven M. | 1 |
| Ebling, Sarah | 1 |
| Edward Paul Getman | 1 |
| Haladyna, Thomas M. | 1 |
| Haug, Tobias | 1 |
Location
| Netherlands | 1 |
| Switzerland | 1 |
Assessments and Surveys
| Test of English as a Foreign Language | 1 |
Kaharu, Sarintan N.; Mansyur, Jusman – Pegem Journal of Education and Instruction, 2021
This study aims to develop a test that can be used to explore mental models and representation patterns of objects in liquids. The test was developed by adapting Reeves's Development Model in several stages: determining the orientation and test segments; an initial survey; preparation of the initial draft; try-out;…
Descriptors: Test Construction, Schemata (Cognition), Scientific Concepts, Water
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
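To make the kind of rater-by-item comparison this abstract describes concrete, here is a minimal sketch, not the authors' procedure: several hypothetical automated scoring engines are compared with a human rater item by item using quadratic weighted kappa and then ranked by mean agreement. The engine names and all score data are invented.

```python
# Illustrative sketch only: rank hypothetical automated raters by their
# item-level agreement (quadratic weighted kappa) with a human rater.
import numpy as np

def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic weighted kappa between two integer score vectors (0..n_categories-1)."""
    a, b = np.asarray(a), np.asarray(b)
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Expected agreement matrix under independence of the two raters' marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

rng = np.random.default_rng(0)
n_items, n_responses, n_cats = 5, 200, 4
human = rng.integers(0, n_cats, size=(n_items, n_responses))
# Three hypothetical engines: noisy copies of the human scores
raters = {name: np.clip(human + rng.integers(-1, 2, human.shape), 0, n_cats - 1)
          for name in ["engine_A", "engine_B", "engine_C"]}

mean_kappa = {name: np.mean([quadratic_weighted_kappa(human[i], scores[i], n_cats)
                             for i in range(n_items)])
              for name, scores in raters.items()}
for name, k in sorted(mean_kappa.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean QWK across items = {k:.3f}")
```

Other ranking choices (median kappa, per-item win counts) can reorder the engines, which is the kind of procedure dependence the abstract points to.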
Haug, Tobias; Ebling, Sarah; Braem, Penny Boyes; Tissi, Katja; Sidler-Miserez, Sandra – Language Education & Assessment, 2019
In German Switzerland the learning and assessment of Swiss German Sign Language ("Deutschschweizerische Gebärdensprache," DSGS) takes place in different contexts, for example, in tertiary education or in continuous education courses. By way of the still ongoing implementation of the Common European Framework of Reference for DSGS,…
Descriptors: German, Sign Language, Language Tests, Test Items
Edward Paul Getman – Online Submission, 2020
Despite calls for engaging assessments targeting young language learners (YLLs) between 8 and 13 years old, what makes assessment tasks engaging and how such task characteristics affect measurement quality have not been well studied empirically. Furthermore, there has been a dearth of validity research about technology-enhanced speaking tests for…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Learner Engagement
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
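As a rough illustration of the consistency issue raised in this abstract, and not the paper's actual checking method, the sketch below inverts a Rasch item response model: each judged success probability implies a borderline ability for that item, and a wide spread of implied abilities signals judgments that do not refer to the same borderline performance level. The item difficulties and ratings are invented.

```python
# Under a Rasch model, a judged success probability p_i for item i implies a
# borderline ability theta_i = b_i + ln(p_i / (1 - p_i)). Consistent judgments
# would imply (roughly) the same theta for every item.
import math

item_difficulty = [-1.2, -0.4, 0.3, 0.9, 1.6]        # assumed Rasch b_i (logits)
judged_probability = [0.85, 0.70, 0.55, 0.52, 0.20]  # one judge's Angoff-style ratings

implied_theta = [b + math.log(p / (1 - p))
                 for b, p in zip(item_difficulty, judged_probability)]
mean_theta = sum(implied_theta) / len(implied_theta)
spread = max(implied_theta) - min(implied_theta)

for b, p, t in zip(item_difficulty, judged_probability, implied_theta):
    print(f"b = {b:+.1f}  p = {p:.2f}  implied borderline theta = {t:+.2f}")
print(f"mean implied theta = {mean_theta:+.2f}, range = {spread:.2f} logits")
```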
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
Boldt, R. F. – 1992
The Test of Spoken English (TSE) is an internationally administered instrument for assessing nonnative speakers' proficiency in speaking English. The research foundation of the TSE examination described in its manual refers to two sources of variation other than the achievement being measured: interrater reliability and internal consistency.…
Descriptors: Adults, Analysis of Variance, Interrater Reliability, Language Proficiency
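For readers unfamiliar with the two sources of variation named in this abstract, the sketch below computes illustrative versions of both from synthetic data: interrater reliability as the correlation between two raters' total scores, and internal consistency as Cronbach's alpha over item scores. It is a toy example under assumed data, not the analysis in the report.

```python
# Illustrative sketch: two reliability quantities from invented test data.
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_items = 50, 12
true_ability = rng.normal(0, 1, n_examinees)
# Item-level scores = examinee ("achievement") component plus item noise
items = true_ability[:, None] + rng.normal(0, 0.8, (n_examinees, n_items))
# Two raters scoring the same performances, each with their own error
rater_1 = items.sum(axis=1) + rng.normal(0, 1.5, n_examinees)
rater_2 = items.sum(axis=1) + rng.normal(0, 1.5, n_examinees)

# Interrater reliability: correlation between the raters' total scores
interrater_r = np.corrcoef(rater_1, rater_2)[0, 1]

# Internal consistency: Cronbach's alpha over the item scores
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
cronbach_alpha = (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

print(f"interrater correlation = {interrater_r:.2f}")
print(f"Cronbach's alpha       = {cronbach_alpha:.2f}")
```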
Alderson, J. Charles; And Others – 1995
The guide is intended for teachers who must construct language tests and for other professionals who may need to construct, evaluate, or use the results of language tests. Most examples are drawn from the field of English-as-a-Second-Language instruction in the United Kingdom, but the principles and practices described may be applied to the…
Descriptors: Educational Trends, English (Second Language), Interrater Reliability, Language Tests
Curren, Randall R. – Theory and Research in Education, 2004
This article addresses the capacity of high stakes tests to measure the most significant kinds of learning. It begins by examining a set of philosophical arguments pertaining to construct validity and alleged conceptual obstacles to attributing specific knowledge and skills to learners. The arguments invoke philosophical doctrines of holism and…
Descriptors: Test Items, Educational Testing, Construct Validity, High Stakes Tests
Kreeft, Henk; Sanders, Piet – 1983
In the Dutch national examinations, reading comprehension tests are used for all languages. For the native language, reading comprehension is tested with reading passages and related questions to which the test-taker provides his own response, not choosing from a group of alternatives. One problem encountered in testing with these items is…
Descriptors: Dutch, Evaluation Methods, Evaluators, Foreign Countries
Nakamura, Yuji – Journal of Communication Studies, 1997
This study investigated the effects of three aspects of language testing (test task, familiarity with an interviewer, and test method) on both tester and tested. Data were drawn from several previous studies by the researcher. Concerning test task, data were analyzed for the type of topic students wanted most to talk about or preferred not to talk…
Descriptors: Behavior Patterns, Comparative Analysis, English (Second Language), Interrater Reliability
Lunz, Mary E.; And Others – 1989
A method for understanding and controlling the multiple facets of an oral examination (OE) or other judge-intermediated examination is presented and illustrated. This study focused on determining the extent to which the facets model (FM) analysis constructs meaningful variables for each facet of an OE involving protocols, examiners, and…
Descriptors: Computer Software, Difficulty Level, Evaluators, Examiners
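The facets model referred to in this entry is commonly written, for examinee n, protocol i, judge j, and rating step k, as log(P_k / P_(k-1)) = B_n - D_i - C_j - F_k. The sketch below simply evaluates the rating-category probabilities this form implies for made-up parameter values; it is an illustration of the model, not the study's analysis.

```python
# Illustrative many-facet Rasch (rating scale) category probabilities.
import numpy as np

def facets_category_probs(ability, protocol_difficulty, judge_severity, thresholds):
    """P(category 0..K) given person ability, protocol difficulty, judge severity,
    and rating-scale step thresholds F_k."""
    eta = ability - protocol_difficulty - judge_severity
    # Category k gets exponent sum_{h<=k} (eta - F_h); category 0 gets 0
    exponents = np.concatenate(([0.0], np.cumsum(eta - np.asarray(thresholds))))
    probs = np.exp(exponents - exponents.max())  # numerically stable softmax
    return probs / probs.sum()

thresholds = [-1.0, 0.0, 1.0]  # F_k for a 4-category (0-3) rating scale, invented
p = facets_category_probs(ability=0.5, protocol_difficulty=-0.2,
                          judge_severity=0.8, thresholds=thresholds)
for k, pk in enumerate(p):
    print(f"P(rating = {k}) = {pk:.3f}")
# A more severe judge (larger severity) shifts probability toward lower ratings,
# which is how judge effects are separated from examinee ability in such models.
```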
