| Publication Date | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 9 |
| Since 2017 (last 10 years) | 20 |
| Since 2007 (last 20 years) | 30 |

| Descriptor | Results |
| --- | --- |
| Evaluation Methods | 57 |
| Language Tests | 57 |
| Test Reliability | 29 |
| Test Validity | 25 |
| Second Language Learning | 24 |
| English (Second Language) | 20 |
| Language Proficiency | 19 |
| Interrater Reliability | 18 |
| Foreign Countries | 15 |
| Evaluators | 14 |
| Reliability | 14 |

| Education Level | Results |
| --- | --- |
| Higher Education | 8 |
| Postsecondary Education | 7 |
| Elementary Education | 4 |
| Early Childhood Education | 2 |
| Secondary Education | 2 |
| Grade 1 | 1 |
| Grade 2 | 1 |
| High Schools | 1 |
| Junior High Schools | 1 |
| Middle Schools | 1 |
| Primary Education | 1 |

| Audience | Results |
| --- | --- |
| Researchers | 4 |
| Practitioners | 2 |
| Students | 1 |
| Teachers | 1 |

| Location | Results |
| --- | --- |
| Australia | 2 |
| Vietnam | 2 |
| China | 1 |
| Egypt | 1 |
| Europe | 1 |
| Japan | 1 |
| Nebraska | 1 |
| Netherlands | 1 |
| New Zealand | 1 |
| Oman | 1 |
| Pennsylvania | 1 |

| Assessments and Surveys | Results |
| --- | --- |
| Clinical Evaluation of… | 2 |
| Test of English as a Foreign… | 2 |
| Graduate Record Examinations | 1 |
| Test for Auditory… | 1 |
| Woodcock Reading Mastery Test | 1 |

Bijani, Houman; Hashempour, Bahareh; Ibrahim, Khaled Ahmed Abdel-Al; Orabah, Salim Said Bani; Heydarnejad, Tahereh – Language Testing in Asia, 2022
Due to subjectivity in oral assessment, much attention has been paid to obtaining a satisfactory measure of consistency among raters. However, the process of obtaining more consistency might not result in valid decisions. One matter that is at the core of both reliability and validity in oral assessment is rater training. Recently,…
Descriptors: Oral Language, Language Tests, Feedback (Response), Bias
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Magdalena Luniewska; Magdalena Krysztofiak; Weronika Bialek; Martyna Burdach; Ewa Komorowska; Grzegorz Krajewski; Judyta Pacewicz; Julia Radzikowska; Nina Gram Garmann; Ewa Haman – First Language, 2025
Vocabulary assessment is an important part of measuring language proficiency in both monolingual and bilingual children. The LITMUS Cross-Linguistic Lexical Tasks (CLT) provides a framework for assessing the vocabulary of monolingual and bilingual children using a standardized procedure and comparable stimuli across languages. All language…
Descriptors: Task Analysis, Contrastive Linguistics, Monolingualism, Vocabulary Development
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues and address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Tu, Thuy Thi Minh – ProQuest LLC, 2023
The study aimed to elicit information from Vietnamese EFL university instructors about their knowledge and skills regarding the principles, theory, and practices of language assessment by means of revision and validation of the Language Assessment Literacy--Revised Vietnam (LAL-RV), which was previously developed by Kremmel and Harding (2020). A…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, College Faculty
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
Marshall, Neil; Shaw, Kirsten; Hunter, Jodie; Jones, Ian – New Zealand Journal of Educational Studies, 2020
There is growing interest in using comparative judgement to assess student work as an alternative to traditional marking. Comparative judgement requires no rubrics and is instead grounded in experts making pairwise judgements about the relative 'quality' of students' work according to a high level criterion. The resulting decision data are fitted…
Descriptors: Comparative Analysis, Decision Making, Student Evaluation, Evaluation Methods
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
James Dean Brown; Ali Panahi; Hassan Mohebbi – Language Teaching Research Quarterly, 2023
Panahi and Mohebbi review James Dean Brown's 50 years of research in language testing, curriculum development, and research statistics with reference to an impressionistic framework for analysis containing two components with their subcomponents: Annotations (i.e., briefing and implications) and main concepts and themes (i.e., testing and teaching…
Descriptors: Second Language Learning, Second Language Instruction, Language Tests, Curriculum Development
Hunt, Emily; Nang, Charn; Meldrum, Suzanne; Armstrong, Elizabeth – Language, Speech, and Hearing Services in Schools, 2022
Purpose: Multilingual children are disproportionately represented on speech pathology caseloads, in part due to the limited ability of traditional language assessments to accurately capture multilingual children's language abilities. This systematic review evaluates the evidence for identification of language disorder in multilingual children…
Descriptors: Multilingualism, Speech Language Pathology, Language Tests, Diagnostic Tests
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important because their scoring decisions determine which aspects of performance the scores represent; however, raters are also considered one of the potential sources of unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Nebraska Department of Education, 2024
The Nebraska Student-Centered Assessment System (NSCAS) is a statewide assessment system that embodies Nebraska's holistic view of students and helps them prepare for success in postsecondary education, career, and civic life. It uses multiple measures throughout the year to provide educators and decision-makers at all levels with the insights…
Descriptors: Student Evaluation, Evaluation Methods, Elementary School Students, Middle School Students
Allehaiby, Wid Hasen; Al-Bahlani, Sara – Arab World English Journal, 2021
One of the main challenges higher education institutions have encountered during the recent COVID-19 crisis is transferring assessment approaches from the traditional face-to-face format to the online Emergency Remote Teaching approach. A set of language assessment principles (practicality, reliability, validity, authenticity, and washback), which can be…
Descriptors: Barriers, Distance Education, Evaluation Methods, Teaching Methods

