Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 2 |
| Since 2007 (last 20 years) | 5 |
Descriptor
| Interrater Reliability | 16 |
| Test Reliability | 16 |
| Testing | 16 |
| Test Validity | 10 |
| English (Second Language) | 7 |
| Foreign Countries | 7 |
| Language Tests | 6 |
| Test Construction | 6 |
| Test Items | 6 |
| Scoring | 5 |
| Second Language Instruction | 5 |
| More ▼ | |
Source
Author
| Adams, R. J. | 1 |
| Alderson, J. Charles | 1 |
| Botting, Nicola | 1 |
| Dodd, Barbara | 1 |
| Griffin, Patrick | 1 |
| Haertel, Edward H. | 1 |
| Hasson, Natalie | 1 |
| Komperda, Regis | 1 |
| Lazenby, Katherine | 1 |
| Marcroft, Tina A. | 1 |
| McNamara, T. F. | 1 |
| More ▼ | |
Publication Type
Education Level
| Early Childhood Education | 1 |
| Elementary Education | 1 |
| Grade 3 | 1 |
| Grade 4 | 1 |
| Grade 5 | 1 |
| Grade 6 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
| High Schools | 1 |
| Higher Education | 1 |
| Intermediate Grades | 1 |
| More ▼ | |
Audience
Laws, Policies, & Programs
| Individuals with Disabilities… | 1 |
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
| Clinical Evaluation of… | 1 |
| International English… | 1 |
| Raven Progressive Matrices | 1 |
| Strengths and Difficulties… | 1 |
| Test of English as a Foreign… | 1 |
| Test of English for… | 1 |
What Works Clearinghouse Rating
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Hasson, Natalie; Dodd, Barbara; Botting, Nicola – International Journal of Language & Communication Disorders, 2012
Background: Sentence construction and syntactic organization are known to be poor in children with specific language impairments (SLI), but little is known about the way in which children with SLI approach language tasks, and static standardized tests contribute little to the differentiation of skills within the population of children with…
Descriptors: Alternative Assessment, Sentence Structure, Syntax, Language Processing
Zhao, Zhongbao – RELC Journal: A Journal of Language Teaching and Research, 2013
This study investigates the validity of the Diagnostic College English Speaking Test (DCEST) in the context of EFL teaching and learning in China. The experiment was conducted in three stages over the course of eight weeks at a national key university in China. By means of test administration and questionnaire survey, the researcher gathered…
Descriptors: Oral Language, Construct Validity, Language Tests, Diagnostic Tests
New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and Stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation
Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation Standards and Student Testing CRESST, 2004
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…
Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability
Powell, Thomas W. – Clinical Linguistics & Phonetics, 2006
The third edition of the "Boston Diagnostic Aphasia Examination" (Goodglass, Kaplan, and Barresi) introduced standardized procedures for coding discourse samples elicited using the well known Cookie Theft illustration. To evaluate the reliability of this discourse coding procedure, a transcribed sample was coded by 14 novice examiners…
Descriptors: Examiners, Interrater Reliability, Test Reliability, Aphasia
Peer reviewedStewart, Krista J. – Psychology in the Schools, 1987
Evaluated the technical aspects of three Wechsler Intelligence Scale for Children-Revised (WISC-R) administrations of five psychology graduate students using the WISC-R Administration Observational Checklist (WAOC) to evaluate interrater agreement. Students performed significantly better on the second than on the first observation, with…
Descriptors: Educational Diagnosis, Error Patterns, Examiners, Graduate Students
McNamara, T. F.; Adams, R. J. – 1991
A preliminary study is reported of the use of new multifaceted Rasch measurement mechanisms for investigating rater characteristics in language testing. Ratings from four judges of scripts from 50 candidates taking the International English Language Testing System test, a test of English for Academic Purposes, are analyzed. The analysis…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability
Peer reviewedSmith, Richard Merrill – Academic Medicine, 1993
A University of Hawaii study compared objective and subjective assessments of the three-step triple jump examination which tests medical students' clinical problem-solving processes. Subjects were 58 first-year students. Results found the subjective assessments were more consistent across problems of varying difficulty level than were objective…
Descriptors: Case Studies, Difficulty Level, Higher Education, Interrater Reliability
Griffin, Patrick – 1990
Results of the International English Language Testing System (IELTS) battery trials in Australia are reported. The IELTS tests of productive language skills use direct assessment strategies and subjective scoring according to detailed guidelines. The receptive skills tests use indirect assessment strategies and clerical scoring procedures.…
Descriptors: English (Second Language), Foreign Countries, Grammar, Interrater Reliability
Alderson, J. Charles; And Others – 1995
The guide is intended for teachers who must construct language tests and for other professionals who may need to construct, evaluate, or use the results of language tests. Most examples are drawn from the field of English-as-a-Second-Language instruction in the United Kingdom, but the principles and practices described may be applied to the…
Descriptors: Educational Trends, English (Second Language), Interrater Reliability, Language Tests
Salies, Tania Gastao – 1998
A discussion of the evaluation of writing, particularly in English as a Second Language, argues for a communicative approach reflecting the current approach to language teaching and learning. The movement toward more communication-oriented and more valid language testing is examined briefly, and direct assessment is chosen as the preferred format…
Descriptors: Communicative Competence (Languages), English (Second Language), Evaluation Criteria, Foreign Countries
Peer reviewedTurner, Jean – Annual Review of Applied Linguistics, 1998
This review of research on second-language oral testing outlines the nature of early research in interview-format proficiency testing, then reports on new directions in investigation of construct validity of interview-format and other oral skills tests through examination of examinee, interviewer, and rater performance. Research on empirically…
Descriptors: Construct Validity, Educational Trends, Interrater Reliability, Interviews
Strong, Gregory – Thought Currents in English Literature, 1995
This paper traces developments in educational psychology and measurement that led to the Test of English as a Foreign Language (TOEFL) and the test of English for International Communication (TOEIC) and the application of educational measurement terms such as validity and reliability to testing. Use of a table of specifications for planning…
Descriptors: Cloze Procedure, Difficulty Level, English (Second Language), Foreign Countries
Previous Page | Next Page ยป
Pages: 1 | 2
Direct link
