Showing 1 to 15 of 68 results
Peer reviewed
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Saenz, David Arron – Online Submission, 2023
There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability, as research indicates that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, an increasing amount of research indicates possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Peer reviewed
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Peer reviewed
Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019
Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…
Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation
Peer reviewed
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Peer reviewed
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Peer reviewed
Boris, Ashley L.; Awadalla, Nardeen; Martin, Toby L.; Martin, Garry L.; Kaminski, Lauren; Miljkovic, Morena – Education and Training in Autism and Developmental Disabilities, 2015
The Assessment of Basic Learning Abilities (ABLA) is a tool that is used to assess the learning ability of individuals with intellectual disability (ID) and children with autism. The ABLA was recently revised and is now referred to as the ABLA-Revised (ABLA-R). A self-instructional manual was prepared to teach individuals how to administer the…
Descriptors: Guides, Academic Ability, Intellectual Disability, Autism
Zhao, Zhongbao – RELC Journal: A Journal of Language Teaching and Research, 2013
This study investigates the validity of the Diagnostic College English Speaking Test (DCEST) in the context of EFL teaching and learning in China. The experiment was conducted in three stages over the course of eight weeks at a national key university in China. By means of test administration and questionnaire survey, the researcher gathered…
Descriptors: Oral Language, Construct Validity, Language Tests, Diagnostic Tests
Peer reviewed
Hasson, Natalie; Dodd, Barbara; Botting, Nicola – International Journal of Language & Communication Disorders, 2012
Background: Sentence construction and syntactic organization are known to be poor in children with specific language impairments (SLI), but little is known about the way in which children with SLI approach language tasks, and static standardized tests contribute little to the differentiation of skills within the population of children with…
Descriptors: Alternative Assessment, Sentence Structure, Syntax, Language Processing
Peer reviewed
Reed, Deborah K.; Sturges, Keith M. – Remedial and Special Education, 2013
Researchers have expressed concern about "implementation" fidelity in intervention research but have not extended that concern to "assessment" fidelity, or the extent to which pre-/posttests are administered and interpreted as intended. When studying reading interventions, data gathering heavily influences the identification of…
Descriptors: Reading Tests, Fidelity, Pretests Posttests, Intervention
Peer reviewed
Darsaklis, Vasiliki; Snider, Laurie M.; Majnemer, Annette; Mazer, Barbara – Physical & Occupational Therapy in Pediatrics, 2013
This study examined the constructs underlying the Movement Assessment Battery for Children-2 (M-ABC-2), Bruininks-Oseretsky Test of Motor Proficiency (BOTMP) and Vineland Adaptive Behavior Scale-2 (VABS-2) using the framework of the International Classification of Functioning, Disability and Health--Child Youth version (ICF-CY) and the diagnostic…
Descriptors: Adjustment (to Environment), Motor Development, Children, Developmental Disabilities
Peer reviewed
Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014
There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…
Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)
New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and Stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation
Peer reviewed
Lu, Chia-Chen; Luh, Ding-Bang – Creativity Research Journal, 2012
Although previous studies have attempted to use different experiences of raters to rate product creativity by adopting the Consensus Assessment Method (CAT) approach, the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs.…
Descriptors: Creativity, Interrater Reliability, Construct Validity, Comparative Analysis
Peer reviewed
Lafave, Mark; Katz, Larry; Butterwick, Dale – Advances in Health Sciences Education, 2008
Content validation of an instrument that measures student performance in OSCE-type practical examinations is a critical step in a tool's overall validity and reliability [Hopkins (1998), "Educational and Psychological Measurement and Evaluation" (8th ed.). Toronto: Allyn & Bacon]. The purpose of the paper is to outline the process…
Descriptors: Check Lists, Physical Activities, Observation, Physicians