Showing 1 to 15 of 16 results
Peer reviewed
PDF on ERIC Download full text
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Peer reviewed
PDF on ERIC Download full text
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
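The metrics named in the abstract above can be computed by hand. The sketch below is purely illustrative (the predicted/observed labels are invented, not drawn from the study) and shows Cohen's kappa, precision, recall, and F1 for a toy set of binary student-state labels.

```python
# Toy data: observed vs. predicted binary student states (e.g. off-task yes/no).
# Values are invented for illustration only.
from collections import Counter

observed  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

n = len(observed)
tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Cohen's kappa: raw agreement corrected for agreement expected by chance,
# estimated from the two marginal label distributions.
po = sum(1 for o, p in zip(observed, predicted) if o == p) / n
obs_counts, pred_counts = Counter(observed), Counter(predicted)
pe = sum(obs_counts[c] * pred_counts[c] for c in set(observed)) / n ** 2
kappa = (po - pe) / (1 - pe)
```

As the study's framing suggests, the metrics can disagree: here raw agreement is 0.8 while kappa is only 0.6, because half the chance-corrected credit is absorbed by the balanced label distribution.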
Peer reviewed
PDF on ERIC Download full text
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
PDF on ERIC Download full text
Moeller, Julia; Viljaranta, Jaana; Kracke, Bärbel; Dietrich, Julia – Frontline Learning Research, 2020
This article proposes a study design developed to disentangle the objective characteristics of a learning situation from individuals' subjective perceptions of that situation. The term objective characteristics refers to the agreement across students, whereas subjective perceptions refers to inter-individual heterogeneity. We describe a novel…
Descriptors: Student Attitudes, College Students, Lecture Method, Student Interests
Peer reviewed
PDF on ERIC Download full text
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
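Quadratic weighted kappa, the headline statistic in the report above, extends Cohen's kappa to ordinal scales by penalizing disagreements in proportion to their squared distance. The sketch below is a generic implementation with invented toy scores on a 1–6 essay scale; it is not the e-rater evaluation procedure itself.

```python
# Quadratic weighted kappa between two ordinal ratings (e.g. human vs. automated
# essay scores). Scores and scale are invented for illustration.
def quadratic_weighted_kappa(human, machine, min_s, max_s):
    k = max_s - min_s + 1
    n = len(human)
    # Observed joint distribution over (human, machine) score pairs.
    O = [[0.0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        O[h - min_s][m - min_s] += 1 / n
    row = [sum(O[i]) for i in range(k)]                      # human marginal
    col = [sum(O[i][j] for i in range(k)) for j in range(k)] # machine marginal
    # Quadratic disagreement weights: w_ij = ((i - j) / (k - 1))^2.
    w = [[((i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * O[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected

human   = [1, 2, 3, 4, 5, 6, 3, 4]
machine = [1, 2, 3, 4, 5, 6, 4, 4]
qwk = quadratic_weighted_kappa(human, machine, 1, 6)
```

A single one-point disagreement on this small sample still yields a kappa above 0.97, which is why operational programs typically pair it with other evidence, such as standardized mean-score differences.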
Peer reviewed
Direct link
Sadaf, Ayesha; Olesova, Larisa – American Journal of Distance Education, 2017
The researchers in this study examined the influence of questions designed with the Practical Inquiry Model (PIM), compared with the regular (playground) questions, on students' levels of cognitive presence in online discussions. Students' discussion postings were collected and categorized according to the four levels of cognitive presence:…
Descriptors: Graduate Students, Masters Programs, Cognitive Processes, Web Based Instruction
Peer reviewed
Direct link
Jang, Hyewon – Journal of Science Education and Technology, 2016
Gaps between science, technology, engineering, and mathematics (STEM) education and required workplace skills have been identified in industry, academia, and government. Educators acknowledge the need to reform STEM education to better prepare students for their future careers. We pursue this growing interest in the skills needed for STEM…
Descriptors: STEM Education, Work Environment, Interrater Reliability, Engineering Education
Peer reviewed
Direct link
Granfeldt, Jonas; Ågren, Malin – Language Testing, 2014
One core area of research in Second Language Acquisition is the identification and definition of developmental stages in different L2s. For L2 French, Bartning and Schlyter (2004) presented a model of six morphosyntactic stages of development in the shape of grammatical profiles. The model formed the basis for the computer program Direkt Profil…
Descriptors: Second Language Learning, Language Tests, French, Language Teachers
Peer reviewed
PDF on ERIC Download full text
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Peer reviewed
Direct link
Yiu, Edwin M.-L.; Chan, Karen M. K.; Mok, Rosa S.-M. – Clinical Linguistics & Phonetics, 2007
One of the ways to improve the reliability of perceptual voice quality rating is to provide listeners with external anchors. A paired comparison matching paradigm using synthesized Cantonese voice stimuli that covered a range of rough and breathy qualities was used to investigate rating reliability. Twenty-five speech pathology students rated…
Descriptors: Data Analysis, Measures (Individuals), Stimuli, Models
Peer reviewed
Baird, Christopher; Wagner, Dennis; Healy, Theresa; Johnson, Kristen – Child Welfare, 1999
Compared reliability of three widely used child protective service risk-assessment models (one actuarial, two consensus based). Found that, although no system approached 100% interrater reliability, raters employing the actuarial model made consistent estimates of risk for a high percentage of cases they assessed. Interrater reliability for the…
Descriptors: At Risk Persons, Child Welfare, Children, Comparative Analysis
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. Rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics
De Ayala, R. J.; And Others – 1989
The graded response (GR) model of Samejima (1969) and the partial credit (PC) model of Masters (1982) were fitted to identical writing samples that were holistically scored. The performance and relative benefits of each model were then evaluated. Writing samples were both expository and narrative. Data were from statewide assessments of secondary…
Descriptors: Comparative Analysis, Essay Tests, Holistic Evaluation, Interrater Reliability
McNamara, T. F.; Adams, R. J. – 1991
A preliminary study is reported of the use of new multifaceted Rasch measurement mechanisms for investigating rater characteristics in language testing. Ratings from four judges of scripts from 50 candidates taking the International English Language Testing System test, a test of English for Academic Purposes, are analyzed. The analysis…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability
Peer reviewed
Fischer, Jan Lockwood; Krause Eheart, Brenda – Early Childhood Research Quarterly, 1991
Providers' demographic characteristics, training, support networks, business practices, and stability of services were examined relative to their caregiving practices. Results from a schematic model approach suggest correlations between some of these factors and variances in ratings of caregiver practices. (LB)
Descriptors: Behavior Rating Scales, Child Caregivers, Comparative Analysis, Data Analysis