Showing 1 to 15 of 79 results
Peer reviewed
Putnikovic, Marko; Jovanovic, Jelena – IEEE Transactions on Learning Technologies, 2023
Automatic grading of short answers is an important task in computer-assisted assessment (CAA). Recently, embeddings, as semantic-rich textual representations, have been increasingly used to represent short answers and predict the grade. Despite the recent trend of applying embeddings in automatic short answer grading (ASAG), there are no…
Descriptors: Automation, Computer Assisted Testing, Grading, Natural Language Processing
Peer reviewed
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
Peer reviewed
Yang Du; Susu Zhang – Journal of Educational and Behavioral Statistics, 2025
Item compromise has long posed challenges in educational measurement, jeopardizing both test validity and test security of continuous tests. Detecting compromised items is therefore crucial to address this concern. The present literature on compromised item detection reveals two notable gaps: First, the majority of existing methods are based upon…
Descriptors: Item Response Theory, Item Analysis, Bayesian Statistics, Educational Assessment
Peer reviewed
PDF on ERIC
W. Jake Thompson – Grantee Submission, 2024
Diagnostic classification models (DCMs) are psychometric models that can be used to estimate the presence or absence of psychological traits, or proficiency on fine-grained skills. Critical to the use of any psychometric model in practice, including DCMs, is an evaluation of model fit. Traditionally, DCMs have been estimated with maximum…
Descriptors: Bayesian Statistics, Classification, Psychometrics, Goodness of Fit
Peer reviewed
Andrew Ho – Teachers College Record, 2025
Background/Context: Public monitoring of educational progress and inequality often involves tracking changes in the percentage of "proficient" students across groups and over time. These trends are important signals of state and district provision of educational opportunity. I show how known flaws of this percentage metric, sometimes…
Descriptors: Educational Assessment, Progress Monitoring, Educational Trends, Educational Opportunities
Peer reviewed
Barrenechea, Rodrigo; Mahoney, James – Sociological Methods & Research, 2019
This article develops a set-theoretic approach to Bayes's theorem and Bayesian process tracing. In the approach, hypothesis testing is the procedure whereby one updates beliefs by narrowing the range of states of the world that are regarded as possible, thus diminishing the domain in which the actual world can reside. By explicitly connecting…
Descriptors: Bayesian Statistics, Hypothesis Testing, Qualitative Research, Research Methodology
Peer reviewed
PDF on ERIC
Rajagopal, Prabha; Ravana, Sri Devi – Information Research: An International Electronic Journal, 2017
Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
Descriptors: Information Retrieval, Documentation, Scores, Information Systems
Spencer, Bryden – ProQuest LLC, 2016
Value-added models are a class of growth models used in education to assign responsibility for student growth to teachers or schools. For value-added models to be used fairly, sufficient statistical precision is necessary for accurate teacher classification. Previous research indicated precision below practical limits. An alternative approach has…
Descriptors: Monte Carlo Methods, Comparative Analysis, Accuracy, High Stakes Tests
Peer reviewed
Ginsburg, Herbert P.; Lee, Young-Sun; Pappas, Sandra – ZDM: The International Journal on Mathematics Education, 2016
This paper investigates the power of the computer guided clinical interview (CI) and new curriculum based measurement (CBM) measures to identify and help children at risk of low mathematics achievement. We use data from large numbers of children in Kindergarten through Grade 3 to investigate the construct validity of CBM risk categories. The basic…
Descriptors: Interviews, Curriculum Based Assessment, Evaluation Methods, At Risk Students
Kim, Jiseon – ProQuest LLC, 2010
Classification testing has been widely used to make categorical decisions by determining whether an examinee has a certain degree of ability required by established standards. As computer technologies have developed, classification testing has become more computerized. Several approaches have been proposed and investigated in the context of…
Descriptors: Test Length, Computer Assisted Testing, Classification, Probability
Peer reviewed
Jiao, Hong; Liu, Junhui; Haynie, Kathleen; Woo, Ada; Gorham, Jerry – Educational and Psychological Measurement, 2012
This study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test…
Descriptors: Test Items, Computer Assisted Testing, Measures (Individuals), Scoring
Peer reviewed
Hout, Michael C.; Goldinger, Stephen D.; Ferguson, Ryan W. – Journal of Experimental Psychology: General, 2013
Although traditional methods to collect similarity data (for multidimensional scaling [MDS]) are robust, they share a key shortcoming. Specifically, the possible pairwise comparisons in any set of objects grow rapidly as a function of set size. This leads to lengthy experimental protocols, or procedures that involve scaling stimulus subsets. We…
Descriptors: Visual Stimuli, Research Methodology, Problem Solving, Multidimensional Scaling
Peer reviewed
Ferland, Chantale; Lepage, Celine; Moffet, Helene; Maltais, Desiree B. – Physical & Occupational Therapy in Pediatrics, 2012
This study aimed to quantify relationships between lower limb muscle strength and locomotor capacity for children and adolescents with cerebral palsy (CP) to identify key muscle groups for strength training. Fifty 6- to 16-year-olds with CP (Gross Motor Function Classification System level I or II) participated. Isometric muscle strength of hip…
Descriptors: Muscular Strength, Physical Fitness, Cerebral Palsy, Classification
Newman, Denis; Jaciw, Andrew P. – Empirical Education Inc., 2012
The motivation for this paper is the authors' recent work on several randomized control trials in which they found the primary result, which averaged across subgroups or sites, to be moderated by demographic or site characteristics. They are led to examine a distinction that the Institute of Education Sciences (IES) makes between "confirmatory"…
Descriptors: Educational Research, Research Methodology, Research Design, Classification
Peer reviewed
Thompson, Nathan A. – Journal of Applied Testing Technology, 2008
The widespread application of personal computers to educational and psychological testing has substantially increased the number of test administration methodologies available to testing programs. Many of these mediums are referred to by their acronyms, such as CAT, CBT, CCT, and LOFT. The similarities between the acronyms and the methods…
Descriptors: Testing Programs, Psychological Testing, Classification, Educational Testing