Showing 1 to 15 of 324 results
Peer reviewed
San Martín, Ernesto; González, Jorge – Journal of Educational and Behavioral Statistics, 2022
The nonequivalent groups with anchor test (NEAT) design is widely used in test equating. Under this design, two groups of examinees are administered different test forms with each test form containing a subset of common items. Because test takers from different groups are assigned only one test form, missing score data emerge by design rendering…
Descriptors: Tests, Scores, Statistical Analysis, Models
Peer reviewed
Wendy Chan – Asia Pacific Education Review, 2024
As evidence from evaluation and experimental studies continues to influence decision and policymaking, applied researchers and practitioners require tools to derive valid and credible inferences. Over the past several decades, research in causal inference has progressed with the development and application of propensity scores. Since their…
Descriptors: Probability, Scores, Causal Models, Statistical Inference
Peer reviewed
Rajeeb Das; Erika Schmitt; Michael T. Stephenson – Journal of College Student Retention: Research, Theory & Practice, 2024
First-year seminars (FYS) comprise one of 11 researched interventions in postsecondary education known as High-Impact Practices, but few rigorous studies report significantly high impacts. This study examined a FYS employing propensity score matching to link cases and controls in a quasi-experimental design. One semester later cumulative grade…
Descriptors: College Freshmen, First Year Seminars, Scores, Probability
Peer reviewed
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. The definition is in contrast with Lord's foundational paper which viewed equating as the process required to obtain comparability of measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
Peer reviewed
Collier, Zachary K.; Leite, Walter L. – Journal of Experimental Education, 2022
Artificial neural networks (NN) can help researchers estimate propensity scores for quasi-experimental estimation of treatment effects because they can automatically detect complex interactions involving many covariates. However, NN is difficult to implement due to the complexity of choosing an algorithm for various treatment levels and monitoring…
Descriptors: Artificial Intelligence, Mentors, Beginning Teachers, Teacher Persistence
Peer reviewed
Karoline A. Sachse; Sebastian Weirich; Nicole Mahler; Camilla Rjosk – International Journal of Testing, 2024
In order to ensure content validity by covering a broad range of content domains, the testing times of some educational large-scale assessments last up to a total of two hours or more. Performance decline over the course of taking the test has been extensively documented in the literature. It can occur due to increases in the numbers of: (a)…
Descriptors: Test Wiseness, Test Score Decline, Testing Problems, Foreign Countries
Peer reviewed
Chan, Wendy – American Journal of Evaluation, 2022
Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact…
Descriptors: Probability, Scores, Scoring, Generalization
Peer reviewed
PDF on ERIC
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
Peer reviewed
PDF on ERIC
Metsämuuronen, Jari – International Journal of Educational Methodology, 2021
Although Goodman-Kruskal gamma (G) is used relatively rarely it has promising potential as a coefficient of association in educational settings. Characteristics of G are studied in three sub-studies related to educational measurement settings. G appears to be unexpectedly appealing as an estimator of association between an item and a score because…
Descriptors: Educational Assessment, Measurement, Item Analysis, Correlation
Peer reviewed
Joanna L. Dickert; Jian Li – Research in Higher Education, 2024
As colleges and universities grapple with uncertainty around current and future enrollment as well as increasingly vocal questions about the value of postsecondary education, it is critically important that institutions ascertain and invest in the elements of campus learning and engagement that add value to the undergraduate experience. This study…
Descriptors: College Graduates, Student Participation, Educational Practices, Longitudinal Studies
Peer reviewed
PDF on ERIC
Collier, Zachary K.; Zhang, Haobai; Liu, Liu – Practical Assessment, Research & Evaluation, 2022
Although educational research and evaluation generally occur in multilevel settings, many analyses ignore cluster effects. Neglecting the nature of data from educational settings, especially in non-randomized experiments, can result in biased estimates with long-term consequences. Our manuscript improves the availability and understanding of…
Descriptors: Artificial Intelligence, Probability, Scores, Educational Research
Peer reviewed
Rios, Joseph A. – Applied Measurement in Education, 2022
Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the "Standards for Educational and Psychological Testing," this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic…
Descriptors: Testing, Guessing (Tests), Academic Ability, Scores
Peer reviewed
PDF on ERIC
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Peer reviewed
Feinberg, Richard A.; von Davier, Matthias – Journal of Educational and Behavioral Statistics, 2020
The literature showing that subscores fail to add value is vast; yet despite their typical redundancy and the frequent presence of substantial statistical errors, many stakeholders remain convinced of their necessity. This article describes a method for identifying and reporting unexpectedly high or low subscores by comparing each examinee's…
Descriptors: Scores, Probability, Statistical Distributions, Ability
Peer reviewed
Sim, Min Kyu; Choi, Dong Gu – Research Quarterly for Exercise and Sport, 2020
Purpose: This study builds a stochastic model of a discrete-time Markov chain (DTMC) that fits well with a dataset of professional playing records. Methods: The point-by-point dataset of men's singles matches played in the Association of Tennis Professionals (ATP) tour from 2011 to 2015 is analyzed. A long-debated assumption on the…
Descriptors: Probability, Racquet Sports, Scores, Scoring