Showing 1 to 15 of 57 results
Peer reviewed
PDF on ERIC Download full text
Alatli, Betül – International Journal of Curriculum and Instruction, 2022
This study reviewed the use of tests. For this purpose, 45 articles published between 2000 and 2020 that employed the Turkish form of the "Test Anxiety Inventory (TAI)," one of the tests frequently used in the field of education, were examined in terms of factors that should be considered in…
Descriptors: Anxiety, Likert Scales, Test Anxiety, Test Reliability
Peer reviewed
Direct link
Pamela R. Buckley; Katie Massey Combs; Karen M. Drewelow; Brittany L. Hubler; Marion Amanda Lain – Evaluation Review, 2025
As evidence-based interventions are scaled, fidelity of implementation, and thus effectiveness, often wanes. Validated fidelity measures can improve researchers' ability to attribute outcomes to the intervention and help practitioners feel more confident in implementing the intervention as intended. We aim to provide a model for the validation of…
Descriptors: Middle School Students, Middle School Teachers, Evidence Based Practice, Program Development
Peer reviewed
Direct link
Attali, Yigal – Educational Measurement: Issues and Practice, 2019
Rater training is an important part of developing and conducting large-scale constructed-response assessments. As part of this process, candidate raters have to pass a certification test to confirm that they are able to score consistently and accurately before they begin scoring operationally. Moreover, many assessment programs require raters to…
Descriptors: Evaluators, Certification, High Stakes Tests, Scoring
Peer reviewed
Direct link
Jin, Hui; van Rijn, Peter; Moore, John C.; Bauer, Malcolm I.; Pressler, Yamina; Yestness, Nissa – International Journal of Science Education, 2019
This article provides a validation framework for research on the development and use of science Learning Progressions (LPs). The framework describes how evidence from various sources can be used to establish an interpretive argument and a validity argument at five stages of LP research--development, scoring, generalisation, extrapolation, and use.…
Descriptors: Sequential Approach, Educational Research, Science Education, Validity
Jin, Hui; van Rijn, Peter; Moore, John C.; Bauer, Malcolm I.; Pressler, Yamina; Yestness, Nissa – Grantee Submission, 2019
This article provides a validation framework for research on the development and use of science Learning Progressions (LPs). The framework describes how evidence from various sources can be used to establish an interpretive argument and a validity argument at five stages of LP research--development, scoring, generalisation, extrapolation, and use.…
Descriptors: Sequential Approach, Educational Research, Science Education, Validity
Peer reviewed
PDF on ERIC Download full text
Ketterlin-Geller, Leanne R.; Perry, Lindsey; Platas, Linda M.; Sitbakhan, Yasmin – Global Education Review, 2018
Test scoring procedures should align with the intended uses and interpretations of test results. In this paper, we examine three test scoring procedures for an operational assessment of early numeracy, the Early Grade Mathematics Assessment (EGMA). The EGMA is an assessment that tests young children's foundational mathematics knowledge and has…
Descriptors: Alignment (Education), Scoring, Test Use, Mathematics Tests
Peer reviewed
Direct link
Schmidgall, Jonathan E.; Getman, Edward P.; Zu, Jiyun – Language Testing, 2018
In this study, we define the term "screener test," elaborate key considerations in test design, and describe how to incorporate the concepts of practicality and argument-based validation to drive an evaluation of screener tests for language assessment. A screener test is defined as a brief assessment designed to identify an examinee as a…
Descriptors: Test Validity, Test Use, Test Construction, Language Tests
Peer reviewed
PDF on ERIC Download full text
Carlson, Sarah E.; Seipel, Ben; Biancarosa, Gina; Davison, Mark L.; Clinton, Virginia – Grantee Submission, 2019
This demonstration introduces and presents an innovative online cognitive diagnostic assessment, developed to identify the types of cognitive processes that readers use during comprehension; specifically, processes that distinguish between subtypes of struggling comprehenders. Cognitive diagnostic assessments are designed to provide valuable…
Descriptors: Reading Comprehension, Standardized Tests, Diagnostic Tests, Computer Assisted Testing
Peer reviewed
Direct link
Wilcox, Bethany R.; Caballero, Marcos D.; Baily, Charles; Sadaghiani, Homeyra; Chasteen, Stephanie V.; Ryan, Qing X.; Pollock, Steven J. – Physical Review Special Topics - Physics Education Research, 2015
The use of validated conceptual assessments alongside conventional course exams to measure student learning in introductory courses has become standard practice in many physics departments. These assessments provide a more standard measure of certain learning goals, allowing for comparisons of student learning across instructors, semesters,…
Descriptors: Student Evaluation, Physics, Tests, Advanced Courses
Peer reviewed
Direct link
Hatala, Rose; Cook, David A.; Brydges, Ryan; Hawkins, Richard – Advances in Health Sciences Education, 2015
In order to construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS), based on Kane's framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected…
Descriptors: Measures (Individuals), Test Validity, Surgery, Skills
Peer reviewed
Direct link
Jin, Tan; Mak, Barley; Zhou, Pei – Language Testing, 2012
The fuzziness of assessing second language speaking performance raises two difficulties in scoring speaking performance: "indistinction between adjacent levels" and "overlap between scales". To address these two problems, this article proposes a new approach, "confidence scoring", to deal with such fuzziness, leading to "confidence" scores between…
Descriptors: Speech Communication, Scoring, Test Interpretation, Second Language Learning
Peer reviewed
Direct link
Cheng, Liying; DeLuca, Christopher – Educational Assessment, 2011
Test-takers' interpretations of validity as related to test constructs and test use have been widely debated in large-scale language assessment. This study contributes further evidence to this debate by examining 59 test-takers' perspectives in writing large-scale English language tests. Participants wrote about their test-taking experiences in…
Descriptors: Language Tests, Test Validity, Test Use, English
Peer reviewed
Dorn, Fred J.; Jereb, Ron – Measurement and Evaluation in Counseling and Development, 1985
Counseling psychologists (N=8) recorded the time needed to score the Counselor Rating Form (CRF) and the CRF-Quick Score (CRF-QS). Then 120 students viewed a counseling videotape and completed both measures. Results showed the two are comparable but the CRF-QS is significantly less time-consuming to score. (JAC)
Descriptors: Counselor Evaluation, Evaluation Methods, Scoring, Test Use
Smith, Douglas K.; And Others – 1986
The study examined the relationship between performance on the K-ABC (Kaufman Assessment Battery for Children) and the WISC-R (Wechsler Intelligence Scale for Children--Revised) for 67 students being considered for placement in a private school in a midwestern metropolitan area that serves students with severe learning disabilities. All were…
Descriptors: Elementary Education, Intelligence Quotient, Learning Disabilities, Scoring
Peer reviewed
Gelin, Michaela N.; Zumbo, Bruno D. – Educational and Psychological Measurement, 2003
Investigated potentially biased scale items on the Center for Epidemiological Studies Depression scale (CES-D; Radloff, 1977) in a sample of 600 adults. Overall, results indicate that the scoring method has an effect on differential item functioning (DIF), and that DIF is a property of the item, scoring method, and purpose of the assessment. (SLD)
Descriptors: Depression (Psychology), Item Bias, Scoring, Test Items