ERIC - Search Results

Showing 1 to 15 of 18 results

Rounding in Angoff Ratings

Peer reviewed
PDF on ERIC

Wyse, Adam E. – Practical Assessment, Research & Evaluation, 2018

One common modification to the Angoff standard-setting method is to have panelists round their ratings to the nearest 0.05 or 0.10 instead of 0.01. Several reasons have been offered as to why it may make sense to have panelists round their ratings to the nearest 0.05 or 0.10. In this article, we examine one reason that has been suggested, which is…

Descriptors: Interrater Reliability, Evaluation Criteria, Scoring Formulas, Achievement Rating

Test Assembly Implications for Providing Reliable and Valid Subscores

Peer reviewed

Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J. – Educational Assessment, 2017

This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…

Descriptors: Scores, Test Construction, Test Reliability, Test Validity

Evidence-Based Decision about Test Scoring Rules in Clinical Anatomy Multiple-Choice Examinations

Peer reviewed

Severo, Milton; Gaio, A. Rita; Povo, Ana; Silva-Pereira, Fernanda; Ferreira, Maria Amélia – Anatomical Sciences Education, 2015

In theory the formula scoring methods increase the reliability of multiple-choice tests in comparison with number-right scoring. This study aimed to evaluate the impact of the formula scoring method in clinical anatomy multiple-choice examinations, and to compare it with that from the number-right scoring method, hoping to achieve an…

Descriptors: Anatomy, Multiple Choice Tests, Scoring, Decision Making

Research and Teaching: Correcting Missed Exam Questions as a Learning Tool in a Physiology Course

Peer reviewed

Rozell, Timothy G.; Johnson, Jessica; Sexten, Andrea; Rhodes, Ashley E. – Journal of College Science Teaching, 2017

Students in a junior- and senior-level Anatomy and Physiology course have the opportunity to correct missed exam questions ("regrade") and earn up to half of the original points missed. The three objectives of this study were to determine if: (a) performance on the regrade assignment was correlated with scores on subsequent exams, (b)…

Descriptors: Physiology, Scores, Grades (Scholastic), Exit Examinations

Meta-Analysis of Criterion Validity for Curriculum-Based Measurement in Written Language

Peer reviewed

Romig, John Elwood; Therrien, William J.; Lloyd, John W. – Journal of Special Education, 2017

We used meta-analysis to examine the criterion validity of four scoring procedures used in curriculum-based measurement of written language. A total of 22 articles representing 21 studies (N = 21) met the inclusion criteria. Results indicated that two scoring procedures, correct word sequences and correct minus incorrect sequences, have acceptable…

Descriptors: Meta Analysis, Curriculum Based Assessment, Written Language, Scoring Formulas

Climbing Bloom's Taxonomy Pyramid: Lessons from a Graduate Histology Course

Peer reviewed

Zaidi, Nikki B.; Hwang, Charles; Scott, Sara; Stallard, Stefanie; Purkiss, Joel; Hortsch, Michael – Anatomical Sciences Education, 2017

Bloom's taxonomy was adopted to create a subject-specific scoring tool for histology multiple-choice questions (MCQs). This Bloom's Taxonomy Histology Tool (BTHT) was used to analyze teacher- and student-generated quiz and examination questions from a graduate level histology course. Multiple-choice questions using histological images were…

Descriptors: Taxonomy, Anatomy, Graduate Students, Scoring Formulas

Investigation of Coefficient of Individual Agreement in Terms of Sample Size, Random and Monotone Missing Ratio, and Number of Repeated Measures

Peer reviewed
PDF on ERIC

Temel, Gülhan Orekici; Erdogan, Semra; Selvi, Hüseyin; Kaya, Irem Ersöz – Educational Sciences: Theory and Practice, 2016

Studies based on longitudinal data focus on the change and development of the situation being investigated and allow for examining cases regarding education, individual development, cultural change, and socioeconomic improvement in time. However, as these studies require taking repeated measures in different time periods, they may include various…

Descriptors: Investigations, Sample Size, Longitudinal Studies, Interrater Reliability

Is What You See What You Really Get? Comparison of Scoring Techniques in the Assessment of Real-World Divergent Thinking

Peer reviewed

Plucker, Jonathan A.; Qian, Meihua; Schmalensee, Stephanie L. – Creativity Research Journal, 2014

In recent years, the social sciences have seen a resurgence in the study of divergent thinking (DT) measures. However, many of these recent advances have focused on abstract, decontextualized DT tasks (e.g., list as many things as you can think of that have wheels). This study provides a new perspective by exploring the reliability and validity…

Descriptors: Creative Thinking, Creativity Tests, Scoring Formulas, Evaluation Methods

A Note on Item-Restscore Association in Rasch Models

Peer reviewed

Kreiner, Svend – Applied Psychological Measurement, 2011

To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…

Descriptors: Item Analysis, Correlation, Item Response Theory, Models

Evaluation of the e-rater® Scoring Engine for the GRE® Issue and Argument Prompts. Research Report. ETS RR-12-02

Peer reviewed
PDF on ERIC

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012

Automated scoring models for the e-rater® scoring engine were built and evaluated for the GRE® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…

Descriptors: Scoring, Test Scoring Machines, Automation, Models

Subscores and Validity. Research Report. ETS RR-08-64

Peer reviewed
PDF on ERIC

Haberman, Shelby J. – ETS Research Report Series, 2008

In educational testing, subscores may be provided based on a portion of the items from a larger test. One consideration in evaluation of such subscores is their ability to predict a criterion score. Two limitations on prediction exist. The first, which is well known, is that the coefficient of determination for linear prediction of the criterion…

Descriptors: Scores, Validity, Educational Testing, Correlation

Relationship between Bender Errors, Emotional Indicators and Performance on Bender Recall.

Peer reviewed

Anderson, Barbara; Rallis, Kathryn – Perceptual and Motor Skills, 1981

Results of this study suggest that children (ages 6-16) with a sufficient number of emotional indicators (three or more) to create a suspicion of emotional disturbance do not recall significantly fewer Bender designs than do children whose indicators fall within normal limits. (Author/SJL)

Descriptors: Correlation, Elementary Secondary Education, Emotional Disturbances, Recall (Psychology)

Inter-Rater Reliability and Concurrent Validity of the Goodenough-Harris and McCarthy Draw-A-Child Scoring Systems.

Peer reviewed

Naglieri, Jack A.; Maxwell, Susanna – Perceptual and Motor Skills, 1981

Inter-rater reliability of the Goodenough-Harris and McCarthy Draw-A-Child scoring systems was examined for a sample of 60 children, including 20 school-labeled learning disabled, 20 mentally retarded, and 20 normal children between the ages of six and eight-and-one-half years. (Author)

Descriptors: Correlation, Intelligence Tests, Learning Disabilities, Mental Retardation

Measurement of Rod-and-Frame Test Performance.

Peer reviewed

Allen, Mary J.; And Others – Perceptual and Motor Skills, 1982

Adults took the Rod and Frame, Portable Rod and Frame, and Embedded Figures Tests. Absolute and algebraic frame-effect scores were more reliable and valid than rod-effect algebraic scores. Correlations with the Embedded Figures Test were so low that the interchangeability of these field articulation measures is questionable. (Author/RD)

Descriptors: Adults, Cognitive Style, Correlation, Measurement Techniques

Factor Analysis of the Mosher Forced-Choice Guilt Inventory.

Peer reviewed

O'Grady, Kevin E.; Janda, Louis H. – Journal of Consulting and Clinical Psychology, 1979

This inventory measures sex guilt, hostility guilt, and morality-conscience guilt. Analyses indicate the appropriateness of a simple present-absent scoring system. Internal structure of each subscale is complex. Intercorrelations of scores are larger for males. (Author/BEF)

Descriptors: Adults, Behavior Rating Scales, Correlation, Factor Analysis

