Publication Date
| In 2026 | 3 |
| Since 2025 | 666 |
| Since 2022 (last 5 years) | 3167 |
| Since 2017 (last 10 years) | 7408 |
| Since 2007 (last 20 years) | 15046 |
Descriptor
| Test Reliability | 15036 |
| Test Validity | 10272 |
| Reliability | 9759 |
| Foreign Countries | 7141 |
| Test Construction | 4823 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3525 |
| Interrater Reliability | 3124 |
| Correlation | 3039 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1327 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 252 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Paden, Patricia A. – 1986
Two factors which may affect the ratings assigned to an essay test are investigated: (1) context effects; and (2) score level effects. Context effects exist in essay scoring if an essay is rated higher when preceded by poor quality essays than when preceded by high quality essays. A score level effect is defined as a change in the score (value)…
Descriptors: Context Effect, Essay Tests, Holistic Evaluation, Interrater Reliability
Lehmann, Rainer H. – 1987
A total of 1,487 eleventh grade students from the Hamburg (West Germany) school system were asked to complete four writing assignments used in an International Association for the Evaluation of Educational Achievement (IEA) study of writing assessment. In analyzing the writing samples, the study focused on: (1) between-rater effects; (2)…
Descriptors: Evaluation Problems, Foreign Countries, High Schools, International Programs
Larsson, Bernt – 1974
This report gives some simple examples of stability for one factor and 2 x 2 factorial analysis of variance, reliability and correlations. The findings are very different: from superstability (no transformation whatsoever can change the result) to almost total instability. This is followed by a discussion of applications to multivariate analysis,…
Descriptors: Analysis of Variance, Correlation, Discriminant Analysis, Factor Analysis
Weiss, David J. – 1969
Today's psychological measurement depends almost exclusively on the "standardized test." A certain amount of non-standardization, however, exists in the administration of any standardized test, with the amount unknown for any given test score. Time limits on tests pose a bigger problem since another variable is introduced, pressure. Test taking…
Descriptors: Computer Oriented Programs, Individual Testing, Measurement Instruments, Motivation
Harris, Chester W. – 1971
Livingston's work is a careful analysis of what occurs when one pools two populations with different means, but similar variances and reliability coefficients. However, his work fails to advance reliability theory for the special case of criterion-referenced testing. See ED 042 802 for Livingston's paper. (MS)
Descriptors: Analysis of Variance, Criterion Referenced Tests, Error of Measurement, Reliability
Peer reviewed: Jungwirth, E. – Studies in Educational Evaluation, 1978
This paper examines the individual consistency of eleventh grade respondents in normal test-retest situations as well as situations involving changed response methods. It postulates the personality-trait "discriminator" (high vs low) and confirms the trait's influence on individual consistency within and across Biology Cognitive…
Descriptors: Biology, Cognitive Style, Cognitive Tests, Foreign Countries
Peer reviewed: Frick, Ted; Semmel, Melvyn I. – Review of Educational Research, 1978
Observer disagreement is important because it limits the reliabilities of observational measures. To avoid this limitation, observers should be trained, and criterion-related and intraobserver agreement measures should be used both before and during a study. Several agreement coefficients are examined for their applicability. (Author/BW)
Descriptors: Classroom Observation Techniques, Error Patterns, Mathematical Models, Reliability
Aydin, Selami – Online Submission, 2006
This research aimed to investigate the effect of computers on the test and inter-rater reliability of writing test scores of ESL learners. Writing samples from 20 pen-and-paper and 20 computer-group students were scored by two scorers using an analytic scoring method, and the scores were then analyzed with the Cronbach's alpha model. The results showed that the…
Descriptors: Writing Tests, Interrater Reliability, Test Reliability, English (Second Language)
Crehan, Kevin D. – 1997
Writing fits well within the realm of outcomes suitable for observation by performance assessments. Studies of the reliability of performance assessments have suggested that interrater reliability can be consistently high. Scoring consistency, however, is only one aspect of quality in decisions based on assessment results. Another is…
Descriptors: Evaluation Methods, Feedback, Generalizability Theory, Interrater Reliability
Peer reviewed: Suen, Hoi K.; And Others – Journal of Early Intervention, 1995
This paper suggests that in addressing the issue of parent-professional congruence in child assessment, researchers should avoid focusing on the conventional aspects of interrater reliability and rater interchangeability, but rather should focus on the reliability of the pooled assessment information from parents and professionals. A…
Descriptors: Disabilities, Early Childhood Education, Early Intervention, Evaluation Methods
Peer reviewed: Canivez, Gary L.; Watkins, Marley W. – Assessment for Effective Intervention, 2002
Teaching professionals (n=29) who shared the same classroom for a minimum of one hour per day provided independent ratings of the same child (ages 7-17) on the Adjustment Scales for Children and Adolescents (ASCA). Results indicated that statistically significant interrater agreement was achieved across all 22 syndromic profile classification…
Descriptors: Behavior Rating Scales, Disabilities, Elementary Secondary Education, Emotional Adjustment
Peer reviewed: Sloan, R. L.; And Others – International Journal of Rehabilitation Research, 1992
This study tested the interrater reliability of the Modified Ashworth Scale in measuring upper and lower limb spasticity in 34 hemiplegic adult patients examined by 2 physiotherapists and 2 doctors. Findings indicated satisfactory reliability for upper limb spasticity but less satisfactory results for lower limb spasticity. (DB)
Descriptors: Adults, Behavior Rating Scales, Evaluation Methods, Interrater Reliability
Peer reviewed: Demsky, Yvonne I.; Gass, Carlton S.; Golden, Charles J. – Assessment, 1998
Standardization data based on responses of 616 Puerto Ricans to the Spanish version of the Wechsler Adult Intelligence Scale (D. Wechsler, 1981) reveal reliability data and base rates to assist in evaluating the clinical significance of differences between Performance Intelligence Quotient (PIQ) and Verbal Intelligence Quotient (VIQ).…
Descriptors: Adults, Clinical Diagnosis, Intelligence Tests, Performance Factors
Peer reviewed: Bond, Malcolm J.; Tustin, R. Don – Journal of Intellectual and Developmental Disability, 1999
This study assessed the psychometric properties of two subscales of the Adelaide Behaviour Disorder Scale that have been hypothesized to describe conduct problems and emotional problems of adults with intellectual disability. Criterion scores for identifying individuals needing clinical intervention were established and validated against…
Descriptors: Adults, Behavior Problems, Disability Identification, Eligibility
Peer reviewed: Van Bourgondien, Mary E.; Reichle, Nancy C.; Campbell, Duncan G.; Mesibov, Gary B. – Research in Developmental Disabilities, 1998
This study assessed the psychometric properties of the Environmental Rating Scale, a measure specifically designed to assess residential treatment programs for individuals with autism. The measure's reliability was demonstrated by assessments of the internal consistency, stability, and interrater reliability. Preliminary analysis of validity…
Descriptors: Adults, Autism, Evaluation Methods, Interrater Reliability


