Publication Date
| In 2026 | 3 |
| Since 2025 | 666 |
| Since 2022 (last 5 years) | 3167 |
| Since 2017 (last 10 years) | 7408 |
| Since 2007 (last 20 years) | 15046 |
Descriptor
| Test Reliability | 15036 |
| Test Validity | 10272 |
| Reliability | 9759 |
| Foreign Countries | 7141 |
| Test Construction | 4823 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3525 |
| Interrater Reliability | 3124 |
| Correlation | 3039 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1327 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 252 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Chionh, Yan Huay; Fraser, Barry J. – International Research in Geographical and Environmental Education, 2009
This comprehensive study involved the use of the What Is Happening In this Class? (WIHIC) questionnaire among 2310 Singaporean Grade 10 students (aged 15 years) in 75 geography and mathematics classes in 38 schools. A seven-scale factor structure was strongly supported and the alpha reliability of each scale was high. An investigation of…
Descriptors: Geography Instruction, Factor Structure, Measures (Individuals), Foreign Countries
Millar, Dorothy Squatrito – Education and Training in Developmental Disabilities, 2009
IEP transition-related content was compared between young adults with developmental disabilities who had or did not have legal guardians. It was found that students with guardians were more likely to earn a certificate of completion, and wanted to remain living with their families, in comparison to students without guardians who were more likely…
Descriptors: Developmental Disabilities, Young Adults, Individualized Education Programs, Self Determination
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Sinharay, Sandip; Haberman, Shelby J. – Measurement: Interdisciplinary Research and Perspectives, 2009
In this commentary, the authors discuss some of the issues regarding the use of diagnostic classification models that practitioners should keep in mind. In the authors' experience, these issues are not as well known as they should be. The authors then provide recommendations on diagnostic scoring.
Descriptors: Scoring, Reliability, Validity, Classification
Bridgeman, Brent; And Others – 1996
The various methods for computing the reliability of scores on Advanced Placement (AP) examinations are summarized. For the free response portion of the examinations, raters can contribute to score unreliability through both systematic severity errors (in which some raters consistently rate more severely than other raters) and through…
Descriptors: Advanced Placement, College Entrance Examinations, Error of Measurement, High School Students
Reckase, Mark D. – 1997
This paper argues that special procedures for constructing assessment tools containing performance assessment tasks are unnecessary and that current test methodology can easily be generalized to complex performance assessment tasks without destroying the desirable characteristics of those tasks. Reasonable statistical requirements for sound…
Descriptors: Educational Assessment, Generalizability Theory, High Stakes Tests, Interrater Reliability
Ackerman, Terry A. – 1986
The purpose of this paper is to compare the precision of direct and indirect measures of writing assessment using the test information functions from a graded response Item Response Theory (IRT) model. Subjects were 192 sophomore English students from a parochial high school in Wisconsin. Both direct and indirect measures of writing ability were…
Descriptors: Correlation, Essay Tests, High Schools, Interrater Reliability
Corcoran, Kevin J.; White, Lyle J.; Michels, Jennifer L.; Gilbert, David G. – 1997
Recently, a great deal of attention has been focused on the development of a system of relational diagnosis to be incorporated into the American Psychiatric Association's diagnostic system, that of the Diagnostic and Statistical Manual (DSM). One of the more intriguing components of this effort is the Global Assessment of Relational Functioning…
Descriptors: Adults, Diagnostic Tests, Emotional Response, Graduate Students
Peer reviewed. Schor, Nina F.; And Others – Academic Medicine, 1997
A study investigated whether medical school faculty can arrive at consistent, non-idiosyncratic grades in a problem-based learning course. Analysis of grades given by three teachers, based on seven performance categories, to 16 groups of nine students in a seven-week University of Pittsburgh (Pennsylvania) course revealed that given specific…
Descriptors: Academic Achievement, Curriculum Design, Grading, Higher Education
Peer reviewed. Shaw, Stephanie; Coggins, Truman E. – Journal of Speech and Hearing Research, 1991
This study, involving five experienced and trained speech language pathologists, categorized the elicited imitations of five profoundly and five severely prelingually hearing-impaired subjects using the Phonetic Level Evaluation. Failure to obtain acceptably high levels of reliability suggests that this measure may not yet be an accurate and…
Descriptors: Acoustic Phonetics, Articulation (Speech), Congenital Impairments, Deafness
Peer reviewed. Gagne, Francoys; And Others – Gifted Child Quarterly, 1993
Forty prototypical descriptions representing 4 aptitude domains and 4 talent fields were rated by 2,343 intermediate-level pupils and their teachers, and indices of interpeer agreement were computed. A majority of the prototypes maintained acceptable interpeer agreement levels. Interpeer agreement depended primarily on the specific aptitude or…
Descriptors: Ability Identification, Evaluation Methods, Gifted, Intermediate Grades
Peer reviewed. Duker, Pieter C. – Research in Developmental Disabilities, 1999
To assess the psychometric characteristics of the Verbal Behavior Assessment Scale, the 15-item questionnaire was administered to pairs of caregivers of 115 individuals with developmental disabilities. Exploratory factor analysis involving 11 more participants revealed evidence concerning the distinction of three different communicative functions…
Descriptors: Adults, Children, Communication Skills, Developmental Disabilities
Wininger, Steven R. – Teaching Statistics: An International Journal for Teachers, 2007
A hands-on activity is described in which students attempt to measure something that they cannot see. In small groups, students estimate the number of marbles in sealed boxes. Next, students' estimates are compared with the actual numbers. Last, values from both the students' estimates and actual numbers are used to explain measurement theory and…
Descriptors: Computation, Measurement, Experiential Learning, Theories
Schumacker, Randall E.; Smith, Everett V., Jr. – Educational and Psychological Measurement, 2007
Measurement error is a common theme in classical measurement models used in testing and assessment. In classical measurement models, the definition of measurement error and the subsequent reliability coefficients differ on the basis of the test administration design. Internal consistency reliability specifies error due primarily to poor item…
Descriptors: Measurement Techniques, Error of Measurement, Item Sampling, Item Response Theory
Stein, Mary; Barman, Charles R.; Larrabee, Timothy – Journal of Science Teacher Education, 2007
This article describes the rationale for, and development of, an online instrument that helps identify commonly held science misconceptions. Science Beliefs is a 47-item instrument that targets topics in chemistry, physics, biology, earth science, and astronomy. It utilizes a true or false, along with a written-explanation, format. The true or…
Descriptors: Misconceptions, Scientific Concepts, Chemistry, Physics

