Peer reviewed: Fodness, Ruth Wochnick; And Others – Journal of School Psychology, 1991
Examined test-retest reliability for Test of Language Development-2: Primary (TOLD-2 P) and Intermediate (TOLD-2 I). Findings from 60 children revealed that, with few exceptions, both tests had satisfactory reliability over a 2-week interval. Less satisfactory reliability was found for TOLD-2 P Semantics Composite (ages 4, 6, and 8); Phonology…
Descriptors: Age Differences, Language Acquisition, Test Reliability, Young Children
Peer reviewed: Livingston, Ronald B.; Gray, Robert M.; Haak, Ruth A. – Assessment, 1999
Examined the internal consistency of three tests from the Halstead-Reitan Neuropsychological Battery (R. Reitan and D. Wolfson, 1992) with a sample of 334 children, 9 to 14 years of age. Gives reliability coefficients for the Seashore Rhythm Test, two forms of the Speech Sounds Perception Test, and the Aphasia Screening Test. (SLD)
Descriptors: Children, Early Adolescents, Neuropsychology, Test Reliability
Peer reviewed: Quilter, Shawn M.; Band, Jennie P.; Miller, Gary M. – Journal of Mental Health Counseling, 1999
Investigates some of the psychometric characteristics of the results from visual-analogue scales used to measure mental imagery. Reports that the scores from visual-analogue scales are positively related to scores from longer pencil-and-paper measures of mental imagery. Implications and limitations for the use of visual-analogue scales to measure…
Descriptors: Counseling Techniques, Instrumentation, Psychometrics, Test Reliability
Hoachlander, E. Gareth – Techniques: Making Education and Career Connections, 1998
Discusses state testing, various types of tests, and whether the increased attention to assessment is contributing to improved student learning. Describes uses of standardized multiple-choice, open-ended constructed response, essay, performance event, and portfolio methods. (JOW)
Descriptors: Academic Achievement, Student Evaluation, Test Format, Test Reliability
Peer reviewed: Lawrence, John W.; Heinberg, Leslie J.; Roca, Robert; Munster, Andrew; Spence, Robert; Fauerbach, James A. – Psychological Assessment, 1998
The Satisfaction with Appearance Scale (SWAP) was administered to 165 burn victims. SWAP showed a high level of internal consistency (Cronbach's alpha, r(a)=0.87); an 84-subject retest measured reliability (r(tt)=0.59). SWAP is both a reliable and valid measure of body image for a burn-injured population. (Author/MAK)
Descriptors: Body Image, Test Construction, Test Reliability, Test Validity
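For readers unfamiliar with the internal-consistency statistic cited in the record above, the following is a minimal sketch of how Cronbach's alpha is commonly computed from item-level scores. The function name and the sample data are hypothetical illustrations; they are not taken from the SWAP study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses the standard formula
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    with sample variances (n - 1 in the denominator).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three respondents answering three perfectly consistent items.
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example)
```

When every item ranks respondents identically, as in this made-up example, alpha reaches its maximum of 1; real instruments fall below that.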
Peer reviewed: Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Item-discrimination indices are numbers calculated from test data that are used in assessing the effectiveness of individual test questions. This article asserts that the indices are so unreliable as to suggest that countless good questions may have been discarded over the years. It considers how the indices, and hence overall test reliability,…
Descriptors: Guessing (Tests), Item Analysis, Test Reliability, Testing Problems
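As an illustration of what an item-discrimination index can look like, here is a minimal sketch of the widely used upper-lower index (proportion correct in the top-scoring group minus proportion correct in the bottom-scoring group). The record above does not say which index the article analyses, so the choice of index, the 27% group size, the function name, and the data are all assumptions made for illustration.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index D for one item.

    responses: list of examinees, each a list of 0/1 item scores.
    item: column index of the item being evaluated.
    fraction: share of examinees placed in each extreme group (assumed 27%).
    """
    # Rank examinees from highest to lowest total score.
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    # Proportion answering the item correctly in each group.
    p_upper = sum(r[item] for r in upper) / n
    p_lower = sum(r[item] for r in lower) / n
    return p_upper - p_lower


# Hypothetical 0/1 responses from four examinees on three items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
d_first_item = discrimination_index(data, 0)
```

With groups this small the index swings widely on a single response, which is the kind of instability that makes such indices hard to trust on modest samples.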
Berge, Jos M. F. Ten; Socan, Gregor – Psychometrika, 2004
To assess the reliability of congeneric tests, specifically designed reliability measures have been proposed. This paper emphasizes that such measures rely on a unidimensionality hypothesis, which can neither be confirmed nor rejected when there are only three test parts, and will invariably be rejected when there are more than three test parts.…
Descriptors: Test Reliability, Sampling, Psychometrics, Test Bias
Burns, Matthew K.; VanDerHeyden, Amanda M.; Jiban, Cynthia L. – School Psychology Review, 2006
This study compared the mathematics performance of 434 second-, third-, fourth-, and fifth-grade students to previously reported fluency and accuracy criteria using three categories of performance (frustration, instructional, and mastery). Psychometric properties of the fluency and accuracy criteria were explored and new criteria for the…
Descriptors: Reading Improvement, Criteria, Psychometrics, Grade 5
Krishnamoorthy, K.; Xia, Yanping – Multivariate Behavioral Research, 2006
The conventional approach for testing the equality of two normal mean vectors is to test first the equality of covariance matrices, and if the equality assumption is tenable, then use the two-sample Hotelling T² test. Otherwise one can use one of the approximate tests for the multivariate Behrens-Fisher problem. In this article, we…
Descriptors: Statistical Analysis, Test Reliability, Test Selection, Error Patterns
Orwig, Denise; Brandt, Nicole; Gruber-Baldini, Ann L. – Gerontologist, 2006
Purpose: The purpose of this study was to describe the Medication Management Instrument for Deficiencies in the Elderly (MedMaIDE) and to provide results of reliability and validity testing. Design and Methods: Participants were 50 older adults, aged 65 and older, who lived in the community, took at least one prescription medication, and were then…
Descriptors: Older Adults, Validity, Interrater Reliability, Correlation
Garcia Laborda, Jesus – Online Submission, 2007
Interface design and ergonomics, while already studied in much of educational theory, have not until recently been considered in language testing (Fulcher, 2003). In this paper, we revise the design principles of PLEVALEX, a fully operational prototype Internet based language testing platform. Our focus here is to show PLEVALEX's interfaces and…
Descriptors: Language Tests, Internet, Computer Assisted Testing, Test Validity
Matson, Johnny L.; Boisjoli, Jessica A. – Journal of Intellectual & Developmental Disability, 2007
Background: The "Questions About Behavioral Function" (QABF) correctly identifies maintaining variables of challenging behaviour. However, for adults who have a long history of challenging behaviours, identifying one clear function of the maladaptive behaviour is difficult. Additionally, the person may develop multiple functions of their…
Descriptors: Behavior Problems, Mental Retardation, Adults, Test Reliability
Darby, Lynn A.; Marsh, Jennifer L.; Shewokis, Patricia A.; Pohlman, Roberta L. – Measurement in Physical Education and Exercise Science, 2007
To adhere to the principle of "exercise specificity," exercise testing should be completed using the same physical activity that is performed during exercise training. The present study was designed to assess whether aerobic step exercisers have a greater maximal oxygen consumption (max VO₂) when tested using an activity specific, maximal step…
Descriptors: Metabolism, Physical Activities, Exercise Physiology, Females
Albers, Craig A.; Grieve, Adam J. – Journal of Psychoeducational Assessment, 2007
The Bayley Scales of Infant and Toddler Development-Third Edition (Bayley-III) is a revision of the frequently used and well-known Bayley Scales of Infant Development-Second Edition (BSID-II; Bayley, 1993). Like its prior editions, the Bayley-III is an individually administered instrument designed to measure the developmental functioning of…
Descriptors: Test Reviews, Measures (Individuals), Child Development, Infants
Harlen, Wynne – Studies in Educational Evaluation, 2007
The assessment of students is used for various different purposes within an assessment system. It has an impact on students, teaching and the curriculum, the nature of this impact depending upon how it is carried out. In order to evaluate the advantages and disadvantages of particular assessment procedures, criteria need to be applied. This…
Descriptors: Evaluation Criteria, Student Evaluation, Test Validity, Construct Validity

