Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 316 |
| Since 2017 (last 10 years) | 615 |
| Since 2007 (last 20 years) | 1736 |
Descriptor
| Evaluation Methods | 3975 |
| Test Validity | 2083 |
| Validity | 1473 |
| Test Reliability | 995 |
| Student Evaluation | 803 |
| Foreign Countries | 637 |
| Test Construction | 560 |
| Reliability | 527 |
| Higher Education | 452 |
| Elementary Secondary Education | 418 |
| Measurement Techniques | 418 |
Author
| Fuchs, Lynn S. | 12 |
| Baker, Eva L. | 11 |
| Cronin, John | 11 |
| Marsh, Herbert W. | 11 |
| Amrein-Beardsley, Audrey | 9 |
| Linn, Robert L. | 9 |
| Sireci, Stephen G. | 9 |
| Raykov, Tenko | 8 |
| Deno, Stanley L. | 7 |
| Epstein, Michael H. | 7 |
| Matson, Johnny L. | 7 |
Audience
| Researchers | 193 |
| Practitioners | 121 |
| Teachers | 47 |
| Administrators | 31 |
| Policymakers | 27 |
| Students | 16 |
| Counselors | 7 |
| Media Staff | 4 |
| Community | 3 |
| Support Staff | 3 |
| Parents | 2 |
Location
| Australia | 66 |
| United Kingdom | 56 |
| Canada | 47 |
| California | 32 |
| Netherlands | 30 |
| United States | 30 |
| United Kingdom (England) | 26 |
| Germany | 23 |
| Turkey | 22 |
| China | 21 |
| Taiwan | 21 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Huey T. Chen; Liliana Morosanu; Victor H. Chen – Asia Pacific Journal of Education, 2024
The Campbellian validity typology has been used as a foundation for outcome evaluation and for developing evidence-based interventions for decades. As such, randomized controlled trials were preferred for outcome evaluation. However, some evaluators disagree with the validity typology's argument that randomized controlled trials as the best design…
Descriptors: Evaluation Methods, Systems Approach, Intervention, Evidence Based Practice
Mohammad Hmoud; Hadeel Swaity; Eman Anjass; Eva María Aguaded-Ramírez – Electronic Journal of e-Learning, 2024
This research aimed to develop and validate a rubric to assess Artificial Intelligence (AI) chatbots' effectiveness in accomplishing tasks, particularly within educational contexts. Given the rapidly growing integration of AI in various sectors, including education, a systematic and robust tool for evaluating AI chatbot performance is essential.…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Simon Massey – International Journal of Social Research Methodology, 2024
The UK-based article develops a quantitative method for measuring 8-9-year-old children's Gender Ability Beliefs through drawings, assessing the reliability and validity of the measure and its association with respondents' self-reported gender. The measure, originally used in the US by Beilock et al. (2010), required respondents to draw two…
Descriptors: Children, Sex, Childrens Attitudes, Gender Differences
Hyemin Yoon; HyunJin Kim; Sangjin Kim – Measurement: Interdisciplinary Research and Perspectives, 2024
We have maintained the customer grade system that is being implemented to customers with excellent performance through customer segmentation for years. Currently, financial institutions that operate the customer grade system provide similar services based on the score calculation criteria, but the score calculation criteria vary from the financial…
Descriptors: Classification, Artificial Intelligence, Prediction, Decision Making
Weiwei Tong; Prasong Saihong; Kanyarat Sonsupap – International Journal of Language Education, 2024
The main objective of this study is to revise and validate the assessment of self-presentation skills of middle school students. The assessment is based on existing self-assessment scales and adaptively modified for a more accurate assessment of middle school students' self-presentation skills. Considering the characteristics of middle school…
Descriptors: Middle School Students, Self Evaluation (Individuals), Rating Scales, Reliability
Yuting Han; Zhehan Jiang; Lingling Xu; Fen Cai – AERA Online Paper Repository, 2024
To address the computational constraints of parameter estimation in the polytomous Cognitive Diagnosis Model (pCDM) in large-scale high data volume situations, this study proposes two two-stage polytomous attribute estimation methods: P_max and P_linear. The effects of the two-stage methods were studied via a Monte Carlo simulation study, and the…
Descriptors: Medical Education, Licensing Examinations (Professions), Measurement Techniques, Statistical Data
Jeffrey Matayoshi; Eric Cosyn; Christopher Lechuga; Hasan Uzun – International Educational Data Mining Society, 2024
ALEKS is an adaptive learning and assessment system, with courses covering subjects such as math, chemistry, and statistics. In this work, we focus on the ALEKS math courses, which cover a wide range of content starting at second grade math and continuing through college-level precalculus. To help instructors and students navigate this content,…
Descriptors: Student Placement, Evaluation Methods, Elementary Secondary Education, Accuracy
Bartolomé, Juan; Garaizar, Pablo; Larrucea, Xabier – Technology, Knowledge and Learning, 2022
Over recent decades, digital competence has become essential in the workplace. Nowadays, it is difficult to find a job where no ICT skills are required. At the same time, there is a lack of ecosystems for adult reskilling in digital competence. Moreover, most of them do not use a common language and terminology, decreasing the possibilities of…
Descriptors: Technological Literacy, Performance Based Assessment, Evaluation Methods, Pragmatics
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Han, Chao – Language Testing, 2022
Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the…
Descriptors: Translation, Language Tests, Testing, Evaluation Methods
Liu, Yan; Zhang, Hongfeng – Journal of Education and e-Learning Research, 2022
Online learning is increasingly popular as the pandemic spreads around the globe. This shift in learning preferences presents opportunities and difficulties for the assessment of learning. As a method of assessment used throughout the learning process, formative assessment can encourage students' interest in learning, enhance learning outcomes,…
Descriptors: Formative Evaluation, Electronic Learning, Student Evaluation, Validity
Konstantinou, Ioannis Ch. – Open Journal for Educational Research, 2022
The purpose of this article is to review the literature on the issue of grading as a method and technique of expressing students' performance in terms of school reality. Initially, a growing concern about the role of assessment of student's performance in the learning and, generally, in the educational process, is highlighted. Subsequently, the…
Descriptors: Grading, Student Evaluation, Evaluation Methods, Performance Based Assessment
Peer Overmarking and Insufficient Diagnosticity: The Impact of the Rating Method for Peer Assessment
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Simon G. Brett; Jacquiline E. den Houting; Melissa H. Black; Lauren P. Lawson; Julian Trollor; Samuel R. C. Arnold – Autism: The International Journal of Research and Practice, 2025
In autistic adults, measurement tools may not adequately differentiate between autistic characteristics and features of anxiety. This may be particularly evident in the case of social anxiety disorder; however, few measures of social anxiety disorder have been validated for autistic adults. Instead, assessments are often made using measures…
Descriptors: Autism Spectrum Disorders, Adults, Clinical Diagnosis, Comorbidity

