Publication Date
| In 2026 | 10 |
| Since 2025 | 2328 |
| Since 2022 (last 5 years) | 12843 |
| Since 2017 (last 10 years) | 33968 |
| Since 2007 (last 20 years) | 68459 |
Descriptor
| Foreign Countries | 30579 |
| Test Validity | 21757 |
| Scores | 18263 |
| Academic Achievement | 16934 |
| Test Construction | 16763 |
| Test Reliability | 15036 |
| Achievement Tests | 14864 |
| Standardized Tests | 14724 |
| Comparative Analysis | 14431 |
| Elementary Secondary Education | 13046 |
| Language Tests | 12551 |
Audience
| Practitioners | 5034 |
| Teachers | 3394 |
| Researchers | 2630 |
| Policymakers | 1232 |
| Administrators | 979 |
| Students | 687 |
| Parents | 325 |
| Counselors | 216 |
| Community | 162 |
| Support Staff | 50 |
| Media Staff | 34 |
Location
| Turkey | 2823 |
| Australia | 2430 |
| Canada | 2270 |
| California | 1854 |
| United States | 1727 |
| Texas | 1615 |
| China | 1579 |
| United Kingdom | 1315 |
| Florida | 1312 |
| United Kingdom (England) | 1203 |
| Germany | 1123 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 121 |
| Meets WWC Standards with or without Reservations | 189 |
| Does not meet standards | 174 |
Rustam, Ahmad; Naga, Dali Santun; Supriyati, Yetti – International Journal of Education and Literacy Studies, 2019
Detection of differential item functioning (DIF) is needed in the development of tests to obtain useful items. The Mantel-Haenszel method and standardization are tools for DIF detection based on classical theory assumptions. The study was conducted to highlight the sensitivity and accuracy between the Mantel-Haenszel method and the standardization…
Descriptors: Statistical Analysis, Test Bias, Accuracy, Multiple Choice Tests
White, Chris; Marshall, Jeff C.; Alston, Danny – School Science and Mathematics, 2019
School STEM Culture--an aspect of culture within a school community--is defined as the beliefs, values, practices, and resources in STEM fields as perceived by students, parents, teachers, and administrators and counselors within a school. This study validates the STEM Culture Assessment Tool (STEM-CAT), an instrument intended to advance the use…
Descriptors: School Culture, STEM Education, Test Construction, Test Validity
Kelley, Kairn Stetler; Littenberg, Benjamin – Journal of Speech, Language, and Hearing Research, 2019
Method: Sixty English-speaking children, 7-14 years old with normal hearing, had a single study visit during which each test was administered twice. Changes on retest were summarized by within-subject standard deviation (S_w), compared among tests, and compared with binomial model predictions. Correlates of variance were explored.…
Descriptors: Children, Early Adolescents, Listening Skills, Test Reliability
Kosh, Audra E.; Simpson, Mary Ann; Bickel, Lisa; Kellogg, Mark; Sanford-Moore, Ellie – Educational Measurement: Issues and Practice, 2019
Automatic item generation (AIG)--a means of leveraging technology to create large quantities of items--requires a minimum number of items to offset the sizable upfront investment (i.e., model development and technology deployment) in order to achieve cost savings. In this cost-benefit analysis, we estimated the cost of each step of AIG and manual…
Descriptors: Cost Effectiveness, Automation, Test Items, Mathematics Tests
Carney, Michele; Crawford, Angela; Siebert, Carl; Osguthorpe, Rich; Thiede, Keith – Applied Measurement in Education, 2019
The "Standards for Educational and Psychological Testing" recommend an argument-based approach to validation that involves a clear statement of the intended interpretation and use of test scores, the identification of the underlying assumptions and inferences in that statement--termed the interpretation/use argument, and gathering of…
Descriptors: Inquiry, Test Interpretation, Validity, Scores
Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019
This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and several different validation approaches used in the issue are described. Finally, the contribution of the articles to…
Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education
Wolkowitz, Amanda A.; Wright, Keith D. – Journal of Educational Measurement, 2019
This article explores the amount of equating error at a passing score when equating scores from exams with small sample sizes. This article focuses on equating using classical test theory methods of Tucker linear, Levine linear, frequency estimation, and chained equipercentile equating. Both simulation and real data studies were used in the…
Descriptors: Error Patterns, Sample Size, Test Theory, Test Bias
Pentimonti, J.; Petscher, Y.; Stanley, C. – National Center on Improving Literacy, 2019
Sample representativeness is an important piece to consider when evaluating the quality of a screening assessment. If you are trying to determine whether or not the screening tool accurately measures children's skills, you want to ensure that the sample that is used to validate the tool is representative of your population of interest.
Descriptors: Sampling, Screening Tests, Measurement, Test Validity
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Reliability is the consistency of a set of scores that are designed to measure the same thing. Reliability is a statistical property of scores that must be demonstrated rather than assumed.
Descriptors: Scores, Measurement, Test Reliability, Error Patterns
Xue, Kang; Huggins-Manley, Anne Corinne; Leite, Walter – Educational and Psychological Measurement, 2022
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of…
Descriptors: Virtual Classrooms, Artificial Intelligence, Item Response Theory, Item Analysis
Schneider, M. Christina; Agrimson, Jared; Veazey, Mary – Educational Measurement: Issues and Practice, 2022
This paper presents results of a score interpretation study for a computer adaptive mathematics assessment. The study purpose was to test the efficacy of item developers' alignment of items to Range Achievement-Level Descriptors (RALDs; Egan et al.) against the empirical achievement-level alignment of items to investigate the use of RALDs as the…
Descriptors: Computer Assisted Testing, Mathematics Tests, Scores, Grade 3
Mohammadkhah, Ebrahim; Kiany, Gholam Reza; Tajeddin, Zia; ShayesteFar, Parvaneh – International Journal of Language Testing, 2022
The contemporary era of learning-oriented assessment (LOA) demands teacher professional efforts to appropriately and accurately assess learners' attainment and use the assessment results for the enhancement of learning. In second/foreign language (L2) discipline, this has recently brought language assessment literacy (LAL) to the forefront,…
Descriptors: Language Teachers, Teacher Attitudes, Language Tests, Knowledge Base for Teaching
Romano, Luciano; Angelini, Giacomo; Consiglio, Piermarco; Fiorilli, Caterina – Education Sciences, 2022
Burnout is psychological, physical, and emotional suffering that may affect students with low or inadequate resources to face stressful events at school. Although the existing instruments are used worldwide to assess school burnout risk, they show several flaws and mainly focus on the emotional facets of the syndrome. No previous studies have…
Descriptors: Foreign Countries, Burnout, Measures (Individuals), Symptoms (Individual Disorders)
Shin, Ji-young – Language Testing, 2022
With the present study I investigated the sources of score variance and dependability in a local oral English proficiency test for potential international teaching assistants (ITAs) across four first language (L1) groups, and suggested alternative test designs. Using generalizability theory, I examined the relative importance of L1s (i.e., Indian,…
Descriptors: Foreign Students, Language Tests, Language Proficiency, Oral Language
Carretti, Barbara; Cornoldi, Cesare; Antonello, Arianna; Di Criscienzo, Laura; Toffalini, Enrico – Scientific Studies of Reading, 2022
The study examines whether the average performance of the population with dyslexia in a working memory measure can be inferred dimensionally from the characteristics of the typical population. Specifically, we focused on Associative Phonological Working Memory (APWM), an ability that we predicted to be impaired in dyslexia due to the relationship…
Descriptors: Dyslexia, Short Term Memory, Reading Ability, Associative Learning

