Publication Date
In 2025 | 39 |
Since 2024 | 162 |
Since 2021 (last 5 years) | 585 |
Since 2016 (last 10 years) | 1221 |
Since 2006 (last 20 years) | 2727 |
Audience
Researchers | 169 |
Practitioners | 49 |
Teachers | 32 |
Administrators | 8 |
Policymakers | 8 |
Counselors | 4 |
Students | 4 |
Media Staff | 1 |
Location
Turkey | 172 |
Australia | 81 |
Canada | 79 |
China | 69 |
United States | 55 |
Germany | 43 |
Taiwan | 43 |
Japan | 40 |
United Kingdom | 38 |
Iran | 36 |
Spain | 33 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
Does not meet standards | 1 |
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
Two independent statistical tests of item compromise are presented, one based on the test takers' responses and the other on their response times (RTs) on the same items. The tests can be used to monitor an item in real time during online continuous testing but are also applicable as part of post hoc forensic analysis. The two test statistics are…
Descriptors: Test Items, Item Analysis, Item Response Theory, Computer Assisted Testing
Wang, Weimeng – ProQuest LLC, 2022
Recent advancements in testing differential item functioning (DIF) have greatly relaxed restrictions made by the conventional multiple group item response theory (IRT) model with respect to the number of grouping variables and the assumption of predefined DIF-free anchor items. The application of the L₁ penalty in DIF detection has…
Descriptors: Factor Analysis, Item Response Theory, Statistical Inference, Item Analysis
Sung-eun Baek; Christine Myung-hee Ahn – Journal of Psychoeducational Assessment, 2025
The purpose of this study was to evaluate the reliability and validity of the Korean Behavior Assessment System for Children 3rd Edition Teacher Rating Scales--Child Form (K·BASC-3 TRS-C). We used the generalized partial credit model based on item response theory (IRT) to examine the internal validity of the scale's items and the latent trait in a…
Descriptors: Reliability, Teacher Attitudes, Elementary Secondary Education, Asians
Hsieh-Chih Lai; Hsin-Yi Lien – Educational Management Administration & Leadership, 2025
Principal instructional leadership (PIL) refers to the management of school curriculum, instruction, and assessment by the principal of a school. It is essential to measure the extent of the instructional leadership provided by principals and to propose means of improving instructional leadership. The principal instructional leadership scale…
Descriptors: Instructional Leadership, High Schools, Principals, Rating Scales
Haladyna, Thomas M.; Rodriguez, Michael C. – Educational Assessment, 2021
Full-information item analysis provides item developers and reviewers comprehensive empirical evidence of item quality, including option response frequency, point-biserial index (PBI) for distractors, mean-scores of respondents selecting each option, and option trace lines. The multi-serial index (MSI) is introduced as a more informative…
Descriptors: Test Items, Item Analysis, Reading Tests, Mathematics Tests
Aybek, Eren Can – Journal of Applied Testing Technology, 2021
The study aims to introduce catIRT tools, which facilitates researchers' Item Response Theory (IRT) and Computerized Adaptive Testing (CAT) simulations. catIRT tools provides an interface for the mirt and catR packages through the shiny package in R. Through this interface, researchers can apply IRT calibration and CAT simulations although they do not…
Descriptors: Item Response Theory, Computer Assisted Testing, Simulation, Models
Carmen Batanero; Luis A. Hernandez-Solis; Maria M. Gea – Statistics Education Research Journal, 2023
We present an exploratory study of Costa Rican and Spanish students' (11-16-year-olds) competence in comparing probabilities in urns and comparing ratios in mixture problems. A sample of 704 students in Grades 6 through 10, 292 from Costa Rica and 412 from Spain, were given one of two forms of a questionnaire with three probability comparison…
Descriptors: Statistics Education, Comparative Analysis, Foreign Countries, Probability
Laura Laclede – ProQuest LLC, 2023
Because non-cognitive constructs can influence student success in education beyond academic achievement, it is essential that they are reliably conceptualized and measured. Within this context, there are several gaps in the literature related to correctly interpreting the meaning of scale scores when a non-standard response option like I do not…
Descriptors: High School Students, Test Wiseness, Models, Test Items
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Richard O'Donovan; Nicola Sum – Australian Educational Researcher, 2024
This paper examines the psychometric properties of an existing school-based Parent Opinion Survey (POS) in order to investigate its validity as a measure of parent sentiments which may (eventually) be used to better inform the decision making of school leaders. The study focusses on the POS administered by all Victorian public schools at the time…
Descriptors: Parent Surveys, Parent Attitudes, Family School Relationship, Outcomes of Education
Ingela Holmström; Krister Schönström; Magnus Ryttervik – Language Assessment Quarterly, 2024
There is a lack of tests available for assessing sign language proficiency among L2 learners. We have therefore developed a sign repetition test, SignRepL2, with a specific focus on the phonological features of signs. This paper describes the two phases of developing this test. In the first phase, content was developed in the form of 50 items with…
Descriptors: Sign Language, Novices, Task Analysis, Second Language Learning
Krishna Mohan Surapaneni; Anusha Rajajagadeesan; Lakshmi Goudhaman; Shalini Lakshmanan; Saranya Sundaramoorthi; Dineshkumar Ravi; Kalaiselvi Rajendiran; Porchelvan Swaminathan – Biochemistry and Molecular Biology Education, 2024
The emergence of ChatGPT as one of the most advanced chatbots and its ability to generate diverse data has given room for numerous discussions worldwide regarding its utility, particularly in advancing medical education and research. This study seeks to assess the performance of ChatGPT in medical biochemistry to evaluate its potential as an…
Descriptors: Biochemistry, Science Instruction, Artificial Intelligence, Teaching Methods
Mehmet Kanik – International Journal of Assessment Tools in Education, 2024
Surging interest in ChatGPT has led people to look for ways to use it in different tasks. However, before allowing it to replace humans, its capabilities should be investigated. As ChatGPT has potential for use in testing and assessment, this study aims to investigate the questions generated by ChatGPT by comparing them to those written by a course…
Descriptors: Artificial Intelligence, Testing, Multiple Choice Tests, Test Construction
Abbie Raikes; Rebecca Sayre Mojgani; Jem Heinzel-Nelson Alvarenga Lima; Dawn Davis; Cecelia Cassell; Marcus Waldman; Elsa Escalante – International Journal of Early Childhood, 2024
Quality early childhood care and education (ECCE) is important for young children's holistic healthy development. As ECCE scales, contextually relevant and feasible measurement is needed to inform policy and programs on strengths and areas for improvement. However, few measures have been designed for use across diverse contexts. Drawing on…
Descriptors: Foreign Countries, Early Childhood Education, Educational Quality, Program Effectiveness
Jennifer D. Deaton; Megan A. Whitbeck – Measurement and Evaluation in Counseling and Development, 2024
Objective: This study evaluated score reliability of the Professional Quality of Life Scale (ProQoL) when contextualizing "help" to a relevant derivative. Method: The researchers evaluated score reliability across three datasets among school-based professionals (n = 122), teachers (n = 216), and mental health professionals (n = 543)…
Descriptors: Measures (Individuals), Quality of Life, School Personnel, Teachers