Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 3 |
| Since 2007 (last 20 years) | 32 |
Author
| Author | Count |
| --- | --- |
| Abdulrahman Alshammari | 1 |
| Adkins, Deborah | 1 |
| Alonzo, Alicia C. | 1 |
| Baker, Beverly A. | 1 |
| Baldwin, Su G. | 1 |
| Buffolino, Judy | 1 |
| Camilli, Gregory | 1 |
| Cech, Scott J. | 1 |
| Charles, Jennifer E. | 1 |
| Childs, Ruth A. | 1 |
| Clauser, Brian E. | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 29 |
| Reports - Research | 12 |
| Opinion Papers | 9 |
| Reports - Evaluative | 8 |
| Reports - Descriptive | 2 |
| Dissertations/Theses -… | 1 |
| Numerical/Quantitative Data | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Elementary Secondary Education | 14 |
| Higher Education | 4 |
| Elementary Education | 3 |
| Postsecondary Education | 3 |
| Adult Education | 1 |
| Grade 3 | 1 |
| Grade 4 | 1 |
| Grade 5 | 1 |
| Junior High Schools | 1 |
| Secondary Education | 1 |
Location
| Location | Count |
| --- | --- |
| Arizona | 1 |
| California | 1 |
| Canada | 1 |
| Colorado | 1 |
| Delaware | 1 |
| Denmark | 1 |
| Germany | 1 |
| Ghana | 1 |
| Idaho | 1 |
| Illinois | 1 |
| Indiana | 1 |
Laws, Policies, & Programs
| Law/Policy/Program | Count |
| --- | --- |
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
| Assessment/Survey | Count |
| --- | --- |
| Advanced Placement… | 1 |
| Stanford Achievement Tests | 1 |
Abdulrahman Alshammari – ProQuest LLC, 2024
A critical component of modern software development practices, particularly continuous integration (CI), is halting development activity in response to test failures, which then require further investigation and debugging. As software changes, regression testing becomes vital to verify that new code does not affect existing functionality.…
Descriptors: Computer Software, Programming, Coding, Test Reliability
Mücahit Öztürk – Open Praxis, 2024
This study examined the problems that pre-service teachers face in the online assessment process and their suggestions for solutions to these problems. The participants were 136 pre-service teachers who have been experiencing online assessment for a long time and who took the Foundations of Open and Distance Learning course. This research is a…
Descriptors: Foreign Countries, Preservice Teacher Education, Preservice Teachers, Distance Education
Sinharay, Sandip – Applied Measurement in Education, 2017
Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H[superscript T]" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…
Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis
Orrill, Chandra Hawley; Kim, Ok-Kyeong; Peters, Susan A.; Lischka, Alyson E.; Jong, Cindy; Sanchez, Wendy B.; Eli, Jennifer A. – Mathematics Teacher Education and Development, 2015
Developing and writing assessment items that measure teachers' knowledge is an intricate and complex undertaking. In this paper, we begin with an overview of what is known about measuring teacher knowledge. We then highlight the challenges inherent in creating assessment items that focus specifically on measuring teachers' specialised knowledge…
Descriptors: Specialization, Knowledge Base for Teaching, Educational Strategies, Testing Problems
Gergen, Kenneth J.; Dixon-Román, Ezekiel J. – Teachers College Record, 2014
In the present offering we challenge the presumption that the educational testing of students provides objective information about such students. This presumption largely rests on an empiricist account of science. In light of mounting criticism, however, empiricist foundationalism has given way to a social epistemology. From this standpoint,…
Descriptors: Epistemology, Educational Testing, Test Validity, Evaluation Utilization
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Mori, Kazuo; Uchida, Akitoshi – Research in Education, 2012
Longitudinal change in the average Z scores for four groups of pupils sorted by quartiles was examined for its stability over three years. The data, collected from 1998 to 2009, were obtained from nine cohorts of Japanese junior high school pupils totaling 1,962 subjects. They showed illusory declines among the mid-range pupils but improvements…
Descriptors: Foreign Countries, Junior High School Students, Cohort Analysis, Evaluation Problems
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
Makransky, Guido; Glas, Cees A. W. – International Journal of Testing, 2013
Cognitive ability tests are widely used in organizations around the world because they have high predictive validity in selection contexts. Although these tests typically measure several subdomains, testing is usually carried out for a single subdomain at a time. This can be ineffective when the subdomains assessed are highly correlated. This…
Descriptors: Foreign Countries, Cognitive Ability, Adaptive Testing, Feedback (Response)
Penfield, Randall D.; Gattamorta, Karina; Childs, Ruth A. – Educational Measurement: Issues and Practice, 2009
Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item-level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has…
Descriptors: Test Bias, Test Items, Evaluation Methods, Scores
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
de La Torre, Jimmy; Karelitz, Tzur M. – Journal of Educational Measurement, 2009
Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM-based data with a linear attribute structure. The study utilizes a procedure…
Descriptors: Simulation, Item Response Theory, Psychometrics, Evaluation Methods
Cech, Scott J. – Education Week, 2008
There's a war of sorts going on within the normally staid assessment industry, and it's a war over the definition of a type of assessment that many educators understand in only a sketchy fashion. Formative assessments, also known as "classroom assessments," are in some ways easier to define by what they are not. They're not like the long,…
Descriptors: Formative Evaluation, Testing, Evaluation Problems, Testing Problems
Harmon, Oskar R.; Lambrinos, James; Buffolino, Judy – Online Journal of Distance Learning Administration, 2010
Many consider online courses to be an inferior alternative to traditional face-to-face (f2f) courses because exam cheating is thought to occur more often in online courses. This study examines how the assessment design in online courses contributes to this perception. Following a literature review, the assessment design in a sample of online…
Descriptors: Electronic Learning, Student Attitudes, Cheating, Online Courses
Baker, Beverly A. – Assessing Writing, 2010
In high-stakes writing assessments, rater training in the use of a rating scale does not eliminate variability in grade attribution. This realisation has been accompanied by research that explores possible sources of rater variability, such as rater background or rating scale type. However, there has been little consideration thus far of…
Descriptors: Foreign Countries, Writing Evaluation, Writing Tests, Testing