Publication Date

| Date Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 197 |
| Since 2022 (last 5 years) | 1067 |
| Since 2017 (last 10 years) | 2577 |
| Since 2007 (last 20 years) | 4938 |
Audience

| Audience | Records |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location

| Location | Records |
| --- | --- |
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
What Works Clearinghouse Rating

| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Sykes, Robert C.; Hou, Liling – Applied Measurement in Education, 2003
Weighting responses to Constructed-Response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effect of various types of item weighting on an IRT-based mixed-format writing examination was investigated.…
Descriptors: Item Response Theory, Weighted Scores, Responses, Scores
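The idea of up-weighting CR responses in a composite score can be sketched as follows. This is a minimal illustration, not the study's scoring method: the item scores and the weight value are hypothetical, and the paper compared several weighting schemes rather than prescribing one.

```python
import numpy as np

# Hypothetical raw scores for one examinee on a mixed-format test:
# dichotomous multiple-choice items and polytomous CR items (0-5 scale).
mc_scores = np.array([1, 0, 1, 1, 1, 0, 1, 1])
cr_scores = np.array([3, 4])

def weighted_total(mc, cr, cr_weight=2.0):
    """Composite score with CR responses multiplied by cr_weight.

    cr_weight = 2.0 is an illustrative value; with cr_weight = 1.0
    the composite reduces to the unweighted raw total.
    """
    return mc.sum() + cr_weight * cr.sum()

print(weighted_total(mc_scores, cr_scores))  # → 20.0
```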
van der Linden, Wim J. – Applied Psychological Measurement, 2006
Traditionally, error in equating observed scores on two versions of a test is defined as the difference between the transformations that equate the quantiles of their distributions in the sample and population of test takers. But it is argued that if the goal of equating is to adjust the scores of test takers on one version of the test to make…
Descriptors: Equated Scores, Evaluation Criteria, Models, Error of Measurement
ChanLin, Lih-Juan – Journal of Instructional Psychology, 2005
This paper uses data from a survey of school teachers to conduct a series of factor analyses testing the reliability of a set of items and determining the factors deemed important in technology integration among teachers. The results suggest that there are specific dimensions of items that can be used to determine the factors perceived by…
Descriptors: Questionnaires, Technology Integration, Teacher Surveys, Factor Analysis
Belov, Dmitry I.; Armstrong, Ronald D. – Applied Psychological Measurement, 2005
A new test assembly algorithm based on a Monte Carlo random search is presented in this article. A major advantage of the Monte Carlo test assembly over other approaches (integer programming or enumerative heuristics) is that it performs a uniform sampling from the item pool, which provides every feasible item combination (test) with an equal…
Descriptors: Item Banks, Computer Assisted Testing, Monte Carlo Methods, Evaluation Methods
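The uniform-sampling property the abstract highlights can be sketched as a rejection loop: draw random item subsets uniformly from the pool and keep the first one that satisfies all constraints. The pool, the content/difficulty constraints, and their values below are hypothetical illustrations, not taken from the article.

```python
import random

rng = random.Random(0)

# Hypothetical item pool: each item has a content area and an IRT difficulty b.
pool = [{"id": i,
         "area": rng.choice(["algebra", "geometry"]),
         "b": rng.uniform(-2, 2)} for i in range(200)]

def assemble(pool, test_len=20, min_per_area=8, mean_b_range=(-0.5, 0.5),
             max_tries=10_000):
    """Monte Carlo test assembly: sample subsets uniformly and accept the
    first feasible one. Because every subset is equally likely to be drawn,
    every feasible item combination has an equal chance of selection."""
    for _ in range(max_tries):
        test = rng.sample(pool, test_len)
        mean_b = sum(item["b"] for item in test) / test_len
        n_algebra = sum(item["area"] == "algebra" for item in test)
        if (mean_b_range[0] <= mean_b <= mean_b_range[1]
                and min_per_area <= n_algebra <= test_len - min_per_area):
            return test
    raise RuntimeError("no feasible test found within max_tries")

test = assemble(pool)
print(len(test))  # → 20
```

Rejection sampling is simple but can be slow when constraints are tight; the article's contribution is making this approach practical for realistic pools.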
Enders, Craig K. – Educational and Psychological Measurement, 2004
A method for incorporating maximum likelihood (ML) estimation into reliability analyses with item-level missing data is outlined. An ML estimate of the covariance matrix is first obtained using the expectation maximization (EM) algorithm, and coefficient alpha is subsequently computed using standard formulae. A simulation study demonstrated that…
Descriptors: Intervals, Simulation, Test Reliability, Computation
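The second step of the procedure the abstract describes, computing coefficient alpha from a covariance matrix via the standard formula, can be sketched directly. The EM estimation of the covariance matrix from incomplete item-level data is assumed to have happened upstream; the 3-item matrix below is a toy example with made-up values.

```python
import numpy as np

def coefficient_alpha(cov):
    """Cronbach's alpha from a k x k item covariance matrix S:

        alpha = k / (k - 1) * (1 - trace(S) / sum(S))

    In the ML approach described, S would be the EM-estimated covariance
    matrix; here it is supplied directly.
    """
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

# Toy 3-item covariance matrix (hypothetical values).
S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.2, 0.6],
              [0.4, 0.6, 0.9]])
print(round(coefficient_alpha(S), 3))  # → 0.738
```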
Pomplun, Mark; Ritchie, Timothy – Journal of Educational Computing Research, 2004
This study investigated the statistical and practical significance of context effects for items randomized within testlets for administration during a series of computerized non-adaptive tests. One hundred and twenty-five items from four primary school reading tests were studied. Logistic regression analyses identified from one to four items for…
Descriptors: Psychometrics, Context Effect, Effect Size, Primary Education
Cheung, Mike W. L. – Structural Equation Modeling, 2004
Ipsative data (individual scores subject to a constant-sum constraint), suggested to minimize response bias, are sometimes observed in behavioral sciences. Chan and Bentler (1993, 1996) proposed a method to analyze ipsative data in a single-group case. Cheung and Chan (2002) extended the method to multiple-group analysis. However, these methods…
Descriptors: Behavioral Sciences, Data Analysis, Item Response Theory, Test Items
O'Neil, Timothy; Sireci, Stephen G.; Huff, Kristen L. – Educational Assessment, 2004
Educational tests used for accountability purposes must represent the content domains they purport to measure. When such tests are used to monitor progress over time, the consistency of the test content across years is important for ensuring that observed changes in test scores are due to student achievement rather than to changes in what the test…
Descriptors: Test Items, Cognitive Ability, Test Content, Science Teachers
Alderson, J. Charles; Figueras, Neus; Kuijper, Henk; Nold, Guenter; Takala, Sauli; Tardieu, Claire – Language Assessment Quarterly, 2006
The Common European Framework of Reference (CEFR) is intended as a reference document for language education including assessment. This article describes a project that investigated whether the CEFR can help test developers construct reading and listening tests based on CEFR levels. If the CEFR scales together with the detailed description of…
Descriptors: Test Content, Listening Comprehension Tests, Classification, Test Construction
Hernandez, Jose M.; Rubio, Victor J.; Revuelta, Javier; Santacreu, Jose – Educational and Psychological Measurement, 2006
Trait psychology implicitly assumes consistency of the personal traits. Mischel, however, argued against the idea of a general consistency of human beings. The present article aims to design a statistical procedure based on an adaptation of the pi* statistic to measure the degree of intraindividual consistency independently of the measure used.…
Descriptors: Personality Traits, Reliability, Test Items, Item Response Theory
Cunningham, J. Barton; MacGregor, James N. – Teaching of Psychology, 2006
This article illustrates the application of the Echo approach, originally designed to identify values of different cultures and subcultures, to the generation of questionnaire items for students to evaluate faculty teaching performance. Students preferred items generated using the Echo method over faculty-designed items and items developed by…
Descriptors: Student Evaluation of Teacher Performance, Teacher Effectiveness, Evaluation Methods, Test Items
de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S. – Applied Psychological Measurement, 2006
The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…
Descriptors: Computation, Monte Carlo Methods, Markov Processes, Item Response Theory
Holcomb, John; Spalsbury, Angela – Journal of Statistics Education, 2005
Textbooks and websites today abound with real data. One neglected issue is that statistical investigations often require a good deal of "cleaning" to ready data for analysis. The purpose of this dataset and exercise is to teach students to use exploratory tools to identify erroneous observations. This article discusses the merits of such…
Descriptors: Experiential Learning, Data Processing, Data Analysis, Error Correction
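One common exploratory screen of the kind the abstract advocates is the Tukey fence rule: flag observations outside Q1 − 1.5·IQR and Q3 + 1.5·IQR. This is a generic illustration with invented data, not the article's dataset or exercise.

```python
import numpy as np

# Hypothetical raw measurements containing two likely data-entry errors:
# a misplaced decimal point (16.5) and an extra digit (1680).
heights_cm = np.array([162, 158, 171, 16.5, 169, 175, 1680, 166, 173, 160])

def flag_outliers(x, k=1.5):
    """Return a boolean mask for points outside the Tukey fences
    (Q1 - k*IQR, Q3 + k*IQR), a standard first pass for spotting
    erroneous observations before analysis."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (x < lo) | (x > hi)

print(heights_cm[flag_outliers(heights_cm)])  # → [  16.5 1680. ]
```

Flagged points are candidates for inspection, not automatic deletion; the teaching point is to investigate whether each one is a recording error or a genuine extreme value.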
Hamzah, Hanizah; Ariffin, Siti Rahayah; Yassin, Ruhizan Mohd – Journal of Science and Mathematics Education in Southeast Asia, 2006
This study explored the differential performance of mathematics test items used to test secondary school girls and boys in the national examination. The main purpose was to find out whether item type is the reason for girls' overachievement in the Malaysian mathematics national examination. To investigate seven types of items, Differential…
Descriptors: Test Items, Test Format, Females, Overachievement
Frisby, Craig L.; Osterlind, Steven J. – Journal of Psychoeducational Assessment, 2006
Modern scale construction techniques have been used to develop scales measuring examiner ratings of examinees' test session behavior (TSB) on Wechsler and Stanford-Binet intelligence tests. This study analyzes data from the Test Session Observation Checklist (TSOC), a measure developed by post hoc rational analysis, from a portion of the Woodcock…
Descriptors: Behavior, Measures (Individuals), Check Lists, Observation