Publication Date
| In 2026 | 0 |
| Since 2025 | 53 |
| Since 2022 (last 5 years) | 195 |
| Since 2017 (last 10 years) | 495 |
| Since 2007 (last 20 years) | 743 |
Descriptor
| Test Items | 1187 |
| Test Reliability | 1187 |
| Test Validity | 685 |
| Test Construction | 566 |
| Foreign Countries | 349 |
| Difficulty Level | 280 |
| Item Analysis | 253 |
| Psychometrics | 234 |
| Item Response Theory | 219 |
| Factor Analysis | 183 |
| Multiple Choice Tests | 173 |
Author
| Schoen, Robert C. | 12 |
| LaVenia, Mark | 5 |
| Liu, Ou Lydia | 5 |
| Anderson, Daniel | 4 |
| Bauduin, Charity | 4 |
| DiLuzio, Geneva J. | 4 |
| Farina, Kristy | 4 |
| Haladyna, Thomas M. | 4 |
| Huck, Schuyler W. | 4 |
| Petscher, Yaacov | 4 |
| Stansfield, Charles W. | 4 |
Audience
| Practitioners | 39 |
| Researchers | 30 |
| Teachers | 24 |
| Administrators | 13 |
| Support Staff | 3 |
| Counselors | 2 |
| Students | 2 |
| Community | 1 |
| Parents | 1 |
| Policymakers | 1 |
Location
| Turkey | 69 |
| Indonesia | 37 |
| Germany | 20 |
| Canada | 17 |
| Florida | 17 |
| China | 16 |
| Australia | 15 |
| California | 12 |
| Iran | 11 |
| India | 10 |
| New York | 9 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
Lee, Eunjung; Lee, Won-Chan; Brennan, Robert L. – College Board, 2012
In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can be used interchangeably. Test equating becomes even more challenging in mixed-format tests, such as Advanced Placement Program® (AP®) Exams, that contain both multiple-choice and constructed…
Descriptors: Test Construction, Test Interpretation, Test Norms, Test Reliability
Taskinen, Päivi H.; Steimel, Jochen; Gräfe, Linda; Engell, Sebastian; Frey, Andreas – Peabody Journal of Education, 2015
This study examined students' competencies in engineering education at the university level. First, we developed a competency model in one specific field of engineering: process dynamics and control. Then, the theoretical model was used as a frame to construct test items to measure students' competencies comprehensively. In the empirical…
Descriptors: Models, Engineering Education, Test Items, Outcome Measures
Foorman, Barbara R.; Petscher, Yaacov; Schatschneider, Chris – Florida Center for Reading Research, 2015
The FAIR-FS consists of computer-adaptive reading comprehension and oral language screening tasks that provide measures to track growth over time, as well as a Probability of Literacy Success (PLS) linked to grade-level performance (i.e., the 40th percentile) on the reading comprehension subtest of the Stanford Achievement Test (SAT-10) in the…
Descriptors: Reading Instruction, Screening Tests, Reading Comprehension, Oral Language
Cress, Cynthia J.; Lambert, Matthew C.; Epstein, Michael H. – Journal of Early Intervention, 2014
The Preschool Behavioral and Emotional Rating Scale (PreBERS) is an assessment of emotional and behavioral strengths in preschoolers with well-established reliability and validity for educational and clinical application in children with and without disabilities. The present study provides further evidence of psychometric rigor for items and…
Descriptors: Preschool Children, Rating Scales, Child Behavior, Behavior Problems
Thompson, James R.; Wehmeyer, Michael L.; Hughes, Carolyn; Shogren, Karrie A.; Palmer, Susan B.; Seo, Hyojeong – Grantee Submission, 2014
This article introduces the Supports Intensity Scale-Children's Version (SIS-C) designed and normed to be used with children across multiple contexts, including home, school, and community life. Steps taken to develop the scale are described, and findings from data collected on a field test version of the SIS-C are shared. Preliminary findings in…
Descriptors: Test Validity, Test Reliability, Children, Test Construction
del Carmen Domínguez Espinosa, Alejandra; van de Vijver, Fons J. R. – Measurement and Evaluation in Counseling and Development, 2014
We describe an Indigenous Social Desirability Scale for Mexico developed using a mixed-methods approach. Scores on the scale with two dimensions show adequate reliability and validity.
Descriptors: Indigenous Populations, Mixed Methods Research, Test Validity, Test Reliability
Perez, Kathryn E.; Hiatt, Anna; Davis, Gregory K.; Trujillo, Caleb; French, Donald P.; Terry, Mark; Price, Rebecca M. – CBE - Life Sciences Education, 2013
The American Association for the Advancement of Science 2011 report "Vision and Change in Undergraduate Biology Education" encourages the teaching of developmental biology as an important part of teaching evolution. Recently, however, we found that biology majors often lack the developmental knowledge needed to understand evolutionary…
Descriptors: Biology, Evolution, Development, Genetics
Becker, Kirk A.; Bergstrom, Betty A. – Practical Assessment, Research & Evaluation, 2013
The need for increased exam security, improved test formats, more flexible scheduling, better measurement, and more efficient administrative processes has caused testing agencies to consider converting the administration of their exams from paper-and-pencil to computer-based testing (CBT). Many decisions must be made in order to provide an optimal…
Descriptors: Testing, Models, Testing Programs, Program Administration
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Zandi, Hamed; Kaivanpanah, Shiva; Alavi, Seyed Mohammad – Iranian Journal of Language Teaching Research, 2014
Reviewing the test specifications to improve the quality of language tests may be a routine process in professional testing systems. However, there is a paucity of research about the effect of specifications review on improving the quality of small-scale tests. The purpose of the present study was twofold: how specifications review could help…
Descriptors: Test Reliability, Test Validity, Language Tests, Test Items
Smith, Russell W.; Davis-Becker, Susan L.; O'Leary, Lisa S. – Journal of Applied Testing Technology, 2014
This article describes a hybrid standard setting method that combines characteristics of the Angoff (1971) and Bookmark (Mitzel, Lewis, Patz & Green, 2001) methods. The proposed approach utilizes strengths of each method while addressing weaknesses. An ordered item booklet, with items sorted based on item difficulty, is used in combination…
Descriptors: Standard Setting, Difficulty Level, Test Items, Rating Scales
Brandriet, Alexandra R.; Bretz, Stacey Lowery – Journal of Chemical Education, 2014
This article describes the development of the Redox Concept Inventory (ROXCI) as a measure of students' understandings and confidence of both the symbolic and particulate domains of oxidation-reduction (redox) reactions. The ROXCI was created using a mixed-methods design in which the items were developed based upon themes that emerged from…
Descriptors: Science Instruction, Chemistry, Semi Structured Interviews, Evaluation Methods
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
Zumrawi, Abdel Azim; Bates, Simon P.; Schroeder, Marianne – Educational Research and Evaluation, 2014
This paper addresses the determination of statistically desirable response rates in students' surveys, with emphasis on assessing the effect of underlying variability in the student evaluation of teaching (SET). We discuss factors affecting the determination of adequate response rates and highlight challenges caused by non-response and lack of…
Descriptors: Inferences, Test Reliability, Response Rates (Questionnaires), Student Evaluation of Teacher Performance
Chiu, Chung-Yi; Jochman, Joseph; Fujikawa, Mayu; Strand, David; Cheing, Gladys; Lee, Gloria; Chan, Fong – Rehabilitation Research, Policy, and Education, 2014
Purpose: To examine the factorial structure of the "Coping Strategy Questionnaire"-24 (CSQ-24) in a sample of Canadians with chronic musculoskeletal pain. Method: The sample included 171 workers' compensation clients (50.9% men) recruited from outpatient rehabilitation facilities in Canada. Mean age of participants was 42.45 years (SD =…
Descriptors: Factor Analysis, Questionnaires, Coping, Measurement Techniques

