ERIC Number: ED636561
Record Type: Non-Journal
Publication Date: 2021
Pages: 297
Abstractor: As Provided
ISBN: 979-8-3798-3555-2
ISSN: N/A
EISSN: N/A
Available Date: N/A
Towards Optimal Measurement and Theoretical Grounding of L2 English Elicited Imitation: Examining Scales, (Mis)Fits, and Prompt Features from Item Response Theory and Random Forest Approaches
Ji-young Shin
ProQuest LLC, Ph.D. Dissertation, Purdue University
The present dissertation investigated the impact of scales/scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales/scoring methods are important for the validity and reliability of L2 EI tests, but little is known about them (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination and corpus-based, fine-grained measures have rarely been incorporated into examinations of the contribution of prompt linguistic features. The current study addressed these research needs using item response theory (IRT) and random forest modeling. Data consisted of 9,348 oral responses to 48 items, including EI prompts, item scores, and rater comments, collected from 779 examinees of an L2 English EI test at Purdue University. First, the study explored the current and alternative EI scales/scoring methods that measure grammatical/semantic accuracy, focusing on optimal IRT-based measurement qualities (RQ1 through RQ4 in Phase I). Next, the project identified important prompt linguistic features that predict EI item difficulty and discrimination across different scales/scoring methods and proficiency levels, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase II). The main findings included: 1) collapsing the exact repetition and paraphrase categories led to more optimal measurement (i.e., adequacy of item parameter values, category functioning, and model/item/person fit) (RQ1); 2) there were few misfitting persons, and misfit was associated with lower proficiency and a higher frequency of unexpected responses in the extreme categories (RQ2); 3) the inconsistency of qualitatively distinguishing semantic errors and the wide range of grammatical accuracy within the minor error category contributed to misfit (RQ3); 4) a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); 5) sentence length significantly explained item difficulty only, with a small amount of variance explained (RQ5); and 6) corpus-based lexical measures and phrase-level syntactic complexity were important predictors of item difficulty, particularly for the higher ability level (RQ6). The findings have implications for EI scale/item development in human and automatic scoring settings and for L2 English proficiency development. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
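For readers unfamiliar with the random forest approach described in the abstract, the following is a minimal, hypothetical sketch (in Python with scikit-learn) of how prompt linguistic features such as sentence length, a corpus-based lexical frequency measure, and phrase-level syntactic complexity might be used to predict IRT item difficulty estimates, in the spirit of RQ5 and RQ6. The data, feature names, and model settings below are illustrative assumptions, not the dissertation's actual pipeline.

    # Illustrative sketch: predicting IRT item difficulty from prompt
    # linguistic features with random forest regression. All data and
    # feature names are hypothetical.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    n_items = 48  # the EI test in the study had 48 items

    # Hypothetical prompt features of the kinds named in the abstract.
    X = np.column_stack([
        rng.integers(5, 20, n_items),    # sentence length (words)
        rng.normal(4.0, 0.5, n_items),   # mean log lexical frequency
        rng.normal(1.5, 0.3, n_items),   # phrase-level syntactic complexity
    ])
    # Hypothetical IRT difficulty estimates for the 48 items.
    y = (0.15 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2]
         + rng.normal(0, 0.3, n_items))

    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    rf.fit(X, y)

    # Permutation importance ranks the prompt features by how much
    # shuffling each one degrades the model's predictions.
    imp = permutation_importance(rf, X, y, n_repeats=50, random_state=0)
    for name, score in zip(["sentence_length", "lexical_frequency",
                            "syntactic_complexity"], imp.importances_mean):
        print(f"{name}: {score:.3f}")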
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Semantics, Grammar, Accuracy, Item Response Theory, Cues, Imitation, Test Validity, Test Reliability, Computational Linguistics, Item Analysis, Test Items, Scoring, Language Proficiency
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A