ERIC Number: EJ1292771
Record Type: Journal
Publication Date: 2021-Apr
Pages: 14
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1059-0145
EISSN: N/A
Available Date: N/A
Automated Scoring of Chinese Grades 7-9 Students' Competence in Interpreting and Arguing from Evidence
Journal of Science Education and Technology, v30 n2 p269-282 Apr 2021
Assessing scientific argumentation is one of the main challenges in science education. Constructed-response (CR) items can be used to measure the coherence of student ideas and inform science instruction on argumentation. Published research on automated scoring of CR items has been conducted mostly on English writing and rarely in other languages. The objective of this study is to investigate issues related to the automated scoring of Chinese written responses. LightSIDE was used to score students' written responses in Chinese. The sample consisted of 4000 students in grades 7-9 from Beijing. Items on an ecological topic developed by the Stanford NGSS Assessment Project were translated into Chinese and used to assess students' competence in interpreting data and making claims. The results show that: (1) at least 800 human-scored student responses were needed as the training sample to build accurate scoring models; doubling the training sample size increased kappa only slightly, by 0.03-0.04; (2) there was nearly perfect agreement between human scoring and computer-automated scoring for both holistic and analytic scores, although analytic scores produced slightly better accuracy than holistic scores; (3) automated scoring accuracy did not differ substantially by response length, although shorter texts produced slightly higher human-machine agreement. These findings suggest that automated scoring of Chinese writing achieves a level of accuracy similar to that reported for English writing in the literature, although specific considerations, e.g., training data set size, scoring rubric, and text length, apply when using automated scoring of student written responses in Chinese.
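The human-machine agreement reported in the abstract is expressed in kappa. As an illustrative sketch only (not the study's code; the human and machine scores below are invented for demonstration), Cohen's kappa between a human rater and an automated scorer can be computed like this:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical scores on a 0-2 analytic rubric.
human   = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
machine = [0, 1, 2, 1, 1, 0, 1, 2, 0, 2]
print(round(cohen_kappa(human, machine), 2))  # → 0.7
```

Kappa values near 1.0, as reported here for the 800+ training-sample models, indicate agreement well beyond what chance alone would produce.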
Descriptors: Automation, Scoring, Accuracy, Responses, Competence, Science Instruction, Persuasive Discourse, Data Interpretation, Written Language, Grade 7, Grade 8, Grade 9, Foreign Countries
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Publication Type: Journal Articles; Reports - Research
Education Level: Elementary Education; Grade 7; Junior High Schools; Middle Schools; Secondary Education; Grade 8; Grade 9; High Schools
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: China (Beijing)
Grant or Contract Numbers: N/A
Author Affiliations: N/A