ERIC Number: ED629130
Record Type: Non-Journal
Publication Date: 2020-Jul
Pages: 39
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Improving the Science of Annotation for Natural Language Processing
Anglin, Kylie; Boguslav, Arielle; Hall, Todd
Grantee Submission
Text classification has allowed researchers to analyze natural language data at a previously impossible scale. However, a text classifier is only as valid as the annotations on which it was trained. Further, the cost of training a classifier depends on annotators' ability to quickly and accurately apply the coding scheme to each text. Thus, researchers need guidance on how to generate training data with optimal efficiency and accuracy. To this end, this study proposes the single-case study design as a feasible and causally valid research design for empirical decision-making in annotation projects. The key strength of the design is its ability to generate causal evidence with as few as one annotator. In this paper, we demonstrate the application of the single-case study in an applied experiment and argue that future researchers should incorporate the design into the pilot stage of annotation projects so that, over time, the field builds a causally valid body of knowledge regarding the best annotation techniques. [This paper will be published in the "Journal of Data Science." The published article is titled: "Improving the Science of Annotation for Natural Language Processing: The Use of the Single-Case Study for Piloting Annotation Projects"]
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: Institute of Education Sciences (ED)
Authoring Institution: N/A
IES Funded: Yes
Grant or Contract Numbers: R305B200005
Author Affiliations: N/A