ERIC Number: ED576539
Record Type: Non-Journal
Publication Date: 2017
Pages: 166
Abstractor: As Provided
ISBN: 978-1-3697-4223-7
ISSN: N/A
EISSN: N/A
Available Date: N/A
The Effects of Primacy on Rater Cognition: An Eye-Tracking Study
Ballard, Laura
ProQuest LLC, Ph.D. Dissertation, Michigan State University
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making decisions and inferences about writers" (Weigle, 2002, p. 108). In the current study, I answer the call for continued research on rating processes by investigating rater cognition in the context of rubric use in writing assessment. This type of research is especially important for rater training and rubric development because, despite efforts to guide raters to a common understanding of the rubric criteria and to help raters converge on a common understanding of scoring bands, variance in rater scoring and rater behavior persists. The goal is not to eliminate this variance (which cannot be done when using human raters); rather, it is to reduce and understand it. Researchers have shown that trained raters do not always use rubric criteria in consistent ways, nor do they consistently use the same processes to score samples. This is relevant to the design of analytic rubrics and the use of scores derived from them, as raters are expected to allocate equal attention to each criterion within an analytic rubric, and unequal attention has been shown to coincide with lower category reliability (Winke & Lim, 2015) and, therefore, lower overall test reliability. One factor that has not been investigated in assessment research is the role of information primacy in rater cognition. Thus, in this study I investigate the primacy effect in relation to rater-rubric interactions, as well as the effects of criteria ordering on raters' processing of the criteria while rating. Specifically, I investigate (1) whether the position of a category affects raters' assignment of importance to the category; (2) whether the position of a category affects raters' memory of the category; (3) whether raters pay more or less attention to a rubric category depending on its position in the rubric; (4) whether the position of a category affects the inter-rater reliability of the category; and (5) whether the position of a category affects the scores that raters assign to the category. I employed a mixed-methods, within-subjects design that included eye-tracking methodology, criteria importance surveys, criteria recall tasks, decision-making process outlines, and rater interviews. Thirty-one novice raters were randomly assigned to one of two groups that, for counterbalancing purposes, trained on the two rubrics in opposite orders across two rounds. The rubrics were a standard rubric (from Polio, 2013) and a reordered rubric (identical to the standard rubric, except with the categories appearing in mirrored order). In round 1, raters trained on one of the two rubrics and rated 20 essays using that rubric. The second round took place five weeks after the completion of the first. In round 2, raters trained on the alternate rubric and re-rated the same 20 essays. Throughout the two rounds, I used several data-collection tools to investigate raters' cognition and behavior related to the rubric on which they had trained. Using the criteria importance surveys, I examined raters' beliefs about category importance. From the criteria recall tasks, I examined raters' recall of the descriptors in each rubric category.
With eye-tracking methodology, I recorded the raters' visual attention to the rubric criteria during essay rating to uncover how raters used the criteria depending on the position of the categories. Finally, from the raters' essay scores, I examined scoring consistency and severity for each rubric category. The multiple data measures tell the same story: as novice raters train on a new rubric and assign scores using the individual categories on the rubric, the raters' behavior pertaining to the outermost positions (i.e., the left-most and right-most categories) seems most susceptible to ordering effects. That is, the findings of this study show that category position affected the raters' beliefs about which criteria are the most and least important when scoring an essay, how many descriptors raters were able to recall from a category, how much attention raters paid to a category on the rubric while rating, and how severely raters scored a given category. Additionally, the findings provided evidence of an interplay between category type and category position, resulting in either more pronounced primacy effects or leveling effects for individual rubric categories. Based on these findings, I discuss rater training, rubric design, and test-construct considerations for rubric designers and test developers. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone (1-800-521-0600) or on the Web: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A