Peer reviewed
ERIC Number: ED607910
Record Type: Non-Journal
Publication Date: 2020-Jul
Pages: 12
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Feature Selection Metrics: Similarities, Differences, and Characteristics of the Selected Models
Sanyal, Debopam; Bosch, Nigel; Paquette, Luc
International Educational Data Mining Society, Paper presented at the International Conference on Educational Data Mining (EDM) (13th, Online, Jul 10-13, 2020)
Supervised machine learning has become one of the most important methods for developing educational and intelligent tutoring software; it is the backbone of many educational data mining methods for estimating knowledge, emotion, and other aspects of learning. Hence, to ensure optimal use of computing resources and effective analysis of models, it is essential that researchers know which evaluation metrics are best suited to educational data. In this article, we focus on the problem of wrapper feature selection, where predictors are added to models based on how much they improve model accuracy in terms of a given metric. We compared commonly used machine learning algorithms, including naive Bayes, support vector machines, logistic regression, and random forests, on 11 diverse learning-related datasets. We optimized feature selection based on nine different metrics, then evaluated each to address research questions about how effective each metric was in terms of the others (e.g., does optimizing for precision also result in good F1?) as well as calibration (i.e., are the predictions produced by models accurate probabilities of correctness?). We provide empirical evidence that the Matthews correlation coefficient (MCC) produced the best overall results across the other metrics, but that root mean squared error (RMSE) selected the best-calibrated models. Finally, we also discuss issues related to the number of features selected when optimizing for each metric, as well as the types of datasets for which certain metrics were more effective. [For the full proceedings, see ED607784.]
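The abstract describes wrapper feature selection driven by a chosen metric, with MCC and RMSE singled out. A minimal sketch of that idea, assuming greedy forward selection (the record does not specify the exact search procedure) and binary 0/1 labels, might look like the following. This is not the authors' code: `score_subset` is a hypothetical callback that would train a model on a feature subset and return its cross-validated metric, and the feature names and gains in the toy usage are made up for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rmse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """RMSE between 0/1 labels and predicted probabilities (rewards calibration)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


def forward_select(
    candidates: Sequence[str],
    score_subset: Callable[[List[str]], float],
    higher_is_better: bool = True,
) -> Tuple[List[str], float]:
    """Greedy forward wrapper selection: add the single feature that most
    improves the metric, and stop when no remaining feature improves it."""
    sign = 1.0 if higher_is_better else -1.0  # flip error metrics such as RMSE
    selected: List[str] = []
    remaining = list(candidates)
    best = -math.inf
    while remaining:
        scored = [(sign * score_subset(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves the metric any further
        best = top_score
        selected.append(top_feature)
        remaining.remove(top_feature)
    return selected, sign * best


# Toy usage: a made-up additive score stands in for "train a model and evaluate it".
toy_gain = {"hints_used": 0.30, "time_on_task": 0.20, "noise": -0.05}


def toy_score(features: List[str]) -> float:
    return sum(toy_gain[f] for f in features)


print(forward_select(list(toy_gain), toy_score))
```

Passing `higher_is_better=False` lets the same loop minimize RMSE instead of maximizing MCC; in practice the choice of metric changes which features, and how many, the wrapper keeps.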
International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: http://www.educationaldatamining.org
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A