Showing all 4 results
Cai, Zhiqiang; Siebert-Evenstone, Amanda; Eagan, Brendan; Shaffer, David Williamson – Grantee Submission, 2021
When text datasets are very large, manually coding line by line becomes impractical. As a result, researchers sometimes try to use machine learning algorithms to automatically code text data. One of the most popular algorithms is topic modeling. For a given text dataset, a topic model provides probability distributions of words for a set of…
Descriptors: Coding, Artificial Intelligence, Models, Probability
Peer reviewed
Zhang, Lijin; Li, Xueyang; Zhang, Zhiyong – Grantee Submission, 2023
The thriving developer community has a significant impact on the widespread use of R software. To better understand this community, we conducted a study analyzing all R packages available on CRAN. We identified the most popular topics of R packages by text mining the package descriptions. Additionally, using network centrality measures, we…
Descriptors: Computer Software, Programming Languages, Data Analysis, Visual Aids
Peer reviewed
Cai, Zhiqiang; Li, Hiyiang; Hu, Xiangen; Graesser, Art – Grantee Submission, 2016
This paper provides an alternative way of document representation by treating topic probabilities as a vector representation for words and representing a document as a combination of the word vectors. A comparison on summary data shows that this representation is more effective in document classification. [This paper was published in:…
Descriptors: Probability, Natural Language Processing, Models, Automation
Zhang, Zhiyong; Zhang, Danyang – Grantee Submission, 2021
Data science has maintained its popularity for about 20 years. This study adopts a bottom-up approach to understand what data science is by analyzing the descriptions of courses offered by the data science programs in the United States. Through topic modeling, 14 topics are identified from the current curricula of 56 data science programs. These…
Descriptors: Statistics Education, Definitions, Course Descriptions, Computer Science Education