ERIC Number: EJ1462635
Record Type: Journal
Publication Date: 2025-Mar
Pages: 5
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1470-8175
EISSN: EISSN-1539-3429
Available Date: 2024-12-05
Representing DNA for Machine Learning Algorithms: A Primer on One-Hot, Binary, and Integer Encodings
Yash Munnalal Gupta1,2; Satwika Nindya Kirana3; Somjit Homchan1,2
Biochemistry and Molecular Biology Education, v53 n2 p142-146 2025
This short paper presents an educational approach to teaching three popular methods for encoding DNA sequences: one-hot encoding, binary encoding, and integer encoding. Aimed at bioinformatics and computational biology students, our learning intervention develops practical skills in implementing these techniques for representing and analyzing genetic data. Its primary goal is to strengthen students' understanding and practical application of DNA encoding methods, which are crucial for various computational analyses in bioinformatics. The intervention consists of three components: (1) a conceptual framework that places the encoding methods in the context of broader bioinformatics applications, (2) an interactive Jupyter Notebook with Python code examples (https://github.com/yashmgupta/Representing-DNA/tree/main), and (3) a user-friendly Streamlit application (https://dnaencoding.streamlit.app/) in which students can input their own DNA sequences and visualize the output of each encoding method. By combining a conceptual overview with practical coding and visualization tools, our approach gives students a foundation for using these DNA sequence encoding methods in their future work. The study contributes hands-on learning resources to bioinformatics education that bridge the gap between theoretical knowledge and practical application in DNA sequence analysis, preparing students for advanced research and data analysis projects in the field.
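For readers unfamiliar with the three encodings named in the abstract, the following is a minimal Python sketch of what each one produces. It is not taken from the authors' notebook: the function names, the A/C/G/T ordering, and the specific integer and two-bit assignments are assumptions chosen for illustration, and other conventions are equally valid.

```python
# Illustrative sketch only; the mapping below is an assumed convention,
# not necessarily the one used in the paper's Jupyter Notebook.
INTEGER_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def integer_encode(seq):
    """Map each base to a single integer (A=0, C=1, G=2, T=3)."""
    # Bases outside A/C/G/T (for example N) raise KeyError in this sketch.
    return [INTEGER_CODES[base] for base in seq.upper()]


def binary_encode(seq):
    """Map each base to two bits (A=00, C=01, G=10, T=11)."""
    return [[code >> 1, code & 1] for code in integer_encode(seq)]


def one_hot_encode(seq):
    """Map each base to a length-4 vector with a single 1 at its index."""
    return [
        [1 if position == code else 0 for position in range(4)]
        for code in integer_encode(seq)
    ]


if __name__ == "__main__":
    print(integer_encode("GATTACA"))  # [2, 0, 3, 3, 0, 1, 0]
    print(binary_encode("AC"))        # [[0, 0], [0, 1]]
    print(one_hot_encode("GA"))       # [[0, 0, 1, 0], [1, 0, 0, 0]]
```

Under these assumptions, integer encoding uses one value per base but implies an ordering among bases, binary encoding uses two values per base, and one-hot encoding uses four values per base without implying any ordering.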
Descriptors: Science Instruction, Teaching Methods, Genetics, Molecular Biology, Computation, Data, Programming Languages, Coding, Visualization, Information Science, Scientific Research, Data Analysis
Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Publication Type: Journal Articles; Reports - Descriptive
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: 1Department of Biology, Faculty of Science, Naresuan University, Phitsanulok, Thailand; 2Center of Excellence for Innovation and Technology for Detection and Advanced Materials (ITDAM), Naresuan University, Phitsanulok, Thailand; 3Business Management and Languages, Faculty of Management Science, Silpakorn University, Phetchaburi, Thailand