Word Segmentation from Transcriptions of Child-Directed Speech Using Lexical and Sub-Lexical Cues.

Zébulon Goriely; Andrew Caines; Paula Buttery

Peer reviewed

ERIC Number: EJ1485932

Record Type: Journal

Publication Date: 2025-Jan

Pages: 41

Abstractor: As Provided

ISBN: N/A

ISSN: 0305-0009

EISSN: 1469-7602

Available Date: 0000-00-00

Zébulon Goriely¹; Andrew Caines¹; Paula Buttery¹

Journal of Child Language, v52 n1 p1-41 2025

We compare two frameworks for the segmentation of words in child-directed speech, PHOCUS and MULTICUE. PHOCUS is driven by lexical recognition, whereas MULTICUE combines sub-lexical properties to make boundary decisions, representing differing views of speech processing. We replicate these frameworks, perform novel benchmarking and confirm that both achieve competitive results. We develop a new framework for segmentation, the DYnamic Programming MULTIple-cue framework (DYMULTI), which combines the strengths of PHOCUS and MULTICUE by considering both sub-lexical and lexical cues when making boundary decisions. DYMULTI achieves state-of-the-art results and outperforms PHOCUS and MULTICUE on 15 of 26 languages in a cross-lingual experiment. Because DYMULTI is built on psycholinguistic principles, these results validate it as a robust model for speech segmentation and as a contribution to the understanding of language acquisition.

Descriptors: Child Language, Language Acquisition, Vocabulary Development, Word Recognition, Speech Communication, Cues, Psycholinguistics

Cambridge University Press. 100 Brook Hill Drive, West Nyack, NY 10994. Tel: 800-872-7423; Tel: 845-353-7500; Fax: 845-353-4141; e-mail: subscriptions_newyork@cambridge.org; Web site: https://www.cambridge.org/core/what-we-publish/journals

Publication Type: Journal Articles; Reports - Evaluative

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A

Author Affiliations: ¹Department of Computer Science and Technology, University of Cambridge, Cambridge, UK