Disfluent Speech Segments Detection and Remediation.

Arbajian, Pierre

ERIC Number: ED597724

Record Type: Non-Journal

Publication Date: 2019

Pages: 150

Abstractor: As Provided

ISBN: 978-1-3921-1277-9

EISSN: N/A

Available Date: N/A

ProQuest LLC, Ph.D. Dissertation, The University of North Carolina at Charlotte

Speech remediation consists of identifying the segments of a recording that compromise the quality of its content and that can be deleted to improve, rather than diminish, the overall quality of the speech. Remediation is especially important when speech is heavily disfluent, as in the case of speakers who stutter. In our work we focused on two types of disfluency: "blocks" and "interjections". The preparation work and the features required for each type differed, as we used distinct approaches according to the disfluency being detected and the application method. In this dissertation, we describe work which consists of (1) developing several methods to extract stuttered speech segments, (2) creating the raw digital signal analysis features, (3) performing the feature engineering, (4) labeling the segments, (5) training the classifier, and finally (6) scoring the speech segments to identify the sounds that must be removed from a recording. Experimentation with statistical aggregation features yielded strong results for speech "blocks", but scoring "interjection" disfluencies required spectral data analysis. The need for different features per disfluency type led us to three approaches: the first two were suitable for detecting "blocks", and the third was applied to "interjections". The first two approaches, one with "Pre-qualification" sampling and the other with "Random" sampling, used statistical representations of the segment analysis data as features with minimal neural network classifiers. They performed well in two types of application: (1) sampling of segments with a pre-qualification phase and (2) sampling of training samples without pre-qualification, followed by a sliding-window classification scoring method. The second speech disfluency, i.e. "interjections", required different techniques, hence "Approach Spectral-analysis."
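
The sliding-window scoring described for "blocks" can be illustrated with a minimal sketch. It is not the dissertation's implementation: the per-frame log-energy feature, the four aggregation statistics, the window and hop sizes, and every function name are assumptions made for illustration, and `placeholder_classifier` is a hypothetical threshold rule standing in for the trained neural network classifier.

```python
import numpy as np

def frame_energy(signal, frame_len):
    """Log energy per non-overlapping frame (an assumed raw signal feature)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.log10(np.mean(frames ** 2, axis=1) + 1e-10)

def aggregate(window):
    """Statistical aggregation of the per-frame values inside one window."""
    return np.array([window.mean(), window.std(), window.min(), window.max()])

def placeholder_classifier(features):
    """Hypothetical stand-in for a trained classifier: flags low-energy windows."""
    mean_log_energy = features[0]
    return 1.0 if mean_log_energy < -4.0 else 0.0

def sliding_window_scores(values, win, hop, classify):
    """Score every window of `win` frames, advancing `hop` frames per step."""
    starts = range(0, len(values) - win + 1, hop)
    return [(s, classify(aggregate(values[s:s + win]))) for s in starts]

def flagged_intervals(scores, win, threshold=0.5):
    """Merge overlapping or adjacent flagged windows into (start, end) frame ranges."""
    intervals = []
    for start, score in scores:
        if score < threshold:
            continue
        end = start + win
        if intervals and start <= intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals

# Synthetic demo at 8 kHz: 1 s of noise, 1 s of near-silence, 1 s of noise.
rng = np.random.default_rng(0)
sr = 8000
signal = np.concatenate([
    rng.normal(0, 0.1, sr),
    rng.normal(0, 1e-4, sr),
    rng.normal(0, 0.1, sr),
])
energies = frame_energy(signal, frame_len=200)  # 25 ms frames
scores = sliding_window_scores(energies, win=10, hop=5,
                               classify=placeholder_classifier)
intervals = flagged_intervals(scores, win=10)
print(intervals)  # frame ranges proposed for removal
```

In this sketch the flagged range extends slightly past the quiet stretch on both sides, because a window that only partly overlaps it can still score as flagged. A smaller hop or a stricter classifier would tighten the boundaries at the cost of more windows to score.
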
In "Approach Spectral-analysis" we used spectral analysis metrics from the sound segments as predictors and trained neural network classifiers, implemented as multiple convolutional neural network (CNN) models, to detect the "interjections". The corpus we used is "UCLASS", a well-recognized set of stuttered speech recordings. The speeches we used are not labeled in a manner conducive to our research; therefore, we performed extensive experimentation to build and label the training data. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml.]

Descriptors: Stuttering, Language Fluency, Speech Communication, Phonemes, Scoring, Audio Equipment, Statistical Analysis, Classification, Networks, Predictor Variables, Computational Linguistics, Identification, Computer Software, Engineering

ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml

Publication Type: Dissertations/Theses - Doctoral Dissertations

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A

Author Affiliations: N/A