Background and Objectives: Childhood Apraxia of Speech (CAS) is a speech disorder characterized by articulation errors, i.e. the replacement of certain phonemes with alternatives. In previous work we proposed a simple method that evaluates the child's speech as correct or incorrect with an overall accuracy of 88.2%. In this work we present an enhanced method that increases the accuracy of the correct/incorrect evaluation to 92.7% and additionally identifies the incorrect phonemes with an accuracy of 60%.

Method: The goal of the mispronunciation detection system is to compare each phoneme in the child's production with the given prompt and identify mispronunciations. Figure 1 shows the block diagram of the system, which uses a search lattice for each prompt in the child's speech therapy treatment protocol to identify the errors made. Each prompt is transcribed into its corresponding phoneme sequence using the CMU pronunciation dictionary and then passed to the lattice generator, along with the expected mispronunciation rules, to generate the search lattice. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech signal together with their delta and acceleration coefficients to produce a 39-dimensional feature vector per frame. The extracted features are then fed to the speech recognizer, along with the generated lattice and the Hidden Markov Model (HMM) acoustic models, to produce a sequence of phones from the child's utterance. An evaluation report is then generated by matching the recognized phoneme sequence against the correct phoneme sequence and specifying the errors made by the child. We use a search lattice with a specific number of alternative pronunciations for each phoneme; this limits the decoder search, making it faster and more accurate.
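The report-generation step, in which the recognized phoneme sequence is matched against the correct one, could be sketched roughly as follows. This is only an illustrative sketch, not the system's actual implementation: the function name `evaluation_report`, the use of Python's `difflib.SequenceMatcher` for the alignment, and the example utterance are all assumptions introduced here.

```python
from difflib import SequenceMatcher


def evaluation_report(expected, recognized):
    """Align two phoneme sequences and list where they differ.

    Hypothetical helper: the alignment method is an assumption,
    not the one used by the system described in the abstract.
    """
    errors = []
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # phonemes produced as expected
        errors.append({
            "type": tag,                      # 'replace', 'delete' or 'insert'
            "position": i1,                   # index in the expected sequence
            "expected": expected[i1:i2],
            "produced": recognized[j1:j2],
        })
    return {"correct": not errors, "errors": errors}


# Made-up example: prompt "cat" (K AE T) produced as "tat" (T AE T).
report = evaluation_report(["K", "AE", "T"], ["T", "AE", "T"])
for err in report["errors"]:
    print(err["type"], err["expected"], "->", err["produced"])
```

An utterance whose report contains no errors would be judged correct; otherwise each entry names the expected phoneme and what was produced in its place.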
Each phoneme in the correct phoneme sequence is compared with the expected mispronunciation rules, which were developed by a therapist after an assessment of 20 children with CAS; if a rule matches, the pronunciation variants are added as alternative arcs to the current phoneme sequence. The mispronunciation rules depend on the type of the phoneme (consonant/vowel), the position of the phoneme in the word (initial/medial/final) and the context of the phoneme. The lattice is then created from the matched rules as shown in Figure 2, where the garbage model absorbs any mispronounced phoneme not in the lattice. PA and PG are insertion penalties added to the alternative arcs and the garbage arcs respectively, so that the decoder does not align the speech to the alternative error phonemes or the garbage node unless it is sufficiently confident.

Results: The overall system accuracy is 92.7%, with a Correct Acceptance (CA) of 97.6% and a Correct Rejection (CR) of 83.1%. The system also detects the phoneme errors made by the child with 60% accuracy.

Conclusion: In this paper we proposed a mispronunciation detection tool that can detect phonemes mispronounced by children with CAS and specify the errors made.
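The lattice construction described in the Method section can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the rule table `RULES`, the helper names, and the penalty values are hypothetical, and the sketch keys rules only on the phoneme and its position in the word, leaving out the phoneme type and context that the therapist's rules also take into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    label: str      # phoneme label, or "GARBAGE" for the garbage model
    penalty: float  # penalty added to the arc score (0.0 for the correct arc)


# Hypothetical rules: (phoneme, position in word) -> expected substitutions.
RULES = {
    ("K", "initial"): ["T"],
    ("T", "final"): ["D"],
}


def position_of(index, length):
    """Classify a phoneme index as initial, medial or final."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def build_lattice(phonemes, rules, pa=-5.0, pg=-10.0):
    """Return one list of arcs per expected phoneme.

    pa and pg stand in for the PA and PG penalties; the default
    values are placeholders, not values from the paper.
    """
    lattice = []
    for i, ph in enumerate(phonemes):
        arcs = [Arc(ph, 0.0)]  # the correct pronunciation
        for alt in rules.get((ph, position_of(i, len(phonemes))), []):
            arcs.append(Arc(alt, pa))  # expected mispronunciation
        arcs.append(Arc("GARBAGE", pg))  # absorbs anything else
        lattice.append(arcs)
    return lattice


lattice = build_lattice(["K", "AE", "T"], RULES)
for slot in lattice:
    print([(arc.label, arc.penalty) for arc in slot])
```

Because each slot offers only the correct phoneme, a few rule-based alternatives and the garbage arc, the decoder has far fewer paths to score than it would with an unconstrained phoneme loop, which is the motivation the abstract gives for using the lattice.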

