Abstract

Background and Objectives

Language production and speech articulation can be delayed in children because of developmental disabilities and neuromotor disorders such as childhood apraxia of speech (CAS). One behavior commonly associated with CAS is articulation errors, in which the child mispronounces some of the phonemes produced. The presented Pronunciation Verification (PV) method automatically evaluates the child's speech and detects insertion, deletion, and substitution errors made by the child at the phoneme level.

Method

The proposed PV method is based on a search lattice with competing paths that allow the system to detect insertions, deletions, and substitutions of phonemes. Fig. 1 shows a block diagram of the lattice-based PV component. The prompted word is first phonetically transcribed to obtain the expected phoneme sequence. The lattice generator then uses this sequence to build a search lattice, which is fed to the speech recognizer. The lattice covers all possible pronunciation errors (insertion, deletion, and substitution) by adding, for each expected error, an alternative path alongside the correct path. A deletion path is represented as a null arc that lets the recognizer skip a phoneme node during decoding, while a garbage node serves as an alternative that collects any phoneme other than the expected one (a substitution error). A garbage loop is also added between each pair of consecutive phonemes to collect the frames of inserted phonemes. Fig. 2 (a) shows an example of the lattice for the word "chair", where PG and PD are the penalties attached to the garbage and deletion arcs, respectively. These penalties prevent the recognizer from skipping phonemes or aligning speech to the garbage node unless the resulting fit is better than that of the correct path. The garbage node is composed of all the phonemes connected in parallel, as shown in Fig. 2 (b).
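The lattice described above can be sketched in Python as a list of weighted arcs. This is a minimal, hypothetical illustration rather than the authors' implementation: the `Arc` class, the `build_pv_lattice` function, the ARPAbet transcription CH EH R for "chair", the five-phoneme inventory, and the penalty values are all assumptions. The sketch also assumes that the insertion loop reuses the garbage penalty PG, since the abstract names only PG and PD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int        # source node
    dst: int        # destination node (equal to src for a self-loop)
    label: str      # expected phoneme, "garbage:<phoneme>", or "<eps>" (null arc)
    penalty: float  # cost added to the path score when the decoder takes this arc


def build_pv_lattice(expected, inventory, pg=2.0, pd=2.0):
    """Hypothetical PV lattice: node i lies before expected[i]; node len(expected) is final.

    pg and pd stand in for the garbage (PG) and deletion (PD) penalties;
    the default values are illustrative assumptions only.
    """
    arcs = []
    n = len(expected)
    for i, ph in enumerate(expected):
        # Correct path: the expected phoneme, with no penalty.
        arcs.append(Arc(i, i + 1, ph, 0.0))
        # Deletion: a null arc lets the decoder skip this phoneme at cost PD.
        arcs.append(Arc(i, i + 1, "<eps>", pd))
        # Substitution: the garbage node, i.e. every phoneme in parallel, at cost PG.
        for g in inventory:
            arcs.append(Arc(i, i + 1, f"garbage:{g}", pg))
    # Insertion: a garbage self-loop between each pair of consecutive phonemes
    # (assumed here to carry the same penalty PG).
    for node in range(1, n):
        for g in inventory:
            arcs.append(Arc(node, node, f"garbage:{g}", pg))
    return arcs


chair = ["CH", "EH", "R"]                 # assumed transcription of "chair"
inventory = ["CH", "EH", "R", "S", "T"]   # toy phoneme inventory (assumption)
lattice = build_pv_lattice(chair, inventory)
print(len(lattice))  # 31 = 3 slots x (1 + 1 + 5) arcs + 2 loops x 5 arcs
```

With a real phoneme set, the garbage arcs dominate the arc count, which is one reason a lightweight single-mixture monophone garbage model keeps decoding fast.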
Mel Frequency Cepstral Coefficients (MFCCs) are extracted from the speech signal and augmented with delta and acceleration coefficients to produce a 39-dimensional feature vector per frame. The extracted features are then fed to the speech recognizer, together with the generated lattice and the Hidden Markov Model (HMM) acoustic models, to produce a phoneme sequence from the child's utterance. The Context Dependent (CD) HMM model consists of multi-mixture tied-state triphones, while the garbage model consists of single-mixture monophones to reduce complexity and speed up recognition. The output phoneme sequence is then compared with the expected phoneme sequence: if they match, the utterance is marked as correct; otherwise, it is marked as incorrect.

Results

The overall accuracy of the system is 88.2%, with a Correct Acceptance (CA) rate of 91.5% and a Correct Rejection (CR) rate of 83.4%.

Conclusion

A PV method that uses a search lattice with alternative paths and a garbage model was used to detect the articulation errors made by children, with an overall accuracy of about 88%.
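The decision rule in the Method section accepts an utterance only when the recognized phoneme sequence matches the expected one. To illustrate how the individual insertion, deletion, and substitution errors could then be located, the following sketch swaps in a standard Levenshtein (edit-distance) alignment between the two sequences. In the abstract's method, error types follow from which lattice arcs the decoder takes; the alignment step, the function names `align` and `verify`, and the example phonemes are assumptions for illustration only.

```python
def align(expected, decoded):
    """Levenshtein alignment; returns a list of (op, expected_phoneme, decoded_phoneme)."""
    n, m = len(expected), len(decoded)
    # cost[i][j] is the edit distance between expected[:i] and decoded[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if expected[i - 1] == decoded[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion of an expected phoneme
                             cost[i][j - 1] + 1)        # insertion of an extra phoneme
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if expected[i - 1] == decoded[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                op = "match" if sub == 0 else "substitution"
                ops.append((op, expected[i - 1], decoded[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", expected[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, decoded[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def verify(expected, decoded):
    """Accept only an exact match; otherwise also report the located errors."""
    errors = [op for op in align(expected, decoded) if op[0] != "match"]
    return len(errors) == 0, errors


chair = ["CH", "EH", "R"]
print(verify(chair, ["CH", "EH", "R"]))  # (True, [])
print(verify(chair, ["SH", "EH", "R"]))  # (False, [('substitution', 'CH', 'SH')])
print(verify(chair, ["CH", "EH"]))       # (False, [('deletion', 'R', None)])
```

The accept/reject outcome depends only on whether any error is found, so this labeling step does not change the correct/incorrect decision described in the abstract; it only makes the error positions explicit.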

DOI: 10.5339/qfarf.2013.BIOP-030
2013-11-20
2019-08-17