Machine learning methods can be used to train automatic speech recognizers (ASR). When porting an ASR system to a new language, however, or to a new dialect of spoken Arabic, we often have too little labeled training data to learn a high-precision recognizer. It seems reasonable to think that unlabeled data, e.g., untranscribed television broadcasts, should be useful for training the ASR; human infants, for example, can learn the distinction between phonologically similar words from just one labeled training utterance. Unlabeled data tell us the marginal distribution of speech sounds, p(x), but not the association between labels and sounds, p(y|x). We propose that knowing the marginal is sufficient to rank-order all possible phoneme classification functions before the learner has heard any labeled training examples at all. Knowing the marginal, the learner can compute the expected complexity (e.g., the derivative of the expected log covering number) of every possible classifier function, and from such complexity measures it is possible to compute the expected mean-squared probable difference between training-corpus error and test-corpus error. Upon presentation of the first few labeled training examples, the learner simply chooses, from the rank-ordered list of possible phoneme classifiers, the first one that is reasonably compatible with the labeled examples. This talk will present formal proofs, experimental tests on stripped-down toy problems, and experimental results from English-language ASR; future work will test larger-scale implementations for ASR in the spoken dialects of Arabic.
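The rank-then-select procedure described above can be illustrated with a minimal one-dimensional sketch. The example below is an assumption-laden toy, not the authors' method: candidate classifiers are simple thresholds, and as a stand-in for the covering-number complexity measure we rank each threshold by the estimated marginal density p(x) near it (a cut through a low-density region of the unlabeled data is treated as "simpler"). The learner then picks the first ranked classifier compatible with the first few labeled examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled data: two Gaussian clusters, stand-ins for two phoneme classes.
unlabeled = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

# Candidate classifiers: 1-D thresholds. Complexity proxy (our assumption,
# not the abstract's exact measure): the estimated marginal density p(x)
# at the threshold, computed from the unlabeled data alone.
thresholds = np.linspace(-4, 4, 81)
hist, edges = np.histogram(unlabeled, bins=40, range=(-4, 4), density=True)

def density_at(t):
    # Density of the histogram bin containing threshold t.
    i = np.clip(np.searchsorted(edges, t) - 1, 0, len(hist) - 1)
    return hist[i]

# Rank-order all candidate classifiers before seeing any labels:
# lowest-density (simplest) cuts first.
ranked = sorted(thresholds, key=density_at)

# The first labeled examples arrive; choose the first ranked classifier
# that is compatible with them (up to a label flip).
labeled_x = np.array([-2.1, 1.9])
labeled_y = np.array([0, 1])

def compatible(t):
    pred = (labeled_x > t).astype(int)
    return (np.array_equal(pred, labeled_y)
            or np.array_equal(1 - pred, labeled_y))

chosen = next(t for t in ranked if compatible(t))
print(f"chosen threshold: {chosen:.2f}")  # lands in the low-density gap
```

Only the final compatibility check uses labels; the expensive ranking step consumes nothing but the marginal, which is the point of the argument.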

