Machine learning methods can be used to train automatic speech recognizers (ASR). When porting an ASR system to a new language, however, or to a new dialect of spoken Arabic, we often have too little labeled training data to learn a high-precision recognizer. It seems reasonable to think that unlabeled data, e.g., untranscribed television broadcasts, should be useful for training the ASR; human infants, for example, can learn the distinction between phonologically similar words from just one labeled training utterance. Unlabeled data tell us the marginal distribution of speech sounds, p(x), but not the association between labels and sounds, p(y|x). We propose that knowing the marginal is sufficient to rank-order all possible phoneme classification functions before the learner has heard any labeled training examples at all. Knowing the marginal, the learner can compute the expected complexity (e.g., the derivative of the expected log covering number) of every possible classifier function, and from such complexity measures it is possible to compute the expected mean-squared probable difference between training-corpus error and test-corpus error. Upon presentation of the first few labeled training examples, the learner simply chooses, from the rank-ordered list of possible phoneme classifiers, the first one that is reasonably compatible with the labeled examples. This talk will present formal proofs, experimental tests on stripped-down toy problems, and experimental results from English-language ASR; future work will test larger-scale implementations for ASR in the spoken dialects of Arabic.
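The rank-then-select procedure described above can be illustrated with a minimal one-dimensional sketch. The example below is an assumption-laden toy, not the authors' method: candidate classifiers are simple thresholds, and as a stand-in for the covering-number complexity measure we rank each threshold by the estimated marginal density p(x) near it (a cut through a low-density region of the unlabeled data is treated as "simpler"). The learner then picks the first ranked classifier compatible with the first few labeled examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled data: two Gaussian clusters, stand-ins for two phoneme classes.
unlabeled = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

# Candidate classifiers: 1-D thresholds. Complexity proxy (our assumption,
# not the abstract's exact measure): the estimated marginal density p(x)
# at the threshold, computed from the unlabeled data alone.
thresholds = np.linspace(-4, 4, 81)
hist, edges = np.histogram(unlabeled, bins=40, range=(-4, 4), density=True)

def density_at(t):
    # Density of the histogram bin containing threshold t.
    i = np.clip(np.searchsorted(edges, t) - 1, 0, len(hist) - 1)
    return hist[i]

# Rank-order all candidate classifiers before seeing any labels:
# lowest-density (simplest) cuts first.
ranked = sorted(thresholds, key=density_at)

# The first labeled examples arrive; choose the first ranked classifier
# that is compatible with them (up to a label flip).
labeled_x = np.array([-2.1, 1.9])
labeled_y = np.array([0, 1])

def compatible(t):
    pred = (labeled_x > t).astype(int)
    return (np.array_equal(pred, labeled_y)
            or np.array_equal(1 - pred, labeled_y))

chosen = next(t for t in ranked if compatible(t))
print(f"chosen threshold: {chosen:.2f}")  # lands in the low-density gap
```

Only the final compatibility check uses labels; the expensive ranking step consumes nothing but the marginal, which is the point of the argument.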

