Speech Recognition

Fall 2005
Tuesdays, 9:10 AM to 12:00 noon
Instructor: Berlin Chen (陳柏琳), Assistant Professor

Tentative Topic List and Schedule:

Date	Topic	Assignments
9/13	Course Overview & Introduction
9/20	Hidden Markov Models (1/2)	HW-1: Problems 1 & 2 for HMM (due 10/11); HW-1B: Deriving the Backward Algorithm (opt.)
9/27	Hidden Markov Models (2/2)	HW-2A: Problem 3 for HMM (due 10/18)
10/4	Spoken Language Structure	HW-2B: Counting Word Error Rates (opt.); Reference Files, Test Files
10/11	Introduction to Isolated Word Recognition	HW-3A: Drawing the Spectrograms (opt.)
10/18	Acoustic Modeling	HW-3B: Isolated Word Recognition
10/25	HTK Toolkit	HW-4: Using the HTK Toolkit
11/1	Statistical Language Modeling
11/8	SRILM Toolkit	HW-5: Using the SRILM Toolkit
11/15	Midterm
11/22	Digit Recognition, Word Recognition and Keyword Spotting
11/29	Large Vocabulary Continuous Speech Recognition (1/2)
12/6	Large Vocabulary Continuous Speech Recognition (2/2)	HW-6: Syllable Decoding with Bigram LM (train the syllable bigrams with SRILM)
12/13	Digital Signal Processing Related to Speech Recognition (1/2)
12/20	Digital Signal Processing Related to Speech Recognition (2/2)
12/27	Speech Signal Representations
1/3	Speech Enhancement and Robustness
1/10	Final Exam
	Discriminative Training Approaches for Continuous Speech Recognition

Textbooks:
     1. X. Huang, A. Acero, H. Hon, “Spoken Language Processing,” Prentice Hall, 2001 (local distributor: Chuan Hwa 全華)
     2. W. Chou, B. H. Juang, “Pattern Recognition in Speech and Language Processing,” CRC Press, 2003
     3. C. Manning, H. Schütze, “Foundations of Statistical Natural Language Processing,” MIT Press, 1999

References:

Books:
     1. T. F. Quatieri, “Discrete-Time Speech Signal Processing: Principles and Practice,” Prentice Hall, 2002
     2. J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000
     3. F. Jelinek, “Statistical Methods for Speech Recognition,” The MIT Press, 1999
     4. S. Young et al., “The HTK Book,” Version 3.2, 2002, http://htk.eng.cam.ac.uk
     5. L. Rabiner, B. H. Juang, “Fundamentals of Speech Recognition,” Prentice Hall, 1993

Papers:
     1. L. Rabiner, “The Power of Speech,” Science, vol. 301, pp. 1494-1495, September 2003
     2. L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, February 1989
     3. A. Dempster, N. Laird, D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977
     4. J. A. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models,” U.C. Berkeley TR-97-021
     5. J. W. Picone, “Signal Modeling Techniques in Speech Recognition,” Proceedings of the IEEE, pp. 1215-1247, September 1993
     6. R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?” Proceedings of the IEEE, August 2000
     7. H. Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, August 2000
     8. H. Hermansky, “Should Recognizers Have Ears?” Speech Communication, vol. 25, no. 1-3, 1998
     9. J. R. Bellegarda, “Statistical Language Model Adaptation: Review and Perspectives,” Speech Communication, vol. 42, no. 1, pp. 93-108, 2004

     (a1) Proceedings of the IEEE, vol. 88, no. 8, August 2000 (Special Issue on Spoken Language Processing)
     (a2) IEEE Signal Processing Magazine, vol. 22, no. 5, September 2005 (Special Issue on Speech Technology and Systems in Human-Machine Communication)