Speech Recognition

Fall 2004
Fridays, 9:10-12:00

Instructor: Berlin Chen (陳柏琳), Assistant Professor

Topic List and Schedule:

Date	Topic	Homework
9/24	Course Overview & Introduction
10/1	Spoken Language Structure & Hidden Markov Models (I)	HW-01 (Due: 10/22): Forward-Backward Procedure, Viterbi Algorithm; or the derivation of the Backward Procedure
10/8	Break (ICSLP 2004, Jeju Island)
10/15	Spoken Language Structure & Hidden Markov Models (II)
10/22	Acoustic Modeling & HTK HMM Toolkit (I)	HW-02 (Due: 11/5): (a) Baum-Welch Training for HMM; (b) ASR Measures: Calculating Word Error Rate (WER) (Reference and ASR output)
10/29	Acoustic Modeling & HTK HMM Toolkit (II)
11/5	Statistical Language Modeling (I); Isolated Word Recognition	HW-03 (Due: 11/12): (a) Derivation of Backward Recursion; (b) AM Training Using HTK
11/12	Statistical Language Modeling (II)	HW-04 (Due: 11/26): Free Syllable Decoding
11/19	Midterm
11/26	School Games
12/3	Search Algorithm and Keyword Spotting	HW-05 (Due: 12/10): Syllable Decoding Using Bigram LM (Test Set)
12/10	Large Vocabulary Continuous Speech Recognition	HW-06 (Due: 12/24): LM Training Using SRI LM Toolkit
12/17	SRI LM Toolkit; Digital Signal Processing
12/24	Digital Signal Processing
12/31	Speech Signal Representations	HW-07 (Due: 1/28): Linear Prediction Analysis of Speech Signals
1/7	Speech Signal Representations; Linear Prediction Analysis	HW-08 (Due: 1/28): Experiments on Acoustic Feature Extraction
1/14	Speech Enhancement and Robustness
1/21	Final
1/28	Paper Survey
	Maximum Likelihood and Discriminative Training (EM, MCE, MMI, etc.)

Textbooks:
     1. X. Huang, A. Acero, H. Hon, “Spoken Language Processing,” Prentice Hall, 2001 (distributed by 全華, Chuan Hwa Book Co.)
     2. W. Chou and B. H. Juang, “Pattern Recognition in Speech and Language Processing,” CRC Press, 2003
     3. C. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing,” MIT Press, 1999

References:

Books:
     1. T. F. Quatieri, “Discrete-Time Speech Signal Processing - Principles and Practice,” Prentice Hall, 2002
     2. J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000
     3. F. Jelinek, “Statistical Methods for Speech Recognition,” The MIT Press, 1999
     4. S. Young et al., “The HTK Book,” Version 3.2, 2002. http://htk.eng.cam.ac.uk
     5. L. Rabiner, B. H. Juang, “Fundamentals of Speech Recognition,” Prentice Hall, 1993

Papers:
     1. L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech
        Recognition,” Proceedings of the IEEE, vol. 77, no. 2, February 1989
     2. A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,”
        J. Royal Stat. Soc., Series B, vol. 39, pp. 1-38, 1977
     3. J. A. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter
        Estimation for Gaussian Mixture and Hidden Markov Models,” U.C. Berkeley TR-97-021
     4. J. W. Picone, “Signal Modeling Techniques in Speech Recognition,” Proceedings of the
        IEEE, September 1993, pp. 1215-1247
     5. R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from
        Here?,” Proceedings of the IEEE, August 2000
     6. H. Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, August 2000
     7. H. Hermansky, “Should Recognizers Have Ears?,” Speech Communication, 25(1-3), 1998
     8. L. Rabiner, “The Power of Speech,” Science, vol. 301, pp. 1494-1495, Sep. 2003

　
