Speech Recognition
Summer 2004 (at TSFST, Hsinchu)
Tuesdays, 18:30-21:30
Instructor: Berlin Chen
Topic List and Schedule:
Date | Topic | |
7/6 | Course Overview & Introduction, Spoken Language Structure | |
7/13 | Hidden Markov Models | HW-1 |
7/20 | Break | |
7/27 | Hidden Markov Models (cont.) | HW-2 |
8/3 | Statistical Language Modeling | |
8/10 | Statistical Language Modeling (cont.) & Isolated Word Recognition: An Example | HW-3: Isolated Word Recognition |
8/17 | Acoustic Modeling | HW-4: HTK Tutorial Materials |
8/24 | Break | |
8/31 | Hidden Markov Toolkit, Explanation of the Isolated Word Recognition Program | Digital Signal Processing |
9/7 | Speech Signal Representations | |
9/14 | Search Algorithms for Word Recognition, Keyword Spotting, Large Vocabulary Continuous Speech Recognition | |
9/21 | Speech Enhancement & Robustness | |
Textbooks:
1. X. Huang, A. Acero, H. Hon, “Spoken Language Processing,” Prentice Hall, 2001 (distributed by Chuan Hwa, 全華)
2. C. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing,” MIT Press, 1999
References:
Books:
1. T. F. Quatieri, “Discrete-Time Speech Signal Processing - Principles and Practice,” Prentice Hall, 2002
2. J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000
3. F. Jelinek, “Statistical Methods for Speech Recognition,” The MIT Press, 1999
4. S. Young et al., “The HTK Book,” Version 3.2, 2002, http://htk.eng.cam.ac.uk
5. L. Rabiner, B.H. Juang, “Fundamentals of Speech Recognition,” Prentice Hall, 1993
Papers:
1. L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, February 1989
2. A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Stat. Soc., Series B, vol. 39, pp. 1-38, 1977
3. Jeff A. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models,” U.C. Berkeley TR-97-021
4. J. W. Picone, “Signal modeling techniques in speech recognition,” Proceedings of the IEEE, September 1993, pp. 1215-1247
5. R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?,” Proceedings of the IEEE, August 2000
6. Hermann Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, 88(8), August 2000
7. H. Hermansky, “Should Recognizers Have Ears?,” Speech Communication, 25(1-3), 1998