NTNU Speech Processing Course

Speech Processing

Spring 2019
9:10 am ~ 12:10 pm, Tuesdays
Instructor: Dr. Berlin Chen (陳柏琳)

Reference Books:

§ 　 L. Rabiner, R. Schafer, Theory and Applications of Digital Speech Processing, Pearson, 2011 　

§ 　 X. Huang, A. Acero, H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001 　

§ 　 Jacob Benesty, M. Mohan Sondhi, Yiteng Huang (ed.), Springer Handbook of Speech Processing, Springer, 2007 　

§ 　 Tuomas Virtanen, Rita Singh, Bhiksha Raj (ed.), Techniques for Noise Robustness in Automatic Speech Recognition, John Wiley & Sons, 2013 　

§ 　 L. Rabiner, B.H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993 　

§ 　 M.J.F. Gales and S.J. Young. The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing, 2008 　

§ 　 L. Rabiner and R.W. Schafer. Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007 　

§ 　 W. Chou, B.H. Juang, Pattern Recognition in Speech and Language Processing, CRC Press, 2003 　

§ 　 S. Young et al., “The HTK Book”, Version 3.2, 2002. "http://htk.eng.cam.ac.uk" 　

§ 　 T. F. Quatieri, “Discrete-Time Speech Signal Processing: Principles and Practice,” Prentice Hall, 2002 　

§ 　 F. Jelinek, "Statistical Methods for Speech Recognition," The MIT Press, 1999 　

§ 　 Dong Yu and Li Deng, "Automatic Speech Recognition: A Deep Learning Approach," Springer, 2015 　

§ 　 J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000 　

§ 　 C. Manning and H. Schutze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 　

§ 　 J. Bellegarda, Latent Semantic Mapping: Principles & Applications (Synthesis Lectures on Speech and Audio Processing), 2008 　

§ 　 T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.) , Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007 　

§ 　 Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004 　

§ 　 D. P. Bertsekas, J. N. Tsitsiklis, Introduction to Probability, Athena Scientific, 2002 　

§ 　 G. McLachlan, T. Krishnan, The EM Algorithm and Extensions, 2nd Edition, Wiley, 2008 　

§ 　 Wang Shiyuan (王士元), Peng Gang (彭剛), Language, Speech and Technology (語言、語音與技術), City University of Hong Kong Press, 2007 　

Reference Papers:

§ 　 L. Rabiner. The Power of Speech. Science, Vol. 301, pp. 1494-1495, Sep. 2003. 　

§ 　 Frederick Jelinek, "The Dawn of Statistical ASR and MT," Computational Linguistics, Vol. 35, No. 4, pp. 483-494, December 2009. 　

§ 　 Kenneth Church, “A Pendulum Swung too Far,” LiLT 2011. 　

§ 　 S. Young, "Talking to Machines," Royal Academy of Engineering, Ingenia, No. 54, pp. 40-46, 2013. 　

§ 　 G. Hinton et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” IEEE Signal Processing Magazine, Vol. 29, No. 6, 2012 　

§ 　 L. Deng, “Deep Learning: from Speech Recognition to Language and Multimodal Processing,” APSIPA Transactions on Signal and Information Processing, Vol. 5, 2016 　

§ 　 Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," Nature, 521, pp. 436-444, 2015 　

§ 　 L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, Vol. 77, No. 2, February 1989 　

§ 　 A. V. Oppenheim and R. W. Schafer, "From Frequency to Quefrency: A History of the Cepstrum," IEEE Signal Processing Magazine 21(5), September 2004. 　

§ 　 A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1, 1977 　

§ 　 J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," U.C. Berkeley TR-97-021 　

§ 　 J. W. Picone, “Signal modeling techniques in speech recognition,” Proceedings of the IEEE, pp. 1215-1247, September 1993 　

§ 　 R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?,” Proceedings of the IEEE, August 2000 　

§ 　 H. Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, August 2000 　

§ 　 Aubert, X. L., "An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition," Computer Speech and Language, vol. 16, 2002, pp. 89-114. 　

§ 　 H. Hermansky, "Should Recognizers Have Ears?", Speech Communication, 25(1-3), 1998. 　

§ 　 H. Hermansky, "Speech recognition from spectral dynamics", Sadhana, 36(5), 2011. 　

§ 　 K. Saberi and D. R. Perrott, "Cognitive restoration of reversed speech," Nature, Vol. 398, 1999. 　

§ 　 R. Shannon et al., "Speech recognition with primarily temporal cues," Science, Vol. 270, 1995. 　

§ 　 J. R. Bellegarda, "Statistical Language Model Adaptation: Review and Perspectives," Speech Communication, vol. 42, no.1, pp. 93-108, 2004. 　

§ 　 B. Roark, "A survey of discriminative language modeling approaches for large vocabulary continuous speech recognition," in Large Margin and Kernel Approaches to Speech and Speaker Recognition, J. Keshet and S. Bengio (Eds.), Wiley, 2009. 　

§ 　 L. Rabiner, B.H. Juang, "Speech Recognition: Statistical Methods," Encyclopedia of Language & Linguistics, pp. 1-18, 2006. 　

§ 　 P. Nguyen, "TechWare: Speech recognition software and resources on the web," IEEE Signal Processing Magazine 26(3), May 2009. 　

§ 　 J. B. Allen, F. Li, "Speech Perception and Cochlear Signal Processing," IEEE Signal Processing Magazine 26(4), July 2009. 　

§ 　 A. Orlitsky, N. P. Santhanam, J. Zhang, "Always Good Turing: Asymptotically Optimal Probability Estimation," Science, 17 October 2003. 　

○ 　 Proceedings of the IEEE 88(8), August 2000 (Special Issue on Spoken Language Processing) 　

§ 　 X. Huang, J. Baker, R. Reddy, "A Historical Perspective of Speech Recognition," Communications of the ACM, Vol. 57, No. 1, 2014. 　

§ 　 L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Transactions on Audio, Speech, and Language Processing, 21(5), pp. 1060-1089, May 2013. 　

○ 　 IEEE Signal Processing Magazine 22(5), September 2005 (Special Issue on Speech Technology and Systems in Human-Machine Communication) 　

○ 　 IEEE Signal Processing Magazine 25(3), May 2008 (Special Issue on Spoken Language Technology) 　

○ 　 IEEE Signal Processing Magazine 29(6), December 2012 (Special Issue on Fundamental Technologies in Modern Speech Recognition) 　

○ 　 Proceedings of the IEEE 101(5), May 2013 (Special Issue on Speech Information Processing: Theory and Applications) 　

§ 　 J.M. Baker et al., "Research Developments and Directions in Speech Recognition and Understanding, Part 1," IEEE Signal Processing Magazine 26(3), May 2009. 　

§ 　 J.M. Baker et al., "Research Developments and Directions in Speech Recognition and Understanding, Part 2," IEEE Signal Processing Magazine 26(4), July 2009. 　

§ 　 J. Schalkwyk et al., "Google Search by Voice: A case study," 2010. 　

§ 　 M. Ostendorf, Speech Technology and Information Access, IEEE Signal Processing Magazine 25(3), May 2008. 　

§ 　 S. Watanabe et al., “Hybrid CTC/Attention architecture for end-to-end speech recognition,” IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 8, 2017 　

§ 　 H. Li, B. Ma and K. A. Lee, "Spoken Language Recognition: From Fundamentals to Practice," Proceedings of the IEEE, February 2013. 　

§ 　 J. H. L. Hansen, T. Hasan, "Speaker Recognition by Machines and Humans: A tutorial review," IEEE Signal Processing Magazine, Vol. 32, No. 6, 2015 　

Reference Presentations/Web Pages:

§ 　 J. Droppo, Noise Robust Automatic Speech Recognition, a comprehensive tutorial talk given at EUSIPCO 2008 　

§ 　 S. Chen, D. Beeferman, R. Rosenfeld, Evaluation metrics for language models, NIST 　

§ 　 J. Kincaid, A Brief History of ASR: Automatic Speech Recognition 　

§ 　 S. Ruder, A Review of the Neural History of Natural Language Processing 　

§ 　 S. Ruder, The 4 Biggest Open Problems in NLP 　

§ 　 Academician Biing-Hwang Juang (莊炳湟院士), “The Technological Substance of Artificial Intelligence” (人工智能的科技內涵), a talk given at Academia Sinica, 2018 　
