NTNU Information Retrieval Course

Information Retrieval
Spring 2016
9:10 AM ~ 12:10 PM, Fridays
Instructor: Prof. Berlin Chen (陳柏琳)

Tentative List of Topics:

02/26 　 Course Overview & Introduction
Book Chapter: Modern Information Retrieval, Ch. 1
Paper: The History of Information Retrieval Research

03/04 　 Classic Models cf. Modern Information Retrieval, Ch.3

03/11 　 Retrieval Evaluation cf. Modern Information Retrieval, Ch.4
Homework #1 : Evaluation Metrics for IR
Homework #2 : Retrieval Models

03/25 　 Benchmark Collections 　

04/08 　 Extensions of Classical (Set, Algebra & Probabilistic) Models 　

04/15 　 Relevance Feedback and Query Expansion
Homework #3 : Query Expansion and Relevance Feedback

04/22 　 Latent Semantic Analysis
Homework #4 : LSA for IR

04/29 　 Language Modeling for Information Retrieval 　

05/06 　 Clustering: Metrics and Techniques 　

05/13 　 Clustering: Metrics and Techniques 　

05/20 　 Indexing And Searching
Homework #5 : Efficiently Indexing for IR

　　 Paper Presentations
05/27
黃家儀 (CIKM 2015) Protecting Your Children from Inappropriate Content in Mobile Apps: An Automatic Maturity Rating Framework
蔡淳伊 (SIGIR 2015) Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
吳佳樺 (ICML 2006) The Relationship Between Precision-Recall and ROC Curves
臧之瑄 (CIKM 2015) Short Text Similarity with Word Embeddings
胡全燊 (IJCAI 2013) Persistent Homology: An Introduction and a New Text Representation for Natural Language Processing
06/03
陳映文 (SIGIR 2015) Image-based Recommendations on Styles and Substitutes
蔡謹安 (KDD 2015) Gender and Interest Targeting for Sponsored Post Advertising at Tumblr
林奕儒 (arxiv 2015) Combining temporal and content aware features for microblog retrieval
許宸瑋 (CIKM 2015) Rank by Time or by Relevance? Revisiting Email Search
李怡慧 (WSDM 2016) Towards Modelling Language Innovation Acceptance in Online Social Networks
楊登堯 (ICSE 2012) Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports
歐陽亦凡 (ICAICTA 2015) Combining temporal and content aware features for microblog retrieval
張庭韶 (BigComp 2016) Bagging-Based Active Learning Model for Named Entity Recognition with Distant Supervision
邱琬琇 (SIIE 2015) Neural Networks for Proper Name Retrieval in the Framework of Automatic Speech Recognition.
袁儀齡 (WSDM 2016) Semantic Documents Relatedness using Concept Graph Representation
簡少凡 (CIKM 2015) Organic or Organized? Exploring URL Sharing Behavior
石敬弘 (ICML 2015) Learning Word Representations with Hierarchical Sparse Coding
顏必成 (IEEE HIS) Using NMF-based Text Summarization to Improve Supervised and Unsupervised Classification

　　 User Interface for Search 　

　　 Web Search Basics 　

　　 Brief Overview of Automatic Summarization 　

　　 Brief Overview of Text Readability Assessment 　

Textbooks:

• R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition), ACM Press, 2011

• Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008

• W. Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison Wesley, 2009

References:

• C. C. Aggarwal, C.X. Zhai (eds.), Mining Text Data, Springer, 2012.

• W. B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures & Algorithms, Prentice-Hall, 1992.

• C.X. Zhai, Statistical Language Models for Information Retrieval (Synthesis Lectures Series on Human Language Technologies), Morgan & Claypool Publishers, 2008.

• T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007

• D. A. Grossman, O. Frieder, Information Retrieval: Algorithms and Heuristics, Springer, 2004.

• I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishing, 1999.

• C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

• D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.

• W.B. Croft and J. Lafferty (eds.), Language Models for Information Retrieval, Kluwer International Series on Information Retrieval, Volume 13, Kluwer Academic Publishers, 2002.

• Stephen Robertson and Hugo Zaragoza, The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3 no. 4, 333-389 (2009).

• D. Carmel and E. Yom-Tov, "Estimating the Query Difficulty for Information Retrieval," Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool Publishers, 2010.

• Juan-Manuel Torres-Moreno, "Automatic Text Summarization," Wiley-ISTE, 2014.

Papers:

• M. Sanderson and W. B. Croft, "The history of information retrieval research," Proceedings of the IEEE, Vol. 100, pp. 1444 - 1451, May 2012.

• O. Kolomiyets, M.-F. Moens, "A survey on question answering technology from an information retrieval perspective," Information Sciences 181 (2011) 5412–5434

• Johan Schalkwyk et al., "Google Search by Voice: A case study," 2010.

• D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, 3:993-1022, January 2003.

• V. Lavrenko and W.B. Croft, "Relevance-Based Language Models," ACM SIGIR 2001.

• C. H. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, "Latent semantic indexing: A probabilistic analysis." Analyzes an information retrieval technique related to principal components analysis.

• Liu, X. and Croft, W.B., "Statistical Language Modeling For Information Retrieval," the Annual Review of Information Science and Technology, vol. 39, 2005

• Lan Huang. A Survey On Web Information Retrieval Technologies. 2000.

• Karen Spärck Jones, "Some Points in a Time," Computational Linguistics, Vol. 31, No. 1, 2005.

• D. Hiemstra, "Information Retrieval Model," In: A. Goker, J. Davies, and M. Graham (eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009

• M. Steyvers, T. Griffiths, "Probabilistic Topic Models," In T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.). Handbook of Latent Semantic Analysis, Mahwah NJ: Lawrence Erlbaum, 2007.

• X. Yi, J. Allan, "A Comparative Study of Utilizing Topic Models for Information Retrieval," in the Proceedings of ECIR'09.

• Nallapati, Discriminative Models for Information Retrieval, in the Proceedings of SIGIR 2004

• T. Joachims and F. Radlinski, Search Engines that Learn from Implicit Feedback, IEEE Computer 40(8), pp. 34-40, 2007

• B. Chen, H.M. Wang, L.S. Lee, “A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents,” ACM Transactions on Asian Language Information Processing, Vol. 3, No. 2, pp. 128-145, June 2004.

　

Information Retrieval Resources

• SIGIR-Information Retrieval Resources

　

　
