NLP Homework Page
Homework #1 : Preprocessing for Part-of-Speech Tagging
A list of words with their tags is given; a word may appear with several different tags. Build a lexicon in which each line contains the following information:
Word ID, word name, and the associated tag information.
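A minimal sketch of the lexicon-building step is given here for orientation only. The assignment does not specify the input or output format, so the (word, tag) pair input, the alphabetical ID assignment, and the tab-separated output are all assumptions, and the function names `build_lexicon` and `format_lexicon` are hypothetical.

```python
def build_lexicon(pairs):
    """Collect the set of tags seen for each word.

    `pairs` is an iterable of (word, tag) tuples; this input shape is an
    assumption, since the assignment does not fix a file format.
    """
    tags_by_word = {}
    for word, tag in pairs:
        tags_by_word.setdefault(word, set()).add(tag)
    # Assumption: IDs are assigned in alphabetical order, starting at 1.
    return [
        (word_id, word, sorted(tags_by_word[word]))
        for word_id, word in enumerate(sorted(tags_by_word), start=1)
    ]


def format_lexicon(entries):
    """Render one line per word: ID, word, and its tags (tab-separated)."""
    return [f"{word_id}\t{word}\t{' '.join(tags)}" for word_id, word, tags in entries]


sample = [("book", "NN"), ("book", "VB"), ("the", "DT"), ("book", "NN")]
for line in format_lexicon(build_lexicon(sample)):
    print(line)
```

Duplicate (word, tag) pairs collapse into a single tag entry because tags are stored in a set.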
Homework #2 : Regular Expressions
1. Find a regular expression for the set {a^n b^m : (m+n) is even}
2. Find a regular expression for L = {ab^n w : n ≥ 3, w belongs to [a,b]+}
3. Find the complement of the set in problem 1.
4. Find a regular expression for L = {vwv : w and v belong to [a,b]*, |v| = 2}
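One way to check a candidate answer is to compare it against the set definition on every string up to some length. The sketch below does this for problem 1, using Python's `re` module; the helper names `in_language` and `first_counterexample` and the length bound are assumptions made for illustration. The candidate shown is deliberately incomplete, so the check reports a string it gets wrong.

```python
import re
from itertools import product


def in_language(s):
    # Definition from problem 1: s = a^n b^m with (m + n) even.
    # Since m + n is the length of s, it is enough to test len(s).
    return re.fullmatch(r"a*b*", s) is not None and len(s) % 2 == 0


def first_counterexample(candidate, member, alphabet="ab", max_len=8):
    """Return the first string (shortest first) on which the candidate
    regular expression and the membership test disagree, or None."""
    pattern = re.compile(candidate)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if (pattern.fullmatch(s) is not None) != member(s):
                return s
    return None


# An incomplete candidate: it covers only the case where n and m are both even.
incomplete = r"(aa)*(bb)*"
print(first_counterexample(incomplete, in_language))  # prints: ab
```

A result of None only means that no disagreement was found up to `max_len`; it is evidence, not a proof, that the expression is correct.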
Project #1 : Chinese Input System
Write a Chinese input system. The resources provided include:
1. A lexicon with 72K words.
2. A text corpus.
3. A toneless syllable list.
You need to implement the following:
1. A program for word tokenization (segmentation)
2. A program for word unigram and bigram language model training
3. A search algorithm for the syllable-string to word-string conversion
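The three steps can be sketched on a toy example as follows. This is only an illustration under stated assumptions, not the required design: the six-word lexicon, its syllable spellings, and the two-sentence corpus are invented; forward maximum matching is one possible segmentation method; add-one smoothing is one possible way to handle unseen bigrams; and the search is a Viterbi-style dynamic program over (position, last word) states.

```python
import math
from collections import Counter

# Toy lexicon (assumption): word -> toneless syllables.
LEXICON = {
    "我": ("wo",), "们": ("men",), "我们": ("wo", "men"),
    "是": ("shi",), "事": ("shi",), "学生": ("xue", "sheng"),
}


def segment(text, lexicon, max_word_len=4):
    """Step 1: forward maximum matching; unknown characters become single words."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


def train(sentences):
    """Step 2: count word unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def decode(syllables, lexicon, unigrams, bigrams):
    """Step 3: find the most probable word string for a syllable string."""
    by_syllables = {}
    for word, sylls in lexicon.items():
        by_syllables.setdefault(tuple(sylls), []).append(word)
    vocab = len(lexicon)

    def logp(prev, word):
        # Add-one smoothed bigram probability P(word | prev).
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))

    n = len(syllables)
    # best[i][w] = (score, path) of the best analysis of syllables[:i] ending in w.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for prev, (score, path) in best[i].items():
            for j in range(i + 1, n + 1):
                for word in by_syllables.get(tuple(syllables[i:j]), []):
                    cand = score + logp(prev, word)
                    if word not in best[j] or cand > best[j][word][0]:
                        best[j][word] = (cand, path + [word])
    if not best[n]:
        return None  # no sequence of lexicon words covers the input
    return max(best[n].values(), key=lambda item: item[0])[1]


corpus = ["我们是学生", "我是学生"]  # toy corpus (assumption)
unigrams, bigrams = train([segment(s, LEXICON) for s in corpus])
print(decode(["wo", "men", "shi", "xue", "sheng"], LEXICON, unigrams, bigrams))
```

On this toy data the search prefers 我们 / 是 / 学生 over alternatives such as 我 / 们 / 事 / 学生, because the bigrams it uses were seen in the corpus. With the real 72K-word lexicon, storing full paths in every cell would be wasteful; back-pointers are the usual replacement.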