NLP Homework Page

Homework #1 : Preprocessing for Part-of-Speech Tagging

A list of words with different tags is given. Build a lexicon in which each line contains the following information:

Word ID, word name, and the associated tag information.
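
The lexicon-building step can be sketched as follows. This is only a minimal illustration, not the required solution. It assumes the input has already been read into (word, tag) pairs and that IDs are assigned in order of first appearance; the assignment does not specify the input format or the ID scheme, and the names build_lexicon and format_lexicon and the sample tags are hypothetical.

```python
def build_lexicon(tagged_words):
    """Group tags by word; assign IDs in order of first appearance (an assumption)."""
    tags_by_word = {}
    for word, tag in tagged_words:
        tags = tags_by_word.setdefault(word, [])
        if tag not in tags:  # keep each tag once per word
            tags.append(tag)
    return [(word_id, word, tags)
            for word_id, (word, tags) in enumerate(tags_by_word.items(), start=1)]


def format_lexicon(lexicon):
    """Render one tab-separated line per word: ID, word, space-separated tags."""
    return [f"{word_id}\t{word}\t{' '.join(tags)}" for word_id, word, tags in lexicon]


# Hypothetical sample input, for illustration only.
sample = [("book", "NN"), ("book", "VB"), ("the", "DT"), ("book", "NN")]
lines = format_lexicon(build_lexicon(sample))
for line in lines:
    print(line)
```

Storing the tags as a list per word keeps the order in which they were first seen, which makes the output reproducible across runs.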

Homework #2 : Regular Expressions

1. Find a regular expression for the set {a^n b^m : (m+n) is even}
2. Find a regular expression for L = {a b^n w : n≥3, w belongs to [a,b]+}
3. Find the complement of the language in problem 1.
4. Find a regular expression for L = {vwv : w and v belong to [a,b]*, |v|=2}
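
A useful habit for problems like these is to check a candidate expression against the set definition by brute force. The sketch below does this for problem 1, enumerating every string over {a, b} up to a small length. The candidate pattern is one possible answer, offered as an assumption to be tested rather than as the official solution, and the helper names are hypothetical.

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def in_problem1(s):
    """Membership test taken directly from the set definition: a^n b^m, m+n even."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and (len(m.group(1)) + len(m.group(2))) % 2 == 0


# Candidate: n and m both even, or n and m both odd.
CANDIDATE = re.compile(r"(aa)*(bb)*|a(aa)*b(bb)*")


def disagreements(pattern, predicate, max_len=8):
    """Return the strings on which the regex and the definition disagree."""
    return [s for s in all_strings("ab", max_len)
            if (pattern.fullmatch(s) is not None) != predicate(s)]


mismatches = disagreements(CANDIDATE, in_problem1)
print("disagreements:", mismatches)
```

An empty list is evidence, not a proof, that the candidate is right; the same harness can be reused for the other problems by swapping in a different predicate and pattern.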

Project #1 : Chinese Input System

Write a Chinese input system. The provided resources include:
   1. A lexicon with 72K words.
   2. A text corpus.
   3. A Toneless Syllable List.

You must implement the following:
   1. A program for word tokenization (segmentation)
   2. A program for word unigram and bigram language model training
   3. A search algorithm for the syllable-string to word-string conversion
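
The three parts can be sketched end to end as shown below. This is a toy illustration under several assumptions that the assignment does not state: segmentation uses forward maximum matching, the bigram model uses add-one smoothing, the search is a Viterbi-style dynamic program, and the lexicon is a dictionary mapping each word to its toneless syllables. The tiny lexicon and corpus are made up for the example; the real resources are much larger and their file formats are not specified here.

```python
import math
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in lexicon:  # single characters always pass
                words.append(text[i:i + n])
                i += n
                break
    return words


def train(sentences):
    """Count word unigrams and bigrams, with <s> as the sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        prev = "<s>"
        unigrams[prev] += 1
        for w in words:
            unigrams[w] += 1
            bigrams[prev, w] += 1
            prev = w
    return unigrams, bigrams


def log_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed bigram log-probability log P(word | prev)."""
    return math.log((bigrams[prev, word] + 1) / (unigrams[prev] + len(unigrams)))


def decode(syllables, lexicon, unigrams, bigrams, max_len=4):
    """Viterbi search: best[i][w] is the best path covering syllables[:i] ending in w."""
    by_sound = {}
    for word, sound in lexicon.items():
        by_sound.setdefault(sound, []).append(word)
    best = [dict() for _ in range(len(syllables) + 1)]
    best[0]["<s>"] = (0.0, None)  # (score, backpointer)
    for i in range(len(syllables)):
        for prev, (score, _) in best[i].items():
            for n in range(1, min(max_len, len(syllables) - i) + 1):
                for word in by_sound.get(tuple(syllables[i:i + n]), []):
                    cand = score + log_prob(prev, word, unigrams, bigrams)
                    if word not in best[i + n] or cand > best[i + n][word][0]:
                        best[i + n][word] = (cand, (i, prev))
    if not best[-1]:
        return []  # no word sequence covers the whole syllable string
    pos = len(syllables)
    word = max(best[pos], key=lambda w: best[pos][w][0])
    path = []
    while pos > 0:  # follow backpointers to the start
        path.append(word)
        pos, word = best[pos][word][1]
    return path[::-1]


# Hypothetical toy data: word -> toneless syllables.
LEXICON = {"中国": ("zhong", "guo"), "中": ("zhong",), "忠": ("zhong",),
           "国": ("guo",), "人": ("ren",)}
CORPUS = ["中国人", "中国"]

unigrams, bigrams = train([segment(s, LEXICON) for s in CORPUS])
result = decode(["zhong", "guo", "ren"], LEXICON, unigrams, bigrams)
print(result)
```

In this toy run the bigram counts favor the two-syllable word over the single-character alternatives, which is the effect the language model is meant to have on ambiguous syllable strings.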