Huge amounts of valuable data are stored outside of structured databases as human language: text and speech. This course covers modern techniques to extract useful information from this language data.
Week | Date | Topic | Reading | Materials | Assignments |
---|---|---|---|---|---|
1 | Sep 24 | introduction; regular expressions; finite-state automata | SLP3 2.1 | slides 1, slides 2 | HW0 out |
2 | Oct 1 | finite-state transducers; text normalization (e.g., tokenization, stemming) | SLP2 2.2–2.3, IR 2.1–2.2 | slides 1, slides 2 | HW0 due (Tue), HW1 out |
3 | Oct 8 | n-gram models; frequency analysis; cooccurrence analysis; edit distance; spelling correction; noisy channel models | SLP3 4, SLP3 6 | slides 1, slides 2 | |
4 | Oct 15 | document classification; naive Bayes; logistic regression; sentiment analysis | SLP3 7 | slides 1, slides 2 | HW1 due, HW2 out |
5 | Oct 22 | indexing and retrieval; Lucene | IR 1, 6 | slides 1, slides 2, slides 3 | |
6 | Oct 29 | similarity and clustering; latent semantic analysis; latent Dirichlet allocation; distributed word representations | SLP3 19, Blei (2012) | slides 1, slides 2 | |
7 | Nov 5 | class cancelled | | | HW2 due, HW3 out |
8 | Nov 12 | part-of-speech tagging; hidden Markov models; Viterbi algorithm; maximum entropy models | SLP3 8, SLP3 9 | slides 1, slides 2 | |
9 | Nov 19 | named entity recognition; relation extraction; advanced maximum entropy models; coreference; formal grammars | SLP3 20 | slides 1, slides 2, slides 3, slides 4 | HW3 due, HW4 out |
10 | Nov 26 | Thanksgiving holiday: no class! | | | |
11 | Dec 3 | syntactic parsing; wrap-up; speech recognition for automatic transcription | | slides 1, slides 2, slides 3, slides 4 | |
12 | Dec 10 | [finals week] | | | HW4 due 4pm |