Computational linguistics

Klinton Bicknell /// Spring 2016

Computational linguistics enables computers to work with human language: recognizing speech, correcting spelling, translating between languages, and more. This course introduces students to the field using a modern statistical approach.


Schedule

| Week | Date | Topic | Reading | Materials | Assignments |
|---|---|---|---|---|---|
| 1.1 | Mar 29 | No 1.1 for Tu/Th classes this quarter. (10.1 instead.) | | | |
| 1.2 | Mar 31 | What is computational linguistics, unix/linux | JM 1, unix/linux tutorial (through tutorial 5) | slides, transcript | Connect to the SSCC [Instructions] |
| 2.1 | Apr 5 | Programming in python 1 | NLTK 1 | python transcript | |
| 2.2 | Apr 7 | Finite-state automata, regular expressions | JM 2 | nltk setup, nano tutorial, transcript, optional: emacs tutorial | hw1 out |
| 3.1 | Apr 12 | Programming in python 2 | NLTK 2–3 | | |
| 3.2 | Apr 14 | Programming in python 3 | NLTK 4 | lecture notes, transcript, python code | |
| 4.1 | Apr 19 | Probability theory, maximum likelihood estimation (MLE), unigram models | JM 4.1–4.2 | | hw1 due |
| 4.2 | Apr 21 | No class. Klinton traveling. | | | |
| 5.1 | Apr 26 | Graphical models, n-gram models, Markov chains | JM 4.3–4.4, Levy appendix | | hw2 out |
| 5.2 | Apr 28 | Smoothing, perplexity, training and test sets, basic information theory | JM 4.5–4.7, 4.9.1, 4.10 | | |
| 6.1 | May 3 | Bayesian inference, Hidden Markov models (HMMs), part-of-speech tagging | JM 5.1–5.3, 5.5–5.5.2, 5.7 | | hw2 due |
| 6.2 | May 5 | Forward algorithm, Viterbi decoding | JM 5.5.3, 6.1–6.4 | | |
| 7.1 | May 10 | Class cancelled | | | project proposals due; hw3 out |
| 7.2 | May 12 | Programming: best practices | Wilson et al. (2014) | scripts archive | |
| 8.1 | May 17 | Supervised and unsupervised learning, noisy channel models for spelling correction/autocorrect | JM 5.5.4, JM 5.9 | | |
| 8.2 | May 19 | Context-free grammars (CFGs) for syntax, classes of grammars, regular expressions on trees, basic parsing | JM 12.1–12.6, 13.1–13.4.2 | | |
| 9.1 | May 24 | Probabilistic CFGs (PCFGs), statistical parsing | JM 14.1–14.4 | | hw3 due; hw4 out |
| 9.2 | May 26 | Automatic speech recognition (ASR), Machine translation (MT) | JM 9.1–9.2, 9.5–9.6; 25.1–25.3 | | |
| 10.1 | May 31 | Computational psycholinguistics | Bicknell & Levy (2010) | | hw4 due |
| | Jun 6 | | | | Final project reports due 5pm |

Logistics

Course

Time
Tuesdays & Thursdays 12:30–1:50
Location
University Library B182
Textbooks
Website
kbicknell.github.io/ling334spring2016/

Instructor

Name
Klinton Bicknell
Office hours
Tuesdays 2–3 (i.e., immediately following class) and by appointment
Office
Linguistics [2016 Sheridan Road] Office 107

Policies

Email
Questions that are not personal should be posted on the Piazza forum (where they can be posted anonymously if desired). To contact the instructor directly, students are encouraged to come to office hours. For questions that are personal, students can email the instructor at kbicknell@northwestern.edu.
Description
Hands-on introduction to computational linguistics, viewed from a modern probabilistic perspective. The class begins with an introduction to programming and probability theory, goes through language modeling, hidden Markov models, and syntactic parsing, and ends with state-of-the-art methods in machine translation and automatic speech recognition. Students will also learn practical skills for extracting information from large linguistic datasets using natural language processing techniques, as well as good programming practices.
Academic integrity
Violations of academic integrity will be referred to the Dean’s office, per WCAS policies. Sanctions can be quite severe, including suspension or permanent expulsion from the university. For details and discussion of how to avoid plagiarism, see the Academic Integrity section of the WCAS undergraduate handbook.

Requirements

Course Grade
  • 70% homeworks (4)
  • 30% final project
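As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes the four homeworks count equally and that every score is on a 0–100 scale; neither assumption is stated in the syllabus, and the function name `course_grade` is hypothetical.

```python
def course_grade(homework_scores, project_score):
    """Combine scores (each assumed to be on a 0-100 scale) into a course percentage.

    Assumption: the four homeworks are weighted equally within the 70%.
    """
    if len(homework_scores) != 4:
        raise ValueError("expected four homework scores")
    homework_average = sum(homework_scores) / len(homework_scores)
    # 70% homeworks, 30% final project, per the weights listed above.
    return 0.7 * homework_average + 0.3 * project_score


print(course_grade([80, 90, 70, 100], 90))
```

With these example scores the homework average is 85, so the result is approximately 86.5.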
Homeworks
There will be four homework assignments throughout the quarter. These assignments will involve a combination of programming exercises and short answer responses. Working together in pairs or small groups when discussing the assignments is encouraged, but each student must code and write up their own homework independently. In addition, students must list on each assignment all students they discussed the assignment with. Homework must be handed in through Canvas.
Final project
Students will complete a final project on a topic related to the course content. This project will either investigate a language research question using computational techniques or will implement a computational linguistic model beyond those covered in class. These projects should be completed in pairs or individually. Students will write short project proposals by the end of week 6, and then a final paper on the project will be due on the first day of finals week.
Keeping up
The syllabus (topics, assignments, due dates) may change. These changes will be announced in class, over email (via Piazza), and on the course website. It is students' responsibility to keep up with them.
Deadlines
All assignments are due at 5pm. For late work turned in between this deadline and 11:59pm the following day (i.e., the first 31 hours after the deadline), I will deduct one percentage point per hour (or partial hour). After the following day, I will give comments and suggestions on work turned in, but you will not receive credit for the assignment. (Of course, if some unusual external circumstance arises that will cause you to have trouble meeting a deadline, please contact the instructor as soon as possible.)
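The late-work rule can be sketched in a few lines of Python. This is only an illustration: the function name `late_penalty_points` is hypothetical, timestamps are assumed to be naive local times, and anything submitted before midnight at the end of the following day is treated as inside the late window.

```python
from datetime import datetime, time, timedelta


def late_penalty_points(deadline, submitted):
    """Return the percentage points deducted, or None if no credit is given."""
    if submitted <= deadline:
        return 0  # on time
    # Assumption: the late window closes at midnight ending the following day.
    cutoff = datetime.combine(deadline.date() + timedelta(days=2), time.min)
    if submitted >= cutoff:
        return None  # comments only, no credit
    # One point per hour, with any partial hour counted as a full hour.
    full_hours, remainder = divmod(submitted - deadline, timedelta(hours=1))
    return full_hours + (1 if remainder else 0)


# hw1 is due Apr 19 at 5pm according to the schedule.
hw1_deadline = datetime(2016, 4, 19, 17, 0)
print(late_penalty_points(hw1_deadline, datetime(2016, 4, 19, 18, 1)))
```

Under these assumptions, a submission at 6:01pm on the due date is one hour and one minute late, so it loses 2 points, and a submission at 11:59pm the following day loses 31.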
AccessibleNU
Any student requesting accommodations related to a disability or other condition is required to register with AccessibleNU (accessiblenu@northwestern.edu; 847-467-5530) and provide professors with an accommodation notification from AccessibleNU, preferably within the first two weeks of class. All information will remain confidential.