Text Mining 2019-2020

Teacher: Suzan Verberne
Teaching assistants: Hugo de Vos, Mohamed Barbouch, Jeroen Rook





Course schedule

Each course week consists of a lecture, literature to read, and either a practical exercise (tutorial style) or a hand-in assignment. Lectures take place on Wednesdays, 9.15-11.00, in Snellius 407-409.

We use two textbooks in this course, abbreviated as J&M and Z&M in the schedule.


| Week | Lecture | Literature | Exercise / assignment |
|---|---|---|---|
| 1 (4 Sept) | Introduction | Z&M chapter 1. Introduction | |
| 2 (11 Sept) | Text processing | J&M chapter 2. Regular Expressions, Text Normalization, Edit Distance | Exercise: Pre-processing tutorial |
| 3 (18 Sept) | Vector Semantics | J&M chapter 6. Vector Semantics | Exercise: Word embeddings tutorial |
| 4 (25 Sept) | Text categorization | J&M chapter 4.1-4.3. Naive Bayes Classification; Z&M chapter 15. Text categorization | Exercise: Text classification tutorial (sklearn) |
| 5 (2 Oct) | Data collection and annotation | Finin (2010). Annotating Named Entities in Twitter Data with Crowdsourcing; McHugh (2012). Interrater reliability: the kappa statistic | Assignment 1. Text classification |
| 6 (9 Oct) | Neural NLP and transfer learning (slides) | J&M chapter 7. Neural Nets and Neural Language Models | Exercise: BERT Fine-Tuning with PyTorch |
| (16 Oct) | No lecture | | |
| 7 (23 Oct) | Information Extraction | J&M chapter 17. Information Extraction | Exercise: Sequence labelling tutorial (crfsuite) |
| 8 (30 Oct) | Text summarization | Z&M chapter 16. Summarization; Kryściński et al (2019). Neural Text Summarization: A Critical Evaluation | Assignment 2. Information Extraction |
| 9 (6 Nov) | Sentiment analysis | Z&M chapter 18. Opinion Mining and Sentiment Analysis | |
| 10 (13 Nov) | Biomedical text mining | Fleuren & Alkema (2015). Application of text mining in the biomedical domain | Exercise: Sentiment analysis tutorial |
| 11 (20 Nov) | Authorship attribution | Literature for final assignment (3 topics to choose from) | Assignment 3. Sentiment Analysis |
| 12 (27 Nov) | Industrial Text Mining: guest lecture by TextKernel | Dahlmeier (2017). On the Challenges of Translating NLP Research into Commercial Products | |
| 13 (4 Dec) | Conclusions | | Final assignment |


The assessment of the course consists of a written exam (50% of the course grade) and practical assignments (50% of the course grade). The practical assignments comprise three smaller assignments (10% each) and one more substantial final assignment (20%). To complete the course, both the grade for the written exam and the average grade for the practical assignments must be 5.5 or higher. If one of the tasks is not submitted, the grade for that task is 0.
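As an illustration of how these weights combine, here is a minimal Python sketch. The function name `course_result` and its arguments are hypothetical. It also assumes that the "average grade for the practical assignments" is the weighted average (10/10/10/20); the text above does not say whether the average is weighted, so treat that part as an assumption rather than the official rule.

```python
# Weights and pass mark are taken from the assessment text above.
EXAM_WEIGHT = 0.50
SMALL_WEIGHT = 0.10   # each of the three smaller assignments
FINAL_WEIGHT = 0.20   # the final assignment
PASS_MARK = 5.5


def course_result(exam, small_assignments, final_assignment):
    """Return (course_grade, practical_average, passed).

    A grade of None means the task was not submitted and counts as 0.
    """
    if len(small_assignments) != 3:
        raise ValueError("expected exactly three smaller assignments")

    small = [0.0 if g is None else float(g) for g in small_assignments]
    final = 0.0 if final_assignment is None else float(final_assignment)

    # Contribution of the practical work to the course grade (50% in total).
    practical_points = SMALL_WEIGHT * sum(small) + FINAL_WEIGHT * final

    # ASSUMPTION: the practical average is weighted by the same percentages.
    practical_average = practical_points / (3 * SMALL_WEIGHT + FINAL_WEIGHT)

    course_grade = EXAM_WEIGHT * exam + practical_points

    # Both the exam and the practical average must reach the pass mark.
    passed = exam >= PASS_MARK and practical_average >= PASS_MARK
    return course_grade, practical_average, passed


if __name__ == "__main__":
    # Example: one smaller assignment was not submitted (None counts as 0).
    grade, practical, ok = course_result(7.0, [8.0, 6.0, None], 7.5)
    print(round(grade, 2), round(practical, 2), ok)
```

Under this reading, a missing smaller assignment lowers the practical average but does not by itself cause a fail, as long as both thresholds are still met.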


Earlier editions of this course

Link to the course page for this course in 2018-2019