Text Mining 2018-2019

Teachers: Suzan Verberne and Alex Brandsen (TA)



Course schedule

Each course week consists of a lecture, literature to read, and either a practical exercise (tutorial style) or a hand-in assignment.

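We use two textbooks in this course:

Z&M: ChengXiang Zhai and Sean Massung (2016). Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining.
J&M: Daniel Jurafsky and James H. Martin. Speech and Language Processing (3rd edition draft).
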
Week | Lecture | Literature | Exercise / assignment
-----|---------|------------|----------------------
1 | Introduction | Z&M chapter 1. Introduction | -
2 | Text preprocessing | J&M chapter 2. Regular Expressions, Text Normalization, Edit Distance | Exercise: pre-processing noisy OCR'ed data
3 | Data collection, annotation and evaluation (slides) | Finin et al. (2010). Annotating Named Entities in Twitter Data with Crowdsourcing | Assignment 1. Pre-processing
4 | Text categorization | J&M chapter 4.1-4.3. Naive Bayes Classification; Z&M chapter 15. Text categorization | Exercise: Text classification tutorial (sklearn)
5 | Information retrieval | Z&M chapter 5. Overview of text data access | Assignment 2. Text classification
6 | Information extraction | J&M chapter 17. Information Extraction | Exercise: Sequence labelling tutorial (crfsuite)
7 | Summarization | Z&M chapter 16. Summarization | Assignment 3. Sequence labelling
8 | Vector semantics | J&M chapter 6. Vector Semantics | Exercise: Word embeddings tutorial
9 | Sentiment analysis | Z&M chapter 18. Opinion Mining and Sentiment Analysis | Exercise: Sentiment analysis tutorial
10 | Biomedical text mining | Fleuren & Alkema (2015). Application of text mining in the biomedical domain | Assignment 4. Sentiment analysis
11 | Authorship attribution | Literature for the final assignment (3 topics to choose from) | -
12 | Industrial text mining | Dahlmeier (2017). On the Challenges of Translating NLP Research into Commercial Products | Final assignment


The course grade consists of a written exam (60% of the course grade) and practical assignments (40% of the course grade). The practical assignments comprise four small tasks (5% each) and one more substantial report (20%). To complete the course, the written exam grade must be 5.5 or higher, and the average grade for the practical assignments must also be 5.5 or higher. A task that is not submitted is graded 0.
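The weighting and pass conditions above can be sketched as a small Python function. This is a minimal illustration, not the official grading procedure. The function name `course_grade` is hypothetical, grades are assumed to be on the 1 to 10 scale, and the "average grade for the practical assignments" is assumed to be weighted by the stated percentages. Under that assumption, the report counts as much as the four tasks together. The syllabus does not say how grades are rounded, so the sketch does no rounding.

```python
def course_grade(exam, tasks, report):
    """Hypothetical helper: combine grades on an assumed 1-10 scale.

    Assumes the practical average is weighted by the stated shares
    (four tasks at 5% each, report at 20%). A task that was not
    submitted should be passed in as 0.
    """
    if len(tasks) != 4:
        raise ValueError("expected four task grades (use 0 for a missing task)")
    # Practical points: 5% per task plus 20% for the report (40% in total).
    practical_points = 0.05 * sum(tasks) + 0.20 * report
    # Final grade: 60% exam plus the 40% practical share.
    final = 0.60 * exam + practical_points
    # Weighted practical average: practical points divided by their 40% share.
    practical_avg = practical_points / 0.40
    # Both the exam and the practical average must reach 5.5.
    passed = exam >= 5.5 and practical_avg >= 5.5
    return final, practical_avg, passed
```

For example, an exam grade of 7.0, task grades of 8, 6, 0 (not submitted) and 7, and a report grade of 6.5 give a weighted practical average of 5.875 and a final grade of 6.55 under this assumed weighting, so the student passes. With an exam grade of 5.0, the student fails regardless of the practical grades.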