Text Mining 2025-2026
Teaching assistants: Yumeng Wang, Jujia Zhao, Yinzhi Xie, Jiayi Tian, Ruzanna Baghdasaryan
Contact address: tmcourse@liacs.leidenuniv.nl
Course schedule
The course weeks consist of: a lecture, literature to read, and either a practical exercise (tutorial style) or a hand-in assignment.
The lectures are on Wednesdays (times and locations vary throughout the semester; please check your timetable).
The literature will be distributed via Brightspace. Most of the chapters come from the following book, abbreviated as J&M in the course schedule below:
- J&M: Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed), 2025
Week | Lecture | Literature | Exercise / assignment |
---|---|---|---|
1 (3 Sept) | Introduction | ||
2 (10 Sept) | Text processing | J&M chapter 2. Regular Expressions, Text Normalization, Edit Distance | Exercise: Chapter 1 of "Advanced NLP with spaCy" |
3 (17 Sept) | Vector Semantics | J&M chapter 6. Vector Semantics and Embeddings | Exercise: Chapter 2 of "Advanced NLP with spaCy", sections 8-15 |
4 (24 Sept) | Text categorization | J&M chapter 4. Naive Bayes, Text Classification, and Sentiment | Exercise: Section 6.2.3 "Text feature extraction" and "Classification of text documents" from the scikit-learn user guide |
5 (1 Oct) | Data collection and annotation | Lim et al. (2020) Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing | Assignment 1. Text classification (deadline 7 October) |
6 (8 Oct) | Transformer models & transfer learning | J&M chapter 9. Transformers; J&M chapter 11. Masked Language Models | Exercise: Chapters 2 and 3 of the Hugging Face NLP course |
(15 Oct) | No lecture | ||
7 (22 Oct) | Generative large language models | J&M chapter 10. Large Language Models | Exercise: "Generation with LLMs" tutorial from Hugging Face |
8 (29 Oct) | Information Extraction | J&M chapter 17. Sequence Labeling for Parts of Speech and Named Entities | Exercise: Token classification tutorial in the Hugging Face NLP course |
9 (5 Nov) | Topic Modelling & Text summarization | J&M chapter 12. Model Alignment, Prompting, and In-Context Learning | Exercise: Summarization tutorial in the Hugging Face NLP course |
10 (12 Nov) | Sentiment analysis | Scaria et al. (2024) InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis. | Assignment 2. Information Extraction (deadline 18 November) |
11 (19 Nov) | Text Mining in practice (guest lecture) | Paper reading for the final assignment | |
12 (26 Nov) | Conclusions and exam preparation | ||
13 (3 Dec) | Online lab session | | Final assignment (deadline 5 January) |
(17 Dec) | Written exam | | |
(28 Jan) | Re-sit | | |
The assessment of the course consists of a written exam (50% of course grade) and practical assignments (50% of course grade). The practical assignments comprise two smaller assignments (10% each) and one large final assignment (30%).
To complete the course, the grade for the written exam must be 5.5 or higher, and the weighted average grade for the practical assignments must also be 5.5 or higher. If a task is not submitted, the grade for that task is 0. Each assignment has a re-sit opportunity (a later submission); the maximum grade for a re-sit assignment is 6.
Group work is an integral part of the course. You are expected to complete the assignments together with a teammate.
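The grading rules above amount to a small calculation. The Python sketch below is illustrative only and is not an official grade calculator: the function names are hypothetical, it applies no rounding (the page does not state a rounding rule), and it assumes a missing submission is entered as 0.

```python
def assignment_grade(grade: float, is_resit: bool = False) -> float:
    """Grade counted for one assignment; a re-sit submission is capped at 6."""
    return min(grade, 6.0) if is_resit else grade


def course_result(exam: float, a1: float, a2: float, final: float) -> tuple[float, bool]:
    """Return (course grade, passed) using the weights stated on this page.

    Assumptions (not stated on the page): no rounding is applied, and a
    task that was not submitted is passed in as 0.
    """
    # Practical part: the assignments weigh 10%, 10% and 30% of the course
    # grade, i.e. 10/50, 10/50 and 30/50 of the practical part.
    practical = (10 * a1 + 10 * a2 + 30 * final) / 50
    # Exam and practical part each count for 50% of the course grade.
    grade = 0.5 * exam + 0.5 * practical
    # Both components must individually reach 5.5.
    passed = exam >= 5.5 and practical >= 5.5
    return grade, passed


if __name__ == "__main__":
    print(course_result(7.0, 8.0, 6.0, 7.0))  # (7.0, True)
    print(course_result(5.0, 9.0, 9.0, 9.0))  # (7.0, False): exam below 5.5
```

The second call shows why the separate thresholds matter: a weighted course grade of 7.0 still does not complete the course when the exam grade is below 5.5.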
Earlier editions of this course
Link to the course page for this course in 2024-2025
Link to the course page for this course in 2023-2024
Link to the course page for this course in 2022-2023
Link to the course page for this course in 2021-2022
Link to the course page for this course in 2020-2021
Link to the course page for this course in 2019-2020
Link to the course page for this course in 2018-2019