Text Mining and Retrieval Leiden


Text Mining and Retrieval Leiden (TMRL) is the research group of Suzan Verberne at Leiden University. The group is one of the Special Interest Groups (SIGs) of the Data Science research programme.

The group's research focuses on text mining and retrieval problems in complex domains. Current projects implement and evaluate methods in the legal, archaeological, policy-making, and health domains. In addition, the group includes two projects with a more linguistic orientation, on machine translation and comparative syntax. We work with a wide variety of textual data: grey literature reports, scientific and legal publications, EU law texts, health records, user-generated content in online patient communities (discussion forums), and news posts on social media.

We highly value the involvement of domain experts in our research. They deliver the data, define the problem, contribute to the evaluation of our methods, and add interpretation to our results. In our opinion, Data Science can only be of societal relevance if target group users are actively involved in research projects.

The social values of our group are: diversity (in background, working style, and personal interests); teamwork (a PhD project can feel like a lonely job, but not if you have peers who support you); and work-life balance (yes, we take time off during weekends and vacations).

We have published a number of Dutch-language data sets, together with pre-trained BERT and ULMFiT language models, on textdata.nl.

You can follow us on Twitter and GitHub.