Research – Research Network for Text Analysis

Our research has two major goals:

Increase fine-tuning efficiency of deep language models

Language models such as BERT have led to unprecedented performance in many NLP tasks, including sentiment classification, named entity recognition, coreference resolution, and question answering. However, they often require large amounts of training data for fine-tuning to reach this level of performance. Our research investigates how to increase the fine-tuning efficiency of language models and other machine learning techniques.

Facilitate use of state-of-the-art ML and NLP methods in other disciplines

Although more advanced techniques for automated text analysis are available, much work in various scientific disciplines still relies on more “traditional” approaches. We aim to bridge this gap and facilitate the use of state-of-the-art ML and NLP techniques in disciplines where these techniques could be useful, such as the social sciences. Our prior work has shown that strong improvements can be achieved by interdisciplinary research on specific topics, such as media bias identification.

To achieve these two goals, we currently pursue three main projects.

Fine-tuning efficiency

The Textalysis project is a joint project with researchers from the Machine Learning Group at HU Berlin, the Data & Knowledge Engineering Group at the University of Wuppertal, and the Data and Information Mining Group at the University of Konstanz, as well as various other colleagues and partners. The project’s goal is to devise methods that enable language models to learn from few (<10) training examples while still achieving sufficiently high prediction performance. To accomplish this, we primarily research methods in active learning, transfer learning, multi-task learning, and few-shot learning.

WIN project

The WIN project is a joint project with colleagues in computer science at the University of Wuppertal and in political science at the University of Zurich. It is concerned with devising NLP methods and using them in social science research. From 2019 until 2021, the project’s focus was on the automated identification of media bias (a summary of Felix Hamborg’s Ph.D. research can be found in this paper or video). Since 2022, the project has been investigating how to extend the automated analysis workflow from this specific use case to (ideally) any use case in the social sciences that rests on content analysis.

Textada

Textada is our group’s spin-off, intended to make the results of our research available through an intuitive user interface. The Textada team develops content analysis software that drastically reduces cost and manual effort in annotation projects. The software uses machine learning techniques to assist during annotation, either by suggesting annotations or by annotating documents fully automatically after the user has provided only a few manual annotations.

We are always thankful for opportunities to collaborate, e.g., to exchange ideas, knowledge, and resources, or – even better – work on a common topic jointly.