Maria Kunilovskaya

linguistics, contrastive and computational


University of Saarland

SFB B7 project

Saarbrücken, Germany

Postdoc Researcher

I am currently a postdoc at the University of Saarland (Germany), working on modelling mediated language to explore the memory-surprisal trade-off hypothesis from information theory. My PhD (completed March 2023, supervisor: Prof. Mitkov, UK) was on human translation quality estimation. Much of my effort went into building learner parallel and comparable corpora.

Before that I held an Associate Professor position at a Translation Studies department, lecturing in Translation Studies, Theoretical Linguistics and Corpus Linguistics. I have a PhD (Candidate of Science) in Contrastive Linguistics (completed 2004, adviser: Prof. Brodovich, Saint Petersburg University).

My research interests have shifted from corpus- and feature-based approaches to machine learning, language modelling and representation learning. In the past few years, I have been involved in several computational humanities projects, focused especially on propaganda and social media analysis.

Keywords:

  • language modelling, information theory
  • Python, machine learning, distributional semantics
  • computational humanities, data collection and analysis
  • translation quality estimation, data annotation
  • language varieties, register studies, text complexity

Download curriculum vitae, publications (2017-2025)

recent news

Jun 11, 2025 – I am happy to have established a new promising collaboration on spoken data/interpreting analysis, which has now yielded a paper accepted to Disfluency in Spontaneous Speech (DISS-2025, Lisbon, 4-5 September 2025), a satellite event of Interspeech 2025.
May 13, 2025 – This semester, in addition to my own seminar on translation quality, I also co-teach P4: Abschlusskolloquium for BA Language Science with Annemarie Verkerk. Today I talked about structuring the Related Work section, reference managers and note-taking tools. This is a truly rewarding experience, I must say. Not only do you get to understand things in more depth and realise that you actually have a lot to share, but the students also seemed excited about these tools. I have been offered the chance to teach this course alone next semester.
May 9, 2025 – I had a throwback to conferences that accept abstracts and issue certificates of attendance. But I am proud to have implemented a new approach to modelling for that talk: I used a GLM with a Negative Binomial family for counts of disfluencies in our data, comparing the explanatory and predictive power of corpus measures of complexity vs surprisal from off-the-shelf and domain-adapted GPT2 and MarianMT models. It feels like an achievement.
Apr 7, 2025 – Back to regular teaching! This semester (SoSe-2025), I volunteered to offer a research seminar Quality in Human and Machine Translation (QH&MT) at the Language Science and Technology Department, University of Saarland. The seminar looks into the properties of MT, especially with regard to how it compares to human translation. It is designed to bring together linguistic expertise on quality and the technological aspects of measuring it. We will (i) look into the theoretical prerequisites of translation quality, (ii) compare approaches applied to humans and machines, and (iii) review best practices in manual as well as automatic quality annotation. The proposed research topics include linguistic studies based on comparative-contrastive analysis, developing TQ test sets, investigating existing metrics and designing new methods, and tweaking MT and MT quality models to capture specific errors or address specified aspects of production. I invite computationally-minded linguists and NLP students who are curious whether today’s technology is real competition for human translators, and what nuances there are to this comparison. We start next Monday, 14 April 2025, at 16.15 (Gebäude C7 2 - Seminarraum -1.05).
Feb 15, 2025 – I have three (sic!) posters as the 1st author at an SFB1102-organised RAILS conference. Overachiever, ahem. Topics: Slavic intercomprehension, translation task difficulty, cognitive load factors in interpreting.
Feb 4, 2025 – (1) Had a throwback to the best part of my past life, when I gave a 90 min lecture as part of BA Vorlesung Perspektiven der Linguistik. Oh my, I miss that! (handout)
(2) On the same day, 15 min after the lecture, I had to take the spoken part of the exam at German B2 level. That went surprisingly well.
Jan 24, 2025 – I am proud to be named an outstanding reviewer by COLING-2025 organizers: see a picture
Dec 9, 2024 – I am going north-east:
(1) A paper produced in collaboration with C4 has been accepted for NoDaLiDa 2025, to be presented in early March in Tallinn (Estonia). Title: Predictability of Microsyntactic Units across Slavic Languages: A Translation-based Study.
(2) Next week (December 17, 2024), I am giving a talk at LTG research seminar (The Faculty of Mathematics and Natural Sciences, University of Oslo). It will summarise B7’s progress in applying information theory to the study of translated language.
Jun 24, 2024 – Koel Dutta Chowdhury, my co-author, presented our work on GPT-4 prompting for the translationese reduction task at EAMT in Sheffield. See (paper, slides).
Jun 7, 2024 – hosted the Multilingual Modelling Workshop (MM-WS), an all-SFB event that attracted researchers interested in modelling multilingual/cross-lingual data (programme). A brief summary is here.
May 16, 2024 – delivered a teaching session+lab for MA Translation Science and Technology students within the Hauptseminar “Empirical Linguistics and Translatology”. The lecture introduced the students to Corpus-based Translation Studies and had a focus on “Human Translation Quality Estimation (HTQE)” (slides). The lab was a walk-through on parallel corpus building, including practical views on manual and automatic annotation as well as the link between corpus structure and the research objectives.
Apr 12, 2024 – gave an invited talk “Application of Information Theory in Translation(ese) Studies” (slides) for participants of the Information Theory Course.
Feb 23, 2024 – talked about linguistic neighbours of Luxembourgish in the looking-glass world of NMT at the 1st Roundtable on NLP for Luxembourg(ish), organised by Institute of Luxembourgish Language and Literature and the Culture & Computation Lab at the University of Luxembourg (slides).
Feb 6, 2024 – together with Marie Escribe delivered a 2-day training on conference setup and management via START for over 20 people
Dec 15, 2023 – submitted a short paper to NAACL-2024: “Prompting Large Language Models to Mitigate Translationese”
Dec 1, 2023 – delivered an invited talk “Can Translations Be Less Translated? Leveraging GPT Prompts to Mitigate Translationese” within the Conversations Series event of the Culture & Computation Lab at the University of Luxembourg
Sep 4, 2023 – presented two papers at RANLP and discussed a piece of research that did not feel like a paper. I was also heavily involved in the RANLP OC.
Jun 30, 2023 – A paper by the SFB B7 team, “Simultaneous Interpreting as a Noisy Channel: How Much Information Gets Through”, has been accepted as a long paper to RANLP 2023. The paper is among the top-scoring 22% of submissions, based on the scores from three double-blind peer reviews.
May 17, 2023 – joined the Journal of Natural Language Engineering as an Editorial Board Member.
May 5, 2023 – released WarMM-2023 and presented the results of Russian media-at-wartime monitoring project at EACL workshop
Mar 13, 2023 – passed Viva Voce examination and in the subsequent month submitted the final version of the thesis. It is available here.
Dec 5, 2022 – started a postdoc position at Saarland University
Jul 1, 2022 – started collecting data for a computational sociology/politology project that aims to compare publications in Russian mass media and social networks to capture the interplay between propaganda and vox populi
May 14, 2022 – from 22 to 27 May I am attending ACL remotely due to a sad misunderstanding about the Irish visa.
– delivered a 3-day workshop on practical skills supporting research to EMTTI students in Malaga and attended a very special, entertaining and well-organised International Workshop on Interpreting Technologies.
Apr 16, 2022 – two of my Master's students have been accepted to New Trends in Translation and Technology (NeTTT). Looking forward to this grand rehearsal of vivas.
  • Kateryna Poltorak: Computational Approaches to Register as a Factor in English-to-Spanish Translation
  • Rene Garcia Taboada: Neutralising Latin American Spanish Dialects for Localisation Purposes: A Collocational Focus
Apr 5, 2022 – summarised my research in human translation quality estimation in my annual 3-hour session for EMTTI and computational linguistics students. See slides.
Mar 25, 2022 – finished teaching a short training course on LaTeX, referencing and Git/GitHub for EMTTI students (see Digital Skills for Research).
Dec 15, 2021 – talked at AIST research conference: see a FB post about it.
Jul 19, 2021 – completed PGCert “Academic Practice in Higher Education”, including modules on inclusivity, future of higher education and educational theories.
Jan 1, 2020 – started working on a Digital Humanities project that attempts to pick up global and national cultural trends based on the analysis of cultural event announcements.

selected publications

  1. Interspeech-2025
    Euh...where do interpreters hesitate? An information-theoretic perspective on sentence-initial filler particles in simultaneous interpreting
    Pollkläsener, Christina, and Kunilovskaya, Maria
    In Disfluency in Spontaneous Speech (DISS 2025) Sep 2025
  2. NoDaLiDa-2025
    Predictability of Microsyntactic Units across Slavic Languages: A translation-based Study
    Kunilovskaya, Maria, Zaitova, Iuliia, Xue, Wei, Stenger, Irina, and Avgustinova, Tania
    In The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies Mar 2025
  3. UM Press
    Confuse and Normalise: Authoritarian Propaganda in a High-Choice Media Environment during Russia’s Invasion of Ukraine
    Alyukov, Maxim, Kunilovskaya, Maria, and Semenov, Andrei
    In Russian Propaganda Today: Challenges, Effectiveness and Resistance 2025
  4. EAMT-2024
    Mitigating Translationese with GPT-4: Strategies and Performance
    Kunilovskaya, Maria, Chowdhury, Koel Dutta, Przybyl, Heike, España i Bonet, Cristina, and Van Genabith, Josef
    In Proceedings of the 25th Annual conference of the European Association for Machine Translation Jun 2024
  5. TSAR-2023
    Cross-lingual Mediation: Readability Effects
    Kunilovskaya, Maria, Mitkov, Ruslan, and Wandl-Vogt, Eveline
    In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2023) Sep 2023
  6. EACL
    Wartime Media Monitor (WarMM-2022): A Study of Information Manipulation on Russian Social Media during the Russia-Ukraine War
    Alyukov, Maxim, Kunilovskaya, Maria, and Semenov, Andrei
    2023
  7. Target
    Source language difficulties in learner translation: Evidence from an error-annotated corpus
    Kunilovskaya, Maria, Ilyushchenya, Tatyana, Morgoun, Natalia, and Mitkov, Ruslan
    Target 2023
  8. LREC
    Lexicogrammatic Translationese across Two Targets and Competence Levels
    Kunilovskaya, Maria, and Lapshinova-Koltunski, Ekaterina
    In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) 2020