Maria Kunilovskaya

University of Saarland

Saarbrücken, Germany

Postdoc Researcher

I am currently a postdoc at the University of Saarland (Germany), working on modelling mediated language to explore the memory-surprisal trade-off hypothesis from information theory. My PhD (completed March 2023, supervisor: Prof. Mitkov, UK) was on human translation quality estimation. Much of my effort went into building learner parallel and comparable corpora.

Before that I held an Associate Professor position at a Translation Studies department, lecturing in Translation Studies, Theoretical Linguistics and Corpus Linguistics. I have a PhD (Candidate of Science) in Contrastive Linguistics (completed 2004, adviser: Prof. Brodovich, Saint Petersburg University).

My research interests have shifted from corpus- and feature-based approaches to machine learning, language modelling and representation learning. In the past few years, I have been involved in several computational humanities projects, focusing especially on propaganda and social media analysis.

Keywords:

language modelling, information theory
Python, machine learning, distributional semantics
computational humanities, data collection and analysis
translation quality estimation, data annotation
language varieties, register studies, text complexity

Download curriculum vitae, publications (2017-2025)

recent news

Jun 11, 2025	– I am happy to have established a promising new collaboration on spoken data/interpreting analysis, which has now yielded a paper accepted to Disfluency in Spontaneous Speech (DISS-2025, Lisbon, 4-5 September 2025), a satellite event of Interspeech 2025.
May 13, 2025	– This semester, in addition to my own seminar on translation quality, I am co-teaching P4: Abschlusskolloquium for BA Language Science with Annemarie Verkerk. Today I talked about structuring the Related Work section, reference managers and note-taking tools. This is a truly rewarding experience, I must say. Not only do you get to understand things in more depth and realise that you actually have a lot to share, but the students also seemed excited about these tools. I have been offered the chance to teach this course alone next semester.
May 9, 2025	– I had a throwback to conferences that accept abstracts and issue certificates of attendance. But I am proud to have implemented a new modelling approach for that talk: I used a GLM with a Negative Binomial family for counts of disfluencies in our data, comparing the explanatory and predictive power of corpus measures of complexity vs surprisal from off-the-shelf and domain-adapted GPT2 and MarianMT models. It feels like an achievement.
Apr 7, 2025	– Back to regular teaching! This semester (SoSe-2025), I volunteered to offer a research seminar “Quality in Human and Machine Translation” (QH&MT) at the Language Science and Technology Department, University of Saarland. The seminar looks into the properties of MT, especially with regard to how it compares to human translation. It is designed to bring together linguistic expertise on quality and the technological aspects and issues of measuring it. We will (i) look into the theoretical prerequisites of translation quality, (ii) compare approaches applied to humans and machines, and (iii) review best practices in manual as well as automatic quality annotation. The proposed research topics include linguistic studies based on comparative-contrastive analysis, developing TQ test sets, investigating existing metrics and designing new methods, and tweaking MT and MT quality models to capture specific errors or address specified aspects of production. I invite computationally-minded linguists and NLP students who are curious whether today’s technology is real competition for human translators, and what nuances there are to this comparison. We start next Monday, 14 April 2025, at 16.15 (Gebäude C7 2 - Seminarraum -1.05).
Feb 15, 2025	– I have three (sic!) posters as the 1st author at an SFB1102-organised RAILS conference. Overachiever, ahem. Topics: Slavic intercomprehension, translation task difficulty, cognitive load factors in interpreting.
Feb 4, 2025	– (1) Had a throwback to the best part of my past life, when I gave a 90 min lecture as part of BA Vorlesung Perspektiven der Linguistik. Oh my, I miss that! (handout) (2) On the same day, 15 min after the lecture, I had to take the spoken part of the exam at German B2 level. That went surprisingly well.
Jan 24, 2025	– I am proud to be named an outstanding reviewer by COLING-2025 organizers: see a picture
Dec 9, 2024	– I am going north-east: (1) A paper produced in collaboration with C4 has been accepted for NoDaLiDa 2025, to be presented in early March in Tallinn (Estonia). Title: Predictability of Microsyntactic Units across Slavic Languages: A Translation-based Study. (2) Next week (December 17, 2024), I am giving a talk at the LTG research seminar (The Faculty of Mathematics and Natural Sciences, University of Oslo). It will summarise B7’s progress in applying information theory to the study of translated language.
Jun 24, 2024	– Koel Dutta Chowdhury, my co-author, presented our work on GPT-4 prompting for the translationese reduction task at EAMT in Sheffield. See (paper, slides).
Jun 7, 2024	– hosted the Multilingual Modelling Workshop (MM-WS), an all-SFB event that attracted researchers interested in modelling multilingual/cross-lingual data (programme). A brief summary is here.
May 16, 2024	– delivered a teaching session and lab for MA Translation Science and Technology students within the Hauptseminar “Empirical Linguistics and Translatology”. The lecture introduced the students to Corpus-based Translation Studies and had a focus on “Human Translation Quality Estimation (HTQE)” (slides). The lab was a walk-through on parallel corpus building, including practical views on manual and automatic annotation as well as the link between corpus structure and the research objectives.
Apr 12, 2024	– gave an invited talk “Application of Information Theory in Translation(ese) Studies” (slides) for participants of the Information Theory Course.
Feb 23, 2024	– talked about linguistic neighbours of Luxembourgish in the looking-glass world of NMT at the 1st Roundtable on NLP for Luxembourg(ish), organised by the Institute of Luxembourgish Language and Literature and the Culture & Computation Lab at the University of Luxembourg (slides).
Feb 6, 2024	– together with Marie Escribe delivered a 2-day training on conference setup and management via START for over 20 people
Dec 15, 2023	– submitted a short paper to NAACL-2024: “Prompting Large Language Models to Mitigate Translationese”
Dec 1, 2023	– delivered an invited talk “Can Translations Be Less Translated? Leveraging GPT Prompts to Mitigate Translationese” within the Conversations Series event of the Culture & Computation Lab at the University of Luxembourg
Sep 4, 2023	– presented two papers at RANLP and discussed a piece of research that did not feel like a paper. I was also heavily involved in the RANLP OC.
Jun 30, 2023	– A paper by the SFB B7 team, “Simultaneous Interpreting as a Noisy Channel: How Much Information Gets Through”, has been accepted as a long paper to RANLP 2023. The paper is among the 22% of top-scoring submissions based on the scores from three double-blind peer reviews.
May 17, 2023	– joined the Journal of Natural Language Engineering as an Editorial Board Member.
May 5, 2023	– released WarMM-2023 and presented the results of the Russian media-at-wartime monitoring project at an EACL workshop
Mar 13, 2023	– passed the viva voce examination and in the subsequent month submitted the final version of the thesis. It is available here.
Dec 5, 2022	– started a postdoc position at Saarland University
Jul 1, 2022	– started collecting data for a computational sociology/politology project that aims to compare publications in Russian mass media and social networks to capture the interplay between propaganda and vox populi
May 14, 2022	– from 22 to 27 May I am attending ACL remotely due to a sad misunderstanding about the Irish visa. – delivered a 3-day workshop on practical skills supporting research to EMTTI students in Malaga and attended a very special, entertaining and well-organised International Workshop on Interpreting Technologies.
Apr 16, 2022	– two of my Master's students have been accepted to New Trends in Translation and Technology (NeTTT). Looking forward to this grand rehearsal of vivas. Kateryna Poltorak: Computational Approaches to Register as a Factor in English-to-Spanish Translation; Rene Garcia Taboada: Neutralising Latin American Spanish Dialects for Localisation Purposes: A Collocational Focus
Apr 5, 2022	– summarised my research in human translation quality estimation in my annual 3-hour session for EMTTI and computational linguistics students. See slides.
Mar 25, 2022	– finished teaching a short training course on LaTeX, referencing and Git/GitHub for EMTTI students (see Digital Skills for Research).
Dec 15, 2021	– talked at AIST research conference: see a FB post about it.
Jul 19, 2021	– completed PGCert “Academic Practice in Higher Education”, including modules on inclusivity, the future of higher education and educational theories.
Jan 1, 2020	– started working on a Digital Humanities project that attempts to pick up global and national cultural trends based on the analysis of cultural event announcements.

selected publications

Interspeech-2025

Euh...where do interpreters hesitate? An information-theoretic perspective on sentence-initial filler particles in simultaneous interpreting

Pollkläsener, Christina, and Kunilovskaya, Maria

In Disfluency in Spontaneous Speech (DISS 2025) Sep 2025

Bib

@inproceedings{pollklaesener2025disfluency,
  abbr = {Interspeech-2025},
  author = {Pollkläsener, Christina and Kunilovskaya, Maria},
  title = {Euh...where do interpreters hesitate? An information-theoretic perspective on
  	sentence-initial filler particles in simultaneous interpreting},
  booktitle = {Disfluency in Spontaneous Speech (DISS 2025)},
  year = {2025},
  location = {Lisbon, Portugal},
  month = sep,
  pages = {in print},
  selected = {true},
  bibtex_show = {true}
}

NoDaLiDa-2025

Predictability of Microsyntactic Units across Slavic Languages: A translation-based Study

Kunilovskaya, Maria, Zaitova, Iuliia, Xue, Wei, Stenger, Irina, and Avgustinova, Tania

In The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies Mar 2025

Bib HTML

@inproceedings{kunilovskaya2025predictability,
  abbr = {NoDaLiDa-2025},
  author = {Kunilovskaya, Maria and Zaitova, Iuliia and Xue, Wei and Stenger, Irina and Avgustinova, Tania},
  title = {{Predictability of Microsyntactic Units across Slavic Languages: A translation-based Study}},
  booktitle = {The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies},
  year = {2025},
  month = mar,
  pages = {313--322},
  selected = {true},
  bibtex_show = {true},
  html = {https://aclanthology.org/2025.nodalida-1.34.pdf}
}

UM Press

Confuse and Normalise: Authoritarian Propaganda in a High-Choice Media Environment during Russia’s Invasion of Ukraine

Alyukov, Maxim, Kunilovskaya, Maria, and Semenov, Andrei

In Russian Propaganda Today: Challenges, Effectiveness and Resistance 2025

Bib

@inproceedings{Alyukov2025,
  abbr = {UM Press},
  author = {Alyukov, Maxim and Kunilovskaya, Maria and Semenov, Andrei},
  title = {{Confuse and Normalise: Authoritarian Propaganda in a High-Choice Media Environment during Russia's Invasion of Ukraine}},
  booktitle = {Russian Propaganda Today: Challenges, Effectiveness and Resistance},
  editor = {Goode, Paul},
  pages = {in print},
  publisher = {University of Michigan Press, University of Manchester Press},
  year = {2025},
  month = {},
  selected = {true},
  bibtex_show = {true}
}

EAMT-2024

Mitigating Translationese with GPT-4: Strategies and Performance

Kunilovskaya, Maria, Chowdhury, Koel Dutta, Przybyl, Heike, España i Bonet, Cristina, and Van Genabith, Josef

In Proceedings of the 25th Annual conference of the European Association for Machine Translation Jun 2024

Bib HTML

@inproceedings{Kunilovskaya2024prompting,
  abbr = {EAMT-2024},
  author = {Kunilovskaya, Maria and Chowdhury, Koel Dutta and Przybyl, Heike and {Espa{\~{n}}a i Bonet}, Cristina and {Van Genabith}, Josef},
  title = {{Mitigating Translationese with GPT-4: Strategies and Performance}},
  booktitle = {Proceedings of the 25th Annual conference of the European Association for Machine Translation},
  month = jun,
  pages = {411--430},
  publisher = {Association for Computational Linguistics},
  address = {Sheffield, UK},
  year = {2024},
  selected = {true},
  bibtex_show = {true},
  html = {https://aclanthology.org/2024.eamt-1.35.pdf}
}

TSAR-2023

Cross-lingual Mediation: Readability Effects

Kunilovskaya, Maria, Mitkov, Ruslan, and Wandl-Vogt, Eveline

In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2023) Sep 2023

Bib HTML

@inproceedings{Kunilovskaya2023readability,
  abbr = {TSAR-2023},
  author = {Kunilovskaya, Maria and Mitkov, Ruslan and Wandl-Vogt, Eveline},
  booktitle = {Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2023)},
  editor = {},
  month = sep,
  pages = {33--43},
  location = {Varna, Bulgaria},
  publisher = {INCOMA Ltd.},
  title = {{Cross-lingual Mediation: Readability Effects}},
  year = {2023},
  selected = {true},
  bibtex_show = {true},
  html = {https://aclanthology.org/2023.tsar-1.4.pdf}
}

EACL

Wartime Media Monitor (WarMM-2022): A Study of Information Manipulation on Russian Social Media during the Russia-Ukraine War

Alyukov, Maxim, Kunilovskaya, Maria, and Semenov, Andrei

2023

Bib HTML

@inproceedings{Alyukov2023warmm,
  abbr = {EACL},
  author = {Alyukov, Maxim and Kunilovskaya, Maria and Semenov, Andrei},
  title = {{Wartime Media Monitor (WarMM-2022): A Study of Information Manipulation on Russian Social Media during the Russia-Ukraine War}},
  booktitle = {Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature},
  pages = {152--161},
  publisher = {Association for Computational Linguistics},
  year = {2023},
  month = {},
  selected = {true},
  bibtex_show = {true},
  html = {https://aclanthology.org/2023.latechclfl-1.17}
}

Target

Source language difficulties in learner translation: Evidence from an error-annotated corpus

Kunilovskaya, Maria, Ilyushchenya, Tatyana, Morgoun, Natalia, and Mitkov, Ruslan

Target 2023

Bib HTML

@article{Kunilovskaya2023err,
  abbr = {Target},
  author = {Kunilovskaya, Maria and Ilyushchenya, Tatyana and Morgoun, Natalia and Mitkov, Ruslan},
  journal = {Target},
  title = {{Source language difficulties in learner translation: Evidence from an error-annotated corpus}},
  year = {2023},
  month = {},
  selected = {true},
  bibtex_show = {true},
  html = {https://doi.org/10.1075/target.20189.kun}
}

LREC

Lexicogrammatic Translationese across Two Targets and Competence Levels

Kunilovskaya, Maria, and Lapshinova-Koltunski, Ekaterina

In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) 2020

Bib

@inproceedings{Kunilovskaya2020vars,
  abbr = {LREC},
  author = {Kunilovskaya, Maria and Lapshinova-Koltunski, Ekaterina},
  booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
  editor = {Calzolari, Nicoletta and Bechet, Frederic and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Others, And},
  keywords = {contrastive analysis,machine learning,multivariate analysis,parallel corpora,translation competence,translation norms,translation varieties,translationese},
  pages = {4102--4112},
  publisher = {The European Language Resources Association (ELRA)},
  title = {{Lexicogrammatic Translationese across Two Targets and Competence Levels}},
  year = {2020},
  month = {},
  bibtex_show = {true},
  selected = {true}
}