Maria Kunilovskaya | Chao_translation

I have started a two-week countdown to the end of my academic work in applied linguistics, particularly translation studies. A few days ago, I uploaded my last two first-author papers to arXiv. Both apply information-theoretic approaches to translationese. One introduces EPIC-EuroParl-UdS, a corpus annotated with surprisal indices.

The other paper reports empirical results showing that information-theoretic indicators of source difficulty and cross-lingual transfer difficulty explain part of the variation in translationese: the model reaches R² = 0.21, meaning it accounts for about 21% of the variance.
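For readers less familiar with the metric, R² compares the model's leftover (residual) error with the total variance of the outcome around its mean. The sketch below computes it from observed and predicted values. The function name and the toy numbers are hypothetical and are not data from the paper. A model that always predicts the mean scores 0, and a perfect model scores 1, so 0.21 sits closer to the "mean-only" end.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_obs = sum(observed) / len(observed)
    # Total variance of the outcome around its mean
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    # Variance the model's predictions leave unexplained
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Toy numbers (hypothetical), not results from the paper
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(round(r_squared(observed, predicted), 2))  # prints 0.9
```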

Three other results stand out:

Accuracy–fluency trade-off.
The trade-off hypothesis, operationalised as a negative correlation between MT surprisal and target-language GPT-2 surprisal, holds up to about 11 bits of MT surprisal per word in a segment. Beyond that point, the correlation turns positive.
Transfer vs. source difficulty.
Transfer difficulty is generally a stronger predictor of translationese than source difficulty. The exception is German → English, where the difficulty of understanding the source (especially in simultaneous interpreting) appears to matter about as much as transfer difficulty.
Spoken vs. written asymmetry.
There is a striking difference between the two modes: in spoken translation (interpreting), the more difficult the task, the less translationese appears in the output.
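The accuracy–fluency result can be pictured as a correlation that changes sign at a threshold. Below is a minimal sketch, assuming that per-word surprisal is -log2 p(word | context) averaged over a segment, and that segment-level MT surprisal and GPT-2 surprisal have already been computed. The function names, the split at exactly 11 bits, and the toy numbers are all hypothetical. The paper may use a different method, for example a regression with a breakpoint rather than two separate Pearson correlations.

```python
import math


def mean_surprisal_bits(token_probs):
    """Average per-word surprisal in bits: mean of -log2 p(word | context)."""
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def split_correlations(mt_surprisal, lm_surprisal, threshold=11.0):
    """Correlate MT vs. target-LM surprisal separately below/above a threshold."""
    pairs = list(zip(mt_surprisal, lm_surprisal))
    groups = {
        "below": [(m, t) for m, t in pairs if m <= threshold],
        "above": [(m, t) for m, t in pairs if m > threshold],
    }
    result = {}
    for name, group in groups.items():
        if len(group) >= 2:  # a correlation needs at least two points
            xs, ys = zip(*group)
            result[name] = pearson(xs, ys)
    return result


# Toy segment-level values in bits per word (hypothetical, not from the paper)
mt = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
gpt2 = [9.0, 8.0, 7.0, 6.0, 7.0, 8.0, 9.0]
corr = split_correlations(mt, gpt2)
print({k: round(v, 3) for k, v in corr.items()})  # prints {'below': -1.0, 'above': 1.0}
```

In this toy data, the negative correlation (the trade-off) holds up to the threshold and turns positive above it, which mirrors the shape of the reported pattern, not its actual values.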