I started the two-week countdown for my academic activities in applied linguistics, particularly translation studies. A few days ago, I uploaded my last two first-author papers to arXiv. Both deal with information-theoretic approaches to translationese. One introduces the surprisal-indexed corpus EPIC-EuroParl-UdS.

The other paper reports empirical results showing that information-theoretic indicators of source difficulty and cross-lingual transfer difficulty can explain part of the variation in translationese. The model's explanatory power reaches R² = 0.21.

Three other results stand out:

  1. Accuracy–fluency trade-off.
    The hypothesis, operationalised as a negative correlation between MT surprisal and target GPT-2 surprisal, holds up to about 11 bits of MT surprisal per word in a segment. Beyond that point, the correlation turns positive.

  2. Transfer vs. source difficulty.
    Transfer difficulty is generally more predictive of translationese than source difficulty. The exception is German → English, where understanding the source (especially in simultaneous interpreting) appears to be about as important as transfer difficulty.

  3. Spoken vs. written asymmetry.
    There is a striking difference between modes: in spoken translation, the more difficult the task, the less translationese appears in the output.
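
For readers who want to see what the sign change in the first result means in practice, here is a minimal sketch of a split correlation. It is not the analysis from the paper: the function names (`pearson`, `split_correlation`) are made up for illustration, and the data are synthetic, generated with a V-shaped pattern so that the effect is visible. Only the threshold of about 11 bits comes from the result described above.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_correlation(mt_surprisal, lm_surprisal, threshold=11.0):
    """Correlate the two measures separately below and above a threshold.

    mt_surprisal: per-segment average MT surprisal (bits per word)
    lm_surprisal: per-segment average target-side LM surprisal
    Returns (r_below, r_above).
    """
    pairs = list(zip(mt_surprisal, lm_surprisal))
    below = [(x, y) for x, y in pairs if x <= threshold]
    above = [(x, y) for x, y in pairs if x > threshold]
    return pearson(*zip(*below)), pearson(*zip(*above))


# Synthetic data for illustration only (NOT values from the corpus):
# the target-side surprisal falls until 11 bits and rises afterwards.
random.seed(0)
mt = [random.uniform(4.0, 18.0) for _ in range(500)]
lm = [5.0 + 0.5 * abs(x - 11.0) + random.gauss(0.0, 0.3) for x in mt]

r_below, r_above = split_correlation(mt, lm)
print(f"r below threshold: {r_below:.2f}")
print(f"r above threshold: {r_above:.2f}")
```

On data shaped like this, the coefficient is negative below the threshold and positive above it. With real segments, the threshold itself would have to be estimated rather than fixed by hand.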