A journey into translations, their complexity, and the consequences …



How this started

I noticed a small thing to improve in the router console’s GUI while I was modifying how dates are displayed in the logs. Since I had the sources, I looked into it, at first thinking “it’s just changing a string …”. However, that string was wrapped in a function call, intl._t(), which rang a bell: translation, of course.

So far, the whole translation handling had remained a major mystery to me, aside from “they are using gettext somehow”, so I figured I had better inform the team instead of attempting the change myself. Accordingly, I created a ticket. What got me interested in translations once again was the answer: the change could not be made, because of translations.

Before I get into those: if any of you are looking for the date format that can be used in the router console, zzz supplied the relevant link [1].

How I understood it works …

Generally, to-be-translated strings are marked (tagged) in some way, and tools can then extract all those strings and compose a translation file for translators. I had seen that in i2p.www the tagging is done with % trans -% ... %- trans % (in brackets, though), whereas I just “learned” that in the i2p.i2p code it is done by calling intl._t().

The extraction of the strings into a translation file is done by gettext, and the resulting files are .pot files. Translators then create .po files based on those, in which each string is accompanied by a second string that holds the translation. So in the end there is one .pot file plus one .po file per language.
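To picture the extraction step, here is a minimal sketch in Python. It is not how gettext's own extraction works internally; the regular expression, the function names and the tiny JSP snippet are my own simplifications for illustration. It collects the strings wrapped in _t("...") and pairs each with an empty msgstr, which is roughly what a template looks like before any translator touches it.

```python
import re

# Hypothetical, simplified marker pattern: _t("...") with a plain
# double-quoted literal. Real extraction tools parse the source properly.
MARKER = re.compile(r'_t\("((?:[^"\\]|\\.)*)"\)')


def extract_msgids(source: str) -> list[str]:
    """Return the marked strings in order of first appearance, no duplicates."""
    seen: list[str] = []
    for match in MARKER.finditer(source):
        msgid = match.group(1)
        if msgid not in seen:
            seen.append(msgid)
    return seen


def to_pot(msgids: list[str]) -> str:
    """Render a bare-bones template: each msgid paired with an empty msgstr."""
    return "\n".join(f'msgid "{m}"\nmsgstr ""\n' for m in msgids)


# Shortened stand-in for a line of configlogging.jsp
jsp = '<b><%=intl._t("Log date format")%>:</b> <b><%=intl._t("Max log file size")%>:</b>'
msgids = extract_msgids(jsp)
print(to_pot(msgids))
```

A .po file for a language then has the same entries, only with the msgstr lines filled in.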

In the i2p projects there are several .pot files, though, so it might be more precise to speak of one .pot per scope. For example, the i2p.www project has a “priority.pot” which (I assume) contains the most important strings, and then various others besides that. In a similar way there are several different files for i2p.i2p as well. It’s also often not one string, but several strings, that make up one translation string. But anyway …

What changes do then

When one of the original strings is changed, the extraction will now yield a different string than it did before. Unless it is matched or marked somehow, that will result in an untranslated string, as it factually is a different string now, right? Every little change breaking all existing translations of course seems … bad. It means you either refrain from little improvements because they break all translations, or you break all translations all the time. Neither sounds ideal …
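The reason a small edit has such an effect is that the original string itself acts as the lookup key. A minimal sketch in Python, assuming a catalog is nothing more than a dictionary from original string to translation (the names catalog_de and translate are made up for this example):

```python
OLD = "('MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)"
NEW = "('yyyy' = year, 'MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)"

# Hypothetical in-memory catalog: the original string is the key.
catalog_de = {
    OLD: "('MM' = Monat, 'dd' = Tag, 'HH' = Stunde, 'mm' = Minute, 'ss' = Sekunde, 'SSS' = Millisekunde)",
}


def translate(msgid: str, catalog: dict[str, str]) -> str:
    """Exact-match lookup; fall back to the untranslated string on a miss."""
    return catalog.get(msgid, msgid)


print(translate(OLD, catalog_de))  # found: the German text
print(translate(NEW, catalog_de))  # not found: falls back to the new English text
```

From the lookup's point of view, the edited string is simply a key that has never been translated.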

I had been considering, and asking about, manually hacking those changes into the existing translations, which would indeed have been a lot of work: I think I counted 39 different translations in one of the folders. So either there is an automatic way, or there is no way. For the ticket it was no way, and I got interested in possible automatic ways.

Fuzzy matching to the Rescue?

It turns out the gettext tools already address this. They automatically detect minor changes by fuzzy matching and then keep the old translation for the new string. That will never be as good as an updated translation, but it would make such changes possible in the first place. The .po format also has an option to tag translations as fuzzy. This is described as also being a tool for translators to mark strings as, basically, “for review”. Strings are automatically marked this way when they are fuzzy matched as mentioned.
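To get a feeling for what such a merge does, here is a rough sketch in Python. It uses difflib from the standard library as a stand-in for the similarity measure; this is not the algorithm gettext actually uses, and the function name merge and the 0.6 cutoff are assumptions of mine. For each new string it reuses an exact match if there is one, otherwise carries over the translation of the closest old string and flags it as fuzzy.

```python
import difflib

OLD = "('MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)"
NEW = "('yyyy' = year, 'MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)"
OLD_DE = "('MM' = Monat, 'dd' = Tag, 'HH' = Stunde, 'mm' = Minute, 'ss' = Sekunde, 'SSS' = Millisekunde)"


def merge(old_po: dict[str, str], new_msgids: list[str], cutoff: float = 0.6):
    """Map each new msgid to (translation, is_fuzzy)."""
    merged: dict[str, tuple[str, bool]] = {}
    for msgid in new_msgids:
        if msgid in old_po:
            # Unchanged string: keep the translation as it is.
            merged[msgid] = (old_po[msgid], False)
            continue
        # Changed or new string: look for the most similar old msgid.
        close = difflib.get_close_matches(msgid, list(old_po), n=1, cutoff=cutoff)
        if close:
            merged[msgid] = (old_po[close[0]], True)  # reuse, but flag for review
        else:
            merged[msgid] = ("", False)  # genuinely new: untranslated
    return merged


merged = merge({OLD: OLD_DE}, [NEW, "Max log file size"])
for msgid, (msgstr, fuzzy) in merged.items():
    print("fuzzy" if fuzzy else "     ", repr(msgid[:20]), "->", repr(msgstr[:20]))
```

The changed date-format string picks up the old German text with the fuzzy flag set, while the unrelated string stays untranslated.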

So I was wondering how this is handled in the i2p projects. I returned to the issue and attempted to change the string displayed in the router console there.

A little change

So I added like 15 characters to the beginning of the string:

lbt@go:~/synced-git.idk.i2p/i2p.i2p$ git diff apps/routerconsole/jsp/configlogging.jsp
diff --git a/apps/routerconsole/jsp/configlogging.jsp b/apps/routerconsole/jsp/configlogging.jsp
index dcb1bc8c7..6d0740ad6 100644
--- a/apps/routerconsole/jsp/configlogging.jsp
+++ b/apps/routerconsole/jsp/configlogging.jsp
@@ -32,7 +32,7 @@
         </tr><tr><td align="right"><b><%=intl._t("Log date format")%>:</b></td>
           <td><input type="text" name="logdateformat" size="20" value="<jsp:getProperty name="logginghelper" property="datePattern" />" >
             </td>
-          <td><%=intl._t("('MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)")%></td>
+          <td><%=intl._t("('yyyy' = year, 'MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)")%></td>
         </tr><tr><td align="right"><b><%=intl._t("Max log file size")%>:</b></td>
           <td><input type="text" name="logfilesize" size="10" value="<jsp:getProperty name="logginghelper" property="maxFileSize" />" ></td>
           <td></td>

That does not seem like a big change. But let’s see about the impact …

Compiling Translations

The ant target to process translations (extract the strings, update the files) seems to be poupdate. As I don’t understand the whole build yet, I ran it and tried to see how the files changed.

lbt@go:~/synced-git.idk.i2p/i2p.i2p$ ant poupdate
[...]
     [exec] 121 translated messages.
     [exec] ........................... done.
     [exec] 106 translated messages, 15 untranslated messages.
     [exec] Generating i2p.susi.webmail.messages_vi ResourceBundle...
[...]
BUILD SUCCESSFUL
Total time: 11 seconds

Several thousand strings must be handled in the process, among them the one that I changed above. So let us look at what changed with regard to that one …

lbt@go:~/synced-git.idk.i2p/i2p.i2p$ view ./apps/routerconsole/locale/messages_de.po
#: ../jsp/WEB-INF/classes/net/i2p/router/web/jsp/configlogging_jsp.java:545
#, fuzzy
msgid ""
"('yyyy' = year, 'MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = "
"second, 'SSS' = millisecond)"
msgstr ""
"('MM' = Monat, 'dd' = Tag, 'HH' = Stunde, 'mm' = Minute, 'ss' = Sekunde, "
"'SSS' = Millisekunde)"

It was fuzzy matched. The changed string is paired with the old translation and marked as fuzzy, so translators can see that it needs some work. But how is the translation broken then? Well, so far I have only been looking into files, not into the product. So …

Lost in Translation

… I built the Linux installer next, to install it and see:

lbt@go:~/synced-git.idk.i2p/i2p.i2p$ ant installer-linux
[...]
BUILD SUCCESSFUL
Total time: 10 seconds
lbt@go:~/synced-git.idk.i2p/i2p.i2p$ java -jar i2pinstall_2.1.0-4_linux-only.jar
lbt@go:~/i2p$ ./i2prouter start

And yes, there I had the changed description when starting up in English. But when I changed the language to German, that string stayed in the English version, i.e. the translation was indeed lost.

Lost in … translation? ;) Feels kind of funny when looking at it without knowing all the attached strings (lol). We have a translation file, but the translation isn’t used as it stands. The line from above

     [exec] Generating i2p.susi.webmail.messages_vi ResourceBundle...

probably gives a hint as to where this is happening. It seems that, for use in Java, the .po files are compiled into ResourceBundles. So either in that process or when those are read later, the existing translation is ignored, I assume because it is tagged fuzzy?
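One thing I am fairly sure of is that gettext's msgfmt, the tool that compiles .po files, leaves out fuzzy entries by default unless it is told otherwise with --use-fuzzy. Whether that is the exact step where the translation gets dropped in the i2p build is my assumption, not something I verified. The effect can be sketched in Python with a deliberately tiny .po reader (compile_po is a made-up name, and the reader handles neither escapes nor plural forms):

```python
NEW = "('yyyy' = year, 'MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = second, 'SSS' = millisecond)"

# First entry as in messages_de.po above; the second entry is hypothetical.
PO = """\
#, fuzzy
msgid ""
"('yyyy' = year, 'MM' = month, 'dd' = day, 'HH' = hour, 'mm' = minute, 'ss' = "
"second, 'SSS' = millisecond)"
msgstr ""
"('MM' = Monat, 'dd' = Tag, 'HH' = Stunde, 'mm' = Minute, 'ss' = Sekunde, "
"'SSS' = Millisekunde)"

msgid "Max log file size"
msgstr "Maximale Protokolldateigroesse"
"""


def compile_po(po_text: str, use_fuzzy: bool = False) -> dict[str, str]:
    """Build a msgid -> msgstr catalog, skipping fuzzy entries by default."""
    catalog: dict[str, str] = {}
    for block in po_text.strip().split("\n\n"):
        fuzzy = False
        parts: dict[str, list[str]] = {"msgid": [], "msgstr": []}
        current = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                fuzzy = fuzzy or "fuzzy" in line  # flag line, e.g. "#, fuzzy"
            elif line.startswith("#"):
                continue  # other comments, such as source references
            elif line.startswith("msgid ") or line.startswith("msgstr "):
                current, _, rest = line.partition(" ")
                parts[current].append(rest.strip()[1:-1])
            elif line.startswith('"') and current:
                parts[current].append(line[1:-1])  # continuation line
        msgid, msgstr = "".join(parts["msgid"]), "".join(parts["msgstr"])
        if msgid and msgstr and (use_fuzzy or not fuzzy):
            catalog[msgid] = msgstr
    return catalog


catalog = compile_po(PO)
print(catalog.get(NEW, NEW))  # fuzzy entry was skipped: English fallback
```

With use_fuzzy=True the same entry would end up in the catalog and the old German text would be shown, just without the year.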

I started to look for where exactly this happens, but have not found out yet. zzz gave a hint as to the why, though. It seems the translation platform used for the web-based translations does not handle these fuzzy tags as desired. I am not sure how, but I will probably file this under “external dependency restrictions” and be done with it. Sounds like something I cannot influence …


Clearnet-Links: [1] https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html