Continuous integration: can it work for software localisation?
At Translate.org.za we want to keep delivering the best FOSS localisation tools. To do that we've started using Continuous Integration (CI) in the development of Pootle, Virtaal and the Translate Toolkit. We're using a tool called Hudson to manage our CI process.
Since the tools that we develop are all focused on localisation we thought, "Wouldn't it be great if we could use CI to continuously check our translations?" I hope that you will start to use some of our scripts, or your own, to ensure that localisation is part of your CI build process.
The problem
Since we build localisation tools we pride ourselves on doing localisation well. But even we've made a few mistakes along the way, mistakes like:
- Shipping broken translation files. There is nothing quite as frustrating as sending out an application that breaks because of a typo in the translation of a variable. The cost of fixing the issue and releasing a bug fix build is just too much for a small development team. We want to focus on cool new features; we'd rather not fix a bug that we could have caught with CI.
- All text not present in the translation files. We work on string freezes and try hard not to change things while in freeze. So nothing hurts as much as discovering that a feature you added many months ago is not actually present in the new translation files. You now realise that you are about to release a feature that will only be in English. So now we must break string freeze and get the new files to translators, with a lot of communication overhead. For translators it means updating their just-completed translations to match the new translation files, and they might not have the time. Many of these are simple steps, but they require lots of overhead, and because so many people are involved there is a real potential for other errors to occur. So we want to make sure that when we enter string freeze everything that we want to be translated is ready for translation. We'd rather not break string freeze simply because we forgot to add a file to POTFILES.in.
- Broken XML file building. As we're using intltool we build some files (mimetype XML and .desktop) from our translations. We don't need to run this step very often, so infrequently in fact that we might only run it as we prepare to release. We'd like to catch any errors in the building of these files when the error occurs, not just before the release.
We want to apply CI to our localisations because we're not machines, we simply want to be able to forget about localisation issues while we work towards a release. We want to know that our code is always ready for localisation and that our localisations are always 100% technically correct. We don't want any surprises and we want to fix errors that occur when they occur.
We've managed to achieve this.

As you can see above we have a Hudson job called validate-translations that runs a number of localisation related build steps.
The solution to catching technical localisation errors
We run intltool as part of the build process to catch files that aren't being extracted for localisation, and to check that the mimetype and .desktop files can be built from the translations. That part was easy. The harder part was making sure that the translations that are committed are correct; for that we built a more elaborate script around the Gettext tools.
Hudson can monitor errors reported in the JUnit XML format. Our solution was to build a simple bash script that exercises Gettext's msgfmt command and outputs the results in a JUnit XML file. The script is simple. For each PO file that it finds it runs msgfmt -cv. Any errors are captured so that we can more easily fix them when we review the results.
Feel free to use the JUnit XML script for PO files within your own Hudson jobs.
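To give an idea of what Hudson reads, here is a minimal sketch of a JUnit-style report entry for one PO file. It is not our script: the helper names xml_escape and junit_testcase, the classname value and the sample file names are made up for illustration, and the attributes are only the ones commonly found in JUnit reports.

```shell
#!/bin/bash
# Sketch only. Helper names and attribute choices are illustrative assumptions.

# Escape the characters that are special in XML text and attribute values.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Print one <testcase> element.
#   $1 = PO file name, $2 = msgfmt exit status, $3 = captured msgfmt output
junit_testcase() {
    tc_name=$(printf '%s' "$1" | xml_escape)
    if [ "$2" -eq 0 ]; then
        printf '  <testcase classname="msgfmt" name="%s"/>\n' "$tc_name"
    else
        tc_msg=$(printf '%s' "$3" | xml_escape)
        printf '  <testcase classname="msgfmt" name="%s">\n' "$tc_name"
        printf '    <failure message="msgfmt -cv failed">%s</failure>\n' "$tc_msg"
        printf '  </testcase>\n'
    fi
}

# Example (file names and message are invented):
#   echo '<testsuite name="msgfmt" tests="2" failures="1">'
#   junit_testcase po/af.po 0 ""
#   junit_testcase po/fr.po 1 "po/fr.po:12: some msgfmt error"
#   echo '</testsuite>'
```

One junit_testcase call per PO file, wrapped in a testsuite element, should give a report in the shape Hudson expects. The escaping step matters because msgfmt messages can quote markup from the translation, which would otherwise produce invalid XML.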
Since starting this CI process we've seen good results.

As you can see above, we solved 20 msgfmt errors over just three builds. More importantly, we can now safely modify our code and know that our CI will catch any localisation issues.
So what is next for CI and localisation?
At the moment we simply catch msgfmt errors. We will be looking to add the following:
- PO file snippet - it would make it easier to find and fix the errors that we find if we have the snippet of PO that caused the error. Currently we only have the line number and have to first find that line in the PO file before we can even check what is causing the error. With the snippet we can make the full diagnosis while reviewing the Hudson test failure report.
- pofilter checks - the Translate Toolkit has a number of checks (47 in fact) that catch technical localisation errors. We'd like to generate XML test result files that show those errors. The Translate Toolkit is very useful for human review but we'll need to create a method to mark false positives that we wish to ignore in the future test runs.
- pocount - We want to count the translation status of a group of PO files. You might wonder why we'd want to do that. The reason is that many projects ship with translations that meet some level of completeness. For Virtaal, our Computer Aided Translation tool, we set that threshold at 75% complete for shipping translations. With pocount we should be able to automate this so that we return a test failure if a translation falls below this threshold. If you are able to compare the files that meet the threshold with the files listed in a LINGUAS file (a file that lists all shipped translations) then it's possible to raise an error when a new file needs to be added to the LINGUAS file to ensure that it's shipped. Similarly it would be possible to raise an error if an existing translation falls below the threshold, in which case it needs to be removed from the list of shipped localisations. Now there will be no risk of shipping incomplete translations or of forgetting to ship a new translation.
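The threshold idea in the last item can be sketched in a few lines of shell. This is only an illustration and it swaps in a different data source: instead of pocount it parses the summary line that msgfmt --statistics prints in the C locale (for example "30 translated messages, 5 fuzzy translations, 5 untranslated messages."). The function names count_of, percent_translated and meets_threshold are made up.

```shell
#!/bin/bash
# Sketch only. Assumes the input is the single English summary line from
# "msgfmt --statistics", not pocount output.

# Print the number that precedes a keyword in the summary line, if any.
#   $1 = summary line, $2 = keyword (translated, fuzzy or untranslated)
count_of() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^ *\([0-9][0-9]*\) $2 .*/\1/p"
}

# Print the percentage of translated messages, rounded down.
# Counts that msgfmt leaves out of the line are treated as zero.
percent_translated() {
    translated=$(count_of "$1" translated)
    fuzzy=$(count_of "$1" fuzzy)
    untranslated=$(count_of "$1" untranslated)
    total=$(( ${translated:-0} + ${fuzzy:-0} + ${untranslated:-0} ))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( ${translated:-0} * 100 / total ))
    fi
}

# Succeed when the translation is at or above the threshold (default 75).
#   $1 = summary line, $2 = threshold in percent
meets_threshold() {
    [ "$(percent_translated "$1")" -ge "${2:-75}" ]
}

# Example (po/af.po is an invented file name):
#   stats=$(LC_ALL=C msgfmt --statistics -o /dev/null po/af.po 2>&1)
#   meets_threshold "$stats" 75 || echo "po/af.po is below the threshold"
```

A CI job could run this over every PO file and compare the languages that pass against the LINGUAS file, reporting a failure for any language that is listed but below the threshold, or above it but not listed.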
I'll try to post new blog entries when we add some of these new features or scripts to our own build process.
| Attachment | Size |
|---|---|
| junitmsgfmt.sh | 1.15 KB |
- dwayne's blog

Welcome to the club :-)
Hi Dwayne, good to see more people going down the route to use CI for automated checks.
Growing the club
Hi Pike, nice to see you here. Yeah I hope we can grow the club, especially around the l10n issues. I've got an experimental patch for pofilter that I hope to apply to our Mozilla translations to see how we can work on software checks and CI.
(Broken link comment)
Oh dear. Something's broken online.
http://www.translate.org.za/blogs/dwayne/sites/translate.org.za.blogs.dw... is the wrong link to junitmsgfmt.sh, or at least I can't see the file.
Seems interesting, so I clicked and got a miss. I came here via Planet Mozilla, at complete random.
Woops you're right - script inline
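#!/bin/bash
# Run "msgfmt -cv" over each PO file given on the command line and print a
# JUnit XML report on stdout for Hudson to pick up.
# NOTE: the XML tags were stripped from this script when it was posted. The
# element and attribute names below are reconstructed from the usual JUnit
# layout and are an assumption, not a copy of junitmsgfmt.sh.

# local variables
message=""
body=""
failures=0
successes=0

# Escape the characters that are special in XML text and attribute values.
function xml_escape {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Record a failed test case together with the captured msgfmt output.
function failure {
    local pofile
    pofile=$(printf '%s' "$1" | xml_escape)
    body+="<testcase classname=\"msgfmt\" name=\"$pofile\">"$'\n'
    body+="<failure message=\"msgfmt -cv failed\">$(printf '%s\n' "$message" | xml_escape)</failure>"$'\n'
    body+="</testcase>"$'\n'
    message=""
    failures=$((failures + 1))
}

# Record a passing test case.
function success {
    local pofile
    pofile=$(printf '%s' "$1" | xml_escape)
    body+="<testcase classname=\"msgfmt\" name=\"$pofile\"/>"$'\n'
    message=""
    successes=$((successes + 1))
}

# Run msgfmt once, keeping its output (msgfmt reports on stderr).
# The function returns msgfmt's exit status.
function run_msgfmt {
    message=$(msgfmt -cv -o /dev/null "$1" 2>&1)
}

function print_header {
    echo "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
    echo "<testsuite name=\"msgfmt\" tests=\"$((successes + failures))\" failures=\"$failures\">"
}

function print_body {
    printf '%s' "$body"
}

function print_footer {
    echo "</testsuite>"
}

for pofile in "$@"
do
    if run_msgfmt "$pofile"; then
        success "$pofile"
    else
        failure "$pofile"
    fi
done

print_header
print_body
print_footer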
Generating Translated output
We've used CI (Hudson) to build translated DocBook documentation, allowing translators to translate the POs online and then view the result without leaving their browser. It's quite a powerful tool...
CI for translated output
Nice idea. CI is quite a powerful way to automate the translation workflow it seems.