Unexpected uses for the Translate Toolkit's pseudo localisation tools

Friedel introduced pseudo localisation, covering simple rewrite rules, the insertion of source tags and an interview with Rail Aliev about tagging OpenOffice.org.

Recently three events have shown me the powerful new features that we can add, and are adding, to podebug:

  • Orthography changing
  • Porting translations
  • Automatic translation

Orthography changing

In an email exchange with Toni Hermoso Pulido I took a look at his sed scripts for converting Catalan to Valencian. These sed scripts make changes to a Catalan PO file so that it follows Valencian orthography. I built a quick framework around what Toni had done so that we could take his rules and use podebug to do the rewriting.

Some of the benefits include:

  • Easier to read regexes: a Python regex is just easier to read than a sed regex IMHO.
  • The ability to work on any file that the toolkit can manage. Toni's tool does PO; podebug does PO, XLIFF, TMX, and many more.
  • A general framework for anyone to make similar rules for orthography changes.
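To give a feel for what such rules look like in Python, here is a minimal sketch using only the standard re module. The rules are illustrative examples I have picked for this sketch, not Toni's actual rule set, and the rewrite function is a hypothetical stand-in rather than podebug's own interface.

```python
import re

# Illustrative Catalan -> Valencian rules (an assumption for this sketch,
# not Toni's real rules). Each entry is (compiled pattern, replacement).
RULES = [
    (re.compile(r"\bmeva\b"), "meua"),
    (re.compile(r"\bteva\b"), "teua"),
    (re.compile(r"\bseva\b"), "seua"),
    (re.compile(r"\baquest\b"), "este"),
]


def rewrite(text, rules=RULES):
    """Apply every rule in order and return the rewritten string."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

Calling rewrite("la meva carpeta") gives "la meua carpeta". The word boundaries keep a rule from firing inside a longer word, so "aquesta" is left alone by the "aquest" rule.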

Porting translations

Wil Clouser from Mozilla needed to port PO files from PHP placeholders (variables) to Python placeholders, a change brought about by their migration from PHP to Django. He needed to change:

  • A string like this: "I have %2$s apples and %1$s oranges"
  • To this: "I have {1} apples and {0} oranges"

Of course, this has to be changed in the English, in the translation and in any comments.

Wil wrote a converter using the toolkit that achieves what is needed. I hope that we can integrate this into podebug to make it easy for anyone to make such changes in the future, ideally in a way that would allow a change from almost any placeholder system to another.
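The core of the placeholder change can be sketched in a few lines of Python. This is not Wil's converter, just a minimal illustration that assumes only positional string placeholders of the form %N$s appear; the function name php_to_python is made up for this example.

```python
import re

# Matches PHP positional string placeholders such as %1$s or %2$s.
PHP_POSITIONAL = re.compile(r"%(\d+)\$s")


def php_to_python(text):
    """Turn PHP %N$s placeholders into Python {N-1} placeholders.

    PHP positions start at 1 and Python indexes start at 0, so each
    number is reduced by one.
    """
    return PHP_POSITIONAL.sub(
        lambda match: "{%d}" % (int(match.group(1)) - 1), text
    )
```

Running php_to_python on "I have %2$s apples and %1$s oranges" returns "I have {1} apples and {0} oranges". A real converter would also need to deal with non-positional placeholders, other type specifiers and literal braces in the text, which this sketch ignores.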

Automatic translation

While we can do fuzzy matching with pretranslate, we aren't able to intelligently take strings that follow a common translation pattern and simply translate them according to a set rule. This was an idea suggested on the translate-devel mailing list. Strings such as:

  • An mp3 file
  • A PDF file

all follow a common pattern. In some languages you could make rules similar to the Catalan->Valencian orthography changing rules, reducing the effort required to translate. I'm not 100% sure that these occur often enough to be useful, but the approach could be used to offer a translation memory suggestion to a translator or to provide some simple quality assurance.
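As a rough sketch of the idea, a rule could pair a source pattern with a target template. The example below assumes Spanish as the target language and a single hand-written rule; the suggest function and the rule itself are hypothetical and are not part of the toolkit.

```python
import re

# One illustrative rule: "A/An <type> file" -> "Un archivo <type>".
# Both the rule and the Spanish template are assumptions for this sketch.
RULES = [
    (re.compile(r"^An? (\S+) file$"), r"Un archivo \1"),
]


def suggest(source, rules=RULES):
    """Return a suggested translation, or None if no rule matches."""
    for pattern, template in rules:
        match = pattern.match(source)
        if match:
            # expand() fills the captured file type into the template.
            return match.expand(template)
    return None
```

With this rule, suggest("A PDF file") returns "Un archivo PDF", while a string that matches no rule returns None and would be left for the translator. Returning a suggestion rather than writing the translation directly fits the translation memory use mentioned above.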