Mother Tongue Bloggers

February 09, 2010

Dwayne Bailey

Virtaal supports Haitian Creole through Machine Translation plugin

Update: Originally published on 2006-01-26, but somehow it never made it onto the RSS feed, so I've republished it with a newer date.

Virtaal, a Computer Assisted Translation (CAT) tool, has been providing translators with Machine Translation suggestions through its plugin system. We've just committed a new Machine Translation plugin that allows Virtaal to use Microsoft Translator's new Haitian Creole translation engine.

The Microsoft Translator plugin has been waiting for the next major release as we didn't want to introduce any User Interface changes. But the usefulness of this tool in the current Haitian crisis means that we've released it so that people can use it and benefit from the feature.

How does this help?

  • Any software that needs to be localised into Haitian Creole can now be more easily translated in Virtaal with Haitian Machine Translation support.
  • Documents (OpenDocument Format and wiki texts) can be translated into Haitian Creole using Virtaal and the Translate Toolkit's txt2po and odf2xliff converters.

How do I get the plugin?

You have two options:

  1. Windows: Download our special Virtaal .exe for Windows.
  2. Linux: Copy the Microsoft Translator plugin into your plugin directory (/usr/lib/python2.6/site-packages/virtaal/plugins/tm/models/).
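
For the Linux option, the copy step could be scripted roughly as follows. This is only a sketch: `install_plugin` is a hypothetical helper, not part of Virtaal, and the plugin filename in the usage comment is an assumption (use whatever the downloaded file is actually called). It is written for a modern Python 3 interpreter, and writing to a directory under /usr/lib will probably need root privileges.

```python
import shutil
from pathlib import Path


def install_plugin(plugin_file, plugin_dir):
    """Copy one plugin file into a plugin directory and return the new path.

    Hypothetical helper for illustration; Virtaal does not ship this function.
    """
    plugin_file = Path(plugin_file)
    plugin_dir = Path(plugin_dir)
    if not plugin_file.is_file():
        raise FileNotFoundError(f"Plugin file not found: {plugin_file}")
    if not plugin_dir.is_dir():
        raise NotADirectoryError(f"Plugin directory not found: {plugin_dir}")
    # copy2 keeps the file's timestamps and returns the destination path
    return Path(shutil.copy2(plugin_file, plugin_dir))


# Example call (the filename is an assumption, the directory is from the post):
# install_plugin("microsofttranslator.py",
#                "/usr/lib/python2.6/site-packages/virtaal/plugins/tm/models/")
```

Restart Virtaal afterwards so that it picks up the new file.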

How to use the plugin?

  1. You'll need to define Haitian Creole as a language in Virtaal:
    1. Bottom Right > New Language Pair...
    2. Add Missing Language...
    3. Name 'Haitian Creole' and language code 'ht'

  2. Ensure that the plugin is enabled. Edit > Preferences > Plug-ins > Translation Memory > Configure... Ensure Microsoft Translator is checked.

Open your file and begin translating. I'm using the tutorial (Help > Tutorial) and, as you can see, I'm getting Machine Translation suggestions from Microsoft for Haitian Creole.

by dwayne at February 09, 2010 01:35

February 04, 2010

Dwayne Bailey

Translate Toolkit - a powerful localisation toolkit

What did it take to allow Pootle, our web-based localisation platform, to support Qt Linguist (.ts), TMX and TBX formats?

Three lines of code!

Pootle uses the Translate Toolkit which Translate.org.za has developed since 2002 to provide a powerful set of tools to manipulate localisation files. The Toolkit has evolved and grown as our needs have changed and as we began to localise other applications like Mozilla Firefox and OpenOffice.org.

The toolkit has the concept of a localisation base class from which all our localisation file implementations derive. We use the base class within Pootle to access XLIFF and PO files. We were pretty sure we'd be able to support other bilingual files (localisation files with both the source and target language in one file) in Pootle but we didn't think it would be this easy.
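
To make the base-class idea concrete, here is a minimal sketch in Python. The names `Unit`, `Store`, `TabStore` and `translation_progress` are invented for this illustration and are not the Translate Toolkit's real classes; the tab-separated format is a toy stand-in for a bilingual file. The point is only that code written against the base class never needs to know which format it is handling.

```python
class Unit:
    """One translatable item: a source string and its translation."""

    def __init__(self, source, target=""):
        self.source = source
        self.target = target


class Store:
    """Hypothetical base class: every format exposes the same list of units."""

    def __init__(self):
        self.units = []

    def parse(self, text):
        raise NotImplementedError

    def serialize(self):
        raise NotImplementedError

    def untranslated(self):
        """Units that have no target text yet."""
        return [u for u in self.units if not u.target]


class TabStore(Store):
    """Toy bilingual format: one 'source<TAB>target' pair per line."""

    def parse(self, text):
        self.units = []
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            source, _, target = line.partition("\t")
            self.units.append(Unit(source, target))
        return self

    def serialize(self):
        return "\n".join(f"{u.source}\t{u.target}" for u in self.units)


def translation_progress(store):
    """Works on any Store subclass without knowing the file format."""
    total = len(store.units)
    done = total - len(store.untranslated())
    return done, total
```

A tool such as `translation_progress` only touches `units`, so adding a new format means writing one more subclass with its own `parse` and `serialize`.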

Of course we'd like to make some further changes so that Pootle simply supports any bilingual file store from the Translate Toolkit, and we'll need to test Pootle running with this new feature. Expect a beta soon.

Now you might be wondering about monolingual files (localisation files with just one language stored in the file, e.g. Java properties files). Well, we're busy looking at those, and hopefully with the next release of Pootle we'll be able to directly translate monolingual files. Have a look at our list of currently supported localisation files and start dreaming about what will be possible.

For software and content developers the exciting news is this. You want to localise your esoteric format in Pootle? Simply write a storage class for the Translate Toolkit and you will automatically have support within Pootle for your format.

Update: Just before publishing this blog post the Translate Toolkit development list got news of a new Gettext PO diffing tool that uses... the Translate Toolkit of course.

by dwayne at February 04, 2010 06:49

January 28, 2010

Dwayne Bailey

The sky's the limit for new Zulu spell checker

Translate.org.za are the proud parents of a new Zulu spell checker.

What makes us such proud parents? We've ported the spell checker from the MySpell platform to Hunspell. Which means what exactly? It means that we can now spell check Zulu text at much higher precision. It also puts the platform in place to ratchet up the checker's performance.

You can try the checker as an extension for OpenOffice.org, Mozilla Firefox or Mozilla Thunderbird.

Spell checking in Zulu is hard because it is a highly conjunctive language. This means that what in English would be seen as a number of separate words is written as one word. What this means for the spell checker is that we first need to deconstruct the Zulu word. We then check that the remaining root is a correct word and that the rules for building the expression were correct. If we fail any of that then we have a spelling error.
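
The deconstruct-then-check idea can be sketched with a toy example in Python. This is not how Hunspell or our affix rules are actually implemented: the prefix list, the three roots and the single pattern (subject prefix + "ya" + root + final "a") are a tiny hand-picked sample chosen for illustration, and the function names are made up.

```python
# Toy data, for illustration only; a real checker needs far more rules and roots.
SUBJECT_PREFIXES = ("ngi", "si", "ba", "u")  # I, we, they, you/he/she
TENSE_MARKER = "ya"                           # present tense marker
FINAL_VOWEL = "a"
ROOTS = {"hamb", "fund", "bon"}               # go, learn, see


def analyse(word):
    """Return (prefix, root) if word fits prefix + 'ya' + root + 'a', else None."""
    for prefix in SUBJECT_PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        # Rule check: the word must be built with the tense marker and final vowel
        if not (rest.startswith(TENSE_MARKER) and rest.endswith(FINAL_VOWEL)):
            continue
        # Root check: what remains after stripping the affixes must be a known root
        root = rest[len(TENSE_MARKER):-len(FINAL_VOWEL)]
        if root in ROOTS:
            return prefix, root
    return None


def is_correct(word):
    """A word passes only if both the building rules and the root check succeed."""
    return analyse(word) is not None
```

With this toy rule set, "ngiyahamba" and "bayafunda" are accepted, while a word with an unknown root or a broken ending is rejected. Valid Zulu words outside this one pattern are rejected too, which is exactly why expanding the rules and roots matters.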

We have most of the most productive classes of nouns and verbs covered. Our work now is to expand the rules as needed and most importantly to add and classify many root forms.

The initial work for the Zulu spell checker was done as part of the ANLoc spell checker project with funding from the International Development Research Centre.

by dwayne at January 28, 2010 09:02

Friedel Wolff

It seems my web browser is unique

I read yesterday about Panopticlick which tries to determine how easily web users can be traced without the use of web cookies. It collects information sent by the browser in the HTTP protocol, and things that can be collected by means of JavaScript, Flash and Java. The website reports that my web browser is unique in the pool of browsers that visited Panopticlick.

The interesting/worrying thing is that most of the fields could uniquely identify me on their own. Nobody else uses the same language combination for HTTP's Accept-Language header (the list of languages that I prefer). Nobody else uses the same version of Firefox in Afrikaans on my Linux distribution.
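
To illustrate what "identify me on their own" means, here is a small Python sketch. It is not Panopticlick's code: both function names are made up, and any browser data you feed it would be your own sample. The first function lists the fields whose value nobody else in a pool shares; the second applies the standard information-theory measure of how many bits a value reveals when a given number of browsers share it, which is my addition rather than something described above.

```python
import math


def identifying_fields(mine, others):
    """Return the fields of `mine` whose value no browser in `others` shares."""
    return [
        field
        for field, value in mine.items()
        if all(other.get(field) != value for other in others)
    ]


def surprisal_bits(sharing, pool_size):
    """Bits of identifying information in a value shared by `sharing`
    out of `pool_size` browsers (fewer sharers means more bits)."""
    return -math.log2(sharing / pool_size)
```

For example, a value that only one browser in a pool of 1024 sends carries 10 bits, whereas a value sent by half the pool carries just 1 bit.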

I won't go into why this is good or bad, but it is interesting to know that if I had reason to not want to be identified, I can't use Firefox in Afrikaans any more, and I can't indicate my preferred language to web sites. I realise that plugins can probably improve the matter, but if this is seen as a danger, it is probably a danger for more people than the number of people who would know about NoScript or TorButton.

Should we perhaps think afresh about how much information is sent by web browsers? At least I now feel a little bit more special...

by Friedel at January 28, 2010 01:35

January 26, 2010

Dwayne Bailey

Everyone has the power to champion their language

Mondli Makhanya, editor of the Sunday Times (South Africa), raises an interesting point about Afrikaners and language in his piece, "Afrikaners set a fine example in championing their language".

While Mondli places much of the blame on government, and I would agree that there is much blame that can be placed on government's shoulders, is it not the speakers of a language who carry the greatest responsibility? While the Afrikaners are concerned about this issue, where are the Zulus, Vendas and Tswanas?

The biggest problem with our constitution is that in appointing PanSALB as the custodian of our languages we seem to think that that means we don't need to do anything!

When an Afrikaans school is forced to become dual medium because black parents want English education for their children is that not a concern? It's a grave concern that most people see this as a conservative white issue. While I'm sure it had those elements, isn't it of more concern that non-English speaking children are getting a bad education by being forced to study in English?

Mondli talked about UNESCO's monitoring of the situation of dying languages. What he doesn't mention is that UNESCO champions mother language education because time after time it has been shown that mother language education is better. It leads to more engaged students, better thinking and better assimilation of fundamental concepts. All of these seem to be missing from the poorest of our schools who hack along in English. So it is a grave concern to me that we are forever assigning children to inferior education because we hope that English education will make them the best.

Where were the mother tongue speakers when UNISA closed down much of its African language department? Not a peep.

When I go to home affairs and see only English forms, why am I the only one to complain (English is my mother language)?

The beauty of seeing language as a personal issue is that then you are able to change the situation. If you take language personally then you will decide to speak your mother language at home. If you take it personally then you will use the ATM in your language, you'll tweet and post status updates on Facebook in your language. You'll set your cellphone to your language.

Lastly, you'll claim your workplace for your language by insisting that you use computer software in your language. Organisations like Translate.org.za have been making mother language technology such as the Venda keyboards, Firefox spell checkers, South African calendars and other technologies for years, and you can empower your language by simply using them.

When we blame government we remain powerless; when we make language our priority we can change the world.

by dwayne at January 26, 2010 09:33