Support for Afrikaans at Google Translate


Google is constantly extending its software for automatic machine translation. A recently added set of languages includes Afrikaans and Swahili. These are the first two languages from Africa to be added to the list. This is interesting for several reasons, and I'm wondering how this can change the landscape for these two languages.
Google Translate now also in Afrikaans

As an Afrikaans speaker, I find this interesting first of all to see how well it fares and which mistakes it makes (totally expected, of course). We all know that Google uses statistical machine translation. In theory, this means that it should keep improving as they get more data to work with.

Interesting mistakes that I noticed in translation from English to Afrikaans (please excuse possible mistakes in the grammatical terms; the Afrikaans ones are added for reference):

  • Morphology (woordbou). Compounds are mostly not handled correctly. It knows about something like "Wêreldbeker" (World cup), but probably just because it encountered it before. How well does statistical machine translation handle target languages like German and Dutch? Will more data make the problem go away?
  • Words with apparent Dutch or German inspiration, such as "epigrammatisch", "gewijd", "gefascineerd" that I can't imagine coming out of any Afrikaans source.
  • The article 'n is frequently wrong. It occurs very often as vir' n been (with the apostrophe stuck onto the previous word). It seems that the apostrophe is treated as a quotation mark, so that from time to time it is used to close a "quotation".
  • Where sentences start with the article (lidwoord) 'n, the use of capitalisation is wrong.
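The last two mistakes are regular enough that they could in principle be patched after translation. What follows is only a minimal sketch of that idea in Python, not anything Google or Apertium actually does: the function name fix_article and both patterns are my own assumptions. It relies on the Afrikaans convention that 'n stays lowercase at the start of a sentence and the following word takes the capital letter.

```python
import re

# Hypothetical post-processing helper; not part of Google Translate or Apertium.

# An apostrophe stuck onto the previous word, followed by a lone "n":  vir' n been
_DETACHED = re.compile(r"(\w)'\s+[nN]\b")

# The article at the start of the text or after sentence-ending punctuation.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)'[nN]\s+(\w)")


def fix_article(text: str) -> str:
    """Heuristically repair two errors around the Afrikaans article 'n."""
    # 1. Move the apostrophe back onto the article: "vir' n been" -> "vir 'n been"
    text = _DETACHED.sub(r"\1 'n", text)
    # 2. Keep the article lowercase and capitalise the following word instead.
    text = _SENTENCE_START.sub(
        lambda m: f"{m.group(1)}'n {m.group(2).upper()}", text
    )
    return text


if __name__ == "__main__":
    print(fix_article("Dit is vir' n been."))
    print(fix_article("'N man loop. 'n hond blaf."))
```

This is purely a heuristic: it will misfire on a genuine closing quotation mark that happens to be followed by a standalone "n", and it does nothing for the compounding problems.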

Dwayne mentioned that some of the mistakes could be due to training from texts that were collected through optical character recognition. This would explain the problem with the apostrophe, for example. Although statistical machine translation might be language agnostic, the same is definitely not true for optical character recognition.

An interesting one to see was the translation for "long and short-term relationships": not a bad attempt. The mistake with the incorrect "distant compounding" (afstandsamestelling) could easily be due to optical character recognition.

A few more comments about this:

  • African languages are important enough for Google to put this effort in. Although it is later than we would have liked, at least this is something. I hope more companies will take note and follow the example. It is interesting to see that Afrikaans is supported before some big languages of India. (By big I mean a language like Bengali, with more than 200 million speakers.)
  • Google is not first. The Apertium project has had a translator for some time already, which works on totally different principles (it is rule based). I would encourage anyone interested to collaborate with the Apertium project to improve their software, especially for languages with fewer resources. They are good at helping people who want to contribute. You don't need to have programming knowledge.
  • It works, sort of. Somebody who doesn't understand Afrikaans should be able to get an idea of what is written in an Afrikaans text. I did not, however, get the impression that you would get a good idea with the current quality of translation. Try it out and comment below. Humorous examples are especially welcome.
  • Due to the previous point, I believe Afrikaans people can now start to write more in Afrikaans, especially if the intended audience is partially Afrikaans. The argument for making a weblog more accessible to a theoretical international audience doesn't carry as much weight as before. I translated this blog post myself. Can I use automatic translation from now on for the English version of my weblog? How accessible will it be?
  • I have no idea how well it translates into languages other than English. I also haven't yet tested it with source languages other than English. I'm keen to hear if somebody can evaluate this.
  • Now more than ever we will need to inform people about the limitations of machine translation. It will be a huge insult if people start to use this without realising that it can in no way and under no circumstances serve as a substitute for a professional translator. There will definitely be people who will want to use this incorrectly, even if they mean well by doing so. If it can't manage the indefinite article 'n, why should we trust it with our marketing material?
  • It doesn't matter all that much how good or bad this is. Because it carries the Google name, and will be integrated with other Google services, it will probably become the machine translation system that people will use.