Trimming VLC translation

As part of the ANLoc sub-project to localise some software, we decided to tackle VLC. It is a popular multimedia player, and we wanted the localisation teams to work on it. However, it is a difficult localisation project and really big, weighing in at over 33 000 words. We wanted to trim it down to something manageable, and I thought I would write a bit about the approach.

An approach I have used in the past to trim big localisation tasks is to use pogrep to extract the short strings, or to eliminate certain things. An initial attempt at removing the long strings got it down to under 6000 words. This seems like a huge saving. However, one has to test the translation with podebug in the running application to see whether the coverage is really good. A quick test indicated that we were missing some important strings longer than 5 words, and at 5700 words we were already translating a lot. I thought we needed to help the teams much more than this.

I also saw language names in the reduced file, which I don't think are meaningful to translate. They are a chore to translate, are probably only a feature for subtitles, and probably won't include our African languages.

Looking at the location comments (lines starting with #:), I guessed that everything in modules/gui/ is important, so we just need to check the number of words (it is almost 6000). We can cut out the strings from modules/gui/ncurses.c, which is not a graphical user interface. That cuts it down to about 5700 words, which corresponds to the amount of work in the original reduced file.

Many of the tooltips seem to come from src/libvlc-module, which would have been really nice to do, but that is too much (about 5800 words), so I decided to leave them out. Some menus are left untranslated with my method, since they come from include/vlc_intf_strings.h. These are only about 300 words, but this pushes us over 6000. I thought they were important, so I added them.
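The selection by location comment can be sketched in a few lines of Python. This is only an illustration of the idea, not the exact commands I ran: the sample entries, the line numbers and the modules/gui/example/ path are made up, and the parser assumes entries are separated by blank lines.

```python
# Keep only PO entries whose "#:" location comments point at the files we want.
# The include/exclude paths come from the text above; the sample data is invented.
INCLUDE = ("modules/gui/", "include/vlc_intf_strings.h")
EXCLUDE = ("modules/gui/ncurses.c",)

SAMPLE_PO = '''\
#: modules/gui/example/menus.cpp:120
msgid "Open File"
msgstr ""

#: modules/gui/ncurses.c:88
msgid "Press any key"
msgstr ""

#: include/vlc_intf_strings.h:40
msgid "Play"
msgstr ""

#: src/misc/example.c:300
msgid "Enable audio"
msgstr ""
'''

def locations(entry):
    """Return the file paths named in an entry's '#:' lines, without line numbers."""
    found = []
    for line in entry.splitlines():
        if line.startswith("#:"):
            found.extend(ref.rsplit(":", 1)[0] for ref in line[2:].split())
    return found

def wanted(entry):
    """An entry is kept if at least one location is included and not excluded."""
    return any(
        loc.startswith(INCLUDE) and not loc.startswith(EXCLUDE)
        for loc in locations(entry)
    )

def filter_po(text):
    """Split the PO text on blank lines and keep the wanted entries."""
    entries = [e for e in text.strip().split("\n\n") if e.strip()]
    return [e for e in entries if wanted(e)]

kept = filter_po(SAMPLE_PO)
print("\n\n".join(kept))
```

A string that is used in both the ncurses interface and a graphical one is kept, because one of its locations still qualifies.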

If we now take out the long strings (more than 20 words), we end up with a file of only 4300 words to translate. If we take out all strings longer than 10 words, we only have 3400 words. So we can decide how much we feel people can manage.
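To make the word-count cut-off concrete, here is a rough Python sketch of the idea. It is an assumption about how one could do it by hand rather than what pogrep does internally: the sample strings are invented, and the word count is a simple whitespace split of the msgid, which will not match the counts of other tools exactly.

```python
# Drop entries whose source string (msgid) is longer than a chosen number of words.
# The sample entries are invented; real counts come from the real VLC PO file.
ENTRIES = [
    'msgid "Open File"\nmsgstr ""',
    'msgid ""\n'
    '"Select the folder where recorded streams and snapshots "\n'
    '"will be saved by default"\n'
    'msgstr ""',
    'msgid "' + " ".join(["word"] * 25) + '"\nmsgstr ""',
]

def msgid_text(entry):
    """Join the quoted pieces of an entry's msgid, which may span several lines."""
    parts = []
    in_msgid = False
    for line in entry.splitlines():
        if line.startswith("msgid "):
            in_msgid = True
            line = line[len("msgid "):]
        elif not line.startswith('"'):
            # Any other keyword (msgstr, msgid_plural, comments) ends the msgid.
            in_msgid = False
        if in_msgid:
            parts.append(line.strip()[1:-1])  # strip the surrounding quotes
    return "".join(parts)

def word_count(entry):
    """Count whitespace-separated words, treating an escaped newline as a space."""
    return len(msgid_text(entry).replace("\\n", " ").split())

def trim(entries, max_words):
    """Keep entries with at most max_words words in the source string."""
    return [e for e in entries if word_count(e) <= max_words]

for limit in (20, 10):
    kept = trim(ENTRIES, limit)
    total = sum(word_count(e) for e in kept)
    print(f"limit {limit}: {len(kept)} strings, {total} words")
```

Running the same loop over the real file with a few different limits gives the numbers needed to decide how much the teams can manage.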

This leaves out useful things like the basic help screen (one very long string) and some settings in the preferences, so we could decide whether we want to add those by hand. As a pleasant side effect, it seems that we also cut out some nasty technical terms, although I'm sure there are still many left.

Reducing the translation to about 10% of its original size means:

  • Teams have far less to do.
  • We can be reasonably sure that we are doing the most important 10% of the application (although we are still missing some useful things).
  • Teams avoid many of the hidden settings that probably have some of the most difficult terminology, which saves even more time.
  • Teams will get something that feels like a translated application reasonably soon, which provides a lot of inspiration. This also means that localisation teams can move on to another app if they want to without leaving VLC in a horribly incomplete state.

Comments

Messages should be prioritized

It would help a lot if the translation infrastructure could understand some sort of prioritization. If you translate the config stuff, you end up wasting a lot of time on something that probably < 0.1% of users are ever going to look at. If you don't translate it, the stats don't work, so you can't easily track whether you're missing any messages.

Unfortunately, gettext and friends don't seem to be getting a lot of love these days.
