libtranslate, TM plugins and Virtaal
Someone asked on the translate-devel list about integrating something like libtranslate into Virtaal. What a great idea! Except it's already done, and then I realised I hadn't told anyone about it outside of the devel lists. So let me tell you what I did.
Virtaal 0.3, due in a few days (maybe hours), adds a plugin framework, thanks to Walter, that allows various features to be coded as plugins, including of course translation memory (TM). Walter and Alaa created some plugins that can access open-tran.eu and our own TM server. I was curious and wondered how hard it would be to add others, so I added two:
- TinyTM - a TM solution that we stumbled upon. Although they talk about linking with OmegaT and others, on careful reading it turns out we're probably the first CAT tool client to be able to connect with TinyTM.
- libtranslate - this uses the libtranslate library to query various online machine translation tools to get potential translations.
It took me a weekend to do this. Most of the time was consumed finding and fixing libtranslate compile issues to allow it to be used by Python's ctypes module. I spent very little time doing the actual implementation. With my TinyTM implementation I spent most of the time working out which Python module I should use for PostgreSQL access.
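For readers who haven't used ctypes, loading a C library from Python looks roughly like the sketch below. This is an illustration only, not the plugin's code: the helper name load_shared_library and the library name "translate" are assumptions, and no libtranslate functions are called because their signatures aren't covered in this post.

```python
import ctypes
import ctypes.util
from typing import Optional


def load_shared_library(name: str) -> Optional[ctypes.CDLL]:
    """Try to locate and load a shared library; return None if unavailable."""
    # find_library searches the platform's usual library locations.
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The file was found but could not be loaded (e.g. broken build).
        return None


if __name__ == "__main__":
    # "translate" is an assumed library name, used here for illustration.
    lib = load_shared_library("translate")
    print("library loaded" if lib is not None else "library not available")
```

Returning None rather than raising lets a plugin disable itself quietly on machines where the library isn't installed.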
libtranslate: Many people have got and continue to get quite excited about this. Me, I'm not overly excited. I think it will prove useful, but only as useful as the actual machine translations that are provided. How good are the machine translations? My guess is not that good. For a new translator we run the risk that they accept the suggestion blindly. For an experienced translator we may find that they can type faster than we can get the MT result. What I think is useful is that it can give someone a push in the right direction, so if you really don't know how to translate the segment, you might get enough information to type your own translation. Time will tell.
TinyTM: Nice name. They use PostgreSQL's built-in Levenshtein distance module with some nice recursion ideas to speed things up. Potentially they can add other text matching ideas. Personally I think the TM server we've built for Virtaal will give better results in the long run and be less database dependent, although we already use FTS on SQLite.
open-tran.eu: I didn't work on this module but I think it's worth talking a little about what makes the Virtaal plugin different from everyone else's. Some background for those who don't know about this resource: Open-tran.eu is a repository of all translations in the FOSS world (maybe not all, but everything you'd care about). They use the Translate Toolkit for some manipulation into PO files, so they must be nice guys! You can query from the web and they also expose an XMLRPC interface for querying the database. Think of it as having an absolutely massive TM of all FOSS in all possible languages at your fingertips - neat.
So we had to have a plugin to this resource. Except we didn't quite like the results. Open-tran does word-based lookups (anything else would probably be too expensive on their server), and this results in some matches that are just way off and not useful. Our solution was to mix the open-tran results with a bit of Levenshtein distance. In layman's terms, we get quick results from open-tran for highly probable matches, we then reduce that list through Levenshtein distance matching, and we pass the results with the best final scores to you, the user of Virtaal. We let open-tran serve us without extra overhead and rather let the client software do the hard crunching. My feeling is that it's open-tran done right.
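The client-side filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the Virtaal plugin's code: the function names, the 70% threshold, the result limit and the sample (source, target) pairs are all assumptions, and the candidates are passed in as a plain list rather than fetched from open-tran.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance expressed as a 0-100 score (100 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def rerank(query, candidates, min_similarity=70.0, max_results=5):
    """Keep only candidates whose source text is close to the query.

    `candidates` is a list of (source, target) pairs, standing in for
    whatever a word-based lookup service returned (assumed shape).
    """
    scored = []
    for source, target in candidates:
        score = similarity(query, source)
        if score >= min_similarity:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:max_results]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    hits = [("File", "target A"), ("Open files", "target B"),
            ("Open file", "target C")]
    for score, source, target in rerank("Open file", hits):
        print(f"{score:5.1f}%  {source!r} -> {target!r}")
```

The point of the design is that the expensive comparison runs only over the short list the server already returned, so the server stays cheap and the client does the crunching.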
- dwayne's blog

There is a lot of options
Hi,
I see there are a lot of options for TM servers. I think it is good to have a bunch of options, but on the other hand it is confusing for translators like me.
What we need is software that could:
Maybe I forgot some option.
When we have this software maybe we will put it on our own dedicated server for avoiding problems like the slow replies from the server...
Bye,
Leandro Regueiro
Hi, will update my first post
Hi, will update my first post with some new ideas:
I see there are a lot of options for TM servers. I think it is good to have a bunch of options, but on the other hand it is confusing for translators like me.
What we need is software that could:
These last three options are very useful when discussing consistency and making glossaries (a much-discussed topic in the Galician team).
There are a lot of programs intended to do this. One I have heard a lot about is Tumatxa http://www.tumatxa.com/
Maybe I still forgot some option.
When we have this software maybe we will put it on our own dedicated server for avoiding problems like the slow replies from the server...
Bye,
Leandro Regueiro
Re: Updated post
Hi Leandro, nice link. I've never seen Tumatxa before.
I've found that most advances we've seen in TM servers have been demand driven, i.e. there is a tool that can use the results.
I think you must be careful not to mistake any confusion you have using Virtaal's first TM deliverable for long-term confusion. Adding the ability to prioritise TMs, select TMs for a task and show the sources of TM suggestions would eliminate much of what I think are your concerns in this area.
Yours is a long list; it would take a number of programmers a long time to implement it all. So what would you regard as the one thing that would make our TM server better? It can already do much of what you require, so what would be the next thing?
I think first is 2) then 3)
I think first is 2) then 3) and perhaps then 6).
For 2) I suggest you create some easy documentation for the developers and then send the link to all of them and/or include it in their Bugzillas.
3) doesn't need more explanation. Perhaps it would be a good idea to test whether the different CAT tools can open and parse the generated files.
For 6) I suggest parsing .mo files too, so they can be added to the TM.
I was thinking about it: the strings are obtained from files, but a file can belong to more than one domain. In the case of Gimp, for example, the domain could be both Image and Gnome. So the TMs could be treated as "tags" (like tags in Gmail), where a file could have several tags but at least one. (Say TM instead of tag if that is easier to understand.)
Bye,
Leandro Regueiro
Re: There is a lot of options
Hi Leandro,
It's only confusing until we sort out the source issue. With that done I don't think it will be at all confusing.
Your wish list:
1) We have that already; see tmserver in the Translate Toolkit.
2) Virtaal does that with the tmserver over a simple URL scheme. Anyone can do that with a web or desktop CAT tool.
3) Yes, that is quite important for offline work. We don't do that now.
4) We can do that with the tmserver loader. Or you can open the files in Virtaal, make a small change and save, and it will push everything into the tmserver.
5) Getting a system that allows that seamlessly across all those projects is something I'd also like to see, but it might take more than the 3 years we've been working towards that with Pootle.
6) That would be useful, and the potential is there in Pootle. Can you code in Python? ;)
Well, most of what you want is there; some will take a little more work. Feel free to recruit programmers to help. We're open source and love to have more hands to help us reach our goals.
Libtranslate not dangerous
I don't think libtranslate will be dangerous - the plugin isn't enabled by default, so at the moment the user has to actively enable it. While people are still improving the packaging status of libtranslate, few people will get access to this in the meantime.
Currently it also doesn't show a score, which should already give a hint that there is a difference between libtranslate and the other suggestions.
Incognito
Hi Dwayne,
Everything you've described here is very exciting and it's a pleasure to see how fast Virtaal comes up with all kinds of suggestions.
The only downside is that it doesn't show WHERE these suggestions come from (0.3 rc1 of 23/01/2009 on Windows), which robs the user of the possibility to a) check whether all the services or plugins (libtranslate, TinyTM and others) work and b) decide whether or not a suggestion is useful or trustworthy.
Thumbs up to everyone at Translate.org.za (and everyone else who's developing Virtaal, Pootle and the Translate Toolkit)! If only everyone was as passionate as you about the things they do.
Re: where do suggestions come from
Thanks for the cred :)
The WHERE issue. In this phase we simply wanted to add some TM functionality. How we presented it and how the translator worked with it was more important than the TM backends. When we built the plugin system it was so easy to add TM plugins that we shot from the planned 1 to 5 TM backends! So, as you can see, initially WHERE was not going to be an issue.
There is a preliminary patch but the consensus was to leave it out until we can do it properly.
Feel free to voice your concerns on #pootle to see if we should do something before 0.3. Or add some comment to bug #812.