Djihed Afifi

Bringing all translation management tools together

12th December 2008

I am dreaming about a single, first-stop, next-gen (buzzword alert) website for all Open Source internationalisation, complete with raw material downloading, translation allocation, reviewing and quality assurance, statistics and error reporting, web-based translation, and downloading, uploading and committing translation files. But before describing the complete dream, let me first state what already exists out there.

There are literally thousands of Open Source projects in various programming languages, from numerous developers and websites. Translation teams have come up with different ways of dealing with this variety to minimise time and maximise efficiency and translation reuse. Here are but a few software packages that deal with this chaos:

* Damned lies (as in lies, damned lies and statistics):

Available at: http://l10n.gnome.org

A statistics website that is largely passive. It reports translation statistics for GNOME (and Fedora), and has the most comprehensive reporting out there: it will tell you if you have errors in your PO files, it will list images that you need to capture for documentation, and it will complain if you didn’t enable your language in the sources (LINGUAS). It supports multiple version control systems. It still remains a passive tool: translators can only look at statistics and cannot upload or change anything directly through it.

* Vertimus

Available at: http://gnomefr.traduc.org/suivi/

A file allocation and reviewing tool, used by the French GNOME team. This handy tool assigns multiple roles to users. Users can be assigned PO files that they can work on off-line. Uploaded translations can be reviewed as necessary. The project maintainer/coordinator has to commit files separately.

* Transifex

Available at: https://translate.fedoraproject.org/submit/

A system for submitting translations to upstream projects. It aims to support as many back-ends as possible. Coordinators can submit a translation, which is then passed on to upstream using an account created by Transifex. Projects thus have to be subscribed to benefit from this.

* Pootle

Available at: http://translate.sourceforge.net/wiki/pootle/index

Pootle is a web-based translation tool. Behind Pootle is a powerful Python API for reading and writing translation files in multiple formats. Pootle’s interface, however, leaves a lot to be desired and its translations are not strictly version controlled (I could be wrong, but damage can sometimes be irreversible). It is file based, so it can be really slow for big files.

* open-tran

Available at: http://open-tran.eu/

Open-tran aggregates strings from hundreds of packages and offers suggestions for strings that you search on it. It offers a web API so you can query for strings from your application if you want to.

* Rosetta

Available at (no source): https://launchpad.net/rosetta

Rosetta is probably the software with the most features, and it still falls short in many places. It is a web-based translation tool, with support for statistics and translation suggestions. It is not as good as Vertimus in its user and QA management, and again it can only deal with projects that are subscribed to it. Plus, its awkward status as a middleman between many upstream projects and Ubuntu limits it greatly. It is also greatly limited by the fact that it is closed source.

There are also other in-development projects such as Verbatim of Mozilla.

Obviously, these solutions tackle overlapping problems in different ways, each with its own login system, its own database (if any) and, in effect, its own website. In the end, if a translation team wants package allocation, web-based translation and a good commit back-end, it has to go to three different places, on top of everything that is usually provided upstream (KDE’s online statistics, Debian’s, etc.). And even that would not be enough, because some of these tools only offer access to a small subset of packages.

---

Now, my dream is a website that integrates the functionality of all of these tools into one inclusive web tool. It is inclusive in that projects can be subscribed with or without upstream’s help (e.g. Pootle). It is up to the language coordinators to decide whether they want to do their translation using the tool (e.g. Pootle). The tool can commit on behalf of the coordinators with its own commit credentials, or with coordinator-supplied ones if necessary (e.g. Transifex, partly). Translators can be allocated packages to translate, with complete quality assurance and reviewing steps (e.g. Vertimus). The tool displays comprehensive statistics and error messages about translation files if necessary (e.g. Damned Lies). The tool offers web-based translation (e.g. Rosetta), with the option for translators to download files and translate them locally if they wish. The tool offers suggestions from a wide variety of packages (e.g. open-tran), with an option to enforce a per-language terminology dictionary. Packages can be opened for free-for-all translation, or restricted so that only users of team X can translate/review them. Open packages may invite inexperienced translators and inconsistency, so they should be reviewed thoroughly.
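To make the allocation, review and commit steps a little more concrete, here is a minimal Python sketch of how a package might move through such a workflow and who would be allowed to translate it. Every name in it (`Package`, `State`, `TRANSITIONS`, `can_translate`, `advance`) is hypothetical and invented for illustration; it is not how any of the tools above work, and the set of states is only an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class State(Enum):
    """Hypothetical life cycle of one package in one language."""
    UNASSIGNED = "unassigned"
    TRANSLATING = "translating"
    IN_REVIEW = "in_review"
    READY_TO_COMMIT = "ready_to_commit"
    COMMITTED = "committed"


# Assumed transitions: a rejected review sends the file back to translation,
# and a committed package can be reopened when upstream strings change.
TRANSITIONS = {
    State.UNASSIGNED: {State.TRANSLATING},
    State.TRANSLATING: {State.IN_REVIEW},
    State.IN_REVIEW: {State.TRANSLATING, State.READY_TO_COMMIT},
    State.READY_TO_COMMIT: {State.COMMITTED},
    State.COMMITTED: {State.TRANSLATING},
}


@dataclass
class Package:
    name: str
    open_to_all: bool = False                     # "free for all" translation
    team: Set[str] = field(default_factory=set)   # members allowed otherwise
    state: State = State.UNASSIGNED

    def can_translate(self, user: str) -> bool:
        """Anyone may work on an open package; otherwise only team members."""
        return self.open_to_all or user in self.team

    def advance(self, new_state: State) -> None:
        """Move to new_state, refusing steps that skip review."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
```

The point of the explicit transition table is that a file cannot reach `READY_TO_COMMIT` without passing through `IN_REVIEW`, which is the guarantee an open, free-for-all package would need most.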

Plugins can be written to compile and serve translations. In fact, the website should offer an API with good extensibility: downloading files, setting up a local repository if you need to, serving statistics, committing files, etc. Thus, it should be easy to circumvent the whole tool altogether and do things the old way if a translator or team so desires.
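As a rough illustration of what "an API with good extensibility" could mean, the sketch below shows one possible plugin registry in Python. It is only an assumption about the shape such an API might take: the names `plugin`, `call` and the `"statistics"` action are made up for this example and do not belong to any existing tool.

```python
from typing import Callable, Dict

# Hypothetical registry: maps an action name to the function implementing it.
_PLUGINS: Dict[str, Callable[..., object]] = {}


def plugin(action: str):
    """Decorator that registers a function as the handler for an action."""
    def register(func):
        if action in _PLUGINS:
            raise ValueError(f"action {action!r} is already registered")
        _PLUGINS[action] = func
        return func
    return register


def call(action: str, **kwargs):
    """Dispatch an API request to whichever plugin handles the action."""
    try:
        handler = _PLUGINS[action]
    except KeyError:
        raise LookupError(f"no plugin registered for {action!r}") from None
    return handler(**kwargs)


@plugin("statistics")
def statistics(translated: int, total: int) -> dict:
    """Example plugin: completion figures for one translation file."""
    percent = round(100 * translated / total) if total else 0
    return {"translated": translated, "total": total, "percent": percent}
```

With this shape, a team that prefers the old way could call `call("statistics", translated=45, total=60)` from its own scripts and never touch the web interface; compiling or committing files would simply be further registered actions.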

The website can be augmented by Web 2.0 (err, another buzzword) features. Meta statistics can be posted (User X did y translations, z reviews. Language team A did y strings, etc.). This stuff may look juvenile but can be great fun and encourage competition between teams (just look at what Wikipedia statistics do). Notes or even threads can be attached to packages or strings if necessary.
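The meta statistics mentioned above are cheap to compute once the tool records who did what. The following Python sketch assumes a hypothetical activity log made of `Event` records; the record layout, the `leaderboard` function and the two event kinds are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Event(NamedTuple):
    """One hypothetical log entry: a user handled some strings."""
    user: str
    team: str
    kind: str      # assumed to be "translation" or "review"
    strings: int


def leaderboard(events: Iterable[Event]) -> Tuple[Dict[str, Counter], Counter]:
    """Return per-user counts by kind, and translated strings per team."""
    per_user: Dict[str, Counter] = {}
    per_team: Counter = Counter()
    for event in events:
        per_user.setdefault(event.user, Counter())[event.kind] += event.strings
        if event.kind == "translation":
            per_team[event.team] += event.strings
    return per_user, per_team
```

`per_user["x"]["translation"]` and `per_user["x"]["review"]` give the "User X did y translations, z reviews" line, and `per_team.most_common()` gives the ranking of language teams.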

I believe such a tool would be empowering for translators, especially for languages with small teams. For many teams, more or less the same people translate most open source packages, so such a tool would be of great help. It would greatly help consistency and let translators work on actual translations instead of back-end coordination. Teams would no longer have to replicate their setup across many projects.

I largely steered away from implementation details as I would like to discuss the idea first. I believe the previously mentioned projects have addressed many of the individual features separately, so they are certainly demonstrably doable. What remains is to bring the lot together and integrate it.

Any comments are welcome. I will post this content on various relevant mailing lists in a few days as well.

6 Responses to “Bringing all translation management tools together”

  1. F Wolff Says:

    Great post! I think I share your dream to a large extent, and agree that we need to build on the strengths of what is out there. I wrote a more complete reply on my blog:

    http://translate.org.za/blogs/friedel/en/content/re-bringing-all-translation-management-tools-together

  2. Dwayne Bailey Says:

    Nice post. I must say you’ve put into words what we’ve been pushing for in Pootle for a long time. Yes, that’s my bias :)

    My dream is to see some sort of DVCS for translation, thus putting translation and translators in the driving seat and allowing them to translate rather than navigate the quagmire that is the many different ways that programmers expect localisers to work with them. That is fine if you do one project, but as you observe, many are doing multiple projects: try Firefox, OpenOffice.org, GNOME and a few others, and no matter what people have done to help you, it still works differently on each system.

    My dream then is to see something where I can choose some place of residence (my team’s Pootle server, my OpenOffice.org Pootle account) and then translate whatever I need to and want to across those, pushing and pulling translations as needed and sharing them back upstream. I have always wanted people to see Pootle as an aid and not a forced process. Thus you use it to augment current and existing processes.

    A heads up on some features that I think you might not be aware of in Pootle and which I think align quite strongly with your vision:
    * Pootle does version control - it’s done it for a long time now and covers all the major VCS and DVCS. So you can commit just like Transifex using a common account.
    * Quality assurance - this for me is missing in your list. Pootle has 45+ QA checks that can be adapted for various languages
    * Stats - they’re pretty good and comprehensive - looking nicer thanks to the Mozilla work
    * Assignment and Goals - Pootle can set up goals and assign people to those goals. So elementary work assignment
    * Suggestions - Joe Random can make suggestions that translators can review thus preventing unadulterated damage to content

    Where we’re headed with Pootle
    * Verbatim - you mention Mozilla’s Verbatim, this is built on Pootle
    * Translate Toolkit - the core of Pootle is the Translate Toolkit and that’s simply going to get more and more powerful
    * Django migration - we’re busy completing a port of Pootle to Django. Interestingly, Pootle, Damned Lies and Transifex all now use Django, so there might be some opportunities to integrate and share.

    For me a dream system would also include these features:
    * The ability to push projects to a server and get completed work back
    * The ability for a translation co-ordinator to set goals and objectives and monitor progress so that including new languages is simple and automatic
    * Information underload. There is an overload of info on l10n. Let’s reduce that by allowing for good communication of good information. E.g. a break in string freeze only really needs to be communicated to people at 100%, since otherwise they might be surprised to see they’re not at 100% on release. Only people who look like they can make the deadline really need to be told that the deadline is looming, etc.

    I could write more.

    But in summary I think we’re seeing the infrastructure coming together nicely and the speed is increasing. 2009 is going to be a great year for localisation infrastructure.

  3. Translation web apps | Leonardo Fontenelle Says:

    […] I just read Djihed Afifi’s article on Damned Lies, Vertimus, Transifex, Pootle and other translation tools, as well as Dwayne Bailey’s comment and Friedel Wolff’s reply. What a coincidence! I just posted an overview of translation web apps (in Portuguese). But then, Djihed’s focus is different; he is more concerned with how much synergy/integration we can have between all these tools. […]

  4. Stéphane Raimbault Says:

    I’m the maintainer of Vertimus. Claude and I have worked together to rewrite Damned Lies in Django, and a full Vertimus will land in SVN soon. Stay tuned!

    http://svn.gnome.org/viewvc/damned-lies/trunk/vertimus/

    PS: I’ve also tried to work with the Transifex project, but in the end I think it’s too difficult to embrace all translation projects (GNOME, Mozilla, etc.) with only one tool.

  5. Trisha Says:

    Have you tried World Server, formerly Idiom, now SDL? If you write good filters you can do just about anything. It is not great with RTL, but…

  6. Linostar Says:

    Great ideas! I hope merging the advantages and features of all these translation tools becomes feasible. I’d like to add some other points regarding monitoring and improving translation quality & automation:

    - Message identification. Each translated message is associated with the name (or id) of its translator. That works together with the following point, which is

    - Translator rating. Each translator has the quality of their translations rated by the team leader. Combined with the previous point, this allows the team leader (or the coordinator or the proofreader) to categorize the translated messages. For example, messages from a highly rated user don’t need to be heavily reviewed, in contrast to those coming from a low-rated translator.

    - Smart auto-completion. When the translation tool finds a message in the translation database that is closely similar to an untranslated message, instead of immediately inserting what was found in the database and marking the message as fuzzy, it can start guessing the complete correct translation by applying certain rules and with the assistance of a dictionary. Furthermore, more intelligent translations can be made by splitting long messages stored in the database (sentences) into small phrases (splitting wherever a punctuation mark is met). That way, if these small phrases (or sub-phrases) are found in another message, part of that message will be automatically translated.

    - Categorizing messages by type of software as well. For example, for a new untranslated message in a web browser, the search in the translation database will take place first in other web browsers’ messages, then in Internet software messages, and finally in the general area’s messages. That should apply especially to short messages (terms), because some of them are tied to the category of the software.

    That’s what crosses my mind now.
