Archive for the 'Linux' Category

Moving Pootle into the Realm of Social Translation

Wednesday, November 18th, 2009

Since the previous post on integrating translation tools, the functionality of Vertimus has been included in Damned-lies, moving us one step closer to the integration of translation tools outlined in that post. We have also had a discussion with Alaa (who works on Pootle and its support tools) and the guys at the Arab Techies 2009 codesprint. Out of that discussion came the following document, a proposal for moving Pootle into the area of “social translation” as described below. The proposal takes the integration idea from the previous post further and specifies a tool that could support it.

Document:

For small open translation teams like Arabic, the availability and quality of translation tools can greatly increase productivity, quality and translation throughput. Too often, the precious little time these teams have is spent on administrative overhead instead of actual translation.

In a discussion about the Arabic translation process at the Arab Techies 2009 codesprint, it was recognised that adding so-called social networking features to an online translation tool such as Pootle would help the Arabic team tremendously.

Indeed, past experiences, such as Arabic Wikipedia, suggest that such features encourage contributors. An Arabeyes pilot project that used a heavily templated MediaWiki instance to manage the translation of the Arabic Technical Dictionary produced encouraging results. The terms of the dictionary are discussed and voted on online, and a weekly script converts the dictionary into PDF and the gettext po format.

The discussions were sometimes heated and, in Wikipedia style, such discussions often result in the best translations being vetted thoroughly. In the end, the discussions and the collective work add extra credence to the terms, and other teams feel more inclined to use them. The result is greater consistency, especially for smaller languages and languages whose official bodies do not keep up with the pace of new terms and technologies.

Some of the suggestions from the brainstorming discussion at the event are already implemented to a large degree in other translation tools such as Vertimus, Damned-lies and Transifex. The fact that these tools use Python, with most being built on the Django framework, provides an obvious opportunity to integrate them, thus avoiding duplicate work and providing a one-stop, near-complete solution for translation teams.

Pootle would be the centrepiece of such an integration, as it is the component that offers online translation. Supporting some of the following features strongly suggests that Pootle needs to support unit-based translation, possibly backed by a database to store the strings and po files. Moving to such a flexible (if possibly resource-intensive) setup would open up quite a lot of possibilities.

File assignment:

There are several reasons to assign work. A coordinator may trust a particular contributor to know best how to translate a module (or even a set of strings from a module) because they translated it before. Assignment can also give translators a sense of “module ownership” that motivates them to complete it. A particular translator may be called upon to voice an opinion on a particular string. Most often, though, it is simply the desire to minimise duplication of work and to ensure an even distribution of effort across a number of modules.

With module or unit set assignment, a coordinator may assign a module to a translator for completion. A translator may also attempt to claim or ask to claim a module.

This feature is implemented to a large degree in Vertimus, except for unit set assignment. (I believe Pootle now also supports this to a large degree.)

Translation discussion and voting:

Each unit should have a discussion space attached to it. Such a space will allow translators to explain the reasoning behind their translations should a disagreement arise.

The traditional translation string input box should be augmented with the possibility to vote for an existing translation or to suggest a new one. It should also be possible for a coordinator to “override” a vote if it is deemed necessary.
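To make the voting idea a little more concrete, here is a minimal sketch in Python. It is not how Pootle works; the function name `winning_translation`, the shape of the `votes` mapping and the sample names are all made up for illustration. It simply counts the votes for one unit and lets a coordinator's override win outright.

```python
from collections import Counter
from typing import Dict, Optional


def winning_translation(votes: Dict[str, str],
                        override: Optional[str] = None) -> Optional[str]:
    """Pick the translation for a single unit.

    votes maps a (hypothetical) translator name to the translation they
    voted for. override is the coordinator's decision, if there is one.
    """
    if override is not None:
        # The coordinator's choice wins regardless of the tally.
        return override
    if not votes:
        return None
    tally = Counter(votes.values())
    # most_common(1) returns the translation with the highest count;
    # on a tie, the translation that was seen first is returned.
    best, _count = tally.most_common(1)[0]
    return best


# Made-up sample data: three translators voting on one English string.
votes = {"amal": "ملف", "badr": "ملف", "chadi": "مستند"}
print(winning_translation(votes))
print(winning_translation(votes, override="مستند"))
```

A real tool would also have to decide who may vote and how ties are settled; this sketch leaves both questions open.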

Quality Assurance:

Some translators may be promoted to the status of “reviewer”. Such users would be able to mark strings as “reviewed”. The aim of a translation team is thus to make a first pass to translate a module and a second pass to review every translation.

Deadline management:

A big nuisance for small translation teams is deadline management. Currently, each team has to “shop around” to make sure that it is up to date with each project prior to its release. Pootle can attach release dates to each module once, and they would be shown for every language. Modules can thus be prioritised for translators based on their deadlines.
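As a rough illustration of deadline-based prioritisation, the sketch below sorts modules that still have untranslated strings by their release date. The `Module` class, its fields and the dates are invented for the example and do not correspond to any existing Pootle data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Module:
    name: str
    deadline: date       # release date attached to the module (assumed field)
    untranslated: int    # number of strings still to translate


def prioritise(modules: List[Module]) -> List[Module]:
    """Modules with work left, nearest deadline first.

    When two modules share a deadline, the one with more untranslated
    strings comes first.
    """
    pending = [m for m in modules if m.untranslated > 0]
    return sorted(pending, key=lambda m: (m.deadline, -m.untranslated))


# Made-up sample data.
modules = [
    Module("gedit", date(2010, 3, 1), 40),
    Module("epiphany", date(2010, 1, 15), 12),
    Module("ekiga", date(2010, 1, 15), 90),
    Module("file-roller", date(2009, 12, 1), 0),
]
for m in prioritise(modules):
    print(m.deadline, m.name, m.untranslated)
```

Fully translated modules drop out of the list, so a translator only sees what still needs attention.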

Branch management:

It can be the case that a module has several branches, often active at the same time. The tool should thus keep all of those branches live for translation and ideally share translations between them for whatever strings they have in common.
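Sharing translations between branches could look roughly like the sketch below, assuming each branch is represented as a plain mapping from source string to translation, with an empty string meaning "untranslated". That representation and the function name `share_translations` are assumptions made for this example only.

```python
from typing import Dict


def share_translations(source: Dict[str, str],
                       target: Dict[str, str]) -> Dict[str, str]:
    """Fill untranslated units of target from source.

    Only strings present in both branches are touched, and existing
    translations in target are never overwritten.
    """
    merged = dict(target)
    for msgid, msgstr in source.items():
        if msgid in merged and not merged[msgid] and msgstr:
            merged[msgid] = msgstr
    return merged


# Made-up sample data: a stable branch and a newer development branch.
stable = {"Open": "افتح", "Save": "احفظ", "Print": ""}
master = {"Open": "", "Save": "", "Export": ""}
print(share_translations(stable, master))
```

Strings that exist in only one branch ("Print" and "Export" here) are left alone, so each branch keeps its own set of units.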

Unit level permissions:

It would be beneficial to be able to “lock” certain units within a module, or a whole module, from, say, anonymous translators, translators who are not module owners, or all translators (in the case of, say, translator-credits-like strings).

Statistics:

Pootle can be augmented with extra statistics that include user level statistics, module/project statistics, progress reports and even “fun” comparisons between teams and projects.

User and Team widgets:

A feature that can motivate individual users and teams to contribute more is user widgets. A blogger may, for example, display a widget on a sidebar that shows quick statistics for the team or the translator, such as the projects they contributed to, the number of strings they translated, etc.

Bringing all translation management tools together

Friday, December 12th, 2008

I am dreaming about a single, first-stop, next-gen (buzzword alert) website for all Open Source internationalisation, complete with raw material downloading, translation allocation, reviewing and quality assurance, statistics and error reporting, web based translation, downloading translation files, uploading translation files, and committing translation files. But before letting you know about my complete dream, let me preface it by stating what exists out there.

There are literally thousands of Open Source projects in various programming languages, from numerous developers and websites. Translation teams have come up with different ways of dealing with this variety to minimise time and maximise efficiency and translation reuse. Here are but a few software packages that deal with this chaos:

* Damned lies (as in lies, damned lies and statistics):

Available at: http://l10n.gnome.org

A statistics website that is largely passive. It reports translation statistics for GNOME (and Fedora), and has the most comprehensive reporting out there: it will tell you if you have errors in your po files, it will list images that you need to capture for documentation, and it will complain if you didn’t enable your language in the sources (LINGUAS). It supports multiple version control systems. It still remains a passive tool where translators can only look at statistics and cannot upload or change anything directly through it.

* Vertimus

Available at: http://gnomefr.traduc.org/suivi/

A file allocation and reviewing tool, used by the French GNOME team. This handy tool assigns multiple roles to users. Users can be assigned PO files that they can work on off-line. Uploaded translations can be reviewed as necessary. The project maintainer/coordinator has to commit files separately.

* Transifex

Available at: https://translate.fedoraproject.org/submit/

A system for submitting translations to upstream projects. It aims to support as many back-ends as possible. Coordinators can submit a translation, which is then passed on to upstream using an account created by Transifex. Projects thus have to be subscribed to benefit from this.

* Pootle

Available at: http://translate.sourceforge.net/wiki/pootle/index

Pootle is a web based translation tool. Behind Pootle is a powerful Python API for reading and writing translation files in multiple formats. Pootle’s interface, however, leaves a lot to be desired, and its translations are not strictly version controlled (I could be wrong, but damage can sometimes be irreversible). It is file based, so it can be really slow for big files.

* open-tran

Available at: http://open-tran.eu/

Open-tran aggregates strings from hundreds of packages and offers suggestions for strings that you search on it. It offers a web API so you can query for strings from your application if you want to.

* Rosetta

Available at (no source): https://launchpad.net/rosetta

Rosetta is probably the software with the most features, and it still falls short in many places. It is a web based translation tool, with support for statistics and translation suggestions. It is not as good as Vertimus in its user and QA management, and again it can only deal with projects that are subscribed to it. Plus, its awkward status as a middleman between many upstream projects and Ubuntu limits it greatly. It is also greatly limited by the fact that it is closed source.

There are also other in-development projects, such as Mozilla’s Verbatim.

Obviously, these solutions approach different, and sometimes the same, problems in different ways, each with its own login system, its own database (if any) and, in effect, its own website. In the end, if a translation team wants package allocation, web based translation and a good commit back-end, it has to go to three different places, on top of everything that is usually provided upstream (KDE’s online statistics, Debian’s, etc.). And that would still not be enough, because some of these tools only offer access to a small subset of packages.

—-

Now, my dream is a website that integrates the functionality of all of these tools into one inclusive web tool that offers all of these services. It is inclusive in that projects can be subscribed with or without upstream’s help (e.g. Pootle). It is up to the language coordinators to decide whether they want to do their translation using the tool (e.g. Pootle). The tool can commit on behalf of the coordinators with its own commit credentials, or with coordinator-supplied ones if necessary (e.g. Transifex, partly). Translators can be allocated packages to translate, with complete quality assurance and reviewing steps (Vertimus). The tool displays comprehensive statistics and error messages about translation files if necessary (Damned-lies). The tool offers web based translation (e.g. Rosetta), with the option for translators to download files and translate them locally if they wish. The tool offers suggestions from a wide variety of packages (open-tran), with an option to enforce a per-language terminology dictionary. Packages can be opened up for free-for-all translation, or restricted so that only users of team X can translate/review them. Open packages may invite inexperienced translators and inconsistency, so they should be reviewed thoroughly.

Plugins can be written to compile and serve translations. In fact, the website should offer an API with good extensibility: the API offers downloading files, setting up a local repository if you need to, serving statistics, committing files, etc. Thus, it should be easy to circumvent the whole tool altogether and do things the old way if a translator or team so desires.

The website can be augmented by web 2.0 (err, another buzzword) features. Meta statistics can be posted (User X did y translations, z reviews. Language team A did y strings, etc.). This stuff may look juvenile but can be great fun, and it encourages competition between teams (just look at what Wikipedia statistics do). Notes or even threads can be attached to packages or strings if necessary.

I believe such a tool would be an empowering tool for translators, especially for languages with small teams. For many teams, the same people more or less translate most open source packages, so such a tool will be of great help. It would greatly help consistency and make translators work on actual translations instead of back-end coordination. Teams would no longer have to replicate their setup across many projects.

I largely steered away from implementation details as I would like to discuss the idea first. I believe the previously mentioned projects address many of the individual features separately, so they are certainly demonstrably doable. What remains is just to bring the lot together and integrate it.

Any comments are welcome. I will post this content on various relevant mailing lists in a few days as well.

Interview with Leonardo Fontenelle

Tuesday, November 27th, 2007

A while back I was interviewed by Leonardo Fontenelle (an active Free Software l10n contributor from Brazil), and the interview is worth mentioning here. In it, I described many aspects of our work in Arabic translation and Arabeyes. Here I quote some passages:

On the open nature of the translation process:

I guess I don’t like some particular phase per se, but all in all, I very much adore the open nature of it. Right from the source code to the actual compiled message catalogues. You can’t really get much more open than this. This openness really pays off when translating weird messages, when trying to view translations live, when comparing with translations of other packages, when viewing translations of different languages, etc.

On the Technical Dictionary:

The technical dictionary is basically an English-Arabic dictionary for computing terms. At first, we started making it with .po files, but that created many problems with versioning and discussing the terms. So I had the idea of uploading the terms to our Wiki. I used some scripts to convert the .po files to Wiki xml input. The wiki, being open, allows people to edit as they see fit, discuss terms, suggest alternatives, etc. Then finally, there are some scripts that take the wiki pages and convert them back to .po files, as well as .pdf suitable for printing/reading. The experience was very rewarding to us.
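As a rough illustration of the last step in that passage (turning dictionary terms back into a .po file), here is a small Python sketch. It is not the Arabeyes script: the function names and the sample term are invented, and it assumes the terms have already been extracted from the wiki pages as (English, Arabic) pairs. Only the basic gettext entry layout (`msgid` / `msgstr` with escaped quotes) is taken as given.

```python
from typing import Iterable, Tuple


def po_escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a po string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def terms_to_po(terms: Iterable[Tuple[str, str]]) -> str:
    """Build the text of a minimal po file from (english, arabic) pairs."""
    lines = [
        # Header entry: an empty msgid whose msgstr carries the charset.
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for english, arabic in terms:
        lines.append(f'msgid "{po_escape(english)}"')
        lines.append(f'msgstr "{po_escape(arabic)}"')
        lines.append("")
    return "\n".join(lines)


# Made-up sample term.
print(terms_to_po([("file", "ملف")]))
```

In practice a library such as the Translate Toolkit behind Pootle would be a better fit than hand-written formatting; the sketch only shows how little structure a dictionary entry needs.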

On contributors:

For Arabeyes, we are forever in need of contributors. We do think of lots of ideas, but we always hit the shortage of manpower wall. We’d like to see Arabic support addressed in all popular OSS applications. We’d also like to develop a free Arabic OCR application and an automatic translator. This is short term, but the long term list is a big one.

Please read the interview here. It has also been translated into Portuguese, thanks to Leonardo.

Part 1.

Part 2.

It has been a long time - some explanations

Friday, July 13th, 2007

I know that it has been a long time since I showed up here! Unfortunately, I have been way too busy over the last few months: the end of my final university year (I’m graduating today!), job hunting, some vacation, and going for Umra. I will be back in a few weeks inchallah.

Apologies to all concerned, and to all those who have sent me emails and comments. Rest assured that I still have your messages and that I will take the time to go through them in detail. Minbar 2.0 is missing only a few more tweaks, and then it will be ready. Also, salam to all those at Arabeyes! I continue to monitor the Wiki, and I am always amazed by the ongoing participation.

Soon inchallah.

tclgeoip 0.2

Sunday, April 15th, 2007

I would like to announce a new version of tclgeoip, the TCL extension for GeoIP.

Amongst the changes in 0.2:

  1. Fixed a segfault when loading databases.
  2. Introduced a new function to check for the presence of databases (db_avail).
  3. Updated documentation.

Grab the .tar.gz sources here. The sources are also available from the SVN repository.

A Debian package is being cooked.

Arabic Gnome Making Progress

Saturday, November 11th, 2006

The Gnome Arabic Team is making serious progress to complete Gnome 2.16 translation. We are a bit late due to lack of contributors, but many joined the team recently, so better late than never.

Arabic is a beautiful language, as I’m sure you will agree after looking at these screenshots of translated applications. Thanks to all those who helped: Khaled Hosny, Youcef Raffah, Youssef Chahibi, Mohamed Magdy, and all those who translated previous versions: Arafat Medini, Bayazidi, and a lot more. I would like to take this chance to extend my invitation to all past contributors and new members: you are always welcome, and there is a lot you can do to help the effort, as a lot of people are waiting for the release. If you would like to help, please have a look at this roadmap, then email me at djihed at djihed.com.

Now, here are the screenshots, click to enlarge:

Gedit: Now becomes the best Arabic Linux Text editor:

Arabic Gedit

Epiphany: The gorgeous Gnome Internet Browser:

Arabic Epiphany

Ekiga: the Internet telephony software:

Arabic Ekiga

Eye of Gnome: The image viewer:

Arabic Eye of Gnome

And last but not least, file-roller, the archive manager:

Arabic File Roller

tclgeoip: GeoIP TCL extension

Tuesday, September 12th, 2006

GeoIP is a technology developed by MaxMind for geographical and organisational lookup of IP addresses and hostnames. MaxMind provides an LGPL licensed API written in C. The API pulls the data from database files. MaxMind gets its income by selling databases that are updated at frequent intervals.

Many packages have been built on top of the C API to allow GeoIP calls from different languages, such as PHP (frequently used in CMSs), and there is even an Apache module.

I wrote a TCL extension on top of this C API. It has been accepted by MaxMind and is sitting here. It currently compiles without tweaking on Solaris systems. There will be an update soon to make it compile on most other UNIX systems automagically.

Note that you need Tcl dev files and the GeoIP C API to compile this.

The bash clown prompt

Friday, September 8th, 2006

My favourite UNIX shell is bash, or the Bourne Again Shell. Its syntax is neat and clean (I find it cleaner than most other shells), and it is available on most UNIX variants. Its customisability is also very good. I’ve customised the default prompt into a colourful, joyful and informative message. I call it the clown prompt. Here is a screenshot of how it looks in PuTTY. Note that I have slightly changed the blue colour in PuTTY to make it more readable.

Clown Prompt on Putty

To make it the default prompt, copy these lines to the end of your ~/.bashrc file. If there is not one, create it.

# Term settings
TTYTEMPNAME=$(tty)
CHOMPED=${TTYTEMPNAME:5}
PS1="\[\033[1;33m\][\[\033[1;32m\]\t\[\033[1;33m\]]\
\[\033[1;33m\][\[\033[1;31m\]\u\[\033[1;33m\]:\
\[\033[1;31m\]\h\[\033[1;33m\]]\
\[\033[1;33m\][\[\033[1;36m\]$CHOMPED\
\[\033[1;33m\]:\[\033[1;36m\]\#\[\033[1;33m\]]\
\[\033[1;33m\][\[\033[1;35m\]\w\[\033[1;33m\]]\
\[\033[1;34m\]#\[\033[0m\] "

The first two lines grab the name of the current tty and strip its first five characters (the leading /dev/). Line 3 prints the time, lines 4 and 5 print the username and the name of the machine, lines 6 and 7 print the name of the tty along with the command number, and line 8 prints the current working directory. I find it valuable to have all of this information at hand. To apply the changes, source your ~/.bashrc file by issuing: source ~/.bashrc

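For peace of mind, insert this line into your ~/.bashrc file, after the previous code. It makes bash re-check the window size after each command, which solves a problem where long lines would wrap back onto the current line instead of continuing on the next one.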

shopt -s checkwinsize