Djihed Afifi

Archive for the 'Arabisation' Category

Moving Pootle into the Realm of Social Translation

18th November 2009

Since the previous post on integrating translation tools, the functionality of Vertimus has been included in Damned-lies, moving us another step towards the integration of translation functionality outlined in that post. Since then, we have had a discussion with Alaa (who works on Pootle and its support tools) and the guys at the Arab Techies 2009 codesprint. Out of the discussion came the following document, a proposal for moving Pootle into the area of “social translation” as described below. The notion takes the integration proposal in the previous post further and gives a specification of a tool that could support it.

Document:

For small open translation teams like Arabic, the availability and quality of translation tools can greatly increase productivity, quality and translation throughput. Too often, the precious little time these teams have is spent on administrative overhead instead of actual translation.

In a discussion about the Arabic translation process at the Arab Techies 2009 codesprint, it was recognised that adding so-called social networking features to an online translation tool such as Pootle would help the Arabic team tremendously.

Indeed, past experiences, such as Arabic Wikipedia, suggest that such features encourage contributors. An Arabeyes pilot project that used a heavily templated MediaWiki instance to manage the translation of the Arabic Technical Dictionary produced encouraging results. The terms of the dictionary are discussed and voted on online, and a weekly script converts the dictionary into PDF and the gettext po format.

The discussions were sometimes heated but, in Wikipedia style, such discussions often result in the best translations being vetted thoroughly. In the end, the discussions and the collective work add extra credence to the terms, and other teams feel more inclined to use them. This results in greater consistency, especially for smaller languages and languages whose official bodies do not keep up with the pace of new terms and technologies.

Some of the suggestions from the brainstorming discussion at the event are already implemented to a large degree in other translation tools such as Vertimus, Damned-Lies and Transifex. The fact that these tools use Python, with most built on the Django framework, provides an obvious opportunity to integrate them, thus avoiding duplicate work and providing a one-stop, near-complete solution for translation teams.

Pootle would be the centrepiece of such an integration, as it is the component that offers online translation. Supporting some of the following features strongly suggests that Pootle needs to support unit-based translation, possibly backed by a database to store the strings and po files. Moving to such a flexible (if possibly resource-intensive) setup would open up quite a lot of possibilities.

File assignment:

There are several reasons to assign work. A coordinator may trust a particular contributor to know best how to translate a module (or even a set of strings from a module) because they have translated it before. Assignment can also give translators a sense of “module ownership” that motivates them to work towards completing it. A particular translator may be called on to voice an opinion on a particular string. Most often, though, it is simply the desire to minimise duplication of work and to ensure an even distribution of effort across a number of modules.

With module or unit set assignment, a coordinator may assign a module to a translator for completion. A translator may also attempt to claim or ask to claim a module.

This feature is implemented to a large degree in Vertimus, except for the possibility of unit set assignment. (I believe Pootle now also supports this to a large degree.)

Translation discussion and voting:

Each unit should have a discussion space attached to it. Such a space will allow translators to explain the reasoning behind their translation should a disagreement arise.

The traditional translation string input box should be augmented with the possibility to vote for an existing translation or to suggest a new one. It should also be possible for a coordinator to “override” a vote if it is deemed necessary.
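The voting idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pootle's data model: the class name `TranslationUnit`, its fields and its methods are all hypothetical, and the rule that each voter has a single vote is an assumption of mine.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TranslationUnit:
    """One source string plus the votes cast on its candidate translations.

    Hypothetical structure for illustration; not an existing Pootle class.
    """
    source: str
    votes: Dict[str, str] = field(default_factory=dict)  # voter -> candidate
    override: Optional[str] = None  # set only by a coordinator

    def vote(self, voter: str, candidate: str) -> None:
        # Voting for a candidate nobody has proposed yet is how a new
        # translation is suggested. A later vote replaces an earlier one.
        self.votes[voter] = candidate

    def coordinator_override(self, candidate: str) -> None:
        # The coordinator's choice wins regardless of the vote count.
        self.override = candidate

    def winner(self) -> Optional[str]:
        if self.override is not None:
            return self.override
        if not self.votes:
            return None
        # Ties go to the candidate that was voted for first.
        return Counter(self.votes.values()).most_common(1)[0][0]
```

A unit with two votes for one candidate and one for another reports the first as `winner()` until a coordinator calls `coordinator_override()`.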

Quality Assurance:

Some translators may be promoted to the status of “reviewer”. Such users would be able to mark strings as “reviewed”. The aim of a translation team is thus to make a first pass to translate a module and a second pass to review every translation.

Deadline management:

A big nuisance for small translation teams is deadline management and keeping up to date with release schedules. Currently, each team has to “shop around” to make sure that they will be up to date with each project prior to its release. Pootle can attach such dates to each module once and show them for every language. Modules can thus be prioritised for translators based on their deadlines.
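As a rough sketch of deadline-based prioritisation, assume each module is a dictionary with a `name`, a `deadline` and an `untranslated` count. These field names and the function `prioritise` are made up for this example. Modules are sorted by nearest deadline, and those with more untranslated strings come first when deadlines are equal.

```python
from datetime import date


def prioritise(modules, today):
    """Return modules whose deadline has not passed, most urgent first.

    `modules` is a list of dicts with hypothetical keys
    "name", "deadline" (a date) and "untranslated" (an int).
    """
    upcoming = [m for m in modules if m["deadline"] >= today]
    # Earliest deadline first; on equal deadlines, more remaining work first.
    return sorted(upcoming, key=lambda m: (m["deadline"], -m["untranslated"]))
```

The same ordering could be shown to every language team, since the deadline belongs to the module rather than to a language.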

Branch management:

It can be the case that a module has different branches, often active at the same time. The tool should thus keep both branches live for translation and ideally share the translations of whatever strings they have in common.
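One way to picture the sharing of translations between branches: treat each branch as a mapping from source string to translation (an empty string meaning untranslated) and fill the gaps in each branch from the other. This is a minimal sketch with assumptions of my own. The function name is hypothetical, and where both branches translate a string differently, the first branch wins here, which a real tool might instead flag for review.

```python
def share_translations(stable, development):
    """Fill untranslated shared strings in each branch from the other.

    Both arguments map a source string to its translation ("" if
    untranslated). New dicts are returned; the inputs are not modified.
    """
    merged_stable = dict(stable)
    merged_development = dict(development)
    for source in stable.keys() & development.keys():
        # Assumption: prefer the stable branch's translation on conflict.
        text = stable[source] or development[source]
        merged_stable[source] = text
        merged_development[source] = text
    return merged_stable, merged_development
```

Strings that exist in only one branch are left untouched.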

Unit level permissions:

It would be beneficial to be able to “lock” certain units within a module, or a whole module, from, say, anonymous translators, translators who are not module owners, or all translators (in the case of, say, translator-credits-like strings).

Statistics:

Pootle can be augmented with extra statistics that include user level statistics, module/project statistics, progress reports and even “fun” comparisons between teams and projects.

User and Team widgets:

A feature that can motivate individual users and teams to contribute more is user widgets. A blogger may, for example, display a widget in a sidebar that shows quick statistics for the team or the translator, such as the projects contributed to, the number of strings contributed, etc.

Posted in Linux, Arabisation, codesprint2009 | 1 Comment »

Bringing all translation management tools together

12th December 2008

I am dreaming about a single first stop next-gen (buzzword alert) website for all Open Source internationalisation. Complete with raw material downloading, translation allocation,  reviewing and quality assurance, statistics and error reporting, web based translation, downloading translation files, uploading translation files, and committing translation files. But before letting you know about my complete dream, let me preface that by stating what exists out there.

There are literally thousands of Open Source projects in various programming languages, from numerous developers and websites. Translation teams have come up with different ways of dealing with this variety to minimise time and maximise efficiency and translation reuse. Here are but a few software packages that deal with this chaos:

* Damned lies (as in lies, damned lies and statistics):

Available at: http://l10n.gnome.org

A statistics website that is largely passive. It reports translation statistics for GNOME (and Fedora), and has the most comprehensive reporting out there: it will tell you if you have errors in your po files, it will list images that you need to capture for documentation, and it will complain if you didn’t enable your language in the sources (LINGUAS). It supports multiple version control systems. It still remains a passive tool where translators can only look at statistics and not upload/change anything directly through it.

* Vertimus

Available at: http://gnomefr.traduc.org/suivi/

A file allocation and reviewing tool, used by the French GNOME team. This handy tool assigns multiple roles to users. Users can be assigned PO files that they can work on off-line. Uploaded translations can be reviewed as necessary. The project maintainer/coordinator has to commit files separately.

* Transifex

Available at: https://translate.fedoraproject.org/submit/

A system for submitting translations to upstream projects. It aims to support as many back-ends as possible.  Coordinators can submit a translation which is then passed on to upstream using an account created by Transifex. Projects thus have to be subscribed to benefit from this.

* Pootle

Available at: http://translate.sourceforge.net/wiki/pootle/index

Pootle is a web based translation tool. Behind Pootle is a powerful Python API for reading and writing translation files in multiple formats. Pootle’s interface, however, leaves a lot to be desired and its translations are not strictly version controlled (I could be wrong, but damage can sometimes be irreversible). It is file based, so it can be really slow for big files.

* open-tran

Available at: http://open-tran.eu/

Open-tran aggregates strings from hundreds of packages and offers suggestions for strings that you search on it. It offers a web API so you can query for strings from your application if you want to.

* Rosetta

Available at (no source): https://launchpad.net/rosetta

Rosetta is probably the software with the most features, and it still falls short in many places. It is a web based translation tool, with support for statistics and translation suggestions. It is not as good as Vertimus in its user and QA management, and again it can only deal with projects that are subscribed to it. Plus, its awkward status as a middle man between many upstream projects and Ubuntu limits it greatly. It is also greatly limited by the fact that it is closed source.

There are also other in-development projects such as Verbatim of Mozilla.

Obviously, these solutions approach different problems, and sometimes the same problems, in different ways. Each has its own login system, its own database (if any) and, in effect, its own website. In the end, if a translation team wants package allocation, web based translation and a good commit back-end, they have to go to three different places, plus all the stuff that is usually provided upstream (KDE online statistics for example, Debian’s, etc). And that would not be enough, because some of them only offer access to a small subset of packages.

—-

Now, my dream is a website that integrates the functionality of all of these tools into one inclusive web tool that offers all of these services. It is inclusive in that projects can be subscribed with or without upstream’s help (e.g. Pootle). It is up to the language coordinators to decide whether they want to do their translation using the tool (e.g. Pootle). The tool can commit on behalf of the coordinators with its own commit credentials or with coordinator-supplied ones if necessary (e.g. Transifex, partly). Translators can be allocated packages to translate, with complete quality assurance and reviewing steps (Vertimus). The tool displays comprehensive statistics and error messages about translation files if necessary (Damned lies). The tool offers web based translation (e.g. Rosetta) with the option for translators to download files and translate them locally if they wish. The tool offers suggestions from a wide variety of packages (open-tran) with an option to enforce a per-language terminology dictionary. Packages can be opened for free-for-all translation, or restricted so that only users of team X can translate/review them. Open packages may invite inexperienced translators and inconsistency, so they should be reviewed thoroughly.

Plugins can be written to compile and serve translations. In fact, the website should offer an API with good extensibility: the API offers downloading files, setting up a local repository if you need to, serving statistics, committing files, etc. Thus, it should be easy to circumvent the whole tool altogether and do things the old way if a translator/team so desires.

The website can be augmented by web 2.0 (err, another buzzword) features. Meta statistics can be posted (User X did y translations, z reviews. Language team A did y strings, etc.). This stuff may look juvenile but can be greatly fun and encourages competition between teams (just look at what Wikipedia statistics do). Notes or even threads can be attached to packages or strings if necessary.

I believe such a tool would be an empowering tool for translators, especially for languages with small teams. For many teams, the same people more or less translate most open source packages, so such a tool will be of great help. It would greatly help consistency and make translators work on actual translations instead of back-end coordination. Teams would no longer have to replicate their setup across many projects.

I largely steered away from implementation details as I would like to discuss the idea first. I believe the previously mentioned projects have addressed many of the individual features separately, so they are certainly demonstrably doable. What remains is to just bring the lot together and integrate it.

Any comments are welcome. I will post this content on various relevant mailing lists in a few days as well.

Posted in Linux, Arabisation, i18n | 7 Comments »

Gnome 2.22 Arabic Translation

10th January 2008

Over the last few weeks we have maintained a healthy state of GNOME 2.22 translation to Arabic, thanks to Anas Husseini, Abou Manal, Ahmad Farghal, Osama Khayat and Khaled Hosny. Currently we have 95% done and we are at the third spot. Detailed statistics are listed below, and here is a link to the official GNOME statistics:

http://l10n.gnome.org/releases/gnome-2-22

The new modules for the next release have been settled, and the strings are more or less finalised, except for a few changes in the future. So, time to switch our focus to this release and get it done as soon as possible.

I have been quite busy in the last few weeks fixing various RTL and Arabic bugs in GNOME. Expect a few fixes in the next release. I will continue this work, so unfortunately I won’t have much time for translation.

It would be very good if we could complete this release as soon as possible. I would like to dedicate the last 2 weeks before the release to strict Quality Assurance and Translation Revision. I have built the whole new GNOME from sources and so I expect to test and review most translations.

Please feel free to assign yourself any of the uncompleted packages in this list, and let me know what you have taken.

* Translated    39311 95.42%
* Fuzzy          1271 3.09%
* Untranslated    615 1.49%

* Total         41197
* To be done     1886 4.58%
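The percentages above follow directly from the three counts. A small Python check (the variable and function names are mine) reproduces them:

```python
translated, fuzzy, untranslated = 39311, 1271, 615

# Fuzzy and untranslated strings both still need work.
total = translated + fuzzy + untranslated
to_be_done = fuzzy + untranslated


def percent(count):
    """Share of the total string count, rounded to two decimals."""
    return round(100 * count / total, 2)


summary = {
    "translated": percent(translated),
    "fuzzy": percent(fuzzy),
    "untranslated": percent(untranslated),
    "to_be_done": percent(to_be_done),
}
```

Here `total` comes to 41197 and `to_be_done` to 1886, matching the figures in the list.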

Incomplete Packages
--------------------
Package                 Translated Fuzzy Untranslated
eel                       30         0    1
libgnomekbd               49         0    1
nautilus                1161         1    0
metacity                 514         1    2
gnome-applets-locations 4355         3    0
gnome-terminal           483         3    0
gnome-desktop             65         0    5
gnome-applets            939         5    0
epiphany                 909         5    1
gnome-session            122         6    1
gtk+-properties         1501         6    1
gnome-volume-manager     196         8    0
evince                   287         3    6
gtk-engines               32         4    6
cheese                    45         5    5
file-roller              249         7    3
sound-juicer             156         8    6
gnome-build              110         7   10
gnome-system-tools       231        11    6
evolution-data-server   1026        14    3
gtk                      908        16    1
gnome-utils              723        10    9
gnome-system-monitor     210         8   12
vino                      84         8   12
ekiga                    632        11   10
gconf                    453        24    0
eog                      245         8   17
gnome-keyring             59        16   15
totem                    426        22    9
tomboy                   355        24    9
gnome-power-manager      443        36    3
deskbar-applet           186        16   24
libgnome                 215        40    0
seahorse                 727        37    6
empathy                  266         7   37
gnome-control-center     829        41    9
gimmie                   122        40   13
yelp                     289        29   30
gdm                       66        51   15
gtksourceview            273        62   17
gdl                       17        48   33
gnome-games             1684        49   37
gcalctool                303        72   27
anjuta                  1853        80   32
orca                     924        63   52
glib                     215        56   68
gvfs                       0       149   29
evolution               4664       151   32

Posted in Arabisation, Gnome | 1 Comment »

Interview with Leonardo Fontenelle

27th November 2007

A while back I was interviewed by Leonardo Fontenelle (an active Free Software l10n contributor from Brazil), and it is worth mentioning here. There, I described many aspects of our work in Arabic translation and Arabeyes. Here I quote some passages:

On the open nature of the translation process:

I guess I don’t like some particular phase per se, but all in all, I very much adore the open nature of it. Right from the source code to the actual compiled message catalogues. You can’t really get much more open than this. This openness really pays off when translating weird messages, when trying to view translations live, when comparing with translations of other packages, when viewing translations of different languages, etc.

On the Technical Dictionary:

The technical dictionary is basically an English-Arabic dictionary for computing terms. At first, we started making it with .po files, but that created many problems with versioning and discussing the terms. So I had the idea of uploading the terms to our Wiki. I used some scripts to convert the .po files to Wiki xml input. The wiki, being open, allows people to edit as they see fit, discuss terms, suggest alternatives, etc. Then finally, there are some scripts that take the wiki pages and convert them back to .po files, as well as .pdf suitable for printing/reading. The experience was very rewarding to us.

On contributors:

For Arabeyes, we are forever in need for contributors. We do think of lots of ideas, but we always hit the shortage of manpower wall. We’d like to see Arabic support addressed in all popular OSS applications. We’d also like to develop a free Arabic OCR application and an automatic translator. This is short term, but the long term list is a big one.

Please read the interview here. It is also translated into Portuguese, thanks to Leonardo.

Part 1.

Part 2.

Posted in Linux, Arabisation, Gnome | No Comments »

Downloads of the Technical Dictionary

18th April 2007

About 1 month ago I wrote some scripts to get the technical dictionary contents from the Wiki to a pdf.

At the time I wondered how many people would download it, so I did not spend much time making it neatly formatted. The resulting pdf was not very good.

Today, however, I decided to check if people are actually downloading it. Doing some data mining on the Apache logs, I was quite surprised to see 387 downloads, 197 of them from unique IP addresses. The breakdown of unique downloaders by country is shown below.
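For anyone wanting to run a similar count, here is a rough Python sketch. It is not the script used for the numbers above. It assumes the logs are in Apache's "combined" format, counts only requests answered with status 200, and matches downloads by a `.pdf` suffix. The function name and the suffix parameter are made up. Mapping IP addresses to countries needs a GeoIP database and is left out.

```python
import re
from collections import Counter

# Client IP, request line, status and (optionally) the referer
# of one line in Apache combined log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)")?'
)


def summarise_downloads(lines, path_suffix=".pdf"):
    """Return (total downloads, unique IPs, Counter of referers)."""
    total = 0
    ips = set()
    referers = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if match["status"] != "200" or not match["path"].endswith(path_suffix):
            continue
        total += 1
        ips.add(match["ip"])
        referers[match["referer"] or "-"] += 1
    return total, len(ips), referers
```

Calling it with an open log file, for example `summarise_downloads(open("access.log"))`, gives the total, the number of distinct addresses and a tally of referring pages.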

Encouraging. Time to go back, beautify it and make it look good.

On another front, parsing the referers (the pages that directed people to the pdf) showed that about 70% came from the Arabic page and 30% from the English page. This highlights the importance of having pages in both languages for Arabic and English speakers.

Breakdown of unique Technical Dictionary downloads by country:

38 : EG, Egypt
22 : US, United States
12 : SA, Saudi Arabia
10 : GB, United Kingdom
9 : PS, Palestine
9 : DZ, Algeria
8 : TR, Turkey
8 : AE, United Arab Emirates
7 : MA, Morocco
6 : JO, Jordan
6 : DE, Germany
5 : IL, Israel
3 : TN, Tunisia
3 : QA, Qatar
3 : OM, Oman
3 : --, N/A
3 : IT, Italy
3 : FR, France
3 : CZ, Czech Republic
2 : SD, Sudan
2 : LY, Libyan Arab Jamahiriya
2 : KW, Kuwait
2 : IN, India
2 : FI, Finland
2 : CN, China
2 : BH, Bahrain
2 : A2, Satellite Provider
1 : ZA, South Africa
1 : UA, Ukraine
1 : TH, Thailand
1 : SY, Syrian Arab Republic
1 : RU, Russian Federation
1 : PK, Pakistan
1 : NZ, New Zealand
1 : NO, Norway
1 : MG, Madagascar
1 : LB, Lebanon
1 : HK, Hong Kong
1 : ES, Spain
1 : BG, Bulgaria
1 : AU, Australia
1 : AL, Albania

Posted in Arabisation | 9 Comments »

Statistical analysis of strings in popular Open Source Projects

3rd April 2007

At Arabeyes we have several Open Source projects for translation, totalling more than 300,000 strings. Our biggest challenge is preserving consistency and correctness across all of these projects. From experience, while some words seem obvious in English, their counterparts in another language (such as Arabic) can spark heated debates. A while back, in an effort to tackle this, we introduced the Arabic Technical Computing Dictionary, and we hosted it on a Wiki for open discussion by any translator. A few scripts extract the terms every week into various neat formats for translators, including .csv, .po and even a .pdf that is suitable for printing (I still need to fix some issues with the latter).

However, we still had problems prioritising discussions: Which words should we discuss first? Which need immediate attention? Are we missing any important words? Are we over-analysing words that are not important? I believe these are important questions that every translation project, for any language, should consider. They are especially important for languages that do not yet have a concise and established list of terminology translations.

The solution seems quite obvious: analysis of existing projects. While computers are quite bad at translation with human level accuracy, they are extremely good at statistics and counting. So why not exploit that?

So I put together a number of scripts that analyse .po files and output statistical data that can help us answer the previous questions. I operated on the four biggest open source projects we have: KDE, GNOME, OpenOffice.org and Mozilla (including Firefox and Thunderbird). The string pool had nearly 300,000 strings. Reading the .po files, the scripts count the number of occurrences of each word. The top 10 most used words* are:

  1. 4734 file
  2. 3002 name
  3. 2538 error
  4. 2268 text
  5. 2110 use
  6. 1946 list
  7. 1931 window
  8. 1869 select
  9. 1826 open
  10. 1825 show
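The counting step can be sketched roughly as follows. This is not the code in Arabeyes CVS but a minimal Python illustration with hypothetical function names. It reads only the msgid of each entry, ignores plural forms, lower-cases everything and treats any run of ASCII letters as a word, so residue from format specifiers such as %s still ends up in the counts.

```python
import re
from collections import Counter
from pathlib import Path

QUOTED = re.compile(r'"(.*)"\s*$')   # the quoted part of a .po line
ESCAPE = re.compile(r"\\.")          # backslash escapes such as \n
WORD = re.compile(r"[a-z]+")


def _quoted(line):
    match = QUOTED.search(line)
    return match.group(1) if match else ""


def iter_msgids(po_text):
    """Yield the source string (msgid) of every entry in a .po file's text."""
    parts = None  # pieces of the msgid being read, None when outside one
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            parts = [_quoted(line)]
        elif line.startswith('"') and parts is not None:
            parts.append(_quoted(line))  # continuation of a multi-line msgid
        else:
            if parts is not None:
                yield "".join(parts)
            parts = None
    if parts is not None:
        yield "".join(parts)


def count_words(msgids):
    """Count how often each lower-cased word occurs across the msgids."""
    counts = Counter()
    for msgid in msgids:
        counts.update(WORD.findall(ESCAPE.sub(" ", msgid).lower()))
    return counts


def count_words_in_tree(root):
    """Count words in every .po file found below the directory `root`."""
    counts = Counter()
    for po_file in Path(root).rglob("*.po"):
        text = po_file.read_text(encoding="utf-8", errors="replace")
        counts.update(count_words(iter_msgids(text)))
    return counts
```

Because `Path.rglob` walks the directory tree inside Python, no long file list is ever passed on a command line, which should sidestep the kind of "too many files" complaint from bash mentioned in the footnote below. `count_words_in_tree(root).most_common(10)` then gives a top ten list like the one above.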

Again, this list may seem obvious, but a word like “select” has a few equivalents in Arabic, and we struggled to agree on one term. The complete list is available in this file. A .pot [0.5 MB] template is also available, but beware that it contains a lot of rubbish, and there are nearly 20,000 entries so I can’t clean it all. If you clean it, I’d be interested in having a copy.

This only gives us the most popular words. We also want the most popular technical dictionary entries (including combinations such as “system administrator”; the previous list contains only single words). The most important technical dictionary entries are in this list.

The difference between the complete list and the technical dictionary gives us the list of words that are not in the technical dictionary. Many of them are very important. I was honestly surprised to see words like “toolbar” and “tab” not being in the wiki.
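That difference is a plain set operation. As a sketch, with a hypothetical function name and assuming the word counts are a mapping from word to frequency and the dictionary is a collection of English terms:

```python
def missing_terms(word_counts, dictionary_terms, minimum=1):
    """List (word, count) pairs that are absent from the dictionary.

    The result is ordered by descending count, then alphabetically,
    so the words most worth discussing come first.
    """
    known = {term.lower() for term in dictionary_terms}
    ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n) for word, n in ranked if word not in known and n >= minimum]
```

Raising `minimum` drops rare words, which is one way to keep the list short enough to discuss.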

Analysis of individual projects is also available. Here are the most popular words for KDE, GNOME, Mozilla and OpenOffice.org.

The complete set of scripts and results resides in Arabeyes CVS. Feel free to make use of them. The scripts are GPL but the data follows the license of the individual projects**. If you have a different way of analysis, or have another set of words from your language, I would be very interested in hearing from you.

Special thanks to Chahibi for helping me with some ideas.

* KDE was excluded because bash complained of too many files (arguments). If you know of a way to increase the limit please let me know.
** I believe they are compatible with the GPL. If you disagree, please send me an email (no need to yell).

Posted in Arabisation | 2 Comments »

Arabic Gnome 2.16 Completed

31st December 2006

Finally, Gnome 2.16 has been fully translated into Arabic, thanks to a dedicated team of translators.

See statistics!

We will be putting more emphasis on quality and correctness for Gnome 2.18, since most of the job is already done. Some details are available in this Wiki page.

Posted in Arabisation, Gnome | No Comments »

Technical Dictionary on the Wiki

10th December 2006

The technical dictionary aims to translate and standardise technical terms that are used in software. It is an effort to unify the terms used across all projects, to present the user with consistent and understandable interfaces.

For some time, we have been trying to discuss the terms using the mailing list. This created many problems: discussions are forgotten, people discuss terms over and over, and there is no single point of reference for all terms.

To solve these, we have recently imported the dictionary to the wiki. You are welcome to contribute, whether you are a native English speaker or bilingual. Proficiency is not needed. Normal users are also welcome since the work is being done for them: you can comment on whether a term is understandable to you.

The dictionary is available here (currently only words starting with A are there):

http://wiki.arabeyes.org/Technical_Dictionary

Leave a comment on the discussion page if you would like guidance on where to contribute.

Posted in Arabisation | 3 Comments »

Arabic Gnome Making Progress

11th November 2006

The Gnome Arabic Team is making serious progress to complete Gnome 2.16 translation. We are a bit late due to lack of contributors, but many joined the team recently, so better late than never.

Arabic is a beautiful language, as I’m sure you will agree by looking at these screenshots of translated applications. Thanks to all those who helped: Khaled Hosny, Youcef Raffah, Youssef Chahibi, Mohamed Magdy, and all those who translated previous versions: Arafat Medini, Bayazidi, and a lot more. I would like to take this chance to extend my invitation to all past contributors and new members: you are always welcome, and there is a lot you can do to help the effort, as a lot of people are waiting for the release. If you would like to help, please have a look at this roadmap, then email me at djihed at djihed.com

Now, here are the screenshots, click to enlarge:

Gedit: Now becomes the best Arabic Linux Text editor:

Arabic Gedit

Epiphany: The gorgeous Gnome Internet Browser:

Arabic Epiphany

Ekiga: the Internet telephony software:

Arabic Ekiga

Eye of Gnome: The image viewer:

Arabic Eye of Gnome

And last but not least, file-roller, the archive manager:

Arabic File Roller

Posted in Linux, Arabisation, Gnome | 3 Comments »

ArabicOpenCD 0.1

8th October 2006

ArabicOpenCD, a project similar to Canonical’s OpenCD at opencd.org has just been developed and released. The ArabicOpenCD aims to maintain the most complete collection of open source software in a single CD for Windows operating systems. The software is of high quality and provides a suitable alternative to often pirated software in third world countries including Arab countries.

If you own or work at a CD shop, library, computer repair shop, OEM shop or pretty much any organisation, then you could download the CD, burn it a few times and offer it to the public, or you could contact me and see what we can do. It is also really good for those who do not have a reliable fast connection to the internet and would rather have a big collection of software offline, or for those who would like to learn how to program by example, as the software is open source.

Finally, if you think you can contribute by translating software, well, yes you can. It doesn’t require much, and you can really make a difference. Much of the software has been translated to Arabic by various translators at Arabeyes; the rest needs more contributors :~) Please head to www.arabeyes.org, or if you have difficulty you can contact me.

Oh yeah, the ArabicOpenCD, you can get it here: www.arabicopencd.org . Many thanks to Bashar Al-Abdulhadi for buying the domain and setting up the hosting.

Posted in Arabisation, cdmaftooh | 2 Comments »

Lexicons

4th September 2006

Arabising all terms from English to Arabic individually is cumbersome: it’s wasteful in terms of resources and it opens the door wide to repetition and mistranslation. A better solution would be to translate terms collectively: collecting terms that relate to a particular subject by brainstorming and observation, thus forming lexicons, or probably more accurately mini-lexicons. After making a lexicon, we could “mass” translate it, in other words, collectively translate the words by relating the meaning of each set to the most suitable counterparts, while observing small differences in the meaning of each word. This, amongst various other techniques, can help us to write more accurate and understandable translations.

According to Chambers, here is the definition of a lexicon:

1 a dictionary, especially one for Arabic, Greek, Hebrew or Syriac. 2 the vocabulary of terms as used in a particular branch of knowledge The word…

This is comparable to مُعْجَم in Arabic, albeit a small one for each of the topics we want. An example would be the one I used in the previous article, the lexicon of exiting an application or a process, comprising: close, quit, exit, kill, terminate, end, finish, go out, stop, shut down, leave, discontinue, cancel, refuse, skip, break, abandon, give up, suspend, stand by, hibernate, crash. The list can go on and is still open.

There will be a coherent list of lexicons in the wiki.

Posted in Arabisation | No Comments »

Standardising Arabisation

30th August 2006

At Arabeyes.org translation we often run into problems of standardisation. There are loads of terms being introduced every day, and loads of old terms still incorrectly translated to Arabic. Analysing each of the English words and choosing an appropriate Arabic term is not an easy task. For example, consider the differences between close, quit, exit, kill, terminate, end and finish: which of the candidate Arabic terms is the most appropriate for each? It’s all terminology relating to exiting an application. The Arabic equivalents need to be carefully chosen to properly match their English counterparts and convey the right meaning. We can group a number of terms together under one topic to see their differences and analyse them separately and collectively.

First, over the coming articles, I will try to find ways that enable us to reliably skim all the relevant references, including Arabic and English dictionaries and past uses in other software. In other words, I will try to standardise our standardisation process.

Posted in Arabisation | No Comments »

 