<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.4" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Statistical analysis of strings in popular Open Source Projects</title>
	<link>http://djihed.com/arabisation/statistical-analysis-of-strings-in-popular-open-source-projects</link>
	<description></description>
	<pubDate>Fri, 21 Nov 2008 12:52:29 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.4</generator>

	<item>
		<title>by: Djihed</title>
		<link>http://djihed.com/arabisation/statistical-analysis-of-strings-in-popular-open-source-projects#comment-6880</link>
		<pubDate>Wed, 04 Apr 2007 12:07:17 +0000</pubDate>
		<guid>http://djihed.com/arabisation/statistical-analysis-of-strings-in-popular-open-source-projects#comment-6880</guid>
					<description>&lt;cite&gt;Youssef&lt;/cite&gt;, documentation for GNOME and KDE has been included.

"One has to know that these statistics do not necessarily mean how important the terms are, since some very frequent term could only be translated in a library’s PO file and thus reused by the software in numerous applications." While this may be true, it's only a small factor, and the results are still a good indication of &lt;em&gt;what is important&lt;/em&gt; and what is &lt;em&gt;not important&lt;/em&gt;, to an acceptable degree of accuracy. As with all statistics, they are not 100% accurate: they may downplay a few words, but they do not overestimate the high-ranking terms. On the other hand, one has to place enough emphasis on the few highly recurring terms in important packages such as GTK+ in GNOME (as we already do).

On Mozilla licensing, Mozilla is triple-licensed, including under the GPL, and I included a notice in the README.

On collocations, I'll add that when I have time.</description>
		<content:encoded><![CDATA[<p><cite>Youssef</cite>, documentation for GNOME and KDE has been included.</p>
<p>&#8220;One has to know that these statistics do not necessarily mean how important the terms are, since some very frequent term could only be translated in a library’s PO file and thus reused by the software in numerous applications.&#8221; While this may be true, it&#8217;s only a small factor, and the results are still a good indication of <em>what is important</em> and what is <em>not important</em>, to an acceptable degree of accuracy. As with all statistics, they are not 100% accurate: they may downplay a few words, but they do not overestimate the high-ranking terms. On the other hand, one has to place enough emphasis on the few highly recurring terms in important packages such as GTK+ in GNOME (as we already do).</p>
<p>On Mozilla licensing, Mozilla is triple-licensed, including under the GPL, and I included a notice in the README.</p>
<p>On collocations, I&#8217;ll add that when I have time.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Youssef</title>
		<link>http://djihed.com/arabisation/statistical-analysis-of-strings-in-popular-open-source-projects#comment-6869</link>
		<pubDate>Wed, 04 Apr 2007 03:03:20 +0000</pubDate>
		<guid>http://djihed.com/arabisation/statistical-analysis-of-strings-in-popular-open-source-projects#comment-6869</guid>
					<description>Rumour has it that importing Mozilla translations into &lt;a href="http://open-tran.eu" title="Open Tran.eu" rel="nofollow"&gt;Open-tran.eu&lt;/a&gt; infringes Mozilla's copyright, as mentioned on the project's &lt;a href="http://open-tran.blogspot.com/" title="Open Tran.eu Blog" rel="nofollow"&gt;blog&lt;/a&gt;.

One important thing is to link to the servers from which the POT files were imported for analysis. I also wonder whether documentation should be included in the statistics rather than ignored.

One has to know that these statistics do not necessarily mean how important the terms are, since some very frequent term could only be translated in a library's PO file and thus reused by the software in numerous applications. I guess analysing documentation could bring more results.

An option should be added to compute statistics on multi-word collocations.</description>
		<content:encoded><![CDATA[<p>Rumour has it that importing Mozilla translations into <a href="http://open-tran.eu" title="Open Tran.eu">Open-tran.eu</a> infringes Mozilla&#8217;s copyright, as mentioned on the project&#8217;s <a href="http://open-tran.blogspot.com/" title="Open Tran.eu Blog">blog</a>.</p>
<p>One important thing is to link to the servers from which the POT files were imported for analysis. I also wonder whether documentation should be included in the statistics rather than ignored.</p>
<p>One has to know that these statistics do not necessarily mean how important the terms are, since some very frequent term could only be translated in a library&#8217;s PO file and thus reused by the software in numerous applications. I guess analysing documentation could bring more results.</p>
<p>An option should be added to compute statistics on multi-word collocations.
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
