Jun 5, 01:06
i18n and true localization Pt1
With localization and internationalization going mainstream and becoming more widely supported on systems, apps and websites, there are hurdles to overcome and bridges to be built. Problems with localizing file names and domain names lead directly to findability problems, and search-bot rankings depend on findability. On the other hand, date formats, system-wide Unicode support in operating systems, standardized sets of language elements, and support for localized numbering are the blockers for truly localized websites and web services.
Localized domain names & filenames; servers vs. browsers
Recall when you last registered a domain name: remember the allowed characters? It’s a range of A-Z, 0-9, and “-”. Any characters of another language? Eastern languages? Farsi, Arabic, Hebrew, Chinese, or Japanese? Not even accented European characters, right?
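For illustration, the rule described above is usually called the LDH (letters, digits, hyphen) rule for host names. The Python sketch below checks a name against it, assuming the common constraints of 1 to 63 characters per label and no hyphen at the start or end of a label. The function name is hypothetical and this is not a complete validator (for example, it ignores the overall length limit).

```python
import re

# Classic host-name rule (LDH): letters, digits and hyphen only,
# 1-63 characters per label, no hyphen at either end of a label.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    if name.endswith("."):  # a single trailing dot denotes the DNS root
        name = name[:-1]
    labels = name.split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    print(is_ldh_hostname("example.com"))  # True
    print(is_ldh_hostname("مثال.com"))     # False: Arabic letters are not allowed
```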
While domain names are nothing but alphanumeric aliases for IP addresses, we’re still stuck with only one way of writing them: in English. A number of labs have tried to implement non-English domain systems, which are still in beta or testing, such as the U.A.E.’s Etisalat’s attempt to bring them to developers, but progress is slow. You ask why?
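The standardized route for non-English domain names is IDNA (RFC 3490): each Unicode label is converted to an ASCII label starting with “xn--” (Punycode), so DNS itself still only sees the allowed characters. Below is a minimal sketch using Python’s built-in `idna` codec, which implements IDNA 2003. The domain `مثال.com` (“example.com”) is a made-up example, not a real registration, and the helper names are hypothetical.

```python
# Sketch: convert an internationalized domain name to its ASCII-compatible
# form (IDNA / Punycode) and back, using Python's built-in "idna" codec.


def to_ascii_domain(domain: str) -> str:
    """Encode each non-ASCII label as an 'xn--' Punycode label."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Decode 'xn--' labels back to their Unicode form for display."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    unicode_name = "مثال.com"             # hypothetical Arabic domain
    ascii_name = to_ascii_domain(unicode_name)
    print(ascii_name)                      # only letters, digits, '-' and '.' remain
    print(to_unicode_domain(ascii_name))   # round-trips to the original
```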
Web servers accept only the allowed, standardized characters for domain linkage, and then there is the client-side problem: the browser. Web browsers tend to let you type Unicode or other non-English characters in the address bar, but do they send them the right way? When a request is made to a server using non-English characters, such as an Arabic or Farsi name, is it really sent as it is? No, it’s encoded. The path and file name are URL-encoded, meaning each character is converted to the hexadecimal values of its bytes before being sent to the server, while the domain part, where the browser supports it, is converted to an ASCII “xn--” form (Punycode).
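The URL-encoding of a path can be seen with Python’s standard `urllib.parse` module. The sketch below assumes the path is encoded as UTF-8 before being percent-encoded, which is the common behavior but historically not universal across browsers. The file name `ملف.html` (“file.html”) and the helper names are hypothetical.

```python
# Sketch: how a non-English file path is percent-encoded ("URL-encoded")
# before it is sent to the server, and decoded again on arrival.
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """UTF-8 encode the path, then replace each non-ASCII byte with %XX."""
    return quote(path, safe="/")  # keep '/' so the path structure survives


def decode_path(encoded: str) -> str:
    """Reverse the encoding, interpreting the %XX bytes as UTF-8."""
    return unquote(encoded, encoding="utf-8")


if __name__ == "__main__":
    original = "/ملف.html"                        # hypothetical Arabic file name
    on_the_wire = encode_path(original)
    print(on_the_wire)                             # /%D9%85%D9%84%D9%81.html
    print(decode_path(on_the_wire) == original)    # True
```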
Which browsers let you type Unicode in the address bar, and which actually send and accept a non-English domain name or file name? IE6 does not! Firefox does, Mozilla obviously does too, while Opera and Safari do not. So much for domain names; file names are still waiting for an explanation. File names are another level of difficulty for true i18n. Some servers accept only Latin characters in file names, while others accept more. Windows servers handle them well in most cases, but Linux servers are somewhat less compatible with file-name character encodings, which causes hiccups such as question marks appearing in place of characters. All of this causes download and upload problems for files with non-English names.
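One plausible cause of those question marks, sketched below as an assumption rather than a diagnosis of any particular server: a file name is converted to a character set that cannot represent it, and each unmappable character is replaced with “?”. Reading UTF-8 bytes as Latin-1 produces a different symptom, garbled “mojibake”. The helper names and the file name are hypothetical.

```python
# Sketch of one common cause of "???" file names: converting a Unicode
# name into a character set that cannot represent it. Which conversion a
# given server performs depends on its OS, locale and configuration.


def to_legacy_charset(name: str, charset: str = "ascii") -> str:
    """Encode to a limited charset, substituting '?' for unmappable chars."""
    return name.encode(charset, errors="replace").decode(charset)


def misread_utf8_as_latin1(name: str) -> str:
    """Simulate software reading UTF-8 bytes as if they were Latin-1."""
    return name.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    name = "ملف.txt"                              # hypothetical uploaded file name
    print(to_legacy_charset(name))                 # ???.txt
    print(to_legacy_charset(name, "cp1256"))       # Windows Arabic code page keeps it
    print(repr(misread_utf8_as_latin1(name)))      # mojibake instead of Arabic
```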
The locale problem
Previously I’ve written about the usability of non-English websites and applications, and how confusing it is to have an app in one language while your numbers and dates are formatted in another language. That’s what happens online on non-English websites, particularly Arabic and Farsi ones. Unless your OS is set to a particular locale, you’re not going to get the right date, number, and currency formatting.
On the web, in web pages and web apps, it is confusing and odd to see different character sets mixed together, and on top of that the addresses and domain names still don’t match the language of the site. So we see an Arabic-language web page that still shows dates, numbers, currencies, and numbered lists with English digits (note: you don’t get that if you’ve set your locale to an Arabic region). If you haven’t, you face a mix of languages on one page, which is not true localization.
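On the display side, the digits themselves can be swapped. Here is a minimal sketch, assuming plain text in which every 0-9 should be localized: Arabic uses the Arabic-Indic digits U+0660 to U+0669, while Farsi uses the Extended Arabic-Indic digits U+06F0 to U+06F9 (the shapes of 4, 5 and 6 differ). The function name `localize_digits` is hypothetical, and on its own this does not handle date order, currency symbols, or list numbering.

```python
# Sketch: replace Western digits 0-9 with Arabic-Indic digits (U+0660-U+0669,
# used in Arabic) or Extended Arabic-Indic digits (U+06F0-U+06F9, used in Farsi).

WESTERN = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))

_TO_ARABIC = str.maketrans(WESTERN, ARABIC_INDIC)
_TO_FARSI = str.maketrans(WESTERN, EXTENDED_ARABIC_INDIC)


def localize_digits(text: str, lang: str = "ar") -> str:
    """Return text with 0-9 swapped for the digit shapes of 'ar' or 'fa'."""
    table = _TO_FARSI if lang == "fa" else _TO_ARABIC
    return text.translate(table)


if __name__ == "__main__":
    print(localize_digits("June 5, 2006"))        # Arabic-Indic digits
    print(localize_digits("June 5, 2006", "fa"))  # Farsi digits
```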
The problem again lies on two sides: the server side and the client side. The server has to know how to speak a certain language and how to output its locale data. With Apache, for example, you could have the locale installed on its OS, but the locale tables differ from one OS to another. The client side is the same story: it’s the OS again. The third element, which plays the biggest role, is the middleware, which in our case is the web page. How does it handle the communication, the handshake between the languages spoken by the server OS and the client OS?
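On the web, the browser’s side of that handshake is the HTTP `Accept-Language` request header, which lists the user’s preferred languages with optional `q` weights. Below is a simplified server-side sketch of picking a page language from it. `SUPPORTED` and the helper names are hypothetical, and wildcards (`*`) and other edge cases are ignored.

```python
# Sketch: pick a page language from the browser's Accept-Language header.
# SUPPORTED is a hypothetical list of languages the site has content for.
from typing import List, Tuple

SUPPORTED = ["ar", "fa", "en"]


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split 'ar-AE,ar;q=0.8,en;q=0.5' into (tag, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip()
        q = 1.0  # no q parameter means full preference
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((tag.strip().lower(), q))
    # sorted() is stable, so equal q-values keep the browser's order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_language(header: str, default: str = "en") -> str:
    """Return the first supported language, matching 'ar-AE' to 'ar'."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in SUPPORTED:
            return tag
        primary = tag.split("-")[0]
        if primary in SUPPORTED:
            return primary
    return default


if __name__ == "__main__":
    print(choose_language("ar-AE,ar;q=0.8,en;q=0.5"))  # ar
    print(choose_language("fr-FR,fr;q=0.9"))           # en (fallback)
```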
There are a number of techniques that force the server to speak a certain language and output the proper locale, and the client to accept and speak the same language. I’ll elaborate on them in my next post, covering how to solve this issue and output the right locale formatting for numbers, dates, and currencies, so I’ll leave it out for now.
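As a rough illustration of locale-aware number output, and not the technique planned for the follow-up post, the sketch below groups thousands and then swaps in Arabic-Indic digits plus the Arabic thousands separator (U+066C) and Arabic decimal separator (U+066B). It assumes one convention only; actual Arabic locales vary, and some use Western digits and separators. The function name is hypothetical.

```python
# Sketch: format a number Arabic-style with Arabic-Indic digits,
# U+066C ARABIC THOUSANDS SEPARATOR and U+066B ARABIC DECIMAL SEPARATOR.
# This is one possible convention, not the rule for every Arabic locale.

ARABIC_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10))
)
THOUSANDS_SEP = "\u066C"
DECIMAL_SEP = "\u066B"


def format_arabic_number(value: float, decimals: int = 2) -> str:
    """Group thousands, then localize the separators and the digits."""
    western = f"{value:,.{decimals}f}"  # e.g. '1,234,567.50'
    # The new separators are neither ',' nor '.', so the two replaces don't clash.
    swapped = western.replace(",", THOUSANDS_SEP).replace(".", DECIMAL_SEP)
    return swapped.translate(ARABIC_DIGITS)


if __name__ == "__main__":
    print(format_arabic_number(1234567.5))  # grouped with Arabic separators
    print(format_arabic_number(42, 0))      # no decimal part
```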
Findability and SEO issues
Search engines rely heavily on the page title, domain name, file name, and copy to build the index their search algorithms draw on. Simply put, if you have a blog on an English domain name and your file names (even rewritten URLs) are all in English, but your page titles and copy are in Arabic, you’re going to lose some of the search engine optimization benefit. To be precise, in practice it should not affect you much, since all the other websites served in Arabic have the same problem; it does not hurt at the moment, but fixing it could improve the way searches are made. Or do search engines handle non-English websites differently? Do tell if you know.
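One way to bring file names in line with an Arabic title and copy is an Arabic URL slug. Here is a minimal sketch, assuming titles without diacritics (Arabic harakat are combining marks that `\w` does not match, so they would split words in this simple version). Whether search engines rank such URLs better is exactly the open question above. The function name `arabic_slug` and the sample title are hypothetical.

```python
# Sketch: build a URL slug from an Arabic page title so the file name
# matches the language of the copy, then percent-encode it for the link.
import re
from urllib.parse import quote


def arabic_slug(title: str) -> str:
    """Keep Unicode letters/digits, join the words with '-', drop the rest."""
    words = re.findall(r"\w+", title)  # \w is Unicode-aware for str in Python 3
    return "-".join(words)


if __name__ == "__main__":
    title = "التدويل والتوطين"   # hypothetical title: "internationalization and localization"
    slug = arabic_slug(title)
    print(slug)          # Arabic slug, as a supporting browser would display it
    print(quote(slug))   # percent-encoded form actually sent in the URL
```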
Monday June 5, 2006
Hamad said:
How do we show Arabic numbers?
Can you show me a tutorial?