We have been getting more and more requests at Globalme for web internationalization consulting, tips and checklists. This is great news! It seems development teams now realize that building internationalized desktop and web applications not only makes the job much easier when it is time for localization, but also promotes good development practices even when the application is intended to be in a single language (and we all know that, however strong that intention is, it only lasts until the sales team says “We need this application in 4 more languages”).
This is the first in a series of write-ups that I am planning to post. This first post covers generic internationalization tips that apply to any application regardless of the programming language. It will be followed by posts focused on specific languages and frameworks, such as “How to internationalize Ruby on Rails applications” and “Internationalization process for ASP.NET applications”. You get the idea…
Before I get into the internationalization tips, I would like to clarify a terminology confusion that seems to be very common:
What is the difference between internationalization and localization?
Internationalization (I18N) is the process of making code ready so that it can be localized for a specific region and language.
Localization (L10N) is the process of adapting the application content to meet the language, cultural and other requirements of a specific target market.
Basically, internationalization is what coders do to have an application ready for the content changes that localizers need to implement (translation, style changes etc.).
10 Internationalization tips for developers
Following the tips below will ensure that you have your bases covered while you develop. That is, when the time comes and management brings in a localization vendor, your code will be almost ready for the requirements. I say “almost” because there are a few other things that will need to be implemented, which I will cover in upcoming posts (such as language selection). Let’s start with the basics…
Externalize all translatable content – Take the text out of the code and place it in resource files
This is an essential requirement for a properly internationalized application. Separating the translatable text from the code avoids code duplication, lets localizers and developers work on updates simultaneously, and removes the possibility of damaging code during translation. You are keeping the presentation and business logic separate, right? This takes that a step further and keeps the translatable items separate from the view.
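As a rough illustration of the idea in Java: the two strings below stand in for hypothetical messages_en.properties and messages_fr.properties files, and the key checkout.button is made up for this sketch. In a real project the files would typically sit on the classpath and be loaded with ResourceBundle.getBundle; the in-memory version here only keeps the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class ExternalizedStrings {
    // In-memory stand-ins for two resource files (hypothetical content and key names).
    static final String MESSAGES_EN = "checkout.button=Proceed to checkout\n";
    static final String MESSAGES_FR = "checkout.button=Passer la commande\n";

    // Picks the bundle for the locale's language, falling back to English.
    static ResourceBundle bundleFor(Locale locale) {
        String source = "fr".equals(locale.getLanguage()) ? MESSAGES_FR : MESSAGES_EN;
        try {
            return new PropertyResourceBundle(new StringReader(source));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The code only ever refers to the key, never to the visible text.
    static String label(String key, Locale locale) {
        return bundleFor(locale).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label("checkout.button", Locale.ENGLISH));
        System.out.println(label("checkout.button", Locale.FRENCH));
    }
}
```

The point of the design is that adding a language means adding a resource file, not touching the code that renders the button.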
Example in Ruby on Rails:
Allow input of international data and foreign scripts
Input fields often do character validation. Validation rules should allow the input of foreign characters. A typical failure is an input field that complains about the “special character” é and does not allow the user to proceed.
Another very common example is the validation error for postal codes. If you code the field for the US ZIP code format, users with a Canadian postal code will not be able to pass the validation. If you do need to validate the postal code, make sure you attach the validation rule to the specific country and do not enforce it on others (or have the validation rule update when the country selection changes).
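Both points can be sketched in Java along these lines. The class and method names are hypothetical, and the postal code patterns are simplified for illustration rather than complete national rules. The name check uses the Unicode letter class instead of a-z, and the postal rule is looked up by country code.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class InputValidation {
    // \p{L} matches any Unicode letter and \p{M} any combining mark, so accented
    // and non-Latin names pass. The extra characters allowed here are an assumption.
    static final Pattern NAME = Pattern.compile("[\\p{L}\\p{M}' .-]+");

    // Simplified, illustrative patterns keyed by country code.
    static final Map<String, Pattern> POSTAL_RULES = Map.of(
        "US", Pattern.compile("\\d{5}(-\\d{4})?"),
        "CA", Pattern.compile("[A-Za-z]\\d[A-Za-z] ?\\d[A-Za-z]\\d"));

    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    // Applies a format rule only when one is defined for the selected country;
    // for any other country the field just has to be non-empty.
    static boolean isValidPostalCode(String country, String code) {
        Pattern rule = POSTAL_RULES.get(country);
        if (rule == null) {
            return !code.trim().isEmpty();
        }
        return rule.matcher(code).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("Jos\u00e9"));             // name containing é
        System.out.println(isValidPostalCode("CA", "K1A 0B1"));
        System.out.println(isValidPostalCode("US", "K1A 0B1"));
    }
}
```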
Avoid string concatenation
Concatenation only works when your content is written for a specific language. Avoid constructing strings through concatenation as this makes translation hard – even impossible in certain cases. See the example below – in Java:
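A minimal sketch of the pattern to avoid, using a made-up upload message and method name:

```java
public class ConcatenatedMessage {
    // Anti-pattern: the English word order is baked into the code, so a
    // translator can only translate the fragments, not rearrange the sentence.
    static String uploadSummary(String user, int count) {
        return user + " has uploaded " + count + " files.";
    }

    public static void main(String[] args) {
        System.out.println(uploadSummary("Maria", 3));
    }
}
```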
This message is not translatable as the order of the sentence elements is hardcoded by concatenation. Instead, use named variables which can be moved around like the example below:
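One possible shape for this in Java, assuming a small hand-written helper (the fill method and the {user}/{count} placeholder syntax are inventions for this sketch, not a standard API). The standard library's java.text.MessageFormat offers a similar facility with numbered placeholders such as {0} and {1}.

```java
import java.util.Map;

public class NamedPlaceholders {
    // Hypothetical resource strings. Each translation places the placeholders
    // wherever its own grammar needs them.
    static final String EN = "{user} has uploaded {count} files.";
    static final String DE = "{count} Dateien wurden von {user} hochgeladen.";

    // Replaces every {name} token with the matching value from the map.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("user", "Maria", "count", "3");
        System.out.println(fill(EN, values));
        System.out.println(fill(DE, values));
    }
}
```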
Linguists will be able to move the variable as necessary.
Avoid using a variable in more than one context
Often people think that translation costs can be reduced by reusing strings. Unfortunately this approach leads to extra costs rather than savings. See the example below:
This form will not work for many languages because the verb form will change depending on the product name (feminine/masculine etc.). As a rule of thumb, do not use a noun as a parameter in a sentence and avoid reusing strings. Translation tools let linguists recycle previously translated strings during the translation pass, so those savings happen anyway.
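A related case of one string serving two contexts is a word like “Open”, which in English can be both a button label (a verb) and a status (an adjective). The sketch below uses made-up keys and in-memory maps standing in for resource files. It keeps one key per context, so the French catalog can use a different word for each.

```java
import java.util.Map;

public class ContextSpecificKeys {
    // English happens to use the same word in both contexts...
    static final Map<String, String> EN = Map.of(
        "button.open", "Open",
        "status.open", "Open");

    // ...but French needs a verb for the button and an adjective for the status.
    static final Map<String, String> FR = Map.of(
        "button.open", "Ouvrir",
        "status.open", "Ouvert");

    // Looks up the text for a key in the catalog of the given language code.
    static String text(String key, String language) {
        Map<String, String> catalog = "fr".equals(language) ? FR : EN;
        return catalog.get(key);
    }

    public static void main(String[] args) {
        System.out.println(text("button.open", "fr"));
        System.out.println(text("status.open", "fr"));
    }
}
```

Had the code reused a single “open” key for both places, the French translator would have had to pick one word that is wrong in one of them.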
Do all string handling with Unicode
An internationalized application uses Unicode for all handling of strings and text. This applies to the static text as well as the dynamic text that is communicated between the application and the database. Unicode is a much broader topic than I can fit here. I strongly recommend that you read this great article – The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Make sure the characters do not get corrupted along the input > database > output route:
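A small Java sketch of what goes wrong when one hop on that route uses a different encoding. The roundTrip helper is hypothetical and stands in for "write the bytes somewhere, read them back"; no real database is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Encodes the text with one charset and decodes the bytes with another,
    // simulating a writer and a reader that may or may not agree.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] stored = text.getBytes(writeAs);
        return new String(stored, readAs);
    }

    public static void main(String[] args) {
        // "Zoë" followed by three Japanese characters, written with escapes
        // so the source file itself is plain ASCII.
        String original = "Zo\u00eb \u65e5\u672c\u8a9e";

        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(intact));   // both sides agree on UTF-8
        System.out.println(original.equals(garbled));  // reader assumed a legacy encoding
    }
}
```

A round-trip check like this, run with non-ASCII test data against each layer, is a cheap way to find the hop that does not speak Unicode.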
Provide extra room for text expansion – User Interface
Translated text expands by about 30% on average, although in some languages it may shrink. Leave enough room in your layout for expansion and avoid static sizing. If there are strings that should not exceed a certain size, always include comments in the resource file for those items.
A .NET .resx file:
Add context information to strings using comments when necessary
A string can be translated to a foreign language in many different ways. It is very important to provide context information in the resource file when necessary.
From a gettext .po file:
Use system functions for date/time and numeric formatting
Date/time and numeric formatting differ even between regions that speak the same language. For example, the US uses MM/DD/YYYY for dates while the UK uses DD/MM/YYYY.
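In Java, for instance, this can be left to the standard library's locale-aware formatters. The wrapper class and method names below are made up for the sketch; the exact output of the short date style can vary slightly between JDK versions, so no pattern is hardcoded anywhere.

```java
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class LocaleFormatting {
    // Formats a date in the locale's own short style instead of a fixed pattern.
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    // Formats a number with the locale's grouping and decimal separators.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2021, 3, 4); // 4 March 2021

        System.out.println(shortDate(date, Locale.US)); // month first
        System.out.println(shortDate(date, Locale.UK)); // day first

        System.out.println(number(1234.5, Locale.US));      // 1,234.5
        System.out.println(number(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```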
Ruby on Rails example:
Externalize all styles and formatting
Font face, size and style will be different for some languages. Inline styling either prevents these modifications or requires code duplication (which should be avoided at all costs). Always use external style sheets to define styles for a web application.
You should also avoid using styling tags such as “em” and “strong”. Using italic text, for instance, is not common in Japanese and Chinese. Bold font faces cause problems for these languages as well since bold strokes may result in a big blob of ink when the font size is small.
If you want to emphasize a string with bold font face, do it by externalizing the style. This way, the localizers of Western languages can follow the English emphasis whereas those localizing for Japanese can specify a larger font size.
Use system functions for sorting and string comparison
String sorting is a very important aspect of web and software applications. An internationalized application does not use any manual sorting logic and relies on the underlying framework’s API for string comparison. This applies to database data as well as the strings that come from resource files (you have externalized everything, right?), which may be used in form elements such as combo boxes.
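To make the difference concrete, here is a short Java sketch (class and method names are hypothetical) that sorts the same list twice: once by raw character values, and once with java.text.Collator, the standard library's locale-aware comparator.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LocaleSorting {
    // Naive ordering: compares raw character values, so accented letters
    // end up after "z".
    static List<String> sortedByCodeUnit(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Locale-aware ordering: delegates the comparison rules to the platform.
    static List<String> sortedForLocale(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00e9clair" is "éclair", escaped to keep the source ASCII-only.
        List<String> words = Arrays.asList("zebra", "\u00e9clair", "apple");

        System.out.println(sortedByCodeUnit(words));               // éclair lands last
        System.out.println(sortedForLocale(words, Locale.FRENCH)); // éclair sorts with the e's
    }
}
```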
.NET example from MSDN