I18N

There are many dimensions to internationalisation (i18n) of software systems. It is often equated with support for multiple languages on UI screens, but that is just Level 1 of i18n support.

The dimensions, from the most obvious to the least obvious, are:

  1. multi-lingual support for labels in screens and reports
  2. adaptive screen layout for right-to-left (R2L) and CJK (Chinese, Japanese, Korean) contexts
  3. multi-lingual support for data input, with validation and multi-lingual search strings
  4. multi-lingual support for data output and display, again with R2L and CJK support
  5. multi-timezone support for all timestamps and separation of effective date of transactions from system time
  6. multi-format support for dates (including month names), times, numbers and money amounts
  7. multi-currency money values

Full i18n covers all of the dimensions above. Our current standard specifies only how our applications support Item 1 in the list.

Developers and testers who create software systems, and systems experts and technical experts who operate technical components like servers and databases, are all expected to be fluent in English. Therefore, any text intended for developers and systems experts will be in English only. This includes logs read by technical teams, such as debug logs.

Multi-lingual labels

Labels are all instances of fixed text in the software, i.e. text which does not change when the data in the system changes. For instance, the "Name:" text before the name input field and the "LEDGER" heading in a table in an accounts-receivable report are labels.

To support multi-lingual labels, there will be a two-tier structure.

  1. Tier 1. A master spreadsheet, with a name like labels-ABCXYZ.xlsx, which will have one worksheet per language. Each worksheet will have one row per message: the first column will hold the English version and the second will hold the version in that worksheet's language. So, if we support 20 languages, there will be 20 worksheets in this file.
  2. Tier 2. A set of JSON files, one per language, auto-generated from this spreadsheet. We will have a tool which reads the spreadsheet and generates all the JSON files.
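The Tier 2 generation step can be sketched in Python as follows. This is a minimal sketch under assumptions that this standard does not state: the workbook has already been read into memory (for example with a library such as openpyxl, not shown) as a mapping from ISO 639-2 code to a list of (English, translation) rows; the key of each label is its 1-based row number, which matches the "1", "2", ... keys in the sample files that follow; and the English file is built from the first column. The names build_label_sets, write_label_files and Workbook are hypothetical, not the actual tool.

```python
import json
from pathlib import Path

# Hypothetical in-memory form of labels-ABCXYZ.xlsx: one entry per worksheet,
# keyed by ISO 639-2 code, each a list of (english, translation) rows.
Workbook = dict[str, list[tuple[str, str]]]


def build_label_sets(workbook: Workbook) -> dict[str, dict[str, str]]:
    """Turn the worksheets into one {key: text} dict per language.

    Assumption: a label's key is its 1-based row number.
    """
    english_column = None
    label_sets: dict[str, dict[str, str]] = {}
    for lang, rows in workbook.items():
        column = [english for english, _ in rows]
        # Every worksheet must carry the same English column, otherwise
        # the keys would not line up across languages.
        if english_column is None:
            english_column = column
        elif column != english_column:
            raise ValueError(f"English column in worksheet {lang!r} differs")
        label_sets[lang] = {
            str(i): text for i, (_, text) in enumerate(rows, start=1)
        }
    if english_column is not None:
        # The English file is taken from the shared first column.
        label_sets["eng"] = {
            str(i): text for i, text in enumerate(english_column, start=1)
        }
    return label_sets


def write_label_files(
    prefix: str, label_sets: dict[str, dict[str, str]], out_dir: Path
) -> list[Path]:
    """Write <prefix>-<lang>.json for every language, UTF-8 encoded."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for lang, labels in sorted(label_sets.items()):
        path = out_dir / f"{prefix}-{lang}.json"
        # ensure_ascii=False keeps Devanagari and other scripts readable
        # in the file instead of turning them into \uXXXX escapes.
        path.write_text(
            json.dumps(labels, ensure_ascii=False, indent=4), encoding="utf-8"
        )
        written.append(path)
    return written
```

For example, a workbook with a single Hindi worksheet containing the rows ("Invalid name", "अमान्य नाम") and ("Excellent", "उत्कृष्ट") yields an "eng" set and a "hin" set that share the keys "1" and "2".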

The language-specific file (shown here with English text) will have the format

{
    "1": "Invalid name",
    "2": "Invalid address",
    "3": "Incorrect name",
    "4": "Incorrect address",
    "5": "Excellent"
}

The corresponding file for Hindi labels will be

{
    "1": "अमान्य नाम",
    "2": "अमान्य पता",
    "3": "गलत नाम",
    "4": "गलत पता",
    "5": "उत्कृष्ट"
}
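Reading a generated file back involves two details worth handling explicitly. JSON object keys are always strings, so a lookup for label 5 must use the key "5". Also, Python's json module silently keeps the last value when a key is repeated, so the rule that keys are unique within a file has to be checked rather than assumed. A minimal sketch follows; the names parse_labels, load_labels and label are hypothetical.

```python
import json
from pathlib import Path
from typing import Union


def _reject_duplicate_keys(pairs: list[tuple[str, str]]) -> dict[str, str]:
    # Called by json for every object; without this hook a repeated key
    # would silently overwrite the earlier one.
    seen: dict[str, str] = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate label key {key!r}")
        seen[key] = value
    return seen


def parse_labels(text: str) -> dict[str, str]:
    """Parse the contents of one language-specific label file."""
    return json.loads(text, object_pairs_hook=_reject_duplicate_keys)


def load_labels(directory: Path, prefix: str, lang: str) -> dict[str, str]:
    """Read e.g. <directory>/ABCXYZ-hin.json, always as UTF-8."""
    path = directory / f"{prefix}-{lang}.json"
    return parse_labels(path.read_text(encoding="utf-8"))


def label(labels: dict[str, str], key: Union[int, str]) -> str:
    # JSON keys are strings, so 5 and "5" both look up "5".
    return labels[str(key)]
```

Reading with an explicit encoding="utf-8" avoids depending on the platform's default encoding, which matters for files such as the Hindi one above.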

The name of the language-specific file will be of the form ABCXYZ-eng.json, where

  • ABCXYZ will be chosen by the application designers and will be carried forward from the spreadsheet name, labels-ABCXYZ. All the language-specific files for a given prefix, say ABCXYZ, will follow the same filename format and carry the same set of keys, with strings in different languages. One application may have multiple sets of files, e.g. an XYZ* set of language-specific files, another PQR* set, and so on, each generated from its own master spreadsheet. Keys must be unique within a single file, and the set of keys must be identical across all the language variants of a set; it is not acceptable to have a key missing in the Hindi file but present in the Spanish file. The translation tool will enforce this consistency. The ABCXYZ prefix does not have to be six characters long; it just needs to be a single word of reasonable length.
  • eng is an example of a three-letter language code as defined in the ISO 639-2 standard. The ISO 639-1 standard uses two-letter codes, which we do not use. ISO 639-2 is fine-grained enough to distinguish Old English (ang) from modern English (eng) and Middle English (enm), and Bihari (bih) from Hindi (hin). The Hindi file in the example above would therefore be named ABCXYZ-hin.json.
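The filename convention and the key-consistency rule above can be checked mechanically. The sketch below shows one possible shape for these checks in the translation tool, not its actual implementation. The regular expression assumes the prefix is a single run of ASCII letters and digits and that the language code is three lowercase letters; it checks only the shape of the code, not whether it appears in the ISO 639-2 list. The names parse_label_filename and check_key_consistency are hypothetical.

```python
import re

# <prefix>-<ISO 639-2 code>.json, e.g. ABCXYZ-eng.json or ABCXYZ-hin.json.
FILENAME_RE = re.compile(r"^(?P<prefix>[A-Za-z0-9]+)-(?P<lang>[a-z]{3})\.json$")


def parse_label_filename(name: str) -> tuple[str, str]:
    """Split a label file name into (prefix, language code)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a label file name: {name!r}")
    return m.group("prefix"), m.group("lang")


def check_key_consistency(label_sets: dict[str, dict[str, str]]) -> None:
    """Raise if any language variant of one set lacks a key another has."""
    all_keys: set[str] = set()
    for labels in label_sets.values():
        all_keys |= set(labels)
    problems = []
    for lang, labels in sorted(label_sets.items()):
        missing = all_keys - set(labels)
        if missing:
            problems.append(f"{lang}: missing keys {sorted(missing)}")
    if problems:
        raise ValueError("; ".join(problems))
```

With these two checks, a tool could group files by prefix using parse_label_filename, load each group, and reject the whole set if check_key_consistency reports, say, a key present in the Spanish file but missing from the Hindi one.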

Updated by Shuvam Misra over 1 year ago · 6 revisions