Revision 4 (Shuvam Misra, 09/11/2023 08:05 PM) → Revision 5/6 (Shuvam Misra, 09/11/2023 08:14 PM)

# I18N 

 There are many dimensions to internationalisation (i18n) of software systems. It is often equated with support for multiple languages on UI screens, but this is only Level 1 of i18n support. 

 The dimensions, from the most obvious to the least obvious, are: 
 1. multi-lingual support for labels in screens and reports 
 1. adaptive screen layout for right-to-left (R2L) and CJK contexts 
 1. multi-lingual support for data input, with validations, multi-lingual search strings 
 1. multi-lingual support for data output and display, again with R2L and CJK support 
 1. multi-timezone support for all timestamps and separation of effective date of transactions from system time 
 1. multi-format support for dates (including month names), times, numbers and money amounts 
 1. multi-currency money values 

 The scope of full i18n is as described above. Our current standard only specifies how our applications support Item 1 in the list. 

 ## Multi-lingual labels 

 Labels are all instances of fixed text in the software, which does not change when the data in the system changes. For instance, the **Name:** text before the name input field, or the **LEDGER** heading in a table in an accounts-receivable report is a label. 

 To support multi-lingual labels, there will be a two-tier structure. 
 1.    **Tier 1.** A spreadsheet, with a name like `labels-ABCXYZ.xlsx`, which will have one worksheet per language. In each worksheet, there will be one row per message; the first column will hold the English version and the second will hold the "other language" version. So, if we support 20 languages, there will be 20 worksheets in this file. 
 1.    **Tier 2.** A set of JSON files, one per language, auto-generated from this spreadsheet. We will have a tool which reads the spreadsheet and generates all the JSON files. 
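
The Tier 1 to Tier 2 step can be sketched roughly as follows. This is only an illustration, not the actual tool: the function name `build_label_files`, the in-memory `worksheets` structure, the use of the 1-based row position as the key, and taking the English file from the first worksheet's first column are all assumptions. Reading the `.xlsx` file itself is left out; a real tool would need a spreadsheet library for that.

```python
import json


def build_label_files(worksheets, source_lang="eng"):
    """Turn worksheet rows into one JSON document per language.

    worksheets: dict mapping a language code to a list of
                (english_text, translated_text) rows, in sheet order.
    Returns a dict mapping language code to JSON text.
    """
    files = {}
    english_column = None
    for lang, rows in worksheets.items():
        if english_column is None:
            # Assumption: every worksheet carries the same English column,
            # so the first worksheet is used to build the English file.
            english_column = [english for english, _ in rows]
        # Assumption: the key is the 1-based row position, as a string.
        labels = {str(i): text for i, (_, text) in enumerate(rows, start=1)}
        # ensure_ascii=False keeps non-Latin scripts readable in the file.
        files[lang] = json.dumps(labels, ensure_ascii=False, indent=4)
    if english_column is not None:
        english = {str(i): text for i, text in enumerate(english_column, start=1)}
        files[source_lang] = json.dumps(english, ensure_ascii=False, indent=4)
    return files


# Hypothetical input: one worksheet (Hindi) with two messages.
sheets = {
    "hin": [
        ("Invalid name", "अमान्य नाम"),
        ("Invalid address", "अमान्य पता"),
    ],
}
generated = build_label_files(sheets)
print(generated["hin"])
```

Because all language files are derived from the same rows, they carry the same keys by construction.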

 The language-specific file (shown here with English text) will have the format 
 ``` json 
 { 
     "1": "Invalid name", 
     "2": "Invalid address", 
     "3": "Incorrect name", 
     "4": "Incorrect address", 
     "5": "Excellent address" 
 } 
 ``` 
 The corresponding file for Hindi labels will be 
 ``` json 
 { 
     "1": "अमान्य नाम", 
     "2": "अमान्य पता", 
     "3": "गलत नाम", 
     "4": "गलत पता", 
     "5": "उत्कृष्ट पता" 
 } 
 ``` 
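
As an illustration of how application code might consume these files, here is a minimal sketch. The helper names `load_labels` and `label` are hypothetical, and the JSON is inlined as strings so the example runs without any files on disk.

```python
import json

# Inlined excerpts of the English and Hindi files shown above.
ENGLISH_JSON = '{"1": "Invalid name", "2": "Invalid address"}'
HINDI_JSON = '{"1": "अमान्य नाम", "2": "अमान्य पता"}'


def load_labels(json_text):
    """Parse one language-specific file into a key -> label dict."""
    return json.loads(json_text)


def label(labels, key):
    """Look up a label by key; keys are stored as strings in the JSON."""
    return labels[str(key)]


tables = {"eng": load_labels(ENGLISH_JSON), "hin": load_labels(HINDI_JSON)}

# The calling code refers to a label only by its key; switching the
# language means switching the table, not the key.
print(label(tables["eng"], 1))
print(label(tables["hin"], 1))
```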

 The name of the language-specific file will be of the form `ABCXYZ-eng.json`, where 
 * the `ABCXYZ` will be chosen by the application designers, and will be carried forward from the spreadsheet name, `labels-ABCXYZ`. All the language-specific files for a given prefix, say `ABCXYZ`, will have the same filename format, and will carry the same set of keys, with strings in different languages. One application may have multiple sets of files, *e.g.* an `XYZ*` set of language-specific files, another `PQR*` set of files, and so on. They will be generated from their respective master files. The keys within a single file will have to be unique, and the set of keys in all the language variants of a set will need to be consistent and uniform -- it is not acceptable to have a key missing in the Hindi file but present in the Spanish file. The translation tool will enforce this consistency. This `ABCXYZ` prefix does not have to be six characters long -- it just needs to be a single word of "reasonable" length. 
 * the `eng` is an example of the language code as defined in the [ISO 639-2 standard](https://www.loc.gov/standards/iso639-2/php/code_list.php). Note that the ISO 639-1 standard uses two-character codes, which we are not using. The three-character ISO 639-2 codes are fine-grained enough to distinguish Old English (`ang`) from modern English (`eng`) from Middle English (`enm`), Bihari (`bih`) from Hindi (`hin`) and so on.
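
The naming rule and the key-consistency rule above can be sketched as two small checks. This is a hedged illustration, not the translation tool itself: the function names are hypothetical, and the pattern assumes that a "single word" prefix means letters and digits only and that the language code is three lowercase letters.

```python
import re

# Assumption: prefix is letters/digits only; language code is a
# three-letter lowercase ISO 639-2 code.
FILENAME_RE = re.compile(r"^(?P<prefix>[A-Za-z0-9]+)-(?P<lang>[a-z]{3})\.json$")


def parse_label_filename(name):
    """Split a name like 'ABCXYZ-eng.json' into (prefix, language code)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a label file name: {name!r}")
    return match.group("prefix"), match.group("lang")


def find_missing_keys(label_sets):
    """Report keys that some language variants lack.

    label_sets: dict mapping language code -> dict of key -> label.
    Returns a dict mapping language code -> sorted list of keys that
    appear in at least one other variant but not in this one.
    An empty result means the set is consistent.
    """
    all_keys = set()
    for labels in label_sets.values():
        all_keys |= set(labels)
    missing = {}
    for lang, labels in label_sets.items():
        gap = all_keys - set(labels)
        if gap:
            missing[lang] = sorted(gap)
    return missing


print(parse_label_filename("ABCXYZ-hin.json"))
print(find_missing_keys({
    "hin": {"1": "अमान्य नाम", "2": "अमान्य पता"},
    "spa": {"1": "Nombre no válido"},
}))
```

JSON parsers typically keep only the last value for a duplicated key, so a check for unique keys within one file would have to run on the spreadsheet rows or the raw text rather than on the parsed dict.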