
I18nstd, version 6

Shuvam Misra, 09/11/2023 08:18 PM

# I18N

Internationalisation (i18n) of software systems has many dimensions. It is often equated with support for multiple languages on UI screens, but that is only Level 1 of i18n support.

The dimensions, from the most obvious to the least obvious, are:
1. multi-lingual support for labels in screens and reports
1. adaptive screen layout for right-to-left (R2L) and CJK (Chinese, Japanese, Korean) contexts
1. multi-lingual support for data input, including validations and multi-lingual search strings
1. multi-lingual support for data output and display, again with R2L and CJK support
1. multi-timezone support for all timestamps, and separation of the effective date of a transaction from the system time
1. multi-format support for dates (including month names), times, numbers and money amounts
1. multi-currency money values

The scope of full i18n is everything listed above. This standard currently specifies only how our applications support Item 1 in the list, multi-lingual labels.

Developers and testers who *create* software systems, and the systems and technical experts who operate components such as servers and databases, are all expected to be fluent in English. Therefore, any text meant for developers and systems experts will be in English only. This includes logs read by technical teams, *e.g.* debug logs.

## Multi-lingual labels
Labels are all the fixed text in the software, which does not change when the data in the system changes. For instance, the **Name:** text before the name input field and the **LEDGER** heading of a table in an accounts-receivable report are both labels.

Multi-lingual labels will be supported through a two-tier structure:
1.  **Tier 1.** A master spreadsheet, with a name like `labels-ABCXYZ.xlsx`, which will have one worksheet per language. Each worksheet will have one row per message: the first column will hold the English version and the second the version in that worksheet's language. So, if we support 20 languages, there will be 20 worksheets in this file.
1.  **Tier 2.** A set of JSON files, one per language, auto-generated from this spreadsheet by a tool that reads the spreadsheet and writes all the JSON files.
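
The standard does not define the Tier 2 tool's interface. The sketch below shows one minimal way it could work. It relies on several assumptions that are not part of this standard:

* The worksheets have already been read into memory, for example with a spreadsheet library such as openpyxl, as lists of `(English, translation)` rows.
* Each worksheet is named with the ISO 639-2 code of a non-English language.
* A message's key is its 1-based row number.
* The English file is built from the first column, which is shared by all worksheets.

`generate_label_files` and `_write` are hypothetical names.

```python
import json
from pathlib import Path

# Assumed in-memory form of labels-ABCXYZ.xlsx: worksheet name (assumed to be
# the ISO 639-2 code of a non-English language) -> rows of
# (English text, text in that worksheet's language).
Workbook = dict[str, list[tuple[str, str]]]


def _write(path: Path, labels: dict[str, str]) -> Path:
    # ensure_ascii=False keeps non-Latin scripts (e.g. Devanagari) readable in the file.
    text = json.dumps(labels, ensure_ascii=False, indent=4)
    path.write_text(text + "\n", encoding="utf-8")
    return path


def generate_label_files(prefix: str, workbook: Workbook, out_dir: Path) -> list[Path]:
    """Hypothetical Tier 2 step: write <prefix>-<lang>.json per worksheet, plus <prefix>-eng.json.

    Assumption: a message's key is its 1-based row number, as a string.
    """
    english = None
    written = []
    for lang, rows in workbook.items():
        # The English column must match on every worksheet; otherwise the same
        # key would name different messages in different language files.
        sheet_english = {str(i): eng for i, (eng, _) in enumerate(rows, start=1)}
        if english is None:
            english = sheet_english
        elif sheet_english != english:
            raise ValueError(f"worksheet {lang!r}: English column differs from the other worksheets")
        translated = {str(i): text for i, (_, text) in enumerate(rows, start=1)}
        written.append(_write(out_dir / f"{prefix}-{lang}.json", translated))
    if english is not None:
        written.append(_write(out_dir / f"{prefix}-eng.json", english))
    return written
```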

The language-specific file (shown here with English text) will have the following format:
``` json
{
    "1": "Invalid name",
    "2": "Invalid address",
    "3": "Incorrect name",
    "4": "Incorrect address",
    "5": "Excellent"
}
```
The corresponding file for Hindi labels will be:
``` json
{
    "1": "अमान्य नाम",
    "2": "अमान्य पता",
    "3": "गलत नाम",
    "4": "गलत पता",
    "5": "उत्कृष्ट"
}
```
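
The standard does not say how applications read these files at run time. As a hedged illustration only, a loader for one prefix and one language might look like the sketch below. It assumes that all label files sit in one directory and are named with this standard's prefix-and-language-code convention. It also assumes that asking for an undefined key is a bug that should fail loudly. `LabelSet` and its `label()` method are hypothetical names.

```python
import json
from pathlib import Path


class LabelSet:
    """Hypothetical run-time view of one label file, e.g. ABCXYZ-hin.json."""

    def __init__(self, directory: Path, prefix: str, lang: str) -> None:
        # Assumed file name: <prefix>-<ISO 639-2 code>.json, e.g. ABCXYZ-hin.json.
        path = directory / f"{prefix}-{lang}.json"
        with path.open(encoding="utf-8") as f:
            self._labels = json.load(f)
        self.lang = lang

    def label(self, key: str) -> str:
        # Assumption: an undefined key is a programming error, so fail loudly
        # rather than show an empty label on screen.
        try:
            return self._labels[key]
        except KeyError:
            raise KeyError(f"label {key!r} is not defined for language {self.lang!r}") from None


# Usage (assuming labels/ABCXYZ-hin.json holds the Hindi file shown above):
#     hindi = LabelSet(Path("labels"), "ABCXYZ", "hin")
#     hindi.label("5")  # "उत्कृष्ट"
```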
The name of the language-specific file will be of the form `ABCXYZ-eng.json`, where
* `ABCXYZ` is a prefix chosen by the application designers and carried forward from the spreadsheet name, `labels-ABCXYZ`. All the language-specific files for a given prefix, say `ABCXYZ`, will follow the same filename format and carry the same set of keys, with strings in different languages. One application may have multiple sets of files, *e.g.* an `XYZ*` set of language-specific files, another `PQR*` set, and so on, each generated from its own master spreadsheet. Keys within a single file must be unique, and the set of keys must be identical across all the language variants of a set: it is not acceptable for a key to be missing from the Hindi file but present in the Spanish file. The translation tool will enforce this consistency. The prefix does not have to be six characters long; it just needs to be a single word of reasonable length.
* `eng` is an example of a three-letter language code as defined in the [ISO 639-2 standard](https://www.loc.gov/standards/iso639-2/php/code_list.php). We do not use the two-letter codes of the ISO 639-1 standard. The three-letter codes are fine-grained enough to distinguish Old English (`ang`), Middle English (`enm`) and modern English (`eng`), or Bihari (`bih`) from Hindi (`hin`), and so on.
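
The translation tool's consistency checks could be sketched along the lines below. This is an illustration under stated assumptions, not the tool itself. It assumes every file of a set sits in one directory. It accepts any three-lowercase-letter code without looking it up in the ISO 639-2 list. It reports problems rather than fixing them. `check_label_set` and `_no_duplicate_keys` are hypothetical names.

Duplicate keys are caught while parsing, through the `object_pairs_hook` argument of Python's `json.load`. This is needed because, by default, the module silently keeps the last value for a repeated key.

```python
import json
import re
from pathlib import Path

# <prefix>-<code>.json, where code is assumed to be three lowercase letters
# (the shape of ISO 639-2 codes; the official code list is not consulted here).
FILENAME_RE = re.compile(r"^(?P<prefix>[A-Za-z0-9_]+)-(?P<lang>[a-z]{3})\.json$")


def _no_duplicate_keys(pairs):
    # json.load would keep only the last value of a repeated key,
    # so build the dict ourselves and reject repeats.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key {key!r}")
        result[key] = value
    return result


def check_label_set(directory: Path, prefix: str) -> list[str]:
    """Return the problems found in <prefix>-*.json files; an empty list means consistent."""
    keys_by_lang: dict[str, set[str]] = {}
    problems: list[str] = []
    for path in sorted(directory.glob(f"{prefix}-*.json")):
        m = FILENAME_RE.match(path.name)
        if m is None or m["prefix"] != prefix:
            problems.append(f"{path.name}: name is not <prefix>-<3-letter code>.json")
            continue
        try:
            with path.open(encoding="utf-8") as f:
                labels = json.load(f, object_pairs_hook=_no_duplicate_keys)
        except ValueError as exc:  # also covers json.JSONDecodeError
            problems.append(f"{path.name}: {exc}")
            continue
        keys_by_lang[m["lang"]] = set(labels)
    # Every language variant must carry every key used by any variant of the set.
    all_keys = set().union(*keys_by_lang.values())
    for lang, keys in sorted(keys_by_lang.items()):
        for key in sorted(all_keys - keys):
            problems.append(f"{prefix}-{lang}.json: missing key {key!r}")
    return problems
```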